lanyuanxiaoyao/hudi - hudi

Author	SHA1	Message	Date
xiarixiaoyao	a0dae41409	[HUDI-2758] remove redundant code in the hoodieRealtimeInputFormatUitls.getRealtimeSplits (#3994 )	2021-11-15 11:29:40 +08:00
xiarixiaoyao	a40ac62e0c	[HUDI-2086]redo the logical of mor_incremental_view for hive (#3203 )	2021-11-10 15:41:07 +08:00
Genmao Yu	f41539a9cb	[HUDI-313] bugfix: NPE when select count start from a realtime table with Tez(#3630 ) Co-authored-by: dylonyu <dylonyu@tencent.com>	2021-11-06 12:16:13 -04:00
xiarixiaoyao	5517d292f9	[HUDI-2674] hudi hive reader should not print read values. (#3910 )	2021-11-02 23:10:18 -04:00
Sivabalan Narayanan	69ee790a47	[HUDI-1294] Adding inline read and seek based read(batch get) for hfile log blocks in metadata table (#3762 )	2021-10-29 12:12:44 -04:00
Sivabalan Narayanan	e3fc74668f	[HUDI-2625] Revert "[HUDI-2005] Avoiding direct fs calls in HoodieLogFileReader (#3757 )" (#3863 ) This reverts commit `1bb0532563`.	2021-10-25 21:43:15 -04:00
Sivabalan Narayanan	1bb0532563	[HUDI-2005] Avoiding direct fs calls in HoodieLogFileReader (#3757 )	2021-10-25 01:21:08 -04:00
zhangyue19921010	1e285dc399	[HUDI-2489]Tuning HoodieROTablePathFilter by caching hoodieTableFileSystemView, aiming to reduce unnecessary list/get requests (#3719 ) Co-authored-by: yuezhang <yuezhang@freewheel.tv>	2021-10-22 12:03:58 -04:00
Danny Chan	abf3e3fe71	[HUDI-2548] Flink streaming reader misses the rolling over file handles (#3787 )	2021-10-14 10:36:18 +08:00
Sivabalan Narayanan	5f32162a2f	[HUDI-2285][HUDI-2476] Metadata table synchronous design. Rebased and Squashed from pull/3426 (#3590 ) * [HUDI-2285] Adding Synchronous updates to metadata before completion of commits in data timelime. - This patch adds synchronous updates to metadata table. In other words, every write is first committed to metadata table followed by data table. While reading metadata table, we ignore any delta commits that are present only in metadata table and not in data table timeline. - Compaction of metadata table is fenced by the condition that we trigger compaction only when there are no inflight requests in datatable. This ensures that all base files in metadata table is always in sync with data table(w/o any holes) and only there could be some extra invalid commits among delta log files in metadata table. - Due to this, archival of data table also fences itself up until compacted instant in metadata table. All writes to metadata table happens within the datatable lock. So, metadata table works in one writer mode only. This might be tough to loosen since all writers write to same FILES partition and so, will result in a conflict anyways. - As part of this, have added acquiring locks in data table for those operations which were not before while committing (rollback, clean, compaction, cluster). To note, we were not doing any conflict resolution. All we are doing here is to commit by taking a lock. So that all writes to metadata table is always a single writer. - Also added building block to add buckets for partitions, which will be leveraged by other indexes like record level index, etc. For now, FILES partition has only one bucket. In general, any number of buckets per partition is allowed and each partition has a fixed fileId prefix with incremental suffix for each bucket within each partition. Have fixed [HUDI-2476]. This fix is about retrying a failed compaction if it succeeded in metadata for first time, but failed w/ data table. - Enabling metadata table by default. - Adding more tests for metadata table Co-authored-by: Prashant Wason <pwason@uber.com>	2021-10-06 00:17:52 -04:00
Jimmy.Zhou	55df8f61e1	[MINOR] Fix typo."funcitons" corrected to "functions" (#3681 )	2021-09-21 20:30:13 -04:00
Udit Mehrotra	c350d05dd3	Restore 0.8.0 config keys with deprecated annotation (#3506 ) Co-authored-by: Sagar Sumit <sagarsumit09@gmail.com> Co-authored-by: Vinoth Chandar <vinoth@apache.org>	2021-08-19 13:36:40 -07:00
Udit Mehrotra	3e301196bf	Moving to 0.10.0-SNAPSHOT on master branch.	2021-08-14 18:51:09 -07:00
vinoyang	0e1c592c69	[MINOR] Delete useless com.uber.hoodie.hadoop.hive.HoodieCombineHiveInputFormat (#3298 )	2021-08-10 12:05:31 -07:00
Sivabalan Narayanan	fe508376fa	[HUDI-2177][HUDI-2200] Adding virtual keys support for MOR table (#3315 )	2021-08-02 09:45:09 -04:00
rmahindra123	8fef50e237	[HUDI-2044] Integrate consumers with rocksDB and compression within External Spillable Map (#3318 )	2021-07-28 01:31:03 -04:00
Danny Chan	ac75bda929	[HUDI-1969] Support reading logs for MOR Hive rt table (#3033 )	2021-07-13 23:43:30 -07:00
pengzhiwei	ca440ccf88	[HUDI-2107] Support Read Log Only MOR Table For Spark (#3193 )	2021-07-12 17:31:23 +08:00
wenningd	d412fb2fe6	[HUDI-89] Add configOption & refactor all configs based on that (#2833 ) Co-authored-by: Wenning Ding <wenningd@amazon.com>	2021-06-30 14:26:30 -07:00
s-sanjay	0fb8556b0d	Add ability to provide multi-region (global) data consistency across HMS in different regions (#2542 ) [global-hive-sync-tool] Add a global hive sync tool to sync hudi table across clusters. Add a way to rollback the replicated time stamp if we fail to sync or if we partly sync Co-authored-by: Jagmeet Bali <jsbali@uber.com>	2021-06-24 20:26:26 -07:00
Wei	7865da1e15	[MINOR] Fix Javadoc wrong references (#3115 )	2021-06-18 21:51:54 -07:00
Jintao Guan	b8fe5b91d5	[HUDI-764] [HUDI-765] ORC reader writer Implementation (#2999 ) Co-authored-by: Qingyun (Teresa) Kang <kteresa@uber.com>	2021-06-15 15:21:43 -07:00
Danny Chan	c2383ee904	[HUDI-1967] Fix the NPE for MOR Hive rt table query (#3032 ) The HoodieInputFormatUtils.getTableMetaClientByBasePath returns the map with table base path as keys while the HoodieRealtimeInputFormatUtils query it with the partition path.	2021-06-05 01:06:34 -07:00
xiarixiaoyao	081061e14b	[HUDI-1719] hive on spark/mr,Incremental query of the mor table, the partition field is incorrect (#2720 )	2021-05-20 11:00:08 -04:00
xiarixiaoyao	6f7ff7e8ca	[HUDI-1722]Fix hive beeline/spark-sql query specified field on mor table occur NPE (#2722 )	2021-05-12 20:52:37 +08:00
TeRS-K	be9db2c4f5	[HUDI-1055] Remove hardcoded parquet in tests (#2740 ) * Remove hardcoded parquet in tests * Use DataFileUtils.getInstance * Renaming DataFileUtils to BaseFileUtils Co-authored-by: Vinoth Chandar <vinoth@apache.org>	2021-05-11 10:01:45 -07:00
jsbali	aa398f77f1	[HUDI-1789] Support reading older snapshots (#2809 ) * [HUDI-1789] In HoodieParquetInoutFormat we currently default to the latest version of base files. This PR attempts to add a new jobConf `hoodie.%s.consume.snapshot.time` This new config will allow us to read older snapshots. - Reusing hoodie.%s.consume.commit for point in time snapshot queries as well. - Adding javadocs and some more tests	2021-05-10 15:26:49 -07:00
xiarixiaoyao	1db904a12e	[HUDI-1718] When query incr view of mor table which has Multi level partitions, the query failed (#2716 )	2021-05-05 00:34:20 -04:00
Raymond Xu	faf3785a2d	[HUDI-1811] Fix TestHoodieRealtimeRecordReader (#2873 ) Pass basePath with scheme 'file://' to HoodieRealtimeFileSplit	2021-04-30 11:16:55 -07:00
xiarixiaoyao	929eca43fe	[HUDI-1817] Fix getting incorrect partition path while using incr query by spark-sql (#2858 )	2021-04-30 14:57:52 +08:00
xiarixiaoyao	65844a8d29	[HUDI-1720] Fix RealtimeCompactedRecordReader StackOverflowError (#2721 )	2021-04-13 18:23:26 +08:00
garyli1019	6e803e08b1	Moving to 0.9.0-SNAPSHOT on master branch.	2021-03-24 21:37:14 +08:00
xiarixiaoyao	02073235c3	[HUDI-1662] Fix hive date type conversion for mor table (#2634 )	2021-03-08 12:16:13 +08:00
satishkotha	7cc75e0be2	[HUDI-1646] Provide mechanism to read uncommitted data through InputFormat (#2611 )	2021-03-04 17:43:31 -08:00
n3nash	ffcfb58bac	[HUDI-1486] Remove inline inflight rollback in hoodie writer (#2359 ) 1. Refactor rollback and move cleaning failed commits logic into cleaner 2. Introduce hoodie heartbeat to ascertain failed commits 3. Fix test cases	2021-02-19 20:12:22 -08:00
Sivabalan Narayanan	c9fcf964b2	[HUDI-1315] Adding builder for HoodieTableMetaClient initialization (#2534 )	2021-02-20 09:54:26 +08:00
satishkotha	0d91c451b0	[HUDI-1539] Fix bug in HoodieCombineRealtimeRecordReader with reading empty iterators (#2583 )	2021-02-19 15:45:43 -08:00
Vinoth Chandar	3719e7b388	Moving to 0.8.0-SNAPSHOT on master branch.	2021-01-20 11:31:22 -08:00
vinoth chandar	5ca0625b27	[HUDI 1308] Harden RFC-15 Implementation based on production testing (#2441 ) Addresses leaks, perf degradation observed during testing. These were regressions from the original rfc-15 PoC implementation. * Pass a single instance of HoodieTableMetadata everywhere * Fix tests and add config for enabling metrics - Removed special casing of assumeDatePartitioning inside FSUtils#getAllPartitionPaths() - Consequently, IOException is never thrown and many files had to be adjusted - More diligent handling of open file handles in metadata table - Added config for controlling reuse of connections - Added config for turning off fallback to listing, so we can see tests fail - Changed all ipf listing code to cache/amortize the open/close for better performance - Timelineserver also reuses connections, for better performance - Without timelineserver, when metadata table is opened from executors, reuse is not allowed - HoodieMetadataConfig passed into HoodieTableMetadata#create as argument. - Fix TestHoodieBackedTableMetadata#testSync	2021-01-19 21:20:28 -08:00
Sivabalan Narayanan	a43e191d6c	[MINOR] Bumping snapshot version to 0.7.0 (#2435 )	2021-01-16 09:56:28 -05:00
n3nash	749f657856	[HUDI-1509]: Reverting LinkedHashSet changes to combine fields from oldSchema and newSchema in favor of using only new schema for record rewriting (#2424 )	2021-01-14 12:47:50 -08:00
Udit Mehrotra	7ce3ac778e	[HUDI-1479] Use HoodieEngineContext to parallelize fetching of partiton paths (#2417 ) * [HUDI-1479] Use HoodieEngineContext to parallelize fetching of partition paths * Adding testClass for FileSystemBackedTableMetadata Co-authored-by: Nishith Agarwal <nagarwal@uber.com>	2021-01-10 21:19:52 -08:00
Gary Li	79ec7b4894	[HUDI-920] Support Incremental query for MOR table (#1938 )	2021-01-09 08:02:08 -08:00
rmpifer	1a0579ca7d	[HUDI-1312] [RFC-15] Support for metadata listing for snapshot queries through Hive/SparkSQL (#2366 ) Co-authored-by: Ryan Pifer <ryanpife@amazon.com>	2021-01-04 07:59:47 -08:00
Udit Mehrotra	4e64226844	[HUDI-1450] Use metadata table for listing in HoodieROTablePathFilter (apache#2326) [HUDI-1394] [RFC-15] Use metadata table (if present) to get all partition paths (apache#2351)	2021-01-04 07:59:47 -08:00
Gary Li	605b617cfa	[HUDI-1434] fix incorrect log file path in HoodieWriteStat (#2300 ) * [HUDI-1434] fix incorrect log file path in HoodieWriteStat * HoodieWriteHandle#close() returns a list of WriteStatus objs * Handle rolled-over log files and return a WriteStatus per log file written - Combined data and delete block logging into a single call - Lazily initialize and manage write status based on returned AppendResult - Use FSUtils.getFileSize() to set final file size, consistent with other handles - Added tests around returned values in AppendResult - Added validation of the file sizes returned in write stat Co-authored-by: Vinoth Chandar <vinoth@apache.org>	2020-12-30 14:22:15 -08:00
Balaji Varadarajan	3ec9270e8e	[HUDI-1490] Incremental Query should work even when there are partitions that have no incremental changes (#2371 ) * Incremental Query should work even when there are partitions that have no incremental changes Co-authored-by: Sivabalan Narayanan <sivabala@uber.com>	2020-12-26 12:17:49 -05:00
Danny Chan	4bc45a391a	[HUDI-1445] Refactor AbstractHoodieLogRecordScanner to use Builder (#2313 )	2020-12-10 20:02:02 +08:00
lw0090	1f0d5c077e	[HUDI-1349] spark sql support overwrite use insert_overwrite_table (#2196 )	2020-12-03 12:26:21 -08:00
lw0090	5f5c15b0d9	[HUDI-892] RealtimeParquetInputFormat skip adding projection columns if there are no log files (#2190 ) * [HUDI-892] RealtimeParquetInputFormat skip adding projection columns if there are no log files * [HUDI-892] for test * [HUDI-892] fix bug generate array from split * [HUDI-892] revert test log	2020-11-02 20:00:12 -08:00


136 Commits