151 Commits

Author SHA1 Message Date
董可伦
56cd8ffae0 [HUDI-2837] Add support for using database name in incremental query (#4083) 2022-01-22 22:11:27 -08:00
Alexey Kudinkin
4bea758738 [HUDI-3191] Rebasing Hive's FileInputFormat onto AbstractHoodieTableFileIndex (#4531) 2022-01-18 14:54:51 -08:00
Yuwei XIAO
d36533735f [HUDI-3194] fix MOR snapshot query during compaction (#4540) 2022-01-17 17:24:24 -05:00
Alexey Kudinkin
75caa7d3d8 [HUDI-3179] Extracted common AbstractHoodieTableFileIndex to be shared across engines (#4520) 2022-01-16 22:46:20 -08:00
Alexey Kudinkin
6cdcd89afa [HUDI-3094] Unify Hive's InputFormat implementations to avoid duplication (#4417) 2022-01-11 15:02:13 -08:00
xuzifu666
f0c2912d35 [MINOR] Remove unused methods in HoodieColumnProjectionUtils (#4408) 2022-01-06 15:36:13 -08:00
Sivabalan Narayanan
a66212d204 [HUDI-2966] Closing LogRecordScanner in compactor (#4478)
* Closing LogRecordScanner in compactor

* Addressing comments
2022-01-05 10:57:18 +08:00
RexAn
f612a20815 [HUDI-2779] Cache BaseDir if HudiTableNotFound Exception thrown (#4014) 2021-12-09 16:04:11 +05:30
xuzifu666
c9e18d1e7d [HUDI-2942] add error message log in HoodieCombineHiveInputFormat (#4224) 2021-12-07 22:05:39 -08:00
xiarixiaoyao
57c4bf8152 [HUDI-2876] for hive/presto, hudi should remove the temp file created by HoodieMergedLogRecordScanner when the query finishes. (#4139) 2021-12-06 21:33:10 +08:00
zhangyue19921010
5616830ae1 Revert "[HUDI-2489]Tuning HoodieROTablePathFilter by caching hoodieTableFileSystemView, aiming to reduce unnecessary list/get requests"
Co-authored-by: yuezhang <yuezhang@freewheel.tv>
2021-12-04 08:26:53 +05:30
yuzhao.cyz
a1d0ff4209 Moving to 0.11.0-SNAPSHOT on master branch. 2021-11-27 17:22:10 +08:00
Sivabalan Narayanan
8340ccb503 [HUDI-2005] Removing direct fs call in HoodieLogFileReader (#3865) 2021-11-25 18:51:38 -05:00
Danny Chan
a2eb2b0b0a [HUDI-2480] FileSlice after pending compaction-requested instant-time… (#3703)
* [HUDI-2480] FileSlice after pending compaction-requested instant-time is ignored by MOR snapshot reader

* include file slice after a pending compaction for spark reader

Co-authored-by: garyli1019 <yanjia.gary.li@gmail.com>
2021-11-25 22:30:09 +08:00
Jimmy.Zhou
0d1e7ecdab [MINOR] Fix typo,'multipe' corrected to 'multiple' (#4068) 2021-11-22 17:20:23 -08:00
xiarixiaoyao
a0dae41409 [HUDI-2758] remove redundant code in HoodieRealtimeInputFormatUtils.getRealtimeSplits (#3994) 2021-11-15 11:29:40 +08:00
xiarixiaoyao
a40ac62e0c [HUDI-2086] redo the logic of mor_incremental_view for hive (#3203) 2021-11-10 15:41:07 +08:00
Genmao Yu
f41539a9cb [HUDI-313] bugfix: NPE when running select count(*) on a realtime table with Tez (#3630)
Co-authored-by: dylonyu <dylonyu@tencent.com>
2021-11-06 12:16:13 -04:00
xiarixiaoyao
5517d292f9 [HUDI-2674] hudi hive reader should not print read values. (#3910) 2021-11-02 23:10:18 -04:00
Sivabalan Narayanan
69ee790a47 [HUDI-1294] Adding inline read and seek based read(batch get) for hfile log blocks in metadata table (#3762) 2021-10-29 12:12:44 -04:00
Sivabalan Narayanan
e3fc74668f [HUDI-2625] Revert "[HUDI-2005] Avoiding direct fs calls in HoodieLogFileReader (#3757)" (#3863)
This reverts commit 1bb0532563.
2021-10-25 21:43:15 -04:00
Sivabalan Narayanan
1bb0532563 [HUDI-2005] Avoiding direct fs calls in HoodieLogFileReader (#3757) 2021-10-25 01:21:08 -04:00
zhangyue19921010
1e285dc399 [HUDI-2489]Tuning HoodieROTablePathFilter by caching hoodieTableFileSystemView, aiming to reduce unnecessary list/get requests (#3719)
Co-authored-by: yuezhang <yuezhang@freewheel.tv>
2021-10-22 12:03:58 -04:00
Danny Chan
abf3e3fe71 [HUDI-2548] Flink streaming reader misses the rolling over file handles (#3787) 2021-10-14 10:36:18 +08:00
Sivabalan Narayanan
5f32162a2f [HUDI-2285][HUDI-2476] Metadata table synchronous design. Rebased and Squashed from pull/3426 (#3590)
* [HUDI-2285] Adding synchronous updates to metadata before completion of commits in data timeline.

- This patch adds synchronous updates to metadata table. In other words, every write is first committed to metadata table followed by data table. While reading metadata table, we ignore any delta commits that are present only in metadata table and not in data table timeline.
- Compaction of metadata table is fenced by the condition that we trigger compaction only when there are no inflight requests in data table. This ensures that all base files in metadata table are always in sync with data table (w/o any holes); the only possible discrepancy is some extra invalid commits among delta log files in metadata table.
- Due to this, archival of data table also fences itself up until compacted instant in metadata table.
- All writes to metadata table happen within the data table lock, so metadata table works in single-writer mode only. This might be tough to loosen, since all writers write to the same FILES partition and would therefore conflict anyway.
- As part of this, have added lock acquisition in data table for the operations that did not take one before while committing (rollback, clean, compaction, cluster). Note that we are not doing any conflict resolution; we only commit while holding the lock, so that all writes to metadata table always come from a single writer.
- Also added a building block for buckets within partitions, which will be leveraged by other indexes such as the record level index. For now, the FILES partition has only one bucket. In general, any number of buckets per partition is allowed, and each partition has a fixed fileId prefix with an incremental suffix for each bucket within that partition.
- Fixed [HUDI-2476]: retry a failed compaction if it succeeded in metadata table the first time but failed in data table.
- Enabling metadata table by default.
- Adding more tests for metadata table

Co-authored-by: Prashant Wason <pwason@uber.com>
2021-10-06 00:17:52 -04:00
Jimmy.Zhou
55df8f61e1 [MINOR] Fix typo."funcitons" corrected to "functions" (#3681) 2021-09-21 20:30:13 -04:00
Udit Mehrotra
c350d05dd3 Restore 0.8.0 config keys with deprecated annotation (#3506)
Co-authored-by: Sagar Sumit <sagarsumit09@gmail.com>
Co-authored-by: Vinoth Chandar <vinoth@apache.org>
2021-08-19 13:36:40 -07:00
Udit Mehrotra
3e301196bf Moving to 0.10.0-SNAPSHOT on master branch. 2021-08-14 18:51:09 -07:00
vinoyang
0e1c592c69 [MINOR] Delete useless com.uber.hoodie.hadoop.hive.HoodieCombineHiveInputFormat (#3298) 2021-08-10 12:05:31 -07:00
Sivabalan Narayanan
fe508376fa [HUDI-2177][HUDI-2200] Adding virtual keys support for MOR table (#3315) 2021-08-02 09:45:09 -04:00
rmahindra123
8fef50e237 [HUDI-2044] Integrate consumers with rocksDB and compression within External Spillable Map (#3318) 2021-07-28 01:31:03 -04:00
Danny Chan
ac75bda929 [HUDI-1969] Support reading logs for MOR Hive rt table (#3033) 2021-07-13 23:43:30 -07:00
pengzhiwei
ca440ccf88 [HUDI-2107] Support Read Log Only MOR Table For Spark (#3193) 2021-07-12 17:31:23 +08:00
wenningd
d412fb2fe6 [HUDI-89] Add configOption & refactor all configs based on that (#2833)
Co-authored-by: Wenning Ding <wenningd@amazon.com>
2021-06-30 14:26:30 -07:00
s-sanjay
0fb8556b0d Add ability to provide multi-region (global) data consistency across HMS in different regions (#2542)
[global-hive-sync-tool] Add a global hive sync tool to sync hudi table across clusters. Add a way to rollback the replicated time stamp if we fail to sync or if we partly sync

Co-authored-by: Jagmeet Bali <jsbali@uber.com>
2021-06-24 20:26:26 -07:00
Wei
7865da1e15 [MINOR] Fix Javadoc wrong references (#3115) 2021-06-18 21:51:54 -07:00
Jintao Guan
b8fe5b91d5 [HUDI-764] [HUDI-765] ORC reader writer Implementation (#2999)
Co-authored-by: Qingyun (Teresa) Kang <kteresa@uber.com>
2021-06-15 15:21:43 -07:00
Danny Chan
c2383ee904 [HUDI-1967] Fix the NPE for MOR Hive rt table query (#3032)
The HoodieInputFormatUtils.getTableMetaClientByBasePath returns the map
with table base path as keys while the HoodieRealtimeInputFormatUtils
query it with the partition path.
2021-06-05 01:06:34 -07:00
xiarixiaoyao
081061e14b [HUDI-1719] hive on spark/mr: incremental query of the mor table returns an incorrect partition field (#2720) 2021-05-20 11:00:08 -04:00
xiarixiaoyao
6f7ff7e8ca [HUDI-1722] Fix NPE when hive beeline/spark-sql queries specified fields on mor table (#2722) 2021-05-12 20:52:37 +08:00
TeRS-K
be9db2c4f5 [HUDI-1055] Remove hardcoded parquet in tests (#2740)
* Remove hardcoded parquet in tests
* Use DataFileUtils.getInstance
* Renaming DataFileUtils to BaseFileUtils

Co-authored-by: Vinoth Chandar <vinoth@apache.org>
2021-05-11 10:01:45 -07:00
jsbali
aa398f77f1 [HUDI-1789] Support reading older snapshots (#2809)
* [HUDI-1789] In HoodieParquetInputFormat we currently default to the latest version of base files.
This PR attempts to add a new jobConf
 `hoodie.%s.consume.snapshot.time`

This new config will allow us to read older snapshots.

-  Reusing hoodie.%s.consume.commit for point in time snapshot queries as well.
-  Adding javadocs and some more tests
2021-05-10 15:26:49 -07:00
xiarixiaoyao
1db904a12e [HUDI-1718] When querying the incr view of a mor table with multi-level partitions, the query failed (#2716) 2021-05-05 00:34:20 -04:00
Raymond Xu
faf3785a2d [HUDI-1811] Fix TestHoodieRealtimeRecordReader (#2873)
Pass basePath with scheme 'file://' to HoodieRealtimeFileSplit
2021-04-30 11:16:55 -07:00
xiarixiaoyao
929eca43fe [HUDI-1817] Fix getting incorrect partition path while using incr query by spark-sql (#2858) 2021-04-30 14:57:52 +08:00
xiarixiaoyao
65844a8d29 [HUDI-1720] Fix RealtimeCompactedRecordReader StackOverflowError (#2721) 2021-04-13 18:23:26 +08:00
garyli1019
6e803e08b1 Moving to 0.9.0-SNAPSHOT on master branch. 2021-03-24 21:37:14 +08:00
xiarixiaoyao
02073235c3 [HUDI-1662] Fix hive date type conversion for mor table (#2634) 2021-03-08 12:16:13 +08:00
satishkotha
7cc75e0be2 [HUDI-1646] Provide mechanism to read uncommitted data through InputFormat (#2611) 2021-03-04 17:43:31 -08:00
n3nash
ffcfb58bac [HUDI-1486] Remove inline inflight rollback in hoodie writer (#2359)
1. Refactor rollback and move cleaning failed commits logic into cleaner
2. Introduce hoodie heartbeat to ascertain failed commits
3. Fix test cases
2021-02-19 20:12:22 -08:00