177 Commits

Author SHA1 Message Date
rmahindra123
b4c14eaa29 [HUDI-2090] Ensure Disk Maps create a subfolder with appropriate prefixes and cleans them up on close (#3329)
* Add UUID to the folder name for External Spillable File System

* Fix to ensure that Disk maps folders do not interfere across users

* Fix test

* Fix test

* Rebase with latest master and address comments

* Add Shutdown Hooks for the Disk Map

Co-authored-by: Rajesh Mahindra <rmahindra@Rajeshs-MacBook-Pro.local>
2021-08-03 17:51:25 -07:00
Udit Mehrotra
1ff2d3459a [HUDI-1371] [HUDI-1893] Support metadata based listing for Spark DataSource and Spark SQL (#2893) 2021-08-03 14:47:40 -07:00
Sivabalan Narayanan
fe508376fa [HUDI-2177][HUDI-2200] Adding virtual keys support for MOR table (#3315) 2021-08-02 09:45:09 -04:00
rmahindra123
8fef50e237 [HUDI-2044] Integrate consumers with rocksDB and compression within External Spillable Map (#3318) 2021-07-28 01:31:03 -04:00
Sivabalan Narayanan
61148c1c43 [HUDI-2176, 2178, 2179] Adding virtual key support to COW table (#3306) 2021-07-26 17:21:04 -04:00
jsbali
66207ed91a [HUDI-1848] Adding support for HMS for running DDL queries in hive-sy… (#2879)
* [HUDI-1848] Adding support for HMS for running DDL queries in hive-sync-tool

* [HUDI-1848] Fixing test cases

* [HUDI-1848] CR changes

* [HUDI-1848] Fix checkstyle violations

* [HUDI-1848] Fixed a bug when metastore api fails for complex schemas with multiple levels.

* [HUDI-1848] Adding the complex schema and resolving merge conflicts

* [HUDI-1848] Adding some more javadocs

* [HUDI-1848] Added javadocs for DDLExecutor impls

* [HUDI-1848] Fixed style issue
2021-07-23 09:03:15 -07:00
rmahindra123
d024439764 [HUDI-2029] Implement compression for DiskBasedMap in Spillable Map (#3128) 2021-07-14 22:57:38 -04:00
Vinay Patil
7395a56dfb [HUDI-2168] Fix for AccessControlException for anonymous user (#3264) 2021-07-13 08:56:51 -04:00
zhangyue19921010
c8a2033c27 [HUDI-2144] Bug-Fix: Offline clustering (HoodieClusteringJob) will cause insert action losing data (#3240)
* fixed

* add testUpsertPartitionerWithSmallFileHandlingAndClusteringPlan ut

* fix CheckStyle

Co-authored-by: yuezhang <yuezhang@freewheel.tv>
2021-07-12 18:14:17 -07:00
Shawy Geng
55ecbc662e [HUDI-2115] FileSlices in the filegroup is not descending by timestamp (#3206) 2021-07-07 22:24:36 +08:00
rmahindra123
a4dcbb5c5a [HUDI-2028] Implement RockDbBasedMap as an alternate to DiskBasedMap in ExternalSpillableMap (#3194)
Co-authored-by: Rajesh Mahindra <rmahindra@Rajeshs-MacBook-Pro.local>
2021-07-05 23:03:41 -07:00
wenningd
d412fb2fe6 [HUDI-89] Add configOption & refactor all configs based on that (#2833)
Co-authored-by: Wenning Ding <wenningd@amazon.com>
2021-06-30 14:26:30 -07:00
Jintao Guan
b8fe5b91d5 [HUDI-764] [HUDI-765] ORC reader writer Implementation (#2999)
Co-authored-by: Qingyun (Teresa) Kang <kteresa@uber.com>
2021-06-15 15:21:43 -07:00
Raymond Xu
f922837064 [HUDI-1950] Fix Azure CI failure in TestParquetUtils (#2984)
* fix azure pipeline configs

* add pentaho.org in maven repositories

* Make sure file paths with scheme in TestParquetUtils

* add azure build status to README
2021-06-15 03:45:17 -07:00
Xuedong Luan
673d62f3c3 [MINOR] Add Tencent Cloud HDFS storage support for hudi (#3064) 2021-06-11 09:16:51 +08:00
JunZhang
e0108e972e [MINOR] Add Baidu BOS storage support for hudi (#3061)
Co-authored-by: zhangjun30 <zhangjun30@baidu.com>
2021-06-10 15:51:36 +08:00
rmpifer
0709c62a6b [HUDI-1800] Exclude file slices in pending compaction when performing small file sizing (#2902)
Co-authored-by: Ryan Pifer <ryanpife@amazon.com>
2021-05-29 08:06:01 -04:00
Raymond Xu
afa6bc0b10 [HUDI-1723] Fix path selector listing files with the same mod date (#2845) 2021-05-25 10:19:10 -04:00
Susu Dong
685f77b5dd [HUDI-1740] Fix insert-overwrite API archival (#2784)
- fix problem of archiving replace commits
- Fix problem when getting empty replacecommit.requested
- Improved the logic of handling empty and non-empty requested/inflight commit files. Added unit tests to cover both empty and non-empty inflight files cases and cleaned up some unused test util methods

Co-authored-by: yorkzero831 <yorkzero8312@gmail.com>
Co-authored-by: zheren.yu <zheren.yu@paypay-corp.co.jp>
2021-05-21 13:52:13 -07:00
xoln ann
12443e4187 [HUDI-1446] Support skip bootstrapIndex's init in abstract fs view init (#2520)
Co-authored-by: zhongliang <zhongliang@kuaishou.com>
Co-authored-by: Sivabalan Narayanan <sivabala@uber.com>
2021-05-14 00:29:26 -04:00
TeRS-K
be9db2c4f5 [HUDI-1055] Remove hardcoded parquet in tests (#2740)
* Remove hardcoded parquet in tests
* Use DataFileUtils.getInstance
* Renaming DataFileUtils to BaseFileUtils

Co-authored-by: Vinoth Chandar <vinoth@apache.org>
2021-05-11 10:01:45 -07:00
jsbali
b31c520c66 [HUDI-1714] Added tests to TestHoodieTimelineArchiveLog for the archival of compl… (#2677)
* Added tests to TestHoodieTimelineArchiveLog for the archival of completed clean and rollback actions.

* Adding code review changes

* [HUDI-1714] Minor Fixes
2021-04-21 10:27:43 -07:00
Xu Guang Lv
1d53d6e6c2 [HUDI-1803] Support BAIDU AFS storage format in hudi (#2836) 2021-04-16 16:43:14 +08:00
Sivabalan Narayanan
8d29863c86 [HUDI-1615] Fixing usage of NULL schema for delete operation in HoodieSparkSqlWriter (#2777) 2021-04-14 15:35:39 +08:00
Sebastian Bernauer
aa0da72c59 Preparation for Avro update (#2650) 2021-03-30 21:50:17 -07:00
n3nash
74241947c1 [HUDI-845] Added locking capability to allow multiple writers (#2374)
* [HUDI-845] Added locking capability to allow multiple writers
1. Added LockProvider API for pluggable lock methodologies
2. Added Resolution Strategy API to allow for pluggable conflict resolution
3. Added TableService client API to schedule table services
4. Added Transaction Manager for wrapping actions within transactions
2021-03-16 16:43:53 -07:00
satishkotha
c4a66324cd [HUDI-1651] Fix archival of requested replacecommit (#2622) 2021-03-09 15:56:44 -08:00
satishkotha
11ad4ed26b [HUDI-1661] Exclude clustering commits from getExtraMetadataFromLatest API (#2632) 2021-03-05 13:42:19 -08:00
pengzhiwei
bc883db5de [HUDI-1636] Support Builder Pattern To Build Table Properties For HoodieTableConfig (#2596) 2021-03-05 14:10:27 +08:00
Raymond Xu
899ae70fdb [HUDI-1587] Add latency and freshness support (#2541)
Save min and max of event time in each commit and compute the latency and freshness metrics.
2021-03-03 20:13:12 -08:00
satishkotha
7a6b071647 [HUDI-1644] Do not delete older rollback instants as part of rollback. Archival can take care of removing old instants cleanly (#2610) 2021-03-01 09:40:00 -08:00
n3nash
ffcfb58bac [HUDI-1486] Remove inline inflight rollback in hoodie writer (#2359)
1. Refactor rollback and move cleaning failed commits logic into cleaner
2. Introduce hoodie heartbeat to ascertain failed commits
3. Fix test cases
2021-02-19 20:12:22 -08:00
Sivabalan Narayanan
c9fcf964b2 [HUDI-1315] Adding builder for HoodieTableMetaClient initialization (#2534) 2021-02-20 09:54:26 +08:00
Danny Chan
bc0325f6ea [HUDI-1522] Add a new pipeline for Flink writer (#2430)
* [HUDI-1522] Add a new pipeline for Flink writer
2021-01-28 08:53:13 +08:00
satishkotha
3d1d5d00b0 [HUDI-1533] Make SerializableSchema work for large schemas and add ability to sortBy numeric values (#2453) 2021-01-17 12:36:55 -08:00
Sivabalan Narayanan
e3d3677b7e [HUDI-1502] MOR rollback and restore support for metadata sync (#2421)
- Adds field to RollbackMetadata that capture the logs written for rollback blocks
- Adds field to RollbackMetadata that capture new logs files written by unsynced deltacommits

Co-authored-by: Vinoth Chandar <vinoth@apache.org>
2021-01-11 13:23:13 -08:00
Udit Mehrotra
7ce3ac778e [HUDI-1479] Use HoodieEngineContext to parallelize fetching of partition paths (#2417)
* [HUDI-1479] Use HoodieEngineContext to parallelize fetching of partition paths

* Adding testClass for FileSystemBackedTableMetadata

Co-authored-by: Nishith Agarwal <nagarwal@uber.com>
2021-01-10 21:19:52 -08:00
vinoth chandar
65866c45ec [HUDI-1276] [HUDI-1459] Make Clustering/ReplaceCommit and Metadata table be compatible (#2422)
* [HUDI-1276] [HUDI-1459] Make Clustering/ReplaceCommit and Metadata table be compatible

* Use filesystemview and json format from metadata. Add tests

Co-authored-by: Satish Kotha <satishkotha@uber.com>
2021-01-09 16:53:34 -08:00
satishkotha
2c4868e770 [HUDI-1507] Change timeline utils to support reading replacecommit metadata (#2407) 2021-01-06 07:55:14 -05:00
wangxianghu
47c5e518a7 [HUDI-1506] Fix wrong exception thrown in HoodieAvroUtils (#2405) 2021-01-06 19:49:17 +08:00
satishkotha
698694a157 [HUDI-1498] Read clustering plan from requested file for inflight instant (#2389) 2021-01-04 10:36:44 -08:00
Gary Li
c5e8a024f6 [HUDI-1418] Set up flink client unit test infra (#2281) 2020-12-31 08:57:22 +08:00
Gary Li
605b617cfa [HUDI-1434] fix incorrect log file path in HoodieWriteStat (#2300)
* [HUDI-1434] fix incorrect log file path in HoodieWriteStat

* HoodieWriteHandle#close() returns a list of WriteStatus objs

* Handle rolled-over log files and return a WriteStatus per log file written

 - Combined data and delete block logging into a single call
 - Lazily initialize and manage write status based on returned AppendResult
 - Use FSUtils.getFileSize() to set final file size, consistent with other handles
 - Added tests around returned values in AppendResult
 - Added validation of the file sizes returned in write stat

Co-authored-by: Vinoth Chandar <vinoth@apache.org>
2020-12-30 14:22:15 -08:00
Satish Kotha
6dc03b65bf [HUDI-1075] Implement simple clustering strategies to create ClusteringPlan and to run the plan 2020-12-21 17:34:15 -08:00
Sivabalan Narayanan
33d338f392 [HUDI-115] Adding DefaultHoodieRecordPayload to honor ordering with combineAndGetUpdateValue (#2311)
* Added ability to pass in `properties` to payload methods, so they can perform table/record specific merges
* Added default methods so existing payload classes are backwards compatible.
* Adding DefaultHoodiePayload to honor ordering while merging two records
* Fixing default payload based on feedback
2020-12-19 19:19:42 -08:00
Danny Chan
4bc45a391a [HUDI-1445] Refactor AbstractHoodieLogRecordScanner to use Builder (#2313) 2020-12-10 20:02:02 +08:00
lw0090
1f0d5c077e [HUDI-1349] spark sql support overwrite use insert_overwrite_table (#2196) 2020-12-03 12:26:21 -08:00
Prashant Wason
ac23d2587f [HUDI-1357] Added a check to validate records are not lost during merges. (#2216)
- Turned off by default
2020-12-01 13:44:57 -08:00
wenningd
0364498ae3 [HUDI-1375] Fix bug in HoodieAvroUtils.removeMetadataFields() method (#2232)
Co-authored-by: Wenning Ding <wenningd@amazon.com>
2020-11-05 17:30:17 -08:00
satishkotha
33ec88fc38 [HUDI-1352] Add FileSystemView APIs to query pending clustering operations (#2202) 2020-11-05 08:49:58 -08:00