53 Commits

Author SHA1 Message Date
pengzhiwei
aacb8be521 [HUDI-1415] Read Hoodie Table As Spark DataSource Table (#2283) 2021-04-20 14:21:38 -07:00
Aditya Tiwari
ec2334ceac [HUDI-1716]: Resolving default values for schema from dataframe (#2765)
- Adding default values and setting null as first entry in UNION data types in avro schema. 

Co-authored-by: Aditya Tiwari <aditya.tiwari@flipkart.com>
2021-04-19 10:05:20 -04:00
Sivabalan Narayanan
8d29863c86 [HUDI-1615] Fixing usage of NULL schema for delete operation in HoodieSparkSqlWriter (#2777) 2021-04-14 15:35:39 +08:00
Danny Chan
ab4a7b0b4a [HUDI-1788] Insert overwrite (table) for Flink writer (#2808)
Supports `INSERT OVERWRITE` and `INSERT OVERWRITE TABLE` for Flink
writer.
2021-04-14 10:23:37 +08:00
wangxianghu
f3777f44fe [MINOR] Remove unused imports and some other checkstyle issues (#2800) 2021-04-11 21:42:34 +08:00
pengzhiwei
684622c7c9 [HUDI-1591] Implement Spark's FileIndex for Hudi to support queries via Hudi DataSource using non-globbed table path and partition pruning (#2651) 2021-04-01 11:12:28 -07:00
Gary Li
452f5e2d66 [HOTFIX] close spark session in functional test suite and disable spark3 test for spark2 (#2727) 2021-03-29 06:04:48 -07:00
garyli1019
6e803e08b1 Moving to 0.9.0-SNAPSHOT on master branch. 2021-03-24 21:37:14 +08:00
Liulietong
ce3e8ec870 [HUDI-1667]: Fix a null value related bug for spark vectorized reader. (#2636) 2021-03-20 07:54:20 -07:00
Volodymyr Burenin
900de34e45 [HUDI-1650] Custom avro kafka deserializer. (#2619)
* Custom avro kafka deserializer

Co-authored-by: volodymyr.burenin <volodymyr.burenin@cloudkitchens.com>
Co-authored-by: Sivabalan Narayanan <sivabala@uber.com>
2021-03-20 00:51:08 -07:00
xiarixiaoyao
d429169ff7 [HUDI-1688] hudi write should uncache rdd when the write operation is finished (#2673) 2021-03-18 10:19:18 -07:00
n3nash
74241947c1 [HUDI-845] Added locking capability to allow multiple writers (#2374)
* [HUDI-845] Added locking capability to allow multiple writers
1. Added LockProvider API for pluggable lock methodologies
2. Added Resolution Strategy API to allow for pluggable conflict resolution
3. Added TableService client API to schedule table services
4. Added Transaction Manager for wrapping actions within transactions
2021-03-16 16:43:53 -07:00
Sivabalan Narayanan
b038623ed3 [HUDI-1615] Fixing null schema in bulk_insert row writer path (#2653)
* [HUDI-1615] Avoid passing in null schema from row writing/deltastreamer
* Fixing null schema in bulk insert row writer path
* Fixing tests

Co-authored-by: vc <vinoth@apache.org>
2021-03-16 09:44:11 -07:00
pengzhiwei
bc883db5de [HUDI-1636] Support Builder Pattern To Build Table Properties For HoodieTableConfig (#2596) 2021-03-05 14:10:27 +08:00
Raymond Xu
899ae70fdb [HUDI-1587] Add latency and freshness support (#2541)
Save min and max of event time in each commit and compute the latency and freshness metrics.
2021-03-03 20:13:12 -08:00
liujinhui
8c2197ae5e [HUDI-1269] Make whether the failure of connect hive affects hudi ingest process configurable (#2443)
Co-authored-by: Sivabalan Narayanan <sivabala@uber.com>
2021-02-25 10:09:32 -05:00
n3nash
ffcfb58bac [HUDI-1486] Remove inline inflight rollback in hoodie writer (#2359)
1. Refactor rollback and move cleaning failed commits logic into cleaner
2. Introduce hoodie heartbeat to ascertain failed commits
3. Fix test cases
2021-02-19 20:12:22 -08:00
Sivabalan Narayanan
c9fcf964b2 [HUDI-1315] Adding builder for HoodieTableMetaClient initialization (#2534) 2021-02-20 09:54:26 +08:00
pengzhiwei
37972071ff [HUDI-1109] Support Spark Structured Streaming read from Hudi table (#2485) 2021-02-17 03:36:29 -08:00
teeyog
26da4f5462 [HUDI-1526] Translate the api partitionBy in spark datasource to hoodie.datasource.write.partitionpath.field (#2431) 2021-02-10 12:07:54 -05:00
Sun Ke
c30481f4b0 [HUDI-1545] Add test cases for INSERT_OVERWRITE Operation (#2483)
Co-authored-by: sunke.03 <sunke.03@bytedance.com>
2021-02-07 21:47:01 -08:00
pengzhiwei
0d8a4d0a56 [HUDI-1550] Honor ordering field for MOR Spark datasource reader (#2497) 2021-02-01 21:04:27 +08:00
jiangjiguang
5d053b495b [MINOR] Quickstart.generateUpdates method add check (#2505) 2021-01-30 10:28:00 +08:00
satishkotha
2d2d5c83b1 [HUDI-1555] Remove isEmpty to improve clustering execution performance (#2502) 2021-01-29 10:27:09 -08:00
wenningd
976420c49a [HUDI-1512] Fix spark 2 unit tests failure with Spark 3 (#2412)
* [HUDI-1512] Fix spark 2 unit tests failure with Spark 3

* resolve comments

Co-authored-by: Wenning Ding <wenningd@amazon.com>
2021-01-21 07:04:28 -08:00
Vinoth Chandar
3719e7b388 Moving to 0.8.0-SNAPSHOT on master branch. 2021-01-20 11:31:22 -08:00
liujinhui
244f6def9c [MINOR] Fix dataSource cannot use hoodie.datasource.hive_sync.auto_create_database (#2444)
fix dataSource cannot use hoodie.datasource.hive_sync.auto_create_database
2021-01-20 22:58:18 +08:00
vinoth chandar
5ca0625b27 [HUDI-1308] Harden RFC-15 Implementation based on production testing (#2441)
Addresses leaks, perf degradation observed during testing. These were regressions from the original rfc-15 PoC implementation.

* Pass a single instance of HoodieTableMetadata everywhere
* Fix tests and add config for enabling metrics
 - Removed special casing of assumeDatePartitioning inside FSUtils#getAllPartitionPaths()
 - Consequently, IOException is never thrown and many files had to be adjusted
- More diligent handling of open file handles in metadata table
 - Added config for controlling reuse of connections
 - Added config for turning off fallback to listing, so we can see tests fail
 - Changed all ipf listing code to cache/amortize the open/close for better performance
 - Timelineserver also reuses connections, for better performance
 - Without timelineserver, when metadata table is opened from executors, reuse is not allowed
 - HoodieMetadataConfig passed into HoodieTableMetadata#create as argument.
 - Fix TestHoodieBackedTableMetadata#testSync
2021-01-19 21:20:28 -08:00
Sivabalan Narayanan
b9c2856d16 [HUDI-1535] Fix 0.7.0 snapshot (#2456)
* Revert "[MINOR] Bumping snapshot version to 0.7.0 (#2435)"

This reverts commit a43e191d6c.

* Fixing 0.7.0 snapshot bump
2021-01-19 12:20:43 -08:00
Sivabalan Narayanan
a43e191d6c [MINOR] Bumping snapshot version to 0.7.0 (#2435) 2021-01-16 09:56:28 -05:00
lw0090
de42adc230 [HUDI-1520] add configure for spark sql overwrite use INSERT_OVERWRITE_TABLE (#2428) 2021-01-11 09:07:47 -08:00
Udit Mehrotra
7ce3ac778e [HUDI-1479] Use HoodieEngineContext to parallelize fetching of partition paths (#2417)
* [HUDI-1479] Use HoodieEngineContext to parallelize fetching of partition paths

* Adding testClass for FileSystemBackedTableMetadata

Co-authored-by: Nishith Agarwal <nagarwal@uber.com>
2021-01-10 21:19:52 -08:00
Gary Li
23e93d05c0 [MINOR] fix spark 3 build for incremental query on MOR (#2425) 2021-01-09 21:08:55 -08:00
lw0090
368c1a8f5c [HUDI-1399] support a independent clustering spark job to asynchronously clustering (#2379)
* [HUDI-1481] add structured streaming and delta streamer clustering unit test

* [HUDI-1399] support a independent clustering spark job to asynchronously clustering

* [HUDI-1399] support a independent clustering spark job to asynchronously clustering

* [HUDI-1498] Read clustering plan from requested file for inflight instant (#2389)

* [HUDI-1399] support a independent clustering spark job with schedule generate instant time

Co-authored-by: satishkotha <satishkotha@uber.com>
2021-01-09 17:30:16 -08:00
Gary Li
79ec7b4894 [HUDI-920] Support Incremental query for MOR table (#1938) 2021-01-09 08:02:08 -08:00
Udit Mehrotra
17df517b81 [HUDI-1510] Move HoodieEngineContext and its dependencies to hudi-common (#2410) 2021-01-07 11:34:06 -08:00
wangxianghu
b593f10629 [MINOR] Rename unit test package of hudi-spark3 from scala to java (#2411) 2021-01-06 23:07:24 +08:00
Ryan Pifer
4b94529aaf [HUDI-1325] [RFC-15] Merge updates of unsynced instants to metadata table (apache#2342)
[RFC-15] Fix partition key in metadata table when bootstrapping from file system (apache#2387)

Co-authored-by: Ryan Pifer <ryanpife@amazon.com>
2021-01-04 07:59:47 -08:00
Udit Mehrotra
4e64226844 [HUDI-1450] Use metadata table for listing in HoodieROTablePathFilter (apache#2326)
[HUDI-1394] [RFC-15] Use metadata table (if present) to get all partition paths (apache#2351)
2021-01-04 07:59:47 -08:00
Gary Li
c5e8a024f6 [HUDI-1418] Set up flink client unit test infra (#2281) 2020-12-31 08:57:22 +08:00
pengzhiwei
b83d1d3e61 [HUDI-1484] Escape the partition value in HiveSyncTool (#2363) 2020-12-28 23:02:36 -05:00
lw0090
9e6889a8ce [HUDI-1481] add structured streaming and delta streamer clustering unit test (#2360) 2020-12-27 20:27:09 -08:00
lw0090
e807bb895e [HUDI-1487] fix unit test testCopyOnWriteStorage random failed (#2364) 2020-12-25 09:54:23 -08:00
wenningd
286055ce34 [HUDI-1451] Support bulk insert v2 with Spark 3.0.0 (#2328)
Co-authored-by: Wenning Ding <wenningd@amazon.com>

- Added support for bulk insert v2 with datasource v2 api in Spark 3.0.0.
2020-12-25 09:43:34 -05:00
wenningd
89f482eaf2 [HUDI-1489] Fix null pointer exception when reading updated written bootstrap table (#2370)
Co-authored-by: Wenning Ding <wenningd@amazon.com>
2020-12-23 11:26:24 -08:00
wangxianghu
f8ccb2872d [HUDI-1471] Make QuickStartUtils generate deletes according to specific ts (#2357) 2020-12-22 21:14:18 +08:00
Sivabalan Narayanan
33d338f392 [HUDI-115] Adding DefaultHoodieRecordPayload to honor ordering with combineAndGetUpdateValue (#2311)
* Added ability to pass in `properties` to payload methods, so they can perform table/record specific merges
* Added default methods so existing payload classes are backwards compatible. 
* Adding DefaultHoodiePayload to honor ordering while merging two records
* Fixing default payload based on feedback
2020-12-19 19:19:42 -08:00
lw0090
8b5d6f9430 [HUDI-1437] support more accurate spark JobGroup for better performance tracking (#2322) 2020-12-17 15:20:13 -08:00
wangxianghu
4ddfc61d70 [MINOR] Make QuickstartUtil generate random timestamp instead of 0 (#2340) 2020-12-17 18:00:23 +08:00
wenningd
26cdc457f6 [HUDI-1376] Drop Hudi metadata cols at the beginning of Spark datasource writing (#2233)
Co-authored-by: Wenning Ding <wenningd@amazon.com>
2020-12-15 16:20:48 -08:00