1
0
Commit Graph

29 Commits

Author SHA1 Message Date
leesf
112732db81 [HUDI-1922] Bulk insert with row writer supports mor table (#2981) 2021-05-25 09:40:22 -07:00
Sivabalan Narayanan
5d1f592395 [HUDI-1806] Honoring skipROSuffix in spark ds (#2882)
* Honoring skipROSuffix in spark ds

* Adding tests

* fixing scala checkstype issue
2021-05-18 16:11:39 -07:00
xoln ann
12443e4187 [HUDI-1446] Support skip bootstrapIndex's init in abstract fs view init (#2520)
Co-authored-by: zhongliang <zhongliang@kuaishou.com>
Co-authored-by: Sivabalan Narayanan <sivabala@uber.com>
2021-05-14 00:29:26 -04:00
lw0090
5a8b2a4f86 [HUDI-1768] add spark datasource unit test for schema validate add column (#2776) 2021-05-11 16:49:18 -04:00
pengzhiwei
aacb8be521 [HUDI-1415] Read Hoodie Table As Spark DataSource Table (#2283) 2021-04-20 14:21:38 -07:00
Aditya Tiwari
ec2334ceac [HUDI-1716]: Resolving default values for schema from dataframe (#2765)
- Adding default values and setting null as first entry in UNION data types in avro schema. 

Co-authored-by: Aditya Tiwari <aditya.tiwari@flipkart.com>
2021-04-19 10:05:20 -04:00
Sivabalan Narayanan
8d29863c86 [HUDI-1615] Fixing usage of NULL schema for delete operation in HoodieSparkSqlWriter (#2777) 2021-04-14 15:35:39 +08:00
pengzhiwei
684622c7c9 [HUDI-1591] Implement Spark's FileIndex for Hudi to support queries via Hudi DataSource using non-globbed table path and partition pruning (#2651) 2021-04-01 11:12:28 -07:00
Sivabalan Narayanan
b038623ed3 [HUDI 1615] Fixing null schema in bulk_insert row writer path (#2653)
* [HUDI-1615] Avoid passing in null schema from row writing/deltastreamer
* Fixing null schema in bulk insert row writer path
* Fixing tests

Co-authored-by: vc <vinoth@apache.org>
2021-03-16 09:44:11 -07:00
pengzhiwei
bc883db5de [HUDI-1636] Support Builder Pattern To Build Table Properties For HoodieTableConfig (#2596) 2021-03-05 14:10:27 +08:00
Raymond Xu
899ae70fdb [HUDI-1587] Add latency and freshness support (#2541)
Save min and max of event time in each commit and compute the latency and freshness metrics.
2021-03-03 20:13:12 -08:00
Sivabalan Narayanan
c9fcf964b2 [HUDI-1315] Adding builder for HoodieTableMetaClient initialization (#2534) 2021-02-20 09:54:26 +08:00
pengzhiwei
37972071ff [HUDI-1109] Support Spark Structured Streaming read from Hudi table (#2485) 2021-02-17 03:36:29 -08:00
teeyog
26da4f5462 [HUDI-1526] Translate the api partitionBy in spark datasource to hoodie.datasource.write.partitionpath.field (#2431) 2021-02-10 12:07:54 -05:00
Sun Ke
c30481f4b0 [HUDI-1545] Add test cases for INSERT_OVERWRITE Operation (#2483)
Co-authored-by: sunke.03 <sunke.03@bytedance.com>
2021-02-07 21:47:01 -08:00
pengzhiwei
0d8a4d0a56 [HUDI-1550] Honor ordering field for MOR Spark datasource reader (#2497) 2021-02-01 21:04:27 +08:00
satishkotha
2d2d5c83b1 [HUDI-1555] Remove isEmpty to improve clustering execution performance (#2502) 2021-01-29 10:27:09 -08:00
lw0090
de42adc230 [HUDI-1520] add configure for spark sql overwrite use INSERT_OVERWRITE_TABLE (#2428) 2021-01-11 09:07:47 -08:00
Udit Mehrotra
7ce3ac778e [HUDI-1479] Use HoodieEngineContext to parallelize fetching of partiton paths (#2417)
* [HUDI-1479] Use HoodieEngineContext to parallelize fetching of partition paths

* Adding testClass for FileSystemBackedTableMetadata

Co-authored-by: Nishith Agarwal <nagarwal@uber.com>
2021-01-10 21:19:52 -08:00
lw0090
368c1a8f5c [HUDI-1399] support a independent clustering spark job to asynchronously clustering (#2379)
* [HUDI-1481]  add  structured streaming and delta streamer clustering unit test

* [HUDI-1399] support a independent clustering spark job to asynchronously clustering

* [HUDI-1399]  support a  independent clustering spark job to asynchronously clustering

* [HUDI-1498] Read clustering plan from requested file for inflight instant (#2389)

* [HUDI-1399]  support  a independent clustering spark job with schedule generate instant time

Co-authored-by: satishkotha <satishkotha@uber.com>
2021-01-09 17:30:16 -08:00
Gary Li
79ec7b4894 [HUDI-920] Support Incremental query for MOR table (#1938) 2021-01-09 08:02:08 -08:00
Ryan Pifer
4b94529aaf [HUDI-1325] [RFC-15] Merge updates of unsynced instants to metadata table (apache#2342)
[RFC-15] Fix partition key in metadata table when bootstrapping from file system (apache#2387)

Co-authored-by: Ryan Pifer <ryanpife@amazon.com>
2021-01-04 07:59:47 -08:00
Udit Mehrotra
4e64226844 [HUDI-1450] Use metadata table for listing in HoodieROTablePathFilter (apache#2326)
[HUDI-1394] [RFC-15] Use metadata table (if present) to get all partition paths (apache#2351)
2021-01-04 07:59:47 -08:00
lw0090
9e6889a8ce [HUDI-1481] add structured streaming and delta streamer clustering unit test (#2360) 2020-12-27 20:27:09 -08:00
lw0090
e807bb895e [HUDI-1487] fix unit test testCopyOnWriteStorage random failed (#2364) 2020-12-25 09:54:23 -08:00
wenningd
89f482eaf2 [HUDI-1489] Fix null pointer exception when reading updated written bootstrap table (#2370)
Co-authored-by: Wenning Ding <wenningd@amazon.com>
2020-12-23 11:26:24 -08:00
Sivabalan Narayanan
33d338f392 [HUDI-115] Adding DefaultHoodieRecordPayload to honor ordering with combineAndGetUpdateValue (#2311)
* Added ability to pass in `properties` to payload methods, so they can perform table/record specific merges
* Added default methods so existing payload classes are backwards compatible. 
* Adding DefaultHoodiePayload to honor ordering while merging two records
* Fixing default payload based on feedback
2020-12-19 19:19:42 -08:00
wenningd
26cdc457f6 [HUDI-1376] Drop Hudi metadata cols at the beginning of Spark datasource writing (#2233)
Co-authored-by: Wenning Ding <wenningd@amazon.com>
2020-12-15 16:20:48 -08:00
wenningd
fce1453fa6 [HUDI-1040] Make Hudi support Spark 3 (#2208)
* Fix flaky MOR unit test

* Update Spark APIs to make it be compatible with both spark2 & spark3

* Refactor bulk insert v2 part to make Hudi be able to compile with Spark3

* Add spark3 profile to handle fasterxml & spark version

* Create hudi-spark-common module & refactor hudi-spark related modules

Co-authored-by: Wenning Ding <wenningd@amazon.com>
2020-12-09 15:52:23 -08:00