Sivabalan Narayanan
5d1f592395
[HUDI-1806] Honoring skipROSuffix in spark ds (#2882)
...
* Honoring skipROSuffix in spark ds
* Adding tests
* fixing scala checkstyle issue
2021-05-18 16:11:39 -07:00
xoln ann
12443e4187
[HUDI-1446] Support skip bootstrapIndex's init in abstract fs view init (#2520)
...
Co-authored-by: zhongliang <zhongliang@kuaishou.com>
Co-authored-by: Sivabalan Narayanan <sivabala@uber.com>
2021-05-14 00:29:26 -04:00
lw0090
5a8b2a4f86
[HUDI-1768] add spark datasource unit test for schema validate add column (#2776)
2021-05-11 16:49:18 -04:00
pengzhiwei
aacb8be521
[HUDI-1415] Read Hoodie Table As Spark DataSource Table (#2283)
2021-04-20 14:21:38 -07:00
Aditya Tiwari
ec2334ceac
[HUDI-1716]: Resolving default values for schema from dataframe (#2765)
...
- Adding default values and setting null as first entry in UNION data types in avro schema.
Co-authored-by: Aditya Tiwari <aditya.tiwari@flipkart.com>
2021-04-19 10:05:20 -04:00
Sivabalan Narayanan
8d29863c86
[HUDI-1615] Fixing usage of NULL schema for delete operation in HoodieSparkSqlWriter (#2777)
2021-04-14 15:35:39 +08:00
pengzhiwei
684622c7c9
[HUDI-1591] Implement Spark's FileIndex for Hudi to support queries via Hudi DataSource using non-globbed table path and partition pruning (#2651)
2021-04-01 11:12:28 -07:00
Sivabalan Narayanan
b038623ed3
[HUDI-1615] Fixing null schema in bulk_insert row writer path (#2653)
...
* [HUDI-1615] Avoid passing in null schema from row writing/deltastreamer
* Fixing null schema in bulk insert row writer path
* Fixing tests
Co-authored-by: vc <vinoth@apache.org>
2021-03-16 09:44:11 -07:00
pengzhiwei
bc883db5de
[HUDI-1636] Support Builder Pattern To Build Table Properties For HoodieTableConfig (#2596)
2021-03-05 14:10:27 +08:00
Raymond Xu
899ae70fdb
[HUDI-1587] Add latency and freshness support (#2541)
...
Save min and max of event time in each commit and compute the latency and freshness metrics.
2021-03-03 20:13:12 -08:00
Sivabalan Narayanan
c9fcf964b2
[HUDI-1315] Adding builder for HoodieTableMetaClient initialization (#2534)
2021-02-20 09:54:26 +08:00
pengzhiwei
37972071ff
[HUDI-1109] Support Spark Structured Streaming read from Hudi table (#2485)
2021-02-17 03:36:29 -08:00
teeyog
26da4f5462
[HUDI-1526] Translate the api partitionBy in spark datasource to hoodie.datasource.write.partitionpath.field (#2431)
2021-02-10 12:07:54 -05:00
Sun Ke
c30481f4b0
[HUDI-1545] Add test cases for INSERT_OVERWRITE Operation (#2483)
...
Co-authored-by: sunke.03 <sunke.03@bytedance.com>
2021-02-07 21:47:01 -08:00
pengzhiwei
0d8a4d0a56
[HUDI-1550] Honor ordering field for MOR Spark datasource reader (#2497)
2021-02-01 21:04:27 +08:00
satishkotha
2d2d5c83b1
[HUDI-1555] Remove isEmpty to improve clustering execution performance (#2502)
2021-01-29 10:27:09 -08:00
lw0090
de42adc230
[HUDI-1520] add configure for spark sql overwrite use INSERT_OVERWRITE_TABLE (#2428)
2021-01-11 09:07:47 -08:00
Udit Mehrotra
7ce3ac778e
[HUDI-1479] Use HoodieEngineContext to parallelize fetching of partition paths (#2417)
...
* [HUDI-1479] Use HoodieEngineContext to parallelize fetching of partition paths
* Adding testClass for FileSystemBackedTableMetadata
Co-authored-by: Nishith Agarwal <nagarwal@uber.com>
2021-01-10 21:19:52 -08:00
lw0090
368c1a8f5c
[HUDI-1399] support a independent clustering spark job to asynchronously clustering (#2379)
...
* [HUDI-1481] add structured streaming and delta streamer clustering unit test
* [HUDI-1399] support a independent clustering spark job to asynchronously clustering
* [HUDI-1399] support a independent clustering spark job to asynchronously clustering
* [HUDI-1498] Read clustering plan from requested file for inflight instant (#2389)
* [HUDI-1399] support a independent clustering spark job with schedule generate instant time
Co-authored-by: satishkotha <satishkotha@uber.com>
2021-01-09 17:30:16 -08:00
Gary Li
79ec7b4894
[HUDI-920] Support Incremental query for MOR table (#1938)
2021-01-09 08:02:08 -08:00
Ryan Pifer
4b94529aaf
[HUDI-1325] [RFC-15] Merge updates of unsynced instants to metadata table (apache#2342)
...
[RFC-15] Fix partition key in metadata table when bootstrapping from file system (apache#2387)
Co-authored-by: Ryan Pifer <ryanpife@amazon.com>
2021-01-04 07:59:47 -08:00
Udit Mehrotra
4e64226844
[HUDI-1450] Use metadata table for listing in HoodieROTablePathFilter (apache#2326)
...
[HUDI-1394] [RFC-15] Use metadata table (if present) to get all partition paths (apache#2351)
2021-01-04 07:59:47 -08:00
lw0090
9e6889a8ce
[HUDI-1481] add structured streaming and delta streamer clustering unit test (#2360)
2020-12-27 20:27:09 -08:00
lw0090
e807bb895e
[HUDI-1487] fix unit test testCopyOnWriteStorage random failed (#2364)
2020-12-25 09:54:23 -08:00
wenningd
89f482eaf2
[HUDI-1489] Fix null pointer exception when reading updated written bootstrap table (#2370)
...
Co-authored-by: Wenning Ding <wenningd@amazon.com>
2020-12-23 11:26:24 -08:00
Sivabalan Narayanan
33d338f392
[HUDI-115] Adding DefaultHoodieRecordPayload to honor ordering with combineAndGetUpdateValue (#2311)
...
* Added ability to pass in `properties` to payload methods, so they can perform table/record specific merges
* Added default methods so existing payload classes are backwards compatible.
* Adding DefaultHoodiePayload to honor ordering while merging two records
* Fixing default payload based on feedback
2020-12-19 19:19:42 -08:00
wenningd
26cdc457f6
[HUDI-1376] Drop Hudi metadata cols at the beginning of Spark datasource writing (#2233)
...
Co-authored-by: Wenning Ding <wenningd@amazon.com>
2020-12-15 16:20:48 -08:00
wenningd
fce1453fa6
[HUDI-1040] Make Hudi support Spark 3 (#2208)
...
* Fix flaky MOR unit test
* Update Spark APIs to make it be compatible with both spark2 & spark3
* Refactor bulk insert v2 part to make Hudi be able to compile with Spark3
* Add spark3 profile to handle fasterxml & spark version
* Create hudi-spark-common module & refactor hudi-spark related modules
Co-authored-by: Wenning Ding <wenningd@amazon.com>
2020-12-09 15:52:23 -08:00