swuferhong
5448cdde7e
[HUDI-2170] [HUDI-1763] Always choose the latest record for HoodieRecordPayload ( #3401 )
2021-08-11 10:20:55 +08:00
Sivabalan Narayanan
1196736185
[HUDI-1129] Improving schema evolution support in hudi ( #2927 )
...
* Adding support to ingest records with old schema after table's schema is evolved
* Rebasing against latest master
- Trimming test file to be < 800 lines
- Renaming config names
* Addressing feedback
Co-authored-by: Vinoth Chandar <vinoth@apache.org>
2021-08-10 09:15:37 -07:00
swuferhong
21db6d7a84
[HUDI-1771] Propagate CDC format for hoodie ( #3285 )
2021-08-10 20:23:23 +08:00
zhangyue19921010
b4441abcf7
[HUDI-2194] Skip the latest N partitions when choosing partitions to create ClusteringPlan ( #3300 )
...
* skip from latest partitions based on hoodie.clustering.plan.strategy.daybased.skipfromlatest.partitions; 0 (the default) means skip nothing
* change config version
* add unit test
Co-authored-by: yuezhang <yuezhang@freewheel.tv>
2021-08-09 10:10:15 -07:00
Sagar Sumit
70b6bd485f
[HUDI-1468] Support custom clustering strategies and preserve commit metadata as part of clustering ( #3419 )
...
Co-authored-by: Satish Kotha <satishkotha@uber.com>
2021-08-06 22:53:08 -04:00
Danny Chan
20feb1a897
[HUDI-2278] Use INT64 timestamp with precision 3 for flink parquet writer ( #3414 )
2021-08-06 11:06:21 +08:00
Danny Chan
b7586a5632
[HUDI-2274] Allows INSERT duplicates for Flink MOR table ( #3403 )
2021-08-06 10:30:52 +08:00
Sivabalan Narayanan
1df5ded433
[HUDI-2273] Migrating some long running tests to functional test profile ( #3398 )
2021-08-04 19:08:50 -04:00
yuzhaojing
b8b9d6db83
[HUDI-2087] Support Append only in Flink stream ( #3390 )
...
Co-authored-by: 喻兆靖 <yuzhaojing@bilibili.com>
2021-08-04 17:53:20 +08:00
Danny Chan
02331fc223
[HUDI-2258] Metadata table for flink ( #3381 )
2021-08-04 10:54:55 +08:00
wenningd
91bb0d1318
[HUDI-2255] Refactor Datasource options ( #3373 )
...
Co-authored-by: Wenning Ding <wenningd@amazon.com>
2021-08-03 17:50:30 -07:00
Udit Mehrotra
1ff2d3459a
[HUDI-1371] [HUDI-1893] Support metadata based listing for Spark DataSource and Spark SQL ( #2893 )
2021-08-03 14:47:40 -07:00
satishkotha
826a04d142
[HUDI-2072] Add pre-commit validator framework ( #3153 )
...
* [HUDI-2072] Add pre-commit validator framework
* trigger Travis rebuild
2021-08-03 12:07:45 -07:00
Danny Chan
bec23bda50
[HUDI-2269] Release the disk map resource for flink streaming reader ( #3384 )
2021-08-03 13:55:35 +08:00
Sivabalan Narayanan
fe508376fa
[HUDI-2177][HUDI-2200] Adding virtual keys support for MOR table ( #3315 )
2021-08-02 09:45:09 -04:00
Gary Li
6353fc865f
[HUDI-2218] Fix missing HoodieWriteStat in HoodieCreateHandle ( #3341 )
2021-07-30 02:36:57 -07:00
Danny Chan
c4e45a0010
[HUDI-2254] Builtin sort operator for flink bulk insert ( #3372 )
2021-07-30 16:58:11 +08:00
Shawy Geng
44e41dc9bb
[HUDI-2117] Unpersist the input rdd after the commit is completed to … ( #3207 )
...
Co-authored-by: Vinoth Chandar <vinoth@apache.org>
2021-07-29 08:16:58 -07:00
pengzhiwei
bbadac7de1
[HUDI-1425] Performance loss with the additional hoodieRecords.isEmpty() in HoodieSparkSqlWriter#write ( #2296 )
2021-07-28 21:30:18 -07:00
rmahindra123
8fef50e237
[HUDI-2044] Integrate consumers with rocksDB and compression within External Spillable Map ( #3318 )
2021-07-28 01:31:03 -04:00
Danny Chan
9d2a65a6a6
[HUDI-2209] Bulk insert for flink writer ( #3334 )
2021-07-27 10:58:23 +08:00
Sivabalan Narayanan
61148c1c43
[HUDI-2176, 2178, 2179] Adding virtual key support to COW table ( #3306 )
2021-07-26 17:21:04 -04:00
xiarixiaoyao
5353243449
[HUDI-2214] Residual temporary files after clustering are not cleaned up ( #3335 )
2021-07-26 10:26:20 -07:00
Gary Li
a5638b995b
[MINOR] Close log scanner after compaction completed ( #3294 )
2021-07-26 17:39:13 +08:00
rmahindra123
a14b19fdd5
[HUDI-1241] Automate the generation of configs webpage as configs are added to Hudi repo ( #3302 )
2021-07-23 21:33:34 -07:00
Xuedong Luan
71e14cf866
[HUDI-2213] Remove unnecessary parameter for HoodieMetrics constructor and fix NPE in UT ( #3333 )
2021-07-23 19:57:35 +08:00
Xuedong Luan
6d592c5896
[HUDI-2211] Fix NullPointerException in TestHoodieConsoleMetrics ( #3331 )
2021-07-23 11:22:54 +08:00
pengzhiwei
5a2f3d439e
[HUDI-2139] MergeInto MOR Table May Produce Incorrect Result ( #3230 )
2021-07-23 10:19:43 +08:00
Danny Chan
2370a9facb
[HUDI-2204] Add marker files for flink writer ( #3316 )
2021-07-22 13:34:15 +08:00
Danny Chan
858e84b5b2
[HUDI-2198] Clean and reset the bootstrap events for coordinator when task failover ( #3304 )
2021-07-21 10:13:05 +08:00
Samrat
a086d255c8
[HUDI-1860] Add INSERT_OVERWRITE and INSERT_OVERWRITE_TABLE support to DeltaStreamer ( #3184 )
2021-07-19 21:49:43 -04:00
Sivabalan Narayanan
d5026e9a24
[HUDI-2161] Adding support to disable meta columns with bulk insert operation ( #3247 )
2021-07-19 20:43:48 -04:00
yuzhao.cyz
50c2b76d72
Revert "[HUDI-2087] Support Append only in Flink stream ( #3252 )"
...
This reverts commit 783c9cb3
2021-07-16 21:36:27 +08:00
liujinhui
3b264e80d9
[HUDI-1633] Make callback return HoodieWriteStat ( #2445 )
...
* CALLBACK add partitionPath
* callback can send hoodieWriteStat
* add ApiMaturityLevel
2021-07-16 12:37:07 +08:00
Jintao Guan
38cd74b563
[MINOR] Allow users to choose ORC as base file format in Spark SQL ( #3279 )
2021-07-16 12:24:41 +08:00
rmahindra123
d024439764
[HUDI-2029] Implement compression for DiskBasedMap in Spillable Map ( #3128 )
2021-07-14 22:57:38 -04:00
vinoth chandar
75040ee9e5
[HUDI-2149] Ensure and Audit docs for every configuration class in the codebase ( #3272 )
...
- Added docs when missing
- Rewrote, reworded as needed
- Made couple more classes extend HoodieConfig
2021-07-14 10:56:08 -07:00
zhangyue19921010
c1810f210e
[MINOR] Correct the logs of enable/not-enable async cleaner service. ( #3271 )
...
Co-authored-by: yuezhang <yuezhang@freewheel.tv>
2021-07-15 00:08:29 +08:00
Jintao Guan
2debb9b3ed
[HUDI-1828] Update unit tests to support ORC as the base file format ( #3237 )
2021-07-15 00:05:42 +08:00
Sagar Sumit
b0089b894a
[MINOR] Fix EXTERNAL_RECORD_AND_SCHEMA_TRANSFORMATION config ( #3250 )
2021-07-13 00:24:40 -04:00
zhangyue19921010
c8a2033c27
[HUDI-2144] Bug fix: offline clustering (HoodieClusteringJob) causes the insert action to lose data ( #3240 )
...
* fixed
* add testUpsertPartitionerWithSmallFileHandlingAndClusteringPlan ut
* fix CheckStyle
Co-authored-by: yuezhang <yuezhang@freewheel.tv>
2021-07-12 18:14:17 -07:00
Sagar Sumit
5804ad8e32
[HUDI-1483] Support async clustering for deltastreamer and Spark streaming ( #3142 )
...
- Integrate async clustering service with HoodieDeltaStreamer and HoodieStreamingSink
- Added methods in HoodieAsyncService to reuse code
2021-07-11 14:43:38 -04:00
yuzhaojing
783c9cb369
[HUDI-2087] Support Append only in Flink stream ( #3252 )
...
Co-authored-by: 喻兆靖 <yuzhaojing@bilibili.com>
2021-07-10 14:49:35 +08:00
vinoth chandar
b4562e86e4
Revert "[HUDI-2087] Support Append only in Flink stream ( #3174 )" ( #3251 )
...
This reverts commit 371526789d.
2021-07-09 11:20:09 -07:00
yuzhaojing
371526789d
[HUDI-2087] Support Append only in Flink stream ( #3174 )
...
Co-authored-by: 喻兆靖 <yuzhaojing@bilibili.com>
2021-07-09 16:06:32 +08:00
Sivabalan Narayanan
8c0dbaa9b3
[HUDI-2009] Fixing extra commit metadata in row writer path ( #3075 )
2021-07-08 03:07:27 -04:00
Yungthuis
1d3cd06572
[HUDI-2134] Add generics to avoid forced conversion in BaseSparkCommitActionExecutor#partition ( #3232 )
2021-07-08 13:31:38 +08:00
Sivabalan Narayanan
16e90d30ea
[HUDI-1105] Adding dedup support for Bulk Insert w/ Rows ( #2206 )
2021-07-07 17:38:26 -04:00
Sivabalan Narayanan
ea9e5d0e8b
[HUDI-1104] Adding support for UserDefinedPartitioners and SortModes to BulkInsert with Rows ( #3149 )
2021-07-07 11:15:25 -04:00
Prashant Wason
990820476a
[HUDI-2140] Fixed the unit test TestHoodieBackedMetadata.testOnlyValidPartitionsAdded. ( #3234 )
2021-07-06 23:50:27 -07:00