lanyuanxiaoyao/hudi - hudi - Gitea: Git with a cup of tea

Author	SHA1	Message	Date
Sivabalan Narayanan	7d9f9d7d82	[HUDI-1991] Fixing drop dups exception in bulk insert row writer path (#3055)	2021-06-14 09:55:52 +08:00
wangxianghu	7261f08507	[HUDI-1929] Support configure KeyGenerator by type (#2993)	2021-06-08 09:26:10 -04:00
pengzhiwei	f760ec543e	[HUDI-1659] Basic Implement Of Spark Sql Support For Hoodie (#2645) Main functions: Support create table for hoodie. Support CTAS. Support Insert for hoodie. Including dynamic partition and static partition insert. Support MergeInto for hoodie. Support DELETE Support UPDATE Both support spark2 & spark3 based on DataSourceV1. Main changes: Add sql parser for spark2. Add HoodieAnalysis for sql resolve and logical plan rewrite. Add commands implementation for CREATE TABLE、INSERT、MERGE INTO & CTAS. In order to push down the update&insert logical to the HoodieRecordPayload for MergeInto, I make same change to the HoodieWriteHandler and other related classes. 1、Add the inputSchema for parser the incoming record. This is because the inputSchema for MergeInto is different from writeSchema as there are some transforms in the update& insert expression. 2、Add WRITE_SCHEMA to HoodieWriteConfig to pass the write schema for merge into. 3、Pass properties to HoodieRecordPayload#getInsertValue to pass the insert expression and table schema. Verify this pull request Add TestCreateTable for test create hoodie tables and CTAS. Add TestInsertTable for test insert hoodie tables. Add TestMergeIntoTable for test merge hoodie tables. Add TestUpdateTable for test update hoodie tables. Add TestDeleteTable for test delete hoodie tables. Add TestSqlStatement for test supported ddl/dml currently.	2021-06-07 23:24:32 -07:00
Vinay Patil	2a7e1e091e	[HUDI-1942] Add Default value for HIVE_AUTO_CREATE_DATABASE_OPT_KEY in HoodieSparkSqlWriter (#3036)	2021-06-05 18:02:26 -04:00
wangxianghu	870e97b5f8	[MINOR] Remove unused method in DataSourceUtils (#3031)	2021-06-03 10:24:51 -07:00
pengzhiwei	dcd7c331dc	[HUDI-1879] Support Partition Prune For MergeOnRead Snapshot Table (#2926)	2021-05-29 07:50:24 -07:00
leesf	112732db81	[HUDI-1922] Bulk insert with row writer supports mor table (#2981)	2021-05-25 09:40:22 -07:00
wangxianghu	e7020748b5	[HUDI-1920] Set archived as the default value of HOODIE_ARCHIVELOG_FOLDER_PROP_NAME (#2978)	2021-05-25 16:29:55 +08:00
mpouttu	369a849337	[HUDI-1873] collect() call causing issues with very large upserts (#2907) Co-authored-by: Sivabalan Narayanan <sivabala@uber.com>	2021-05-24 01:29:01 -04:00
Sivabalan Narayanan	5d1f592395	[HUDI-1806] Honoring skipROSuffix in spark ds (#2882) * Honoring skipROSuffix in spark ds * Adding tests * fixing scala checkstype issue	2021-05-18 16:11:39 -07:00
xoln ann	12443e4187	[HUDI-1446] Support skip bootstrapIndex's init in abstract fs view init (#2520) Co-authored-by: zhongliang <zhongliang@kuaishou.com> Co-authored-by: Sivabalan Narayanan <sivabala@uber.com>	2021-05-14 00:29:26 -04:00
lw0090	5a8b2a4f86	[HUDI-1768] add spark datasource unit test for schema validate add column (#2776)	2021-05-11 16:49:18 -04:00
pengzhiwei	aacb8be521	[HUDI-1415] Read Hoodie Table As Spark DataSource Table (#2283)	2021-04-20 14:21:38 -07:00
Aditya Tiwari	ec2334ceac	[HUDI-1716]: Resolving default values for schema from dataframe (#2765) - Adding default values and setting null as first entry in UNION data types in avro schema. Co-authored-by: Aditya Tiwari <aditya.tiwari@flipkart.com>	2021-04-19 10:05:20 -04:00
Sivabalan Narayanan	8d29863c86	[HUDI-1615] Fixing usage of NULL schema for delete operation in HoodieSparkSqlWriter (#2777)	2021-04-14 15:35:39 +08:00
Danny Chan	ab4a7b0b4a	[HUDI-1788] Insert overwrite (table) for Flink writer (#2808) Supports `INSERT OVERWRITE` and `INSERT OVERWRITE TABLE` for Flink writer.	2021-04-14 10:23:37 +08:00
wangxianghu	f3777f44fe	[MINOR] Remove unused imports and some other checkstyle issues (#2800)	2021-04-11 21:42:34 +08:00
pengzhiwei	684622c7c9	[HUDI-1591] Implement Spark's FileIndex for Hudi to support queries via Hudi DataSource using non-globbed table path and partition pruning (#2651)	2021-04-01 11:12:28 -07:00
Gary Li	452f5e2d66	[HOTFIX] close spark session in functional test suite and disable spark3 test for spark2 (#2727)	2021-03-29 06:04:48 -07:00
garyli1019	6e803e08b1	Moving to 0.9.0-SNAPSHOT on master branch.	2021-03-24 21:37:14 +08:00
Liulietong	ce3e8ec870	[HUDI-1667]: Fix a null value related bug for spark vectorized reader. (#2636)	2021-03-20 07:54:20 -07:00
Volodymyr Burenin	900de34e45	[HUDI-1650] Custom avro kafka deserializer. (#2619) * Custom avro kafka deserializer Co-authored-by: volodymyr.burenin <volodymyr.burenin@cloudkitchens.com> Co-authored-by: Sivabalan Narayanan <sivabala@uber.com>	2021-03-20 00:51:08 -07:00
xiarixiaoyao	d429169ff7	[HUDI-1688]hudi write should uncache rdd, when the write operation is finnished (#2673)	2021-03-18 10:19:18 -07:00
n3nash	74241947c1	[HUDI-845] Added locking capability to allow multiple writers (#2374) * [HUDI-845] Added locking capability to allow multiple writers 1. Added LockProvider API for pluggable lock methodologies 2. Added Resolution Strategy API to allow for pluggable conflict resolution 3. Added TableService client API to schedule table services 4. Added Transaction Manager for wrapping actions within transactions	2021-03-16 16:43:53 -07:00
Sivabalan Narayanan	b038623ed3	[HUDI 1615] Fixing null schema in bulk_insert row writer path (#2653) * [HUDI-1615] Avoid passing in null schema from row writing/deltastreamer * Fixing null schema in bulk insert row writer path * Fixing tests Co-authored-by: vc <vinoth@apache.org>	2021-03-16 09:44:11 -07:00
pengzhiwei	bc883db5de	[HUDI-1636] Support Builder Pattern To Build Table Properties For HoodieTableConfig (#2596)	2021-03-05 14:10:27 +08:00
Raymond Xu	899ae70fdb	[HUDI-1587] Add latency and freshness support (#2541) Save min and max of event time in each commit and compute the latency and freshness metrics.	2021-03-03 20:13:12 -08:00
liujinhui	8c2197ae5e	[HUDI-1269] Make whether the failure of connect hive affects hudi ingest process configurable (#2443) Co-authored-by: Sivabalan Narayanan <sivabala@uber.com>	2021-02-25 10:09:32 -05:00
n3nash	ffcfb58bac	[HUDI-1486] Remove inline inflight rollback in hoodie writer (#2359) 1. Refactor rollback and move cleaning failed commits logic into cleaner 2. Introduce hoodie heartbeat to ascertain failed commits 3. Fix test cases	2021-02-19 20:12:22 -08:00
Sivabalan Narayanan	c9fcf964b2	[HUDI-1315] Adding builder for HoodieTableMetaClient initialization (#2534)	2021-02-20 09:54:26 +08:00
pengzhiwei	37972071ff	[HUDI-1109] Support Spark Structured Streaming read from Hudi table (#2485)	2021-02-17 03:36:29 -08:00
teeyog	26da4f5462	[HUDI-1526] Translate the api partitionBy in spark datasource to hoodie.datasource.write.partitionpath.field (#2431)	2021-02-10 12:07:54 -05:00
Sun Ke	c30481f4b0	[HUDI-1545] Add test cases for INSERT_OVERWRITE Operation (#2483) Co-authored-by: sunke.03 <sunke.03@bytedance.com>	2021-02-07 21:47:01 -08:00
pengzhiwei	0d8a4d0a56	[HUDI-1550] Honor ordering field for MOR Spark datasource reader (#2497)	2021-02-01 21:04:27 +08:00
jiangjiguang	5d053b495b	[MINOR] Quickstart.generateUpdates method add check (#2505)	2021-01-30 10:28:00 +08:00
satishkotha	2d2d5c83b1	[HUDI-1555] Remove isEmpty to improve clustering execution performance (#2502)	2021-01-29 10:27:09 -08:00
wenningd	976420c49a	[HUDI-1512] Fix spark 2 unit tests failure with Spark 3 (#2412) * [HUDI-1512] Fix spark 2 unit tests failure with Spark 3 * resolve comments Co-authored-by: Wenning Ding <wenningd@amazon.com>	2021-01-21 07:04:28 -08:00
Vinoth Chandar	3719e7b388	Moving to 0.8.0-SNAPSHOT on master branch.	2021-01-20 11:31:22 -08:00
liujinhui	244f6def9c	[MINOR] Fix dataSource cannot use hoodie.datasource.hive_sync.auto_create_database (#2444) fix dataSource cannot use hoodie.datasource.hive_sync.auto_create_database	2021-01-20 22:58:18 +08:00
vinoth chandar	5ca0625b27	[HUDI 1308] Harden RFC-15 Implementation based on production testing (#2441) Addresses leaks, perf degradation observed during testing. These were regressions from the original rfc-15 PoC implementation. * Pass a single instance of HoodieTableMetadata everywhere * Fix tests and add config for enabling metrics - Removed special casing of assumeDatePartitioning inside FSUtils#getAllPartitionPaths() - Consequently, IOException is never thrown and many files had to be adjusted - More diligent handling of open file handles in metadata table - Added config for controlling reuse of connections - Added config for turning off fallback to listing, so we can see tests fail - Changed all ipf listing code to cache/amortize the open/close for better performance - Timelineserver also reuses connections, for better performance - Without timelineserver, when metadata table is opened from executors, reuse is not allowed - HoodieMetadataConfig passed into HoodieTableMetadata#create as argument. - Fix TestHoodieBackedTableMetadata#testSync	2021-01-19 21:20:28 -08:00
Sivabalan Narayanan	b9c2856d16	[HUDI-1535] Fix 0.7.0 snapshot (#2456) * Revert "[MINOR] Bumping snapshot version to 0.7.0 (#2435)" This reverts commit `a43e191d6c`. * Fixing 0.7.0 snapshot bump	2021-01-19 12:20:43 -08:00
Sivabalan Narayanan	a43e191d6c	[MINOR] Bumping snapshot version to 0.7.0 (#2435)	2021-01-16 09:56:28 -05:00
lw0090	de42adc230	[HUDI-1520] add configure for spark sql overwrite use INSERT_OVERWRITE_TABLE (#2428)	2021-01-11 09:07:47 -08:00
Udit Mehrotra	7ce3ac778e	[HUDI-1479] Use HoodieEngineContext to parallelize fetching of partiton paths (#2417) * [HUDI-1479] Use HoodieEngineContext to parallelize fetching of partition paths * Adding testClass for FileSystemBackedTableMetadata Co-authored-by: Nishith Agarwal <nagarwal@uber.com>	2021-01-10 21:19:52 -08:00
Gary Li	23e93d05c0	[MINOR] fix spark 3 build for incremental query on MOR (#2425)	2021-01-09 21:08:55 -08:00
lw0090	368c1a8f5c	[HUDI-1399] support a independent clustering spark job to asynchronously clustering (#2379) * [HUDI-1481] add structured streaming and delta streamer clustering unit test * [HUDI-1399] support a independent clustering spark job to asynchronously clustering * [HUDI-1399] support a independent clustering spark job to asynchronously clustering * [HUDI-1498] Read clustering plan from requested file for inflight instant (#2389) * [HUDI-1399] support a independent clustering spark job with schedule generate instant time Co-authored-by: satishkotha <satishkotha@uber.com>	2021-01-09 17:30:16 -08:00
Gary Li	79ec7b4894	[HUDI-920] Support Incremental query for MOR table (#1938)	2021-01-09 08:02:08 -08:00
Udit Mehrotra	17df517b81	[HUDI-1510] Move HoodieEngineContext and its dependencies to hudi-common (#2410)	2021-01-07 11:34:06 -08:00
wangxianghu	b593f10629	[MINOR] Rename unit test package of hudi-spark3 from scala to java (#2411)	2021-01-06 23:07:24 +08:00
Ryan Pifer	4b94529aaf	[HUDI-1325] [RFC-15] Merge updates of unsynced instants to metadata table (apache#2342) [RFC-15] Fix partition key in metadata table when bootstrapping from file system (apache#2387) Co-authored-by: Ryan Pifer <ryanpife@amazon.com>	2021-01-04 07:59:47 -08:00

65 Commits