lanyuanxiaoyao/hudi - hudi

Author	SHA1	Message	Date
Dongwook Kwon	74d7b4d751	[HUDI-4471] Relocate AWSDmsAvroPayload class to hudi-common	2022-07-25 17:51:27 -07:00
Shiyan Xu	71c2c3102b	[HUDI-4455] Improve test classes for TestHiveSyncTool (#6202) Improve HiveTestService, HiveTestUtil, and related classes.	2022-07-25 19:05:34 +05:30
Shiyan Xu	d5c7c79d87	Revert "[HUDI-4324] Remove use_jdbc config from hudi sync (#6072)" (#6160) This reverts commit `046044c83d`.	2022-07-22 17:18:45 -07:00
Shiyan Xu	6b84384022	Revert "[MINOR] Fix CI issue with TestHiveSyncTool (#6110)" (#6192) This reverts commit `d5c904e10e`.	2022-07-22 12:20:39 -07:00
Shiyan Xu	d5c904e10e	[MINOR] Fix CI issue with TestHiveSyncTool (#6110)	2022-07-22 10:30:00 -05:00
Sivabalan Narayanan	3964c476e0	Fix file group count issue with metadata partitions (#5892)	2022-07-18 07:19:29 +05:30
Shiyan Xu	046044c83d	[HUDI-4324] Remove use_jdbc config from hudi sync (#6072) * [HUDI-4324] Remove use_jdbc config from hudi sync * Users should use HIVE_SYNC_MODE instead	2022-07-10 11:16:09 +05:30
liujinhui	126b88b48d	[HUDI-2150] Rename/Restructure configs for better modularity (#6061) - Move clean related configuration to HoodieCleanConfig - Move Archival related configuration to HoodieArchivalConfig - hoodie.compaction.payload.class move this to HoodiePayloadConfig	2022-07-09 20:00:48 +05:30
xi chaomin	23c9c5c296	[HUDI-3836] Improve the way of fetching metadata partitions from table (#5286) Co-authored-by: xicm <xicm@asiainfo.com>	2022-07-05 07:50:17 -07:00
Y Ethan Guo	fbda4ad5bd	[HUDI-4360] Fix HoodieDropPartitionsTool based on refactored meta sync (#6043)	2022-07-04 23:37:21 -07:00
YueZhang	45fdcf68a1	[HUDI-3116]Add a new HoodieDropPartitionsTool to let users drop table partitions through a standalone job. (#4459) Co-authored-by: yuezhang <yuezhang@freewheel.tv>	2022-07-04 19:24:18 -07:00
Shiyan Xu	c0e1587966	[HUDI-3730] Improve meta sync class design and hierarchies (#5854) * [HUDI-3730] Improve meta sync class design and hierarchies (#5754) * Implements class design proposed in RFC-55 Co-authored-by: jian.feng <fengjian428@gmial.com> Co-authored-by: jian.feng <jian.feng@shopee.com>	2022-07-03 14:47:25 +05:30
bschell	fd7d25ab63	[HUDI-1176] Upgrade hudi to log4j2 (#5366) * Move to log4j2 cr: https://code.amazon.com/reviews/CR-71010705 * Upgrade unit tests to log4j2 * update exclusion Co-authored-by: Brandon Scheller <bschelle@amazon.com>	2022-06-28 12:54:23 -07:00
董可伦	7689e62cd9	[HUDI-4265] Deprecate useless targetTableName parameter in HoodieMultiTableDeltaStreamer (#5883)	2022-06-17 16:57:14 +08:00
董可伦	c291b05699	[HUDI-4218] [HUDI-4218] Expose the real exception information when an exception occurs in the tableExists method (#5827)	2022-06-15 18:10:35 +08:00
Qi Ji	4774c4248f	[HUDI-4006] failOnDataLoss on delta-streamer kafka sources (#5718) add new config key hoodie.deltastreamer.source.kafka.enable.failOnDataLoss when failOnDataLoss=false (current behaviour, the default), log a warning instead of seeking to earliest silently when failOnDataLoss is set, fail explicitly	2022-06-13 10:31:57 -04:00
luoyajun	0d859fe58b	[HUDI-3863] Add UT for drop partition column in deltastreamer testsuite (#5727)	2022-06-13 10:29:32 -04:00
Shiyan Xu	5aaac21d1d	[HUDI-4224] Fix CI issues (#5842) - Upgrade junit to 5.7.2 - Downgrade surefire and failsafe to 2.22.2 - Fix test failures that were previously not reported - Improve azure pipeline configs Co-authored-by: liujinhui1994 <965147871@qq.com> Co-authored-by: Y Ethan Guo <ethan.guoyihua@gmail.com>	2022-06-12 11:44:18 -07:00
Sivabalan Narayanan	21b903fddb	[HUDI-4197] Fix Async indexer to support building FILES partition (#5766) - When async indexer is invoked only with "FILES" partition, it fails. Fixing it to work with Async indexer. Also, if metadata table itself is not initialized, and if someone is looking to build indexes via AsyncIndexer, first they are expected to index "FILES" partition followed by other partitions. In general, we have a limitation of building only one index at a time w/ AsyncIndexer and hence. Have added guards to ensure these conditions are met.	2022-06-06 15:47:11 -04:00
Qi Ji	7276d0eaa6	[HUDI-3670] free temp views in sql transformers (#5080)	2022-06-01 07:35:40 -07:00
Kumud Kumar Srivatsava Tirupati	795a99ba73	[HUDI-4107] Added --sync-tool-classes config option in HoodieMultiTableDeltaStreamer (#5597) * added --sync-tool-classes config option in multitable delta streamer * added a testcase to assert if syncClientToolClassNames is getting picked to the deltastreamer execution context	2022-05-31 20:27:50 +05:30
wangxianghu	58014c147a	[HUDI-4160] Make database regex of MaxwellJsonKafkaSourcePostProcessor optional (#5697)	2022-05-28 11:13:24 +04:00
Sagar Sumit	31e13db1f0	[HUDI-4023] Decouple hudi-spark from hudi-utilities-slim-bundle (#5641)	2022-05-26 11:28:49 +05:30
Sivabalan Narayanan	10363c1412	[HUDI-4132] Fixing determining target table schema for delta sync with empty batch (#5648)	2022-05-24 08:17:15 -04:00
Heap	47b764ec33	[HUDI-4134] Fix Method naming consistency issues in FSUtils (#5655)	2022-05-23 15:28:48 -07:00
wangxianghu	2af98303d3	[HUDI-4122] Fix NPE caused by adding kafka nodes (#5632)	2022-05-21 11:12:53 +08:00
Sivabalan Narayanan	7d02b1fd3c	[MINOR] Minor fixes to exception log and removing unwanted metrics flush in integ test (#5646)	2022-05-21 07:27:35 +08:00
wqwl611	52e63b39d6	[HUDI-4097] add table info to jobStatus (#5529) Co-authored-by: wqwl611 <wqwl611@gmail.com>	2022-05-13 21:01:15 -04:00
Sivabalan Narayanan	5c4813f101	[HUDI-4072] Fix NULL schema for empty batches in deltastreamer (#5543)	2022-05-13 17:56:47 +05:30
Sivabalan Narayanan	b10ca7e69f	[HUDI-4085] Fixing flakiness with parquet empty batch tests in TestHoodieDeltaStreamer (#5559)	2022-05-11 16:02:54 -04:00
Sivabalan Narayanan	569a76a9a5	[MINOR] fixing flaky tests in deltastreamer tests (#5521)	2022-05-07 15:37:20 -04:00
Sivabalan Narayanan	52fe1c9fae	[HUDI-3675] Adding post write termination strategy to deltastreamer continuous mode (#5073) - Added a postWriteTerminationStrategy to deltastreamer continuous mode. One can enable by setting the appropriate termination strategy using DeltastreamerConfig.postWriteTerminationStrategyClass. If not, continuous mode is expected to run forever. - Added one concrete impl for termination strategy as NoNewDataTerminationStrategy which shuts down deltastreamer if there is no new data to consume from source for N consecutive rounds.	2022-05-06 09:27:29 -04:00
qianchutao	d794f4fbf9	[MINOR] Optimize code logic (#5499)	2022-05-05 09:33:06 -07:00
Y Ethan Guo	a1d82b4dc5	[MINOR] Fix CI by ignoring SparkContext error (#5468) Sets spark.driver.allowMultipleContexts = true when constructing Spark conf in UtilHelpers	2022-04-29 11:19:07 -07:00
watermelon12138	cacbd98687	[HUDI-3945] After the async compaction operation is complete, the task should exit. (#5391) Co-authored-by: y00617041 <yangxuan42@huawei.com>	2022-04-27 21:16:09 +08:00
Alexey Kudinkin	4b296f79cc	[HUDI-3935] Adding config to fallback to enabled Partition Values extraction from Partition path (#5377)	2022-04-21 01:36:19 -07:00
Y Ethan Guo	28fdddfee0	[HUDI-3920] Fix partition path construction in metadata table validator (#5365)	2022-04-19 19:40:09 -04:00
Sagar Sumit	4f44e6aeb5	[HUDI-3899] Drop index to delete pending index instants from timeline if applicable (#5342) Co-authored-by: sivabalan <n.siva.b@gmail.com>	2022-04-18 22:28:46 -04:00
Sagar Sumit	1718bcab84	[HUDI-3707] Fix target schema handling in HoodieSparkUtils while creating RDD (#5347)	2022-04-18 13:34:04 -04:00
Sivabalan Narayanan	05dfc39c29	Fixing async clustering job test in TestHoodieDeltaStreamer (#5317)	2022-04-18 17:38:33 +05:30
董可伦	b8e465fdfc	[MINOR] Fix typos in log4j-surefire.properties (#5212)	2022-04-15 13:33:37 -07:00
Raymond Xu	9e8664f4d2	[HOTFIX] add missing license (#5322) (#5324)	2022-04-14 12:35:20 -07:00
Vinoth Govindarajan	2d46d5287e	[HUDI-3838] Moved the getPartitionColumns logic to driver. (#5303)	2022-04-12 18:03:00 -04:00
Vinoth Govindarajan	d16740976e	[HUDI-3838] Implemented drop partition column feature for delta streamer code path (#5294) * [HUDI-3838] Implemented drop partition column feature for delta streamer code path * Ensure drop partition table config is updated in hoodie.props Co-authored-by: Sagar Sumit <sagarsumit09@gmail.com>	2022-04-12 18:10:30 +05:30
Sagar Sumit	3d8fc78c66	[HUDI-3844] Update props in indexer based on table config (#5293)	2022-04-11 18:16:06 -04:00
Y Ethan Guo	63a099c5b7	[HUDI-3847] Fix NPE due to null schema in HoodieMetadataTableValidator (#5284)	2022-04-10 17:59:29 -07:00
Alexey Kudinkin	81b25c543a	[HUDI-3825] Fixing Column Stats Index updating sequence (#5267)	2022-04-08 23:14:08 -07:00
Y Ethan Guo	cd2c346df6	[HUDI-3637] Exclude uncommitted log files from metadata table validation (#5234)	2022-04-07 13:03:03 -07:00
Raymond Xu	e96f08f355	Moving to 0.12.0-SNAPSHOT on master branch.	2022-04-06 15:24:10 +08:00
Prashant Wason	b28f0d6ceb	[HUDI-3290] Different file formats for the partition metadata file. (#5179) * [HUDI-3290] Different file formats for the partition metadata file. Partition metadata files are stored in each partition to help identify the base path of a table. These files are saved in the properties file format. Some query engines do not work when non Parquet/ORC files are found in a partition. Added a new table config 'hoodie.partition.metafile.use.data.format' which when enabled (default false for backward compatibility) ensures that partition metafiles will be saved in the same format as the base files of a dataset. For new datasets, the config can be set via hudi-cli. Deltastreamer has a new parameter --partition-metafile-use-data-format which will create a table with this setting. * Code review comments - Adding a new command to migrate from text to base file formats for meta file. - Reimplementing readFromFS() to first read the text format, then base format - Avoid extra exists() checks in readFromFS() - Added unit tests, enabled parquet format across hoodie-hadoop-mr - Code cleanup, restructuring, naming consistency. * Wiring in all the other Spark code paths to respect this config - Turned on parquet meta format for COW data source tests - Removed the deltastreamer command line to keep it shorter * populate HoodiePartitionMetadata#format after readFromFS() Co-authored-by: Vinoth Chandar <vinoth@apache.org> Co-authored-by: Raymond Xu <2701446+xushiyan@users.noreply.github.com>	2022-04-04 08:08:20 -07:00
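
Commit 52fe1c9fae above describes a termination strategy that shuts down the deltastreamer when no new data arrives from the source for N consecutive rounds. The idea can be sketched roughly as follows. This is a standalone Python illustration, not Hudi's Java implementation: the class name `NoNewDataTermination`, the field `max_idle_rounds`, and the helper `run_continuous` are all hypothetical names chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class NoNewDataTermination:
    """Hypothetical stand-in; not Hudi's NoNewDataTerminationStrategy class."""

    max_idle_rounds: int  # the "N" from the commit message
    idle_rounds: int = 0  # current streak of rounds with no data

    def should_shutdown(self, records_in_round: int) -> bool:
        # A round that consumed data resets the streak; an empty round extends it.
        if records_in_round > 0:
            self.idle_rounds = 0
        else:
            self.idle_rounds += 1
        return self.idle_rounds >= self.max_idle_rounds


def run_continuous(batches, strategy):
    """Consume batches round by round until the strategy asks to stop.

    Returns the number of rounds that were executed.
    """
    rounds = 0
    for batch in batches:
        rounds += 1
        # The check happens after the write, mirroring "post write" in the commit.
        if strategy.should_shutdown(len(batch)):
            break
    return rounds
```

With `max_idle_rounds=2`, a single empty round between two non-empty ones does not stop the loop; only two empty rounds in a row do.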

465 Commits