lanyuanxiaoyao/hudi - hudi

Author	SHA1	Message	Date
Sivabalan Narayanan	2e0dd29714	[HUDI-4204] Fixing NPE with row writer path and with OCC (#5850 )	2022-07-21 15:57:34 -07:00
Alexey Kudinkin	a33bdd32e3	[HUDI-3993] Replacing UDF in Bulk Insert w/ RDD transformation (#5470 )	2022-07-21 06:20:47 -07:00
冯健	382d19e85b	[HUDI-4065] Add FileBasedLockProvider (#6071 )	2022-07-19 07:52:47 +08:00
liujinhui	1959b843b7	[HUDI-4409] Improve LockManager wait logic when catch exception (#6122 )	2022-07-18 22:45:52 +08:00
Alexey Kudinkin	4bda6afe0b	[HUDI-4249] Fixing in-memory `HoodieData` implementation to operate lazily (#5855 )	2022-07-16 18:26:48 -05:00
Danny Chan	05606708fa	[HUDI-4393] Add marker file for target file when flink merge handle rolls over (#6103 )	2022-07-14 16:00:08 +08:00
liujinhui	126b88b48d	[HUDI-2150] Rename/Restructure configs for better modularity (#6061 ) - Move clean related configuration to HoodieCleanConfig - Move Archival related configuration to HoodieArchivalConfig - Move hoodie.compaction.payload.class to HoodiePayloadConfig	2022-07-09 20:00:48 +05:30
xiarixiaoyao	b686c07407	[HUDI-4276] Reconcile schema-inject null values for missing fields and add new fields (#6017 ) * [HUDI-4276] Reconcile schema-inject null values for missing fields and add new fields. * fix comments Co-authored-by: public (bdcee5037027) <mengtao0326@qq.com>	2022-07-09 03:08:38 +08:00
xi chaomin	23c9c5c296	[HUDI-3836] Improve the way of fetching metadata partitions from table (#5286 ) Co-authored-by: xicm <xicm@asiainfo.com>	2022-07-05 07:50:17 -07:00
bschell	fd7d25ab63	[HUDI-1176] Upgrade hudi to log4j2 (#5366 ) * Move to log4j2 cr: https://code.amazon.com/reviews/CR-71010705 * Upgrade unit tests to log4j2 * update exclusion Co-authored-by: Brandon Scheller <bschelle@amazon.com>	2022-06-28 12:54:23 -07:00
Alexey Kudinkin	c86edfc28e	[HUDI-4319] Fixed Parquet's `PLAIN_DICTIONARY` encoding not being applied when bulk-inserting (#5966 ) * Fixed Dictionary encoding config not being properly propagated to Parquet writer (making it unable to apply it, substantially bloating the storage footprint)	2022-06-24 23:52:28 -04:00
Zhaojing Yu	6456bd3a51	[HUDI-4273] Support inline schedule clustering for Flink stream (#5890 ) * [HUDI-4273] Support inline schedule clustering for Flink stream * delete deprecated clustering plan strategy and add clustering ITTest	2022-06-24 11:28:06 +08:00
Zhaojing Yu	c7e430bb46	Revert master (#5925 ) * Revert "udate" This reverts commit `092e35c1e3`. * Revert "[HUDI-3475] Initialize hudi table management module." This reverts commit `4640a3bbb8`.	2022-06-21 16:58:50 +08:00
喻兆靖	4640a3bbb8	[HUDI-3475] Initialize hudi table management module.	2022-06-21 15:21:30 +08:00
huberylee	d4f0326b4b	[HUDI-4275] Refactor rollback inflight instant for clustering/compaction to reuse some code (#5894 )	2022-06-20 14:29:21 +08:00
Danny Chan	0811bb38fb	[HUDI-4255] Make the flink merge and replace handle intermediate file visible (#5866 )	2022-06-15 14:23:23 +08:00
Danny Chan	25bbff64cf	[minor] Following HUDI-4207, remove the new wrapper #init method (#5865 )	2022-06-15 08:48:13 +08:00
HunterXHunter	264b15df87	[HUDI-4207] HoodieFlinkWriteClient.getOrCreateWriteHandle throws an e… (#5788 ) Adding more logs to assist in debugging with HoodieFlinkWriteClient.getOrCreateWriteHandle throwing exception	2022-06-13 10:36:06 -04:00
xi chaomin	e89f5627e4	[HUDI-3682] testReaderFilterRowKeys fails in TestHoodieOrcReaderWriter (#5790 ) TestReaderFilterRowKeys needs to get the key from RECORD_KEY_METADATA_FIELD, but the writer in the current UT does not populate the meta field and the schema does not contain meta fields. This fix writes data with a schema that contains meta fields and calls writeAvroWithMetadata for writing. Co-authored-by: xicm <xicm@asiainfo.com>	2022-06-13 10:22:12 -04:00
Alexey Kudinkin	35afdb4316	[HUDI-4178] Addressing performance regressions in Spark DataSourceV2 Integration (#5737 ) There are multiple issues with our current DataSource V2 integrations: b/c we advertise Hudi tables as V2, Spark expects them to implement certain APIs which are not implemented at the moment; instead, we're using a custom Resolution rule (in HoodieSpark3Analysis) to manually fall back to V1 APIs. This commit fixes the issue by reverting DSv2 APIs and making Spark use V1, except for schema evaluation logic.	2022-06-07 16:30:46 -07:00
Sivabalan Narayanan	f85cd9b16d	[HUDI-4200] Fixing sorting of keys fetched from metadata table (#5773 ) - Keys fetched from the metadata table, especially from the base file reader, are not sorted, and hence may result in throwing NPE (key prefix search) or unnecessary seeks to the start of the HFile (full key lookups). Fixing the same in this patch. This is not an issue with log blocks, since sorting is taken care of within HoodieHfileDataBlock. - Commit where the sorting was mistakenly reverted: [HUDI-3760] Adding capability to fetch Metadata Records by prefix #5208	2022-06-07 08:19:52 -04:00
Sivabalan Narayanan	21b903fddb	[HUDI-4197] Fix Async indexer to support building FILES partition (#5766 ) - When the async indexer is invoked only with the "FILES" partition, it fails. Fixing it to work with the Async indexer. Also, if the metadata table itself is not initialized, and if someone is looking to build indexes via AsyncIndexer, they are expected to index the "FILES" partition first, followed by other partitions. In general, we have a limitation of building only one index at a time w/ AsyncIndexer, and hence have added guards to ensure these conditions are met.	2022-06-06 15:47:11 -04:00
Sivabalan Narayanan	4f6fc726d0	[HUDI-4140] Fixing hive style partitioning and default partition with bulk insert row writer with SimpleKeyGen and virtual keys (#5664 ) Bulk insert row writer code path had a gap wrt hive style partitioning and default partition when virtual keys are enabled with SimpleKeyGen. This patch fixes the issue.	2022-06-06 10:21:00 -07:00
marchpure	73b0be3c96	[HUDI-4192] HoodieHFileReader scan top cells after bottom cells throw NullPointerException (#5755 ) SeekTo top cells avoid NullPointerException	2022-06-06 12:07:26 +08:00
Danny Chan	7f8630cc57	[HUDI-4167] Remove the timeline refresh with initializing hoodie table (#5716 ) The timeline refresh on table initialization invokes the fs view #sync, which has two actions now: 1. reload the timeline of the fs view, so that the next fs view request is based on this timeline metadata; 2. if this is a local fs view, clear all the local states; if this is a remote fs view, send a request to sync the remote fs view. But look at the construction: the meta client is freshly instantiated, so the timeline is already the latest, and the table is also freshly constructed, so the fs view has no local states. That means the #sync is totally unnecessary. In this patch, the metadata lifecycle and the data set fs view are kept in sync: when the fs view is refreshed, the underlying metadata is also refreshed synchronously. The freshness of the metadata follows the same rules as the data fs view: 1. if the fs view is local, the visibility is based on the client table metadata client's latest commit; 2. if the fs view is remote, the timeline server would #sync the fs view and metadata together based on the lagging server local timeline. From the perspective of the client, there is no need to care about the refresh action anymore, whether or not the metadata table is enabled. That makes the client logic clearer and less error-prone. Removing the timeline refresh has another benefit: it avoids unnecessary #refresh of the remote fs view. If all the clients send requests to #sync the remote fs view, the server would encounter conflicts and the clients would encounter a response error.	2022-06-02 09:48:48 +08:00
Danny Chan	329da34ee0	[HUDI-4163] Catch general exception instead of IOException while fetching rollback plan during rollback (#5703 ) If the avro file is corrupted, an InvalidAvroMagicException is thrown.	2022-05-30 13:08:02 +08:00
苏承祥	7e86884604	[HUDI-4086] Use CustomizedThreadFactory in async compaction and clustering (#5563 ) Co-authored-by: 苏承祥 <sucx@tuya.com>	2022-05-28 22:35:47 -07:00
komao	8d2f009048	[HUDI-4124] Add valid check in Spark Datasource configs (#5637 ) Co-authored-by: wangzixuan.wzxuan <wangzixuan.wzxuan@bytedance.com>	2022-05-26 05:21:28 -07:00
Danny Chan	4e42ed5eae	[HUDI-4145] Archives the metadata file in HoodieInstant.State sequence (part2) (#5676 )	2022-05-26 11:21:39 +08:00
Sagar Sumit	cf837b4900	[HUDI-3193] Decouple hudi-aws from hudi-client-common (#5666 ) Move HoodieMetricsCloudWatchConfig to hudi-client-common	2022-05-25 19:38:56 +05:30
喻兆靖	c20db99a7b	[HUDI-2207] Support independent flink hudi clustering function	2022-05-24 20:16:48 +08:00
Danny Chan	eb219010d2	[HUDI-4145] Archives the metadata file in HoodieInstant.State sequence (#5669 )	2022-05-24 17:33:30 +08:00
Sivabalan Narayanan	c05ebf2417	[HUDI-2473] Fixing compaction write operation in commit metadata (#5203 )	2022-05-24 13:03:21 +05:30
Danny Chan	676d5cefe0	[HUDI-4138] Fix the concurrency modification of hoodie table config for flink (#5660 ) * Remove the metadata cleaning strategy for flink, that means the multi-modal index may be affected * Improve the HoodieTable#clearMetadataTablePartitionsConfig to only update table config when necessary * Remove the modification of read code path in HoodieTableConfig	2022-05-24 13:07:55 +08:00
Heap	47b764ec33	[HUDI-4134] Fix Method naming consistency issues in FSUtils (#5655 )	2022-05-23 15:28:48 -07:00
Danny Chan	c7576f7613	[HUDI-4130] Remove the upgrade/downgrade for flink #initTable (#5642 )	2022-05-20 21:31:23 +08:00
Danny Chan	6f37863ba8	[HUDI-4114] Remove the unnecessary fs view sync for BaseWriteClient#initTable (#5617 ) No need to #sync actively because the table instance is freshly instantiated and its view manager has no view instances; the fs view would be synced lazily when it is requested.	2022-05-19 10:59:05 +08:00
Danny Chan	f1f8a1abb7	[HUDI-4109] Copy the old record directly when it is chosen for merging (#5603 )	2022-05-18 10:17:00 +08:00
Danny Chan	ebbe56e862	[minor] Some code refactoring for LogFileComparator and Instant instantiation (#5600 )	2022-05-18 09:30:09 +08:00
Danny Chan	d52d13302d	[HUDI-4101] BucketIndexPartitioner should take partition path for better dispersion (#5590 )	2022-05-17 10:34:57 +08:00
Shawy Geng	ad773b3d96	[HUDI-3654] Preparations for hudi metastore. (#5572 ) * [HUDI-3654] Preparations for hudi metastore. Co-authored-by: gengxiaoyu <gengxiaoyu@bytedance.com>	2022-05-17 09:47:10 +08:00
Yuwei XIAO	61030d8e7a	[HUDI-3123] consistent hashing index: basic write path (upsert/insert) (#4480 ) 1. basic write path(insert/upsert) implementation 2. adapt simple bucket index	2022-05-16 11:07:01 +08:00
xi chaomin	6e16e719cd	[HUDI-3980] Support kerberos hbase index (#5464 ) - Add configurations in HoodieHBaseIndexConfig.java to support kerberos hbase connection. Co-authored-by: xicm <xicm@asiainfo.com>	2022-05-14 07:37:31 -04:00
wqwl611	52e63b39d6	[HUDI-4097] add table info to jobStatus (#5529 ) Co-authored-by: wqwl611 <wqwl611@gmail.com>	2022-05-13 21:01:15 -04:00
Alexey Kudinkin	4a8589f222	[HUDI-4038] Avoid calling `getDataSize` after every record written (#5497 ) - getDataSize has non-trivial overhead in the current ParquetWriter impl, requiring traversal of already composed Column Groups in memory. Instead we can sample these calls to getDataSize to amortize its cost. Co-authored-by: sivabalan <n.siva.b@gmail.com>	2022-05-11 08:08:31 -04:00
guanziyue	abb4893b25	[HUDI-2875] Make HoodieParquetWriter Thread safe and memory executor exit gracefully (#4264 )	2022-05-05 13:49:34 -07:00
xicm	f492c52ee4	[HUDI-3862] Fix default configurations of HoodieHBaseIndexConfig (#5308 ) Co-authored-by: xicm <xicm@asiainfo.com>	2022-04-29 16:21:52 -07:00
LiChuang	4e928a6fe1	[HUDI-3943] Some description fixes for 0.10.1 docs (#5447 )	2022-04-28 15:18:56 -07:00
Danny Chan	e1ccf2e00b	[HUDI-3977] Flink hudi table with date type partition path throws HoodieNotSupportedException (#5432 )	2022-04-27 13:19:55 +08:00
Yuwei XIAO	f2ba0fead2	[HUDI-3085] Improve bulk insert partitioner abstraction (#4441 )	2022-04-25 18:42:17 +08:00

493 Commits