* Remove the metadata cleaning strategy for Flink, which means the multi-modal index may be affected
* Improve HoodieTable#clearMetadataTablePartitionsConfig to only update the table config when necessary
* Remove the modification of the read code path in HoodieTableConfig
No need to call #sync actively because the table instance is instantiated freshly:
its view manager has empty view instances, and the fs view will be synced lazily when
it is requested.
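A minimal sketch of that lazy-sync behavior, using hypothetical class and method names rather than Hudi's actual view manager API:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch (not Hudi's actual classes): the view is not synced
// when the table instance is created; the first request triggers the sync.
class LazyFileSystemView {
  private volatile boolean synced = false;
  private final List<String> fileSlices = new ArrayList<>();

  List<String> getLatestFileSlices() {
    ensureSynced(); // synced lazily on first access, so no eager #sync is needed
    return fileSlices;
  }

  private synchronized void ensureSynced() {
    if (!synced) {
      // replay the latest timeline into the empty in-memory view here
      fileSlices.add("file-slice-0"); // placeholder for real timeline replay
      synced = true;
    }
  }
}
```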
- getDataSize has non-trivial overhead in the current ParquetWriter impl, requiring traversal of already composed Column Groups in memory. Instead, we can sample these calls to getDataSize to amortize its cost.
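A minimal sketch of the sampling idea, with hypothetical names (SAMPLE_INTERVAL, SizableWriter) standing in for the actual writer internals:

```java
// Illustrative sketch of amortizing an expensive size probe: invoke the
// costly getDataSize() only once every SAMPLE_INTERVAL writes and reuse the
// last sample in between.
class SampledSizeWriter {
  private static final long SAMPLE_INTERVAL = 1000; // probe once per 1000 records
  private long writtenRecords = 0;
  private long lastSampledSize = 0;

  interface SizableWriter {
    void write(Object record);
    long getDataSize(); // expensive: traverses in-memory column data
  }

  void write(Object record, SizableWriter writer) {
    writer.write(record);
    if (++writtenRecords % SAMPLE_INTERVAL == 0) {
      lastSampledSize = writer.getDataSize();
    }
  }

  boolean canWriteMore(long maxFileSizeBytes) {
    return lastSampledSize < maxFileSizeBytes; // decision uses the amortized sample
  }
}
```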
Co-authored-by: sivabalan <n.siva.b@gmail.com>
- When column names are renamed (with schema evolution enabled), renamed columns were not handled well while copying records from the old data file with HoodieMergeHandle.
Fixing FILENAME_METADATA_FIELD not being correctly updated in HoodieMergeHandle in cases when the old record is carried over from the existing file as-is.
- Revisited the HoodieFileWriter API to accept HoodieKey instead of HoodieRecord
- Fixed FILENAME_METADATA_FIELD not being overridden in cases when the old record is simply carried over (see the sketch after this list)
- Exposing the standard JVM debugger ports in the Docker setup
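A minimal sketch of the carry-over fix above; the field name follows Hudi's metadata column convention, but the helper itself is illustrative, not the actual patch:

```java
import org.apache.avro.generic.GenericRecord;

// Illustrative sketch: when an unchanged old record is copied into the new
// file during a merge, the filename metadata column must still be rewritten
// to point at the NEW file being written.
class CarryOverExample {
  static final String FILENAME_METADATA_FIELD = "_hoodie_file_name"; // assumed name

  static GenericRecord carryOver(GenericRecord oldRecord, String newFileName) {
    // Everything else is carried over as-is; only the stale filename from the
    // previous base file is overridden.
    oldRecord.put(FILENAME_METADATA_FIELD, newFileName);
    return oldRecord;
  }
}
```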
* Fixing incorrect selection of metadata table (MT) partitions to be updated
* Ensure that the metadata partitions table config is inherited correctly
Co-authored-by: Sagar Sumit <sagarsumit09@gmail.com>
* Filter out the empty string (for non-partitioned tables) being added to the "__all_partitions__" record
* Instead of filtering, transform the empty partition-id to `NON_PARTITIONED_NAME` (see the sketch after this list)
* Cleaned up `HoodieBackedTableMetadataWriter`
* Make sure REPLACE_COMMITS are handled as well
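A minimal sketch of that transformation; the sentinel value "." is an assumption, not copied from Hudi's source:

```java
// Illustrative sketch: rather than filtering the empty partition path of a
// non-partitioned table out of the "__all_partitions__" record, map it to
// the NON_PARTITIONED_NAME sentinel.
class PartitionIdNormalizer {
  static final String NON_PARTITIONED_NAME = "."; // assumed sentinel value

  static String normalize(String relativePartitionPath) {
    return (relativePartitionPath == null || relativePartitionPath.isEmpty())
        ? NON_PARTITIONED_NAME
        : relativePartitionPath;
  }
}
```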
- Adding the capability to fetch metadata records by key prefix, so that data skipping can fetch only the Column Stats Index records pertaining to the columns being queried, instead of reading out the whole index (see the sketch after this list)
- Fixed usages of HFileScanner in HFileReader: a few code paths use the cached scanner if available, while other code paths use their own HFileScanner with positional reads
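A conceptual sketch of the key-prefix lookup, assuming index keys are prefixed with the column name (the real key encoding may differ):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Conceptual sketch: if column-stats keys start with the column name, stats
// for only the queried columns can be fetched as a range scan over the
// sorted key space instead of a full index read.
class PrefixLookupExample {
  static List<String> lookupByPrefix(TreeMap<String, String> sortedIndex, String columnPrefix) {
    // A sorted map mirrors a sorted HFile: a prefix scan is a range scan from
    // the prefix up to its smallest non-matching upper bound.
    return new ArrayList<>(
        sortedIndex.subMap(columnPrefix, columnPrefix + Character.MAX_VALUE).values());
  }
}
```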
Brief change log
- Rebasing ColumnStatsIndexSupport to rely on HoodieBackedTableMetadata instead of reading through the Spark DataSource
- Adding methods enabling key-prefix lookups to HoodieFileReader and HoodieHFileReader (see the sketch after this list)
- Wiring key-prefix lookups through the LogRecordScanner impls
- Cleaning up the HoodieHFileReader impl
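A hedged sketch of what the reader-side API shape could look like; the method names are assumptions, not Hudi's exact signatures:

```java
import java.io.IOException;
import java.util.Iterator;
import java.util.List;

// Hypothetical sketch: alongside point lookups by full key, the file reader
// exposes an iterator over all records whose keys start with any of the
// given prefixes.
interface KeyPrefixReadableReader<R> {
  // Existing style of access: point lookups by full keys.
  Iterator<R> getRecordsByKeysIterator(List<String> keys) throws IOException;

  // New: for each prefix, seek the underlying (sorted) HFile scanner to the
  // prefix and iterate while keys still match it.
  Iterator<R> getRecordsByKeyPrefixIterator(List<String> keyPrefixes) throws IOException;
}
```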
Co-authored-by: sivabalan <n.siva.b@gmail.com>
Co-authored-by: Sagar Sumit <sagarsumit09@gmail.com>
* [HUDI-3290] Different file formats for the partition metadata file.
Partition metadata files are stored in each partition to help identify the base path of a table. These files are saved in the properties file format. Some query engines do not work when non-Parquet/ORC files are found in a partition.
Added a new table config 'hoodie.partition.metafile.use.data.format' which, when enabled (default false for backward compatibility), ensures that partition metafiles are saved in the same format as the base files of the dataset.
For new datasets, the config can be set via hudi-cli. Deltastreamer has a new parameter --partition-metafile-use-data-format which will create a table with this setting.
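A minimal sketch of how the flag could gate the metafile format; the config key comes from the description above, while the file naming and surrounding logic are simplified assumptions:

```java
import java.util.Properties;

// Illustrative sketch, not the actual implementation.
class PartitionMetafileFormat {
  static final String USE_DATA_FORMAT_KEY = "hoodie.partition.metafile.use.data.format";

  static String metafileName(Properties tableConfig, String baseFileExtension) {
    boolean useDataFormat =
        Boolean.parseBoolean(tableConfig.getProperty(USE_DATA_FORMAT_KEY, "false"));
    // Default (false) keeps the legacy properties-file format for backward
    // compatibility; when enabled, the metafile takes the base file format
    // (e.g. ".parquet"/".orc") so format-strict engines tolerate the partition.
    return useDataFormat
        ? ".hoodie_partition_metadata" + baseFileExtension // assumed naming scheme
        : ".hoodie_partition_metadata";
  }
}
```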
* Code review comments
- Adding a new command to migrate the metafile from the text format to the base file format
- Reimplementing readFromFS() to first read the text format, then the base format (see the sketch after this list)
- Avoiding extra exists() checks in readFromFS()
- Added unit tests; enabled the parquet format across hudi-hadoop-mr
- Code cleanup, restructuring, and naming consistency
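A minimal sketch of that fallback read order, with the per-format readers stubbed out for illustration:

```java
import java.io.IOException;
import java.util.Optional;
import java.util.Properties;

// Sketch of the fallback read order (names simplified): try the legacy text
// format first, then the base-file format, recording which one succeeded
// rather than issuing a separate exists() probe per format.
class PartitionMetadataReadExample {
  enum MetafileFormat { TEXT, BASE_FILE }

  private MetafileFormat format; // populated after a successful read (see below)

  Properties readFromFS() throws IOException {
    Optional<Properties> props = tryReadTextFormat(); // legacy tables use this
    if (props.isPresent()) {
      format = MetafileFormat.TEXT;
      return props.get();
    }
    props = tryReadBaseFileFormat(); // new parquet/orc metafiles
    if (props.isPresent()) {
      format = MetafileFormat.BASE_FILE;
      return props.get();
    }
    throw new IOException("Unable to read partition metafile in any format");
  }

  // Stubbed for illustration; real readers would parse the respective formats.
  private Optional<Properties> tryReadTextFormat() { return Optional.empty(); }
  private Optional<Properties> tryReadBaseFileFormat() { return Optional.empty(); }
}
```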
* Wiring in all the other Spark code paths to respect this config
- Turned on parquet meta format for COW data source tests
- Removed the Deltastreamer command-line parameter to keep the command line shorter
* Populate HoodiePartitionMetadata#format after readFromFS()
Co-authored-by: Vinoth Chandar <vinoth@apache.org>
Co-authored-by: Raymond Xu <2701446+xushiyan@users.noreply.github.com>