lanyuanxiaoyao/hudi - hudi - Gitea: Git with a cup of tea

Author	SHA1	Message	Date
Sivabalan Narayanan	b00d03fd62	[HUDI-3886] Adding default null for some of the fields in col stats in MDT schema (#5329 )	2022-04-18 10:37:03 -04:00
Sivabalan Narayanan	05dfc39c29	Fixing async clustering job test in TestHoodieDeltaStreamer (#5317 )	2022-04-18 17:38:33 +05:30
董可伦	b8e465fdfc	[MINOR] Fix typos in log4j-surefire.properties (#5212 )	2022-04-15 13:33:37 -07:00
董可伦	99dd1cb6e6	[HUDI-3835] Add UT for delete in java client (#5270 )	2022-04-15 15:03:48 -04:00
Sivabalan Narayanan	e8ab915aff	[MINOR] Removing invalid code to close parquet reader iterator (#5182 )	2022-04-15 14:50:07 -04:00
Sivabalan Narayanan	57612c5c32	[HUDI-3848] Fixing restore with cleaned up commits (#5288 )	2022-04-15 14:47:53 -04:00
Raymond Xu	9e8664f4d2	[HOTFIX] add missing license (#5322 ) (#5324 )	2022-04-14 12:35:20 -07:00
Raymond Xu	d6a64f765e	Revert "[HUDI-3652] Make ObjectSizeCalculator threadlocal to reduce memory footprint (#5060 )" (#5323 ) This reverts commit `f0ab4a6e9e`.	2022-04-14 12:28:27 -07:00
sekaiga	f0ab4a6e9e	[HUDI-3652] Make ObjectSizeCalculator threadlocal to reduce memory footprint (#5060 ) Co-authored-by: zhouhuidong <zhouhuidong@bilibili.co>	2022-04-14 03:08:14 -07:00
ForwardXu	6621f3cdbb	[HUDI-3845] Fix delete mor table's partition with urlencode's error (#5282 )	2022-04-14 01:49:00 -07:00
ForwardXu	44b3630b5d	[HUDI-3826] Make truncate partition use delete_partition operation (#5272 ) Make truncate partition and drop partition behave as drop partition with purge, which delete all records via Hudi DELETE_PARTITION; partition removed from metastore	2022-04-14 00:53:05 -07:00
Sivabalan Narayanan	a081c2b9b5	[HUDI-3876] Fixing fetching partitions in GlueSyncClient (#5318 )	2022-04-13 21:03:05 -07:00
Y Ethan Guo	571cbe4c11	[MINOR] Code cleanup in test utils (#5312 )	2022-04-13 17:37:07 -04:00
Y Ethan Guo	bab691692e	[HUDI-3686] Fix inline and async table service check in HoodieWriteConfig (#5307 )	2022-04-13 17:33:26 -04:00
Y Ethan Guo	c7f41f9018	[HUDI-3869] Improve error handling of loading Hudi conf (#5311 )	2022-04-13 17:25:31 -04:00
Danny Chan	6f9b02decb	[HUDI-3870] Add timeout rollback for flink online compaction (#5314 )	2022-04-13 20:05:48 +08:00
Danny Chan	0281725c6b	[MINOR] Inline the partition path logic into the builder (#5310 )	2022-04-13 16:54:39 +05:30
Danny Chan	43de2b4702	[HUDI-3868] Disable the sort input for flink streaming append mode (#5309 )	2022-04-13 14:21:08 +08:00
Alexey Kudinkin	434e782b7d	[HUDI-3867] Disable Data Skipping by default (#5306 )	2022-04-13 11:21:12 +05:30
Alexey Kudinkin	7b78dff45f	[HUDI-3855] Fixing `FILENAME_METADATA_FIELD` not being correctly updated in `HoodieMergeHandle` (#5296 ) Fixing FILENAME_METADATA_FIELD not being correctly updated in HoodieMergeHandle, in cases when old-record is carried over from existing file as is. - Revisited HoodieFileWriter API to accept HoodieKey instead of HoodieRecord - Fixed FILENAME_METADATA_FIELD not being overridden in cases when simply old record is carried over - Exposing standard JVM's debugger ports in Docker setup	2022-04-12 20:42:15 -04:00
Raymond Xu	2e6e302efe	[HUDI-3859] Fix spark profiles and utilities-slim dep (#5297 )	2022-04-12 15:33:08 -07:00
Vinoth Govindarajan	2d46d5287e	[HUDI-3838] Moved the getPartitionColumns logic to driver. (#5303 )	2022-04-12 18:03:00 -04:00
satishm	25dce94ba2	[MINOR] Integ Test Reducing partitions for log running multi partition yaml (#5300 )	2022-04-12 12:15:17 -04:00
Raymond Xu	84783b9779	[HUDI-3843] Make flink profiles build with scala-2.11 (#5279 )	2022-04-12 08:33:48 -07:00
Vinoth Govindarajan	d16740976e	[HUDI-3838] Implemented drop partition column feature for delta streamer code path (#5294 ) * [HUDI-3838] Implemented drop partition column feature for delta streamer code path * Ensure drop partition table config is updated in hoodie.props Co-authored-by: Sagar Sumit <sagarsumit09@gmail.com>	2022-04-12 18:10:30 +05:30
Alexey Kudinkin	101b82a679	[HUDI-3839] Fixing incorrect selection of MT partitions to be updated (#5274 ) * Fixing incorrect selection of MT partitions to be updated * Ensure that metadata partitions table config is inherited correctly Co-authored-by: Sagar Sumit <sagarsumit09@gmail.com>	2022-04-12 13:37:52 +05:30
Sivabalan Narayanan	f91e9e63e1	[HUDI-3799] Fixing not deleting empty instants w/o archiving (#5261 )	2022-04-11 21:02:43 -07:00
Sagar Sumit	3d8fc78c66	[HUDI-3844] Update props in indexer based on table config (#5293 )	2022-04-11 18:16:06 -04:00
Alexey Kudinkin	458fdd5611	[HUDI-3841] Fixing Column Stats in the presence of Schema Evolution (#5275 ) Currently, Data Skipping is not handling correctly the case when column-stats are not aligned and, for ex, some of the (column, file) combinations are missing from the CSI. This could occur in different scenarios (schema evolution, CSI config changes), and has to be handled properly when we're composing CSI projection for Data Skipping. This PR addresses that. - Added appropriate aligning for the transposed CSI projection	2022-04-11 15:45:53 -04:00
Sivabalan Narayanan	52ea1e4964	[MINOR] fixing timeline server for integ tests (#5289 )	2022-04-11 10:14:51 -04:00
RexXiong	5c41e30ac5	[HUDI-3817] shade parquet dependency for hudi-hadoop-mr-bundle (#5250 ) Co-authored-by: lvshuang.xjs <lvshuang.xjs@alibaba-inc.com>	2022-04-11 05:44:46 -07:00
Sivabalan Narayanan	2245a9515f	[HUDI-3798] Fixing ending of a transaction by different owner and removing some extraneous methods in trxn manager (#5255 )	2022-04-11 10:16:07 +05:30
Y Ethan Guo	63a099c5b7	[HUDI-3847] Fix NPE due to null schema in HoodieMetadataTableValidator (#5284 )	2022-04-10 17:59:29 -07:00
Sivabalan Narayanan	12731f5b89	[HUDI-3842] Integ tests for non partitioned datasets (#5276 ) - Adding non-partitioned support to integ tests - Fixing some of the test yamls and properties	2022-04-10 20:09:48 -04:00
Alexey Kudinkin	976840e8eb	[HUDI-3812] Fixing Data Skipping configuration to respect Metadata Table configs (#5244 ) Addressing the problem of Data Skipping not respecting Metadata Table configs which might differ b/w write/read paths. More details could be found in HUDI-3812. - Fixing Data Skipping configuration to respect MT configs (on the Read path) - Tightening up DS handling of cases when no top-level columns are in the target query - Enhancing tests to cover all possible case	2022-04-10 13:43:47 -04:00
Alexey Kudinkin	7a9d48d126	[HUDI-3834] Fixing performance hits in reading Column Stats Index (#5266 ) Fixing performance hits in reading Column Stats Index: [HUDI-3834] There's substantial performance degradation in Avro 1.10 default generated Builder classes: they by default rely on SpecificData.getForSchema that load corresponding model's class using reflection, which takes a hit when executed on the hot-path (this was bringing overall runtime to read full Column Stats Index of 800k records to 60s, whereas now it's taking mere 3s) Addressing memory churn by over-used Hadoop's Path creation: Path ctor is not a lightweight sequence and produces quite a bit of memory churn adding pressure on GC. Cleaning such avoidable allocations up to make sure there's no unnecessarily added pressure on GC.	2022-04-10 13:42:06 -04:00
董可伦	15c264535f	[MINOR] Fix typos in the comments of HoodieMergeHandle (#5271 )	2022-04-09 17:51:58 -07:00
Y Ethan Guo	3e97c88c4f	[HUDI-3807] Add a new config to control the use of metadata index in HoodieBloomIndex (#5268 )	2022-04-09 15:30:11 -04:00
Raymond Xu	5e65aefc61	[HUDI-3837] Fix license and rat check settings (#5273 ) - add missing licenses - fix CI setting to run rat plugin - fix deploy script to include integ test modules	2022-04-09 11:01:18 -07:00
Alexey Kudinkin	81b25c543a	[HUDI-3825] Fixing Column Stats Index updating sequence (#5267 )	2022-04-08 23:14:08 -07:00
Y Ethan Guo	1cc7542357	[MINOR] Update README of docker build setup (#5256 )	2022-04-08 16:12:25 -07:00
satishm	26eb7b8183	[HUDI-3571] Spark datasource continuous checkpoint should have own fs variable (#5265 )	2022-04-08 07:16:01 -04:00
Alexey Kudinkin	d7cc767dbc	[HUDI-3825] Fixing non-partitioned table Partition Records persistence in MT (#5259 ) * Filter out empty string (for non-partitioned table) being added to "__all_partitions__" record * Instead of filtering, transform empty partition-id to `NON_PARTITIONED_NAME` * Cleaned up `HoodieBackedTableMetadataWriter` * Make sure REPLACE_COMMITS are handled as well	2022-04-08 15:58:31 +05:30
Danny Chan	67215abaf0	[HUDI-3827] Promote the inetAddress picking strategy for NetworkUtils#getHostname (#5260 )	2022-04-08 14:33:56 +08:00
KnightChess	7a6272fba1	[HUDI-3781] fix spark delete sql can not delete record (#5215 )	2022-04-08 14:26:40 +08:00
Sagar Sumit	df87095ef0	[HUDI-3454] Fix partition name in all code paths for LogRecordScanner (#5252 ) * Depend on FSUtils#getRelativePartitionPath(basePath, logFilePath.getParent) to get the partition. * If the list of log file paths in the split is empty, then fallback to usual behaviour.	2022-04-08 09:59:36 +05:30
Y Ethan Guo	672974c412	[HUDI-3823] Fix hudi-hive-sync-bundle to include HBase dependencies and shading (#5257 )	2022-04-07 17:30:33 -07:00
Sivabalan Narayanan	ef06e4a526	[HUDI-3810] Fixing lazy read for metadata log record readers (#5241 )	2022-04-07 18:40:51 -04:00
Y Ethan Guo	cd2c346df6	[HUDI-3637] Exclude uncommitted log files from metadata table validation (#5234 )	2022-04-07 13:03:03 -07:00
Sivabalan Narayanan	b3c834a242	[HUDI-3571] Spark datasource continuous ingestion tool (#5156 )	2022-04-07 14:13:46 -04:00

... 2 3 4 5 6 ...

2923 Commits