1
0

[HUDI-1757] Assigns the buckets by record key for Flink writer (#2757)

Currently we assign the buckets by record partition path which could
cause hotspot if the partition field is datetime type. Changes to assign
buckets by grouping the record whth their key first, the assignment is
valid if only there is no conflict(two task write to the same bucket).

This patch also changes the coordinator execution to be asynchronous.
This commit is contained in:
Danny Chan
2021-04-06 19:06:41 +08:00
committed by GitHub
parent 920537cac8
commit 9c369c607d
25 changed files with 638 additions and 400 deletions

View File

@@ -41,7 +41,6 @@ public class HoodieIndexUtils {
* Fetches Pair of partition path and {@link HoodieBaseFile}s for interested partitions.
*
* @param partition Partition of interest
* @param context Instance of {@link HoodieEngineContext} to use
* @param hoodieTable Instance of {@link HoodieTable} of interest
* @return the list of {@link HoodieBaseFile}
*/