[HUDI-1757] Assigns the buckets by record key for Flink writer (#2757)
Currently we assign the buckets by record partition path, which can cause hotspots if the partition field is a datetime type. This change assigns buckets by grouping the records with their keys first; the assignment is valid only if there is no conflict (two tasks writing to the same bucket). This patch also changes the coordinator execution to be asynchronous.
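A minimal sketch of the idea, assuming a simple hash-based assignment (the names `BucketSketch` and `keyToBucket` are hypothetical and not Hudi's actual `BucketAssignFunction` API): hashing the record key instead of the partition path spreads records from one hot partition across all write tasks.

```java
// Illustrative sketch only: assign a record to a write bucket by hashing its
// record key rather than its partition path, so records from a single hot
// (e.g. datetime) partition spread across parallel write tasks.
// BucketSketch and keyToBucket are hypothetical names, not Hudi's API.
public class BucketSketch {

  /** Maps a record key to one of {@code numBuckets} buckets. */
  static int keyToBucket(String recordKey, int numBuckets) {
    // Mask the sign bit instead of Math.abs, which is wrong for Integer.MIN_VALUE.
    return (recordKey.hashCode() & Integer.MAX_VALUE) % numBuckets;
  }

  public static void main(String[] args) {
    // Two records in the same datetime partition can land in different buckets,
    // so one hot partition no longer funnels all writes into a single task.
    System.out.println(keyToBucket("uuid-0001", 4));
    System.out.println(keyToBucket("uuid-0002", 4));
  }
}
```

Because the key-to-bucket mapping is deterministic, a given key always reaches the same task, which is what makes the conflict-free condition checkable.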
```diff
@@ -260,11 +260,17 @@ public class HoodieFlinkWriteClient<T extends HoodieRecordPayload> extends
    * but cleaning action should trigger after all the write actions within a
    * checkpoint finish.
    *
-   * @param instantTime The latest successful commit time
+   * @param table         Table to commit on
+   * @param metadata      Commit Metadata corresponding to committed instant
+   * @param instantTime   Instant Time
+   * @param extraMetadata Additional Metadata passed by user
    */
-  public void postCommit(String instantTime) {
+  @Override
+  protected void postCommit(HoodieTable<T, List<HoodieRecord<T>>, List<HoodieKey>, List<WriteStatus>> table,
+                            HoodieCommitMetadata metadata,
+                            String instantTime,
+                            Option<Map<String, String>> extraMetadata) {
     try {
-      HoodieTable<?, ?, ?, ?> table = createTable(config, hadoopConf);
       // Delete the marker directory for the instant.
       new MarkerFiles(createTable(config, hadoopConf), instantTime)
           .quietDeleteMarkerDir(context, config.getMarkersDeleteParallelism());
```