[HUDI-1918] Fix incorrect keyBy field cause serious data skew, to avoid multiple subtasks write to a partition at the same time (#2972)
This commit is contained in:
@@ -88,8 +88,8 @@ public class HoodieFlinkStreamer {
|
|||||||
.name("kafka_source")
|
.name("kafka_source")
|
||||||
.uid("uid_kafka_source")
|
.uid("uid_kafka_source")
|
||||||
.map(new RowDataToHoodieFunction<>(rowType, conf), TypeInformation.of(HoodieRecord.class))
|
.map(new RowDataToHoodieFunction<>(rowType, conf), TypeInformation.of(HoodieRecord.class))
|
||||||
// Key-by partition path, to avoid multiple subtasks write to a partition at the same time
|
// Key-by record key, to avoid multiple subtasks write to a partition at the same time
|
||||||
.keyBy(HoodieRecord::getPartitionPath)
|
.keyBy(HoodieRecord::getRecordKey)
|
||||||
.transform(
|
.transform(
|
||||||
"bucket_assigner",
|
"bucket_assigner",
|
||||||
TypeInformation.of(HoodieRecord.class),
|
TypeInformation.of(HoodieRecord.class),
|
||||||
|
|||||||
Reference in New Issue
Block a user