[HUDI-993] Let delete API use "hoodie.delete.shuffle.parallelism" (#1703)
For Delete API, "hoodie.delete.shuffle.parallelism" isn't used as opposed to "hoodie.upsert.shuffle.parallelism" is used for upsert, this creates the performance difference between delete by upsert API with "EmptyHoodieRecordPayload" and delete API for certain cases. This patch makes the following fixes in this regard. - Let deduplicateKeys method use "hoodie.delete.shuffle.parallelism" - Repartition inputRDD as "hoodie.delete.shuffle.parallelism" in case "hoodie.combine.before.delete=false"
This commit is contained in:
@@ -88,7 +88,8 @@ public class HoodieWriteClientExample {
|
||||
|
||||
// Create the write client to write some records in
|
||||
HoodieWriteConfig cfg = HoodieWriteConfig.newBuilder().withPath(tablePath)
|
||||
.withSchema(HoodieExampleDataGenerator.TRIP_EXAMPLE_SCHEMA).withParallelism(2, 2).forTable(tableName)
|
||||
.withSchema(HoodieExampleDataGenerator.TRIP_EXAMPLE_SCHEMA).withParallelism(2, 2)
|
||||
.withDeleteParallelism(2).forTable(tableName)
|
||||
.withIndexConfig(HoodieIndexConfig.newBuilder().withIndexType(HoodieIndex.IndexType.BLOOM).build())
|
||||
.withCompactionConfig(HoodieCompactionConfig.newBuilder().archiveCommitsWith(20, 30).build()).build();
|
||||
HoodieWriteClient<HoodieAvroPayload> client = new HoodieWriteClient<>(jsc, cfg);
|
||||
|
||||
Reference in New Issue
Block a user