[HUDI-993] Let delete API use "hoodie.delete.shuffle.parallelism" (#1703)
For Delete API, "hoodie.delete.shuffle.parallelism" isn't used as opposed to "hoodie.upsert.shuffle.parallelism" is used for upsert, this creates the performance difference between delete by upsert API with "EmptyHoodieRecordPayload" and delete API for certain cases. This patch makes the following fixes in this regard. - Let deduplicateKeys method use "hoodie.delete.shuffle.parallelism" - Repartition inputRDD as "hoodie.delete.shuffle.parallelism" in case "hoodie.combine.before.delete=false"
This commit is contained in:
@@ -111,6 +111,7 @@ public class TestHoodieSnapshotExporter extends FunctionalTestHarness {
|
||||
.withSchema(HoodieTestDataGenerator.TRIP_EXAMPLE_SCHEMA)
|
||||
.withParallelism(2, 2)
|
||||
.withBulkInsertParallelism(2)
|
||||
.withDeleteParallelism(2)
|
||||
.forTable(TABLE_NAME)
|
||||
.withIndexConfig(HoodieIndexConfig.newBuilder().withIndexType(IndexType.BLOOM).build())
|
||||
.build();
|
||||
|
||||
@@ -17,4 +17,5 @@
|
||||
###
|
||||
hoodie.upsert.shuffle.parallelism=2
|
||||
hoodie.insert.shuffle.parallelism=2
|
||||
hoodie.delete.shuffle.parallelism=2
|
||||
hoodie.bulkinsert.shuffle.parallelism=2
|
||||
|
||||
Reference in New Issue
Block a user