
[HUDI-993] Let delete API use "hoodie.delete.shuffle.parallelism" (#1703)

The delete API does not use "hoodie.delete.shuffle.parallelism", whereas the upsert API does use "hoodie.upsert.shuffle.parallelism". In certain cases this causes a performance difference between deleting through the upsert API with "EmptyHoodieRecordPayload" and deleting through the delete API.

This patch makes the following fixes:
- Let the deduplicateKeys method use "hoodie.delete.shuffle.parallelism"
- Repartition inputRDD to "hoodie.delete.shuffle.parallelism" partitions when "hoodie.combine.before.delete=false"
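
As a rough illustration of the two settings named above, a writer configuration might look like the fragment below. The value 200 is a placeholder chosen for this sketch, not a recommendation and not taken from this patch; only the two keys come from the commit message.

```properties
# Hypothetical values for illustration only.
# Parallelism that the delete path is meant to honor after this patch.
hoodie.delete.shuffle.parallelism=200
# With combining disabled, the patch repartitions the input keys
# to the parallelism above instead of deduplicating them.
hoodie.combine.before.delete=false
```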
Dongwook
2020-09-01 09:55:31 -07:00
committed by GitHub
parent 48a58c98a1
commit 8d19ebfd0f
21 changed files with 234 additions and 15 deletions


@@ -17,5 +17,6 @@
#
hoodie.upsert.shuffle.parallelism=2
hoodie.insert.shuffle.parallelism=2
hoodie.delete.shuffle.parallelism=2
hoodie.bulkinsert.shuffle.parallelism=2
hoodie.datasource.write.partitionpath.field=timestamp