Enabling auto tuning of insert splits by default

Vinoth Chandar
2018-11-07 17:14:53 -08:00
committed by vinoth chandar
parent 25cd05b24e
commit 1362942aa3
2 changed files with 2 additions and 2 deletions

@@ -74,7 +74,7 @@ summary: "Here we list all possible configurations and what they mean"
<span style="color:grey">Small files can always happen because of the number of insert records in a paritition in a batch. Hoodie has an option to auto-resolve small files by masking inserts into this partition as updates to existing small files. The size here is the minimum file size considered as a "small file size". This should be less < maxFileSize and setting it to 0, turns off this feature. </span>
- [insertSplitSize](#insertSplitSize) (size = 500000) <br/>
<span style="color:grey">Insert Write Parallelism. Number of inserts grouped for a single partition. Writing out 100MB files, with atleast 1kb records, means 100K records per file. Default is to overprovision to 500K. To improve insert latency, tune this to match the number of records in a single file. Setting this to a low number, will result in small files (particularly when compactionSmallFileSize is 0)</span>
-- [autoTuneInsertSplits](#autoTuneInsertSplits) (false) <br/>
+- [autoTuneInsertSplits](#autoTuneInsertSplits) (true) <br/>
<span style="color:grey">Should hoodie dynamically compute the insertSplitSize based on the last 24 commit's metadata. Turned off by default. </span>
- [approxRecordSize](#approxRecordSize) () <br/>
<span style="color:grey">The average record size. If specified, hoodie will use this and not compute dynamically based on the last 24 commit's metadata. No value set as default. This is critical in computing the insert parallelism and bin-packing inserts into small files. See above.</span>
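
Note on the sizing arithmetic in the hunk above: the sketch below reproduces the "100MB files with 1kb records means about 100K records per file" estimate and shows one way a writer could opt back out of the new default. Only the property key `hoodie.copyonwrite.insert.auto.split` comes from this commit. The class `InsertSplitSketch` and its helpers are hypothetical names for illustration, and this is not a description of how Hoodie computes the split size internally.

```java
import java.util.Properties;

public class InsertSplitSketch {
    // Property key as it appears in HoodieCompactionConfig in this commit.
    static final String AUTO_SPLIT_KEY = "hoodie.copyonwrite.insert.auto.split";

    // Hypothetical helper: how many records of a given average size
    // fit into one file of the target size (integer division).
    static long recordsPerFile(long targetFileBytes, long avgRecordBytes) {
        if (avgRecordBytes <= 0) {
            throw new IllegalArgumentException("avgRecordBytes must be positive");
        }
        return targetFileBytes / avgRecordBytes;
    }

    // Hypothetical helper: build writer properties with auto split on or off.
    static Properties withAutoSplit(boolean enabled) {
        Properties props = new Properties();
        props.setProperty(AUTO_SPLIT_KEY, String.valueOf(enabled));
        return props;
    }

    public static void main(String[] args) {
        long targetFileBytes = 100L * 1024 * 1024; // 100MB target file
        long avgRecordBytes = 1024;                // 1kb average record
        System.out.println("records per file = "
                + recordsPerFile(targetFileBytes, avgRecordBytes));

        // Explicitly disable auto tuning, restoring the pre-commit behaviour.
        Properties props = withAutoSplit(false);
        System.out.println(AUTO_SPLIT_KEY + " = " + props.getProperty(AUTO_SPLIT_KEY));
    }
}
```

With auto tuning enabled by default, a fixed `insertSplitSize` derived this way only matters for writers that set the property to `false`.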

@@ -63,7 +63,7 @@ public class HoodieCompactionConfig extends DefaultHoodieConfig {
public static final String COPY_ON_WRITE_TABLE_AUTO_SPLIT_INSERTS =
"hoodie.copyonwrite.insert" + ".auto.split";
// its off by default
-public static final String DEFAULT_COPY_ON_WRITE_TABLE_AUTO_SPLIT_INSERTS = String.valueOf(false);
+public static final String DEFAULT_COPY_ON_WRITE_TABLE_AUTO_SPLIT_INSERTS = String.valueOf(true);
// This value is used as a guessimate for the record size, if we can't determine this from
// previous commits
public static final String COPY_ON_WRITE_TABLE_RECORD_SIZE_ESTIMATE =