Adding documentation for migration guide and COW vs MOR tradeoffs, moving some docs around for more clarity
committed by vinoth chandar
parent 1628d044ac, commit 48aa026dc4
@@ -160,7 +160,8 @@ summary: "Here we list all possible configurations and what they mean"
Writing data via Hoodie happens as a Spark job, and thus the general rules of Spark debugging apply here too. Below is a list of things to keep in mind if you are looking to improve performance or reliability.
- **Write operations** : Use `bulkinsert` to load new data into a table, and from then on use `upsert`/`insert`. The difference between them is that bulk insert uses a disk-based write path, so it can scale to large inputs without needing to cache them.
- **Input Parallelism** : By default, Hoodie tends to over-partition input (i.e. `withParallelism(1500)`) to ensure each Spark partition stays within the 2GB limit for inputs up to 500GB. Bump this up accordingly if you have larger inputs. We recommend setting the shuffle parallelism `hoodie.[insert|upsert|bulkinsert].shuffle.parallelism` such that it is at least input_data_size/500MB.
- **Off-heap memory** : Hoodie writes parquet files, which needs a good amount of off-heap memory proportional to the schema width. Consider raising `spark.yarn.executor.memoryOverhead` or `spark.yarn.driver.memoryOverhead` if you are running into such failures.
- **Spark Memory** : Typically, Hoodie needs to be able to read a single file into memory to perform merges or compactions, and thus the executor memory should be sufficient to accommodate this. In addition, Hoodie caches the input to be able to intelligently place data, so leaving some `spark.storage.memoryFraction` will generally help boost performance.
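The parallelism rule of thumb above (at least input_data_size/500MB, with a 1500 floor) can be sketched as a small helper. This is purely illustrative; `recommended_parallelism` is a hypothetical name, not part of Hudi or Spark:

```python
def recommended_parallelism(input_size_mb: int, floor: int = 1500) -> int:
    """Shuffle parallelism per the rule of thumb above:
    at least one partition per ~500MB of input, never below the default floor."""
    # -(-a // b) is ceiling division in integer arithmetic
    return max(floor, -(-input_size_mb // 500))

# e.g. a 2TB input works out to well above the 1500 default
print(recommended_parallelism(2 * 1024 * 1024))
```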
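Taken together, a starting point for these knobs might look like the fragment below. The property keys come from the bullets above; the values are illustrative assumptions, not defaults — size them to your own input and schema:

```
# illustrative values only; tune to input size and schema width
hoodie.insert.shuffle.parallelism=1500
hoodie.upsert.shuffle.parallelism=1500
hoodie.bulkinsert.shuffle.parallelism=1500
spark.yarn.executor.memoryOverhead=3072
spark.yarn.driver.memoryOverhead=1024
spark.storage.memoryFraction=0.4
```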