1
0
Commit Graph

14 Commits

Author SHA1 Message Date
Nishith Agarwal
48aa026dc4 Adding documentation for migration guide and COW vs MOR tradeoffs, moving some docs around for more clarity 2018-10-19 15:00:38 -07:00
vinothchandar
9ca6f91e97 Perform consistency checks during write finalize
- Check to ensure written files are listable on storage
 - Docs reflected to capture how this helps with s3 storage
 - Unit tests added, corrections to existing tests
 - Fix DeltaStreamer to manage archived commits in a separate folder
2018-09-28 08:04:41 +05:30
vinothchandar
8f1d362015 Fixing deps & serialization for RTView
- hoodie-hadoop-mr now needs objectsize bundled
 - Also updated docs with additional tuning tips
2018-06-10 19:16:44 -07:00
Vinoth Chandar
85dd265b7b Improving out of box experience for data source
- Fixes #246
 - Bump up default parallelism to 1500, to handle large upserts
 - Add docs on s3 confuration & tuning tips with tested spark knobs
 - Fix bug to not duplicate hoodie metadata fields when input dataframe is another hoodie dataset
 - Improve speed of ROTablePathFilter by removing directory check
 - Move to spark-avro 4.0 to handle issue with nested fields with same name
 - Keep AvroConversionUtils in sync with spark-avro 4.0
2018-06-10 19:16:44 -07:00
Nishith Agarwal
987f5d6b96 Making ExternalSpillableMap generic for any datatype
- Introduced concept of converters to be able to serde generic datatype for SpillableMap
	- Fixed/Added configs to Hoodie Configs
	- Changed HoodieMergeHandle to start using SpillableMap
2018-03-28 07:56:07 -07:00
Nishith Agarwal
5405a6287b Introducing HoodieLogFormat V2 with versioning support
- HoodieLogFormat V2 has support for LogFormat evolution through versioning
			- LogVersion is associated with a LogBlock not a LogFile
			- Based on a version for a LogBlock, approporiate code path is executed
		- Implemented LazyReading of Hoodie Log Blocks with Memory / IO tradeoff
		- Implemented Reverse pointer to be able to traverse the log in reverse
		- Introduce new MAGIC for backwards compatibility with logs without versions
2018-03-06 21:14:11 -08:00
Nishith Agarwal
d495484399 Write smaller sized multiple blocks to log file instead of a large one
- Use SizeEstimator to size number of records to write
	- Configurable block size
   	- Configurable log file size
2018-02-23 07:31:39 -08:00
Vinoth Chandar
274aaf49fe Incorporating code review feedback for DataSource 2017-10-02 20:44:53 -07:00
Vinoth Chandar
64e0573aca Adding hoodie-spark to support Spark Datasource for Hoodie
- Write with COW/MOR paths work fully
 - Read with RO view works on both storages*
 - Incremental view supported on COW
 - Refactored out HoodieReadClient methods, to just contain key based access
 - HoodieDataSourceHelpers class can be now used to construct inputs to datasource
 - Tests in hoodie-client using new helpers and mechanisms
 - Basic tests around save modes & insert/upserts (more to follow)
 - Bumped up scala to 2.11, since 2.10 is deprecated & complains with scalatest
 - Updated documentation to describe usage
 - New sample app written using the DataSource API
2017-10-02 20:44:53 -07:00
Vinoth Chandar
86209640f7 Adding range based pruning to bloom index
- keys compared lexicographically using String::compareTo
 - Range metadata additionally written into parquet file footers
 - Trim fat & few optimizations to speed up indexing
 - Add param to control whether input shall be cached, to speed up lookup
 - Add param to turn on/off range pruning
 - Auto compute of parallelism now simply factors in amount of comparisons done
 - More accurate parallelism computation when range pruning is on
 - tests added & hardened, docs updated
2017-08-04 13:22:13 -07:00
Zeeshan Qureshi
43a55b09fd Add GCS to supported filesystems 2017-05-18 10:30:34 -07:00
Vinoth Chandar
e0fc4ec38e Documentation update + helper method for WriteConfig builder 2017-04-03 18:28:01 -07:00
Yash Sharma
bca7e7dae4 improve documentations 2017-03-28 05:08:54 -07:00
Prasanna Rajaperumal
a725382549 Add Configurations to Documentation 2017-02-06 14:12:18 -08:00