lanyuanxiaoyao/hudi

Author	SHA1	Message	Date
Balaji Varadarajan	ea23c9b7a0	Minor bug fixes found during testing	2018-08-07 08:19:50 -07:00
Balaji Varadarajan	2e12c86d01	Ensure Compaction Operation compacts the data file as defined in the workload	2018-08-07 08:19:50 -07:00
Balaji Varadarajan	2f8ce93030	Async Compaction Main API changes	2018-08-07 08:19:50 -07:00
Balaji Varadarajan	9b78523d62	Ensure Cleaner and Archiver do not delete file-slices and workload marked for compaction	2018-08-07 08:19:50 -07:00
Balaji Varadarajan	0a0451a765	Ensure Compaction workload is stored in write-once meta-data files separate from timeline files. This avoids concurrency issues when compactor(s) and ingestor are running in parallel. In the Next PR -> Safety concern regarding Cleaner retaining all meta-data and file-slices for pending compactions will be addressed	2018-08-07 08:19:50 -07:00
Balaji Varadarajan	9d99942564	Track fileIds with pending compaction in FileSystemView to provide correct API semantics	2018-08-07 08:19:50 -07:00
Balaji Varadarajan	1b61f04e05	(1) Define CompactionWorkload in avro to allow storing them in instant files. (2) Split APIs in HoodieRealtimeCompactor to separate generating compaction workload from running compaction	2018-08-07 08:19:50 -07:00
Balaji Varadarajan	6d01ae8ca0	FileSystemView and Timeline level changes to support Async Compaction	2018-08-07 08:19:50 -07:00
Omkar Joshi	f62890ca1f	adding setters so that subclasses can set it	2018-07-18 12:53:11 -07:00
Nishith Agarwal	34ab54a9d3	Fixing bug introduced in rollback for MOR table type with inserts into log files	2018-07-17 17:20:34 -07:00
Nishith Agarwal	a6fe96fdfe	Changing Day based compaction strategy to be IO agnostic	2018-06-18 15:22:56 -07:00
Nishith Agarwal	3da063f83b	Adding ability for inserts to be written to log files	2018-06-11 14:08:59 -07:00
Vinoth Chandar	85dd265b7b	Improving out of box experience for data source - Fixes #246 - Bump up default parallelism to 1500, to handle large upserts - Add docs on s3 configuration & tuning tips with tested spark knobs - Fix bug to not duplicate hoodie metadata fields when input dataframe is another hoodie dataset - Improve speed of ROTablePathFilter by removing directory check - Move to spark-avro 4.0 to handle issue with nested fields with same name - Keep AvroConversionUtils in sync with spark-avro 4.0	2018-06-10 19:16:44 -07:00
Sunil Ramaiah	a97814462d	Added a filter function to filter the record keys in a parquet file	2018-05-17 19:01:11 -07:00
Nishith Agarwal	23d53763c4	enabling global index for MOR	2018-05-16 10:36:25 -07:00
Balaji Varadarajan	dfc0c61eb7	Support union mode in HoodieRealtimeRecordReader for pure insert workloads Also Replace BufferedIteratorPayload abstraction with function passing	2018-05-10 17:39:56 -07:00
Nishith Agarwal	93f345a032	Minor fixes for MergeOnRead MVP release readiness	2018-05-09 07:23:58 -07:00
Nishith Agarwal	04655e9e85	Adding metrics for MOR and COW	2018-04-26 09:32:45 -07:00
Sunil Ramaiah	4d1fba24c9	Fix for updating duplicate records in same/different files in same partition	2018-04-25 13:09:15 -07:00
Nishith Agarwal	c3c205fc02	Using BufferedFsInputStream to wrap FSInputStream for FSDataInputStream	2018-04-18 08:05:19 -07:00
Nishith Agarwal	720e42f52a	Parallelized read-write operations in Hoodie Merge phase	2018-04-12 11:46:42 -07:00
Balaji Varadarajan	788e4f2d2e	CodeStyle formatting to conform to basic Checkstyle rules. The code-style rules follow google style with some changes: 1. Increase line length from 100 to 120 2. Disable JavaDoc related checkstyles as this needs more manual work. Both source and test code are checked for code-style	2018-03-30 11:09:40 -07:00
Nishith Agarwal	987f5d6b96	Making ExternalSpillableMap generic for any datatype - Introduced concept of converters to be able to serde generic datatype for SpillableMap - Fixed/Added configs to Hoodie Configs - Changed HoodieMergeHandle to start using SpillableMap	2018-03-28 07:56:07 -07:00
Nishith Agarwal	1b756db221	Adding config for parquet compression ratio	2018-03-25 22:17:36 -07:00
Jian Xu	48643795b8	Checking storage level before persisting preppedRecords	2018-03-22 22:15:52 -07:00
Kaushik Devarajaiah	291a88ba94	DeduplicateRecords based on recordKey if global index is used	2018-03-22 09:15:44 -07:00
Jian Xu	d3df32fa03	Add back UseTempFolder changes in HoodieMergeHandle	2018-03-15 17:11:15 -07:00
Omkar Joshi	c5b4cb1b75	Spawning parallel writer thread to separate reading records from spark and writing records to parquet file	2018-03-15 16:58:14 -07:00
Jian Xu	ba7c258c61	Add more options in HoodieWriteConfig	2018-03-13 23:26:36 -07:00
Nishith Agarwal	0eaa21111a	Re-factoring Compaction as first level API in WriteClient similar to upsert/insert	2018-03-07 16:16:39 -08:00
Nishith Agarwal	5405a6287b	Introducing HoodieLogFormat V2 with versioning support - HoodieLogFormat V2 has support for LogFormat evolution through versioning - LogVersion is associated with a LogBlock not a LogFile - Based on a version for a LogBlock, appropriate code path is executed - Implemented LazyReading of Hoodie Log Blocks with Memory / IO tradeoff - Implemented Reverse pointer to be able to traverse the log in reverse - Introduce new MAGIC for backwards compatibility with logs without versions	2018-03-06 21:14:11 -08:00
Jian Xu	dfd1979c51	Handle inflight clean instants during Hoodie instants archiving	2018-03-05 15:01:58 -08:00
Jian Xu	5d5c306e64	Add new APIs in HoodieReadClient and HoodieWriteClient	2018-02-28 13:58:12 -08:00
Nishith Agarwal	6fec9655a8	Added support for Disk Spillable Compaction to prevent OOM issues	2018-02-26 16:00:35 -08:00
Nishith Agarwal	d495484399	Write smaller sized multiple blocks to log file instead of a large one - Use SizeEstimator to size number of records to write - Configurable block size - Configurable log file size	2018-02-23 07:31:39 -08:00
Vinoth Chandar	eb3d0c470f	Fix formatting in HoodieWriteClient	2018-02-14 10:03:20 -08:00
Nishith Agarwal	7076c2e9f0	refactor classes to accept Map passed by RealtimeCompactor to avoid multiple map creations in HoodieMergeHandle	2018-02-07 11:16:01 -08:00
Nishith Agarwal	30049383f5	Small File Size correction handling for MOR table type	2018-02-07 11:01:10 -08:00
Nishith Agarwal	2116815261	Fixing Rollback for compaction/commit operation, added check for null commit - Fallback to old way of rollback by listing all partitions - Added null check to ensure only partitions which are to be rolledback are considered - Added location (committime) to workload stat - Added checks in CompactedScanner to guard against task retries - Introduce new logic for rollback (bounded by instant_time and target_instant time) - Reversed logfiles order	2018-02-06 16:55:23 -08:00
Nishith Agarwal	be0b1f3e57	Adding global indexing to HbaseIndex implementation - Adding tests for HbaseIndex - Enabling global index functionality	2018-02-05 15:21:22 -08:00
Jian Xu	15e669c60c	Incorporating code review feedback for finalizeWrite for COW #4	2018-02-02 11:38:25 -08:00
Jian Xu	3736243fb3	Rebases with latest upstream	2018-02-02 11:38:25 -08:00
Jian Xu	363e35bb0f	Add finalizeWrite support for HoodieMergeHandle	2018-02-02 11:38:25 -08:00
Jian Xu	acae6586f3	Incorporating code review feedback for finalizeWrite for COW #3	2018-02-02 11:38:25 -08:00
Jian Xu	37f2cdd7e4	Incorporating code review feedback for finalizeWrite for COW #2	2018-02-02 11:38:25 -08:00
Jian Xu	2fe4fef625	Incorporating code review feedback for finalizeWrite for COW	2018-02-02 11:38:25 -08:00
Jian Xu	c874248f23	Add FinalizeWrite in HoodieCreateHandle for COW tables	2018-02-02 11:38:25 -08:00
Nishith Agarwal	e10100fe32	Reducing list status calls from listing logfile versions, some associated refactoring	2018-01-29 08:26:39 -08:00
Nishith Agarwal	937ae322ba	Reducing memory footprint required in HoodieAvroDataBlock and HoodieAppendHandle	2018-01-29 08:22:29 -08:00
vinothchandar	cf7f7aabb9	Nicer handling of timeline archival for Cloud storage - When append() is not supported, rollover to new file always (instead of failing) - Provide way to configure archive log folder (avoids small files inside .hoodie) - Datasets written via Spark datasource archive to .hoodie/archived - HoodieClientExample will now retain only 2,3 commits to exercise archival path during dev cycles - Few tweaks to code structure around CommitArchiveLog	2018-01-17 23:34:21 -08:00

170 Commits