lanyuanxiaoyao/hudi - hudi

Author	SHA1	Message	Date
Balaji Varadarajan	2f8ce93030	Async Compaction Main API changes	2018-08-07 08:19:50 -07:00
Balaji Varadarajan	9b78523d62	Ensure Cleaner and Archiver do not delete file-slices and workload marked for compaction	2018-08-07 08:19:50 -07:00
Balaji Varadarajan	0a0451a765	Ensure Compaction workload is stored in write-once meta-data files separate from timeline files. This avoids concurrency issues when compactor(s) and ingestor are running in parallel. In the Next PR -> Safety concern regarding Cleaner retaining all meta-data and file-slices for pending compactions will be addressed	2018-08-07 08:19:50 -07:00
Balaji Varadarajan	1b61f04e05	(1) Define CompactionWorkload in avro to allow storing them in instant files. (2) Split APIs in HoodieRealtimeCompactor to separate generating compaction workload from running compaction	2018-08-07 08:19:50 -07:00
Balaji Varadarajan	6d01ae8ca0	FileSystemView and Timeline level changes to support Async Compaction	2018-08-07 08:19:50 -07:00
Nishith Agarwal	44caf0d40c	Fixing missing hoodie record location in HoodieRecord when record is read from disk after being spilled	2018-07-18 12:53:35 -07:00
Nishith Agarwal	34ab54a9d3	Fixing bug introduced in rollback for MOR table type with inserts into log files	2018-07-17 17:20:34 -07:00
Nishith Agarwal	a6fe96fdfe	Changing Day based compaction strategy to be IO agnostic	2018-06-18 15:22:56 -07:00
Nishith Agarwal	3da063f83b	Adding ability for inserts to be written to log files	2018-06-11 14:08:59 -07:00
Balaji Varadarajan	dfc0c61eb7	Support union mode in HoodieRealtimeRecordReader for pure insert workloads Also Replace BufferedIteratorPayload abstraction with function passing	2018-05-10 17:39:56 -07:00
Nishith Agarwal	04655e9e85	Adding metrics for MOR and COW	2018-04-26 09:32:45 -07:00
Sunil Ramaiah	b9b9b24993	Added more comments and removed the extra new lines	2018-04-25 13:09:15 -07:00
Sunil Ramaiah	4d1fba24c9	Fix for updating duplicate records in same/different files in same partition	2018-04-25 13:09:15 -07:00
Nishith Agarwal	c3c205fc02	Using BufferedFsInputStream to wrap FSInputStream for FSDataInputStream	2018-04-18 08:05:19 -07:00
Nishith Agarwal	720e42f52a	Parallelized read-write operations in Hoodie Merge phase	2018-04-12 11:46:42 -07:00
Balaji Varadarajan	6c226ca21a	Issue-329 : Refactoring TestHoodieClientOnCopyOnWriteStorage and adding test-cases	2018-04-09 16:34:58 -07:00
Balaji Varadarajan	788e4f2d2e	CodeStyle formatting to conform to basic Checkstyle rules. The code-style rules follow google style with some changes: 1. Increase line length from 100 to 120 2. Disable JavaDoc related checkstyles as this needs more manual work. Both source and test code are checked for code-style	2018-03-30 11:09:40 -07:00
Nishith Agarwal	1b756db221	Adding config for parquet compression ratio	2018-03-25 22:17:36 -07:00
Kaushik Devarajaiah	291a88ba94	DeduplicateRecords based on recordKey if global index is used	2018-03-22 09:15:44 -07:00
Omkar Joshi	c5b4cb1b75	Spawning parallel writer thread to separate reading records from spark and writing records to parquet file	2018-03-15 16:58:14 -07:00
Jian Xu	7f079632a6	Use hadoopConf in HoodieTableMetaClient and related tests	2018-03-12 11:47:55 -07:00
Nishith Agarwal	0eaa21111a	Re-factoring Compaction as first level API in WriteClient similar to upsert/insert	2018-03-07 16:16:39 -08:00
Nishith Agarwal	5405a6287b	Introducing HoodieLogFormat V2 with versioning support - HoodieLogFormat V2 has support for LogFormat evolution through versioning - LogVersion is associated with a LogBlock not a LogFile - Based on a version for a LogBlock, appropriate code path is executed - Implemented LazyReading of Hoodie Log Blocks with Memory / IO tradeoff - Implemented Reverse pointer to be able to traverse the log in reverse - Introduce new MAGIC for backwards compatibility with logs without versions	2018-03-06 21:14:11 -08:00
Jian Xu	dfd1979c51	Handle inflight clean instants during Hoodie instants archiving	2018-03-05 15:01:58 -08:00
Jian Xu	5d5c306e64	Add new APIs in HoodieReadClient and HoodieWriteClient	2018-02-28 13:58:12 -08:00
Nishith Agarwal	30049383f5	Small File Size correction handling for MOR table type	2018-02-07 11:01:10 -08:00
Nishith Agarwal	2116815261	Fixing Rollback for compaction/commit operation, added check for null commit - Fallback to old way of rollback by listing all partitions - Added null check to ensure only partitions which are to be rolledback are considered - Added location (committime) to workload stat - Added checks in CompactedScanner to guard against task retries - Introduce new logic for rollback (bounded by instant_time and target_instant time) - Reversed logfiles order	2018-02-06 16:55:23 -08:00
Nishith Agarwal	be0b1f3e57	Adding global indexing to HbaseIndex implementation - Adding tests for HbaseIndex - Enabling global index functionality	2018-02-05 15:21:22 -08:00
Jian Xu	15e669c60c	Incorporating code review feedback for finalizeWrite for COW #4	2018-02-02 11:38:25 -08:00
Jian Xu	3736243fb3	Rebases with latest upstream	2018-02-02 11:38:25 -08:00
Jian Xu	363e35bb0f	Add finalizeWrite support for HoodieMergeHandle	2018-02-02 11:38:25 -08:00
Jian Xu	2fe4fef625	Incorporating code review feedback for finalizeWrite for COW	2018-02-02 11:38:25 -08:00
Jian Xu	c874248f23	Add FinalizeWrite in HoodieCreateHandle for COW tables	2018-02-02 11:38:25 -08:00
vinothchandar	21ce846f18	Remove stateful fs member from HoodieTestUtils & FSUtils	2018-01-17 23:34:21 -08:00
vinothchandar	cf7f7aabb9	Nicer handling of timeline archival for Cloud storage - When append() is not supported, rollover to new file always (instead of failing) - Provide way to configure archive log folder (avoids small files inside .hoodie) - Datasets written via Spark datasource archive to .hoodie/archived - HoodieClientExample will now retain only 2,3 commits to exercise archival path during dev cycles - Few tweaks to code structure around CommitArchiveLog	2018-01-17 23:34:21 -08:00
Vinoth Chandar	0cd186c899	Multi FS Support - Reviving PR 191, to make FileSystem creation off actual path - Streamline all filesystem access to HoodieTableMetaClient - Hadoop Conf from Spark Context serialized & passed to executor code too - Pick up env vars prefixed with HOODIE_ENV_ into Configuration object - Cleanup usage of FSUtils.getFS, piggybacking off HoodieTableMetaClient.getFS - Adding s3a to supported schemes & support escaping "." in env vars - Tests use HoodieTestUtils.getDefaultHadoopConf	2018-01-17 23:34:21 -08:00
Nishith Agarwal	44839b88c6	Removing compaction action type and associated compaction timeline operations, replace with commit action type	2018-01-09 09:56:15 -08:00
Nishith Agarwal	4aed5c7338	Adding a new Partition/Time based compaction strategy	2017-12-05 16:30:38 -08:00
Nishith Agarwal	9b610f82c7	Separating out compaction() API	2017-11-14 22:56:29 -08:00
Vinoth Chandar	e45679f5e2	Reformatting code per Google Code Style all over	2017-11-12 23:19:02 -08:00
Nishith Agarwal	c7d63a7622	1) Separated rollback as a table operation 2) Implement rollback for MOR	2017-10-12 07:36:46 -07:00
Vinoth Chandar	274aaf49fe	Incorporating code review feedback for DataSource	2017-10-02 20:44:53 -07:00
Vinoth Chandar	64e0573aca	Adding hoodie-spark to support Spark Datasource for Hoodie - Write with COW/MOR paths work fully - Read with RO view works on both storages* - Incremental view supported on COW - Refactored out HoodieReadClient methods, to just contain key based access - HoodieDataSourceHelpers class can be now used to construct inputs to datasource - Tests in hoodie-client using new helpers and mechanisms - Basic tests around save modes & insert/upserts (more to follow) - Bumped up scala to 2.11, since 2.10 is deprecated & complains with scalatest - Updated documentation to describe usage - New sample app written using the DataSource API	2017-10-02 20:44:53 -07:00
Kaushik Devarajaiah	c98ee057fc	capture record metadata before deflating for record counting	2017-10-02 10:46:06 -07:00
Omkar Joshi	ec40d04d51	Fixing UpsertPartitioner to ensure that input records are deterministically assigned to output partitions	2017-09-07 17:03:56 -07:00
Nishith Agarwal	e2d13c6305	Fix build failing issues	2017-09-07 10:54:36 -07:00
Vinoth Chandar	45dd8980c3	Temporary fix for build break after rebase	2017-08-04 17:36:39 -07:00
Vinoth Chandar	86209640f7	Adding range based pruning to bloom index - keys compared lexicographically using String::compareTo - Range metadata additionally written into parquet file footers - Trim fat & few optimizations to speed up indexing - Add param to control whether input shall be cached, to speed up lookup - Add param to turn on/off range pruning - Auto compute of parallelism now simply factors in amount of comparisons done - More accurate parallelism computation when range pruning is on - tests added & hardened, docs updated	2017-08-04 13:22:13 -07:00
Nishith Agarwal	0b26b60a5c	fix for cleaning log files(mor)	2017-08-02 11:54:42 -07:00
Nishith Agarwal	19c22b231e	1. Use HoodieLogFormat to archive commits and other actions 2. Introduced avro schema for commits and compactions and an avro wrapper schema	2017-07-26 14:27:44 -07:00


81 Commits