lanyuanxiaoyao/hudi - hudi - Gitea

Author	SHA1	Message	Date
Jian Xu	dfd1979c51	Handle inflight clean instants during Hoodie instants archiving	2018-03-05 15:01:58 -08:00
Jian Xu	5d5c306e64	Add new APIs in HoodieReadClient and HoodieWriteClient	2018-02-28 13:58:12 -08:00
Nishith Agarwal	30049383f5	Small File Size correction handling for MOR table type	2018-02-07 11:01:10 -08:00
Nishith Agarwal	2116815261	Fixing Rollback for compaction/commit operation, added check for null commit - Fallback to old way of rollback by listing all partitions - Added null check to ensure only partitions which are to be rolledback are considered - Added location (committime) to workload stat - Added checks in CompactedScanner to guard against task retries - Introduce new logic for rollback (bounded by instant_time and target_instant time) - Reversed logfiles order	2018-02-06 16:55:23 -08:00
Nishith Agarwal	be0b1f3e57	Adding global indexing to HbaseIndex implementation - Adding tests for HbaseIndex - Enabling global index functionality	2018-02-05 15:21:22 -08:00
Jian Xu	15e669c60c	Incorporating code review feedback for finalizeWrite for COW #4	2018-02-02 11:38:25 -08:00
Jian Xu	3736243fb3	Rebases with latest upstream	2018-02-02 11:38:25 -08:00
Jian Xu	363e35bb0f	Add finalizeWrite support for HoodieMergeHandle	2018-02-02 11:38:25 -08:00
Jian Xu	2fe4fef625	Incorporating code review feedback for finalizeWrite for COW	2018-02-02 11:38:25 -08:00
Jian Xu	c874248f23	Add FinalizeWrite in HoodieCreateHandle for COW tables	2018-02-02 11:38:25 -08:00
vinothchandar	21ce846f18	Remove stateful fs member from HoodieTestUtils & FSUtils	2018-01-17 23:34:21 -08:00
vinothchandar	cf7f7aabb9	Nicer handling of timeline archival for Cloud storage - When append() is not supported, rollover to new file always (instead of failing) - Provide way to configure archive log folder (avoids small files inside .hoodie) - Datasets written via Spark datasource archive to .hoodie/archived - HoodieClientExample will now retain only 2,3 commits to exercise archival path during dev cycles - Few tweaks to code structure around CommitArchiveLog	2018-01-17 23:34:21 -08:00
Vinoth Chandar	0cd186c899	Multi FS Support - Reviving PR 191, to make FileSystem creation off actual path - Streamline all filesystem access to HoodieTableMetaClient - Hadoop Conf from Spark Context serialized & passed to executor code too - Pick up env vars prefixed with HOODIE_ENV_ into Configuration object - Cleanup usage of FSUtils.getFS, piggybacking off HoodieTableMetaClient.getFS - Adding s3a to supported schemes & support escaping "." in env vars - Tests use HoodieTestUtils.getDefaultHadoopConf	2018-01-17 23:34:21 -08:00
Nishith Agarwal	44839b88c6	Removing compaction action type and associated compaction timeline operations, replace with commit action type	2018-01-09 09:56:15 -08:00
Nishith Agarwal	4aed5c7338	Adding a new Partition/Time based compaction strategy	2017-12-05 16:30:38 -08:00
Nishith Agarwal	9b610f82c7	Separating out compaction() API	2017-11-14 22:56:29 -08:00
Vinoth Chandar	e45679f5e2	Reformatting code per Google Code Style all over	2017-11-12 23:19:02 -08:00
Nishith Agarwal	c7d63a7622	1) Separated rollback as a table operation 2) Implement rollback for MOR	2017-10-12 07:36:46 -07:00
Vinoth Chandar	274aaf49fe	Incorporating code review feedback for DataSource	2017-10-02 20:44:53 -07:00
Vinoth Chandar	64e0573aca	Adding hoodie-spark to support Spark Datasource for Hoodie - Write with COW/MOR paths work fully - Read with RO view works on both storages* - Incremental view supported on COW - Refactored out HoodieReadClient methods, to just contain key based access - HoodieDataSourceHelpers class can be now used to construct inputs to datasource - Tests in hoodie-client using new helpers and mechanisms - Basic tests around save modes & insert/upserts (more to follow) - Bumped up scala to 2.11, since 2.10 is deprecated & complains with scalatest - Updated documentation to describe usage - New sample app written using the DataSource API	2017-10-02 20:44:53 -07:00
Kaushik Devarajaiah	c98ee057fc	capture record metadata before deflating for record counting	2017-10-02 10:46:06 -07:00
Omkar Joshi	ec40d04d51	Fixing UpsertPartitioner to ensure that input records are deterministically assigned to output partitions	2017-09-07 17:03:56 -07:00
Nishith Agarwal	e2d13c6305	Fix build failing issues	2017-09-07 10:54:36 -07:00
Vinoth Chandar	45dd8980c3	Temporary fix for build break after rebase	2017-08-04 17:36:39 -07:00
Vinoth Chandar	86209640f7	Adding range based pruning to bloom index - keys compared lexicographically using String::compareTo - Range metadata additionally written into parquet file footers - Trim fat & few optimizations to speed up indexing - Add param to control whether input shall be cached, to speed up lookup - Add param to turn on/off range pruning - Auto compute of parallelism now simply factors in amount of comparisons done - More accurate parallelism computation when range pruning is on - tests added & hardened, docs updated	2017-08-04 13:22:13 -07:00
Nishith Agarwal	0b26b60a5c	fix for cleaning log files(mor)	2017-08-02 11:54:42 -07:00
Nishith Agarwal	19c22b231e	1. Use HoodieLogFormat to archive commits and other actions 2. Introduced avro schema for commits and compactions and an avro wrapper schema	2017-07-26 14:27:44 -07:00
Nishith Agarwal	616c9a68c3	Enabled deletes in merge_on_read	2017-07-26 13:37:27 -07:00
Prasanna Rajaperumal	5cc071f74e	Savepoint should not create a hole in the commit timeline	2017-06-27 16:36:09 -07:00
Vinoth Chandar	754ab88a2d	Introduce ReadOptimizedView & RealtimeView out of TableFileSystemView - Usage now marks code as clearly using either RO or RT views, for future evolution - Tests on all of FileGroups and FileSlices	2017-06-22 17:16:13 -07:00
Vinoth Chandar	c00f1a9ed9	Refactoring HoodieTableFileSystemView using FileGroups/FileSlices - Merged all filter* and get* methods - new constructor takes filestatus[] - All existing tests pass - FileGroup is all files that belong to a fileID within a partition - FileSlice is a generation of data and log files, starting at a base commit	2017-06-22 17:16:13 -07:00
Vinoth Chandar	23e7badd8a	Rename IO Handles & introduce stub for BucketedIndex - UpdateHandle -> MergeHandle, InsertHandle -> CreateHandle - Also bunch of code cleanup in different places	2017-06-22 17:16:13 -07:00
Kaushik Devarajaiah	3aa8083913	Correct clean bug that causes clean failure when partitionPaths are empty	2017-06-20 15:45:32 -07:00
gekath	52c507f83e	Writes relative paths to .commit files Handle case where path is read in as null from commit file Merged with updated release	2017-06-16 12:51:19 -07:00
gekath	db7311f85e	Writes relative paths to .commit files instead of absolute paths Clean up code Removed commented out code Fixed merge conflict with master	2017-06-16 12:51:19 -07:00
Kaushik Devarajaiah	521555c576	Parallelize file version deletes during clean and related tests	2017-06-15 18:20:42 -07:00
Vinoth Chandar	da17c5c607	Introduce getCommitsAndCompactionsTimeline() explicitly & adjust usage across code base	2017-05-01 21:48:27 -07:00
Vinoth Chandar	bae0528013	Cleanup calls to HoodieTimeline.compareTimeStamps	2017-05-01 21:48:27 -07:00
Prasanna Rajaperumal	7bca428a0a	Test to check if properties set are properly propagated	2017-04-28 12:47:14 -07:00
Prasanna Rajaperumal	3f97bdcccf	Test to check if properties set are properly propagated	2017-04-28 12:40:58 -07:00
Prasanna Rajaperumal	91b088f29f	Implement Compaction policy abstraction. Implement LogSizeBased Bounded IO Compaction as the default strategy	2017-04-20 16:59:06 -07:00
Vinoth Chandar	dce35ff0d7	Adding a config to control whether date partitioning can be assumed - false by default - CAUTION: If you have existing tables without partition metadata, you need to set this to "true"	2017-04-03 18:28:01 -07:00
Vinoth Chandar	f9fd16069d	FSUtils.getAllPartitionsPaths() works based on .hoodie_partition_metadata - clean/rollback/write paths covered by existing tests - Snapshot copier fixed to copy metadata file also, and test fixed - Existing tables need to be repaired by addition of metadata, before this can be rolled out	2017-04-03 18:28:01 -07:00
Prasanna Rajaperumal	aee136777b	Fixes needed to run merge-on-read testing on production scale data	2017-04-02 22:25:47 -07:00
ovj	21898907c1	tool for importing hive tables (in parquet format) into hoodie dataset (#89 ) * tool for importing hive tables (in parquet format) into hoodie dataset * review fixes * review fixes * review fixes	2017-03-21 14:42:13 -07:00
Prasanna Rajaperumal	d83b671ada	Implement Savepoints and required metadata timeline - Part 2	2017-03-13 23:09:29 -07:00
prazanna	6f36e1eaaf	Implement Savepoints and required metadata timeline (#86 ) - Introduce avro to save clean metadata with details about the last commit that was retained - Save rollback metadata in the meta timeline - Create savepoint metadata and add API to createSavepoint, deleteSavepoint and rollbackToSavepoint - Savepointed commit should not be rolledback or cleaned or archived - introduce cli commands to show, create and rollback to savepoints - Write unit tests to test savepoints and rollbackToSavepoints	2017-03-13 15:12:03 -07:00
siddharthagunda	348a48aa80	Add delete support to Hoodie (#85 )	2017-03-04 01:33:49 -08:00
Prasanna Rajaperumal	1132f3533d	Merge and pull master commits	2017-02-21 17:53:28 -08:00
prazanna	eb46e7c72b	Implement Merge on Read Storage (#76 ) 1. Create HoodieTable abstraction for commits and fileSystemView 2. HoodieMergeOnReadTable created 3. View is now always obtained from the table and the correct view based on the table type is returned	2017-02-21 16:24:38 -08:00

58 Commits