lanyuanxiaoyao/hudi - hudi - Gitea: Git with a cup of tea

Author	SHA1	Message	Date
Jian Xu	7f079632a6	Use hadoopConf in HoodieTableMetaClient and related tests	2018-03-12 11:47:55 -07:00
Nishith Agarwal	5405a6287b	Introducing HoodieLogFormat V2 with versioning support - HoodieLogFormat V2 has support for LogFormat evolution through versioning - LogVersion is associated with a LogBlock not a LogFile - Based on a version for a LogBlock, approporiate code path is executed - Implemented LazyReading of Hoodie Log Blocks with Memory / IO tradeoff - Implemented Reverse pointer to be able to traverse the log in reverse - Introduce new MAGIC for backwards compatibility with logs without versions	2018-03-06 21:14:11 -08:00
Nishith Agarwal	6fec9655a8	Added support for Disk Spillable Compaction to prevent OOM issues	2018-02-26 16:00:35 -08:00
Nishith Agarwal	2116815261	Fixing Rollback for compaction/commit operation, added check for null commit - Fallback to old way of rollback by listing all partitions - Added null check to ensure only partitions which are to be rolledback are considered - Added location (committime) to workload stat - Added checks in CompactedScanner to guard against task retries - Introduce new logic for rollback (bounded by instant_time and target_instant time) - Reversed logfiles order	2018-02-06 16:55:23 -08:00
vinothchandar	21ce846f18	Remove stateful fs member from HoodieTestUtils & FSUtils	2018-01-17 23:34:21 -08:00
vinothchandar	cf7f7aabb9	Nicer handling of timeline archival for Cloud storage - When append() is not supported, rollover to new file always (instead of failing) - Provide way to configure archive log folder (avoids small files inside .hoodie) - Datasets written via Spark datasource archive to .hoodie/archived - HoodieClientExample will now retain only 2,3 commits to exercise archival path during dev cycles - Few tweaks to code structure around CommitArchiveLog	2018-01-17 23:34:21 -08:00
Vinoth Chandar	0cd186c899	Multi FS Support - Reviving PR 191, to make FileSystem creation off actual path - Streamline all filesystem access to HoodieTableMetaClient - Hadoop Conf from Spark Context serialized & passed to executor code too - Pick up env vars prefixed with HOODIE_ENV_ into Configuration object - Cleanup usage of FSUtils.getFS, piggybacking off HoodieTableMetaClient.getFS - Adding s3a to supported schemes & support escaping "." in env vars - Tests use HoodieTestUtils.getDefaultHadoopConf	2018-01-17 23:34:21 -08:00
Nishith Agarwal	44839b88c6	Removing compaction action type and associated compaction timeline operations, replace with commit action type	2018-01-09 09:56:15 -08:00
Vinoth Chandar	e45679f5e2	Reformatting code per Google Code Style all over	2017-11-12 23:19:02 -08:00
Nishith Agarwal	abe964bebd	Implementing custom payload/merge hooks abstractions for application specific merge logic	2017-11-07 18:55:55 -08:00
Nishith Agarwal	c7d63a7622	1) Separated rollback as a table operation 2) Implement rollback for MOR	2017-10-12 07:36:46 -07:00
Vinoth Chandar	274aaf49fe	Incorporating code review feedback for DataSource	2017-10-02 20:44:53 -07:00
Vinoth Chandar	64e0573aca	Adding hoodie-spark to support Spark Datasource for Hoodie - Write with COW/MOR paths work fully - Read with RO view works on both storages* - Incremental view supported on COW - Refactored out HoodieReadClient methods, to just contain key based access - HoodieDataSourceHelpers class can be now used to construct inputs to datasource - Tests in hoodie-client using new helpers and mechanisms - Basic tests around save modes & insert/upserts (more to follow) - Bumped up scala to 2.11, since 2.10 is deprecated & complains with scalatest - Updated documentation to describe usage - New sample app written using the DataSource API	2017-10-02 20:44:53 -07:00
Jian Xu	b1cf097b0c	Add nested fields support for MOR tables	2017-08-16 10:35:26 -07:00
Vinoth Chandar	754ab88a2d	Introduce ReadOptimizedView & RealtimeView out of TableFileSystemView - Usage now marks code as clearly using either RO or RT views, for future evolution - Tests on all of FileGroups and FileSlices	2017-06-22 17:16:13 -07:00
Vinoth Chandar	c00f1a9ed9	Refactoring HoodieTableFileSystemView using FileGroups/FileSlices - Merged all filter* and get* methods - new constructor takes filestatus[] - All existing tests pass - FileGroup is all files that belong to a fileID within a partition - FileSlice is a generation of data and log files, starting at a base commit	2017-06-22 17:16:13 -07:00
Prasanna Rajaperumal	4b26be9f61	Fixes to RealtimeInputFormat and RealtimeRecordReader and update documentation for HiveSyncTool	2017-06-15 18:21:07 -07:00
Nishith Agarwal	ba050973e3	updated HoodieRealtimeRecordReader to use HoodieCompactedLogRecordScanner, added test for recordreader	2017-06-02 11:33:59 -07:00
Vinoth Chandar	da17c5c607	Introduce getCommitsAndCompactionsTimeline() explicitly & adjust usage across code base	2017-05-01 21:48:27 -07:00
Vinoth Chandar	bae0528013	Cleanup calls to HoodieTimeline.compareTimeStamps	2017-05-01 21:48:27 -07:00
Vinoth Chandar	7b1446548f	Initial impl of HoodieRealtimeInputFormat - Works end-end for flat schemas - Schema evolution & hardening remains - HoodieClientExample can now write mor tables as well	2017-05-01 21:48:27 -07:00
Prasanna Rajaperumal	8974e11161	Make sure properties set in HoodieWriteConfig is propogated down to individual configs. Fix a race condition which lets InputFormat to think file size is 0 when it is actually not	2017-04-27 10:52:25 -07:00
Vinoth Chandar	3129770fd0	Create .hoodie_partition_metadata in each partition, linking back to basepath - Concurreny handled via taskID, failure recovery handled via renames - Falls back to search 3 levels up - Cli tool has command to add this to existing tables	2017-04-03 18:28:01 -07:00
Wei Yan	c4fa585b27	Switch some info log to debug (#83 ) * Switch some info log to debug * fix a typo * remote HoodieTableMetadata file	2017-02-23 20:12:36 -08:00
Prasanna Rajaperumal	fe5c5e8021	Test Failure in Travis-ci	2017-02-21 20:25:01 -08:00
Prasanna Rajaperumal	1132f3533d	Merge and pull master commits	2017-02-21 17:53:28 -08:00
prazanna	eb46e7c72b	Implement Merge on Read Storage (#76 ) 1. Create HoodieTable abstraction for commits and fileSystemView 2. HoodieMergeOnReadTable created 3. View is now always obtained from the table and the correct view based on the table type is returned	2017-02-21 16:24:38 -08:00
Prasanna Rajaperumal	ccd8cb2407	Take 2: Refactor hoodie-common and create right abstractions for Hoodie Storage V2.0 - Refactored timelines to be a single timeline for all active events and one for archived events. CommitTimeline and other timelines can be inferred by applying a filter on the activeTimelime - Introduced HoodieInstant to abstract different types of action, commit time and if isInFlight - Implemented other review comments	2017-02-21 16:23:53 -08:00
Prasanna Rajaperumal	8ee777a9bb	Refactor hoodie-common and create right abstractions for Hoodie Storage V2.0 The following is the gist of changes done - All low-level operation of creating a commit code was in HoodieClient which made it hard to share code if there was a compaction commit. - HoodieTableMetadata contained a mix of metadata and filtering files. (Also few operations required FileSystem to be passed in because those were called from TaskExecutors and others had FileSystem as a global variable). Since merge-on-read requires a lot of that code, but will have to change slightly on how it operates on the metadata and how it filters the files. The two set of operation are split into HoodieTableMetaClient and TableFileSystemView. - Everything (active commits, archived commits, cleaner log, save point log and in future delta and compaction commits) in HoodieTableMetaClient is a HoodieTimeline. Timeline is a series of instants, which has an in-built concept of inflight and completed commit markers. - A timeline can be queries for ranges, contains and also use to create new datapoint (create a new commit etc). Commit (and all the above metadata) creation/deletion is streamlined in a timeline - Multiple timelines can be merged into a single timeline, giving us an audit timeline to whatever happened in a hoodie dataset. This also helps with #55. - Move to java 8 and introduce java 8 succinct syntax in refactored code	2017-02-21 16:23:53 -08:00
Prasanna Rajaperumal	4a47d26818	Fixing a javadoc lint issue	2017-02-20 15:57:58 -08:00
vinoth chandar	54409b07ea	Implement HoodieROTablePathFilter for use with Spark 2.0 (#66 ) - Unit tested - Tested with few queries on Spark 2.x at Uber	2017-01-26 11:13:33 -08:00
Vinoth Chandar	dd8638c2cc	Annotate HoodieInputFormat with UseFileSplitsFromInputFormat - Tested with corresponding Presto changes (pending PR)	2017-01-04 11:19:25 -08:00
Prasanna Rajaperumal	5d44ae3dbd	Make hoodie run on travis-ci Test logs > 4MB, which is a limit for travis-ci. Reducing the logs by setting appropriate log levels for tests Add sudo: required on travis.yml to get more memory for running the tests. (https://github.com/travis-ci/travis-ci/issues/5926) Fixed requirement that fsclient.lastDataFileForDataset always returns files in order	2016-12-20 19:26:48 -08:00
Prasanna Rajaperumal	388457b6b2	Add hoodie-hive module to support hive registration of hoodie datasets	2016-12-19 23:05:39 -08:00
Prasanna Rajaperumal	61200b1207	Adding hoodie-hadoop-mr module to add HoodieInputFormat	2016-12-16 19:29:53 -08:00

35 Commits