20 Commits

Author SHA1 Message Date
Jian Xu
7f079632a6 Use hadoopConf in HoodieTableMetaClient and related tests 2018-03-12 11:47:55 -07:00
Nishith Agarwal
5405a6287b Introducing HoodieLogFormat V2 with versioning support
- HoodieLogFormat V2 has support for LogFormat evolution through versioning
  - LogVersion is associated with a LogBlock, not a LogFile
  - Based on the version of a LogBlock, the appropriate code path is executed
- Implemented LazyReading of Hoodie Log Blocks with Memory / IO tradeoff
- Implemented Reverse pointer to be able to traverse the log in reverse
- Introduce new MAGIC for backwards compatibility with logs without versions
2018-03-06 21:14:11 -08:00
vinothchandar
21ce846f18 Remove stateful fs member from HoodieTestUtils & FSUtils 2018-01-17 23:34:21 -08:00
vinothchandar
cf7f7aabb9 Nicer handling of timeline archival for Cloud storage
- When append() is not supported, rollover to new file always (instead of failing)
- Provide way to configure archive log folder (avoids small files inside .hoodie)
- Datasets written via Spark datasource archive to .hoodie/archived
- HoodieClientExample will now retain only 2,3 commits to exercise archival path during dev cycles
- Few tweaks to code structure around CommitArchiveLog
2018-01-17 23:34:21 -08:00
Vinoth Chandar
0cd186c899 Multi FS Support
- Reviving PR 191, to make FileSystem creation off actual path
- Streamline all filesystem access to HoodieTableMetaClient
- Hadoop Conf from Spark Context serialized & passed to executor code too
- Pick up env vars prefixed with HOODIE_ENV_ into Configuration object
- Cleanup usage of FSUtils.getFS, piggybacking off HoodieTableMetaClient.getFS
- Adding s3a to supported schemes & support escaping "." in env vars
- Tests use HoodieTestUtils.getDefaultHadoopConf
2018-01-17 23:34:21 -08:00
Vinoth Chandar
e45679f5e2 Reformatting code per Google Code Style all over 2017-11-12 23:19:02 -08:00
Nishith Agarwal
abe964bebd Implementing custom payload/merge hooks abstractions for application specific merge logic 2017-11-07 18:55:55 -08:00
Nishith Agarwal
c7d63a7622 1) Separated rollback as a table operation 2) Implement rollback for MOR 2017-10-12 07:36:46 -07:00
Vinoth Chandar
274aaf49fe Incorporating code review feedback for DataSource 2017-10-02 20:44:53 -07:00
Jian Xu
b1cf097b0c Add nested fields support for MOR tables 2017-08-16 10:35:26 -07:00
Vinoth Chandar
c00f1a9ed9 Refactoring HoodieTableFileSystemView using FileGroups/FileSlices
- Merged all filter* and get* methods
- New constructor takes filestatus[]
- All existing tests pass
- FileGroup is all files that belong to a fileID within a partition
- FileSlice is a generation of data and log files, starting at a base commit
2017-06-22 17:16:13 -07:00
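The FileGroup and FileSlice definitions in the commit above can be illustrated with a minimal sketch. This is not Hoodie's Java implementation: the tuple layout `(partition, file_id, base_commit, kind, name)`, the function name `build_file_groups`, and the sample file names are all assumptions made for illustration. It also assumes every log file records the base commit it belongs to.

```python
from collections import defaultdict


def build_file_groups(files):
    """Group files into file groups, then split each group into file slices.

    files: iterable of (partition, file_id, base_commit, kind, name) tuples,
    where kind is "data" or "log" (hypothetical layout, not Hoodie's).
    Returns {(partition, file_id): {base_commit: {"data": name, "logs": [names]}}}.
    """
    groups = defaultdict(dict)
    for partition, file_id, base_commit, kind, name in files:
        # A file group is keyed by the file ID within a partition.
        slices = groups[(partition, file_id)]
        # A file slice is one generation of files starting at a base commit.
        file_slice = slices.setdefault(base_commit, {"data": None, "logs": []})
        if kind == "data":
            file_slice["data"] = name
        else:
            file_slice["logs"].append(name)
    return dict(groups)


# Hypothetical file listing for one partition.
files = [
    ("2017/06/22", "f1", "001", "data", "f1_001.parquet"),
    ("2017/06/22", "f1", "001", "log", "f1_001.log.1"),
    ("2017/06/22", "f1", "002", "data", "f1_002.parquet"),
    ("2017/06/22", "f2", "001", "data", "f2_001.parquet"),
]
groups = build_file_groups(files)
```

Here `f1` forms one file group with two slices (base commits `001` and `002`), and `f2` forms a second group with a single slice.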
Nishith Agarwal
ba050973e3 updated HoodieRealtimeRecordReader to use HoodieCompactedLogRecordScanner, added test for recordreader 2017-06-02 11:33:59 -07:00
Prasanna Rajaperumal
fe5c5e8021 Test Failure in Travis-ci 2017-02-21 20:25:01 -08:00
Prasanna Rajaperumal
1132f3533d Merge and pull master commits 2017-02-21 17:53:28 -08:00
Prasanna Rajaperumal
8ee777a9bb Refactor hoodie-common and create right abstractions for Hoodie Storage V2.0
The following is the gist of the changes:

- All low-level code for creating a commit lived in HoodieClient, which made it hard to share with compaction commits.
- HoodieTableMetadata contained a mix of metadata handling and file filtering. (Also, a few operations required a FileSystem to be passed in because they were called from TaskExecutors, while others used FileSystem as a global variable.) Merge-on-read needs much of that code, but with slight changes to how it operates on the metadata and how it filters files, so the two sets of operations are split into HoodieTableMetaClient and TableFileSystemView.
- Everything in HoodieTableMetaClient (active commits, archived commits, cleaner log, save point log and, in future, delta and compaction commits) is a HoodieTimeline. A timeline is a series of instants with a built-in concept of inflight and completed commit markers.
- A timeline can be queried for ranges and containment, and can also be used to create new data points (a new commit, etc.). Creation and deletion of commits (and all the above metadata) is streamlined in a timeline.
- Multiple timelines can be merged into a single timeline, giving us an audit timeline of whatever happened in a hoodie dataset. This also helps with #55.
- Move to Java 8 and introduce Java 8 succinct syntax in refactored code
2017-02-21 16:23:53 -08:00
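The timeline idea described in the commit above (a series of instants, inflight vs. completed, range and containment queries, merging) can be sketched as follows. This is a rough illustration only, not the HoodieTimeline API: the class names `Instant` and `Timeline`, their methods, and the range semantics (start exclusive, end inclusive) are all assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True, order=True)
class Instant:
    timestamp: str          # sortable commit time (assumed string format)
    action: str             # e.g. "commit", "clean", "savepoint"
    completed: bool = True  # False means the instant is still inflight


class Timeline:
    def __init__(self, instants):
        # Keep instants ordered by timestamp (then action, then state).
        self.instants = sorted(instants)

    def completed(self):
        """Timeline restricted to completed instants."""
        return Timeline(i for i in self.instants if i.completed)

    def in_range(self, start, end):
        """Instants with start < timestamp <= end (assumed semantics)."""
        return Timeline(i for i in self.instants if start < i.timestamp <= end)

    def contains(self, timestamp):
        return any(i.timestamp == timestamp for i in self.instants)

    def merge(self, other):
        """Combine two timelines into one ordered audit timeline."""
        return Timeline(set(self.instants) | set(other.instants))


commits = Timeline([Instant("001", "commit"),
                    Instant("003", "commit", completed=False)])
cleans = Timeline([Instant("002", "clean")])
audit = commits.merge(cleans)
```

Merging the commit and cleaner timelines yields a single ordered view of everything that happened, while `completed()` filters out the inflight commit `003`.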
vinoth chandar
54409b07ea Implement HoodieROTablePathFilter for use with Spark 2.0 (#66)
- Unit tested
- Tested with few queries on Spark 2.x at Uber
2017-01-26 11:13:33 -08:00
Vinoth Chandar
dd8638c2cc Annotate HoodieInputFormat with UseFileSplitsFromInputFormat
- Tested with corresponding Presto changes (pending PR)
2017-01-04 11:19:25 -08:00
Prasanna Rajaperumal
5d44ae3dbd Make hoodie run on travis-ci
Test logs exceeded 4MB, which is the limit for travis-ci. Reduced the logs by setting appropriate log levels for tests.
Add sudo: required on travis.yml to get more memory for running the tests. (https://github.com/travis-ci/travis-ci/issues/5926)
Fixed requirement that fsclient.lastDataFileForDataset always returns files in order
2016-12-20 19:26:48 -08:00
Prasanna Rajaperumal
388457b6b2 Add hoodie-hive module to support hive registration of hoodie datasets 2016-12-19 23:05:39 -08:00
Prasanna Rajaperumal
61200b1207 Adding hoodie-hadoop-mr module to add HoodieInputFormat 2016-12-16 19:29:53 -08:00