Nishith Agarwal
19c22b231e
1. Use HoodieLogFormat to archive commits and other actions 2. Introduced avro schema for commits and compactions and an avro wrapper schema
2017-07-26 14:27:44 -07:00
Nishith Agarwal
616c9a68c3
Enabled deletes in merge_on_read
2017-07-26 13:37:27 -07:00
Prasanna Rajaperumal
7d3963b4ab
Pushing master to 0.4.0 as we continue to make minor releases over 0.3.8 (MVP for MOR)
2017-06-30 11:41:23 -07:00
Nishith Agarwal
3eba812a1b
[maven-release-plugin] prepare for next development iteration
2017-06-30 11:17:07 -07:00
Nishith Agarwal
06d44daea3
[maven-release-plugin] prepare release hoodie-0.3.9
2017-06-30 11:16:58 -07:00
Nishith Agarwal
348250d960
Using FsUtils instead of Files API to extract file extension
2017-06-29 19:26:31 -07:00
Prasanna Rajaperumal
5cc071f74e
Savepoint should not create a hole in the commit timeline
2017-06-27 16:36:09 -07:00
Vinoth Chandar
754ab88a2d
Introduce ReadOptimizedView & RealtimeView out of TableFileSystemView
...
- Usage now marks code as clearly using either RO or RT views, for future evolution
- Tests on all of FileGroups and FileSlices
2017-06-22 17:16:13 -07:00
Vinoth Chandar
c00f1a9ed9
Refactoring HoodieTableFileSystemView using FileGroups/FileSlices
...
- Merged all filter* and get* methods
- new constructor takes filestatus[]
- All existing tests pass
- FileGroup is all files that belong to a fileID within a partition
- FileSlice is a generation of data and log files, starting at a base commit
2017-06-22 17:16:13 -07:00
gekath
52c507f83e
Writes relative paths to .commit files
...
Handle case where path is read in as null from commit file
Merged with updated release
2017-06-16 12:51:19 -07:00
gekath
db7311f85e
Writes relative paths to .commit files instead of absolute paths
...
Clean up code
Removed commented out code
Fixed merge conflict with master
2017-06-16 12:51:19 -07:00
Prasanna Rajaperumal
0ed3fac5e3
[maven-release-plugin] prepare for next development iteration
2017-06-16 11:03:17 -07:00
Prasanna Rajaperumal
45732e440c
[maven-release-plugin] prepare release hoodie-0.3.8
2017-06-16 10:59:58 -07:00
Prasanna Rajaperumal
4b26be9f61
Fixes to RealtimeInputFormat and RealtimeRecordReader and update documentation for HiveSyncTool
2017-06-15 18:21:07 -07:00
Kaushik Devarajaiah
521555c576
Parallelize file version deletes during clean and related tests
2017-06-15 18:20:42 -07:00
Prasanna Rajaperumal
db6150c5ef
Refactor hoodie-hive
2017-06-09 13:06:33 -07:00
Danny Chen
c192dd60b4
Change from deprecated closeQuietly to try with resources
2017-06-05 19:11:53 -07:00
Nishith Agarwal
ba050973e3
updated HoodieRealtimeRecordReader to use HoodieCompactedLogRecordScanner, added test for recordreader
2017-06-02 11:33:59 -07:00
Prasanna Rajaperumal
933cc8071f
[maven-release-plugin] prepare for next development iteration
2017-05-24 14:02:50 -07:00
Prasanna Rajaperumal
bebae06b5b
[maven-release-plugin] prepare release hoodie-0.3.7
2017-05-24 14:02:41 -07:00
Prasanna Rajaperumal
240c91241b
Implement HoodieLogFormat replacing Avro as the default log format
2017-05-23 08:35:11 -07:00
Vinoth Chandar
da17c5c607
Introduce getCommitsAndCompactionsTimeline() explicitly & adjust usage across code base
2017-05-01 21:48:27 -07:00
Vinoth Chandar
bae0528013
Cleanup calls to HoodieTimeline.compareTimeStamps
2017-05-01 21:48:27 -07:00
Vinoth Chandar
7b1446548f
Initial impl of HoodieRealtimeInputFormat
...
- Works end-end for flat schemas
- Schema evolution & hardening remains
- HoodieClientExample can now write mor tables as well
2017-05-01 21:48:27 -07:00
Vinoth Chandar
9f526396a0
Add support for merge_on_read tables to HoodieClientExample
2017-05-01 21:48:27 -07:00
Prasanna Rajaperumal
c3258039f0
[maven-release-plugin] prepare for next development iteration
2017-04-27 11:00:56 -07:00
Prasanna Rajaperumal
de1bdad756
[maven-release-plugin] prepare release hoodie-0.3.6
2017-04-27 11:00:45 -07:00
Prasanna Rajaperumal
8974e11161
Make sure properties set in HoodieWriteConfig are propagated down to individual configs. Fix a race condition that lets InputFormat think the file size is 0 when it is not
2017-04-27 10:52:25 -07:00
Prasanna Rajaperumal
91b088f29f
Implement Compaction policy abstraction. Implement LogSizeBased Bounded IO Compaction as the default strategy
2017-04-20 16:59:06 -07:00
Vinoth Chandar
2b6322318c
CR feedback
2017-04-03 18:28:01 -07:00
Vinoth Chandar
dce35ff0d7
Adding a config to control whether date partitioning can be assumed
...
- false by default
- CAUTION: If you have existing tables without partition metadata, you need to set this to "true"
2017-04-03 18:28:01 -07:00
Vinoth Chandar
f9fd16069d
FSUtils.getAllPartitionsPaths() works based on .hoodie_partition_metadata
...
- clean/rollback/write paths covered by existing tests
- Snapshot copier fixed to copy metadata file also, and test fixed
- Existing tables need to be repaired by addition of metadata, before this can be rolled out
2017-04-03 18:28:01 -07:00
Vinoth Chandar
3129770fd0
Create .hoodie_partition_metadata in each partition, linking back to basepath
...
- Concurrency handled via taskID, failure recovery handled via renames
- Falls back to search 3 levels up
- Cli tool has command to add this to existing tables
2017-04-03 18:28:01 -07:00
Prasanna Rajaperumal
1e802ad4f2
Move HoodieAvroReader to hoodie-common, it will be used for compaction and in the record reader
2017-04-03 13:58:35 -07:00
Prasanna Rajaperumal
aee136777b
Fixes needed to run merge-on-read testing on production scale data
2017-04-02 22:25:47 -07:00
Prasanna Rajaperumal
57ab7a2405
[maven-release-plugin] prepare for next development iteration
2017-03-31 14:58:55 -07:00
Prasanna Rajaperumal
803c635098
[maven-release-plugin] prepare release hoodie-0.3.5
2017-03-31 14:58:46 -07:00
Prasanna Rajaperumal
f4bb44c1b1
Update snapshot version to 0.3.5-SNAPSHOT
2017-03-31 14:54:54 -07:00
Prasanna Rajaperumal
77e54e78f8
Create the partition path if it does not exist when listing data files in a partition
2017-03-28 05:20:15 -07:00
Yash Sharma
bca7e7dae4
Improve documentation
2017-03-28 05:08:54 -07:00
ovj
21898907c1
tool for importing hive tables (in parquet format) into hoodie dataset ( #89 )
...
* tool for importing hive tables (in parquet format) into hoodie dataset
* review fixes
* review fixes
* review fixes
2017-03-21 14:42:13 -07:00
prazanna
d835710c51
Metadata timeline marks an already complete instant as complete again ( #98 )
2017-03-17 12:42:26 -07:00
Prasanna Rajaperumal
d83b671ada
Implement Savepoints and required metadata timeline - Part 2
2017-03-13 23:09:29 -07:00
prazanna
6f36e1eaaf
Implement Savepoints and required metadata timeline ( #86 )
...
- Introduce avro to save clean metadata with details about the last commit that was retained
- Save rollback metadata in the meta timeline
- Create savepoint metadata and add API to createSavepoint, deleteSavepoint and rollbackToSavepoint
- A savepointed commit should not be rolled back, cleaned, or archived
- introduce cli commands to show, create and rollback to savepoints
- Write unit tests to test savepoints and rollbackToSavepoints
2017-03-13 15:12:03 -07:00
vinoth chandar
69d3950a32
Revamped Deltastreamer ( #93 )
...
* Add analytics to site
* Fix ugly favicon
* New & Improved HoodieDeltaStreamer
- Can incrementally consume from HDFS or Kafka, with exactly-once semantics!
- Supports Json/Avro data, Source can also do custom things
- Source is totally pluggable, via reflection
- Key generation is pluggable, currently added SimpleKeyGenerator
- Schema provider is pluggable, currently Filebased schemas
- Configurable field to break ties during preCombine
- Finally, can also plug in the HoodieRecordPayload, to get merge types other than overwriting
- Handles efficient avro serialization in Spark
Pending :
- Rewriting of HiveIncrPullSource
- Hive sync via hoodie-hive
- Cleanup & tests
* Minor fixes from master rebase
* Implementation of HiveIncrPullSource
- Copies commit by commit from source to target
* Adding TimestampBasedKeyGenerator
- Supports unix time & date strings
2017-03-13 12:41:29 -07:00
siddharthagunda
348a48aa80
Add delete support to Hoodie ( #85 )
2017-03-04 01:33:49 -08:00
Prasanna Rajaperumal
fe5c5e8021
Test Failure in Travis-ci
2017-02-21 20:25:01 -08:00
prazanna
eb46e7c72b
Implement Merge on Read Storage ( #76 )
...
1. Create HoodieTable abstraction for commits and fileSystemView
2. HoodieMergeOnReadTable created
3. View is now always obtained from the table and the correct view based on the table type is returned
2017-02-21 16:24:38 -08:00
prazanna
11d2fd3428
Introduce RealtimeTableView and Implement HoodieRealtimeTableCompactor ( #73 )
2017-02-21 16:24:18 -08:00
Prasanna Rajaperumal
48fbb0f425
Implement reliable, fault-tolerant log file management for merge-on-read, allowing random block-level access on Avro files
2017-02-21 16:23:53 -08:00