Commit Graph

2402 Commits

Author SHA1 Message Date
Prasanna Rajaperumal
bebae06b5b [maven-release-plugin] prepare release hoodie-0.3.7 2017-05-24 14:02:41 -07:00
Prasanna Rajaperumal
bae98efeee Delete other instant files (.clean) as well during commit archival 2017-05-24 13:51:49 -07:00
prazanna
e1d13f2bc8 https://repository.cloudera.com/artifactory/repo/ has been changed to https://repository.cloudera.com/artifactory/public/ 2017-05-23 12:05:01 -07:00
Prasanna Rajaperumal
240c91241b Implement HoodieLogFormat replacing Avro as the default log format 2017-05-23 08:35:11 -07:00
Nishith Agarwal
3c984447da view scheme added 2017-05-22 12:27:40 -07:00
Prasanna Rajaperumal
70dd7a25ea Clean should not create a .inflight file 2017-05-22 10:48:35 -07:00
Vinoth Chandar
7014670795 Update contributor list 2017-05-18 10:48:42 -07:00
Zeeshan Qureshi
43a55b09fd Add GCS to supported filesystems 2017-05-18 10:30:34 -07:00
prazanna
21e334592f Update java version to 8 in travis.yml 2017-05-17 13:43:11 -07:00
vinoth chandar
1b0a027942 Update community.md with committership guidelines 2017-05-04 17:25:54 -07:00
Vinoth Chandar
b4e787ce1d Update docs 2017-05-01 21:48:27 -07:00
Vinoth Chandar
da17c5c607 Introduce getCommitsAndCompactionsTimeline() explicitly & adjust usage across code base 2017-05-01 21:48:27 -07:00
Vinoth Chandar
bae0528013 Cleanup calls to HoodieTimeline.compareTimeStamps 2017-05-01 21:48:27 -07:00
Vinoth Chandar
7b1446548f Initial impl of HoodieRealtimeInputFormat
- Works end-end for flat schemas
- Schema evolution & hardening remains
- HoodieClientExample can now write mor tables as well
2017-05-01 21:48:27 -07:00
Vinoth Chandar
9f526396a0 Add support for merge_on_read tables to HoodieClientExample 2017-05-01 21:48:27 -07:00
Prasanna Rajaperumal
7bca428a0a Test to check if properties set are properly propogated 2017-04-28 12:47:14 -07:00
Prasanna Rajaperumal
3f97bdcccf Test to check if properties set are properly propogated 2017-04-28 12:40:58 -07:00
Prasanna Rajaperumal
c3258039f0 [maven-release-plugin] prepare for next development iteration 2017-04-27 11:00:56 -07:00
Prasanna Rajaperumal
de1bdad756 [maven-release-plugin] prepare release hoodie-0.3.6 2017-04-27 11:00:45 -07:00
Prasanna Rajaperumal
8974e11161 Make sure properties set in HoodieWriteConfig is propogated down to individual configs. Fix a race condition which lets InputFormat to think file size is 0 when it is actually not 2017-04-27 10:52:25 -07:00
Prasanna Rajaperumal
91b088f29f Implement Compaction policy abstraction. Implement LogSizeBased Bounded IO Compaction as the default strategy 2017-04-20 16:59:06 -07:00
Vinoth Chandar
82b211d2e6 Rebase with generic partition support 2017-04-03 21:27:49 -07:00
Vinoth Chandar
848814bece Adding docs for deltastreamer, hivesync tool usage 2017-04-03 21:27:49 -07:00
Vinoth Chandar
542d622e49 Adding HiveSyncTool to sync hoodie dataset schema/partitions to Hive
- Designed to be run by your workflow manager after hoodie upsert
- Assumes jdbc connectivity via HiveServer2, which should work with all major distros
2017-04-03 21:27:49 -07:00
Vinoth Chandar
2b6322318c CR feedback 2017-04-03 18:28:01 -07:00
Vinoth Chandar
e0fc4ec38e Documentation update + helper method for WriteConfig builder 2017-04-03 18:28:01 -07:00
Vinoth Chandar
dce35ff0d7 Adding a config to control whether date partitioning can be assumed
- false by default
- CAUTION: If you have an existing tables without partition metadata, you need to set this to "true"
2017-04-03 18:28:01 -07:00
Vinoth Chandar
f9fd16069d FSUtils.getAllPartitionsPaths() works based on .hoodie_partition_metadata
- clean/rollback/write paths covered by existing tests
- Snapshot copier fixed to copy metadata file also, and test fixed
- Existing tables need to be repaired by addition of metadata, before this can be rolled out
2017-04-03 18:28:01 -07:00
Vinoth Chandar
3129770fd0 Create .hoodie_partition_metadata in each partition, linking back to basepath
- Concurreny handled via taskID, failure recovery handled via renames
- Falls back to search 3 levels up
- Cli tool has command to add this to existing tables
2017-04-03 18:28:01 -07:00
Prasanna Rajaperumal
1e802ad4f2 Move HoodieAvroReader to hoodie-common, it will be used for compaction and in the record reader 2017-04-03 13:58:35 -07:00
Prasanna Rajaperumal
aee136777b Fixes needed to run merge-on-read testing on production scale data 2017-04-02 22:25:47 -07:00
Prasanna Rajaperumal
57ab7a2405 [maven-release-plugin] prepare for next development iteration 2017-03-31 14:58:55 -07:00
Prasanna Rajaperumal
803c635098 [maven-release-plugin] prepare release hoodie-0.3.5 2017-03-31 14:58:46 -07:00
Prasanna Rajaperumal
f4bb44c1b1 Update snapshot version to 0.3.5-SNAPSHOT 2017-03-31 14:54:54 -07:00
Prasanna Rajaperumal
77e54e78f8 Create the partition path if it does not exist when listing data files in a partition 2017-03-28 05:20:15 -07:00
Yash Sharma
e3b273e9fd formatting for docs 2017-03-28 05:08:54 -07:00
Yash Sharma
bca7e7dae4 improve documentations 2017-03-28 05:08:54 -07:00
Yash Sharma
d6f94b998d Hoodie operability with S3 2017-03-28 05:08:54 -07:00
prazanna
a7cd021f26 Update incremental pull query documentation 2017-03-23 16:20:54 -07:00
prazanna
0e3f635adb remove hardcoding of autoClean 2017-03-23 15:54:26 -07:00
Zeeshan Qureshi
a94f3a638e Pass table path as argument to HoodieClientExample 2017-03-23 08:12:20 -07:00
fishie9
b7047ab4fb Pass in String StroageLevel for WriteStatus (#113) 2017-03-23 04:31:30 -07:00
ovj
b02910c588 few fixes to quick start document (#112) 2017-03-22 18:25:26 -07:00
prazanna
f1b7afad21 Add config for index parallelism and make clean public (#109)
* Add config for index parallelism and make clean public

* Review comments on clean api modification
2017-03-21 17:36:46 -07:00
ovj
21898907c1 tool for importing hive tables (in parquet format) into hoodie dataset (#89)
* tool for importing hive tables (in parquet format) into hoodie dataset

* review fixes

* review fixes

* review fixes
2017-03-21 14:42:13 -07:00
prazanna
d835710c51 Metadata timeline marks an already complete instant as complete again (#98) 2017-03-17 12:42:26 -07:00
Prasanna Rajaperumal
d83b671ada Implement Savepoints and required metadata timeline - Part 2 2017-03-13 23:09:29 -07:00
prazanna
6f36e1eaaf Implement Savepoints and required metadata timeline (#86)
- Introduce avro to save clean metadata with details about the last commit that was retained
- Save rollback metadata in the meta timeline
- Create savepoint metadata and add API to createSavepoint, deleteSavepoint and rollbackToSavepoint
- Savepointed commit should not be rolledback or cleaned or archived
- introduce cli commands to show, create and rollback to savepoints
- Write unit tests to test savepoints and rollbackToSavepoints
2017-03-13 15:12:03 -07:00
vinoth chandar
69d3950a32 Revamped Deltastreamer (#93)
* Add analytics to site

* Fix ugly favicon

* New & Improved HoodieDeltaStreamer

 - Can incrementally consume from HDFS or Kafka, with exactly-once semantics!
 - Supports Json/Avro data, Source can also do custom things
 - Source is totally pluggable, via reflection
 - Key generation is pluggable, currently added SimpleKeyGenerator
 - Schema provider is pluggable, currently Filebased schemas
 - Configurable field to break ties during preCombine
 - Finally, can also plugin the HoodieRecordPayload, to get other merge types than overwriting
 - Handles efficient avro serialization in Spark

 Pending :
 - Rewriting of HiveIncrPullSource
 - Hive sync via hoodie-hive
 - Cleanup & tests

* Minor fixes from master rebase

* Implementation of HiveIncrPullSource
 - Copies commit by commit from source to target

* Adding TimestampBasedKeyGenerator
 - Supports unix time & date strings
2017-03-13 12:41:29 -07:00
Vinoth Chandar
c3257b9680 Fix ugly favicon 2017-03-12 20:30:42 -07:00