71 Commits

Author SHA1 Message Date
Balaji Varadarajan
a0d7ab2384 HUDI-70 : Making DeltaStreamer run in continuous mode with concurrent compaction 2019-06-18 17:48:14 -07:00
Nishith Agarwal
129e433641 - Upgrading to Hive 2.x
- Eliminating in-memory deltaRecordsMap
- Use writerSchema to generate generic record needed by custom payloads
- changes to make tests work with hive 2.x
2019-06-13 12:46:14 -07:00
Balaji Varadarajan
479908fd20 HUDI-125 : Change License for all source files and update RAT configurations 2019-06-09 11:41:55 -07:00
Balaji Varadarajan
30b0f2636f Changes related to Licensing work
1. Go through the dependencies list one round to ensure compliance. Generated the current NOTICE list in all submodules (other Apache projects such as Flink do this).
   To be on the conservative side regarding licensing, NOTICE.txt lists all dependencies, including transitive ones. Pending compliance questions are reported in https://issues.apache.org/jira/browse/LEGAL-461
2. Automate generating NOTICE.txt files so that future package compliance issues can be identified early as part of the code-review process.
3. Added NOTICE.txt and LICENSE.txt to all HUDI jars
2019-06-07 17:58:57 -07:00
Vinoth Chandar
7b4a28ecf8 Move dependency repos to https urls 2019-05-31 20:37:03 -07:00
Vinoth Chandar
acd74129cd Create hoodie-utilities-bundle to host the shaded jar
- hoodie-utilities can now be pulled in as compile time dependency
  - This lets users test their DeltaStreamer transformers, for example
  - Tested the docker demo works & takes in the bundle
  - Doc changes to follow, to move DeltaStreamer commands to bundle jar
2019-05-30 22:46:24 -07:00
vinothchandar
66c0b81b49 [maven-release-plugin] prepare for next development iteration 2019-05-28 19:17:26 -07:00
vinothchandar
227785c022 [maven-release-plugin] prepare release hoodie-0.4.7 2019-05-28 19:17:15 -07:00
Balaji Varadarajan
64fec64097 Timeline Service with Incremental View Syncing support 2019-05-16 13:25:33 -07:00
vinothchandar
446f99aa0f [maven-release-plugin] prepare for next development iteration 2019-05-14 07:29:22 -07:00
vinothchandar
cc38abecc8 [maven-release-plugin] prepare release hoodie-0.4.6 2019-05-14 07:29:11 -07:00
Abhishek Sharma
e2dcef8606 HUDI-101: added exclusion filters for signature files. 2019-05-07 18:35:18 -07:00
Omkar Joshi
738635306b migrating kryo's dependency from twitter chill to plain kryo library 2019-05-06 20:32:00 -07:00
Omkar Joshi
e35d24f31d Revert "Replacing Apache commons-lang3 object serializer with Kryo serializer"
This reverts commit a6c45feb2c.
2019-04-17 09:23:37 -07:00
Bhavani Sudha Saktheeswaran
83b6aa5e91 Fix multiple issues when using build_local_docker_images for setting up the demo
Details here - https://issues.apache.org/jira/browse/HUDI-98
2019-04-15 10:10:05 -07:00
Omkar Joshi
a6c45feb2c Replacing Apache commons-lang3 object serializer with Kryo serializer 2019-03-18 14:12:25 -07:00
vinothchandar
687395e40f [maven-release-plugin] prepare for next development iteration 2019-02-27 07:16:27 -08:00
vinothchandar
bbf40ef987 [maven-release-plugin] prepare release hoodie-0.4.5 2019-02-27 07:16:15 -08:00
Balaji Varadarajan
3a0044216c New Features in DeltaStreamer :
(1) Apply transformation when using delta-streamer to ingest data.
(2) Add Hudi Incremental Source for Delta Streamer
(3) Allow delta-streamer config-property to be passed as command-line
(4) Add Hive Integration to Delta-Streamer and address Review comments
(5) Ensure MultiPartKeysValueExtractor handles hive style partition description
(6) Reuse same spark session on both source and transformer
(7) Support extracting partition fields from _hoodie_partition_path for HoodieIncrSource
(8) Reuse Binary Avro coders
(9) Add push down filter for Incremental source
(10) Add Hoodie DeltaStreamer metrics to track total time taken
2019-02-11 18:22:05 -08:00
vinothchandar
7ba842c0fe [maven-release-plugin] prepare for next development iteration 2018-09-28 11:27:00 +05:30
vinothchandar
5847b61f44 [maven-release-plugin] prepare release hoodie-0.4.4 2018-09-28 11:26:15 +05:30
Vinoth Chandar
bd5af89f12 [maven-release-plugin] rollback the release of hoodie-0.4.4 2018-09-13 15:01:53 +05:30
Vinoth Chandar
d1cc864a43 [maven-release-plugin] prepare for next development iteration 2018-09-12 23:59:47 +05:30
Vinoth Chandar
b748bc836d [maven-release-plugin] prepare release hoodie-0.4.4 2018-09-12 23:59:34 +05:30
Vinoth Chandar
a5359662be Moving dependencies off cdh to apache + Hive2 support
- Tests redone in the process
- Main changes are to RealtimeRecordReader and how it treats maps/arrays
- Make hive sync work with Hive 1/2 and CDH environments
- Fixes for corner cases in Hive queries
- Spark Hive integration - Working version across Apache and CDH versions
- Known Issue - https://github.com/uber/hudi/issues/439
2018-09-11 11:03:30 +05:30
Vinoth Chandar
d58ddbd999 Reworking the deltastreamer tool
- Standardize version of jackson
- DFSPropertiesConfiguration replaces usage of commons PropertiesConfiguration
- Remove dependency on ConstructorUtils
- Throw error if ordering value is not present, during key generation
- Switch to shade plugin for hoodie-utilities
- Added support for consuming Confluent avro kafka serdes
- Support for Confluent schema registry
- KafkaSource now deals with skews nicely, by doing round robin allocation of source limit across partitions
- Added support for BULK_INSERT operations as well
- Pass in the payload class config properly into HoodieWriteClient
- Fix documentation based on new usage
- Adding tests on deltastreamer, sources and all new util classes.
2018-09-08 10:24:32 +08:00
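The d58ddbd999 entry above says KafkaSource handles skew by allocating the source limit round robin across partitions. The following is only a minimal sketch of that idea and is not taken from the Hudi code base: the function name `allocate_source_limit`, the `available` mapping of partition id to unread message count, and the sample numbers are all hypothetical.

```python
def allocate_source_limit(available, source_limit):
    """Split source_limit across partitions without exceeding what each has available.

    available: dict of partition id -> unread message count (hypothetical input shape).
    Returns a dict of partition id -> number of messages to read this round.
    """
    allocated = {p: 0 for p in available}
    remaining = source_limit
    # Partitions that still have unread messages to hand out.
    active = {p for p, n in available.items() if n > 0}
    while remaining > 0 and active:
        # Offer every active partition an equal share of what is left (at least 1).
        share = max(remaining // len(active), 1)
        for p in sorted(active):
            if remaining == 0:
                break
            take = min(share, available[p] - allocated[p], remaining)
            allocated[p] += take
            remaining -= take
        # Exhausted partitions drop out; their unused share goes to the rest.
        active = {p for p in active if allocated[p] < available[p]}
    return allocated


if __name__ == "__main__":
    # One heavily loaded partition next to two nearly empty ones.
    print(allocate_source_limit({0: 10, 1: 1000, 2: 5}, 300))
```

With a skewed input like the one in the usage line, the small partitions are drained completely and the leftover budget flows to the large one, so the total read stays at the limit rather than being capped by the smallest partition.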
Vinoth Chandar
89cd6b0726 [maven-release-plugin] prepare for next development iteration 2018-08-22 21:30:05 -07:00
Vinoth Chandar
8d305c5a86 [maven-release-plugin] prepare release hoodie-0.4.3 2018-08-22 21:29:53 -07:00
Vinoth Chandar
34827d50e1 [maven-release-plugin] prepare for next development iteration 2018-06-11 08:59:13 -07:00
Vinoth Chandar
43ef385730 [maven-release-plugin] prepare release hoodie-0.4.2 2018-06-11 08:59:02 -07:00
Vinoth Chandar
73534d467f [maven-release-plugin] prepare for next development iteration 2018-03-07 21:04:10 -08:00
Vinoth Chandar
f2e5c6f9f8 [maven-release-plugin] prepare release hoodie-0.4.1 2018-03-07 21:04:00 -08:00
Vinoth Chandar
e45679f5e2 Reformatting code per Google Code Style all over 2017-11-12 23:19:02 -08:00
Vinoth Chandar
e1fe3ab937 [maven-release-plugin] prepare for next development iteration 2017-10-02 22:42:54 -07:00
Vinoth Chandar
50139fe904 [maven-release-plugin] prepare release hoodie-0.4.0 2017-10-02 22:42:32 -07:00
Vinoth Chandar
64e0573aca Adding hoodie-spark to support Spark Datasource for Hoodie
- Write with COW/MOR paths work fully
- Read with RO view works on both storages*
- Incremental view supported on COW
- Refactored out HoodieReadClient methods, to just contain key based access
- HoodieDataSourceHelpers class can be now used to construct inputs to datasource
- Tests in hoodie-client using new helpers and mechanisms
- Basic tests around save modes & insert/upserts (more to follow)
- Bumped up scala to 2.11, since 2.10 is deprecated & complains with scalatest
- Updated documentation to describe usage
- New sample app written using the DataSource API
2017-10-02 20:44:53 -07:00
Prasanna Rajaperumal
7d3963b4ab Pushing master to 0.4.0 as we continue to make minor releases over 0.3.8 (MVP for MOR) 2017-06-30 11:41:23 -07:00
Nishith Agarwal
3eba812a1b [maven-release-plugin] prepare for next development iteration 2017-06-30 11:17:07 -07:00
Nishith Agarwal
06d44daea3 [maven-release-plugin] prepare release hoodie-0.3.9 2017-06-30 11:16:58 -07:00
Prasanna Rajaperumal
0ed3fac5e3 [maven-release-plugin] prepare for next development iteration 2017-06-16 11:03:17 -07:00
Prasanna Rajaperumal
45732e440c [maven-release-plugin] prepare release hoodie-0.3.8 2017-06-16 10:59:58 -07:00
Prasanna Rajaperumal
933cc8071f [maven-release-plugin] prepare for next development iteration 2017-05-24 14:02:50 -07:00
Prasanna Rajaperumal
bebae06b5b [maven-release-plugin] prepare release hoodie-0.3.7 2017-05-24 14:02:41 -07:00
Prasanna Rajaperumal
c3258039f0 [maven-release-plugin] prepare for next development iteration 2017-04-27 11:00:56 -07:00
Prasanna Rajaperumal
de1bdad756 [maven-release-plugin] prepare release hoodie-0.3.6 2017-04-27 11:00:45 -07:00
Prasanna Rajaperumal
57ab7a2405 [maven-release-plugin] prepare for next development iteration 2017-03-31 14:58:55 -07:00
Prasanna Rajaperumal
803c635098 [maven-release-plugin] prepare release hoodie-0.3.5 2017-03-31 14:58:46 -07:00
Prasanna Rajaperumal
f4bb44c1b1 Update snapshot version to 0.3.5-SNAPSHOT 2017-03-31 14:54:54 -07:00
ovj
21898907c1 tool for importing hive tables (in parquet format) into hoodie dataset (#89)
* tool for importing hive tables (in parquet format) into hoodie dataset

* review fixes

* review fixes

* review fixes
2017-03-21 14:42:13 -07:00
vinoth chandar
69d3950a32 Revamped Deltastreamer (#93)
* Add analytics to site

* Fix ugly favicon

* New & Improved HoodieDeltaStreamer

 - Can incrementally consume from HDFS or Kafka, with exactly-once semantics!
 - Supports Json/Avro data, Source can also do custom things
 - Source is totally pluggable, via reflection
 - Key generation is pluggable, currently added SimpleKeyGenerator
 - Schema provider is pluggable, currently Filebased schemas
 - Configurable field to break ties during preCombine
 - Finally, can also plug in the HoodieRecordPayload, to get other merge types than overwriting
 - Handles efficient avro serialization in Spark

 Pending :
 - Rewriting of HiveIncrPullSource
 - Hive sync via hoodie-hive
 - Cleanup & tests

* Minor fixes from master rebase

* Implementation of HiveIncrPullSource
 - Copies commit by commit from source to target

* Adding TimestampBasedKeyGenerator
 - Supports unix time & date strings
2017-03-13 12:41:29 -07:00
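The 69d3950a32 entry above adds a TimestampBasedKeyGenerator that supports unix time and date strings. As a rough illustration of what such a generator might do with either input, here is a minimal sketch. It is not the Hudi implementation: the function name `partition_path`, its parameters, the UTC assumption, and the `%Y/%m/%d` output layout are all hypothetical.

```python
from datetime import datetime, timezone


def partition_path(value, input_format=None, output_format="%Y/%m/%d"):
    """Turn a timestamp field into a date-based partition path.

    value: unix epoch seconds (when input_format is None) or a date string.
    input_format: strptime pattern for date strings (hypothetical parameter).
    Timestamps are treated as UTC, which is an assumption of this sketch.
    """
    if input_format is None:
        # Unix time: interpret the value as seconds since the epoch.
        ts = datetime.fromtimestamp(int(value), tz=timezone.utc)
    else:
        # Date string: parse it with the caller-supplied pattern.
        ts = datetime.strptime(value, input_format).replace(tzinfo=timezone.utc)
    return ts.strftime(output_format)


if __name__ == "__main__":
    print(partition_path(1489363200))
    print(partition_path("2017-03-13 12:41:29", "%Y-%m-%d %H:%M:%S"))
```

Both calls in the usage lines resolve to the same day, so records carrying either representation would land in the same partition under these assumptions.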