12 Commits

Author SHA1 Message Date
Balaji Varadarajan
3a0044216c New Features in DeltaStreamer :
(1) Apply transformation when using delta-streamer to ingest data.
(2) Add Hudi Incremental Source for Delta Streamer
(3) Allow delta-streamer config-property to be passed as command-line
(4) Add Hive Integration to Delta-Streamer and address Review comments
(5) Ensure MultiPartKeysValueExtractor handles hive style partition description
(6) Reuse same spark session on both source and transformer
(7) Support extracting partition fields from _hoodie_partition_path for HoodieIncrSource
(8) Reuse Binary Avro coders
(9) Add push down filter for Incremental source
(10) Add Hoodie DeltaStreamer metrics to track total time taken
2019-02-11 18:22:05 -08:00
Nishith Agarwal
48aa026dc4 Adding documentation for migration guide and COW vs MOR tradeoffs, moving some docs around for more clarity 2018-10-19 15:00:38 -07:00
Balaji Varadarajan
f3418e4718 Docker Container Build and Run setup with foundations for adding docker integration tests. Docker images built with Hadoop 2.8.4, Hive 2.3.3 and Spark 2.3.1 and published to docker-hub
Look at the quickstart document for how to set up docker and run the demo
2018-10-02 09:28:21 +05:30
Vinoth Chandar
a5359662be Moving dependencies off cdh to apache + Hive2 support
- Tests redone in the process
- Main changes are to RealtimeRecordReader and how it treats maps/arrays
- Make hive sync work with Hive 1/2 and CDH environments
- Fixes for corner cases in Hive queries
- Spark Hive integration - Working version across Apache and CDH versions
- Known Issue - https://github.com/uber/hudi/issues/439
2018-09-11 11:03:30 +05:30
Vinoth Chandar
64e0573aca Adding hoodie-spark to support Spark Datasource for Hoodie
- Write with COW/MOR paths works fully
- Read with RO view works on both storages*
- Incremental view supported on COW
- Refactored out HoodieReadClient methods, to just contain key based access
- HoodieDataSourceHelpers class can now be used to construct inputs to datasource
- Tests in hoodie-client using new helpers and mechanisms
- Basic tests around save modes & insert/upserts (more to follow)
- Bumped up scala to 2.11, since 2.10 is deprecated & complains with scalatest
- Updated documentation to describe usage
- New sample app written using the DataSource API
2017-10-02 20:44:53 -07:00
Prasanna Rajaperumal
4b26be9f61 Fixes to RealtimeInputFormat and RealtimeRecordReader and update documentation for HiveSyncTool 2017-06-15 18:21:07 -07:00
Vinoth Chandar
b4e787ce1d Update docs 2017-05-01 21:48:27 -07:00
Vinoth Chandar
848814bece Adding docs for deltastreamer, hivesync tool usage 2017-04-03 21:27:49 -07:00
prazanna
a7cd021f26 Update incremental pull query documentation 2017-03-23 16:20:54 -07:00
ovj
b02910c588 few fixes to quick start document (#112) 2017-03-22 18:25:26 -07:00
Vinoth Chandar
958f7ceda6 Adding Documentation for Getting Started Section
- Overview, Use Cases, Powered By are very detailed
- Cleaned up QuickStart
- Redistribute the content from README to correct pages to be improved upon
- Switch to blue theme
2017-01-04 20:50:44 -08:00
Vinoth Chandar
2bf0db14c6 Adding docs folder, with skeleton jekyll based site
- Uses https://github.com/tomjohnson1492/documentation-theme-jekyll
- Have filler pages
2016-12-30 11:05:22 -08:00