Commit Graph

801 Commits

Author SHA1 Message Date
Nishith Agarwal
68723764ed Adding compaction to HoodieClient example 2019-01-28 10:23:44 -08:00
Nishith Agarwal
169e3f66bb Filtering partition paths before performing a list status on all partitions 2019-01-25 11:34:00 -08:00
Nishith Agarwal
d1bb804577 Passing a path filter to avoid including folders under .hoodie directory as partition paths 2019-01-11 19:21:09 -08:00
Nishith Agarwal
110df7190b Enabling hard deletes for MergeOnRead table type 2018-12-31 12:49:58 -08:00
Manu Sridharan
345aaa31aa Add m2 directory to Travis cache 2018-12-31 10:31:12 -08:00
arukavytsia
6946dd7557 General enhancements 2018-12-18 12:52:39 -08:00
Balaji Varadarajan
30c5f8b7bd Ensure Hoodie works for non-partitioned Hive table 2018-12-12 13:35:16 -08:00
xubo245
466ff73ffb fix some spell errorin Hudi 2018-12-12 13:06:25 -08:00
jiale.tan
bf65219b73 feat(SparkDataSource): add structured streaming 2018-12-04 16:33:00 -08:00
Nishith Agarwal
7243ce40c9 Serializing the complete payload object instead of serializing just the GenericRecord
Removing Converter hierarchy as we now depend purely on JavaSerialization and require the payload to be java serializable
2018-12-04 11:43:41 -08:00
Nishith Agarwal
e83dde3b95 Returning empty Statues for an empty spark partition caused due to incorrect bin packing 2018-12-04 11:41:38 -08:00
Vinoth Chandar
0015c9b00e Update committership for balaji 2018-11-30 16:23:10 -08:00
Balaji Varadarajan
f999e4960c Avoid WriteStatus collect() call when committing batch 2018-11-28 10:41:49 -08:00
Vinoth Chandar
fa65db9c4c Explicitly handle lack of append() support during LogWriting 2018-11-27 17:58:43 -08:00
Nishith Agarwal
d0fde47458 Fixing number of insert buckets to be generated by rounding off to the closest greater integer 2018-11-15 10:04:45 -08:00
Vinoth Chandar
1362942aa3 Enabling auto tuning of insert splits by default 2018-11-08 09:48:23 -08:00
Balaji Varadarajan
25cd05b24e Useful Hudi CLI commands to debug/analyze production workloads 2018-10-30 10:28:01 -07:00
Balaji Varadarajan
07324e7a20 Compaction validate, unschedule and repair 2018-10-25 14:12:47 -07:00
Xinli shang
d904fe69ca Fix addMetadataFields() to carry over 'props' 2018-10-24 10:55:13 -07:00
Nishith Agarwal
48aa026dc4 Adding documentation for migration guide and COW vs MOR tradeoffs, moving some docs around for more clarity 2018-10-19 15:00:38 -07:00
jiale.tan
1628d044ac feat(SparkDataSource): add additional feature to drop later arriving dups 2018-10-16 11:52:50 -07:00
Balaji Varadarajan
8485b9e263 Fix regression which broke HudiInputFormat handling of non-hoodie datasets 2018-10-16 18:39:56 +01:00
Vinoth Chandar
1fca9b21cc Add --filter-dupes to DeltaStreamer
- Optionally filter out duplicates before inserting data
- Unit tests
2018-10-04 11:25:18 +05:30
vinoth chandar
0a200c32e5 Reflect new committership, id changes for devs 2018-10-02 11:00:50 +05:30
Balaji Varadarajan
f3418e4718 Docker Container Build and Run setup with foundations for adding docker integration tests. Docker images built with Hadoop 2.8.4 Hive 2.3.3 and Spark 2.3.1 and published to docker-hub
Look at quickstart document for how to setup docker and run demo
2018-10-02 09:28:21 +05:30
Balaji Varadarajan
9710b5a3a6 Ensure Hoodie metadata folder and files are filtered out when constructing Parquet Data Source 2018-10-01 14:27:14 +05:30
vinoth chandar
06bdba3cef Update Gemfile.lock with newer jekyll version 2018-09-29 20:50:03 +05:30
vinothchandar
b5a75fdd91 Adding Jiale & Anbu to contributors list 2018-09-29 20:20:28 +05:30
jiale.tan
98fd97b65f feature(HoodieGlobalBloomIndex): adds a new type of bloom index to allow global record key lookup 2018-09-29 19:55:20 +05:30
vinothchandar
7ba842c0fe [maven-release-plugin] prepare for next development iteration 2018-09-28 11:27:00 +05:30
vinothchandar
5847b61f44 [maven-release-plugin] prepare release hoodie-0.4.4 2018-09-28 11:26:15 +05:30
vinothchandar
05bf14a42e Update RELEASE_NOTES for release 0.4.4 2018-09-28 11:05:24 +05:30
vinothchandar
9ca6f91e97 Perform consistency checks during write finalize
- Check to ensure written files are listable on storage
- Docs reflected to capture how this helps with s3 storage
- Unit tests added, corrections to existing tests
- Fix DeltaStreamer to manage archived commits in a separate folder
2018-09-28 08:04:41 +05:30
Balaji Varadarajan
4c74dd4cad Travis CI tests needs to be run in quieter mode (WARN log level) to avoid max log-size errors 2018-09-26 21:10:20 +05:30
Yishuang Lu
faf93b6340 Fix the name of avro schema file in Test
Fixed the name of avro schema file in Test

Signed-off-by: Yishuang Lu <luystu@gmail.com>
2018-09-24 21:58:34 +05:30
Balaji Varadarajan
460e24e84b Hive Sync handling must work for datasets with multi-partition keys 2018-09-20 16:53:26 +05:30
Balaji Varadarajan
5cb28e7b1f Explicitly release resources in LogFileReader and TestHoodieClientBase 2018-09-20 13:24:57 +05:30
Balaji Varadarajan
2728f96505 Add dummy classes to dump all classes loaded as part of packaging modules to ensure javadoc and sources jars are getting created 2018-09-18 09:24:33 +05:30
Vinoth Chandar
f44bcc5b03 Fix bug with incrementally pulling older data 2018-09-18 02:34:00 +05:30
Vinoth Chandar
bd5af89f12 [maven-release-plugin] rollback the release of hoodie-0.4.4 2018-09-13 15:01:53 +05:30
Vinoth Chandar
d1cc864a43 [maven-release-plugin] prepare for next development iteration 2018-09-12 23:59:47 +05:30
Vinoth Chandar
b748bc836d [maven-release-plugin] prepare release hoodie-0.4.4 2018-09-12 23:59:34 +05:30
Vinoth Chandar
0b1a949a87 Release notes for 0.4.4 2018-09-12 23:39:57 +05:30
Balaji Varadarajan
cce88b36d2 Use spark Master from environment if set 2018-09-12 01:24:11 +05:30
Balaji Varadarajan
605af8a82f Reduce minimum delta-commits required for compaction 2018-09-12 01:23:28 +05:30
Balaji Varadarajan
18a39715c9 Bump up versions in packaging modules and remove commons-lang3 dep 2018-09-11 11:03:30 +05:30
Vinoth Chandar
eca49a255e Rebasing and fixing conflicts against master 2018-09-11 11:03:30 +05:30
Vinoth Chandar
a5359662be Moving depedencies off cdh to apache + Hive2 support
- Tests redone in the process
- Main changes are to RealtimeRecordReader and how it treats maps/arrays
- Make hive sync work with Hive 1/2 and CDH environments
- Fixes to make corner cases for Hive queries
- Spark Hive integration - Working version across Apache and CDH versions
- Known Issue - https://github.com/uber/hudi/issues/439
2018-09-11 11:03:30 +05:30
Nishith Agarwal
2b1af18941 Adding check for rolling stats not present to handle backwards compatibility of existing timeline 2018-09-10 11:53:46 +08:00
Balaji Varadarajan
ea7823a9dd Docs for describing async compaction and how to operate it 2018-09-10 11:52:20 +08:00