Nishith Agarwal
7243ce40c9
Serializing the complete payload object instead of serializing just the GenericRecord
...
Removing Converter hierarchy as we now depend purely on JavaSerialization and require the payload to be java serializable
2018-12-04 11:43:41 -08:00
Nishith Agarwal
e83dde3b95
Returning empty Statuses for an empty Spark partition caused by incorrect bin packing
2018-12-04 11:41:38 -08:00
Vinoth Chandar
0015c9b00e
Update committership for balaji
2018-11-30 16:23:10 -08:00
Balaji Varadarajan
f999e4960c
Avoid WriteStatus collect() call when committing batch
2018-11-28 10:41:49 -08:00
Vinoth Chandar
fa65db9c4c
Explicitly handle lack of append() support during LogWriting
2018-11-27 17:58:43 -08:00
Nishith Agarwal
d0fde47458
Fixing the number of insert buckets generated by rounding up to the next integer
2018-11-15 10:04:45 -08:00
Vinoth Chandar
1362942aa3
Enabling auto tuning of insert splits by default
2018-11-08 09:48:23 -08:00
Balaji Varadarajan
25cd05b24e
Useful Hudi CLI commands to debug/analyze production workloads
2018-10-30 10:28:01 -07:00
Balaji Varadarajan
07324e7a20
Compaction validate, unschedule and repair
2018-10-25 14:12:47 -07:00
Xinli shang
d904fe69ca
Fix addMetadataFields() to carry over 'props'
2018-10-24 10:55:13 -07:00
Nishith Agarwal
48aa026dc4
Adding documentation for migration guide and COW vs MOR tradeoffs, moving some docs around for more clarity
2018-10-19 15:00:38 -07:00
jiale.tan
1628d044ac
feat(SparkDataSource): add additional feature to drop later arriving dups
2018-10-16 11:52:50 -07:00
Balaji Varadarajan
8485b9e263
Fix regression which broke HudiInputFormat handling of non-hoodie datasets
2018-10-16 18:39:56 +01:00
Vinoth Chandar
1fca9b21cc
Add --filter-dupes to DeltaStreamer
...
- Optionally filter out duplicates before inserting data
- Unit tests
2018-10-04 11:25:18 +05:30
vinoth chandar
0a200c32e5
Reflect new committership, id changes for devs
2018-10-02 11:00:50 +05:30
Balaji Varadarajan
f3418e4718
Docker container build and run setup with foundations for adding Docker integration tests. Docker images built with Hadoop 2.8.4, Hive 2.3.3, and Spark 2.3.1 and published to docker-hub
...
See the quickstart document for how to set up Docker and run the demo
2018-10-02 09:28:21 +05:30
Balaji Varadarajan
9710b5a3a6
Ensure Hoodie metadata folder and files are filtered out when constructing Parquet Data Source
2018-10-01 14:27:14 +05:30
vinoth chandar
06bdba3cef
Update Gemfile.lock with newer jekyll version
2018-09-29 20:50:03 +05:30
vinothchandar
b5a75fdd91
Adding Jiale & Anbu to contributors list
2018-09-29 20:20:28 +05:30
jiale.tan
98fd97b65f
feature(HoodieGlobalBloomIndex): adds a new type of bloom index to allow global record key lookup
2018-09-29 19:55:20 +05:30
vinothchandar
7ba842c0fe
[maven-release-plugin] prepare for next development iteration
2018-09-28 11:27:00 +05:30
vinothchandar
5847b61f44
[maven-release-plugin] prepare release hoodie-0.4.4
2018-09-28 11:26:15 +05:30
vinothchandar
05bf14a42e
Update RELEASE_NOTES for release 0.4.4
2018-09-28 11:05:24 +05:30
vinothchandar
9ca6f91e97
Perform consistency checks during write finalize
...
- Check to ensure written files are listable on storage
- Docs updated to capture how this helps with S3 storage
- Unit tests added, corrections to existing tests
- Fix DeltaStreamer to manage archived commits in a separate folder
2018-09-28 08:04:41 +05:30
Balaji Varadarajan
4c74dd4cad
Travis CI tests need to be run in quieter mode (WARN log level) to avoid max log-size errors
2018-09-26 21:10:20 +05:30
Yishuang Lu
faf93b6340
Fix the name of avro schema file in Test
...
Fixed the name of avro schema file in Test
Signed-off-by: Yishuang Lu <luystu@gmail.com>
2018-09-24 21:58:34 +05:30
Balaji Varadarajan
460e24e84b
Hive Sync handling must work for datasets with multi-partition keys
2018-09-20 16:53:26 +05:30
Balaji Varadarajan
5cb28e7b1f
Explicitly release resources in LogFileReader and TestHoodieClientBase
2018-09-20 13:24:57 +05:30
Balaji Varadarajan
2728f96505
Add dummy classes to dump all classes loaded as part of packaging modules to ensure javadoc and sources jars are getting created
2018-09-18 09:24:33 +05:30
Vinoth Chandar
f44bcc5b03
Fix bug with incrementally pulling older data
2018-09-18 02:34:00 +05:30
Vinoth Chandar
bd5af89f12
[maven-release-plugin] rollback the release of hoodie-0.4.4
2018-09-13 15:01:53 +05:30
Vinoth Chandar
d1cc864a43
[maven-release-plugin] prepare for next development iteration
2018-09-12 23:59:47 +05:30
Vinoth Chandar
b748bc836d
[maven-release-plugin] prepare release hoodie-0.4.4
2018-09-12 23:59:34 +05:30
Vinoth Chandar
0b1a949a87
Release notes for 0.4.4
2018-09-12 23:39:57 +05:30
Balaji Varadarajan
cce88b36d2
Use spark Master from environment if set
2018-09-12 01:24:11 +05:30
Balaji Varadarajan
605af8a82f
Reduce minimum delta-commits required for compaction
2018-09-12 01:23:28 +05:30
Balaji Varadarajan
18a39715c9
Bump up versions in packaging modules and remove commons-lang3 dep
2018-09-11 11:03:30 +05:30
Vinoth Chandar
eca49a255e
Rebasing and fixing conflicts against master
2018-09-11 11:03:30 +05:30
Vinoth Chandar
a5359662be
Moving dependencies off CDH to Apache + Hive2 support
...
- Tests redone in the process
- Main changes are to RealtimeRecordReader and how it treats maps/arrays
- Make hive sync work with Hive 1/2 and CDH environments
- Fixes for corner cases in Hive queries
- Spark Hive integration - Working version across Apache and CDH versions
- Known Issue - https://github.com/uber/hudi/issues/439
2018-09-11 11:03:30 +05:30
Nishith Agarwal
2b1af18941
Adding check for rolling stats not present to handle backwards compatibility of existing timeline
2018-09-10 11:53:46 +08:00
Balaji Varadarajan
ea7823a9dd
Docs for describing async compaction and how to operate it
2018-09-10 11:52:20 +08:00
Vinoth Chandar
d58ddbd999
Reworking the deltastreamer tool
...
- Standardize version of jackson
- DFSPropertiesConfiguration replaces usage of commons PropertiesConfiguration
- Remove dependency on ConstructorUtils
- Throw error if ordering value is not present, during key generation
- Switch to shade plugin for hoodie-utilities
- Added support for consuming Confluent Avro Kafka serdes
- Support for Confluent schema registry
- KafkaSource now deals with skews nicely, by doing round robin allocation of source limit across partitions
- Added support for BULK_INSERT operations as well
- Pass in the payload class config properly into HoodieWriteClient
- Fix documentation based on new usage
- Adding tests on deltastreamer, sources and all new util classes.
2018-09-08 10:24:32 +08:00
Balaji Varadarajan
fb95dbdedb
CLI to create and desc hoodie table
2018-09-08 10:03:38 +08:00
Nishith Agarwal
0fe92dee55
Fix an intermittently failing test case in TestMergeOnRead caused by an incorrect previous commit time
2018-09-08 09:39:18 +08:00
Balaji Varadarajan
e2dee68ccd
Simplify and fix CLI to schedule and run compactions
2018-09-07 05:28:13 +08:00
vinoth chandar
fad4b513ea
Update Gemfile.lock with higher ffi version
2018-09-06 08:54:32 +08:00
Nishith Agarwal
459e523d9e
1. Small file size handling for inserts into log files. In summary, the total size of the log file is compared with the Parquet max file size, and if there is room for more inserts, they are added.
2018-09-06 08:52:08 +08:00
Nishith Agarwal
324de298bc
Removing dependency on apache-commons lang 3, adding necessary classes as needed
2018-09-06 08:26:48 +08:00
Saravanan Elumalai
2eaa42abde
Updated jcommander version to fix NPE in HoodieDeltaStreamer tool
2018-08-31 07:28:13 -07:00
Vinoth Chandar
89cd6b0726
[maven-release-plugin] prepare for next development iteration
2018-08-22 21:30:05 -07:00