Commit Graph

325 Commits

Author SHA1 Message Date
vinoth chandar
06bdba3cef Update Gemfile.lock with newer jekyll version 2018-09-29 20:50:03 +05:30
vinothchandar
b5a75fdd91 Adding Jiale & Anbu to contributors list 2018-09-29 20:20:28 +05:30
jiale.tan
98fd97b65f feature(HoodieGlobalBloomIndex): adds a new type of bloom index to allow global record key lookup 2018-09-29 19:55:20 +05:30
vinothchandar
7ba842c0fe [maven-release-plugin] prepare for next development iteration 2018-09-28 11:27:00 +05:30
vinothchandar
5847b61f44 [maven-release-plugin] prepare release hoodie-0.4.4 2018-09-28 11:26:15 +05:30
vinothchandar
05bf14a42e Update RELEASE_NOTES for release 0.4.4 2018-09-28 11:05:24 +05:30
vinothchandar
9ca6f91e97 Perform consistency checks during write finalize
- Check to ensure written files are listable on storage
- Docs reflected to capture how this helps with s3 storage
- Unit tests added, corrections to existing tests
- Fix DeltaStreamer to manage archived commits in a separate folder
2018-09-28 08:04:41 +05:30
Balaji Varadarajan
4c74dd4cad Travis CI tests need to be run in quieter mode (WARN log level) to avoid max log-size errors 2018-09-26 21:10:20 +05:30
Yishuang Lu
faf93b6340 Fix the name of avro schema file in Test
Fixed the name of avro schema file in Test

Signed-off-by: Yishuang Lu <luystu@gmail.com>
2018-09-24 21:58:34 +05:30
Balaji Varadarajan
460e24e84b Hive Sync handling must work for datasets with multi-partition keys 2018-09-20 16:53:26 +05:30
Balaji Varadarajan
5cb28e7b1f Explicitly release resources in LogFileReader and TestHoodieClientBase 2018-09-20 13:24:57 +05:30
Balaji Varadarajan
2728f96505 Add dummy classes to dump all classes loaded as part of packaging modules to ensure javadoc and sources jars are getting created 2018-09-18 09:24:33 +05:30
Vinoth Chandar
f44bcc5b03 Fix bug with incrementally pulling older data 2018-09-18 02:34:00 +05:30
Vinoth Chandar
bd5af89f12 [maven-release-plugin] rollback the release of hoodie-0.4.4 2018-09-13 15:01:53 +05:30
Vinoth Chandar
d1cc864a43 [maven-release-plugin] prepare for next development iteration 2018-09-12 23:59:47 +05:30
Vinoth Chandar
b748bc836d [maven-release-plugin] prepare release hoodie-0.4.4 2018-09-12 23:59:34 +05:30
Vinoth Chandar
0b1a949a87 Release notes for 0.4.4 2018-09-12 23:39:57 +05:30
Balaji Varadarajan
cce88b36d2 Use spark Master from environment if set 2018-09-12 01:24:11 +05:30
Balaji Varadarajan
605af8a82f Reduce minimum delta-commits required for compaction 2018-09-12 01:23:28 +05:30
Balaji Varadarajan
18a39715c9 Bump up versions in packaging modules and remove commons-lang3 dep 2018-09-11 11:03:30 +05:30
Vinoth Chandar
eca49a255e Rebasing and fixing conflicts against master 2018-09-11 11:03:30 +05:30
Vinoth Chandar
a5359662be Moving dependencies off cdh to apache + Hive2 support
- Tests redone in the process
- Main changes are to RealtimeRecordReader and how it treats maps/arrays
- Make hive sync work with Hive 1/2 and CDH environments
- Fixes to make corner cases for Hive queries
- Spark Hive integration - Working version across Apache and CDH versions
- Known Issue - https://github.com/uber/hudi/issues/439
2018-09-11 11:03:30 +05:30
Nishith Agarwal
2b1af18941 Adding check for rolling stats not present to handle backwards compatibility of existing timeline 2018-09-10 11:53:46 +08:00
Balaji Varadarajan
ea7823a9dd Docs for describing async compaction and how to operate it 2018-09-10 11:52:20 +08:00
Vinoth Chandar
d58ddbd999 Reworking the deltastreamer tool
- Standardize version of jackson
- DFSPropertiesConfiguration replaces usage of commons PropertiesConfiguration
- Remove dependency on ConstructorUtils
- Throw error if ordering value is not present, during key generation
- Switch to shade plugin for hoodie-utilities
- Added support for consumption for Confluent avro kafka serdes
- Support for Confluent schema registry
- KafkaSource now deals with skews nicely, by doing round robin allocation of source limit across partitions
- Added support for BULK_INSERT operations as well
- Pass in the payload class config properly into HoodieWriteClient
- Fix documentation based on new usage
- Adding tests on deltastreamer, sources and all new util classes.
2018-09-08 10:24:32 +08:00
Balaji Varadarajan
fb95dbdedb CLI to create and desc hoodie table 2018-09-08 10:03:38 +08:00
Nishith Agarwal
0fe92dee55 Fix an intermittently failing test case in TestMergeOnRead due to incorrect prev commit time 2018-09-08 09:39:18 +08:00
Balaji Varadarajan
e2dee68ccd Simplify and fix CLI to schedule and run compactions 2018-09-07 05:28:13 +08:00
vinoth chandar
fad4b513ea Update Gemfile.lock with higher ffi version 2018-09-06 08:54:32 +08:00
Nishith Agarwal
459e523d9e 1. Small file size handling for inserts into log files. In summary, the total size of the log file is compared with the parquet max file size, and if there is scope to add inserts, then they are added. 2018-09-06 08:52:08 +08:00
Nishith Agarwal
324de298bc Removing dependency on apache-commons lang 3, adding necessary classes as needed 2018-09-06 08:26:48 +08:00
Saravanan Elumalai
2eaa42abde Updated jcommander version to fix NPE in HoodieDeltaStreamer tool 2018-08-31 07:28:13 -07:00
Vinoth Chandar
89cd6b0726 [maven-release-plugin] prepare for next development iteration 2018-08-22 21:30:05 -07:00
Vinoth Chandar
8d305c5a86 [maven-release-plugin] prepare release hoodie-0.4.3 2018-08-22 21:29:53 -07:00
Vinoth Chandar
6fffda5c70 Update Release notes for 0.4.3 release 2018-08-22 21:11:43 -07:00
Kaushik Devarajaiah
e624480259 Throttling to limit QPS from HbaseIndex 2018-08-21 21:10:38 -07:00
Nishith Agarwal
3746ace76a Fixing Null pointer exception in finally block 2018-08-21 21:07:53 -07:00
Nishith Agarwal
88274b8261 Adding another metric to HoodieWriteStat to determine if there were inserts converted to updates, added one test for this 2018-08-14 06:22:16 -07:00
Balaji Varadarajan
989afddd54 BUGFIX - Use Guava Optional (which is Serializable) in CompactionOperation wcached to avoid NoSerializableException 2018-08-08 06:00:55 -07:00
Balaji Varadarajan
ea23c9b7a0 Minor bug fixes found during testing 2018-08-07 08:19:50 -07:00
Balaji Varadarajan
594059a19c Add CLI support inspect, schedule and run compaction 2018-08-07 08:19:50 -07:00
Balaji Varadarajan
2e12c86d01 Ensure Compaction Operation compacts the data file as defined in the workload 2018-08-07 08:19:50 -07:00
Balaji Varadarajan
2f8ce93030 Async Compaction Main API changes 2018-08-07 08:19:50 -07:00
Balaji Varadarajan
9b78523d62 Ensure Cleaner and Archiver do not delete file-slices and workload marked for compaction 2018-08-07 08:19:50 -07:00
Balaji Varadarajan
0a0451a765 Ensure Compaction workload is stored in write-once meta-data files separate from timeline files.
This avoids concurrency issues when compactor(s) and ingestor are running in parallel.
In the Next PR -> Safety concern regarding Cleaner retaining all meta-data and file-slices for pending compactions will be addressed
2018-08-07 08:19:50 -07:00
Balaji Varadarajan
9d99942564 Track fileIds with pending compaction in FileSystemView to provide correct API semantics 2018-08-07 08:19:50 -07:00
Balaji Varadarajan
1b61f04e05 (1) Define CompactionWorkload in avro to allow storing them in instant files.
(2) Split APIs in HoodieRealtimeCompactor to separate generating compaction workload from running compaction
2018-08-07 08:19:50 -07:00
Balaji Varadarajan
6d01ae8ca0 FileSystemView and Timeline level changes to support Async Compaction 2018-08-07 08:19:50 -07:00
Nishith Agarwal
44caf0d40c Fixing missing hoodie record location in HoodieRecord when record is read from disk after being spilled 2018-07-18 12:53:35 -07:00
Omkar Joshi
f62890ca1f adding setters so that subclasses can set it 2018-07-18 12:53:11 -07:00