Nishith Agarwal
3d9041e216
Fixing source schema and writer schema distinction in payloads
2019-03-26 19:44:27 -07:00
ambition119
395806fc68
[HUDI-63] Removed unused BucketedIndex code
2019-03-26 10:12:47 -07:00
Nishith Agarwal
9e59da7fd9
Refactor HoodieTable Rollback to write one rollback instant for a batch of commits to rollback
2019-03-19 10:10:16 -07:00
Nishith Agarwal
0dd4a90b03
Enable multi/nested rollbacks for MOR table type
2019-03-19 10:10:16 -07:00
kaka11chen
48797b1ae1
Add compression codec configurations for HoodieParquetWriter.
2019-03-18 07:48:20 -07:00
Omkar Joshi
4a8bec7ea5
Handling duplicate record update for single partition (duplicates in single or different parquet files)
2019-03-10 20:15:17 -07:00
Balaji Varadarajan
3ae6cb4ed5
FileSystem View must treat same fileIds present in different partitions as different file-groups and handle pending compaction correctly
2019-03-01 10:49:04 -08:00
vinothchandar
687395e40f
[maven-release-plugin] prepare for next development iteration
2019-02-27 07:16:27 -08:00
vinothchandar
bbf40ef987
[maven-release-plugin] prepare release hoodie-0.4.5
2019-02-27 07:16:15 -08:00
Balaji Varadarajan
8adaca3454
Table rollback for inflight compactions MUST not delete instant files at any time to avoid race conditions
2019-02-11 18:30:21 -08:00
Balaji Varadarajan
3a0044216c
New Features in DeltaStreamer:
...
(1) Apply transformation when using delta-streamer to ingest data.
(2) Add Hudi Incremental Source for Delta Streamer
(3) Allow delta-streamer config-property to be passed as command-line
(4) Add Hive Integration to Delta-Streamer and address Review comments
(5) Ensure MultiPartKeysValueExtractor handles Hive-style partition description
(6) Reuse same spark session on both source and transformer
(7) Support extracting partition fields from _hoodie_partition_path for HoodieIncrSource
(8) Reuse Binary Avro coders
(9) Add push down filter for Incremental source
(10) Add Hoodie DeltaStreamer metrics to track total time taken
2019-02-11 18:22:05 -08:00
Nishith Agarwal
7985eb72b5
Fixing behavior of Merge/CreateHandle for invalid/wrong schema records
2019-01-28 16:01:03 -08:00
Nishith Agarwal
994d42d307
Cleaner should now use commit timeline and not include deltacommits
2019-01-28 10:46:33 -08:00
Nishith Agarwal
68723764ed
Adding compaction to HoodieClient example
2019-01-28 10:23:44 -08:00
Nishith Agarwal
169e3f66bb
Filtering partition paths before performing a list status on all partitions
2019-01-25 11:34:00 -08:00
Nishith Agarwal
110df7190b
Enabling hard deletes for MergeOnRead table type
2018-12-31 12:49:58 -08:00
arukavytsia
6946dd7557
General enhancements
2018-12-18 12:52:39 -08:00
Balaji Varadarajan
30c5f8b7bd
Ensure Hoodie works for non-partitioned Hive table
2018-12-12 13:35:16 -08:00
xubo245
466ff73ffb
Fix some spelling errors in Hudi
2018-12-12 13:06:25 -08:00
Nishith Agarwal
7243ce40c9
Serializing the complete payload object instead of serializing just the GenericRecord
...
Removing Converter hierarchy as we now depend purely on JavaSerialization and require the payload to be java serializable
2018-12-04 11:43:41 -08:00
Nishith Agarwal
e83dde3b95
Returning empty Statuses for an empty Spark partition caused by incorrect bin packing
2018-12-04 11:41:38 -08:00
Balaji Varadarajan
f999e4960c
Avoid WriteStatus collect() call when committing batch
2018-11-28 10:41:49 -08:00
Vinoth Chandar
fa65db9c4c
Explicitly handle lack of append() support during LogWriting
2018-11-27 17:58:43 -08:00
Nishith Agarwal
d0fde47458
Fixing number of insert buckets to be generated by rounding off to the closest greater integer
2018-11-15 10:04:45 -08:00
Vinoth Chandar
1362942aa3
Enabling auto tuning of insert splits by default
2018-11-08 09:48:23 -08:00
Balaji Varadarajan
07324e7a20
Compaction validate, unschedule and repair
2018-10-25 14:12:47 -07:00
jiale.tan
1628d044ac
feat(SparkDataSource): add additional feature to drop later arriving dups
2018-10-16 11:52:50 -07:00
jiale.tan
98fd97b65f
feature(HoodieGlobalBloomIndex): adds a new type of bloom index to allow global record key lookup
2018-09-29 19:55:20 +05:30
vinothchandar
7ba842c0fe
[maven-release-plugin] prepare for next development iteration
2018-09-28 11:27:00 +05:30
vinothchandar
5847b61f44
[maven-release-plugin] prepare release hoodie-0.4.4
2018-09-28 11:26:15 +05:30
vinothchandar
9ca6f91e97
Perform consistency checks during write finalize
...
- Check to ensure written files are listable on storage
- Docs reflected to capture how this helps with s3 storage
- Unit tests added, corrections to existing tests
- Fix DeltaStreamer to manage archived commits in a separate folder
2018-09-28 08:04:41 +05:30
Balaji Varadarajan
4c74dd4cad
Travis CI tests need to be run in quieter mode (WARN log level) to avoid max log-size errors
2018-09-26 21:10:20 +05:30
Balaji Varadarajan
5cb28e7b1f
Explicitly release resources in LogFileReader and TestHoodieClientBase
2018-09-20 13:24:57 +05:30
Vinoth Chandar
bd5af89f12
[maven-release-plugin] rollback the release of hoodie-0.4.4
2018-09-13 15:01:53 +05:30
Vinoth Chandar
d1cc864a43
[maven-release-plugin] prepare for next development iteration
2018-09-12 23:59:47 +05:30
Vinoth Chandar
b748bc836d
[maven-release-plugin] prepare release hoodie-0.4.4
2018-09-12 23:59:34 +05:30
Balaji Varadarajan
605af8a82f
Reduce minimum delta-commits required for compaction
2018-09-12 01:23:28 +05:30
Vinoth Chandar
eca49a255e
Rebasing and fixing conflicts against master
2018-09-11 11:03:30 +05:30
Vinoth Chandar
a5359662be
Moving dependencies off CDH to Apache + Hive2 support
...
- Tests redone in the process
- Main changes are to RealtimeRecordReader and how it treats maps/arrays
- Make hive sync work with Hive 1/2 and CDH environments
- Fixes for corner cases in Hive queries
- Spark Hive integration - Working version across Apache and CDH versions
- Known Issue - https://github.com/uber/hudi/issues/439
2018-09-11 11:03:30 +05:30
Nishith Agarwal
2b1af18941
Adding check for rolling stats not present to handle backwards compatibility of existing timeline
2018-09-10 11:53:46 +08:00
Vinoth Chandar
d58ddbd999
Reworking the deltastreamer tool
...
- Standardize version of jackson
- DFSPropertiesConfiguration replaces usage of commons PropertiesConfiguration
- Remove dependency on ConstructorUtils
- Throw error if ordering value is not present, during key generation
- Switch to shade plugin for hoodie-utilities
- Added support for consuming Confluent Avro Kafka serdes
- Support for Confluent schema registry
- KafkaSource now deals with skews nicely, by doing round robin allocation of source limit across partitions
- Added support for BULK_INSERT operations as well
- Pass in the payload class config properly into HoodieWriteClient
- Fix documentation based on new usage
- Adding tests on deltastreamer, sources and all new util classes.
2018-09-08 10:24:32 +08:00
Nishith Agarwal
0fe92dee55
Fix an intermittently failing test case in TestMergeOnRead due to incorrect prev commit time
2018-09-08 09:39:18 +08:00
Nishith Agarwal
459e523d9e
Small file size handling for inserts into log files. In summary, the total size of the log file is compared with the parquet max file size, and if there is room for more inserts, they are added.
2018-09-06 08:52:08 +08:00
Nishith Agarwal
324de298bc
Removing dependency on apache-commons lang 3, adding necessary classes as needed
2018-09-06 08:26:48 +08:00
Vinoth Chandar
89cd6b0726
[maven-release-plugin] prepare for next development iteration
2018-08-22 21:30:05 -07:00
Vinoth Chandar
8d305c5a86
[maven-release-plugin] prepare release hoodie-0.4.3
2018-08-22 21:29:53 -07:00
Kaushik Devarajaiah
e624480259
Throttling to limit QPS from HbaseIndex
2018-08-21 21:10:38 -07:00
Nishith Agarwal
3746ace76a
Fixing Null pointer exception in finally block
2018-08-21 21:07:53 -07:00
Nishith Agarwal
88274b8261
Adding another metric to HoodieWriteStat to determine if there were inserts converted to updates, added one test for this
2018-08-14 06:22:16 -07:00
Balaji Varadarajan
ea23c9b7a0
Minor bug fixes found during testing
2018-08-07 08:19:50 -07:00