Vinoth Chandar
a1f287d359
Release notes for 0.4.7
2019-05-28 18:28:59 -07:00
Balaji Varadarajan
93f8f12a30
HUDI-135 - Skip Meta folder when looking for partitions
2019-05-28 17:37:10 -07:00
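HUDI-135 is about partition discovery wandering into the table's metadata folder. Below is a minimal sketch of the filtering step. It assumes partitions arrive as relative directory paths under the base path and that the metadata folder is named `.hoodie`. The class and method names are hypothetical and do not reflect the actual FSUtils code.

```java
import java.util.ArrayList;
import java.util.List;

class PartitionFilterSketch {
    // Name of the table metadata folder under the base path (assumed ".hoodie").
    static final String META_FOLDER = ".hoodie";

    // Returns candidate partition paths, dropping the meta folder and anything under it.
    static List<String> filterPartitions(List<String> relativePaths) {
        List<String> partitions = new ArrayList<>();
        for (String p : relativePaths) {
            if (p.equals(META_FOLDER) || p.startsWith(META_FOLDER + "/")) {
                continue; // metadata, not a data partition
            }
            partitions.add(p);
        }
        return partitions;
    }

    public static void main(String[] args) {
        List<String> found = filterPartitions(List.of(
            "2019/05/27", ".hoodie", ".hoodie/.aux", "2019/05/28"));
        System.out.println(found); // [2019/05/27, 2019/05/28]
    }
}
```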
Balaji Varadarajan
33f5208c1e
Only inflight commit timeline (.commit/.deltacommit) must be used when checking for sanity during compaction scheduling
2019-05-28 16:54:20 -07:00
Balaji Varadarajan
9c8f8212ef
HUDI-134 - Disable inline compaction for Hoodie Demo
2019-05-28 11:19:48 -07:00
Balaji Varadarajan
d0d2fa0337
Reduce logging in unit-test runs
2019-05-24 23:43:54 -07:00
Venkat
f2d91a455e
default implementation for HBase index qps allocator (#685)
* default implementation and configs for HBase index qps allocator
* Test for QPS allocator and address CR
* fix QPS allocator test
2019-05-24 18:43:46 -07:00
Balaji Varadarajan
99b0c72aa6
HUDI-131 Zero File Listing in Compactor run
2019-05-24 18:34:14 -07:00
Vinoth Chandar
4074c5eb23
Fixed HUDI-116 : Handle duplicate record keys across partitions
- Join based on HoodieKey and not RecordKey during tagging
- Unit tests changed to run with duplicate keys
- Special casing GlobalBloom to still join by recordkey
2019-05-24 18:32:49 -07:00
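HUDI-116 depends on what the tagging join treats as the key. The minimal sketch below uses a hypothetical `Key` record as a stand-in for HoodieKey: it pairs a record key with its partition path, so the same record key in two partitions stays distinct. Keying on the record key alone collapses both into one entry. The sketch assumes Java 16+ for records and is not Hudi's actual tagging code.

```java
import java.util.HashMap;
import java.util.Map;

class KeyJoinSketch {
    // Hypothetical stand-in for HoodieKey: record key plus partition path.
    record Key(String recordKey, String partitionPath) {}

    // Re-keys locations by record key only; duplicates across partitions overwrite each other.
    static Map<String, String> byRecordKey(Map<Key, String> locations) {
        Map<String, String> m = new HashMap<>();
        locations.forEach((k, fileId) -> m.put(k.recordKey(), fileId));
        return m;
    }

    public static void main(String[] args) {
        Map<Key, String> locations = new HashMap<>();
        locations.put(new Key("id-1", "2019/05/23"), "file-A");
        locations.put(new Key("id-1", "2019/05/24"), "file-B");

        // Keyed on record key + partition path, both entries survive.
        System.out.println(locations.size());              // 2
        // Keyed on record key alone, only one entry for "id-1" remains.
        System.out.println(byRecordKey(locations).size()); // 1
        // Tagging an incoming record by its full key finds the right file.
        System.out.println(locations.get(new Key("id-1", "2019/05/24"))); // file-B
    }
}
```

Collapsing by record key is what a global index wants, since it treats record keys as unique across all partitions. That is presumably why GlobalBloom is special-cased.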
leiline
f120427607
HUDI-105 : Fix up offsets not available on leader exception (#650)
* Fix up offsets not available on leader exception
2019-05-23 19:32:31 -07:00
Balaji Varadarajan
2fe526d548
Allow users to set hoodie configs for Compactor, Cleaner and HDFSParquetImporter utility scripts
2019-05-23 17:35:53 -07:00
Balaji Varadarajan
145034c5fa
Spark Stage retry handling
2019-05-21 14:49:51 -07:00
David Muto (pseudomuto)
3fd2fd6e9d
Remove redundant string from file comp rdd
2019-05-21 13:07:32 -07:00
Balaji Varadarajan
a7e6cf5197
Support nested types for recordKey, partitionPath and combineKey
2019-05-18 07:14:58 -07:00
Vinoth Chandar
e43efa042f
Downgrading fasterxml jackson to 2.6.7 to be spark compatible
2019-05-16 13:53:54 -07:00
Balaji Varadarajan
64fec64097
Timeline Service with Incremental View Syncing support
2019-05-16 13:25:33 -07:00
vinothchandar
446f99aa0f
[maven-release-plugin] prepare for next development iteration
2019-05-14 07:29:22 -07:00
vinothchandar
cc38abecc8
[maven-release-plugin] prepare release hoodie-0.4.6
2019-05-14 07:29:11 -07:00
Vinoth Chandar
7002ca6775
Update release notes for 0.4.6 release
2019-05-14 05:16:58 -07:00
Balaji Varadarajan
6e1e626357
Minor CLI documentation change in delta-streamer
2019-05-14 04:05:47 -07:00
Nishith Agarwal
af46078a82
converting map task memory from mb to bytes
2019-05-13 21:23:30 -07:00
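The map-task memory fix is a unit conversion. Below is a minimal sketch, assuming binary megabytes (1 MB = 1024 × 1024 bytes) and a hypothetical helper name. The multiplication must be done in `long`, because the same product in `int` overflows once the value reaches 2048 MB.

```java
class MemoryUnitsSketch {
    // Converts megabytes to bytes using long arithmetic to avoid int overflow.
    static long mbToBytes(long mb) {
        return mb * 1024L * 1024L;
    }

    public static void main(String[] args) {
        System.out.println(mbToBytes(1024)); // 1073741824
        System.out.println(mbToBytes(4096)); // 4294967296
        // The same product in int arithmetic wraps around to a wrong value.
        int wrapped = 4096 * 1024 * 1024;
        System.out.println(wrapped);         // 0
    }
}
```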
Balaji Varadarajan
9cce9abf4d
Fix various errors found by long running delta-streamer tests
1. Parquet Avro schema mismatch errors during ingestion were sometimes silently ignored due to a race condition in BoundedInMemoryExecutor. This was reproducible by running a long-running delta-streamer with a wrong schema, and it caused data loss.
2. Fix Delta-Streamer to error out by default if there are any error records.
3. Fix a bug in tracking write errors in WriteStats. Previously, WriteStats tracked sampled errors instead of total errors.
4. Delta Streamer did not commit changes made by inline compaction because auto-commit is force-disabled. Always auto-commit inline compaction, since it would otherwise never be committed.
2019-05-13 10:47:34 -07:00
Vinoth Chandar
a0e62b7919
Bucketized Bloom Filter checking
- Tackles the skew seen in sort based partitioning/checking
- Parameterized the HoodieBloomIndex test
- Config to turn on/off (on by default)
- Unit tests & also tested at scale
2019-05-11 16:38:28 -07:00
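The bucketized approach splits each file's key comparisons into fixed-size buckets, so that a heavily hit file does not pile all of its work onto one Spark partition. Below is a minimal sketch of that idea. It assumes the number of candidate comparisons per file is already known and deals buckets round-robin across partitions. The bucket size and all names are hypothetical, and this is not HoodieBloomIndex's actual partitioner.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

class BucketizedCheckSketch {
    // Splits each file's comparison count into buckets of at most keysPerBucket,
    // then deals the buckets round-robin across numPartitions.
    // Returns the comparison load assigned to each partition.
    static long[] assign(Map<String, Long> comparisonsPerFile, long keysPerBucket, int numPartitions) {
        long[] load = new long[numPartitions];
        int next = 0;
        for (Map.Entry<String, Long> e : comparisonsPerFile.entrySet()) {
            long remaining = e.getValue();
            while (remaining > 0) {
                long bucket = Math.min(remaining, keysPerBucket);
                load[next] += bucket;
                next = (next + 1) % numPartitions;
                remaining -= bucket;
            }
        }
        return load;
    }

    public static void main(String[] args) {
        Map<String, Long> comparisons = new LinkedHashMap<>();
        comparisons.put("hot-file", 900L);  // one skewed file
        comparisons.put("cold-file", 100L);
        // Without bucketing, one partition would do all 900 hot-file comparisons.
        System.out.println(Arrays.toString(assign(comparisons, 100L, 4))); // [300, 300, 200, 200]
    }
}
```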
David Muto (pseudomuto)
4b27cc72bb
Don't raise when spark-defaults.conf doesn't exist
2019-05-08 17:30:23 -07:00
Abhishek Sharma
e2dcef8606
HUDI-101: added exclusion filters for signature files.
2019-05-07 18:35:18 -07:00
Omkar Joshi
738635306b
migrating kryo's dependency from twitter chill to plain kryo library
2019-05-06 20:32:00 -07:00
Nishith Agarwal
a33a55fcb5
Caching Avro Binary encoder/decoder to avoid creating a new one for every record
2019-05-06 11:28:08 -07:00
Balaji Varadarajan
ee1feb7c75
Revert "HUDI-101: added maven-shade plugin with filters."
Creates fat jars for all hoodie packages
This reverts commit f47f0eb6cb.
2019-05-05 18:39:38 -07:00
Abhishek Sharma
f47f0eb6cb
HUDI-101: added maven-shade plugin with filters.
2019-05-03 13:49:51 -07:00
Balaji Varadarajan
978470af33
Rollback inflights when using Spark [Streaming] write
2019-05-02 12:51:02 -07:00
vinothchandar
57a8b9cc8c
Making DataSource/DeltaStreamer use defaults for combining
- Addresses an issue where inserts would combine and remove duplicates within a batch
- Setting default insert combining to false (write client default)
- Set to true if filtering duplicates on insert/bulk_insert
2019-05-01 13:21:21 -07:00
Vinoth Chandar
ea20d47248
Introduce config to control interval tree pruning
- turned on by default
- Minor code refactoring/restructuring
2019-04-29 11:38:23 -07:00
Sivabalan Narayanan
7129dc5bb7
Improving Tag location using interval trees for index files
Adding interface for index look up
Adding index filtering implementations for global bloom index too
2019-04-29 11:38:23 -07:00
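Index-file pruning records each file's minimum and maximum record key and compares an incoming key only against files whose range contains it. Below is a minimal sketch of that range check, assuming string keys compared lexicographically and Java 16+ for records. It uses a linear scan for clarity. The interval tree introduced by these commits answers the same question without visiting every file. All names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class KeyRangePruningSketch {
    // Hypothetical per-file key range (min and max record key, inclusive).
    record FileRange(String fileId, String minKey, String maxKey) {}

    // Returns the files whose [minKey, maxKey] range could contain the key.
    // Linear scan for clarity; an interval tree avoids visiting every range.
    static List<String> candidateFiles(List<FileRange> ranges, String key) {
        List<String> candidates = new ArrayList<>();
        for (FileRange r : ranges) {
            if (key.compareTo(r.minKey()) >= 0 && key.compareTo(r.maxKey()) <= 0) {
                candidates.add(r.fileId());
            }
        }
        return candidates;
    }

    public static void main(String[] args) {
        List<FileRange> ranges = List.of(
            new FileRange("f1", "a000", "a999"),
            new FileRange("f2", "b000", "b999"),
            new FileRange("f3", "a500", "b500"));
        System.out.println(candidateFiles(ranges, "a700")); // [f1, f3]
        System.out.println(candidateFiles(ranges, "c100")); // []
    }
}
```

Only the surviving candidate files then need their bloom filters checked.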
Naoki Takezoe
461ce18bd1
Fix to enable hoodie.datasource.read.incr.filters
2019-04-26 11:14:06 -07:00
Nishith Agarwal
26f24b6728
Removing OLD MAGIC header since (a) it's no longer used and (b) it causes issues when the data actually contains OLD MAGIC
2019-04-25 20:47:16 -07:00
Balaji Varadarajan
2f1e3e15fb
Revert "Read and apply schema for each log block from the metadata header instead of the latest schema"
This reverts commit 9e7ce19b06.
2019-04-18 08:54:34 -07:00
lyogev
9ef51deb84
Add empty payload class to support deletes via apache spark
2019-04-17 23:00:20 -07:00
Balaji Varadarajan
243c58f77c
Move to apachehudi dockerhub repository & use openjdk docker containers
2019-04-17 16:37:58 -07:00
Balaji Varadarajan
36ef94004e
Fix Hive RT query failure in hoodie demo
2019-04-17 16:36:32 -07:00
Omkar Joshi
e35d24f31d
Revert "Replacing Apache commons-lang3 object serializer with Kryo serializer"
This reverts commit a6c45feb2c.
2019-04-17 09:23:37 -07:00
Nishith Agarwal
9e7ce19b06
Read and apply schema for each log block from the metadata header instead of the latest schema
2019-04-16 17:20:03 -07:00
Bhavani Sudha Saktheeswaran
83b6aa5e91
Fix multiple issues when using build_local_docker_images for setting up the demo
Details here - https://issues.apache.org/jira/browse/HUDI-98
2019-04-15 10:10:05 -07:00
Nishith Agarwal
a8feee9293
Performing commit archiving in batches to avoid keeping a huge chunk in memory
2019-04-10 15:17:04 -07:00
Balaji Varadarajan
b07110b9fd
Essential Hive packages missing in hoodie spark bundle
2019-04-09 21:42:42 -07:00
Nishith Agarwal
2577014617
1. Minor changes to fix compaction 2. Adding 2 compaction policies
2019-04-03 17:38:17 -07:00
Jing Chen
d1d33f725e
[HUDI-66] FSUtils.getRelativePartitionPath does not handle repeated folder names
2019-04-03 17:37:03 -07:00
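HUDI-66 reflects a common pitfall: locating the base path inside a full path by searching for its last folder name, which breaks when that name repeats deeper in the path. Below is a minimal sketch that contrasts such a naive approach with stripping the base path as a whole prefix. It assumes POSIX-style paths and uses `java.nio.file.Path.relativize` as a stand-in for the Hadoop `Path` handling in FSUtils. Both helper methods are hypothetical and are not the original FSUtils code.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

class RelativePartitionPathSketch {
    // Naive approach: find the base path's last folder name inside the full path.
    // If that name repeats (e.g. a partition folder also called "data"), it cuts at the wrong spot.
    static String naive(String basePath, String fullPath) {
        String baseName = Paths.get(basePath).getFileName().toString();
        int idx = fullPath.lastIndexOf(baseName);
        return fullPath.substring(idx + baseName.length() + 1);
    }

    // Safer approach: strip the base path as a whole path prefix.
    static String relative(String basePath, String fullPath) {
        Path base = Paths.get(basePath);
        Path full = Paths.get(fullPath);
        if (!full.startsWith(base)) {
            throw new IllegalArgumentException(fullPath + " is not under " + basePath);
        }
        return base.relativize(full).toString();
    }

    public static void main(String[] args) {
        String base = "/tmp/data";
        String full = "/tmp/data/2019/data/05";
        System.out.println(naive(base, full));    // 05 (wrong)
        System.out.println(relative(base, full)); // 2019/data/05
    }
}
```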
Vinoth Chandar
b34a204a52
Fixing small file handling, inline compaction defaults
- Small file limit is now 100MB by default
- Turned on inline compaction by default for MOR
- Changes take effect on DataSource and DeltaStreamer
2019-04-03 10:56:10 -07:00
Vinoth Chandar
51f4908989
Follow up HUDI-27 : Call super.close() in HoodieWrapperFileSystem::close()
2019-04-02 21:31:41 -07:00
Vinoth Chandar
5847f0c934
Fix HUDI-27 : Support num_cores > 1 for writing through spark
- Jobs with spark.executor.cores > 1 used to fail with "FileSystem closed"
- This is due to HoodieWrapperFileSystem closing the wrapped filesystem object
- FileSystem.getInternal caching code races threads and closes the extra fs instance(s)
- Bumped up num cores in tests to 8, speeds up tests by 3-4 mins
2019-03-28 15:56:21 -07:00
Vinoth Chandar
f1410bfdcd
Fixes HUDI-38: Reduce memory overhead of WriteStatus
- For implicit indexes (e.g. BloomIndex), don't buffer up written records
- By default, only collect 10% of failing records to avoid OOMs
- Improves debuggability via above, since data errors can now show up in collect()
- Unit tests & fixing subclasses & adjusting tests
2019-03-28 10:32:59 -07:00
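Keeping only a fraction of failing records works by counting every error while retaining only a random sample of the failed records. Below is a minimal sketch of that pattern, assuming a 10% sampling fraction as in the commit message and a seeded `Random` for repeatability. The class and method names are hypothetical and are not WriteStatus's actual fields.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

class SampledErrorsSketch {
    private final double sampleFraction;
    private final Random random;
    private long totalErrors = 0;                                   // every failure is counted
    private final List<String> sampledFailures = new ArrayList<>(); // only a sample is kept

    SampledErrorsSketch(double sampleFraction, long seed) {
        this.sampleFraction = sampleFraction;
        this.random = new Random(seed);
    }

    void markFailure(String recordKey) {
        totalErrors++;
        // Keep the failed record only with probability sampleFraction, bounding memory.
        if (random.nextDouble() < sampleFraction) {
            sampledFailures.add(recordKey);
        }
    }

    long totalErrors() { return totalErrors; }

    List<String> sampledFailures() { return sampledFailures; }

    public static void main(String[] args) {
        SampledErrorsSketch status = new SampledErrorsSketch(0.1, 42L);
        for (int i = 0; i < 10_000; i++) {
            status.markFailure("key-" + i);
        }
        // The error count is exact; the retained sample is roughly 10% of it.
        System.out.println(status.totalErrors()); // 10000
        System.out.println(status.sampledFailures().size());
    }
}
```

Reporting `totalErrors()` rather than the sample size avoids the sampled-versus-total confusion fixed in WriteStats by commit 9cce9abf4d.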
Vinoth Chandar
e56c1612e4
Fixed HUDI-87 : Remove schemastr from BaseAvroPayload
2019-03-27 23:03:25 -07:00