Commit Graph

413 Commits

Author SHA1 Message Date
David Muto (pseudomuto)
4b27cc72bb Don't raise when spark-defaults.conf doesn't exist 2019-05-08 17:30:23 -07:00
Abhishek Sharma
e2dcef8606 HUDI-101: added exclusion filters for signature files. 2019-05-07 18:35:18 -07:00
Omkar Joshi
738635306b migrating kryo's dependency from twitter chill to plain kryo library 2019-05-06 20:32:00 -07:00
Nishith Agarwal
a33a55fcb5 Caching Avro Binary encoder/decoder to avoid creating new one for every record 2019-05-06 11:28:08 -07:00
Balaji Varadarajan
ee1feb7c75 Revert "HUDI-101: added maven-shade plugin with filters."
Creates fat jars for all hoodie packages

This reverts commit f47f0eb6cb.
2019-05-05 18:39:38 -07:00
Abhishek Sharma
f47f0eb6cb HUDI-101: added maven-shade plugin with filters. 2019-05-03 13:49:51 -07:00
Balaji Varadarajan
978470af33 Rollback inflights when using Spark [Streaming] write 2019-05-02 12:51:02 -07:00
vinothchandar
57a8b9cc8c Making DataSource/DeltaStreamer use defaults for combining
- Addresses issue where insert will combine and remove duplicates within batch
- Setting default insert combining to false (write client default)
- Set to true if filtering duplicates on insert/bulk_insert
2019-05-01 13:21:21 -07:00
Vinoth Chandar
ea20d47248 Introduce config to control interval tree pruning
- turned on by default
- Minor code refactoring/restructuring
2019-04-29 11:38:23 -07:00
Sivabalan Narayanan
7129dc5bb7 Improving Tag location using interval trees for index files
Adding interface for index look up

Adding index filtering implementations for global bloom index too
2019-04-29 11:38:23 -07:00
Naoki Takezoe
461ce18bd1 Fix to enable hoodie.datasource.read.incr.filters 2019-04-26 11:14:06 -07:00
Nishith Agarwal
26f24b6728 Removing OLD MAGIC header since a) it's no longer used b) causes issues when the data actually has OLD MAGIC 2019-04-25 20:47:16 -07:00
Balaji Varadarajan
2f1e3e15fb Revert "Read and apply schema for each log block from the metadata header instead of the latest schema"
This reverts commit 9e7ce19b06.
2019-04-18 08:54:34 -07:00
lyogev
9ef51deb84 Add empty payload class to support deletes via apache spark 2019-04-17 23:00:20 -07:00
Balaji Varadarajan
243c58f77c Move to apachehudi dockerhub repository & use openjdk docker containers 2019-04-17 16:37:58 -07:00
Balaji Varadarajan
36ef94004e Fix Hive RT query failure in hoodie demo 2019-04-17 16:36:32 -07:00
Omkar Joshi
e35d24f31d Revert "Replacing Apache commons-lang3 object serializer with Kryo serializer"
This reverts commit a6c45feb2c.
2019-04-17 09:23:37 -07:00
Nishith Agarwal
9e7ce19b06 Read and apply schema for each log block from the metadata header instead of the latest schema 2019-04-16 17:20:03 -07:00
Bhavani Sudha Saktheeswaran
83b6aa5e91 Fix multiple issues when using build_local_docker_images for setting up the demo
Details here - https://issues.apache.org/jira/browse/HUDI-98
2019-04-15 10:10:05 -07:00
Nishith Agarwal
a8feee9293 Performing commit archiving in batches to avoid keeping a huge chunk in memory 2019-04-10 15:17:04 -07:00
Balaji Varadarajan
b07110b9fd Essential Hive packages missing in hoodie spark bundle 2019-04-09 21:42:42 -07:00
Nishith Agarwal
2577014617 1. Minor changes to fix compaction 2. Adding 2 compaction policies 2019-04-03 17:38:17 -07:00
Jing Chen
d1d33f725e [HUDI-66] FSUtils.getRelativePartitionPath does not handle repeated folder names 2019-04-03 17:37:03 -07:00
Vinoth Chandar
b34a204a52 Fixing small file handling, inline compaction defaults
- Small file limit is now 100MB by default
- Turned on inline compaction by default for MOR
- Changes take effect on DataSource and DeltaStreamer
2019-04-03 10:56:10 -07:00
Vinoth Chandar
51f4908989 Follow up HUDI-27 : Call super.close() in HoodieWrapperFileSystem::close() 2019-04-02 21:31:41 -07:00
Vinoth Chandar
5847f0c934 Fix HUDI-27 : Support num_cores > 1 for writing through spark
- Users using spark.executor.cores > 1 used to fail due to "FileSystem closed"
- This is due to HoodieWrapperFileSystem closing the wrapped filesystem obj
- FileSystem.getInternal caching code races threads and closes the extra fs instance(s)
- Bumped up num cores in tests to 8, speeds up tests by 3-4 mins
2019-03-28 15:56:21 -07:00
Vinoth Chandar
f1410bfdcd Fixes HUDI-38: Reduce memory overhead of WriteStatus
- For implicit indexes (e.g. BloomIndex), don't buffer up written records
- By default, only collect 10% of failing records to avoid OOMs
- Improves debuggability via above, since data errors can now show up in collect()
- Unit tests & fixing subclasses & adjusting tests
2019-03-28 10:32:59 -07:00
Vinoth Chandar
e56c1612e4 Fixed HUDI-87 : Remove schemastr from BaseAvroPayload 2019-03-27 23:03:25 -07:00
Vinoth Chandar
372fbc4733 Fixes HUDI-9 : Check precondition minInstantsToKeep > cleanerCommitsRetained
- Added a precondition check, otherwise incr pull could miss commits
- Lowered default cleaner retention to 10, to enable simpler understanding for newbies
- Bumped down min/max instants to retain as well
2019-03-27 11:02:17 -07:00
Nishith Agarwal
3d9041e216 Fixing source schema and writer schema distinction in payloads 2019-03-26 19:44:27 -07:00
ambition119
395806fc68 [HUDI-63] Removed unused BucketedIndex code 2019-03-26 10:12:47 -07:00
Balaji Varadarajan
194d904c99 run_hive_sync tool must be able to handle case where there are multiple standalone jdbc jars in hive installation dir 2019-03-21 09:58:20 -07:00
Jing Chen
a2a052abd9 add a script that shuts down demo cluster gracefully 2019-03-19 11:01:06 -07:00
Nishith Agarwal
9e59da7fd9 Refactor HoodieTable Rollback to write one rollback instant for a batch of commits to rollback 2019-03-19 10:10:16 -07:00
Nishith Agarwal
0dd4a90b03 Enable multi/nested rollbacks for MOR table type 2019-03-19 10:10:16 -07:00
Omkar Joshi
a6c45feb2c Replacing Apache commons-lang3 object serializer with Kryo serializer 2019-03-18 14:12:25 -07:00
kaka11chen
48797b1ae1 Add compression codec configurations for HoodieParquetWriter. 2019-03-18 07:48:20 -07:00
smarthi
621f2b878d HUDI-75: Add KEYS 2019-03-18 07:46:25 -07:00
Vinoth Chandar
57bbed21de Removing docs folder from master branch
- Only asf-site branch contains the docs
- Helps streamline doc contributions
2019-03-14 18:19:30 -07:00
Balaji Varadarajan
adc8cac743 Fix hive sync (libfb version mismatch) and deltastreamer issue (missing cmdline argument) in demo 2019-03-13 16:14:32 -07:00
Bhavani Sudha Saktheeswaran
3c647a99cf Fix quickstart documentation for querying via Presto 2019-03-13 15:34:50 -07:00
Omkar Joshi
4a8bec7ea5 Handling duplicate record update for single partition (duplicates in single or different parquet files) 2019-03-10 20:15:17 -07:00
kaka11chen
b514e1ab18 Fix: Avro doesn't have short and byte types. 2019-03-06 16:09:24 -08:00
Balaji Varadarajan
3ae6cb4ed5 FileSystem View must treat same fileIds present in different partitions as different file-groups and handle pending compaction correctly 2019-03-01 10:49:04 -08:00
Vinoth Chandar
363df2c12e Upgrade various jar, gem versions for maintenance 2019-03-01 10:14:00 -08:00
vinothchandar
687395e40f [maven-release-plugin] prepare for next development iteration 2019-02-27 07:16:27 -08:00
vinothchandar
bbf40ef987 [maven-release-plugin] prepare release hoodie-0.4.5 2019-02-27 07:16:15 -08:00
vinothchandar
080b7d4d9b Update RELEASE_NOTES for 0.4.5 2019-02-27 06:47:56 -08:00
Bhavani Sudha Saktheeswaran
75c7a2622b Create hoodie-presto bundle jar
Exclude common dependencies that are available in Presto
2019-02-24 19:49:02 -08:00
n3nash
94eb6fd919 Merge pull request #570 from yaooqinn/hiveJarSuffix
typo: bundle jar with unrecognized variables
2019-02-20 16:32:57 -08:00