Nishith Agarwal
a33a55fcb5
Caching Avro Binary encoder/decoder to avoid creating new one for every record
2019-05-06 11:28:08 -07:00
Balaji Varadarajan
ee1feb7c75
Revert "HUDI-101: added maven-shade plugin with filters."
...
Creates fat jars for all hoodie packages
This reverts commit f47f0eb6cb.
2019-05-05 18:39:38 -07:00
Abhishek Sharma
f47f0eb6cb
HUDI-101: added maven-shade plugin with filters.
2019-05-03 13:49:51 -07:00
Balaji Varadarajan
978470af33
Rollback inflights when using Spark [Streaming] write
2019-05-02 12:51:02 -07:00
vinothchandar
57a8b9cc8c
Making DataSource/DeltaStreamer use defaults for combining
...
- Addresses issue where insert will combine and remove duplicates within batch
- Setting default insert combining to false (write client default)
- Set to true if filtering duplicates on insert/bulk_insert
2019-05-01 13:21:21 -07:00
Vinoth Chandar
ea20d47248
Introduce config to control interval tree pruning
...
- turned on by default
- Minor code refactoring/restructuring
2019-04-29 11:38:23 -07:00
Sivabalan Narayanan
7129dc5bb7
Improving Tag location using interval trees for index files
...
Adding interface for index look up
Adding index filtering implementations for global bloom index too
2019-04-29 11:38:23 -07:00
Naoki Takezoe
461ce18bd1
Fix to enable hoodie.datasource.read.incr.filters
2019-04-26 11:14:06 -07:00
Nishith Agarwal
26f24b6728
Removing OLD MAGIC header since a) it's no longer used and b) it causes issues when the data actually contains OLD MAGIC
2019-04-25 20:47:16 -07:00
Balaji Varadarajan
2f1e3e15fb
Revert "Read and apply schema for each log block from the metadata header instead of the latest schema"
...
This reverts commit 9e7ce19b06.
2019-04-18 08:54:34 -07:00
lyogev
9ef51deb84
Add empty payload class to support deletes via apache spark
2019-04-17 23:00:20 -07:00
Balaji Varadarajan
243c58f77c
Move to apachehudi dockerhub repository & use openjdk docker containers
2019-04-17 16:37:58 -07:00
Balaji Varadarajan
36ef94004e
Fix Hive RT query failure in hoodie demo
2019-04-17 16:36:32 -07:00
Omkar Joshi
e35d24f31d
Revert "Replacing Apache commons-lang3 object serializer with Kryo serializer"
...
This reverts commit a6c45feb2c.
2019-04-17 09:23:37 -07:00
Nishith Agarwal
9e7ce19b06
Read and apply schema for each log block from the metadata header instead of the latest schema
2019-04-16 17:20:03 -07:00
Bhavani Sudha Saktheeswaran
83b6aa5e91
Fix multiple issues when using build_local_docker_images for setting up the demo
...
Details here - https://issues.apache.org/jira/browse/HUDI-98
2019-04-15 10:10:05 -07:00
Nishith Agarwal
a8feee9293
Performing commit archiving in batches to avoid keeping a huge chunk in memory
2019-04-10 15:17:04 -07:00
Balaji Varadarajan
b07110b9fd
Essential Hive packages missing in hoodie spark bundle
2019-04-09 21:42:42 -07:00
Nishith Agarwal
2577014617
1. Minor changes to fix compaction 2. Adding 2 compaction policies
2019-04-03 17:38:17 -07:00
Jing Chen
d1d33f725e
[HUDI-66] FSUtils.getRelativePartitionPath does not handle repeated folder names
2019-04-03 17:37:03 -07:00
Vinoth Chandar
b34a204a52
Fixing small file handling, inline compaction defaults
...
- Small file limit is now 100MB by default
- Turned on inline compaction by default for MOR
- Changes take effect on DataSource and DeltaStreamer
2019-04-03 10:56:10 -07:00
Vinoth Chandar
51f4908989
Follow up HUDI-27 : Call super.close() in HoodieWrapperFileSystem::close()
2019-04-02 21:31:41 -07:00
Vinoth Chandar
5847f0c934
Fix HUDI-27 : Support num_cores > 1 for writing through spark
...
- Users using spark.executor.cores > 1 used to fail due to "FileSystem closed"
- This is due to HoodieWrapperFileSystem closing the wrapped filesystem object
- FileSystem.getInternal caching code races threads and closes the extra fs instance(s)
- Bumped up num cores in tests to 8, speeds up tests by 3-4 mins
2019-03-28 15:56:21 -07:00
Vinoth Chandar
f1410bfdcd
Fixes HUDI-38: Reduce memory overhead of WriteStatus
...
- For implicit indexes (e.g., BloomIndex), don't buffer up written records
- By default, only collect 10% of failing records to avoid OOMs
- Improves debuggability via above, since data errors can now show up in collect()
- Unit tests & fixing subclasses & adjusting tests
2019-03-28 10:32:59 -07:00
Vinoth Chandar
e56c1612e4
Fixed HUDI-87 : Remove schemastr from BaseAvroPayload
2019-03-27 23:03:25 -07:00
Vinoth Chandar
372fbc4733
Fixes HUDI-9 : Check precondition minInstantsToKeep > cleanerCommitsRetained
...
- Added a precondition check, otherwise incr pull could miss commits
- Lowered default cleaner retention to 10, to enable simpler understanding for newbies
- Bumped down min/max instants to retain as well
2019-03-27 11:02:17 -07:00
Nishith Agarwal
3d9041e216
Fixing source schema and writer schema distinction in payloads
2019-03-26 19:44:27 -07:00
ambition119
395806fc68
[HUDI-63] Removed unused BucketedIndex code
2019-03-26 10:12:47 -07:00
Balaji Varadarajan
194d904c99
run_hive_sync tool must be able to handle case where there are multiple standalone jdbc jars in hive installation dir
2019-03-21 09:58:20 -07:00
Jing Chen
a2a052abd9
add a script that shuts down demo cluster gracefully
2019-03-19 11:01:06 -07:00
Nishith Agarwal
9e59da7fd9
Refactor HoodieTable Rollback to write one rollback instant for a batch of commits to rollback
2019-03-19 10:10:16 -07:00
Nishith Agarwal
0dd4a90b03
Enable multi/nested rollbacks for MOR table type
2019-03-19 10:10:16 -07:00
Omkar Joshi
a6c45feb2c
Replacing Apache commons-lang3 object serializer with Kryo serializer
2019-03-18 14:12:25 -07:00
kaka11chen
48797b1ae1
Add compression codec configurations for HoodieParquetWriter.
2019-03-18 07:48:20 -07:00
smarthi
621f2b878d
HUDI-75: Add KEYS
2019-03-18 07:46:25 -07:00
Vinoth Chandar
57bbed21de
Removing docs folder from master branch
...
- Only asf-site branch contains the docs
- Helps streamline doc contributions
2019-03-14 18:19:30 -07:00
Balaji Varadarajan
adc8cac743
Fix hive sync (libfb version mismatch) and deltastreamer issue (missing cmdline argument) in demo
2019-03-13 16:14:32 -07:00
Bhavani Sudha Saktheeswaran
3c647a99cf
Fix quickstart documentation for querying via Presto
2019-03-13 15:34:50 -07:00
Omkar Joshi
4a8bec7ea5
Handling duplicate record update for single partition (duplicates in single or different parquet files)
2019-03-10 20:15:17 -07:00
kaka11chen
b514e1ab18
Fix: Avro doesn't have short and byte types.
2019-03-06 16:09:24 -08:00
Balaji Varadarajan
3ae6cb4ed5
FileSystem View must treat same fileIds present in different partitions as different file-groups and handle pending compaction correctly
2019-03-01 10:49:04 -08:00
Vinoth Chandar
363df2c12e
Upgrade various jar, gem versions for maintenance
2019-03-01 10:14:00 -08:00
vinothchandar
687395e40f
[maven-release-plugin] prepare for next development iteration
2019-02-27 07:16:27 -08:00
vinothchandar
bbf40ef987
[maven-release-plugin] prepare release hoodie-0.4.5
2019-02-27 07:16:15 -08:00
vinothchandar
080b7d4d9b
Update RELEASE_NOTES for 0.4.5
2019-02-27 06:47:56 -08:00
Bhavani Sudha Saktheeswaran
75c7a2622b
Create hoodie-presto bundle jar
...
Exclude common dependencies that are available in Presto
2019-02-24 19:49:02 -08:00
n3nash
94eb6fd919
Merge pull request #570 from yaooqinn/hiveJarSuffix
...
typo: bundle jar with unrecognized variables
2019-02-20 16:32:57 -08:00
Bhavani Sudha Saktheeswaran
639c287cab
Close FSDataInputStream for meta file open in HoodiePartitionMetadata
2019-02-15 22:16:31 -08:00
Kent Yao
8dddecf00f
handle no such element exception in HoodieSparkSqlWriter
2019-02-15 22:11:48 -08:00
vinoth chandar
a16aa2a78f
Create CNAME
2019-02-15 21:53:08 -08:00