Omkar Joshi
e35d24f31d
Revert "Replacing Apache commons-lang3 object serializer with Kryo serializer"
This reverts commit a6c45feb2c.
2019-04-17 09:23:37 -07:00
Nishith Agarwal
9e7ce19b06
Read and apply schema for each log block from the metadata header instead of the latest schema
2019-04-16 17:20:03 -07:00
Bhavani Sudha Saktheeswaran
83b6aa5e91
Fix multiple issues when using build_local_docker_images for setting up the demo
Details here - https://issues.apache.org/jira/browse/HUDI-98
2019-04-15 10:10:05 -07:00
Nishith Agarwal
a8feee9293
Performing commit archiving in batches to avoid keeping a huge chunk in memory
2019-04-10 15:17:04 -07:00
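The batching idea in the archiving commit above can be sketched as follows. This is a minimal, hypothetical illustration, not Hudi's archiver. The class `BatchedArchiver`, the method `archiveInBatches`, and the `writeBatch` callback are invented names, and instants are modeled as plain strings. The point is that only one batch is buffered at a time, instead of the whole set of commits to archive.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical sketch, not Hudi's archiver: instants are plain strings and
// writeBatch stands in for whatever appends a batch to the archive log.
class BatchedArchiver {

    // Streams instants to writeBatch in groups of at most batchSize, so only
    // one batch is buffered at a time. Returns the number of batches written.
    static int archiveInBatches(Iterable<String> instants, int batchSize,
                                Consumer<List<String>> writeBatch) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<String> buffer = new ArrayList<>(batchSize);
        int batches = 0;
        for (String instant : instants) {
            buffer.add(instant);
            if (buffer.size() == batchSize) {
                writeBatch.accept(new ArrayList<>(buffer)); // flush a full batch
                buffer.clear();
                batches++;
            }
        }
        if (!buffer.isEmpty()) {
            writeBatch.accept(new ArrayList<>(buffer)); // flush the final partial batch
            batches++;
        }
        return batches;
    }

    public static void main(String[] args) {
        List<String> instants = Arrays.asList("001", "002", "003", "004", "005");
        int n = archiveInBatches(instants, 2, batch -> System.out.println("archived " + batch));
        System.out.println(n + " batches"); // prints "3 batches"
    }
}
```

Taking an `Iterable` rather than a `List` lets the caller feed instants lazily, so peak memory is bounded by the batch size rather than by the number of commits being archived.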
Balaji Varadarajan
b07110b9fd
Essential Hive packages missing in hoodie spark bundle
2019-04-09 21:42:42 -07:00
Nishith Agarwal
2577014617
1. Minor changes to fix compaction 2. Adding 2 compaction policies
2019-04-03 17:38:17 -07:00
Jing Chen
d1d33f725e
[HUDI-66] FSUtils.getRelativePartitionPath does not handle repeated folder names
2019-04-03 17:37:03 -07:00
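The class of bug named in HUDI-66 can be illustrated with a small sketch. This is a hypothetical example using `java.nio.file.Path` in place of Hadoop's `Path`, and neither method is the actual `FSUtils` code. `naiveRelative` locates the base folder by searching for its name in the full path string, which picks the wrong occurrence when a partition folder repeats that name. `relative` strips the base path as a prefix of path components instead.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical sketch using java.nio paths in place of Hadoop's Path class.
class PartitionPaths {

    // Illustrative text-search approach: finds the base folder's *name* in the
    // full path. If a partition folder repeats that name, lastIndexOf matches
    // the wrong occurrence and the result is truncated.
    static String naiveRelative(String basePath, String fullPartitionPath) {
        String baseName = Paths.get(basePath).getFileName().toString();
        int idx = fullPartitionPath.lastIndexOf(baseName);
        return fullPartitionPath.substring(idx + baseName.length() + 1);
    }

    // Component-wise approach: removes the base path as a prefix of path
    // elements, so repeated folder names further down are left alone.
    static String relative(String basePath, String fullPartitionPath) {
        Path base = Paths.get(basePath).normalize();
        Path full = Paths.get(fullPartitionPath).normalize();
        if (!full.startsWith(base)) {
            throw new IllegalArgumentException(full + " is not under " + base);
        }
        StringBuilder sb = new StringBuilder();
        for (Path part : base.relativize(full)) {
            if (sb.length() > 0) {
                sb.append('/');
            }
            sb.append(part);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String base = "/data/tbl";
        String partition = "/data/tbl/2019/tbl/04";
        System.out.println(naiveRelative(base, partition)); // prints "04"
        System.out.println(relative(base, partition));      // prints "2019/tbl/04"
    }
}
```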
Vinoth Chandar
b34a204a52
Fixing small file handling, inline compaction defaults
- Small file limit is now 100MB by default
- Turned on inline compaction by default for MOR
- Changes take effect on DataSource and DeltaStreamer
2019-04-03 10:56:10 -07:00
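Small-file handling, as we understand it, means routing new inserts into existing files that are still below a size limit rather than always creating new files. The sketch below shows only the selection step under that assumption. `SmallFileSelector` and `smallFiles` are hypothetical names, and reading "100MB" as 100 × 1024 × 1024 bytes is an assumption, not something the commit message states.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of small-file selection; names are illustrative only.
class SmallFileSelector {

    // "100MB" default from the commit message, read here as 100 * 1024 * 1024 bytes (assumption).
    static final long DEFAULT_SMALL_FILE_LIMIT_BYTES = 100L * 1024 * 1024;

    // Returns the IDs of files whose current size is below the limit, i.e.
    // candidates that new inserts could be routed to instead of a new file.
    static List<String> smallFiles(Map<String, Long> fileSizes, long limitBytes) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Long> e : fileSizes.entrySet()) {
            if (e.getValue() < limitBytes) {
                result.add(e.getKey());
            }
        }
        Collections.sort(result); // deterministic order for callers
        return result;
    }

    public static void main(String[] args) {
        Map<String, Long> sizes = new TreeMap<>();
        sizes.put("file-a", 40L * 1024 * 1024);
        sizes.put("file-b", 120L * 1024 * 1024);
        sizes.put("file-c", 99L * 1024 * 1024);
        System.out.println(smallFiles(sizes, DEFAULT_SMALL_FILE_LIMIT_BYTES)); // [file-a, file-c]
    }
}
```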
Vinoth Chandar
51f4908989
Follow up HUDI-27 : Call super.close() in HoodieWrapperFileSystem::close()
2019-04-02 21:31:41 -07:00
Vinoth Chandar
5847f0c934
Fix HUDI-27 : Support num_cores > 1 for writing through spark
- Users using spark.executor.cores > 1 used to fail due to "FileSystem closed"
- This is due to HoodieWrapperFileSystem closing the wrapped filesystem object
- The caching code in FileSystem.getInternal races across threads and closes the extra fs instance(s)
- Bumped up num cores in tests to 8, speeds up tests by 3-4 mins
2019-03-28 15:56:21 -07:00
Vinoth Chandar
f1410bfdcd
Fixes HUDI-38: Reduce memory overhead of WriteStatus
- For implicit indexes (e.g. BloomIndex), don't buffer up written records
- By default, only collect 10% of failing records to avoid OOMs
- Improves debuggability via above, since data errors can now show up in collect()
- Unit tests & fixing subclasses & adjusting tests
2019-03-28 10:32:59 -07:00
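The two memory-saving ideas in HUDI-38 (skip buffering written records when the index does not need them, and keep only a sample of failed records) can be sketched as below. `SampledWriteStatus` and its methods are hypothetical and are not Hudi's `WriteStatus` API. The sampling uses `java.util.Random` with a fixed seed purely for reproducibility. Counts stay exact while only a fraction (for example 0.1 for roughly 10%) of failed keys is retained.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical stand-in for a write status object; not Hudi's WriteStatus API.
class SampledWriteStatus {
    private final boolean trackSuccessRecords; // false for implicit indexes such as a bloom index
    private final double failureFraction;      // e.g. 0.1 to keep roughly 10% of failures
    private final Random random;

    private final List<String> writtenKeys = new ArrayList<>();
    private final List<String> sampledFailedKeys = new ArrayList<>();
    private long totalRecords = 0;
    private long totalErrors = 0;

    SampledWriteStatus(boolean trackSuccessRecords, double failureFraction, long seed) {
        this.trackSuccessRecords = trackSuccessRecords;
        this.failureFraction = failureFraction;
        this.random = new Random(seed);
    }

    void markSuccess(String recordKey) {
        totalRecords++;
        if (trackSuccessRecords) {
            writtenKeys.add(recordKey); // buffered only when the index needs it
        }
    }

    void markFailure(String recordKey) {
        totalRecords++;
        totalErrors++; // the error count stays exact even though only a sample is kept
        if (random.nextDouble() < failureFraction) {
            sampledFailedKeys.add(recordKey);
        }
    }

    long getTotalRecords() { return totalRecords; }
    long getTotalErrors() { return totalErrors; }
    List<String> getWrittenKeys() { return writtenKeys; }
    List<String> getSampledFailedKeys() { return sampledFailedKeys; }
}
```

With `new SampledWriteStatus(false, 0.1, 42L)`, a job that hits a million bad records keeps an exact error count but only on the order of 100,000 failed keys, which is what keeps a later `collect()` of the failures from exhausting driver memory.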
Vinoth Chandar
e56c1612e4
Fixed HUDI-87 : Remove schemastr from BaseAvroPayload
2019-03-27 23:03:25 -07:00
Vinoth Chandar
372fbc4733
Fixes HUDI-9 : Check precondition minInstantsToKeep > cleanerCommitsRetained
- Added a precondition check, otherwise incr pull could miss commits
- Lowered default cleaner retention to 10, to enable simpler understanding for newbies
- Bumped down min/max instants to retain as well
2019-03-27 11:02:17 -07:00
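The HUDI-9 precondition can be expressed as a small validation step. This is a minimal sketch with a hypothetical `RetentionConfigCheck.validate` method; Hudi's actual check and configuration keys may differ. The value 20 in the usage is illustrative, not a Hudi default, while 10 matches the new cleaner retention stated in the commit. As we read the commit message, archival must keep more instants on the active timeline than the cleaner retains commits; otherwise an incremental pull could miss commits.

```java
// Hypothetical sketch of the HUDI-9 precondition; method and parameter names are illustrative.
class RetentionConfigCheck {

    // Rejects configurations where archival would keep no more instants than
    // the cleaner retains commits (the commit message says incr pull could miss commits).
    static void validate(int minInstantsToKeep, int cleanerCommitsRetained) {
        if (minInstantsToKeep <= cleanerCommitsRetained) {
            throw new IllegalArgumentException(
                "minInstantsToKeep (" + minInstantsToKeep + ") must be greater than "
                + "cleanerCommitsRetained (" + cleanerCommitsRetained + ")");
        }
    }

    public static void main(String[] args) {
        validate(20, 10); // passes: 20 > 10 (20 is an illustrative value)
        try {
            validate(10, 10); // rejected: equal values are not allowed
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```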
Nishith Agarwal
3d9041e216
Fixing source schema and writer schema distinction in payloads
2019-03-26 19:44:27 -07:00
ambition119
395806fc68
[HUDI-63] Removed unused BucketedIndex code
2019-03-26 10:12:47 -07:00
Balaji Varadarajan
194d904c99
run_hive_sync tool must be able to handle case where there are multiple standalone jdbc jars in hive installation dir
2019-03-21 09:58:20 -07:00
Jing Chen
a2a052abd9
add a script that shuts down demo cluster gracefully
2019-03-19 11:01:06 -07:00
Nishith Agarwal
9e59da7fd9
Refactor HoodieTable Rollback to write one rollback instant for a batch of commits to rollback
2019-03-19 10:10:16 -07:00
Nishith Agarwal
0dd4a90b03
Enable multi/nested rollbacks for MOR table type
2019-03-19 10:10:16 -07:00
Omkar Joshi
a6c45feb2c
Replacing Apache commons-lang3 object serializer with Kryo serializer
2019-03-18 14:12:25 -07:00
kaka11chen
48797b1ae1
Add compression codec configurations for HoodieParquetWriter.
2019-03-18 07:48:20 -07:00
smarthi
621f2b878d
HUDI-75: Add KEYS
2019-03-18 07:46:25 -07:00
Vinoth Chandar
57bbed21de
Removing docs folder from master branch
- Only asf-site branch contains the docs
- Helps streamline doc contributions
2019-03-14 18:19:30 -07:00
Balaji Varadarajan
adc8cac743
Fix hive sync (libfb version mismatch) and deltastreamer issue (missing cmdline argument) in demo
2019-03-13 16:14:32 -07:00
Bhavani Sudha Saktheeswaran
3c647a99cf
Fix quickstart documentation for querying via Presto
2019-03-13 15:34:50 -07:00
Omkar Joshi
4a8bec7ea5
Handling duplicate record update for single partition (duplicates in single or different parquet files)
2019-03-10 20:15:17 -07:00
kaka11chen
b514e1ab18
Fix: Avro doesn't have short and byte types
2019-03-06 16:09:24 -08:00
Balaji Varadarajan
3ae6cb4ed5
FileSystem View must treat same fileIds present in different partitions as different file-groups and handle pending compaction correctly
2019-03-01 10:49:04 -08:00
Vinoth Chandar
363df2c12e
Upgrade various jar, gem versions for maintenance
2019-03-01 10:14:00 -08:00
vinothchandar
687395e40f
[maven-release-plugin] prepare for next development iteration
2019-02-27 07:16:27 -08:00
vinothchandar
bbf40ef987
[maven-release-plugin] prepare release hoodie-0.4.5
2019-02-27 07:16:15 -08:00
vinothchandar
080b7d4d9b
Update RELEASE_NOTES for 0.4.5
2019-02-27 06:47:56 -08:00
Bhavani Sudha Saktheeswaran
75c7a2622b
Create hoodie-presto bundle jar
Exclude common dependencies that are available in Presto
2019-02-24 19:49:02 -08:00
n3nash
94eb6fd919
Merge pull request #570 from yaooqinn/hiveJarSuffix
typo: bundle jar with unrecognized variables
2019-02-20 16:32:57 -08:00
Bhavani Sudha Saktheeswaran
639c287cab
Close FSDataInputStream for meta file open in HoodiePartitionMetadata
2019-02-15 22:16:31 -08:00
Kent Yao
8dddecf00f
handle no such element exception in HoodieSparkSqlWriter
2019-02-15 22:11:48 -08:00
vinoth chandar
a16aa2a78f
Create CNAME
2019-02-15 21:53:08 -08:00
vinoth chandar
ef0d6f2218
Update site url in README
2019-02-15 21:28:39 -08:00
Kent Yao
09f203d324
typo: bundle jar with unrecognized variables
2019-02-13 16:46:11 +08:00
Balaji Varadarajan
8adaca3454
Table rollback for inflight compactions MUST not delete instant files at any time to avoid race conditions
2019-02-11 18:30:21 -08:00
Balaji Varadarajan
defcf6a0b9
Fix Hoodie Record Reader to work with non-partitioned dataset
2019-02-11 18:29:23 -08:00
Balaji Varadarajan
3a0044216c
New Features in DeltaStreamer :
(1) Apply transformation when using delta-streamer to ingest data.
(2) Add Hudi Incremental Source for Delta Streamer
(3) Allow delta-streamer config-property to be passed as command-line
(4) Add Hive Integration to Delta-Streamer and address Review comments
(5) Ensure MultiPartKeysValueExtractor handles Hive-style partition descriptions
(6) Reuse same spark session on both source and transformer
(7) Support extracting partition fields from _hoodie_partition_path for HoodieIncrSource
(8) Reuse Binary Avro coders
(9) Add push down filter for Incremental source
(10) Add Hoodie DeltaStreamer metrics to track total time taken
2019-02-11 18:22:05 -08:00
Vinoth Chandar
c70dbc13e9
Updating new slack signup link
2019-02-06 13:52:00 -08:00
Kent Yao
2b55f0751f
Using immutable map instead of mutables to generate parameters
2019-01-30 16:09:40 -08:00
Nishith Agarwal
7985eb72b5
Fixing behavior of Merge/CreateHandle for invalid/wrong schema records
2019-01-28 16:01:03 -08:00
Nishith Agarwal
994d42d307
cleaner should now use commit timeline and not include deltacommits
2019-01-28 10:46:33 -08:00
Nishith Agarwal
68723764ed
Adding compaction to HoodieClient example
2019-01-28 10:23:44 -08:00
Nishith Agarwal
169e3f66bb
Filtering partition paths before performing a list status on all partitions
2019-01-25 11:34:00 -08:00
Nishith Agarwal
d1bb804577
Passing a path filter to avoid including folders under .hoodie directory as partition paths
2019-01-11 19:21:09 -08:00
Nishith Agarwal
110df7190b
Enabling hard deletes for MergeOnRead table type
2018-12-31 12:49:58 -08:00