801 Commits

Author SHA1 Message Date
Balaji Varadarajan
2f1e3e15fb Revert "Read and apply schema for each log block from the metadata header instead of the latest schema"
This reverts commit 9e7ce19b06.
2019-04-18 08:54:34 -07:00
lyogev
9ef51deb84 Add empty payload class to support deletes via apache spark 2019-04-17 23:00:20 -07:00
Balaji Varadarajan
243c58f77c Move to apachehudi dockerhub repository & use openjdk docker containers 2019-04-17 16:37:58 -07:00
Balaji Varadarajan
36ef94004e Fix Hive RT query failure in hoodie demo 2019-04-17 16:36:32 -07:00
Omkar Joshi
e35d24f31d Revert "Replacing Apache commons-lang3 object serializer with Kryo serializer"
This reverts commit a6c45feb2c.
2019-04-17 09:23:37 -07:00
Nishith Agarwal
9e7ce19b06 Read and apply schema for each log block from the metadata header instead of the latest schema 2019-04-16 17:20:03 -07:00
Bhavani Sudha Saktheeswaran
83b6aa5e91 Fix multiple issues when using build_local_docker_images for setting up the demo
Details here - https://issues.apache.org/jira/browse/HUDI-98
2019-04-15 10:10:05 -07:00
Nishith Agarwal
a8feee9293 Performing commit archiving in batches to avoid keeping a huge chunk in memory 2019-04-10 15:17:04 -07:00
Balaji Varadarajan
b07110b9fd Essential Hive packages missing in hoodie spark bundle 2019-04-09 21:42:42 -07:00
Nishith Agarwal
2577014617 1. Minor changes to fix compaction 2. Adding 2 compaction policies 2019-04-03 17:38:17 -07:00
Jing Chen
d1d33f725e [HUDI-66] FSUtils.getRelativePartitionPath does not handle repeated folder names 2019-04-03 17:37:03 -07:00
Vinoth Chandar
b34a204a52 Fixing small file handling, inline compaction defaults
- Small file limit is now 100MB by default
- Turned on inline compaction by default for MOR
- Changes take effect on DataSource and DeltaStreamer
2019-04-03 10:56:10 -07:00
Vinoth Chandar
51f4908989 Follow up HUDI-27 : Call super.close() in HoodieWrapperFileSystem::close() 2019-04-02 21:31:41 -07:00
Vinoth Chandar
5847f0c934 Fix HUDI-27 : Support num_cores > 1 for writing through spark
- Jobs using spark.executor.cores > 1 used to fail due to "FileSystem closed"
- This is due to HoodieWrapperFileSystem closing the wrapped filesystem object
- FileSystem.getInternal caching code races threads and closes the extra fs instance(s)
- Bumped up num cores in tests to 8, speeds up tests by 3-4 mins
2019-03-28 15:56:21 -07:00
Vinoth Chandar
f1410bfdcd Fixes HUDI-38: Reduce memory overhead of WriteStatus
- For implicit indexes (e.g. BloomIndex), don't buffer up written records
- By default, only collect 10% of failing records to avoid OOMs
- Improves debuggability via above, since data errors can now show up in collect()
- Unit tests & fixing subclasses & adjusting tests
2019-03-28 10:32:59 -07:00
Vinoth Chandar
e56c1612e4 Fixed HUDI-87 : Remove schemastr from BaseAvroPayload 2019-03-27 23:03:25 -07:00
Vinoth Chandar
372fbc4733 Fixes HUDI-9 : Check precondition minInstantsToKeep > cleanerCommitsRetained
- Added a precondition check, otherwise incr pull could miss commits
- Lowered default cleaner retention to 10, to enable simpler understanding for newbies
- Bumped down min/max instants to retain as well
2019-03-27 11:02:17 -07:00
Nishith Agarwal
3d9041e216 Fixing source schema and writer schema distinction in payloads 2019-03-26 19:44:27 -07:00
ambition119
395806fc68 [HUDI-63] Removed unused BucketedIndex code 2019-03-26 10:12:47 -07:00
Balaji Varadarajan
194d904c99 run_hive_sync tool must be able to handle case where there are multiple standalone jdbc jars in hive installation dir 2019-03-21 09:58:20 -07:00
Jing Chen
a2a052abd9 add a script that shuts down demo cluster gracefully 2019-03-19 11:01:06 -07:00
Nishith Agarwal
9e59da7fd9 Refactor HoodieTable Rollback to write one rollback instant for a batch of commits to rollback 2019-03-19 10:10:16 -07:00
Nishith Agarwal
0dd4a90b03 Enable multi/nested rollbacks for MOR table type 2019-03-19 10:10:16 -07:00
Omkar Joshi
a6c45feb2c Replacing Apache commons-lang3 object serializer with Kryo serializer 2019-03-18 14:12:25 -07:00
kaka11chen
48797b1ae1 Add compression codec configurations for HoodieParquetWriter. 2019-03-18 07:48:20 -07:00
smarthi
621f2b878d HUDI-75: Add KEYS 2019-03-18 07:46:25 -07:00
Vinoth Chandar
57bbed21de Removing docs folder from master branch
- Only asf-site branch contains the docs
- Helps streamline doc contributions
2019-03-14 18:19:30 -07:00
Balaji Varadarajan
adc8cac743 Fix hive sync (libfb version mismatch) and deltastreamer issue (missing cmdline argument) in demo 2019-03-13 16:14:32 -07:00
Bhavani Sudha Saktheeswaran
3c647a99cf Fix quickstart documentation for querying via Presto 2019-03-13 15:34:50 -07:00
Omkar Joshi
4a8bec7ea5 Handling duplicate record update for single partition (duplicates in single or different parquet files) 2019-03-10 20:15:17 -07:00
kaka11chen
b514e1ab18 Fix: Avro doesn't have short and byte types. 2019-03-06 16:09:24 -08:00
Balaji Varadarajan
3ae6cb4ed5 FileSystem View must treat same fileIds present in different partitions as different file-groups and handle pending compaction correctly 2019-03-01 10:49:04 -08:00
Vinoth Chandar
363df2c12e Upgrade various jar, gem versions for maintenance 2019-03-01 10:14:00 -08:00
vinothchandar
687395e40f [maven-release-plugin] prepare for next development iteration 2019-02-27 07:16:27 -08:00
vinothchandar
bbf40ef987 [maven-release-plugin] prepare release hoodie-0.4.5 2019-02-27 07:16:15 -08:00
vinothchandar
080b7d4d9b Update RELEASE_NOTES for 0.4.5 2019-02-27 06:47:56 -08:00
Bhavani Sudha Saktheeswaran
75c7a2622b Create hoodie-presto bundle jar
Exclude common dependencies that are available in Presto
2019-02-24 19:49:02 -08:00
n3nash
94eb6fd919 Merge pull request #570 from yaooqinn/hiveJarSuffix
typo: bundle jar with unrecognized variables
2019-02-20 16:32:57 -08:00
Bhavani Sudha Saktheeswaran
639c287cab Close FSDataInputStream for meta file open in HoodiePartitionMetadata 2019-02-15 22:16:31 -08:00
Kent Yao
8dddecf00f handle no such element exception in HoodieSparkSqlWriter 2019-02-15 22:11:48 -08:00
vinoth chandar
a16aa2a78f Create CNAME 2019-02-15 21:53:08 -08:00
vinoth chandar
ef0d6f2218 Update site url in README 2019-02-15 21:28:39 -08:00
Kent Yao
09f203d324 typo: bundle jar with unrecognized variables 2019-02-13 16:46:11 +08:00
Balaji Varadarajan
8adaca3454 Table rollback for inflight compactions MUST not delete instant files at any time to avoid race conditions 2019-02-11 18:30:21 -08:00
Balaji Varadarajan
defcf6a0b9 Fix Hoodie Record Reader to work with non-partitioned dataset 2019-02-11 18:29:23 -08:00
Balaji Varadarajan
3a0044216c New Features in DeltaStreamer :
(1) Apply transformation when using delta-streamer to ingest data.
(2) Add Hudi Incremental Source for Delta Streamer
(3) Allow delta-streamer config-property to be passed as command-line
(4) Add Hive Integration to Delta-Streamer and address Review comments
(5) Ensure MultiPartKeysValueExtractor handles hive style partition description
(6) Reuse same spark session on both source and transformer
(7) Support extracting partition fields from _hoodie_partition_path for HoodieIncrSource
(8) Reuse Binary Avro coders
(9) Add push down filter for Incremental source
(10) Add Hoodie DeltaStreamer metrics to track total time taken
2019-02-11 18:22:05 -08:00
Vinoth Chandar
c70dbc13e9 Updating new slack signup link 2019-02-06 13:52:00 -08:00
Kent Yao
2b55f0751f Using immutable map instead of mutables to generate parameters 2019-01-30 16:09:40 -08:00
Nishith Agarwal
7985eb72b5 Fixing behavior of Merge/CreateHandle for invalid/wrong schema records 2019-01-28 16:01:03 -08:00
Nishith Agarwal
994d42d307 cleaner should now use commit timeline and not include deltacommits 2019-01-28 10:46:33 -08:00