Commit Graph

393 Commits

Author SHA1 Message Date
Balaji Varadarajan
b07110b9fd Essential Hive packages missing in hoodie spark bundle 2019-04-09 21:42:42 -07:00
Nishith Agarwal
2577014617 1. Minor changes to fix compaction 2. Adding 2 compaction policies 2019-04-03 17:38:17 -07:00
Jing Chen
d1d33f725e [HUDI-66] FSUtils.getRelativePartitionPath does not handle repeated folder names 2019-04-03 17:37:03 -07:00
Vinoth Chandar
b34a204a52 Fixing small file handling, inline compaction defaults
- Small file limit is now 100MB by default
- Turned on inline compaction by default for MOR
- Changes take effect on DataSource and DeltaStreamer
2019-04-03 10:56:10 -07:00
Vinoth Chandar
51f4908989 Follow up HUDI-27 : Call super.close() in HoodieWrapperFileSystem::close() 2019-04-02 21:31:41 -07:00
Vinoth Chandar
5847f0c934 Fix HUDI-27 : Support num_cores > 1 for writing through spark
- Users using spark.executor.cores > 1 used to fail due to "FileSystem closed"
- This is due to HoodieWrapperFileSystem closing the wrapped filesystem obj
- FileSystem.getInternal caching code races threads and closes the extra fs instance(s)
- Bumped up num cores in tests to 8, speeds up tests by 3-4 mins
2019-03-28 15:56:21 -07:00
Vinoth Chandar
f1410bfdcd Fixes HUDI-38: Reduce memory overhead of WriteStatus
- For implicit indexes (e.g. BloomIndex), don't buffer up written records
- By default, only collect 10% of failing records to avoid OOMs
- Improves debuggability via above, since data errors can now show up in collect()
- Unit tests & fixing subclasses & adjusting tests
2019-03-28 10:32:59 -07:00
Vinoth Chandar
e56c1612e4 Fixed HUDI-87 : Remove schemastr from BaseAvroPayload 2019-03-27 23:03:25 -07:00
Vinoth Chandar
372fbc4733 Fixes HUDI-9 : Check precondition minInstantsToKeep > cleanerCommitsRetained
- Added a precondition check, otherwise incr pull could miss commits
- Lowered default cleaner retention to 10, to enable simpler understanding for newbies
- Bumped down min/max instants to retain as well
2019-03-27 11:02:17 -07:00
Nishith Agarwal
3d9041e216 Fixing source schema and writer schema distinction in payloads 2019-03-26 19:44:27 -07:00
ambition119
395806fc68 [HUDI-63] Removed unused BucketedIndex code 2019-03-26 10:12:47 -07:00
Balaji Varadarajan
194d904c99 run_hive_sync tool must be able to handle case where there are multiple standalone jdbc jars in hive installation dir 2019-03-21 09:58:20 -07:00
Jing Chen
a2a052abd9 add a script that shuts down demo cluster gracefully 2019-03-19 11:01:06 -07:00
Nishith Agarwal
9e59da7fd9 Refactor HoodieTable Rollback to write one rollback instant for a batch of commits to rollback 2019-03-19 10:10:16 -07:00
Nishith Agarwal
0dd4a90b03 Enable multi/nested rollbacks for MOR table type 2019-03-19 10:10:16 -07:00
Omkar Joshi
a6c45feb2c Replacing Apache commons-lang3 object serializer with Kryo serializer 2019-03-18 14:12:25 -07:00
kaka11chen
48797b1ae1 Add compression codec configurations for HoodieParquetWriter. 2019-03-18 07:48:20 -07:00
smarthi
621f2b878d HUDI-75: Add KEYS 2019-03-18 07:46:25 -07:00
Vinoth Chandar
57bbed21de Removing docs folder from master branch
- Only asf-site branch contains the docs
- Helps streamline doc contributions
2019-03-14 18:19:30 -07:00
Balaji Varadarajan
adc8cac743 Fix hive sync (libfb version mismatch) and deltastreamer issue (missing cmdline argument) in demo 2019-03-13 16:14:32 -07:00
Bhavani Sudha Saktheeswaran
3c647a99cf Fix quickstart documentation for querying via Presto 2019-03-13 15:34:50 -07:00
Omkar Joshi
4a8bec7ea5 Handling duplicate record update for single partition (duplicates in single or different parquet files) 2019-03-10 20:15:17 -07:00
kaka11chen
b514e1ab18 Fix: Avro doesn't have short and byte types. 2019-03-06 16:09:24 -08:00
Balaji Varadarajan
3ae6cb4ed5 FileSystem View must treat same fileIds present in different partitions as different file-groups and handle pending compaction correctly 2019-03-01 10:49:04 -08:00
Vinoth Chandar
363df2c12e Upgrade various jar, gem versions for maintenance 2019-03-01 10:14:00 -08:00
vinothchandar
687395e40f [maven-release-plugin] prepare for next development iteration 2019-02-27 07:16:27 -08:00
vinothchandar
bbf40ef987 [maven-release-plugin] prepare release hoodie-0.4.5 2019-02-27 07:16:15 -08:00
vinothchandar
080b7d4d9b Update RELEASE_NOTES for 0.4.5 2019-02-27 06:47:56 -08:00
Bhavani Sudha Saktheeswaran
75c7a2622b Create hoodie-presto bundle jar
Exclude common dependencies that are available in Presto
2019-02-24 19:49:02 -08:00
n3nash
94eb6fd919 Merge pull request #570 from yaooqinn/hiveJarSuffix
typo: bundle jar with unrecognized variables
2019-02-20 16:32:57 -08:00
Bhavani Sudha Saktheeswaran
639c287cab Close FSDataInputStream for meta file open in HoodiePartitionMetadata 2019-02-15 22:16:31 -08:00
Kent Yao
8dddecf00f handle no such element exception in HoodieSparkSqlWriter 2019-02-15 22:11:48 -08:00
vinoth chandar
a16aa2a78f Create CNAME 2019-02-15 21:53:08 -08:00
vinoth chandar
ef0d6f2218 Update site url in README 2019-02-15 21:28:39 -08:00
Kent Yao
09f203d324 typo: bundle jar with unrecognized variables 2019-02-13 16:46:11 +08:00
Balaji Varadarajan
8adaca3454 Table rollback for inflight compactions MUST not delete instant files at any time to avoid race conditions 2019-02-11 18:30:21 -08:00
Balaji Varadarajan
defcf6a0b9 Fix Hoodie Record Reader to work with non-partitioned dataset 2019-02-11 18:29:23 -08:00
Balaji Varadarajan
3a0044216c New Features in DeltaStreamer :
(1) Apply transformation when using delta-streamer to ingest data.
(2) Add Hudi Incremental Source for Delta Streamer
(3) Allow delta-streamer config-property to be passed as command-line
(4) Add Hive Integration to Delta-Streamer and address Review comments
(5) Ensure MultiPartKeysValueExtractor handles hive style partition description
(6) Reuse same spark session on both source and transformer
(7) Support extracting partition fields from _hoodie_partition_path for HoodieIncrSource
(8) Reuse Binary Avro coders
(9) Add push down filter for Incremental source
(10) Add Hoodie DeltaStreamer metrics to track total time taken
2019-02-11 18:22:05 -08:00
Vinoth Chandar
c70dbc13e9 Updating new slack signup link 2019-02-06 13:52:00 -08:00
Kent Yao
2b55f0751f Using immutable map instead of mutables to generate parameters 2019-01-30 16:09:40 -08:00
Nishith Agarwal
7985eb72b5 Fixing behavior of Merge/CreateHandle for invalid/wrong schema records 2019-01-28 16:01:03 -08:00
Nishith Agarwal
994d42d307 cleaner should now use commit timeline and not include deltacommits 2019-01-28 10:46:33 -08:00
Nishith Agarwal
68723764ed Adding compaction to HoodieClient example 2019-01-28 10:23:44 -08:00
Nishith Agarwal
169e3f66bb Filtering partition paths before performing a list status on all partitions 2019-01-25 11:34:00 -08:00
Nishith Agarwal
d1bb804577 Passing a path filter to avoid including folders under .hoodie directory as partition paths 2019-01-11 19:21:09 -08:00
Nishith Agarwal
110df7190b Enabling hard deletes for MergeOnRead table type 2018-12-31 12:49:58 -08:00
Manu Sridharan
345aaa31aa Add m2 directory to Travis cache 2018-12-31 10:31:12 -08:00
arukavytsia
6946dd7557 General enhancements 2018-12-18 12:52:39 -08:00
Balaji Varadarajan
30c5f8b7bd Ensure Hoodie works for non-partitioned Hive table 2018-12-12 13:35:16 -08:00
xubo245
466ff73ffb fix some spelling errors in Hudi 2018-12-12 13:06:25 -08:00