Vinoth Chandar
5847f0c934
Fix HUDI-27 : Support num_cores > 1 for writing through spark
...
- Jobs running with spark.executor.cores > 1 used to fail with "FileSystem closed"
- This is due to HoodieWrapperFileSystem closing the wrapped filesystem object
- FileSystem.getInternal caching code races threads and closes the extra fs instance(s)
- Bumped up num cores in tests to 8, speeds up tests by 3-4 mins
2019-03-28 15:56:21 -07:00
Vinoth Chandar
f1410bfdcd
Fixes HUDI-38: Reduce memory overhead of WriteStatus
...
- For implicit indexes (e.g. BloomIndex), don't buffer up written records
- By default, only collect 10% of failing records to avoid OOMs
- Improves debuggability via above, since data errors can now show up in collect()
- Unit tests & fixing subclasses & adjusting tests
2019-03-28 10:32:59 -07:00
Vinoth Chandar
e56c1612e4
Fixed HUDI-87 : Remove schemastr from BaseAvroPayload
2019-03-27 23:03:25 -07:00
Vinoth Chandar
372fbc4733
Fixes HUDI-9 : Check precondition minInstantsToKeep > cleanerCommitsRetained
...
- Added a precondition check, otherwise incr pull could miss commits
- Lowered default cleaner retention to 10, to enable simpler understanding for newbies
- Bumped down min/max instants to retain as well
2019-03-27 11:02:17 -07:00
Nishith Agarwal
3d9041e216
Fixing source schema and writer schema distinction in payloads
2019-03-26 19:44:27 -07:00
ambition119
395806fc68
[HUDI-63] Removed unused BucketedIndex code
2019-03-26 10:12:47 -07:00
Balaji Varadarajan
194d904c99
run_hive_sync tool must be able to handle the case where there are multiple standalone jdbc jars in the hive installation dir
2019-03-21 09:58:20 -07:00
Jing Chen
a2a052abd9
add a script that shuts down demo cluster gracefully
2019-03-19 11:01:06 -07:00
Nishith Agarwal
9e59da7fd9
Refactor HoodieTable Rollback to write one rollback instant for a batch of commits to roll back
2019-03-19 10:10:16 -07:00
Nishith Agarwal
0dd4a90b03
Enable multi/nested rollbacks for MOR table type
2019-03-19 10:10:16 -07:00
Omkar Joshi
a6c45feb2c
Replacing Apache commons-lang3 object serializer with Kryo serializer
2019-03-18 14:12:25 -07:00
kaka11chen
48797b1ae1
Add compression codec configurations for HoodieParquetWriter.
2019-03-18 07:48:20 -07:00
smarthi
621f2b878d
HUDI-75: Add KEYS
2019-03-18 07:46:25 -07:00
Vinoth Chandar
57bbed21de
Removing docs folder from master branch
...
- Only asf-site branch contains the docs
- Helps streamline doc contributions
2019-03-14 18:19:30 -07:00
Balaji Varadarajan
adc8cac743
Fix hive sync (libfb version mismatch) and deltastreamer issue (missing cmdline argument) in demo
2019-03-13 16:14:32 -07:00
Bhavani Sudha Saktheeswaran
3c647a99cf
Fix quickstart documentation for querying via Presto
2019-03-13 15:34:50 -07:00
Omkar Joshi
4a8bec7ea5
Handling duplicate record update for single partition (duplicates in single or different parquet files)
2019-03-10 20:15:17 -07:00
kaka11chen
b514e1ab18
Fix: Avro doesn't have short and byte types.
2019-03-06 16:09:24 -08:00
Balaji Varadarajan
3ae6cb4ed5
FileSystem View must treat same fileIds present in different partitions as different file-groups and handle pending compaction correctly
2019-03-01 10:49:04 -08:00
Vinoth Chandar
363df2c12e
Upgrade various jar, gem versions for maintenance
2019-03-01 10:14:00 -08:00
vinothchandar
687395e40f
[maven-release-plugin] prepare for next development iteration
2019-02-27 07:16:27 -08:00
vinothchandar
bbf40ef987
[maven-release-plugin] prepare release hoodie-0.4.5
2019-02-27 07:16:15 -08:00
vinothchandar
080b7d4d9b
Update RELEASE_NOTES for 0.4.5
2019-02-27 06:47:56 -08:00
Bhavani Sudha Saktheeswaran
75c7a2622b
Create hoodie-presto bundle jar
...
Exclude common dependencies that are available in Presto
2019-02-24 19:49:02 -08:00
n3nash
94eb6fd919
Merge pull request #570 from yaooqinn/hiveJarSuffix
...
typo: bundle jar with unrecognized variables
2019-02-20 16:32:57 -08:00
Bhavani Sudha Saktheeswaran
639c287cab
Close FSDataInputStream for meta file open in HoodiePartitionMetadata
2019-02-15 22:16:31 -08:00
Kent Yao
8dddecf00f
handle no such element exception in HoodieSparkSqlWriter
2019-02-15 22:11:48 -08:00
vinoth chandar
a16aa2a78f
Create CNAME
2019-02-15 21:53:08 -08:00
vinoth chandar
ef0d6f2218
Update site url in README
2019-02-15 21:28:39 -08:00
Kent Yao
09f203d324
typo: bundle jar with unrecognized variables
2019-02-13 16:46:11 +08:00
Balaji Varadarajan
8adaca3454
Table rollback for inflight compactions MUST not delete instant files at any time to avoid race conditions
2019-02-11 18:30:21 -08:00
Balaji Varadarajan
defcf6a0b9
Fix Hoodie Record Reader to work with non-partitioned dataset
2019-02-11 18:29:23 -08:00
Balaji Varadarajan
3a0044216c
New Features in DeltaStreamer :
...
(1) Apply transformation when using delta-streamer to ingest data.
(2) Add Hudi Incremental Source for Delta Streamer
(3) Allow delta-streamer config-property to be passed as command-line
(4) Add Hive Integration to Delta-Streamer and address Review comments
(5) Ensure MultiPartKeysValueExtractor handles hive style partition description
(6) Reuse same spark session on both source and transformer
(7) Support extracting partition fields from _hoodie_partition_path for HoodieIncrSource
(8) Reuse Binary Avro coders
(9) Add push down filter for Incremental source
(10) Add Hoodie DeltaStreamer metrics to track total time taken
2019-02-11 18:22:05 -08:00
Vinoth Chandar
c70dbc13e9
Updating new slack signup link
2019-02-06 13:52:00 -08:00
Kent Yao
2b55f0751f
Using immutable map instead of mutables to generate parameters
2019-01-30 16:09:40 -08:00
Nishith Agarwal
7985eb72b5
Fixing behavior of Merge/CreateHandle for invalid/wrong schema records
2019-01-28 16:01:03 -08:00
Nishith Agarwal
994d42d307
cleaner should now use commit timeline and not include deltacommits
2019-01-28 10:46:33 -08:00
Nishith Agarwal
68723764ed
Adding compaction to HoodieClient example
2019-01-28 10:23:44 -08:00
Nishith Agarwal
169e3f66bb
Filtering partition paths before performing a list status on all partitions
2019-01-25 11:34:00 -08:00
Nishith Agarwal
d1bb804577
Passing a path filter to avoid including folders under .hoodie directory as partition paths
2019-01-11 19:21:09 -08:00
Nishith Agarwal
110df7190b
Enabling hard deletes for MergeOnRead table type
2018-12-31 12:49:58 -08:00
Manu Sridharan
345aaa31aa
Add m2 directory to Travis cache
2018-12-31 10:31:12 -08:00
arukavytsia
6946dd7557
General enhancements
2018-12-18 12:52:39 -08:00
Balaji Varadarajan
30c5f8b7bd
Ensure Hoodie works for non-partitioned Hive table
2018-12-12 13:35:16 -08:00
xubo245
466ff73ffb
fix some spelling errors in Hudi
2018-12-12 13:06:25 -08:00
jiale.tan
bf65219b73
feat(SparkDataSource): add structured streaming
2018-12-04 16:33:00 -08:00
Nishith Agarwal
7243ce40c9
Serializing the complete payload object instead of serializing just the GenericRecord
...
Removing Converter hierarchy as we now depend purely on JavaSerialization and require the payload to be java serializable
2018-12-04 11:43:41 -08:00
Nishith Agarwal
e83dde3b95
Returning empty statuses for an empty spark partition caused by incorrect bin packing
2018-12-04 11:41:38 -08:00
Vinoth Chandar
0015c9b00e
Update committership for balaji
2018-11-30 16:23:10 -08:00
Balaji Varadarajan
f999e4960c
Avoid WriteStatus collect() call when committing batch
2018-11-28 10:41:49 -08:00