Bhavani Sudha Saktheeswaran
75c7a2622b
Create hoodie-presto bundle jar
Exclude common dependencies that are available in Presto
2019-02-24 19:49:02 -08:00
n3nash
94eb6fd919
Merge pull request #570 from yaooqinn/hiveJarSuffix
typo: bundle jar with unrecognized variables
2019-02-20 16:32:57 -08:00
Bhavani Sudha Saktheeswaran
639c287cab
Close FSDataInputStream for meta file open in HoodiePartitionMetadata
2019-02-15 22:16:31 -08:00
Kent Yao
8dddecf00f
handle no such element exception in HoodieSparkSqlWriter
2019-02-15 22:11:48 -08:00
vinoth chandar
a16aa2a78f
Create CNAME
2019-02-15 21:53:08 -08:00
vinoth chandar
ef0d6f2218
Update site url in README
2019-02-15 21:28:39 -08:00
Kent Yao
09f203d324
typo: bundle jar with unrecognized variables
2019-02-13 16:46:11 +08:00
Balaji Varadarajan
8adaca3454
Table rollback for inflight compactions MUST not delete instant files at any time to avoid race conditions
2019-02-11 18:30:21 -08:00
Balaji Varadarajan
defcf6a0b9
Fix Hoodie Record Reader to work with non-partitioned dataset
2019-02-11 18:29:23 -08:00
Balaji Varadarajan
3a0044216c
New Features in DeltaStreamer:
(1) Apply transformation when using delta-streamer to ingest data.
(2) Add Hudi Incremental Source for Delta Streamer
(3) Allow delta-streamer config-property to be passed as command-line
(4) Add Hive Integration to Delta-Streamer and address Review comments
(5) Ensure MultiPartKeysValueExtractor handles Hive-style partition description
(6) Reuse same spark session on both source and transformer
(7) Support extracting partition fields from _hoodie_partition_path for HoodieIncrSource
(8) Reuse Binary Avro coders
(9) Add push down filter for Incremental source
(10) Add Hoodie DeltaStreamer metrics to track total time taken
2019-02-11 18:22:05 -08:00
Vinoth Chandar
c70dbc13e9
Updating new slack signup link
2019-02-06 13:52:00 -08:00
Kent Yao
2b55f0751f
Using immutable map instead of mutables to generate parameters
2019-01-30 16:09:40 -08:00
Nishith Agarwal
7985eb72b5
Fixing behavior of Merge/CreateHandle for invalid/wrong schema records
2019-01-28 16:01:03 -08:00
Nishith Agarwal
994d42d307
Cleaner should now use commit timeline and not include deltacommits
2019-01-28 10:46:33 -08:00
Nishith Agarwal
68723764ed
Adding compaction to HoodieClient example
2019-01-28 10:23:44 -08:00
Nishith Agarwal
169e3f66bb
Filtering partition paths before performing a list status on all partitions
2019-01-25 11:34:00 -08:00
Nishith Agarwal
d1bb804577
Passing a path filter to avoid including folders under .hoodie directory as partition paths
2019-01-11 19:21:09 -08:00
Nishith Agarwal
110df7190b
Enabling hard deletes for MergeOnRead table type
2018-12-31 12:49:58 -08:00
Manu Sridharan
345aaa31aa
Add m2 directory to Travis cache
2018-12-31 10:31:12 -08:00
arukavytsia
6946dd7557
General enhancements
2018-12-18 12:52:39 -08:00
Balaji Varadarajan
30c5f8b7bd
Ensure Hoodie works for non-partitioned Hive table
2018-12-12 13:35:16 -08:00
xubo245
466ff73ffb
Fix some spelling errors in Hudi
2018-12-12 13:06:25 -08:00
jiale.tan
bf65219b73
feat(SparkDataSource): add structured streaming
2018-12-04 16:33:00 -08:00
Nishith Agarwal
7243ce40c9
Serializing the complete payload object instead of serializing just the GenericRecord
Removing Converter hierarchy as we now depend purely on JavaSerialization and require the payload to be java serializable
2018-12-04 11:43:41 -08:00
Nishith Agarwal
e83dde3b95
Returning empty Statuses for an empty Spark partition caused by incorrect bin packing
2018-12-04 11:41:38 -08:00
Vinoth Chandar
0015c9b00e
Update committership for balaji
2018-11-30 16:23:10 -08:00
Balaji Varadarajan
f999e4960c
Avoid WriteStatus collect() call when committing batch
2018-11-28 10:41:49 -08:00
Vinoth Chandar
fa65db9c4c
Explicitly handle lack of append() support during LogWriting
2018-11-27 17:58:43 -08:00
Nishith Agarwal
d0fde47458
Fixing number of insert buckets to be generated by rounding off to the closest greater integer
2018-11-15 10:04:45 -08:00
Vinoth Chandar
1362942aa3
Enabling auto tuning of insert splits by default
2018-11-08 09:48:23 -08:00
Balaji Varadarajan
25cd05b24e
Useful Hudi CLI commands to debug/analyze production workloads
2018-10-30 10:28:01 -07:00
Balaji Varadarajan
07324e7a20
Compaction validate, unschedule and repair
2018-10-25 14:12:47 -07:00
Xinli shang
d904fe69ca
Fix addMetadataFields() to carry over 'props'
2018-10-24 10:55:13 -07:00
Nishith Agarwal
48aa026dc4
Adding documentation for migration guide and COW vs MOR tradeoffs, moving some docs around for more clarity
2018-10-19 15:00:38 -07:00
jiale.tan
1628d044ac
feat(SparkDataSource): add additional feature to drop later arriving dups
2018-10-16 11:52:50 -07:00
Balaji Varadarajan
8485b9e263
Fix regression which broke HudiInputFormat handling of non-hoodie datasets
2018-10-16 18:39:56 +01:00
Vinoth Chandar
1fca9b21cc
Add --filter-dupes to DeltaStreamer
- Optionally filter out duplicates before inserting data
- Unit tests
2018-10-04 11:25:18 +05:30
vinoth chandar
0a200c32e5
Reflect new committership, id changes for devs
2018-10-02 11:00:50 +05:30
Balaji Varadarajan
f3418e4718
Docker Container Build and Run setup with foundations for adding docker integration tests. Docker images built with Hadoop 2.8.4 Hive 2.3.3 and Spark 2.3.1 and published to docker-hub
See the quickstart document for how to set up Docker and run the demo
2018-10-02 09:28:21 +05:30
Balaji Varadarajan
9710b5a3a6
Ensure Hoodie metadata folder and files are filtered out when constructing Parquet Data Source
2018-10-01 14:27:14 +05:30
vinoth chandar
06bdba3cef
Update Gemfile.lock with newer jekyll version
2018-09-29 20:50:03 +05:30
vinothchandar
b5a75fdd91
Adding Jiale & Anbu to contributors list
2018-09-29 20:20:28 +05:30
jiale.tan
98fd97b65f
feature(HoodieGlobalBloomIndex): adds a new type of bloom index to allow global record key lookup
2018-09-29 19:55:20 +05:30
vinothchandar
7ba842c0fe
[maven-release-plugin] prepare for next development iteration
2018-09-28 11:27:00 +05:30
vinothchandar
5847b61f44
[maven-release-plugin] prepare release hoodie-0.4.4
2018-09-28 11:26:15 +05:30
vinothchandar
05bf14a42e
Update RELEASE_NOTES for release 0.4.4
2018-09-28 11:05:24 +05:30
vinothchandar
9ca6f91e97
Perform consistency checks during write finalize
- Check to ensure written files are listable on storage
- Docs reflected to capture how this helps with s3 storage
- Unit tests added, corrections to existing tests
- Fix DeltaStreamer to manage archived commits in a separate folder
2018-09-28 08:04:41 +05:30
Balaji Varadarajan
4c74dd4cad
Travis CI tests need to be run in quieter mode (WARN log level) to avoid max log-size errors
2018-09-26 21:10:20 +05:30
Yishuang Lu
faf93b6340
Fix the name of avro schema file in Test
Fixed the name of avro schema file in Test
Signed-off-by: Yishuang Lu <luystu@gmail.com>
2018-09-24 21:58:34 +05:30
Balaji Varadarajan
460e24e84b
Hive Sync handling must work for datasets with multi-partition keys
2018-09-20 16:53:26 +05:30