vinothchandar
66c0b81b49
[maven-release-plugin] prepare for next development iteration
2019-05-28 19:17:26 -07:00
vinothchandar
227785c022
[maven-release-plugin] prepare release hoodie-0.4.7
2019-05-28 19:17:15 -07:00
Balaji Varadarajan
93f8f12a30
HUDI-135 - Skip Meta folder when looking for partitions
2019-05-28 17:37:10 -07:00
Balaji Varadarajan
d0d2fa0337
Reduce logging in unit-test runs
2019-05-24 23:43:54 -07:00
Balaji Varadarajan
99b0c72aa6
HUDI-131 Zero File Listing in Compactor run
2019-05-24 18:34:14 -07:00
Balaji Varadarajan
145034c5fa
Spark Stage retry handling
2019-05-21 14:49:51 -07:00
Balaji Varadarajan
64fec64097
Timeline Service with Incremental View Syncing support
2019-05-16 13:25:33 -07:00
vinothchandar
446f99aa0f
[maven-release-plugin] prepare for next development iteration
2019-05-14 07:29:22 -07:00
vinothchandar
cc38abecc8
[maven-release-plugin] prepare release hoodie-0.4.6
2019-05-14 07:29:11 -07:00
Balaji Varadarajan
9cce9abf4d
Fix various errors found by long running delta-streamer tests
1. Parquet Avro schema mismatch errors during ingestion were sometimes silently ignored due to a race condition in BoundedInMemoryExecutor. This was reproducible when running a long-running delta-streamer with the wrong schema, and it caused data loss.
2. Fix behavior of Delta-Streamer to error out by default if there are any error records.
3. Fix a bug in tracking write errors in WriteStats. Previously, the write errors tracked sampled errors rather than total errors.
4. Delta Streamer did not commit the changes made as part of inline compaction because auto-commit is force-disabled. Fix this behavior to always auto-commit inline compaction, as it would not otherwise be committed.
2019-05-13 10:47:34 -07:00
Omkar Joshi
738635306b
migrating kryo's dependency from twitter chill to plain kryo library
2019-05-06 20:32:00 -07:00
Nishith Agarwal
a33a55fcb5
Caching Avro Binary encoder/decoder to avoid creating new one for every record
2019-05-06 11:28:08 -07:00
Nishith Agarwal
26f24b6728
Removing OLD MAGIC header since a) it's no longer used and b) it causes issues when the data actually has OLD MAGIC
2019-04-25 20:47:16 -07:00
Balaji Varadarajan
2f1e3e15fb
Revert "Read and apply schema for each log block from the metadata header instead of the latest schema"
This reverts commit 9e7ce19b06.
2019-04-18 08:54:34 -07:00
Omkar Joshi
e35d24f31d
Revert "Replacing Apache commons-lang3 object serializer with Kryo serializer"
This reverts commit a6c45feb2c.
2019-04-17 09:23:37 -07:00
Nishith Agarwal
9e7ce19b06
Read and apply schema for each log block from the metadata header instead of the latest schema
2019-04-16 17:20:03 -07:00
Nishith Agarwal
2577014617
1. Minor changes to fix compaction 2. Adding 2 compaction policies
2019-04-03 17:38:17 -07:00
Jing Chen
d1d33f725e
[HUDI-66] FSUtils.getRelativePartitionPath does not handle repeated folder names
2019-04-03 17:37:03 -07:00
Nishith Agarwal
3d9041e216
Fixing source schema and writer schema distinction in payloads
2019-03-26 19:44:27 -07:00
Nishith Agarwal
9e59da7fd9
Refactor HoodieTable Rollback to write one rollback instant for a batch of commits to rollback
2019-03-19 10:10:16 -07:00
Nishith Agarwal
0dd4a90b03
Enable multi/nested rollbacks for MOR table type
2019-03-19 10:10:16 -07:00
Omkar Joshi
a6c45feb2c
Replacing Apache commons-lang3 object serializer with Kryo serializer
2019-03-18 14:12:25 -07:00
Omkar Joshi
4a8bec7ea5
Handling duplicate record update for single partition (duplicates in single or different parquet files)
2019-03-10 20:15:17 -07:00
Balaji Varadarajan
3ae6cb4ed5
FileSystem View must treat same fileIds present in different partitions as different file-groups and handle pending compaction correctly
2019-03-01 10:49:04 -08:00
vinothchandar
687395e40f
[maven-release-plugin] prepare for next development iteration
2019-02-27 07:16:27 -08:00
vinothchandar
bbf40ef987
[maven-release-plugin] prepare release hoodie-0.4.5
2019-02-27 07:16:15 -08:00
Bhavani Sudha Saktheeswaran
639c287cab
Close FSDataInputStream for meta file open in HoodiePartitionMetadata
2019-02-15 22:16:31 -08:00
Balaji Varadarajan
3a0044216c
New Features in DeltaStreamer:
(1) Apply transformation when using delta-streamer to ingest data.
(2) Add Hudi Incremental Source for Delta Streamer
(3) Allow delta-streamer config-property to be passed as command-line
(4) Add Hive Integration to Delta-Streamer and address Review comments
(5) Ensure MultiPartKeysValueExtractor handles Hive-style partition descriptions
(6) Reuse same spark session on both source and transformer
(7) Support extracting partition fields from _hoodie_partition_path for HoodieIncrSource
(8) Reuse Binary Avro coders
(9) Add push down filter for Incremental source
(10) Add Hoodie DeltaStreamer metrics to track total time taken
2019-02-11 18:22:05 -08:00
Nishith Agarwal
d1bb804577
Passing a path filter to avoid including folders under .hoodie directory as partition paths
2019-01-11 19:21:09 -08:00
Nishith Agarwal
110df7190b
Enabling hard deletes for MergeOnRead table type
2018-12-31 12:49:58 -08:00
arukavytsia
6946dd7557
General enhancements
2018-12-18 12:52:39 -08:00
Balaji Varadarajan
30c5f8b7bd
Ensure Hoodie works for non-partitioned Hive table
2018-12-12 13:35:16 -08:00
xubo245
466ff73ffb
Fix some spelling errors in Hudi
2018-12-12 13:06:25 -08:00
Nishith Agarwal
7243ce40c9
Serializing the complete payload object instead of serializing just the GenericRecord
Removing Converter hierarchy as we now depend purely on Java serialization and require the payload to be Java-serializable
2018-12-04 11:43:41 -08:00
Vinoth Chandar
fa65db9c4c
Explicitly handle lack of append() support during LogWriting
2018-11-27 17:58:43 -08:00
Balaji Varadarajan
25cd05b24e
Useful Hudi CLI commands to debug/analyze production workloads
2018-10-30 10:28:01 -07:00
Balaji Varadarajan
07324e7a20
Compaction validate, unschedule and repair
2018-10-25 14:12:47 -07:00
Xinli shang
d904fe69ca
Fix addMetadataFields() to carry over 'props'
2018-10-24 10:55:13 -07:00
Balaji Varadarajan
9710b5a3a6
Ensure Hoodie metadata folder and files are filtered out when constructing Parquet Data Source
2018-10-01 14:27:14 +05:30
jiale.tan
98fd97b65f
feature(HoodieGlobalBloomIndex): adds a new type of bloom index to allow global record key lookup
2018-09-29 19:55:20 +05:30
vinothchandar
7ba842c0fe
[maven-release-plugin] prepare for next development iteration
2018-09-28 11:27:00 +05:30
vinothchandar
5847b61f44
[maven-release-plugin] prepare release hoodie-0.4.4
2018-09-28 11:26:15 +05:30
vinothchandar
9ca6f91e97
Perform consistency checks during write finalize
- Check to ensure written files are listable on storage
- Docs updated to capture how this helps with S3 storage
- Unit tests added, corrections to existing tests
- Fix DeltaStreamer to manage archived commits in a separate folder
2018-09-28 08:04:41 +05:30
Balaji Varadarajan
4c74dd4cad
Travis CI tests need to be run in quieter mode (WARN log level) to avoid max log-size errors
2018-09-26 21:10:20 +05:30
Yishuang Lu
faf93b6340
Fix the name of avro schema file in Test
Fixed the name of avro schema file in Test
Signed-off-by: Yishuang Lu <luystu@gmail.com>
2018-09-24 21:58:34 +05:30
Balaji Varadarajan
5cb28e7b1f
Explicitly release resources in LogFileReader and TestHoodieClientBase
2018-09-20 13:24:57 +05:30
Balaji Varadarajan
2728f96505
Add dummy classes to dump all classes loaded as part of packaging modules to ensure javadoc and sources jars are getting created
2018-09-18 09:24:33 +05:30
Vinoth Chandar
bd5af89f12
[maven-release-plugin] rollback the release of hoodie-0.4.4
2018-09-13 15:01:53 +05:30
Vinoth Chandar
d1cc864a43
[maven-release-plugin] prepare for next development iteration
2018-09-12 23:59:47 +05:30
Vinoth Chandar
b748bc836d
[maven-release-plugin] prepare release hoodie-0.4.4
2018-09-12 23:59:34 +05:30