yanghua
722b6be04a
[HUDI-153] Use com.uber.hoodie.common.util.Option instead of Java and Guava Optional
2019-08-07 11:53:59 -07:00
garyli1019
d288e32833
HUDI-171 delete tmp file in addShutDownHook
2019-08-05 17:24:29 -07:00
Balaji Varadarajan
83dab21ae1
Allow HoodieWrapperFileSystem to wrap other proxy file-system implementations with no getScheme implementation (#793)
2019-07-24 21:31:46 -07:00
Balaji Varadarajan
0b451b3a58
HUDI-140 : GCS: Log File Reading not working due to difference in seek() behavior for EOF
2019-07-19 12:38:28 -07:00
n3nash
6efa16317c
Fixing default value for avro 1.7 which assumes NULL value instead of a jsonnode that is null (#792)
2019-07-17 03:25:54 -07:00
eisig
c0593e7a13
fix HoodieLogFileReader (#787)
2019-07-15 13:25:55 -07:00
Balaji Varadarajan
ae3c02fb3f
HUDI-162 : File System view must be built with correct timeline actions
2019-07-14 00:48:09 -07:00
Balaji Varadarajan
5823c1ebd7
HUDI-138 - Meta Files handling also need to support consistency guard
2019-07-13 22:02:55 -07:00
Balaji Varadarajan
9f18a1ca80
Fixing bugs found during running hoodie demo (#760)
2019-06-28 17:49:23 -07:00
Balaji Varadarajan
8223127611
Add maprfs to storage schemes
2019-06-20 22:45:35 -07:00
Balaji Varadarajan
2c40e8419e
Ensure TableMetaClient and FileSystem instances have exclusive copy of Configuration
2019-06-20 14:05:00 -07:00
Balaji Varadarajan
a0d7ab2384
HUDI-70 : Making DeltaStreamer run in continuous mode with concurrent compaction
2019-06-18 17:48:14 -07:00
Balaji Varadarajan
a1483f2c5f
HUDI-148 Small File selection logic for MOR must skip fileIds selected for pending compaction correctly
2019-06-17 18:35:17 -07:00
Nishith Agarwal
129e433641
- Upgrading to Hive 2.x
...
- Eliminating in-memory deltaRecordsMap
- Use writerSchema to generate generic record needed by custom payloads
- changes to make tests work with hive 2.x
2019-06-13 12:46:14 -07:00
Balaji Varadarajan
1c943ab230
Ensure log files are consistently ordered when scanning
2019-06-12 16:16:37 -07:00
Vinoth Chandar
b791473a6d
Introduce HoodieReadHandle abstraction into index
...
- Generalized BloomIndex to work with file ids instead of paths
- Abstracted away Bloom filter checking into HoodieLookupHandle
- Abstracted away range information retrieval into HoodieRangeInfoHandle
2019-06-12 10:46:14 -07:00
Balaji Varadarajan
479908fd20
HUDI-125 : Change License for all source files and update RAT configurations
2019-06-09 11:41:55 -07:00
Balaji Varadarajan
a0391b7c01
LogFile comparator must handle log file names without write token for backwards compatibility
2019-06-06 10:00:31 -07:00
Balaji Varadarajan
93f8f12a30
HUDI-135 - Skip Meta folder when looking for partitions
2019-05-28 17:37:10 -07:00
Balaji Varadarajan
d0d2fa0337
Reduce logging in unit-test runs
2019-05-24 23:43:54 -07:00
Balaji Varadarajan
99b0c72aa6
HUDI-131 Zero File Listing in Compactor run
2019-05-24 18:34:14 -07:00
Balaji Varadarajan
145034c5fa
Spark Stage retry handling
2019-05-21 14:49:51 -07:00
Balaji Varadarajan
64fec64097
Timeline Service with Incremental View Syncing support
2019-05-16 13:25:33 -07:00
Balaji Varadarajan
9cce9abf4d
Fix various errors found by long running delta-streamer tests
...
1. Parquet Avro schema mismatch errors when ingesting are sometimes silently ignored due to race-condition in BoundedInMemoryExecutor. This was reproducible when running long-running delta-streamer with wrong schema and it caused data-loss
2. Fix behavior of Delta-Streamer to error out by default if there are any error records
3. Fix a bug in tracking write errors in WriteStats. Earlier the write errors were tracking sampled errors as opposed to total errors.
4. Delta Streamer does not commit the changes done as part of inline compaction as auto-commit is force disabled. Fix this behavior to always auto-commit inline compaction as it would not otherwise commit.
2019-05-13 10:47:34 -07:00
Omkar Joshi
738635306b
migrating kryo's dependency from twitter chill to plain kryo library
2019-05-06 20:32:00 -07:00
Nishith Agarwal
a33a55fcb5
Caching Avro Binary encoder/decoder to avoid creating new one for every record
2019-05-06 11:28:08 -07:00
Nishith Agarwal
26f24b6728
Removing OLD MAGIC header since a) it's no longer used b) causes issues when the data actually has OLD MAGIC
2019-04-25 20:47:16 -07:00
Balaji Varadarajan
2f1e3e15fb
Revert "Read and apply schema for each log block from the metadata header instead of the latest schema"
...
This reverts commit 9e7ce19b06.
2019-04-18 08:54:34 -07:00
Omkar Joshi
e35d24f31d
Revert "Replacing Apache commons-lang3 object serializer with Kryo serializer"
...
This reverts commit a6c45feb2c.
2019-04-17 09:23:37 -07:00
Nishith Agarwal
9e7ce19b06
Read and apply schema for each log block from the metadata header instead of the latest schema
2019-04-16 17:20:03 -07:00
Nishith Agarwal
2577014617
1. Minor changes to fix compaction 2. Adding 2 compaction policies
2019-04-03 17:38:17 -07:00
Jing Chen
d1d33f725e
[HUDI-66] FSUtils.getRelativePartitionPath does not handle repeated folder names
2019-04-03 17:37:03 -07:00
Nishith Agarwal
3d9041e216
Fixing source schema and writer schema distinction in payloads
2019-03-26 19:44:27 -07:00
Nishith Agarwal
9e59da7fd9
Refactor HoodieTable Rollback to write one rollback instant for a batch of commits to rollback
2019-03-19 10:10:16 -07:00
Nishith Agarwal
0dd4a90b03
Enable multi/nested rollbacks for MOR table type
2019-03-19 10:10:16 -07:00
Omkar Joshi
a6c45feb2c
Replacing Apache commons-lang3 object serializer with Kryo serializer
2019-03-18 14:12:25 -07:00
Omkar Joshi
4a8bec7ea5
Handling duplicate record update for single partition (duplicates in single or different parquet files)
2019-03-10 20:15:17 -07:00
Balaji Varadarajan
3ae6cb4ed5
FileSystem View must treat same fileIds present in different partitions as different file-groups and handle pending compaction correctly
2019-03-01 10:49:04 -08:00
Bhavani Sudha Saktheeswaran
639c287cab
Close FSDataInputStream for meta file open in HoodiePartitionMetadata
2019-02-15 22:16:31 -08:00
Balaji Varadarajan
3a0044216c
New Features in DeltaStreamer :
...
(1) Apply transformation when using delta-streamer to ingest data.
(2) Add Hudi Incremental Source for Delta Streamer
(3) Allow delta-streamer config-property to be passed as command-line
(4) Add Hive Integration to Delta-Streamer and address Review comments
(5) Ensure MultiPartKeysValueExtractor handles hive style partition description
(6) Reuse same spark session on both source and transformer
(7) Support extracting partition fields from _hoodie_partition_path for HoodieIncrSource
(8) Reuse Binary Avro coders
(9) Add push down filter for Incremental source
(10) Add Hoodie DeltaStreamer metrics to track total time taken
2019-02-11 18:22:05 -08:00
Nishith Agarwal
d1bb804577
Passing a path filter to avoid including folders under .hoodie directory as partition paths
2019-01-11 19:21:09 -08:00
Nishith Agarwal
110df7190b
Enabling hard deletes for MergeOnRead table type
2018-12-31 12:49:58 -08:00
arukavytsia
6946dd7557
General enhancements
2018-12-18 12:52:39 -08:00
Balaji Varadarajan
30c5f8b7bd
Ensure Hoodie works for non-partitioned Hive table
2018-12-12 13:35:16 -08:00
xubo245
466ff73ffb
fix some spelling errors in Hudi
2018-12-12 13:06:25 -08:00
Nishith Agarwal
7243ce40c9
Serializing the complete payload object instead of serializing just the GenericRecord
...
Removing Converter hierarchy as we now depend purely on JavaSerialization and require the payload to be java serializable
2018-12-04 11:43:41 -08:00
Vinoth Chandar
fa65db9c4c
Explicitly handle lack of append() support during LogWriting
2018-11-27 17:58:43 -08:00
Balaji Varadarajan
25cd05b24e
Useful Hudi CLI commands to debug/analyze production workloads
2018-10-30 10:28:01 -07:00
Balaji Varadarajan
07324e7a20
Compaction validate, unschedule and repair
2018-10-25 14:12:47 -07:00
Xinli shang
d904fe69ca
Fix addMetadataFields() to carry over 'props'
2018-10-24 10:55:13 -07:00