2109 Commits

Author SHA1 Message Date
Y Ethan Guo
6aa710eae0 [MINOR] Add more configuration to Kafka setup script (#3992)
* [MINOR] Add more configuration to Kafka setup script

* Add option to reuse Kafka topic

* Minor fixes to README
2021-11-23 07:33:38 +05:30
Sagar Sumit
e22150fe15 [HUDI-1937] Rollback unfinished replace commit to allow updates (#3869)
* [HUDI-1937] Rollback unfinished replace commit to allow updates while clustering

* Revert and delete requested replacecommit too

* Rollback pending clustering instants transactionally

* No double locking and add a config to enable rollback

* Update config to be clear about rollback only on conflict
2021-11-23 07:29:03 +05:30
Jimmy.Zhou
0d1e7ecdab [MINOR] Fix typo,'multipe' corrected to 'multiple' (#4068) 2021-11-22 17:20:23 -08:00
Y Ethan Guo
772af935d5 [HUDI-2737] Use earliest instant by default for async compaction and clustering jobs (#3991)
Address review comments

Fix test failures

Co-authored-by: Sagar Sumit <sagarsumit09@gmail.com>
2021-11-23 06:49:41 +05:30
Alexey Kudinkin
3bdab01a49 [HUDI-2550] Expand File-Group candidates list for appending for MOR tables (#3986) 2021-11-22 19:19:59 -05:00
Sagar Sumit
fe57e9beea [HUDI-2599] Make addFilesToview and fetchLatestBaseFiles public (#4066) 2021-11-22 12:23:50 -05:00
Sivabalan Narayanan
fc9ca6a07a [HUDI-2559] Converting commit timestamp format to millisecs (#4024)
- Adds support for generating commit timestamps with millisecond granularity.
- Older commit timestamps (second granularity) are suffixed with 999 and parsed using the millisecond format.
2021-11-22 11:44:38 -05:00
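
The timestamp handling described in the HUDI-2559 entry above can be sketched roughly as follows. This is only an illustration of the rule stated in the commit message (append 999 to second-granularity timestamps, then parse everything in the millisecond format). The class name `InstantTimeSketch`, its methods, and the assumed `yyyyMMddHHmmss` / `yyyyMMddHHmmssSSS` layouts are hypothetical and are not taken from the Hudi code base.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeFormatterBuilder;
import java.time.temporal.ChronoField;

// Hypothetical helper for illustration only; not Hudi's actual class or API.
final class InstantTimeSketch {
    private static final int SECS_LENGTH = 14;   // assumed layout: yyyyMMddHHmmss
    private static final int MILLIS_LENGTH = 17; // assumed layout: yyyyMMddHHmmssSSS
    private static final String LEGACY_SUFFIX = "999"; // suffix named in the commit message

    // The millisecond field is added with appendValue because a plain
    // "yyyyMMddHHmmssSSS" pattern can fail to parse on Java 8.
    private static final DateTimeFormatter MILLIS_FORMAT = new DateTimeFormatterBuilder()
            .appendPattern("yyyyMMddHHmmss")
            .appendValue(ChronoField.MILLI_OF_SECOND, 3)
            .toFormatter();

    // Returns the timestamp in millisecond granularity, padding older
    // second-granularity timestamps with the 999 suffix.
    static String toMillisGranularity(String instantTime) {
        if (instantTime.length() == SECS_LENGTH) {
            return instantTime + LEGACY_SUFFIX;
        }
        if (instantTime.length() == MILLIS_LENGTH) {
            return instantTime;
        }
        throw new IllegalArgumentException("Unexpected timestamp length: " + instantTime);
    }

    // Parses either granularity using the single millisecond format.
    static LocalDateTime parse(String instantTime) {
        return LocalDateTime.parse(toMillisGranularity(instantTime), MILLIS_FORMAT);
    }

    // Formats a time as a millisecond-granularity timestamp string.
    static String format(LocalDateTime time) {
        return MILLIS_FORMAT.format(time);
    }
}
```

Under these assumptions, an older timestamp and a new-style timestamp from the same second remain comparable as plain strings, with the padded older one sorting last within that second.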
Sagar Sumit
89452063b4 [MINOR] Fix instant parsing in HoodieClusteringJob (#4071) 2021-11-22 08:57:44 -05:00
Manoj Govindassamy
7f3b89fad7 [HUDI-2472] Enabling metadata table for TestHoodieIndex test case (#4045)
- Enabling the metadata table for testSimpleGlobalIndexTagLocationWhenShouldUpdatePartitionPath.
   This is more of a test issue.
2021-11-22 07:21:24 -05:00
zhangyue19921010
a2c91a7a9b [HUDI-2533] New option for hoodieClusteringJob to check, rollback and re-execute the last failed clustering job (#3765)
* coding finished and need to do uts

* add uts

* code review

* code review

Co-authored-by: yuezhang <yuezhang@freewheel.tv>
2021-11-22 16:30:33 +05:30
Raymond Xu
02f7ca2b05 [HUDI-1870] Add more Spark CI build tasks (#4022)
* [HUDI-1870] Add more Spark CI build tasks

- build for spark3.0.x
- build for spark-shade-unbundle-avro
- fix build failures
  - delete unnecessary assertion for spark 3.0.x
  - use AvroConversionUtils#convertAvroSchemaToStructType instead of calling SchemaConverters#toSqlType directly to solve the compilation failures with spark-shade-unbundle-avro (#5)

Co-authored-by: Yann <biyan900116@gmail.com>
2021-11-22 02:16:45 -08:00
Danny Chan
8281cbf762 [HUDI-2799] Fix the classloader of flink write task (#4042) 2021-11-22 11:05:05 +08:00
董可伦
2533a9cc17 [MINOR] Fix typos (#4053) 2021-11-21 16:34:59 +08:00
Nate Radtke
887787e8b9 [HUDI-1932] Update Hive sync timestamp when change detected (#3053)
* Update Hive sync timestamp when change detected

Only update the last commit timestamp on the Hive table when the table schema
has changed or a partition is created/updated.

When using AWS Glue Data Catalog as the metastore for Hive this will ensure
that table versions are substantive (including schema and/or partition
changes). Prior to this change when a Hive sync is performed without schema
or partition changes the table in the Glue Data Catalog would have a new
version published with the only change being the timestamp property.

https://issues.apache.org/jira/browse/HUDI-1932

* add conditional sync flag

* fix testSyncWithoutDiffs

* fix HiveSyncConfig

Co-authored-by: Raymond Xu <2701446+xushiyan@users.noreply.github.com>
2021-11-21 12:11:05 +05:30
Danny Chan
520538b15d [HUDI-2392] Make flink parquet reader compatible with decimal BINARY encoding (#4057) 2021-11-21 13:27:18 +08:00
Danny Chan
0411f73c7d [HUDI-2804] Add option to skip compaction instants for streaming read (#4051) 2021-11-21 12:38:56 +08:00
leesf
74b59a44ec [HUDI-2813] Claim RFC number for RFC for spark datasource V2 Integration (#4059) 2021-11-20 18:59:12 -08:00
dufeng1010
305d160081 [MINOR] optimize in constructor of inputbatch class (#4040)
Co-authored-by: 闫杜峰 <yandufeng@sinochem.com>
2021-11-21 10:11:01 +08:00
rmahindra123
1a5484d2db [MINOR] Claim RFC number for RFC for debezium source for deltastreamer (#4047) 2021-11-21 09:28:48 +08:00
vinoth chandar
ae0c67d9fc [HUDI-2795] Add mechanism to safely update,delete and recover table properties (#4038)
* [HUDI-2795] Add mechanism to safely update,delete and recover table properties

  - Fail-safe mechanism that lets queries succeed off a backup file
  - Readers who are not upgraded to this version of code will just fail until recovery is done.
  - Added unit tests that exercise all these scenarios.
  - Adding CLI for recovery, updation to table command.
  - [Pending] Add some hash-based verification to guard against any rare partial writes on HDFS

* Fixing upgrade/downgrade infrastructure to use new updation method
2021-11-20 08:07:40 -08:00
Harsha Teja Kanna
f4b974ac7b [HUDI-2742] Added S3 object filter to support multiple S3EventsHoodieIncrSources single S3 meta table (#4025) 2021-11-20 14:54:21 +05:30
Ron
6cc97cc0c9 Remove the aws packages from hudi flink bundle jar (#4050) 2021-11-20 11:55:12 +08:00
wenningd
3dc6262437 [HUDI-2242] Add configuration inference logic for few options (#3359)
Co-authored-by: Wenning Ding <wenningd@amazon.com>
2021-11-19 19:38:38 -08:00
Manoj Govindassamy
0230d40b74 [HUDI-2796] Metadata table support for Restore action to first commit (#4039)
- Adding support for the metadata table to restore to first commit and
   take proper action for the bootstrap on subsequent commits.
2021-11-19 20:02:57 -05:00
Manoj Govindassamy
c8617d9390 [HUDI-2472] Enabling metadata table for TestHoodieMergeOnReadTable and TestHoodieCompactor (#4023) 2021-11-19 20:02:21 -05:00
Manoj Govindassamy
459b34240b [HUDI-2593] Virtual keys support for metadata table (#3968)
- Metadata table today has virtual keys disabled, thereby populating the metafields
  for each record written out and increasing the overall storage space used. Hereby
  adding virtual keys support for metadata table so that metafields are disabled
  for metadata table records.

- Adding a custom KeyGenerator for Metadata table so as to not rely on the
  default Base/SimpleKeyGenerators which currently look for record key
  and partition field set in the table config.

- AbstractHoodieLogRecordReader's processing of the next data block and
  createHoodieRecord() are now generic, with the derived class
  HoodieMetadataMergedLogRecordReader taking care of the special creation of
  records from explicitly passed-in partition names.
2021-11-19 18:11:29 -05:00
Sagar Sumit
eba354e922 [HUDI-2731] Make clustering work regardless of whether there are base… (#3970) 2021-11-19 11:09:08 -05:00
Danny Chan
bf008762df [HUDI-2798] Fix flink query operation fields (#4041) 2021-11-19 23:39:37 +08:00
Danny Chan
7a00f867ae [HUDI-2791] Allows duplicate files for metadata commit (#4033) 2021-11-19 14:30:17 +08:00
Udit Mehrotra
4e067ca581 [HUDI-2641] Avoid deleting all inflight commits heartbeats while rolling back failed writes (#3956) 2021-11-18 08:33:50 -05:00
wenningd
24def0b30d [HUDI-2362] Add external config file support (#3416)
Co-authored-by: Wenning Ding <wenningd@amazon.com>
2021-11-18 01:59:26 -08:00
Danny Chan
8772cec4bd [HUDI-2790] Fix the changelog mode of HoodieTableSource (#4029) 2021-11-18 16:40:48 +08:00
Danny Chan
71a2ae0fd6 [HUDI-2789] Flink batch upsert for non partitioned table does not work (#4028) 2021-11-18 13:59:03 +08:00
Sivabalan Narayanan
2d3f2a3275 [HUDI-2734] Setting default metadata enable as false for Java (#4003) 2021-11-17 14:43:00 -05:00
Manoj Govindassamy
f715cf607f [HUDI-2716] InLineFS support for S3FS logs (#3977) 2021-11-17 13:59:38 -05:00
wenningd
1ee12cfa6f [HUDI-2314] Add support for DynamoDb based lock provider (#3486)
- Co-authored-by: Wenning Ding <wenningd@amazon.com>
- Co-authored-by: Sivabalan Narayanan <n.siva.b@gmail.com>
2021-11-17 12:09:31 -05:00
卢波
826414cff5 [MINOR] Add the Schema for GooseFS to StorageSchemes (#3982)
Co-authored-by: lubo <bollu@tencent.com>
2021-11-17 22:47:52 +08:00
董可伦
4d884bdaa9 [MINOR] Fix typo,'Hooide' corrected to 'Hoodie' (#4007) 2021-11-17 16:50:04 +08:00
0x574C
aec5d11da2 Check --source-avro-schema-path parameter (#3987)
Co-authored-by: 0x3E6 <dragon1996>
2021-11-17 14:45:43 +08:00
Sivabalan Narayanan
ce7d233307 [HUDI-2151] Part3 Enabling marker based rollback as default rollback strategy (#3950)
* Enabling timeline server based markers

* Enabling timeline server based markers and marker based rollback

* Removing constraint that timeline server can be enabled only for hdfs

* Fixing tests
2021-11-17 11:51:28 +05:30
Sivabalan Narayanan
04eb5fdc65 [HUDI-2753] Ensure list based rollback strategy is used for restore (#3983) 2021-11-17 10:06:55 +05:30
Alexey Kudinkin
cbcbec4d38 [MINOR] Fixed checkstyle config to be based off Maven root-dir (requires Maven >=3.3.1 to work properly); (#4009)
Updated README
2021-11-16 21:30:16 -05:00
Danny Chan
6f5e661010 [HUDI-2769] Fix StreamerUtil#medianInstantTime for very near instant time (#4005) 2021-11-16 13:46:34 +08:00
Sivabalan Narayanan
bff8769ed4 [HUDI-2712] Fixing a bug with rollback of partially failed commit which has new partitions (#3947) 2021-11-15 22:36:03 -05:00
zhangyue19921010
38b6934352 [HUDI-2683] Parallelize deleting archived hoodie commits (#3920)
Co-authored-by: yuezhang <yuezhang@freewheel.tv>
2021-11-15 22:36:54 +08:00
Sivabalan Narayanan
53d2d6ae24 [HUDI-2744] Fix parsing of metadata table compaction timestamp when metrics are enabled (#3976) 2021-11-15 07:27:35 -05:00
dufeng1010
3c4319729c [MINOR] Fix typo in IntervalTreeBasedGlobalIndexFileFilter (#3993)
Co-authored-by: 闫杜峰 <yandufeng@sinochem.com>
2021-11-15 14:39:43 +08:00
xiarixiaoyao
a0dae41409 [HUDI-2758] remove redundant code in the hoodieRealtimeInputFormatUtils.getRealtimeSplits (#3994) 2021-11-15 11:29:40 +08:00
Manoj Govindassamy
a14d1040b9 [HUDI-2589] Claiming RFC-37 for Metadata based bloom index feature. (#3995) 2021-11-14 20:47:41 -05:00
Yann Byron
0bb6d8ff80 [HUDI-2706] refactor spark-sql to make consistent with DataFrame api (#3936) 2021-11-14 15:44:39 -08:00