466 Commits

Author SHA1 Message Date
vinoth chandar
75040ee9e5 [HUDI-2149] Ensure and Audit docs for every configuration class in the codebase (#3272)
- Added docs when missing
- Rewrote, reworded as needed
- Made a couple more classes extend HoodieConfig
2021-07-14 10:56:08 -07:00
Sagar Sumit
5804ad8e32 [HUDI-1483] Support async clustering for deltastreamer and Spark streaming (#3142)
- Integrate async clustering service with HoodieDeltaStreamer and HoodieStreamingSink
- Added methods in HoodieAsyncService to reuse code
2021-07-11 14:43:38 -04:00
Sebastian Bernauer
8f7ad8b178 [HUDI-2069] Refactored String constants (#3172) 2021-07-07 14:22:00 -04:00
Randal Boyle
60e0254e67 [HUDI-1996] Adding functionality to allow the providing of basic auth creds for confluent cloud schema registry (#3097)
* adding support for basic auth with confluent cloud schema registry
2021-07-05 23:40:23 -07:00
Sebastian Bernauer
05d6e18190 [HUDI-2055] Added deltastreamer metric for time of lastSync (#3129) 2021-07-05 23:34:46 -07:00
pengzhiwei
b34d53fa9c [HUDI-2088] Missing Partition Fields And PreCombineField In Hoodie Properties For Table Written By Flink (#3171) 2021-07-01 17:25:18 +08:00
wenningd
d412fb2fe6 [HUDI-89] Add configOption & refactor all configs based on that (#2833)
Co-authored-by: Wenning Ding <wenningd@amazon.com>
2021-06-30 14:26:30 -07:00
Vinay Patil
94f0f40fec [HUDI-1944] Support Hudi to read from committed offset (#3175)
* [HUDI-1944] Support Hudi to read from committed offset

* [HUDI-1944] Adding group option to KafkaResetOffsetStrategies

* [HUDI-1944] Update Exception msg
2021-06-30 16:41:28 +08:00
Vinay Patil
039aeb6dce [HUDI-1910] Commit Offset to Kafka after successful Hudi commit (#3092) 2021-06-28 21:52:05 +08:00
zhangyue19921010
e99a6b031b [HUDI-2073] Fix the bug of hoodieClusteringJob never quit (#3157)
Co-authored-by: yuezhang <yuezhang@freewheel.tv>
2021-06-26 22:03:41 -07:00
Vinay Patil
ed1a5daa9a [HUDI-2060] Added tests for KafkaOffsetGen (#3136) 2021-06-25 12:37:47 -04:00
n3nash
23dbc09a0d [MINOR] Removing un-used files and references (#3150) 2021-06-24 22:17:40 -07:00
s-sanjay
0fb8556b0d Add ability to provide multi-region (global) data consistency across HMS in different regions (#2542)
[global-hive-sync-tool] Add a global hive sync tool to sync hudi tables across clusters. Add a way to roll back the replicated timestamp if the sync fails or only partly completes.

Co-authored-by: Jagmeet Bali <jsbali@uber.com>
2021-06-24 20:26:26 -07:00
Sebastian Bernauer
b32855545b [HUDI-2069] Fix KafkaAvroSchemaDeserializer to not rely on reflection (#3111)
[HUDI-2069] KafkaAvroSchemaDeserializer should get sourceSchema passed in instead of using reflection
2021-06-24 09:08:21 -04:00
Vaibhav Sinha
43b9c1fa1c [HUDI-1826] Add ORC support in HoodieSnapshotExporter (#3130) 2021-06-23 17:04:25 +08:00
Sagar Sumit
429e9fb5fe [HUDI-1248] Increase timeout for deltaStreamerTestRunner in TestHoodieDeltaStreamer (#3110) 2021-06-20 21:42:12 -07:00
Sagar Sumit
1cbdb49816 [HUDI-251] Adds JDBC source support for DeltaStreamer (#2915)
As discussed in RFC-14, this change implements the first phase of JDBC incremental puller.
It consists of the following changes:

- JdbcSource: This class extends RowSource and implements
  fetchNextBatch(Option<String> lastCkptStr, long sourceLimit)

- SqlQueryBuilder: A simple utility class to build sql queries fluently.

- Implements two modes of fetching: full and incremental.
  Full is a complete scan of RDBMS table.
  Incremental is delta since last checkpoint.
  Incremental mode falls back to full fetch in case of any exception.
2021-06-19 10:12:11 -04:00
Wei
53396061cc [MINOR] Fix wrong package name (#3114) 2021-06-19 11:50:01 +08:00
Wei
d519c74626 [HUDI-2008] Avoid the raw type usage in some classes under hudi-utilities module (#3076) 2021-06-16 22:37:29 +08:00
Vinay Patil
769dd2d7c9 [HUDI-2004] Move CheckpointUtils test cases to independant class (#3072) 2021-06-14 17:14:59 +08:00
Wei
ba728d822f [HUDI-2002] Modify HiveIncrementalPuller log level to ERROR (#3070)
Co-authored-by: wei.zhang2 <wei.zhang2@dmall.com>
2021-06-12 10:21:43 -07:00
Vinoth Govindarajan
9e4114dd46 [HUDI-1790] Added SqlSource to fetch data from any partitions for backfill use case (#2896) 2021-06-10 18:03:07 -04:00
Wei
a8b10e9067 [MINOR] Remove boxing (#3062) 2021-06-10 13:03:32 +08:00
wangxianghu
7261f08507 [HUDI-1929] Support configure KeyGenerator by type (#2993) 2021-06-08 09:26:10 -04:00
pengzhiwei
f760ec543e [HUDI-1659] Basic Implement Of Spark Sql Support For Hoodie (#2645)
Main functions:
Support create table for hoodie.
Support CTAS.
Support Insert for hoodie. Including dynamic partition and static partition insert.
Support MergeInto for hoodie.
Support DELETE
Support UPDATE
Supports both spark2 & spark3, based on DataSourceV1.

Main changes:
Add sql parser for spark2.
Add HoodieAnalysis for sql resolve and logical plan rewrite.
Add command implementations for CREATE TABLE, INSERT, MERGE INTO & CTAS.
In order to push down the update & insert logic to the HoodieRecordPayload for MergeInto, I made some changes to the
HoodieWriteHandler and other related classes.
1. Add the inputSchema for parsing the incoming record. This is because the inputSchema for MergeInto is different from the writeSchema, as there are some transforms in the update & insert expressions.
2. Add WRITE_SCHEMA to HoodieWriteConfig to pass the write schema for MergeInto.
3. Pass properties to HoodieRecordPayload#getInsertValue to pass the insert expression and table schema.


Verify this pull request
Add TestCreateTable to test creating hoodie tables and CTAS.
Add TestInsertTable to test inserting into hoodie tables.
Add TestMergeIntoTable to test merging into hoodie tables.
Add TestUpdateTable to test updating hoodie tables.
Add TestDeleteTable to test deleting from hoodie tables.
Add TestSqlStatement to test the currently supported DDL/DML.
2021-06-07 23:24:32 -07:00
Vinoth Govindarajan
57611d10b5 [HUDI-1743] Added support for SqlFileBasedTransformer (#2747) 2021-06-07 21:48:27 -04:00
wangxianghu
974b476180 [HUDI-1940] Add SqlQueryBasedTransformer unit test (#3004) 2021-05-28 22:30:30 +08:00
Vinay Patil
4eb6ef8144 [HUDI-1935] Updated Logger statement (#2996)
Co-authored-by: veenaypatil <vinay18.patil@gmail.com>
2021-05-26 15:04:58 +08:00
Raymond Xu
afa6bc0b10 [HUDI-1723] Fix path selector listing files with the same mod date (#2845) 2021-05-25 10:19:10 -04:00
wangxianghu
e7020748b5 [HUDI-1920] Set archived as the default value of HOODIE_ARCHIVELOG_FOLDER_PROP_NAME (#2978) 2021-05-25 16:29:55 +08:00
zhangminglei
fe3f5c2d56 [HUDI-1913] Using streams instead of loops for input/output (#2962) 2021-05-19 09:13:38 +08:00
TeRS-K
be9db2c4f5 [HUDI-1055] Remove hardcoded parquet in tests (#2740)
* Remove hardcoded parquet in tests
* Use DataFileUtils.getInstance
* Renaming DataFileUtils to BaseFileUtils

Co-authored-by: Vinoth Chandar <vinoth@apache.org>
2021-05-11 10:01:45 -07:00
Volodymyr Burenin
8a48d16e41 [HUDI-1707] Reduces log level for too verbose messages from info to debug level. (#2714)
* Reduces log level for too verbose messages from info to debug level.
* Sort config output.
* Code Review : Small restructuring + rebasing to master
 - Fixing flaky multi delta streamer test
 - Using isDebugEnabled() checks
 - Some changes to shorten log message without moving to DEBUG

Co-authored-by: volodymyr.burenin <volodymyr.burenin@cloudkitchens.com>
Co-authored-by: Vinoth Chandar <vinoth@apache.org>
2021-05-10 07:16:02 -07:00
Nick Young
ea14d687da [HUDI-1852] Add SCHEMA_REGISTRY_SOURCE_URL_SUFFIX and SCHEMA_REGISTRY_TARGET_URL_SUFFIX property (#2884) 2021-05-01 10:02:00 +08:00
Nick Young
f4e3b94971 [HUDI-1742] Improve table level config priority for HoodieMultiTableDeltaStreamer (#2744) 2021-04-26 22:05:06 +08:00
Sivabalan Narayanan
3e4fa170cf [HUDI-1835] Fixing kafka native config param for auto offset reset (#2864) 2021-04-25 12:16:09 -04:00
pengzhiwei
aacb8be521 [HUDI-1415] Read Hoodie Table As Spark DataSource Table (#2283) 2021-04-20 14:21:38 -07:00
Jintao Guan
3253079507 [HUDI-1764] Add Hudi-CLI support for clustering (#2773)
* tmp base

* update

* update unit test

* update

* update

* update CLI parameters

* linting

* update doSchedule in HoodieClusteringJob

* update

* update diff according to comments
2021-04-20 09:46:42 -07:00
Aditya Tiwari
ec2334ceac [HUDI-1716]: Resolving default values for schema from dataframe (#2765)
- Adding default values and setting null as first entry in UNION data types in avro schema. 

Co-authored-by: Aditya Tiwari <aditya.tiwari@flipkart.com>
2021-04-19 10:05:20 -04:00
Roc Marshal
62bb9e10d9 [Hotfix][utilities] Optimized codes (#2821) 2021-04-15 09:40:14 +08:00
wangxianghu
040756d8c0 [HUDI-1785] Move OperationConverter to hudi-client-common for code reuse (#2798) 2021-04-12 16:22:33 +08:00
li36909
dadd081d45 [HUDI-1751] DeltaStreamer print many unnecessary warn log (#2754) 2021-04-07 00:47:03 -07:00
li36909
920537cac8 [HUDI-1749] Clean/Compaction/Rollback command maybe never exit when operation fail (#2752) 2021-04-05 23:23:15 -07:00
pengzhiwei
684622c7c9 [HUDI-1591] Implement Spark's FileIndex for Hudi to support queries via Hudi DataSource using non-globbed table path and partition pruning (#2651) 2021-04-01 11:12:28 -07:00
Gary Li
452f5e2d66 [HOTFIX] close spark session in functional test suite and disable spark3 test for spark2 (#2727) 2021-03-29 06:04:48 -07:00
n3nash
bec70413c0 [HUDI-1728] Fix MethodNotFound for HiveMetastore Locks (#2731) 2021-03-27 10:07:10 -07:00
garyli1019
6e803e08b1 Moving to 0.9.0-SNAPSHOT on master branch. 2021-03-24 21:37:14 +08:00
n3nash
01a1d7997b [HUDI-1712] Rename & standardize config to match other configs (#2708) 2021-03-24 17:24:02 +08:00
n3nash
d7b18783bd [HUDI-1709] Improving config names and adding hive metastore uri config (#2699) 2021-03-22 01:22:06 -07:00
Volodymyr Burenin
900de34e45 [HUDI-1650] Custom avro kafka deserializer. (#2619)
* Custom avro kafka deserializer

Co-authored-by: volodymyr.burenin <volodymyr.burenin@cloudkitchens.com>
Co-authored-by: Sivabalan Narayanan <sivabala@uber.com>
2021-03-20 00:51:08 -07:00