1644 Commits

Author SHA1 Message Date
wangxianghu
f73bedd374 [MINOR] Remove unused methods (#3152) 2021-06-26 13:19:26 +08:00
Vinay Patil
ed1a5daa9a [HUDI-2060] Added tests for KafkaOffsetGen (#3136) 2021-06-25 12:37:47 -04:00
n3nash
23dbc09a0d [MINOR] Removing un-used files and references (#3150) 2021-06-24 22:17:40 -07:00
s-sanjay
0fb8556b0d Add ability to provide multi-region (global) data consistency across HMS in different regions (#2542)
[global-hive-sync-tool] Add a global hive sync tool to sync hudi table across clusters. Add a way to rollback the replicated time stamp if we fail to sync or if we partly sync

Co-authored-by: Jagmeet Bali <jsbali@uber.com>
2021-06-24 20:26:26 -07:00
Danny Chan
e64fe55054 [HUDI-2068] Skip the assign state for SmallFileAssign when the state can not assign initially (#3148) 2021-06-25 08:57:56 +08:00
yuzhaojing
218f2a6df8 [HUDI-2062] Catch FileNotFoundException in WriteProfiles #getCommitMetadata Safely (#3138)
Co-authored-by: 喻兆靖 <yuzhaojing@bilibili.com>
2021-06-25 08:54:59 +08:00
Sebastian Bernauer
b32855545b [HUDI-2069] Fix KafkaAvroSchemaDeserializer to not rely on reflection (#3111)
[HUDI-2069] KafkaAvroSchemaDeserializer should get sourceSchema passed instead using Reflection
2021-06-24 09:08:21 -04:00
pengzhiwei
84dd3ca18b [HUDI-2053] Insert Static Partition With DateType Return Incorrect Partition Value (#3133) 2021-06-24 19:09:37 +08:00
pengzhiwei
7e50f9a5a6 [HUDI-2061] Incorrect Schema Inference For Schema Evolved Table (#3137) 2021-06-23 22:48:01 -07:00
leesf
e039e0ff6d [HUDI-2064] Fix TestHoodieBackedMetadata#testOnlyValidPartitionsAdded (#3141) 2021-06-24 07:37:55 +08:00
yuzhaojing
380518e232 [HUDI-2038] Support rollback inflight compaction instances for CompactionPlanOperator (#3105)
Co-authored-by: 喻兆靖 <yuzhaojing@bilibili.com>
2021-06-23 20:58:52 +08:00
Vaibhav Sinha
43b9c1fa1c [HUDI-1826] Add ORC support in HoodieSnapshotExporter (#3130) 2021-06-23 17:04:25 +08:00
Danny Chan
2687eab8f0 [HUDI-2054] Remove the duplicate name for flink write pipeline (#3135) 2021-06-23 14:49:38 +08:00
swuferhong
3fb59dda83 [HUDI-1988] FinalizeWrite() been executed twice in AbstractHoodieWriteClient$commitstats (#3050) 2021-06-22 22:57:09 -07:00
Prashant Wason
11e64b2db0 [HUDI-1717] Metadata Reader should merge all the un-synced but complete instants from the dataset timeline. (#3082) 2021-06-22 23:52:18 +08:00
Prashant Wason
062d5baf84 [HUDI-2013] Removed option to fallback to file listing when Metadata Table is enabled. (#3079) 2021-06-22 23:41:52 +08:00
pengzhiwei
69c0d9e2d0 [HUDI-1883] Support Truncate Table For Hoodie (#3098) 2021-06-22 22:33:20 +08:00
yuzhaojing
5db37c255b [HUDI-2047] Ignore FileNotFoundException in WriteProfiles #getWritePathsOfInstant (#3125)
Co-authored-by: 喻兆靖 <yuzhaojing@bilibili.com>
2021-06-22 14:18:46 +08:00
Rong Ma
7bd517a82f [HUDI-2031] JVM occasionally crashes during compaction when spark speculative execution is enabled (#3093)
* unit tests added
2021-06-21 18:09:51 -07:00
swuferhong
cb5cd35991 [HUDI-2043] HoodieDefaultTimeline$filterPendingCompactionTImeline() method have wrong filter condition (#3109) 2021-06-21 17:53:54 -07:00
pengzhiwei
4fd8a88b7e [HUDI-1776] Support AlterCommand For Hoodie (#3086) 2021-06-21 22:58:43 +08:00
swuferhong
f8d9242372 [HUDI-2050] Support rollback inflight compaction instances for batch flink compactor (#3124) 2021-06-21 20:32:48 +08:00
Danny Chan
adf167991a [HUDI-2049] StreamWriteFunction should wait for the next inflight instant time before flushing (#3123) 2021-06-21 20:15:27 +08:00
Sagar Sumit
429e9fb5fe [HUDI-1248] Increase timeout for deltaStreamerTestRunner in TestHoodieDeltaStreamer (#3110) 2021-06-20 21:42:12 -07:00
Raymond Xu
e41f13fe7b [MINOR] Put Azure cache tasks first (#3118) 2021-06-20 14:36:39 -07:00
Wei
c08fbb4268 [MINOR] Remove unused module (#3116) 2021-06-19 12:06:47 -07:00
Sagar Sumit
1cbdb49816 [HUDI-251] Adds JDBC source support for DeltaStreamer (#2915)
As discussed in RFC-14, this change implements the first phase of JDBC incremental puller.
It consists of the following changes:

- JdbcSource: This class extends RowSource and implements
  fetchNextBatch(Option<String> lastCkptStr, long sourceLimit)

- SqlQueryBuilder: A simple utility class to build sql queries fluently.

- Implements two modes of fetching: full and incremental.
  Full is a complete scan of RDBMS table.
  Incremental is delta since last checkpoint.
  Incremental mode falls back to full fetch in case of any exception.
2021-06-19 10:12:11 -04:00
Wei
7865da1e15 [MINOR] Fix Javadoc wrong references (#3115) 2021-06-18 21:51:54 -07:00
Wei
53396061cc [MINOR] Fix wrong package name (#3114) 2021-06-19 11:50:01 +08:00
Danny Chan
cdb9b48170 [HUDI-2040] Make flink writer as exactly-once by default (#3106) 2021-06-18 13:55:23 +08:00
Danny Chan
aa6342c3c9 [HUDI-2036] Move the compaction plan scheduling out of flink writer coordinator (#3101)
Since HUDI-1955 was fixed, we can move the scheduling out of the
coordinator to make the coordinator more lightweight.
2021-06-18 09:35:09 +08:00
pengzhiwei
b9e28e5292 [HUDI-2033] ClassCastException Throw When PreCombineField Is String Type (#3099) 2021-06-17 23:21:20 +08:00
vinoyang
67c3124352 [HUDI-2032] Make keygen class and keygen type optional for FlinkStreamerConfig (#3104)
* [HUDI-2032] Make keygen class and keygen type optional for FlinkStreamerConfig

* Address the review suggestion
2021-06-17 21:22:13 +08:00
yuzhaojing
f97dd25d41 [HUDI-2019] Set up the file system view storage config for singleton embedded server write config every time (#3102)
Co-authored-by: 喻兆靖 <yuzhaojing@bilibili.com>
2021-06-17 20:28:03 +08:00
pengzhiwei
ad53cf450e [HUDI-1879] Fix RO Tables Returning Snapshot Result (#2925) 2021-06-17 04:18:21 -07:00
Danny Chan
6763b45dd4 [HUDI-2030] Add metadata cache to WriteProfile to reduce IO (#3090)
Keeps the same number of instant metadata cache entries and refreshes the
cache on new commits.
2021-06-17 19:10:34 +08:00
Danny Chan
0b57483a8e [HUDI-2015] Fix flink operator uid to allow multiple pipelines in one job (#3091) 2021-06-17 09:08:19 +08:00
swuferhong
5ce64a81bd Fix the filter condition is missing in the judgment condition of compaction instance (#3025) 2021-06-16 14:28:53 -07:00
Wei
d519c74626 [HUDI-2008] Avoid the raw type usage in some classes under hudi-utilities module (#3076) 2021-06-16 22:37:29 +08:00
swuferhong
8b0a502c4f [HUDI-2014] Support flink hive sync in batch mode (#3081) 2021-06-16 14:29:16 +08:00
yuzhaojing
61efc6af79 [HUDI-2022] Release writer for append handle #close (#3087)
Co-authored-by: 喻兆靖 <yuzhaojing@bilibili.com>
2021-06-16 09:18:38 +08:00
vinoth chandar
910fe4842c [MINOR] Rename broken codecov file (#3088)
- Stop polluting PRs with wrong coverage info
- Retaining the file, so someone can try digging in
2021-06-15 18:05:50 -07:00
Jintao Guan
b8fe5b91d5 [HUDI-764] [HUDI-765] ORC reader writer Implementation (#2999)
Co-authored-by: Qingyun (Teresa) Kang <kteresa@uber.com>
2021-06-15 15:21:43 -07:00
Danny Chan
cb642ceb75 [HUDI-1999] Refresh the base file view cache for WriteProfile (#3067)
Refresh the view to discover new small files.
2021-06-15 08:18:38 -07:00
Raymond Xu
f922837064 [HUDI-1950] Fix Azure CI failure in TestParquetUtils (#2984)
* fix azure pipeline configs

* add pentaho.org in maven repositories

* Make sure file paths with scheme in TestParquetUtils

* add azure build status to README
2021-06-15 03:45:17 -07:00
Prashant Wason
515ce8eb36 [MINOR] Fixed the log which should only be printed when the Metadata Table is disabled. (#3080) 2021-06-15 16:18:15 +08:00
Vinay Patil
769dd2d7c9 [HUDI-2004] Move CheckpointUtils test cases to independant class (#3072) 2021-06-14 17:14:59 +08:00
Sivabalan Narayanan
7d9f9d7d82 [HUDI-1991] Fixing drop dups exception in bulk insert row writer path (#3055) 2021-06-14 09:55:52 +08:00
yuzhaojing
6e78682cea [HUDI-2000] Release file writer for merge handle #close (#3068)
Co-authored-by: 喻兆靖 <yuzhaojing@bilibili.com>
2021-06-13 18:09:48 +08:00
swuferhong
0c4f2fdc15 [HUDI-1984] Support independent flink hudi compaction function (#3046) 2021-06-13 15:04:46 +08:00