1035 Commits

Author SHA1 Message Date
Raymond Xu
5e7ab11e2e [HUDI-994] Move TestHoodieIndex test cases to unit tests (#1850) 2020-07-21 10:23:43 -07:00
lw0090
1ec89e9a94 [HUDI-839] Introducing support for rollbacks using marker files (#1756)
* [HUDI-839] Introducing rollback strategy using marker files

 - Adds a new mechanism for rollbacks where it's based on the marker files generated during the write
 - Consequently, marker file/dir deletion now happens post commit, instead of during finalize 
 - Marker files are also generated for AppendHandle, making it consistent throughout the write path 
 - Until upgrade-downgrade mechanism can upgrade non-marker based inflight writes to marker based, this should only be turned on for new datasets.
 - Added marker dir deletion after successful commit/rollback, individual files are not deleted during finalize
 - Fail safe for deleting marker directories, now during timeline archival process
 - Added a check to ensure completed instants are not rolled back using the marker-based strategy, since that would be incorrect
 - Reworked tests to rollback inflight instants, instead of completed instants whenever necessary
 - Added a unit test for MarkerBasedRollbackStrategy


Co-authored-by: Vinoth Chandar <vinoth@apache.org>
2020-07-20 22:41:42 -07:00
Prashant Wason
b71f25f210 [HUDI-92] Provide reasonable names for Spark DAG stages in HUDI. (#1289) 2020-07-19 10:29:25 -07:00
Udit Mehrotra
1aae437257 [HUDI-1102] Add common useful Spark related and Table path detection utilities (#1841)
Co-authored-by: Mehrotra <uditme@amazon.com>
2020-07-18 16:16:32 -07:00
wenningd
bf1d36fa63 [HUDI-1087] Handle decimal type for realtime record reader with SparkSQL (#1831)
Co-authored-by: Wenning Ding <wenningd@amazon.com>
2020-07-15 07:30:58 -07:00
Raymond Xu
b399b4ad43 [HUDI-996] Add functional test in hudi-client (#1824)
- Add functional test suite in hudi-client
- Tag TestHBaseIndex as functional
2020-07-15 08:28:50 +08:00
Raymond Xu
f5dc8ca733 [HUDI-994] Split TestHBaseIndex to unit tests (#1818)
- Refactor and improve TestHBaseIndex for performance
- Move HBaseIndex unit tests to different test classes
2020-07-13 20:32:01 -07:00
Sivabalan Narayanan
21bb1b505a [HUDI-1068] Fixing deletes in global bloom when update partition path is set (#1793) 2020-07-13 22:34:07 -04:00
miaomiaomiao
10e457278b [HUDI-1078]Fix IllegalArgumentException in Delete data demo of Quick-Start Guide (#1808) 2020-07-13 11:38:06 -04:00
Raymond Xu
20ac7c3337 [HUDI-994] Make TestHBaseQPSResourceAllocator a unit test (#1820) 2020-07-11 09:15:05 -07:00
GuoPhilipse
abfebd30f3 [MINOR] Update parameter description (#1821) 2020-07-11 22:57:12 +08:00
Pratyaksh Sharma
9627a385fe [HUDI-916]: Added support for multiple input formats in TimestampBasedKeyGenerator (#1648) 2020-07-10 15:28:45 -04:00
Pratyaksh Sharma
c7f1a781ab [HUDI-728]: Implemented custom key generator (#1433) 2020-07-09 07:35:07 -04:00
Trevor
d58644b657 [HUDI-1062]Remove unnecessary maxEvent check and add some log in KafkaOffsetGen (#1779) 2020-07-08 21:07:34 -07:00
Satish Kotha
086853c004 [HUDI-1080] Fix backward compatibility for com.uber inputformats 2020-07-08 15:30:07 -07:00
Raymond Xu
7b2a947aed [HUDI-1069] Remove duplicate assertNoWriteErrors() (#1797) 2020-07-08 13:58:15 +08:00
mabin001
8c4ff185f1 [HUDI-1064]Trim hoodie table name (#1805) 2020-07-07 19:10:16 +08:00
Shen Hong
be85a6c32b [HUDI-1004] Support update metrics in HoodieDeltaStreamerMetrics (#1732) 2020-07-06 09:44:02 -07:00
Raymond Xu
3b9a30528b [HUDI-996] Add functional test suite for hudi-utilities (#1746)
- Share resources for functional tests
- Add suite for functional test classes from hudi-utilities
2020-07-05 16:44:31 -07:00
Cory Locklear
574dcf920c [MINOR] Relocate jetty during shading/packaging for Databricks runtime (#1781) 2020-07-03 16:22:52 -07:00
andreitaleanu
37ea79566d [HUDI-539] Make HoodieROTablePathFilter implement Configurable (#1784)
Co-authored-by: Andrei Taleanu <taleanu@adobe.com>
2020-07-03 13:39:53 -07:00
baobaoyeye
2be924fd3a [HUDI-760]Remove Rolling Stat management from Hudi Writer (#1739) 2020-06-30 20:07:09 -07:00
Balaji Varadarajan
8919be6a5d [HUDI-855] Run Cleaner async with writing (#1577)
- Cleaner can now run concurrently with write operation 
- Configs to turn on/off

Co-authored-by: Vinoth Chandar <vinoth@apache.org>
2020-06-28 02:04:50 -07:00
Raymond Xu
31247e9b34 [HUDI-896] Report test coverage by modules & parallelize CI (#1753)
- use codecov flags for each module to report coverage
- parallelize CI jobs for shorter time
- add a testcase for MetricsReporterFactory (to trigger codecov comment)
2020-06-27 23:16:12 -07:00
Prashant Wason
2603cfb33e [HUDI-684] Introduced abstraction for writing and reading different types of base file formats. (#1687)
Notable changes:
    1. HoodieFileWriter and HoodieFileReader abstractions for writer/reader side of a base file format
    2. HoodieDataBlock abstraction for creating specific data blocks for base file formats (e.g. Parquet has HoodieAvroDataBlock)
    3. All hardcoded references to Parquet / Parquet based classes have been abstracted to call methods which accept a base file format
    4. HiveSyncTool accepts the base file format as a CLI parameter
    5. HoodieDeltaStreamer accepts the base file format as a CLI parameter
    6. HoodieSparkSqlWriter accepts the base file format as a parameter
2020-06-25 23:46:55 -07:00
wangxianghu
5e47673341 [HUDI-1035] Remove unused class KeyLookupResult (#1754) 2020-06-23 17:01:03 -07:00
Shen Hong
89e37d5273 [HUDI-908] Add some data types to HoodieTestDataGenerator and fix some bugs. (#1690) 2020-06-22 08:13:28 -07:00
wangxianghu
68a656b016 [HUDI-1032] Remove unused code in HoodieCopyOnWriteTable and code clean (#1750) 2020-06-21 07:34:47 -07:00
Raymond Xu
8a9fdd603e [HUDI-1023] Add validation error messages in delta sync (#1710)
- Remove explicitly specifying BLOOM_INDEX since that's the default anyway
2020-06-19 12:12:35 -07:00
Raymond Xu
ab724af5c4 [MINOR] Rename TestSourceConfig to SourceConfigs (#1749) 2020-06-19 12:08:19 -07:00
hongdd
f3a701757b [HUDI-696] Add unit test for CommitsCommand (#1724) 2020-06-18 21:42:13 +08:00
hongdd
5099a91edd [HUDI-709] Add unit test for UtilsCommand (#1686) 2020-06-18 19:54:14 +08:00
Sivabalan Narayanan
2a04647f5e [MINOR] Updating doap file for 0.5.3 release (#1740) 2020-06-16 12:47:30 -07:00
Yajun Luo
043eb564c2 [HUDI-1003] Handle partitions correctly for syncing hudi non-partitioned table to hive (#1720) 2020-06-15 19:02:03 +08:00
Litianye
ede6c9bda4 [HUDI-1006] Deltastreamer use kafkaSource with offset reset strategy:latest can't consume data (#1719) 2020-06-14 18:01:44 +08:00
vinoyang
31ef4acc59 [MINOR] Fix the ordered list for the hudi-examples README file (#1733) 2020-06-14 16:27:26 +08:00
hongdd
fcabc8fbca [HUDI-1019] Clean refresh command in CLI (#1725) 2020-06-14 14:30:28 +08:00
Satish Kotha
a7fd331624 Add unit test for snapshot reads in hadoop-mr 2020-06-13 10:23:05 -07:00
sathyaprakashg
df2e0c760e HUDI-942 Increase default value number of delta commits for inline compaction (#1664)
Co-authored-by: Sathyaprakash Govindasamy <sathyaprakashg@zillowgroup.com>
2020-06-10 16:16:44 -07:00
Gary Li
37838cea60 [HUDI-822] decouple Hudi related logics from HoodieInputFormat (#1592)
- Refactoring business logic out of InputFormat into Utils helpers.
2020-06-09 06:10:16 -07:00
shenhong
3387b3841f [HUDI-1005] fix NPE in HoodieWriteClient.clean 2020-06-09 05:57:04 -07:00
Shen Hong
6318e943d1 [HUDI-1016] Code optimization in MergeOnReadRollbackActionExecutor(#1718) 2020-06-09 19:14:26 +08:00
garyli1019
22cd824d99 HUDI-494 fix incorrect record size estimation 2020-06-08 20:29:29 -07:00
lw0090
9e07cebece [HUDI-974] Fix fields out of order in MOR mode when using Hive (#1711) 2020-06-09 09:22:06 +08:00
Wenning Ding
7d40f19f39 HUDI-515 Resolve API conflict for Hive 2 & Hive 3 2020-06-08 14:18:38 -07:00
liujinhui
97ab97b726 [HUDI-918] Fix kafkaOffsetGen can not read kafka data bug (#1652) 2020-06-08 20:46:47 +08:00
Shen Hong
2901f5423a [HUDI-1002] Ignore case when setting incremental mode in hive query (#1715) 2020-06-08 19:38:32 +08:00
hj2016
e0a5e0d343 [HUDI-1000] Fix incremental query for COW non-partitioned table with no data (#1708) 2020-06-08 15:34:42 +08:00
garyli1019
e9cab67b80 [HUDI-988] Fix More Unit Test Flakiness 2020-06-07 23:14:46 -07:00
Balaji Varadarajan
fb283934a3 [HUDI-990] Timeline API : filterCompletedAndCompactionInstants needs to handle requested state correctly. Also ensure timeline gets reloaded after we revert committed transactions 2020-06-04 02:52:21 -07:00