
1068 Commits

Author SHA1 Message Date
Balaji Varadarajan
7a2429f5ba [HUDI-575] Spark Streaming with async compaction support (#1752) 2020-08-05 07:50:15 -07:00
Balaji Varadarajan
61e027fadd [MINOR] Adding timeout for each command execution in docker and capture output. This will help get stdout/stderr of stuck commands (#1918) 2020-08-05 07:46:34 -07:00
Sreeram Ramji
217a84192c [HUDI-1140] Fix Jcommander issue for --hoodie-conf in DeltaStreamer (#1898) 2020-08-04 21:42:51 -07:00
liujianhui
d3711a2641 [HUDI-525] lack of insert info in delta_commit inflight
2020-08-04 17:43:57 -07:00
Sivabalan Narayanan
ab11ba43e1 [REVERT] "[HUDI-1058] Make delete marker configurable (#1819)" (#1914)
This reverts commit 433d7d2c98.
2020-08-04 15:20:38 -07:00
vinoth chandar
539621bd33 [HUDI-242] Support for RFC-12/Bootstrapping of external datasets to hudi (#1876)
- [HUDI-418] Bootstrap Index Implementation using HFile with unit-test
 - [HUDI-421] FileSystem View Changes to support Bootstrap with unit-tests
 - [HUDI-424] Implement Query Side Integration for querying tables containing bootstrap file slices
 - [HUDI-423] Implement upsert functionality for handling updates to these bootstrap file slices
 - [HUDI-421] Bootstrap Write Client with tests
 - [HUDI-425] Added HoodieDeltaStreamer support
 - [HUDI-899] Add a knob to change partition-path style while performing metadata bootstrap
 - [HUDI-900] Metadata Bootstrap Key Generator needs to handle complex keys correctly
 - [HUDI-424] Simplify Record reader implementation
 - [HUDI-423] Implement upsert functionality for handling updates to these bootstrap file slices
 - [HUDI-420] Hoodie Demo working with hive and sparkSQL. Also, Hoodie CLI working with bootstrap tables

Co-authored-by: Mehrotra <uditme@amazon.com>
Co-authored-by: Vinoth Chandar <vinoth@apache.org>
Co-authored-by: Balaji Varadarajan <varadarb@uber.com>
2020-08-03 20:19:21 -07:00
Sivabalan Narayanan
266bce12b3 [MINOR] Fixing usage of right config value for parallelism to dedup in Bulk Insert (#1905) 2020-08-03 10:38:36 -07:00
Shen Hong
433d7d2c98 [HUDI-1058] Make delete marker configurable (#1819) 2020-08-03 11:06:31 -04:00
Raymond Xu
8aa9142de8 [MINOR] Prevent scalatest plugin from running in non-UTs (#1897) 2020-08-02 20:33:58 -07:00
Bhavani Sudha Saktheeswaran
4ebd2db05b [MINOR] Suppressing full hive log and fetching only exceptions with context (#1903)
Co-authored-by: Bhavani Sudha Saktheeswaran <bsaktheeswaran@moveworks.ai>
2020-08-02 19:44:51 -07:00
Mathieu
30dcd5cf06 [MINOR] Remove redundant import in hudi-integ-test (#1899) 2020-08-02 21:30:23 +08:00
Raymond Xu
10e4268792 [HUDI-995] Use Transformations, Assertions and SchemaTestUtil (#1884)
- Consolidate transform functions for tests in Transformations.java
- Consolidate assertion functions for tests in Assertions.java
- Make use of SchemaTestUtil for loading schema from resource
2020-08-01 20:57:18 +08:00
Udit Mehrotra
e79fbc07fe [HUDI-1054] Several performance fixes during finalizing writes (#1768)
Co-authored-by: Udit Mehrotra <uditme@amazon.com>
2020-07-31 20:10:28 -07:00
n3nash
727f1df62c [MINOR] Suppressing spark logs for hudi-integ and hudi-utilities (#1894) 2020-07-31 19:01:25 -07:00
Y Ethan Guo
ccd70a7e48 [HUDI-472] Introduce configurations and new modes of sorting for bulk_insert (#1149)
* [HUDI-472] Introduce the configuration and new modes of record sorting for bulk_insert(#1149). Three sorting modes are implemented: global sort ("global_sort"), local sort inside each RDD partition ("partition_sort") and no sort ("none")
2020-07-31 09:52:42 -04:00
Nishith Agarwal
2fc2b01d86 [HUDI-394] Provide a basic implementation of test suite 2020-07-30 21:21:15 -07:00
Bhavani Sudha Saktheeswaran
d5b593b7d9 [MINOR] change log.info to log.debug (#1883) 2020-07-28 09:49:03 -07:00
Sivabalan Narayanan
b2763f433b [MINOR] Fixing default index parallelism for simple index (#1882) 2020-07-28 08:22:09 -07:00
Udit Mehrotra
5e7931b1f9 [MINOR] Fix master compilation failure (#1881)
Co-authored-by: Udit Mehrotra <uditme@amazon.com>
2020-07-27 23:02:58 -07:00
hongdd
fa419213f6 [HUDI-703] Add test for HoodieSyncCommand (#1774) 2020-07-28 08:31:43 +08:00
Raymond Xu
ca36c44cb3 [HUDI-995] Move TestRawTripPayload and HoodieTestDataGenerator to hudi-common (#1873) 2020-07-27 19:21:45 +08:00
Raymond Xu
0cb24e4a2d [MINOR] Use HoodieActiveTimeline.COMMIT_FORMATTER (#1874) 2020-07-24 18:48:56 -07:00
Gary Li
467d097dae [MINOR] Add Databricks File System to StorageSchemes (#1877) 2020-07-24 18:47:09 -07:00
Shen Hong
c3279cd598 [HUDI-1082] Fix minor bug in deciding the insert buckets (#1838) 2020-07-23 08:31:49 -04:00
Mathieu
da106803b6 [HUDI-1037] Introduce a write committed callback hook and given a default http callback implementation (#1842) 2020-07-23 19:07:05 +08:00
lamber-ken
f61cd1086a [HUDI-985] Introduce rerun ci bot (#1693)
* [HUDI-985] Introduce rerun ci bot

* Implement bot using github-script

* trigger rebuild
2020-07-22 22:59:24 -07:00
zherenyu831
c39778c150 [HUDI-1113] Add user define metrics reporter (#1851) 2020-07-23 13:46:36 +08:00
vinoth chandar
3dd189ec7d [MINOR] Fix checkstyle issue on TestHoodieClientOnCopyOnWriteStorage (#1865) 2020-07-22 21:54:45 -07:00
vinoth chandar
a8bd76c299 [HUDI-1029] In inline compaction mode, previously failed compactions needs to be retried before new compactions (#1857)
- Prevents failed compactions from causing issues with future commits
2020-07-22 21:22:06 -07:00
vinoth chandar
9bd37ef291 [MINOR] Fix flaky testUpsertsUpdatePartitionPath* tests (#1863) 2020-07-22 22:52:34 -04:00
Sivabalan Narayanan
5b6026ba43 [HUDI-802] Fixing deletes for inserts in same batch in write path (#1792)
* Fixing deletes for inserts in same batch in write path
* Fixing delta streamer tests
* Adding tests for OverwriteWithLatestAvroPayload
2020-07-22 19:39:57 -07:00
hongdd
12ef8c9249 [HUDI-708] Add temps show and unit test for TempViewCommand (#1770) 2020-07-23 08:43:46 +08:00
DeyinZhong
743ef322b8 [HUDI-871] Add support for Tencent Cloud Object Storage(COS) (#1855)
Co-authored-by: deyzhong <deyzhong@tencent.com>
2020-07-22 17:40:19 +08:00
Raymond Xu
5e7ab11e2e [HUDI-994] Move TestHoodieIndex test cases to unit tests (#1850) 2020-07-21 10:23:43 -07:00
lw0090
1ec89e9a94 [HUDI-839] Introducing support for rollbacks using marker files (#1756)
* [HUDI-839] Introducing rollback strategy using marker files

 - Adds a new mechanism for rollbacks where it's based on the marker files generated during the write
 - Consequently, marker file/dir deletion now happens post commit, instead of during finalize 
 - Marker files are also generated for AppendHandle, making it consistent throughout the write path 
 - Until upgrade-downgrade mechanism can upgrade non-marker based inflight writes to marker based, this should only be turned on for new datasets.
 - Added marker dir deletion after successful commit/rollback, individual files are not deleted during finalize
 - Fail safe for deleting marker directories, now during timeline archival process
 - Added check to ensure completed instants are not rolled back using marker based strategy. This will be incorrect
 - Reworked tests to rollback inflight instants, instead of completed instants whenever necessary
 - Added an unit test for MarkerBasedRollbackStrategy


Co-authored-by: Vinoth Chandar <vinoth@apache.org>
2020-07-20 22:41:42 -07:00
Prashant Wason
b71f25f210 [HUDI-92] Provide reasonable names for Spark DAG stages in HUDI. (#1289) 2020-07-19 10:29:25 -07:00
Udit Mehrotra
1aae437257 [HUDI-1102] Add common useful Spark related and Table path detection utilities (#1841)
Co-authored-by: Mehrotra <uditme@amazon.com>
2020-07-18 16:16:32 -07:00
wenningd
bf1d36fa63 [HUDI-1087] Handle decimal type for realtime record reader with SparkSQL (#1831)
Co-authored-by: Wenning Ding <wenningd@amazon.com>
2020-07-15 07:30:58 -07:00
Raymond Xu
b399b4ad43 [HUDI-996] Add functional test in hudi-client (#1824)
- Add functional test suite in hudi-client
- Tag TestHBaseIndex as functional
2020-07-15 08:28:50 +08:00
Raymond Xu
f5dc8ca733 [HUDI-994] Split TestHBaseIndex to unit tests (#1818)
- Refactor and improve TestHBaseIndex for performance
- Move HBaseIndex unit tests to different test classes
2020-07-13 20:32:01 -07:00
Sivabalan Narayanan
21bb1b505a [HUDI-1068] Fixing deletes in global bloom when update partition path is set (#1793) 2020-07-13 22:34:07 -04:00
miaomiaomiao
10e457278b [HUDI-1078]Fix IllegalArgumentException in Delete data demo of Quick-Start Guide (#1808) 2020-07-13 11:38:06 -04:00
Raymond Xu
20ac7c3337 [HUDI-994] Make TestHBaseQPSResourceAllocator a unit test (#1820) 2020-07-11 09:15:05 -07:00
GuoPhilipse
abfebd30f3 [MINOR] Update parameter description (#1821) 2020-07-11 22:57:12 +08:00
Pratyaksh Sharma
9627a385fe [HUDI-916]: Added support for multiple input formats in TimestampBasedKeyGenerator (#1648) 2020-07-10 15:28:45 -04:00
Pratyaksh Sharma
c7f1a781ab [HUDI-728]: Implemented custom key generator (#1433) 2020-07-09 07:35:07 -04:00
Trevor
d58644b657 [HUDI-1062]Remove unnecessary maxEvent check and add some log in KafkaOffsetGen (#1779) 2020-07-08 21:07:34 -07:00
Satish Kotha
086853c004 [HUDI-1080] Fix backward compatibility for com.uber inputformats 2020-07-08 15:30:07 -07:00
Raymond Xu
7b2a947aed [HUDI-1069] Remove duplicate assertNoWriteErrors() (#1797) 2020-07-08 13:58:15 +08:00
mabin001
8c4ff185f1 [HUDI-1064]Trim hoodie table name (#1805) 2020-07-07 19:10:16 +08:00