251 Commits

Author SHA1 Message Date
Trevor
6a4dc7384c [HUDI-1218] Introduce BulkInsertSortMode as Independent class (#2021) 2020-08-25 19:04:13 +08:00
Trevor
7291607ae3 [MINOR] Remove unused log code in HoodieReadClient (#2000) 2020-08-22 21:45:50 +08:00
Shen Hong
1d09c02f1c [HUDI-1083] Optimization in determining insert bucket location for a given key (#1868)
- To determine insert bucket location for a given key, hudi walks through all insert buckets with O(N) cost, while this patch adds an optimization to make it O(logN).
2020-08-22 07:41:39 -04:00
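The O(N) to O(logN) change described in the commit above can be illustrated with a minimal sketch. It assumes each insert bucket has a weight, that the weights sum to 1, and that a key has already been mapped to a number `r` in [0, 1); the function names are hypothetical and this is not Hudi's actual code. The log does not show how Hudi derives `r` from a key.

```python
import bisect
from itertools import accumulate


def build_cumulative_weights(weights):
    """Precompute running totals once, e.g. [0.2, 0.3, 0.5] -> [0.2, 0.5, 1.0]."""
    return list(accumulate(weights))


def locate_bucket_linear(cumulative, r):
    """O(N): walk the buckets until the running total reaches r."""
    for index, upper_bound in enumerate(cumulative):
        if r <= upper_bound:
            return index
    return len(cumulative) - 1  # guard against floating-point rounding


def locate_bucket_binary(cumulative, r):
    """O(log N): binary-search for the first running total that is >= r."""
    index = bisect.bisect_left(cumulative, r)
    return min(index, len(cumulative) - 1)  # same rounding guard as above


cumulative = build_cumulative_weights([0.2, 0.3, 0.5])
bucket = locate_bucket_binary(cumulative, 0.35)
```

Both functions pick the same bucket for any `r`; only the lookup cost differs, which matters when the cumulative list is built once and queried for many keys.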
Raymond Xu
3a2ae16961 [HUDI-781] Introduce HoodieTestTable for test preparation (#1997) 2020-08-21 11:46:33 +08:00
Mathieu
34c8c9e3ea [MINOR] Move HoodieUpgradeDowngradeException to exception package (#1993) 2020-08-20 23:12:20 +08:00
Mathieu
b883b6d268 [HUDI-1122] Introduce a kafka implementation of hoodie write commit ca… (#1886) 2020-08-20 23:00:59 +08:00
Mathieu
bd7814dadf [HUDI-1206] Remove unused variable in Compactor (#1994) 2020-08-20 18:18:36 +08:00
Ryan Pifer
1137b0b343 Fix HBASE index MOR tables not considering record index valid 2020-08-19 14:55:59 -07:00
Abhishek Modi
bedbb825e0 [HUDI-1025] Meter RPC calls in HoodieWrapperFileSystem (#1916) 2020-08-18 22:42:05 +08:00
Bhavani Sudha Saktheeswaran
4226d75144 Moving to 0.6.1-SNAPSHOT on master branch. 2020-08-14 12:54:15 -07:00
vinoth chandar
9bde6d616c [HUDI-1190] Introduce @PublicAPIClass and @PublicAPIMethod annotations to mark public APIs (#1965)
- Maturity levels one of : evolving, stable, deprecated
- Took a pass and marked out most of the existing public API
2020-08-13 23:28:17 -07:00
Sivabalan Narayanan
379cf0786f [HUDI-1013] Adding Bulk Insert V2 implementation (#1834)
- Adding ability to use native spark row writing for bulk_insert
 - Controlled by `ENABLE_ROW_WRITER_OPT_KEY` datasource write option
 - Introduced KeyGeneratorInterface in hudi-client, moved KeyGenerator back to hudi-spark
 - Simplified the new API additions to just two new methods : getRecordKey(row), getPartitionPath(row)
 - Fixed all built-in key generators with new APIs
 - Made the field position map lazily created upon the first call to row based apis
 - Implemented native row based key generators for CustomKeyGenerator
 - Fixed all the tests, with these new APIs

Co-authored-by: Balaji Varadarajan <varadarb@uber.com>
Co-authored-by: Vinoth Chandar <vinoth@apache.org>
2020-08-13 00:33:39 -07:00
wenningd
8b928e9bca [HUDI-808] Support cleaning bootstrap source data (#1870)
Co-authored-by: Wenning Ding <wenningd@amazon.com>
Co-authored-by: Balaji Varadarajan <vbalaji@apache.org>
2020-08-11 01:43:46 -07:00
Balaji Varadarajan
626f78f6f6 Revert "[HUDI-781] Introduce HoodieTestTable for test preparation (#1871)"
This reverts commit b2e703d442.
2020-08-10 22:13:02 -07:00
Raymond Xu
b2e703d442 [HUDI-781] Introduce HoodieTestTable for test preparation (#1871) 2020-08-11 09:44:03 +08:00
Sivabalan Narayanan
858eda85d7 [HUDI-1098] Adding OptimisticConsistencyGuard to be used during FinalizeWrite (#1912) 2020-08-09 17:51:37 -07:00
Sivabalan Narayanan
ff53e8f0b6 [HUDI-1014] Adding Upgrade and downgrade infra for smooth transitioning from list based rollback to marker based rollback (#1858)
- This pull request adds upgrade/downgrade infra for smooth transition from list based rollback to marker based rollback
 - A new property called hoodie.table.version is added to the hoodie.properties file as part of this. Whenever hoodie is launched with a newer table version, i.e. 1 (or moving from pre 0.6.0 to 0.6.0), an upgrade step will be executed automatically to adhere to marker based rollback.
 - This automatic upgrade step will happen just once per dataset, as hoodie.table.version will be updated in the property file after the upgrade is completed
 - Similarly, a command line tool for downgrading is added in case a user wants to downgrade hoodie from table version 1 to 0, or move from hoodie 0.6.0 to pre 0.6.0
 - Added UpgradeDowngrade to assist in upgrading or downgrading hoodie table
 - Added interfaces for upgrade and downgrade and concrete implementations for upgrading from 0 to 1 and downgrading from 1 to 0.
 - Made some changes to ListingBasedRollbackHelper to expose just rollback stats w/o performing actual rollback, which will be consumed by Upgrade infra
- Reworking failure handling for upgrade/downgrade
 - Changed tests accordingly, added one test around left over cleanup
 - New tables now write table version into hoodie.properties
 - Clean up code naming, abstractions.

Co-authored-by: Vinoth Chandar <vinoth@apache.org>
2020-08-09 15:32:43 -07:00
Udit Mehrotra
e4a2d98f79 [HUDI-426] Bootstrap datasource integration (#1702) 2020-08-09 14:06:13 -07:00
liujinhui
6b349b7711 [HUDI-210] Hudi Supports Prometheus Pushgateway (#1931)
Co-authored-by: leesf <leesf@apache.org>
2020-08-09 15:29:54 +08:00
wenningd
9fe2d2b14a [HUDI-427] [HUDI-971] Implement CLI support for performing bootstrap (#1869)
* [HUDI-971] Clean partitions & fileIds returned by HFileBootstrapIndex
* [HUDI-427] Implement CLI support for performing bootstrap

Co-authored-by: Wenning Ding <wenningd@amazon.com>
Co-authored-by: Balaji Varadarajan <vbalaji@apache.org>
2020-08-08 12:37:29 -07:00
Raymond Xu
5ee676e34f [MINOR] Move a test method to Transformations (#1934)
- Move TestHoodieKeyLocationFetchHandle#getRecordsPerPartition to Transformations
- Improve some var namings
2020-08-08 18:25:55 +08:00
cheshta2904
1072f2748a [HUDI-1026] Removed slf4j dependency from HoodieClientTestHarness (#1928) 2020-08-08 12:07:22 +08:00
Gary Li
4f74a84607 [HUDI-69] Support Spark Datasource for MOR table - RDD approach (#1848)
- This PR implements Spark Datasource for MOR table in the RDD approach.
- Implemented SnapshotRelation
- Implemented HudiMergeOnReadRDD
- Implemented separate Iterator to handle merge and unmerge record reader.
- Added TestMORDataSource to verify this feature.
- Clean up test file name, add tests for mixed query type tests
 - We can now revert the change made in DefaultSource

Co-authored-by: Vinoth Chandar <vchandar@confluent.io>
2020-08-07 00:28:14 -07:00
Udit Mehrotra
ab453f2623 [HUDI-999] [RFC-12] Parallelize fetching of source data files/partitions (#1924) 2020-08-06 23:44:57 -07:00
Prashant Wason
c21209cb58 [HUDI-1149] Added a console metrics reporter and associated unit tests. 2020-08-05 10:31:46 -07:00
Balaji Varadarajan
7a2429f5ba [HUDI-575] Spark Streaming with async compaction support (#1752) 2020-08-05 07:50:15 -07:00
liujianhui
d3711a2641 [HUDI-525] lack of insert info in delta_commit inflight
2020-08-04 17:43:57 -07:00
Sivabalan Narayanan
ab11ba43e1 [REVERT] "[HUDI-1058] Make delete marker configurable (#1819)" (#1914)
This reverts commit 433d7d2c98.
2020-08-04 15:20:38 -07:00
vinoth chandar
539621bd33 [HUDI-242] Support for RFC-12/Bootstrapping of external datasets to hudi (#1876)
- [HUDI-418] Bootstrap Index Implementation using HFile with unit-test
 - [HUDI-421] FileSystem View Changes to support Bootstrap with unit-tests
 - [HUDI-424] Implement Query Side Integration for querying tables containing bootstrap file slices
 - [HUDI-423] Implement upsert functionality for handling updates to these bootstrap file slices
 - [HUDI-421] Bootstrap Write Client with tests
 - [HUDI-425] Added HoodieDeltaStreamer support
 - [HUDI-899] Add a knob to change partition-path style while performing metadata bootstrap
 - [HUDI-900] Metadata Bootstrap Key Generator needs to handle complex keys correctly
 - [HUDI-424] Simplify Record reader implementation
 - [HUDI-423] Implement upsert functionality for handling updates to these bootstrap file slices
 - [HUDI-420] Hoodie Demo working with hive and sparkSQL. Also, Hoodie CLI working with bootstrap tables

Co-authored-by: Mehrotra <uditme@amazon.com>
Co-authored-by: Vinoth Chandar <vinoth@apache.org>
Co-authored-by: Balaji Varadarajan <varadarb@uber.com>
2020-08-03 20:19:21 -07:00
Sivabalan Narayanan
266bce12b3 [MINOR] Fixing usage of right config value for parallelism to dedup in Bulk Insert (#1905) 2020-08-03 10:38:36 -07:00
Shen Hong
433d7d2c98 [HUDI-1058] Make delete marker configurable (#1819) 2020-08-03 11:06:31 -04:00
Raymond Xu
10e4268792 [HUDI-995] Use Transformations, Assertions and SchemaTestUtil (#1884)
- Consolidate transform functions for tests in Transformations.java
- Consolidate assertion functions for tests in Assertions.java
- Make use of SchemaTestUtil for loading schema from resource
2020-08-01 20:57:18 +08:00
Udit Mehrotra
e79fbc07fe [HUDI-1054] Several performance fixes during finalizing writes (#1768)
Co-authored-by: Udit Mehrotra <uditme@amazon.com>
2020-07-31 20:10:28 -07:00
Y Ethan Guo
ccd70a7e48 [HUDI-472] Introduce configurations and new modes of sorting for bulk_insert (#1149)
* [HUDI-472] Introduce the configuration and new modes of record sorting for bulk_insert(#1149). Three sorting modes are implemented: global sort ("global_sort"), local sort inside each RDD partition ("partition_sort") and no sort ("none")
2020-07-31 09:52:42 -04:00
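The three sorting modes named in the commit above can be sketched as follows. This is an illustration only, assuming records already sit in a list of partitions (standing in for RDD partitions); the function `bulk_insert_sort` and its signature are hypothetical and do not reflect Hudi's API. Only the mode strings come from the commit message.

```python
def bulk_insert_sort(partitions, mode, key=lambda record: record):
    """Return new partitions ordered according to the given sort mode."""
    if mode == "none":
        # No sort: records stay exactly where and how they arrived.
        return [list(partition) for partition in partitions]
    if mode == "partition_sort":
        # Local sort: each partition is sorted independently.
        return [sorted(partition, key=key) for partition in partitions]
    if mode == "global_sort":
        # Global sort: order all records, then cut into contiguous ranges.
        all_sorted = sorted(
            (record for partition in partitions for record in partition),
            key=key,
        )
        count = len(partitions)
        size = -(-len(all_sorted) // count) if count else 0  # ceiling division
        return [all_sorted[i * size:(i + 1) * size] for i in range(count)]
    raise ValueError(f"unknown sort mode: {mode}")


example = [[3, 1], [2, 0]]
locally_sorted = bulk_insert_sort(example, "partition_sort")
globally_sorted = bulk_insert_sort(example, "global_sort")
```

In this sketch a global sort gives non-overlapping key ranges per partition at the cost of moving records between partitions, while a partition sort keeps records in place and only orders them locally.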
Sivabalan Narayanan
b2763f433b [MINOR] Fixing default index parallelism for simple index (#1882) 2020-07-28 08:22:09 -07:00
Raymond Xu
ca36c44cb3 [HUDI-995] Move TestRawTripPayload and HoodieTestDataGenerator to hudi-common (#1873) 2020-07-27 19:21:45 +08:00
Shen Hong
c3279cd598 [HUDI-1082] Fix minor bug in deciding the insert buckets (#1838) 2020-07-23 08:31:49 -04:00
Mathieu
da106803b6 [HUDI-1037] Introduce a write committed callback hook and given a default http callback implementation (#1842) 2020-07-23 19:07:05 +08:00
zherenyu831
c39778c150 [HUDI-1113] Add user define metrics reporter (#1851) 2020-07-23 13:46:36 +08:00
vinoth chandar
3dd189ec7d [MINOR] Fix checkstyle issue on TestHoodieClientOnCopyOnWriteStorage (#1865) 2020-07-22 21:54:45 -07:00
vinoth chandar
a8bd76c299 [HUDI-1029] In inline compaction mode, previously failed compactions needs to be retried before new compactions (#1857)
- Prevents failed compactions from causing issues with future commits
2020-07-22 21:22:06 -07:00
vinoth chandar
9bd37ef291 [MINOR] Fix flaky testUpsertsUpdatePartitionPath* tests (#1863) 2020-07-22 22:52:34 -04:00
Sivabalan Narayanan
5b6026ba43 [HUDI-802] Fixing deletes for inserts in same batch in write path (#1792)
* Fixing deletes for inserts in same batch in write path
* Fixing delta streamer tests
* Adding tests for OverwriteWithLatestAvroPayload
2020-07-22 19:39:57 -07:00
Raymond Xu
5e7ab11e2e [HUDI-994] Move TestHoodieIndex test cases to unit tests (#1850) 2020-07-21 10:23:43 -07:00
lw0090
1ec89e9a94 [HUDI-839] Introducing support for rollbacks using marker files (#1756)
* [HUDI-839] Introducing rollback strategy using marker files

 - Adds a new mechanism for rollbacks where it's based on the marker files generated during the write
 - Consequently, marker file/dir deletion now happens post commit, instead of during finalize 
 - Marker files are also generated for AppendHandle, making it consistent throughout the write path 
 - Until upgrade-downgrade mechanism can upgrade non-marker based inflight writes to marker based, this should only be turned on for new datasets.
 - Added marker dir deletion after successful commit/rollback, individual files are not deleted during finalize
 - Fail safe for deleting marker directories, now during timeline archival process
 - Added check to ensure completed instants are not rolled back using marker based strategy. This will be incorrect
 - Reworked tests to rollback inflight instants, instead of completed instants whenever necessary
 - Added a unit test for MarkerBasedRollbackStrategy

Co-authored-by: Vinoth Chandar <vinoth@apache.org>
2020-07-20 22:41:42 -07:00
Prashant Wason
b71f25f210 [HUDI-92] Provide reasonable names for Spark DAG stages in HUDI. (#1289) 2020-07-19 10:29:25 -07:00
Raymond Xu
b399b4ad43 [HUDI-996] Add functional test in hudi-client (#1824)
- Add functional test suite in hudi-client
- Tag TestHBaseIndex as functional
2020-07-15 08:28:50 +08:00
Raymond Xu
f5dc8ca733 [HUDI-994] Split TestHBaseIndex to unit tests (#1818)
- Refactor and improve TestHBaseIndex for performance
- Move HBaseIndex unit tests to different test classes
2020-07-13 20:32:01 -07:00
Sivabalan Narayanan
21bb1b505a [HUDI-1068] Fixing deletes in global bloom when update partition path is set (#1793) 2020-07-13 22:34:07 -04:00
Raymond Xu
20ac7c3337 [HUDI-994] Make TestHBaseQPSResourceAllocator a unit test (#1820) 2020-07-11 09:15:05 -07:00