Commit Graph

1109 Commits

Author SHA1 Message Date
Raymond Xu
111a9753a0 [MINOR] Update README.md (#2010)
- add maven profile to test running commands
- remove -DskipITs for packaging commands
2020-08-24 09:28:29 -07:00
Mathieu
f8dcd5334e [HUDI-1217] Improve avroToBytes method of HoodieAvroUtils (#2018) 2020-08-24 17:33:28 +08:00
Mathieu
35b21855da [HUDI-1150] Fix unable to parse input partition field :1 exception when using TimestampBasedKeyGenerator(#1920) 2020-08-23 19:56:50 +08:00
Trevor
7291607ae3 [MINOR] Remove unused log code in HoodieReadClient (#2000) 2020-08-22 21:45:50 +08:00
Shen Hong
1d09c02f1c [HUDI-1083] Optimization in determining insert bucket location for a given key (#1868)
- To determine insert bucket location for a given key, hudi walks through all insert buckets with O(N) cost, while this patch adds an optimization to make it O(logN).
2020-08-22 07:41:39 -04:00
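
The O(N) versus O(logN) lookup described in the entry above can be illustrated with a small sketch. The commit message does not say which technique the patch uses. The version below assumes a binary search over cumulative bucket weights, and every name in it (`build_cumulative`, `pick_bucket_linear`, `pick_bucket_binary`) is hypothetical rather than taken from Hudi.

```python
from bisect import bisect_left
from itertools import accumulate


def build_cumulative(weights):
    """Precompute running totals of the bucket weights (done once per bucket list)."""
    return list(accumulate(weights))


def pick_bucket_linear(weights, r):
    """O(N): walk every bucket until the running total reaches r."""
    total = 0.0
    for i, w in enumerate(weights):
        total += w
        if r <= total:
            return i
    return len(weights) - 1  # r beyond the total weight falls into the last bucket


def pick_bucket_binary(cumulative, r):
    """O(log N): binary-search the precomputed running totals for r."""
    i = bisect_left(cumulative, r)  # first index whose running total is >= r
    return min(i, len(cumulative) - 1)
```

Both functions return the first bucket whose running total is at least `r`. The binary variant pays for the running totals once and then answers each key in logarithmic time.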
liujinhui
bfdce7b082 [HUDI-1193](Upgrade http dependency version) (#1970) 2020-08-21 20:24:04 +08:00
Raymond Xu
3a2ae16961 [HUDI-781] Introduce HoodieTestTable for test preparation (#1997) 2020-08-21 11:46:33 +08:00
Mathieu
34c8c9e3ea [MINOR] Move HoodieUpgradeDowngradeException to exception package (#1993) 2020-08-20 23:12:20 +08:00
Mathieu
b883b6d268 [HUDI-1122] Introduce a kafka implementation of hoodie write commit ca… (#1886) 2020-08-20 23:00:59 +08:00
Mathieu
bd7814dadf [HUDI-1206] Remove unused variable in Compactor (#1994) 2020-08-20 18:18:36 +08:00
Pratyaksh Sharma
a2312fa1b7 [HUDI-1177]: fixed TaskNotSerializableException in TimestampBasedKeyGenerator (#1987)
Co-authored-by: Bhavani Sudha Saktheeswaran <bhavanisudhas@gmail.com>
2020-08-19 17:43:34 -07:00
Ryan Pifer
1137b0b343 Fix HBASE index MOR tables not considering record index valid 2020-08-19 14:55:59 -07:00
Bhavani Sudha Saktheeswaran
6fa371a79c [MINOR] Fix release script for onetime uploading of gpgkeys (#1949) 2020-08-18 21:29:52 -07:00
Bhavani Sudha Saktheeswaran
824f23bcb8 [HUDI-1197] Fix import issue that fails scala 2.12 build (#1976) 2020-08-18 08:41:16 -07:00
Abhishek Modi
bedbb825e0 [HUDI-1025] Meter RPC calls in HoodieWrapperFileSystem (#1916) 2020-08-18 22:42:05 +08:00
Bhavani Sudha Saktheeswaran
4226d75144 Moving to 0.6.1-SNAPSHOT on master branch. 2020-08-14 12:54:15 -07:00
Balaji Varadarajan
b8f4a30efd Fix Integration test flakiness in HoodieJavaStreamingApp (#1967) 2020-08-14 01:42:15 -07:00
vinoth chandar
9bde6d616c [HUDI-1190] Introduce @PublicAPIClass and @PublicAPIMethod annotations to mark public APIs (#1965)
- Maturity level is one of: evolving, stable, deprecated
- Took a pass and marked out most of the existing public API
2020-08-13 23:28:17 -07:00
Sivabalan Narayanan
379cf0786f [HUDI-1013] Adding Bulk Insert V2 implementation (#1834)
- Adding ability to use native spark row writing for bulk_insert
 - Controlled by `ENABLE_ROW_WRITER_OPT_KEY` datasource write option
 - Introduced KeyGeneratorInterface in hudi-client, moved KeyGenerator back to hudi-spark
 - Simplified the new API additions to just two new methods : getRecordKey(row), getPartitionPath(row)
 - Fixed all built-in key generators with new APIs
 - The field position map is now created lazily, upon the first call to the row based APIs
 - Implemented native row based key generators for CustomKeyGenerator
 - Fixed all the tests, with these new APIs

Co-authored-by: Balaji Varadarajan <varadarb@uber.com>
Co-authored-by: Vinoth Chandar <vinoth@apache.org>
2020-08-13 00:33:39 -07:00
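
The lazily created field position map mentioned in the entry above follows a common pattern, sketched below. This is an illustration only, not Hudi's code: the class name, the method names, the list-based schema, and the `field:value` key format are all assumptions made for the example.

```python
class RowKeyGenerator:
    """Hypothetical key generator that resolves field positions on first use."""

    def __init__(self, record_key_fields):
        self.record_key_fields = record_key_fields
        self._positions = None  # built lazily, not in the constructor
        self.builds = 0         # counts how often the map is built (for illustration)

    def _field_positions(self, schema):
        # Build the name -> index map only on the first row based call.
        if self._positions is None:
            self._positions = {
                name: schema.index(name) for name in self.record_key_fields
            }
            self.builds += 1
        return self._positions

    def get_record_key(self, row, schema):
        positions = self._field_positions(schema)
        return ",".join(
            f"{name}:{row[positions[name]]}" for name in self.record_key_fields
        )
```

Callers that never use the row based path never pay for the map, and callers that do pay for it once.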
Udit Mehrotra
8d04268264 [HUDI-1174] Changes for bootstrapped tables to work with presto (#1944)
This pull request implements the changes required on the Hudi side to integrate bootstrapped tables with Presto. Testing was done against Presto 0.232, and the following changes were identified to make it work:

The annotation UseRecordReaderFromInputFormat is required on HoodieParquetInputFormat as well, because bootstrapped tables must be read through the record reader to be able to perform the merge. On the Presto side, this annotation is already handled.

We need to maintain VIRTUAL_COLUMN_NAMES internally because Presto's internal Hive version, hive-apache-1.2.2, has VirtualColumn as a class, whereas the version Hudi depends on defines it as an enum.

Dependency changes in hudi-presto-bundle to avoid runtime exceptions.
2020-08-12 17:51:31 -07:00
wenningd
8b928e9bca [HUDI-808] Support cleaning bootstrap source data (#1870)
Co-authored-by: Wenning Ding <wenningd@amazon.com>
Co-authored-by: Balaji Varadarajan <vbalaji@apache.org>
2020-08-11 01:43:46 -07:00
Balaji Varadarajan
626f78f6f6 Revert "[HUDI-781] Introduce HoodieTestTable for test preparation (#1871)"
This reverts commit b2e703d442.
2020-08-10 22:13:02 -07:00
Sivabalan Narayanan
9c24151929 [HUDI-1175] Commenting out testsuite tests from Integration tests until we investigate the CI flakiness (#1945) 2020-08-10 21:00:57 -07:00
Raymond Xu
b2e703d442 [HUDI-781] Introduce HoodieTestTable for test preparation (#1871) 2020-08-11 09:44:03 +08:00
liujinhui
934f00b689 [HUDI-1173] fix hudi-prometheus pom dependency (#1942) 2020-08-11 09:06:17 +08:00
Sivabalan Narayanan
858eda85d7 [HUDI-1098] Adding OptimisticConsistencyGuard to be used during FinalizeWrite (#1912) 2020-08-09 17:51:37 -07:00
Sivabalan Narayanan
ff53e8f0b6 [HUDI-1014] Adding Upgrade and downgrade infra for smooth transitioning from list based rollback to marker based rollback (#1858)
- This pull request adds upgrade/downgrade infra for a smooth transition from list based rollback to marker based rollback
 - A new property called hoodie.table.version is added to the hoodie.properties file as part of this. Whenever hoodie is launched with a newer table version, i.e. 1 (or when moving from pre-0.6.0 to 0.6.0), an upgrade step is executed automatically to adhere to marker based rollback.
 - This automatic upgrade step happens just once per dataset, because hoodie.table.version is updated in the properties file once the upgrade completes
 - Similarly, a command line tool for downgrading is added in case a user wants to downgrade hoodie from table version 1 to 0, or move from hoodie 0.6.0 to pre-0.6.0
 - Added UpgradeDowngrade to assist in upgrading or downgrading a hoodie table
 - Added interfaces for upgrade and downgrade, and concrete implementations for upgrading from 0 to 1 and downgrading from 1 to 0
 - Made some changes to ListingBasedRollbackHelper to expose just the rollback stats without performing the actual rollback; these stats are consumed by the upgrade infra
- Reworking failure handling for upgrade/downgrade
 - Changed tests accordingly, added one test around leftover cleanup
 - New tables now write the table version into hoodie.properties
 - Cleaned up code naming and abstractions

Co-authored-by: Vinoth Chandar <vinoth@apache.org>
2020-08-09 15:32:43 -07:00
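
The once-per-dataset upgrade described in the entry above can be sketched as a version check against a stored property. Only the property name `hoodie.table.version` comes from the commit message. The function name, the dictionary standing in for hoodie.properties, the step registry, and the choice to treat a missing property as version 0 are assumptions for illustration.

```python
def upgrade_if_needed(props, target_version, upgrade_steps):
    """Run each pending upgrade step once, then record the resulting version.

    props          -- dict standing in for the hoodie.properties file (assumption)
    target_version -- table version expected by the running release
    upgrade_steps  -- hypothetical registry: (from_version, to_version) -> callable
    """
    # Assumption: a table with no stored version is treated as version 0.
    current = int(props.get("hoodie.table.version", 0))
    reached = []
    while current < target_version:
        upgrade_steps[(current, current + 1)](props)
        current += 1
        reached.append(current)
    # Persisting the version is what keeps the step from running again.
    # A stored version above the target is left alone: downgrading is a
    # separate, explicit command line action in the commit description.
    props["hoodie.table.version"] = str(current)
    return reached
```

A second launch against the same properties finds the stored version already at the target and runs no steps.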
Udit Mehrotra
e4a2d98f79 [HUDI-426] Bootstrap datasource integration (#1702) 2020-08-09 14:06:13 -07:00
linshan-ma
c24c528fb7 [HUDI-1156] Remove unused dependencies from HoodieDeltaStreamerWrapper Class (#1927) 2020-08-09 17:09:28 +08:00
liujinhui
6b349b7711 [HUDI-210] Hudi Supports Prometheus Pushgateway (#1931)
Co-authored-by: leesf <leesf@apache.org>
2020-08-09 15:29:54 +08:00
Bhavani Sudha Saktheeswaran
3c949d2ff5 [MINOR] Fix path to hudi-hive-sync-bundle jars from run_sync_tool.sh (#1937) 2020-08-09 00:45:10 -04:00
wenningd
9fe2d2b14a [HUDI-427] [HUDI-971] Implement CLI support for performing bootstrap (#1869)
* [HUDI-971] Clean partitions & fileIds returned by HFileBootstrapIndex
* [HUDI-427] Implement CLI support for performing bootstrap

Co-authored-by: Wenning Ding <wenningd@amazon.com>
Co-authored-by: Balaji Varadarajan <vbalaji@apache.org>
2020-08-08 12:37:29 -07:00
Raymond Xu
5ee676e34f [MINOR] Move a test method to Transformations (#1934)
- Move TestHoodieKeyLocationFetchHandle#getRecordsPerPartition to Transformations
- Improve some variable names
2020-08-08 18:25:55 +08:00
cheshta2904
1072f2748a [HUDI-1026] Removed slf4j dependency from HoodieClientTestHarness (#1928) 2020-08-08 12:07:22 +08:00
Yungthuis
8b66524090 [MINOR] Remove unused import (#1932)
Co-authored-by: tom_glb <goodMorning_glb@hotmail.com>
2020-08-08 12:04:31 +08:00
Gary Li
4f74a84607 [HUDI-69] Support Spark Datasource for MOR table - RDD approach (#1848)
- This PR implements Spark Datasource for MOR table in the RDD approach.
- Implemented SnapshotRelation
- Implemented HudiMergeOnReadRDD
- Implemented separate Iterator to handle merge and unmerge record reader.
- Added TestMORDataSource to verify this feature.
- Clean up test file name, add tests for mixed query types
 - We can now revert the change made in DefaultSource

Co-authored-by: Vinoth Chandar <vchandar@confluent.io>
2020-08-07 00:28:14 -07:00
Udit Mehrotra
ab453f2623 [HUDI-999] [RFC-12] Parallelize fetching of source data files/partitions (#1924) 2020-08-06 23:44:57 -07:00
Mathieu
b51646dcc7 [HUDI-1151] Fix NPE when no new data in kafka using HoodieDeltaStreamer (#1921) 2020-08-07 00:03:20 +08:00
lw0090
51ea27d665 [HUDI-875] Abstract hudi-sync-common, and support hudi-hive-sync, hudi-dla-sync (#1810)
- Generalize the hive-sync module for syncing to multiple metastores
- Added new options for datasource
- Added new command line for delta streamer 

Co-authored-by: Vinoth Chandar <vinoth@apache.org>
2020-08-05 21:34:55 -07:00
Prashant Wason
c21209cb58 [HUDI-1149] Added a console metrics reporter and associated unit tests. 2020-08-05 10:31:46 -07:00
Balaji Varadarajan
9bcd3221fd [HUDI-1144] Speedup spark read queries by caching metaclient in HoodieROPathFilter (#1919) 2020-08-05 09:19:10 -07:00
Balaji Varadarajan
7a2429f5ba [HUDI-575] Spark Streaming with async compaction support (#1752) 2020-08-05 07:50:15 -07:00
Balaji Varadarajan
61e027fadd [MINOR] Adding timeout for each command execution in docker and capture output. This will help get stdout/stderr of stuck commands (#1918) 2020-08-05 07:46:34 -07:00
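
A per-command timeout with captured output, as described in the entry above, might look like the following sketch. It uses Python's standard `subprocess` module rather than the project's docker test harness, and the helper names `run_with_timeout` and `_as_text` are hypothetical.

```python
import subprocess
import sys


def _as_text(data):
    """Normalize captured output, which may be None, bytes, or str."""
    if data is None:
        return ""
    if isinstance(data, bytes):
        return data.decode(errors="replace")
    return data


def run_with_timeout(cmd, timeout_s):
    """Run cmd and return (returncode, stdout, stderr, timed_out).

    On timeout the child is killed and whatever output it produced so far
    is still returned, which is what helps diagnose a stuck command.
    """
    try:
        done = subprocess.run(
            cmd, capture_output=True, text=True, timeout=timeout_s
        )
        return done.returncode, done.stdout, done.stderr, False
    except subprocess.TimeoutExpired as exc:
        return None, _as_text(exc.stdout), _as_text(exc.stderr), True


if __name__ == "__main__":
    # Example: run a trivial command with a generous limit.
    print(run_with_timeout([sys.executable, "-c", "print('ok')"], 30))
```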
Sreeram Ramji
217a84192c [HUDI-1140] Fix Jcommander issue for --hoodie-conf in DeltaStreamer (#1898) 2020-08-04 21:42:51 -07:00
liujianhui
d3711a2641 [HUDI-525] lack of insert info in delta_commit inflight
2020-08-04 17:43:57 -07:00
Sivabalan Narayanan
ab11ba43e1 [REVERT] "[HUDI-1058] Make delete marker configurable (#1819)" (#1914)
This reverts commit 433d7d2c98.
2020-08-04 15:20:38 -07:00
vinoth chandar
539621bd33 [HUDI-242] Support for RFC-12/Bootstrapping of external datasets to hudi (#1876)
- [HUDI-418] Bootstrap Index Implementation using HFile with unit-test
 - [HUDI-421] FileSystem View Changes to support Bootstrap with unit-tests
 - [HUDI-424] Implement Query Side Integration for querying tables containing bootstrap file slices
 - [HUDI-423] Implement upsert functionality for handling updates to these bootstrap file slices
 - [HUDI-421] Bootstrap Write Client with tests
 - [HUDI-425] Added HoodieDeltaStreamer support
 - [HUDI-899] Add a knob to change partition-path style while performing metadata bootstrap
 - [HUDI-900] Metadata Bootstrap Key Generator needs to handle complex keys correctly
 - [HUDI-424] Simplify Record reader implementation
 - [HUDI-423] Implement upsert functionality for handling updates to these bootstrap file slices
 - [HUDI-420] Hoodie Demo working with hive and sparkSQL. Also, Hoodie CLI working with bootstrap tables

Co-authored-by: Mehrotra <uditme@amazon.com>
Co-authored-by: Vinoth Chandar <vinoth@apache.org>
Co-authored-by: Balaji Varadarajan <varadarb@uber.com>
2020-08-03 20:19:21 -07:00
Sivabalan Narayanan
266bce12b3 [MINOR] Fixing usage of right config value for parallelism to dedup in Bulk Insert (#1905) 2020-08-03 10:38:36 -07:00
Shen Hong
433d7d2c98 [HUDI-1058] Make delete marker configurable (#1819) 2020-08-03 11:06:31 -04:00
Raymond Xu
8aa9142de8 [MINOR] Prevent scalatest plugin from running in non-UTs (#1897) 2020-08-02 20:33:58 -07:00