
1155 Commits

Author SHA1 Message Date
dugenkui
ae68b2b355 [MINOR] fix typos (#2116) 2020-09-26 20:40:33 +08:00
Mathieu
1dd6635fbb [MINOR] Fix ClassCastException when use QuickstartUtils generate data (#2105) 2020-09-25 10:13:39 -07:00
hongdd
2eaba0962a [HUDI-544] Archived commits command code cleanup (#1242)
* Archived commits command code cleanup
2020-09-25 09:36:41 -07:00
dugenkui
6837118c21 [MINOR] Improve description (#2113) 2020-09-25 22:21:37 +08:00
vinoth chandar
83d2e03cf7 [MINOR] Adding scripts to checkout and push to PRs (#2109)
- Tested checkout_pr.sh locally
- Tested a dry run of pr_push_command.sh
2020-09-24 15:01:32 -07:00
wenningd
d37977b310 [MINOR] Remove useless config for bootstrap integ testing (#2102)
Co-authored-by: Wenning Ding <wenningd@amazon.com>
2020-09-22 13:29:59 -07:00
lw0090
fcc497eff1 [HUDI-1268] fix UpgradeDowngrade fs Rename issue for hdfs and aliyun oss (#2099) 2020-09-22 09:57:20 -07:00
Kaiux
8087016504 [HUDI-1213] Set Default for the bootstrap config : hoodie.bootstrap.full.input.provider (#2087) 2020-09-22 03:28:19 -07:00
Alexander Filipchik
c8e19e2def [HUDI-801] Adding a way to post process schema after it is fetched (#1524)
* [HUDI-801] Adding a way to post process schema after it is fetched

Co-authored-by: Alex Filipchik <alex.filipchik@csscompany.com>
Co-authored-by: Balaji Varadarajan <balaji.varadarajan@robinhood.com>
2020-09-19 11:18:36 -07:00
Raymond Xu
7c45894f43 [HUDI-995] Migrate HoodieTestUtils APIs to HoodieTestTable (#2094)
Migrate deprecated APIs in HoodieTestUtils to HoodieTestTable for test classes
- TestClientRollback
- TestCopyOnWriteRollbackActionExecutor

Use FileCreateUtils APIs in CompactionTestUtils.

Then remove unused deprecated APIs after migration.
2020-09-19 17:55:24 +08:00
Pratyaksh Sharma
73e5b4c7bb [HUDI-796] Add deduping logic for upserts case (#1558) 2020-09-18 19:37:52 +08:00
Udit Mehrotra
bf65269f66 [HUDI-1230] Fix for preventing MOR datasource jobs from hanging via spark-submit (#2046) 2020-09-17 20:03:35 -07:00
Raymond Xu
3201665295 [HUDI-995] Use HoodieTestTable in more classes (#2079)
* [HUDI-995] Use HoodieTestTable in more classes

Migrate test data prep logic in
- TestStatsCommand
- TestHoodieROTablePathFilter

Re-implement methods for creating new commit times in HoodieTestUtils and HoodieClientTestHarness
- Move relevant APIs to HoodieTestTable
- Migrate usages

After changing to HoodieTestTable APIs, removed unused deprecated APIs in HoodieTestUtils
2020-09-17 09:29:07 -07:00
shenh062326
581d54097c [HUDI-1143] Change timestamp field in HoodieTestDataGenerator from double to long 2020-09-15 20:58:29 -07:00
liujinhui
6c84ef20ac [HUDI-1282] Check whether the topic exists before deltastrmer consumes Kafka (#2090) 2020-09-16 10:43:52 +08:00
Balaji Varadarajan
5e61454a6c [HUDI-802] AWSDmsTransformer does not handle insert and delete of a row in a single batch correctly (#2084) 2020-09-11 16:11:42 -07:00
Karl-WangSK
a1cff8abae [HUDI-1255] Add new Payload(OverwriteNonDefaultsWithLatestAvroPayload) for updating specified fields in storage (#2056)
Add new Payload(OverwriteNonDefaultsWithLatestAvroPayload) for updating specified fields in storage

## Brief change log

Update the current value of the several fields that you want to change.

The default payload, OverwriteWithLatestAvroPayload, overwrites the whole record when compared to `orderingVal`. This doesn't meet our need when we just want to change specified fields.
For example (suppose the default value is null):
```
current Value 
Field:      name   age   gender
Value:     karl     20    male
```
```
insert Value
Field:      name   age   gender
Value:     null     30    null
```
```
After insert:
Field:      name   age   gender
Value:     karl     30    male
```
## Verify this pull request

Added TestOverwriteNonDefaultsWithLatestAvroPayload to verify the change.
2020-09-09 21:54:21 -07:00
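The field-level merge described in the HUDI-1255 entry above can be sketched as follows. This is a minimal illustration, not the code of OverwriteNonDefaultsWithLatestAvroPayload: it models records as plain `java.util.Map` instances instead of Avro records, and the class `NonDefaultMergeSketch`, its `merge` method, and the `defaults` map are hypothetical names introduced here.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Objects;

// Illustrative only: records are plain maps, not Avro GenericRecords,
// and this is not Hudi's OverwriteNonDefaultsWithLatestAvroPayload.
final class NonDefaultMergeSketch {

    // Returns a copy of `current` in which every field whose incoming value
    // differs from that field's default is overwritten by the incoming value.
    static Map<String, Object> merge(Map<String, Object> current,
                                     Map<String, Object> incoming,
                                     Map<String, Object> defaults) {
        Map<String, Object> merged = new LinkedHashMap<>(current);
        for (Map.Entry<String, Object> field : incoming.entrySet()) {
            // A field with no listed default is treated as defaulting to null.
            Object defaultValue = defaults.get(field.getKey());
            if (!Objects.equals(field.getValue(), defaultValue)) {
                merged.put(field.getKey(), field.getValue());
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        Map<String, Object> current = new LinkedHashMap<>();
        current.put("name", "karl");
        current.put("age", 20);
        current.put("gender", "male");

        Map<String, Object> incoming = new LinkedHashMap<>();
        incoming.put("name", null);
        incoming.put("age", 30);
        incoming.put("gender", null);

        // Every default is null, matching the example in the commit message.
        Map<String, Object> defaults = new LinkedHashMap<>();

        // Prints {name=karl, age=30, gender=male}
        System.out.println(merge(current, incoming, defaults));
    }
}
```

Only `age` carries a non-default incoming value, so it is the only field replaced, which matches the "After insert" table in the commit message.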
linshan-ma
063a98fc2b [HUDI-1254] TypedProperties can not get values by initializing an existing properties (#2059) 2020-09-09 23:42:41 +08:00
Balajee Nagasubramaniam
fec7cd3c97 [HUDI-1130] hudi-test-suite support for schema evolution (can be triggered on any insert/upsert DAG node). 2020-09-08 22:43:59 -07:00
Abhishek Modi
53d1e55110 Test Suite should work with Docker + Unit Tests 2020-09-08 22:41:14 -07:00
wenningd
2fee087f0f [HUDI-1181] Fix decimal type display issue for record key field (#1953)
* [HUDI-1181] Fix decimal type display issue for record key field

* Remove getNestedFieldVal method from DataSourceUtils

* resolve comments

Co-authored-by: Wenning Ding <wenningd@amazon.com>
2020-09-08 17:50:54 -07:00
Gary Li
e3cf34dff9 Merge pull request #2077 from chuangehh/typofix
[MINOR] Fix typo in the pom files
2020-09-08 00:02:08 -07:00
chuangehh
51b16bd36f [MINOR] fix typo 2020-09-08 11:55:38 +08:00
Prashant Wason
fe7c9e71eb [MINOR] Fix BindException when running tests of shared machines. (#2070)
When unit tests are run on shared machines (e.g. jenkins cluster), the unit tests sometimes fail due to BindException in starting HDFS Cluster. This is because the port chosen may have been bound by another process using the same machine. The fix is to retry the port selection a few times.
2020-09-07 19:30:45 -07:00
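The retry idea in the entry above (pick another port when the chosen one is already bound) can be sketched as follows. This is an assumption-laden illustration, not the code from #2070: it binds a plain `java.net.ServerSocket` as a stand-in for starting the HDFS cluster, and `PortRetrySketch`, `bindWithRetry`, and its parameters are hypothetical names.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.BindException;
import java.net.ServerSocket;
import java.util.concurrent.ThreadLocalRandom;

// Illustrative only: a ServerSocket stands in for the service under test.
final class PortRetrySketch {

    // Tries up to maxAttempts random ports in [minPort, maxPort] and returns
    // the first socket that binds; a BindException triggers another attempt.
    static ServerSocket bindWithRetry(int maxAttempts, int minPort, int maxPort) {
        BindException lastFailure = null;
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            int port = ThreadLocalRandom.current().nextInt(minPort, maxPort + 1);
            try {
                return new ServerSocket(port);
            } catch (BindException e) {
                // Another process on the shared machine holds this port; retry.
                lastFailure = e;
            } catch (IOException e) {
                // Any other I/O failure is not a port clash, so do not retry.
                throw new UncheckedIOException(e);
            }
        }
        throw new IllegalStateException(
            "No free port found after " + maxAttempts + " attempts", lastFailure);
    }
}
```

Bounding the number of attempts keeps a test from looping forever on a saturated machine while still absorbing the occasional clash.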
Raymond Xu
83e39e2b17 [HUDI-781] Add HoodieWriteableTestTable (#2040)
- Introduce HoodieWriteableTestTable for writing records into files
- Migrate writeParquetFiles() in HoodieClientTestUtils to HoodieWriteableTestTable
- Adopt HoodieWriteableTestTable for test cases in
  - ITTestRepairsCommand.java
  - TestHoodieIndex.java
  - TestHoodieKeyLocationFetchHandle.java
  - TestHoodieGlobalBloomIndex.java
  - TestHoodieBloomIndex.java
- Renamed HoodieTestTable and FileCreateUtils APIs
  - dataFile changed to baseFile
2020-09-07 17:54:36 +08:00
Sreeram Ramji
6537af2676 [HUDI-1153] Spark DataSource and Streaming Write must fail when operation type is misconfigured (#2014) 2020-09-04 09:08:30 -07:00
Dongwook
8d19ebfd0f [HUDI-993] Let delete API use "hoodie.delete.shuffle.parallelism" (#1703)
The delete API does not use "hoodie.delete.shuffle.parallelism", whereas the upsert API uses "hoodie.upsert.shuffle.parallelism". In certain cases this creates a performance difference between deleting through the upsert API with "EmptyHoodieRecordPayload" and deleting through the delete API.

This patch makes the following fixes in this regard.
- Let the deduplicateKeys method use "hoodie.delete.shuffle.parallelism"
- Repartition inputRDD to "hoodie.delete.shuffle.parallelism" when "hoodie.combine.before.delete=false"
2020-09-01 12:55:31 -04:00
Gary Li
48a58c98a1 [MINOR] fix get classname for hive sync (#2008) 2020-08-31 16:26:10 -07:00
Prashant Wason
6461927eac [HUDI-960] Implementation of the HFile base and log file format. (#1804)
* [HUDI-960] Implementation of the HFile base and log file format.

1. Includes HFileWriter and HFileReader
2. Includes HFileInputFormat for both snapshot and realtime input format for Hive
3. Unit test for new code
4. IT for using HFile format and querying using Hive (Presto and SparkSQL are not supported)

Advantage:
HFile file format saves data as binary key-value pairs. This implementation chooses the following values:
1. Key = Hoodie Record Key (as bytes)
2. Value = Avro encoded GenericRecord (as bytes)

HFile allows efficient lookup of a record by key or by range of keys. Hence, this base file format is well suited to applications like RFC-15 and RFC-08, which will benefit from the ability to look up records by key or search in a range of keys without having to read the entire data/log format.

Limitations:
HFile storage format has certain limitations when used as a general purpose data storage format.
1. Does not have an implemented reader for Presto and SparkSQL
2. Is not a columnar file format and hence may lead to lower compression levels and greater IO on query side due to lack of column pruning


Other changes: 
 - Remove databricks/avro from pom
 - Fix HoodieClientTestUtils from not using scala imports/reflection based conversion etc
 - Breaking up limitFileSize(), per parquet and hfile base files
 - Added three new configs for HoodieHFileConfig - prefetchBlocksOnOpen, cacheDataInL1, dropBehindCacheCompaction
 - Throw UnsupportedException in HFileReader.getRecordKeys()
 - Updated HoodieCopyOnWriteTable to create the correct merge handle (HoodieSortedMergeHandle for HFile and HoodieMergeHandle otherwise)

* Fixing checkstyle

Co-authored-by: Vinoth Chandar <vinoth@apache.org>
2020-08-31 08:05:59 -07:00
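The access pattern that the HUDI-960 entry above attributes to HFile (sorted key-value pairs supporting point lookup and key-range scans) can be sketched with an in-memory sorted map. This is only a stand-in under stated assumptions: it does not use HFile or Hudi's HFileReader/HFileWriter, it keeps keys as `String` rather than bytes for readability, and `SortedKeyValueSketch` and its methods are hypothetical names.

```java
import java.util.NavigableMap;
import java.util.TreeMap;

// Illustrative only: an in-memory sorted map standing in for a sorted
// key-value file. Key = record key, value = encoded record bytes.
final class SortedKeyValueSketch {

    private final NavigableMap<String, byte[]> entries = new TreeMap<>();

    // Stores the encoded record under its record key, keeping keys sorted.
    void put(String recordKey, byte[] encodedRecord) {
        entries.put(recordKey, encodedRecord);
    }

    // Point lookup by key; returns null when the key is absent.
    byte[] get(String recordKey) {
        return entries.get(recordKey);
    }

    // Range scan over [fromInclusive, toExclusive) without touching other keys.
    NavigableMap<String, byte[]> range(String fromInclusive, String toExclusive) {
        return entries.subMap(fromInclusive, true, toExclusive, false);
    }
}
```

Because the keys are kept sorted, both operations avoid reading every record, which is the property the commit message highlights. The trade-off it also names, the lack of column pruning, is not modeled here.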
Mathieu
6df8f88d86 [HUDI-1252] Remove unused class NoOpBulkInsertPartitioner in DataSourceTestUtils (#2054) 2020-08-31 03:03:10 -07:00
Thinking Chen
6b417d1a86 [HUDI-1225] Fix: Avro Date logical type not handled correctly when converting to Spark Row (#2047) 2020-08-29 01:16:42 -07:00
Raymond Xu
0360bef217 [MINOR] Improve helper methods in TestCleaner (#2052)
- Use private static assert methods
- Use ParameterizedTest
- Rename HoodieTestTable APIs
2020-08-29 14:06:25 +08:00
Satish Kotha
4dbeabffa3 [HUDI-1228] Add utility method to query extra metadata 2020-08-28 12:23:47 -07:00
Mathieu
fa81248247 [HUDI-531] Add java doc for hudi test suite general classes (#1900) 2020-08-28 08:44:40 +08:00
Sivabalan Narayanan
3a578d7402 [HUDI-1056] Fix release validate script for rc_num and release_type (#2025) 2020-08-26 09:26:33 -07:00
hongdd
dedc4517dd [HUDI-978] Specify version information for each component separately (#1772) 2020-08-26 21:08:09 +08:00
Satish Kotha
f468c20c6c [HUDI-1226] Fix ComplexKeyGenerator for non-partitioned tables 2020-08-25 20:55:48 -07:00
Mathieu
df8f099c99 [HUDI-532] Add java doc for the test classes of hudi test suite (#1901) 2020-08-26 08:49:01 +08:00
Mathieu
7e68c42eb1 [HUDI-1223] Remove unused UpdateHandler class in HoodieCopyOnWriteTable (#2032) 2020-08-26 08:46:19 +08:00
Balajee Nagasubramaniam
cc555ba188 [HUDI-1133] Tune buffer sizes for the diskbased external spillable map 2020-08-25 14:23:58 -07:00
Satish Kotha
492ddcbb06 [HUDI-1191] Add incremental meta client API to query partitions modified in a time window 2020-08-25 12:40:10 -07:00
Trevor
6a4dc7384c [HUDI-1218] Introduce BulkInsertSortMode as Independent class (#2021) 2020-08-25 19:04:13 +08:00
Prashant Wason
218d4a6836 [HUDI-1135] Make timeline server timeout settings configurable. 2020-08-24 18:09:00 -07:00
Prashant Wason
9b1f16b604 [HUDI-1136] Add back findInstantsAfterOrEquals to the HoodieTimeline class. 2020-08-24 18:08:17 -07:00
Bhavani Sudha Saktheeswaran
f7e02aa8a3 [MINOR] Update DOAP with 0.6.0 Release (#2024) 2020-08-24 14:47:38 -07:00
Satish Kotha
ea983ff912 [HUDI-1137] Add option to configure different path selector 2020-08-24 13:26:44 -07:00
Raymond Xu
111a9753a0 [MINOR] Update README.md (#2010)
- add maven profile to test running commands
- remove -DskipITs for packaging commands
2020-08-24 09:28:29 -07:00
Mathieu
f8dcd5334e [HUDI-1217] Improve avroToBytes method of HoodieAvroUtils (#2018) 2020-08-24 17:33:28 +08:00
Mathieu
35b21855da [HUDI-1150] Fix unable to parse input partition field :1 exception when using TimestampBasedKeyGenerator(#1920) 2020-08-23 19:56:50 +08:00
Trevor
7291607ae3 [MINOR] Remove unused log code in HoodieReadClient (#2000) 2020-08-22 21:45:50 +08:00