1
0
Commit Graph

1188 Commits

Author SHA1 Message Date
lw0090
ec6267c303 [HUDI-307] add test to check timestamp date decimal type write and read consistent (#2177) 2020-10-18 17:18:50 +08:00
rmpifer
a44f66869f [HUDI-1289] Remove relocation of pattern for hbase dependencies and add shading of guava in hadoop, spark, and presto bundles (#2147)
- Update hudi-spark-bundle pom to not relocate hbase and htrace pattern
- Remove codec relocation as this is not included in bundle which was causing error
2020-10-14 17:04:35 -07:00
satishkotha
7fa641ea9a [HUDI-1302] Add support for timestamp field in HiveSync (#2129) 2020-10-13 22:58:00 -07:00
wangxianghu
c7d962efff [HUDI-1328] Introduce HoodieFlinkEngineContext to hudi-flink-client (#2161) 2020-10-14 09:30:49 +08:00
lw0090
b66c3ef23a [HUDI-1298] Add better error messages when IOException occurs during log file reading (#2133) 2020-10-13 00:45:10 -07:00
satishkotha
0d407342ef [HUDI-1304] Add unit test for testing compaction on replaced file groups (#2150) 2020-10-12 16:48:29 -07:00
Raymond Xu
c5e10d668f [HUDI-995] Migrate HoodieTestUtils APIs to HoodieTestTable (#2167)
Remove APIs in `HoodieTestUtils`
- `createCommitFiles`
- `createDataFile`
- `createNewLogFile`
- `createCompactionRequest`

Migrated usages in `TestCleaner#testPendingCompactions`.

Also improved some API names in `HoodieTestTable`.
2020-10-12 14:39:10 +08:00
hj2016
c0472d3317 [HUDI-1184] Fix the support of hbase index partition path change (#1978)
When the hbase index is used, when the record partition is changed to another partition, the path does not change according to the value of the partition column

Co-authored-by: huangjing <huangjing@clinbrain.com>
2020-10-11 19:05:57 -07:00
dugenkui
b58daf29ba [MINOR] remove unused generics type (#2163) 2020-10-11 18:38:42 -07:00
lw0090
2126f13e13 [HUDI-791] Replace null by Option in Delta Streamer (#2171) 2020-10-11 18:29:57 -07:00
dugenkui
032bc3b08f [MINOR] NPE Optimization for Option (#2158) 2020-10-11 17:55:41 -07:00
dugenkui
d4d4c8c899 [MINOR] Fix typo and others (#2164)
* remove HoodieSerializationException that will never be throw
* remove unused method, make HoodieException more readable
* fix typo
2020-10-11 17:52:44 -07:00
lw0090
86db4da33c [HUDI-1339] delete useless import in hudi-spark module (#2173) 2020-10-11 17:10:52 -07:00
lw0090
585ce0094d [HUDI-1301] use spark INCREMENTAL mode query hudi dataset support schema version. (#2125) 2020-10-10 20:53:41 +08:00
vinoyang
eafd7bf289 [MINOR] Fix wrong javadoc and refactor some naming issues (#2156) 2020-10-09 15:09:26 -07:00
dugenkui
00271af64e [MINOR] Fix typo (#2159)
* fix typo

* fix typo
2020-10-09 14:52:55 -07:00
Raymond Xu
1d1d91d444 [HUDI-995] Migrate HoodieTestUtils APIs to HoodieTestTable (#2143)
* [HUDI-995] Migrate HoodieTestUtils APIs to HoodieTestTable

Remove APIs in `HoodieTestUtils`
- listAllDataFilesAndLogFilesInPath
- listAllLogFilesInPath
- listAllDataFilesInPath
- writeRecordsToLogFiles
- createCleanFiles
- createPendingCleanFiles

Migrate the callers to use `HoodieTestTable` and `HoodieWriteableTestTable` with new APIs added
- listAllBaseAndLogFiles
- listAllLogFiles
- listAllBaseFiles
- withLogAppends
- addClean
- addInflightClean

Also added related APIs in `FileCreateUtils`
- createCleanFile
- createRequestedCleanFile
- createInflightCleanFile
2020-10-09 10:21:27 +08:00
Prashant Wason
788d236c44 [HUDI-1303] Some improvements for the HUDI Test Suite. (#2128)
1. Use the DAG Node's label from the yaml as its name instead of UUID names which are not descriptive when debugging issues from logs.
2. Fix CleanNode constructor which is not correctly implemented
3. When generating upsets, allows more granualar control over the number of inserts and upserts - zero or more inserts and upserts can be specified instead of always requiring both inserts and upserts.
4. Fixed generation of records of specific size
   - The current code was using a class variable "shouldAddMore" which was reset to false after the first record generation causing subsequent records to be of minimum size.
   - In this change, we pre-calculate the extra size of the complex fields. When generating records, for complex fields we read the field size from this map.
5. Refresh the timeline of the DeltaSync service before calling readFromSource. This ensures that only the newest generated data is read and data generated in the older Dag Nodes is ignored (as their AVRO files will have an older timestamp).
6. Making --workload-generator-classname an optional parameter as most probably the default will be used
2020-10-07 08:33:51 -04:00
Pratyaksh Sharma
524193eb4b [HUDI-603]: DeltaStreamer can now fetch schema before every run in continuous mode (#1566)
Co-authored-by: Balaji Varadarajan <balaji.varadarajan@robinhood.com>
2020-10-06 20:34:03 -07:00
rmpifer
fed01cd3c9 [MINOR] Update spark master default to yarn (#2148) 2020-10-05 15:22:28 -07:00
lw0090
fdae388626 [HUDI-1203] add port configuration for EmbeddedTimelineService (#2142) 2020-10-05 11:36:54 -07:00
Shen Hong
b335459c80 [HUDI-1208] Ordering Field should be optional when precombine is turned off (#2088) 2020-10-04 11:34:21 -07:00
Pratyaksh Sharma
080ba3ed54 [HUDI-1199] relocated jetty in hudi-utilities-bundle pom (#1990)
* [HUDI-1199]: relocated jetty in hudi-utilities-bundle pom

* [HUDI-1199]: re trigger travis build
2020-10-04 11:22:01 -07:00
Prashant Wason
6c610b91ef [HUDI-1305] Added an API to shutdown and remove the metrics reporter. (#2132)
This helps in removing reporter once the test has complete. Prevents log pollution from un-necessary metric logs.

- Added an API to shutdown the metrics reporter after tests.
2020-10-04 09:30:04 -07:00
Mathieu
1f7add9291 [HUDI-1089] Refactor hudi-client to support multi-engine (#1827)
- This change breaks `hudi-client` into `hudi-client-common` and `hudi-spark-client` modules 
- Simple usages of Spark using jsc.parallelize() has been redone using EngineContext#map, EngineContext#flatMap etc
- Code changes in the PR, break classes into `BaseXYZ` parent classes with no spark dependencies living in `hudi-client-common`
- Classes on `hudi-spark-client` are named `SparkXYZ` extending the parent classes with all the Spark dependencies
- To simplify/cleanup, HoodieIndex#fetchRecordLocation has been removed and its usages in tests replaced with alternatives

Co-authored-by: Vinoth Chandar <vinoth@apache.org>
2020-10-01 14:25:29 -07:00
vinoyang
5aaaf8bff1 [MINOR] Change the log level of the dag scheduler for the test suite (#2134) 2020-09-30 17:17:44 +08:00
satishkotha
a99e93bed5 [HUDI-1072] Introduce REPLACE top level action. Implement insert_overwrite operation on top of replace action (#2048) 2020-09-29 17:04:25 -07:00
hongdd
32c9cad52c [HUDI-840] Avoid blank file created by HoodieLogFormatWriter (#1567) 2020-09-29 08:02:15 -07:00
liujinhui
20b9b399c9 [HUDI-1233] Deltastreamer Kafka consumption delay reporting indicators (#2074) 2020-09-29 13:44:31 +08:00
vinoyang
c0c0095fa9 [MINOR] Reformat prepare_integration_suite script (#2126) 2020-09-28 14:12:57 -07:00
liujinhui
a86f5574ed [HUDI-1192] Make create hive database automatically configurable (#1968) 2020-09-27 14:10:13 +08:00
leesf
b0f1b736f8 [MINOR] Fix checkstyle (#2117) 2020-09-26 22:25:19 +08:00
Raymond Xu
1be0b06ef8 [HUDI-995] Migrate HoodieTestUtils APIs to HoodieTestTable (#2112)
Remove APIs in HoodieTestUtils

- HoodieTestUtils#createInflightCommitFiles
- HoodieTestUtils#getCommitFilePath
- HoodieTestUtils#doesCommitExist

and migrate usages to HoodieTestTable in

- hudi-cli/src/test/java/org/apache/hudi/cli/commands/TestRollbacksCommand.java
- hudi-cli/src/test/java/org/apache/hudi/cli/commands/TestUpgradeDowngradeCommand.java
- hudi-cli/src/test/java/org/apache/hudi/cli/integ/ITTestCommitsCommand.java
- hudi-cli/src/test/java/org/apache/hudi/cli/testutils/HoodieTestCommitMetadataGenerator.java
- hudi-client/src/test/java/org/apache/hudi/client/TestHoodieClientOnCopyOnWriteStorage.java
2020-09-26 21:21:47 +08:00
dugenkui
ae68b2b355 [MINOR] fix typos (#2116) 2020-09-26 20:40:33 +08:00
Mathieu
1dd6635fbb [MINOR] Fix ClassCastException when use QuickstartUtils generate data (#2105) 2020-09-25 10:13:39 -07:00
hongdd
2eaba0962a [HUDI-544] Archived commits command code cleanup (#1242)
* Archived commits command code cleanup
2020-09-25 09:36:41 -07:00
dugenkui
6837118c21 [MINOR] Improve description (#2113) 2020-09-25 22:21:37 +08:00
vinoth chandar
83d2e03cf7 [MINOR] Adding scripts to checkout and push to PRs (#2109)
- Tested the checkout_pr.sh locally
 - Tested a dryrun of pr_push_command.sh
2020-09-24 15:01:32 -07:00
wenningd
d37977b310 [MINOR] Remove useless config for bootstrap integ testing (#2102)
Co-authored-by: Wenning Ding <wenningd@amazon.com>
2020-09-22 13:29:59 -07:00
lw0090
fcc497eff1 [HUDI-1268] fix UpgradeDowngrade fs Rename issue for hdfs and aliyun oss (#2099) 2020-09-22 09:57:20 -07:00
Kaiux
8087016504 [HUDI-1213] Set Default for the bootstrap config : hoodie.bootstrap.full.input.provider (#2087) 2020-09-22 03:28:19 -07:00
Alexander Filipchik
c8e19e2def [HUDI-801] Adding a way to post process schema after it is fetched (#1524)
* [HUDI-801] Adding a way to post process schema after it is fetched

Co-authored-by: Alex Filipchik <alex.filipchik@csscompany.com>
Co-authored-by: Balaji Varadarajan <balaji.varadarajan@robinhood.com>
2020-09-19 11:18:36 -07:00
Raymond Xu
7c45894f43 [HUDI-995] Migrate HoodieTestUtils APIs to HoodieTestTable (#2094)
Migrate deprecated APIs in HoodieTestUtils to HoodieTestTable for test classes
- TestClientRollback
- TestCopyOnWriteRollbackActionExecutor

Use FileCreateUtils APIs in CompactionTestUtils.

Then remove unused deprecated APIs after migration.
2020-09-19 17:55:24 +08:00
Pratyaksh Sharma
73e5b4c7bb [HUDI-796] Add deduping logic for upserts case (#1558) 2020-09-18 19:37:52 +08:00
Udit Mehrotra
bf65269f66 [HUDI-1230] Fix for preventing MOR datasource jobs from hanging via spark-submit (#2046) 2020-09-17 20:03:35 -07:00
Raymond Xu
3201665295 [HUDI-995] Use HoodieTestTable in more classes (#2079)
* [HUDI-995] Use HoodieTestTable in more classes

Migrate test data prep logic in
- TestStatsCommand
- TestHoodieROTablePathFilter

Re-implement methods for create new commit times in HoodieTestUtils and HoodieClientTestHarness
- Move relevant APIs to HoodieTestTable
- Migrate usages

After changing to HoodieTestTable APIs, removed unused deprecated APIs in HoodieTestUtils
2020-09-17 09:29:07 -07:00
shenh062326
581d54097c [HUDI-1143] Change timestamp field in HoodieTestDataGenerator from double to long 2020-09-15 20:58:29 -07:00
liujinhui
6c84ef20ac [HUDI-1282] Check whether the topic exists before deltastrmer consumes Kafka (#2090) 2020-09-16 10:43:52 +08:00
Balaji Varadarajan
5e61454a6c [HUDI-802] AWSDmsTransformer does not handle insert and delete of a row in a single batch correctly (#2084) 2020-09-11 16:11:42 -07:00
Karl-WangSK
a1cff8abae [HUDI-1255] Add new Payload(OverwriteNonDefaultsWithLatestAvroPayload) for updating specified fields in storage (#2056)
Add new Payload(OverwriteNonDefaultsWithLatestAvroPayload) for updating specified fields in storage

## Brief change log

update current value for several fields that you want to change.

The default payload OverwriteWithLatestAvroPayload overwrite the whole record when 

compared to `orderingVal`.This doesn't meet our need when we just want to change specified fields.
For example: (suppose Default value is null)
```
current Value 
Field:      name   age   gender
Value:     karl     20    male
```
```
insert Value
Field:      name   age   gender
Value:     null     30    null
```
```
After insert:
Field:      name   age   gender
Value:     karl     30    male
```
## Verify this pull request

Added TestOverwriteNonDefaultsWithLatestAvroPayload to verify the change.
2020-09-09 21:54:21 -07:00