Prashant Wason
2603cfb33e
[HUDI-684] Introduced abstraction for writing and reading different types of base file formats. ( #1687 )
...
Notable changes:
1. HoodieFileWriter and HoodieFileReader abstractions for writer/reader side of a base file format
2. HoodieDataBlock abstraction for creating specific data blocks for base file formats. (e.g. Parquet has HoodieAvroDataBlock)
3. All hardcoded references to Parquet / Parquet based classes have been abstracted to call methods which accept a base file format
4. HiveSyncTool accepts the base file format as a CLI parameter
5. HoodieDeltaStreamer accepts the base file format as a CLI parameter
6. HoodieSparkSqlWriter accepts the base file format as a parameter
2020-06-25 23:46:55 -07:00
wangxianghu
5e47673341
[HUDI-1035] Remove unused class KeyLookupResult ( #1754 )
2020-06-23 17:01:03 -07:00
Shen Hong
89e37d5273
[HUDI-908] Add some data types to HoodieTestDataGenerator and fix some bugs. ( #1690 )
2020-06-22 08:13:28 -07:00
wangxianghu
68a656b016
[HUDI-1032] Remove unused code in HoodieCopyOnWriteTable and code clean ( #1750 )
2020-06-21 07:34:47 -07:00
Raymond Xu
8a9fdd603e
[HUDI-1023] Add validation error messages in delta sync ( #1710 )
...
- Remove explicitly specifying BLOOM_INDEX since that's the default anyway
2020-06-19 12:12:35 -07:00
Satish Kotha
a7fd331624
Add unit test for snapshot reads in hadoop-mr
2020-06-13 10:23:05 -07:00
sathyaprakashg
df2e0c760e
HUDI-942 Increase the default number of delta commits for inline compaction ( #1664 )
...
Co-authored-by: Sathyaprakash Govindasamy <sathyaprakashg@zillowgroup.com >
2020-06-10 16:16:44 -07:00
Gary Li
37838cea60
[HUDI-822] decouple Hudi related logics from HoodieInputFormat ( #1592 )
...
- Refactoring business logic out of InputFormat into Utils helpers.
2020-06-09 06:10:16 -07:00
shenhong
3387b3841f
[HUDI-1005] fix NPE in HoodieWriteClient.clean
2020-06-09 05:57:04 -07:00
Shen Hong
6318e943d1
[HUDI-1016] Code optimization in MergeOnReadRollbackActionExecutor ( #1718 )
2020-06-09 19:14:26 +08:00
garyli1019
22cd824d99
HUDI-494 fix incorrect record size estimation
2020-06-08 20:29:29 -07:00
garyli1019
e9cab67b80
[HUDI-988] Fix More Unit Test Flakiness
2020-06-07 23:14:46 -07:00
Balaji Varadarajan
fb283934a3
[HUDI-990] Timeline API : filterCompletedAndCompactionInstants needs to handle requested state correctly. Also ensure timeline gets reloaded after we revert committed transactions
2020-06-04 02:52:21 -07:00
Balaji Varadarajan
a68180b179
[HUDI-988] Fix Unit Test Flakiness : Ensure all instantiations of HoodieWriteClient are closed properly. Fix bug in TestRollbacks. Make CLI unit tests for Hudi CLI check skip rendering strings
2020-06-04 02:52:21 -07:00
Raymond Xu
742c204099
[HUDI-811] Restructure test packages in hudi-client/cli ( #1689 )
2020-06-02 10:25:42 +08:00
dengziming
bde7a7043e
[HUDI-476]: Add hudi-examples module ( #1151 )
...
add hoodie delta streamer mock source example and dfs source and kafka source examples
Signed-off-by: dengziming <dengziming1993@gmail.com >
add defaultSparkConf utils method
change version of hudi-examples to 0.5.2-SNAPSHOT
change the artifactId of hudi-spark and hudi-utilities
alter some code to adapt to kafka2.0
Update scripts
Add license
2020-05-28 01:44:39 +08:00
Raymond Xu
03f136361a
[HUDI-811] Restructure test packages in hudi-common ( #1644 )
...
* [HUDI-811] Restructure test packages in hudi-common
2020-05-27 16:28:17 +08:00
sathyaprakashg
d3edac4612
HUDI-921 Remove inlineCompactionEvery method in HoodieCompactionConfig.Builder ( #1654 )
...
Co-authored-by: Sathyaprakash Govindasamy <sathyaprakashg@zillowgroup.com >
2020-05-24 01:09:18 -07:00
Raymond Xu
f34de3fb27
[HUDI-836] Implement datadog metrics reporter ( #1572 )
...
- Adds support for emitting metrics to datadog
- Tests, configs..
2020-05-22 09:14:21 -07:00
Balaji Varadarajan
74ecc27e92
[HUDI-846][HUDI-848] Enable Incremental cleaning and embedded timeline-server by default ( #1634 )
2020-05-20 05:29:43 -07:00
Raymond Xu
f802d4400b
[MINOR] Fix resource cleanup in TestTableSchemaEvolution ( #1640 )
...
- Remove Xms; it is not needed.
- extending process exit timeout from 30 to 120 sec should be safe to do
2020-05-20 05:07:30 -07:00
Balaji Varadarajan
e6f3bf10cf
[HUDI-858] Allow multiple operations to be executed within a single commit ( #1633 )
2020-05-18 19:27:24 -07:00
Sivabalan Narayanan
29edf4b3b8
[HUDI-407] Adding Simple Index to Hoodie. ( #1402 )
...
This index finds the location by joining incoming records with records from base files.
2020-05-17 18:32:24 -07:00
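The join described in the HUDI-407 entry above can be illustrated with a minimal, single-machine sketch. This is not Hudi's implementation: the class `SimpleIndexSketch`, the method `tagLocations`, and the plain `Map` standing in for record keys read from base files are all hypothetical names chosen for illustration. The sketch only shows the idea of a left outer join between incoming record keys and keys already stored in base files.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical illustration only; none of these names come from Hudi.
public class SimpleIndexSketch {

    /**
     * Left-outer-joins incoming record keys against keys found in base files.
     *
     * @param incomingKeys record keys of the incoming batch
     * @param keyToFileId  assumed lookup of record key -> base file id
     * @return each incoming key mapped to its existing location, or empty if the key is new
     */
    public static Map<String, Optional<String>> tagLocations(
            List<String> incomingKeys, Map<String, String> keyToFileId) {
        Map<String, Optional<String>> tagged = new LinkedHashMap<>();
        for (String key : incomingKeys) {
            // A key present in a base file is an update; an absent key is an insert.
            tagged.put(key, Optional.ofNullable(keyToFileId.get(key)));
        }
        return tagged;
    }

    public static void main(String[] args) {
        Map<String, String> existing = Map.of("k1", "file-a", "k2", "file-b");
        System.out.println(tagLocations(List.of("k1", "k3"), existing));
    }
}
```

In this sketch a key with an empty location would be treated as an insert and a key with a location as an update to that file.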
Balaji Varadarajan
3c9da2e5f0
[HUDI-895] Remove unnecessary listing .hoodie folder when using timeline server ( #1636 )
2020-05-17 18:18:53 -07:00
Mathieu
25a0080b2f
[HUDI-714] Add javadoc and comments to hudi write method link ( #1409 )
...
* [HUDI-714] Add javadoc and comments to hudi write method link
2020-05-16 08:36:51 -04:00
Shen Hong
e8ffc6f0aa
[HUDI-881] Replace part of spark context by hadoop configuration in AbstractHoodieClient and HoodieReadClient ( #1620 )
2020-05-12 09:33:29 -07:00
Shen Hong
b54517aad0
[HUDI-886] Replace jsc.hadoopConfiguration by hadoop configuration in hudi-client testcase ( #1621 )
2020-05-12 08:51:31 -07:00
Shen Hong
295d00beea
[HUDI-880] Replace part of spark context by hadoop configuration in HoodieTable. ( #1614 )
2020-05-11 23:33:57 -07:00
Shen Hong
6dac10115c
[HUDI-870] Remove spark context in ClientUtils and HoodieIndex ( #1609 )
2020-05-11 19:05:36 +08:00
Balaji Varadarajan
8d0e23173b
[HUDI-820] cleaner repair command should only inspect clean metadata files ( #1542 )
2020-05-11 09:25:54 +08:00
vinoth chandar
f92b9fdcc4
[MINOR] Fix hardcoding of ports in TestHoodieJmxMetrics ( #1606 )
2020-05-10 19:23:26 -04:00
Carm
fa6aba751d
[MINOR] fixed building IndexFileFilter with a wrong condition in HoodieGlobalBloomIndex class ( #1537 )
2020-05-10 09:45:07 +08:00
Udit Mehrotra
d54b4b8a52
[HUDI-838] Support schema from HoodieCommitMetadata for HiveSync ( #1559 )
...
Co-authored-by: Mehrotra <uditme@amazon.com >
2020-05-07 16:33:09 -07:00
Raymond Xu
366bb10d8c
[HUDI-812] Migrate hudi common tests to JUnit 5 ( #1590 )
...
* [HUDI-812] Migrate hudi-common tests to JUnit 5
2020-05-06 19:15:20 +08:00
Balaji Varadarajan
506447fd4f
[HUDI-850] Avoid unnecessary listings in incremental cleaning mode ( #1576 )
2020-05-01 21:37:21 -07:00
vinoth chandar
c4b71622b9
[MINOR] Reorder HoodieTimeline#compareTimestamp arguments for better readability ( #1575 )
...
- reads nicely as (instantTime1, GREATER_THAN_OR_EQUALS, instantTime2) etc
2020-04-30 09:19:39 -07:00
Raymond Xu
69b16309c8
[HUDI-814] Migrate hudi-client tests to JUnit 5 ( #1570 )
2020-04-29 13:57:28 +08:00
Raymond Xu
06dae30297
[HUDI-810] Migrate ClientTestHarness to JUnit 5 ( #1553 )
2020-04-28 23:38:16 +08:00
satishkotha
6de9f5d9e5
[HUDI-819] Fix a bug with MergeOnReadLazyInsertIterable.
...
A variable declared here [1] masks the protected statuses variable. So although hoodie writes data, it will not include the write status in the completed section. This can cause duplicates to be written (#1540 )
[1] https://github.com/apache/incubator-hudi/blob/master/hudi-client/src/main/java/org/apache/hudi/execution/MergeOnReadLazyInsertIterable.java#L53
2020-04-27 12:50:39 -07:00
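The masking described in the HUDI-819 entry above is ordinary Java field hiding. The following is a minimal sketch of that mechanism, not the actual Hudi classes: `BaseInsertHandler`, `HidingHandler`, and `FixedHandler` are hypothetical stand-ins, and write statuses are reduced to plain strings.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; none of these classes come from Hudi.
public class FieldHidingSketch {

    // Stand-in for a parent class that collects and reports write statuses.
    public static class BaseInsertHandler {
        protected final List<String> statuses = new ArrayList<>();

        public List<String> completedStatuses() {
            return statuses; // always reads the parent's field
        }
    }

    // Buggy variant: redeclaring `statuses` hides the inherited field.
    public static class HidingHandler extends BaseInsertHandler {
        private final List<String> statuses = new ArrayList<>();

        public void write(String status) {
            statuses.add(status); // lands in the subclass's list, invisible to the parent
        }
    }

    // Fixed variant: no redeclaration, so writes reach the parent's list.
    public static class FixedHandler extends BaseInsertHandler {
        public void write(String status) {
            statuses.add(status);
        }
    }

    public static void main(String[] args) {
        HidingHandler buggy = new HidingHandler();
        buggy.write("file-1");
        System.out.println(buggy.completedStatuses().size()); // prints 0

        FixedHandler fixed = new FixedHandler();
        fixed.write("file-1");
        System.out.println(fixed.completedStatuses().size()); // prints 1
    }
}
```

In the buggy variant the data is written but the reported status list stays empty, which matches the symptom the commit message describes.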
vinoth chandar
19ca0b5629
[HUDI-785] Refactor compaction/savepoint execution based on ActionExecutor abstraction ( #1548 )
...
- Savepoint and compaction classes moved to table.action.* packages
- HoodieWriteClient#savepoint(...) returns void
- Renamed HoodieCommitArchiveLog -> HoodieTimelineArchiveLog
- Fixed tests to take into account the additional validation done
- Moved helper code into CompactHelpers and SavepointHelpers
2020-04-25 18:26:44 -07:00
Alexander Filipchik
aea7c1657e
[HUDI-795] Handle auto-deleted empty aux folder ( #1515 )
...
Co-authored-by: Alex Filipchik <alex.filipchik@csscompany.com >
2020-04-22 09:47:32 -07:00
leesf
26684f5984
[HUDI-816] Fixed MAX_MEMORY_FOR_MERGE_PROP and MAX_MEMORY_FOR_COMPACTION_PROP not working due to HUDI-678 ( #1536 )
2020-04-22 16:33:18 +08:00
n3nash
332072bc6d
[HUDI-371] Supporting hive combine input format for realtime tables ( #1503 )
2020-04-20 20:40:06 -07:00
Dongwook
ddd105bb31
[HUDI-772] Make UserDefinedBulkInsertPartitioner configurable for DataSource ( #1500 )
2020-04-20 08:38:18 -07:00
lw0090
09fd6f64c5
[HUDI-800] Fix Metrics getReporter().close() throws NPE. ( #1529 )
2020-04-19 21:33:07 +08:00
baobaoyeye
75523657a4
[MINOR] use Option and fix description in toString method ( #1527 )
...
* [MINOR] fix some places that are not elegant, as a newcomer
2020-04-18 12:51:37 +08:00
Raymond Xu
acdc4a8d00
[HUDI-798] Migrate to Mockito Jupiter for JUnit 5 ( #1521 )
2020-04-16 16:07:32 +08:00
Prashant Wason
19d29ac7d0
[HUDI-741] Added checks to validate Hoodie's schema evolution.
...
HUDI specific validation of schema evolution should ensure that a newer schema can be used for the dataset by checking that the data written using the old schema can be read using the new schema.
Code changes:
1. Added a new config in HoodieWriteConfig to enable schema validation check (disabled by default)
2. Moved code that reads schema from base/log files into hudi-common from hudi-hive-sync
3. Added writerSchema to the extraMetadata of compaction commits in MOR table. This is the same as for commits on a COW table.
Testing changes:
4. Extended TestHoodieClientBase to add insertBatch API which allows inserting a new batch of unique records into a HUDI table
5. Added a unit test to verify schema evolution for both COW and MOR tables.
6. Added unit tests for schema compatibility checks.
2020-04-15 23:34:59 -07:00
Raymond Xu
d65efe659d
[HUDI-780] Migrate test cases to JUnit 5 ( #1504 )
2020-04-15 12:35:01 -07:00
vinoth chandar
661b0b3bab
[HUDI-761] Refactoring rollback and restore actions using the ActionExecutor abstraction ( #1492 )
...
- rollback() and restore() table level APIs introduced
- Restore is implemented by wrapping calls to rollback executor
- Existing tests transparently cover this, since it's just a refactor
2020-04-13 08:29:19 -07:00