1
0
Commit Graph

125 Commits

Author SHA1 Message Date
Satish Kotha
f468c20c6c [HUDI-1226] Fix ComplexKeyGenerator for non-partitioned tables 2020-08-25 20:55:48 -07:00
Mathieu
35b21855da [HUDI-1150] Fix unable to parse input partition field :1 exception when using TimestampBasedKeyGenerator(#1920) 2020-08-23 19:56:50 +08:00
Pratyaksh Sharma
a2312fa1b7 [HUDI-1177]: fixed TaskNotSerializableException in TimestampBasedKeyGenerator (#1987)
Co-authored-by: Bhavani Sudha Saktheeswaran <bhavanisudhas@gmail.com>
2020-08-19 17:43:34 -07:00
Bhavani Sudha Saktheeswaran
824f23bcb8 [HUDI-1197] Fix import issue that fails scala 2.12 build (#1976) 2020-08-18 08:41:16 -07:00
Bhavani Sudha Saktheeswaran
4226d75144 Moving to 0.6.1-SNAPSHOT on master branch. 2020-08-14 12:54:15 -07:00
Balaji Varadarajan
b8f4a30efd Fix Integration test flakiness in HoodieJavaStreamingApp (#1967) 2020-08-14 01:42:15 -07:00
vinoth chandar
9bde6d616c [HUDI-1190] Introduce @PublicAPIClass and @PublicAPIMethod annotations to mark public APIs (#1965)
- Maturity levels one of : evolving, stable, deprecated
- Took a pass and marked out most of the existing public API
2020-08-13 23:28:17 -07:00
Sivabalan Narayanan
379cf0786f [HUDI-1013] Adding Bulk Insert V2 implementation (#1834)
- Adding ability to use native spark row writing for bulk_insert
 - Controlled by `ENABLE_ROW_WRITER_OPT_KEY` datasource write option
 - Introduced KeyGeneratorInterface in hudi-client, moved KeyGenerator back to hudi-spark
 - Simplified the new API additions to just two new methods : getRecordKey(row), getPartitionPath(row)
 - Fixed all built-in key generators with new APIs
 - Made the field position map lazily created upon the first call to row based apis
 - Implemented native row based key generators for CustomKeyGenerator
 - Fixed all the tests, with these new APIs

Co-authored-by: Balaji Varadarajan <varadarb@uber.com>
Co-authored-by: Vinoth Chandar <vinoth@apache.org>
2020-08-13 00:33:39 -07:00
wenningd
8b928e9bca [HUDI-808] Support cleaning bootstrap source data (#1870)
Co-authored-by: Wenning Ding <wenningd@amazon.com>
Co-authored-by: Balaji Varadarajan <vbalaji@apache.org>
2020-08-11 01:43:46 -07:00
Udit Mehrotra
e4a2d98f79 [HUDI-426] Bootstrap datasource integration (#1702) 2020-08-09 14:06:13 -07:00
wenningd
9fe2d2b14a [HUDI-427] [HUDI-971] Implement CLI support for performing bootstrap (#1869)
* [HUDI-971] Clean partitions & fileIds returned by HFileBootstrapIndex
* [HUDI-427] Implement CLI support for performing bootstrap

Co-authored-by: Wenning Ding <wenningd@amazon.com>
Co-authored-by: Balaji Varadarajan <vbalaji@apache.org>
2020-08-08 12:37:29 -07:00
Gary Li
4f74a84607 [HUDI-69] Support Spark Datasource for MOR table - RDD approach (#1848)
- This PR implements Spark Datasource for MOR table in the RDD approach.
- Implemented SnapshotRelation
- Implemented HudiMergeOnReadRDD
- Implemented separate Iterator to handle merge and unmerge record reader.
- Added TestMORDataSource to verify this feature.
- Clean up test file name, add tests for mixed query type tests
 - We can now revert the change made in DefaultSource

Co-authored-by: Vinoth Chandar <vchandar@confluent.io>
2020-08-07 00:28:14 -07:00
Udit Mehrotra
ab453f2623 [HUDI-999] [RFC-12] Parallelize fetching of source data files/partitions (#1924) 2020-08-06 23:44:57 -07:00
lw0090
51ea27d665 [HUDI-875] Abstract hudi-sync-common, and support hudi-hive-sync, hudi-dla-sync (#1810)
- Generalize the hive-sync module for syncing to multiple metastores
- Added new options for datasource
- Added new command line for delta streamer 

Co-authored-by: Vinoth Chandar <vinoth@apache.org>
2020-08-05 21:34:55 -07:00
Balaji Varadarajan
7a2429f5ba [HUDI-575] Spark Streaming with async compaction support (#1752) 2020-08-05 07:50:15 -07:00
Sivabalan Narayanan
ab11ba43e1 [REVERT] "[HUDI-1058] Make delete marker configurable (#1819)" (#1914)
This reverts commit 433d7d2c98.
2020-08-04 15:20:38 -07:00
vinoth chandar
539621bd33 [HUDI-242] Support for RFC-12/Bootstrapping of external datasets to hudi (#1876)
- [HUDI-418] Bootstrap Index Implementation using HFile with unit-test
 - [HUDI-421] FileSystem View Changes to support Bootstrap with unit-tests
 - [HUDI-424] Implement Query Side Integration for querying tables containing bootstrap file slices
 - [HUDI-423] Implement upsert functionality for handling updates to these bootstrap file slices
 - [HUDI-421] Bootstrap Write Client with tests
 - [HUDI-425] Added HoodieDeltaStreamer support
 - [HUDI-899] Add a knob to change partition-path style while performing metadata bootstrap
 - [HUDI-900] Metadata Bootstrap Key Generator needs to handle complex keys correctly
 - [HUDI-424] Simplify Record reader implementation
 - [HUDI-423] Implement upsert functionality for handling updates to these bootstrap file slices
 - [HUDI-420] Hoodie Demo working with hive and sparkSQL. Also, Hoodie CLI working with bootstrap tables

Co-authored-by: Mehrotra <uditme@amazon.com>
Co-authored-by: Vinoth Chandar <vinoth@apache.org>
Co-authored-by: Balaji Varadarajan <varadarb@uber.com>
2020-08-03 20:19:21 -07:00
Shen Hong
433d7d2c98 [HUDI-1058] Make delete marker configurable (#1819) 2020-08-03 11:06:31 -04:00
Raymond Xu
8aa9142de8 [MINOR] Prevent scalatest plugin from running in non-UTs (#1897) 2020-08-02 20:33:58 -07:00
Raymond Xu
10e4268792 [HUDI-995] Use Transformations, Assertions and SchemaTestUtil (#1884)
- Consolidate transform functions for tests in Transformations.java
- Consolidate assertion functions for tests in Assertions.java
- Make use of SchemaTestUtil for loading schema from resource
2020-08-01 20:57:18 +08:00
Y Ethan Guo
ccd70a7e48 [HUDI-472] Introduce configurations and new modes of sorting for bulk_insert (#1149)
* [HUDI-472] Introduce the configuration and new modes of record sorting for bulk_insert(#1149). Three sorting modes are implemented: global sort ("global_sort"), local sort inside each RDD partition ("partition_sort") and no sort ("none")
2020-07-31 09:52:42 -04:00
Nishith Agarwal
2fc2b01d86 [HUDI-394] Provide a basic implementation of test suite 2020-07-30 21:21:15 -07:00
Udit Mehrotra
5e7931b1f9 [MINOR] Fix master compilation failure (#1881)
Co-authored-by: Udit Mehrotra <uditme@amazon.com>
2020-07-27 23:02:58 -07:00
hongdd
fa419213f6 [HUDI-703] Add test for HoodieSyncCommand (#1774) 2020-07-28 08:31:43 +08:00
Raymond Xu
ca36c44cb3 [HUDI-995] Move TestRawTripPayload and HoodieTestDataGenerator to hudi-common (#1873) 2020-07-27 19:21:45 +08:00
Udit Mehrotra
1aae437257 [HUDI-1102] Add common useful Spark related and Table path detection utilities (#1841)
Co-authored-by: Mehrotra <uditme@amazon.com>
2020-07-18 16:16:32 -07:00
miaomiaomiao
10e457278b [HUDI-1078]Fix IllegalArgumentException in Delete data demo of Quick-Start Guide (#1808) 2020-07-13 11:38:06 -04:00
Pratyaksh Sharma
9627a385fe [HUDI-916]: Added support for multiple input formats in TimestampBasedKeyGenerator (#1648) 2020-07-10 15:28:45 -04:00
Pratyaksh Sharma
c7f1a781ab [HUDI-728]: Implemented custom key generator (#1433) 2020-07-09 07:35:07 -04:00
mabin001
8c4ff185f1 [HUDI-1064]Trim hoodie table name (#1805) 2020-07-07 19:10:16 +08:00
Raymond Xu
31247e9b34 [HUDI-896] Report test coverage by modules & parallelize CI (#1753)
- use codecov flags for each module to report coverage
- parallelize CI jobs for shorter time
- add a testcase for MetricsReporterFactory (to trigger codecov comment)
2020-06-27 23:16:12 -07:00
Prashant Wason
2603cfb33e [HUDI-684] Introduced abstraction for writing and reading different types of base file formats. (#1687)
Notable changes:
    1. HoodieFileWriter and HoodieFileReader abstractions for writer/reader side of a base file format
    2. HoodieDataBlock abstraction for creation specific data blocks for base file formats. (e.g. Parquet has HoodieAvroDataBlock)
    3. All hardocded references to Parquet / Parquet based classes have been abstracted to call methods which accept a base file format
    4. HiveSyncTool accepts the base file format as a CLI parameter
    5. HoodieDeltaStreamer accepts the base file format as a CLI parameter
    6. HoodieSparkSqlWriter accepts the base file format as a parameter
2020-06-25 23:46:55 -07:00
Shen Hong
89e37d5273 [HUDI-908] Add some data types to HoodieTestDataGenerator and fix some some bugs. (#1690) 2020-06-22 08:13:28 -07:00
sathyaprakashg
df2e0c760e HUDI-942 Increase default value number of delta commits for inline compaction (#1664)
Co-authored-by: Sathyaprakash Govindasamy <sathyaprakashg@zillowgroup.com>
2020-06-10 16:16:44 -07:00
Raymond Xu
742c204099 [HUDI-811] Restructure test packages in hudi-client/cli (#1689) 2020-06-02 10:25:42 +08:00
Raymond Xu
03f136361a [HUDI-811] Restructure test packages in hudi-common (#1644)
* [HUDI-811] Restructure test packages in hudi-common
2020-05-27 16:28:17 +08:00
Bhavani Sudha Saktheeswaran
9697fbf71e [HUDI-936] Fix fetch ordering val in HoodieSparkSqlWriter to remove unnecessary conversion to String (#1659) 2020-05-26 21:09:02 -07:00
Balaji Varadarajan
74ecc27e92 [HUDI-846][HUDI-848] Enable Incremental cleaning and embedded timeline-server by default (#1634) 2020-05-20 05:29:43 -07:00
rolandjohann
459356e292 [HUDI-863] get decimal properties from derived spark DataType (#1596) 2020-05-18 04:28:27 -07:00
Mathieu
25a0080b2f [HUDI-714]Add javadoc and comments to hudi write method link (#1409)
* [HUDI-714] Add javadoc and comments to hudi write method link
2020-05-16 08:36:51 -04:00
Gary Li
a64afdfd17 HUDI-528 Handle empty commit in incremental pulling (#1612) 2020-05-14 22:55:25 -07:00
Raymond Xu
0d4848b68b [HUDI-811] Restructure test packages (#1607)
* restructure hudi-spark tests
* restructure hudi-timeline-service tests
* restructure hudi-hadoop-mr hudi-utilities tests
* restructure hudi-hive-sync tests
2020-05-13 15:37:03 -07:00
cxzl25
32bada29dc [HUDI-889] Writer supports useJdbc configuration when hive synchronization is enabled (#1627) 2020-05-14 00:20:13 +08:00
Shen Hong
295d00beea [HUDI-880] Replace part of spark context by hadoop configuration in HoodieTable. (#1614) 2020-05-11 23:33:57 -07:00
AakashPradeep
5e0f5e5521 [HUDI-852] adding check for table name for Append Save mode (#1580)
* adding check for table name for Append Save mode

* adding existing table validation for delete and upsert operation

Co-authored-by: Aakash Pradeep <apradeep@twilio.com>
2020-05-03 23:09:17 -07:00
Dongwook
ddd105bb31 [HUDI-772] Make UserDefinedBulkInsertPartitioner configurable for DataSource (#1500) 2020-04-20 08:38:18 -07:00
Raymond Xu
acdc4a8d00 [HUDI-798] Migrate to Mockito Jupiter for JUnit 5 (#1521) 2020-04-16 16:07:32 +08:00
Raymond Xu
d65efe659d [HUDI-780] Migrate test cases to Junit 5 (#1504) 2020-04-15 12:35:01 -07:00
Pratyaksh Sharma
d610252d6b [HUDI-288]: Add support for ingesting multiple kafka streams in a single DeltaStreamer deployment (#1150)
* [HUDI-288]: Add support for ingesting multiple kafka streams in a single DeltaStreamer deployment
2020-04-07 16:10:26 -07:00
vinoth chandar
eaf6cc2d90 [HUDI-756] Organize Cleaning Action execution into a single package in hudi-client (#1485)
- Introduced a thin abstraction ActionExecutor, that all actions will implement
- Pulled cleaning code from table, writeclient into a single package
- CleanHelper is now CleanPlanner, HoodieCleanClient is no longer around
- Minor refactor of HoodieTable factory method
- HoodieTable.create() methods with and without metaclient passed in
- HoodieTable constructor now does not do a redundant instantiation
- Fixed existing unit tests to work at the HoodieWriteClient level
2020-04-04 00:07:34 -07:00