1
0
Commit Graph

368 Commits

Author SHA1 Message Date
Sivabalan Narayanan
5a0d3f1cf9 [HUDI-786] Fixing read beyond inline length in InlineFS (#1616) 2020-05-28 12:59:11 -07:00
dengziming
bde7a7043e [HUDI-476]: Add hudi-examples module (#1151)
add hoodie delta streamer mock source example and dfs source and kafka source examples

Signed-off-by: dengziming <dengziming1993@gmail.com>

add defaultSparkConf utils method

change version of hudi-examples to 0.5.2-SNAPSHOT
change the artifcatId of hudi-spark and hudi-utilities
alter some code to adapt kafka2.0

Update scritps

Add license
2020-05-28 01:44:39 +08:00
Raymond Xu
03f136361a [HUDI-811] Restructure test packages in hudi-common (#1644)
* [HUDI-811] Restructure test packages in hudi-common
2020-05-27 16:28:17 +08:00
Raymond Xu
6c450957ce [HUDI-690] Filter out inflight compaction in exporter (#1667) 2020-05-26 09:23:34 -07:00
Pratyaksh Sharma
6a0aa9a645 [HUDI-803] Replaced used of NullNode with JsonProperties.NULL_VALUE in HoodieAvroUtils (#1538)
- added more test cases in TestHoodieAvroUtils.class

Co-authored-by: Vinoth Chandar <vinoth@apache.org>
2020-05-20 09:04:43 -07:00
Balaji Varadarajan
e6f3bf10cf [HUDI-858] Allow multiple operations to be executed within a single commit (#1633) 2020-05-18 19:27:24 -07:00
Joey
2600d2de8d [MINOR] Fix apache-rat violations (#1639)
* MINOR Fix apache-rat violations. Also, enabling RAT for hudi-utilities and hudi-integ-test
2020-05-18 11:16:49 -07:00
Sivabalan Narayanan
29edf4b3b8 [HUDI-407] Adding Simple Index to Hoodie. (#1402)
This index finds the location by joining incoming records with records from base files.
2020-05-17 18:32:24 -07:00
Balaji Varadarajan
3c9da2e5f0 [HUDI-895] Remove unnecessary listing .hoodie folder when using timeline server (#1636) 2020-05-17 18:18:53 -07:00
Mathieu
25a0080b2f [HUDI-714]Add javadoc and comments to hudi write method link (#1409)
* [HUDI-714] Add javadoc and comments to hudi write method link
2020-05-16 08:36:51 -04:00
Alexander Filipchik
83796b3189 [HUDI-793] Adding proper default to hudi metadata fields and proper handling to rewrite routine (#1513)
* Adding proper default to hudi metadata fields and proper handling to rewrite routine
* Handle fields declared with a null default

Co-authored-by: Alex Filipchik <alex.filipchik@csscompany.com>
2020-05-13 18:04:38 -07:00
liujinhui
32ea4c70ff [HUDI-869] Add support for alluxio (#1608) 2020-05-13 21:00:34 +08:00
Udit Mehrotra
d54b4b8a52 [HUDI-838] Support schema from HoodieCommitMetadata for HiveSync (#1559)
Co-authored-by: Mehrotra <uditme@amazon.com>
2020-05-07 16:33:09 -07:00
Alexander Filipchik
e783ab1749 [HUDI-784] Adressing issue with log reader on GCS (#1516)
[HUDI-784] Adressing issue with log reader on GCS (#1516)
Co-authored-by: Alex Filipchik <alex.filipchik@csscompany.com>
2020-05-07 13:05:32 -07:00
vinoth chandar
c4b71622b9 [MINOR] Reorder HoodieTimeline#compareTimestamp arguments for better readability (#1575)
- reads nicely as (instantTime1, GREATER_THAN_OR_EQUALS, instantTime2) etc
2020-04-30 09:19:39 -07:00
Prashant Wason
62bd3e7ded [HUDI-757] Added hudi-cli command to export metadata of Instants.
Example:
hudi:db.table-> export instants --localFolder /tmp/ --limit 5 --actions clean,rollback,commit --desc false
2020-04-21 12:41:19 -07:00
n3nash
332072bc6d [HUDI-371] Supporting hive combine input format for realtime tables (#1503) 2020-04-20 20:40:06 -07:00
Mathieu
2a2f31d919 [MINOR] Remove reduntant code and fix typo in HoodieDefaultTimeline (#1535) 2020-04-21 09:40:22 +08:00
baobaoyeye
75523657a4 [MINOR] use Option and fix description in toString method (#1527)
* [MINOR] fix some places are not elegant, as a newcomer

* [MINOR] fix some places are not elegant, as a newcomer
2020-04-18 12:51:37 +08:00
Prashant Wason
19d29ac7d0 [HUDI-741] Added checks to validate Hoodie's schema evolution.
HUDI specific validation of schema evolution should ensure that a newer schema can be used for the dataset by checking that the data written using the old schema can be read using the new schema.

Code changes:

1. Added a new config in HoodieWriteConfig to enable schema validation check (disabled by default)
2. Moved code that reads schema from base/log files into hudi-common from hudi-hive-sync
3. Added writerSchema to the extraMetadata of compaction commits in MOR table. This is same as that for commits on COW table.

Testing changes:

4. Extended TestHoodieClientBase to add insertBatch API which allows inserting a new batch of unique records into a HUDI table
5. Added a unit test to verify schema evolution for both COW and MOR tables.
6. Added unit tests for schema compatiblity checks.
2020-04-15 23:34:59 -07:00
vinoth chandar
661b0b3bab [HUDI-761] Refactoring rollback and restore actions using the ActionExecutor abstraction (#1492)
- rollback() and restore() table level APIs introduced
- Restore is implemented by wrapping calls to rollback executor
- Existing tests transparently cover this, since its just a refactor
2020-04-13 08:29:19 -07:00
Balaji Varadarajan
17bf930342 [HUDI-770] Organize upsert/insert API implementation under a single package (#1495) 2020-04-12 23:11:00 -07:00
Pratyaksh Sharma
6d7ca2cf7e [HUDI-727]: Copy default values of fields if not present when rewriting incoming record with new schema (#1427) 2020-04-12 17:55:26 -07:00
Shen Hong
5d717a28f4 [HUDI-782] Add support of Aliyun object storage service. (#1506) 2020-04-12 10:06:30 +08:00
satishkotha
c0f96e0726 [HUDI-687] Stop incremental reader on RO table when there is a pending compaction (#1396) 2020-04-10 10:45:41 -07:00
Ramachandran Madtas Subramaniam
f5f34bb1c1 [HUDI-568] Improve unit test coverage
Classes improved:
* HoodieTableMetaClient
* RocksDBDAO
* HoodieRealtimeFileSplit
2020-04-09 10:15:34 -07:00
Zhiyuan Zhao
b5d093a21b [MINOR] Clear up the redundant comment. (#1489) 2020-04-06 16:31:54 +08:00
vinoth chandar
eaf6cc2d90 [HUDI-756] Organize Cleaning Action execution into a single package in hudi-client (#1485)
- Introduced a thin abstraction ActionExecutor, that all actions will implement
- Pulled cleaning code from table, writeclient into a single package
- CleanHelper is now CleanPlanner, HoodieCleanClient is no longer around
- Minor refactor of HoodieTable factory method
- HoodieTable.create() methods with and without metaclient passed in
- HoodieTable constructor now does not do a redundant instantiation
- Fixed existing unit tests to work at the HoodieWriteClient level
2020-04-04 00:07:34 -07:00
Shaofeng Shi
78b3194e82 [HUDI-751] Fix some coding issues reported by FindBugs (#1470) 2020-03-31 21:19:32 +08:00
lamber-ken
dbc9acd23a [HUDI-716] Exception: Not an Avro data file when running HoodieCleanClient.runClean (#1432) 2020-03-30 11:19:17 -07:00
Suneel Marthi
fa36082554 [HUDI-746] Reduce build warnings < 10 (#1465) 2020-03-30 11:46:52 +08:00
vinoth chandar
e057c27603 [HUDI-744] Restructure hudi-common and clean up files under util packages (#1462)
- Brings more order and cohesion to the classes in hudi-common
 - Utils classes related to a particular concept (avro, timeline,...) are placed near to the package
 - common.fs package now contains all the filesystem level classes including wrapper filesystem
 - bloom.filter package renamed to just bloom
 - config package contains classes that help store properties
 - common.fs.inline package contains all the inline filesystem classes/impl
 - common.table.timeline now consolidates all timeline related classes
 - common.table.view consolidates all the classes related to filesystem view metadata
 - common.table.timeline.versioning contains all classes related to versioning of timeline
 - Fix few unit tests as a result
 - Moved the test packages around to match the source file move
 - Rename AvroUtils to TimelineMetadataUtils & minor fixes/typos
2020-03-29 10:58:49 -07:00
Sivabalan Narayanan
ac73bdcdc3 [HUDI-430] Adding InlineFileSystem to support embedding any file format as an InlineFile (#1176)
* Adding InlineFileSystem to support embedding any file format (parquet, hfile, etc). Supports reading the embedded file using respective readers.
2020-03-28 12:13:35 -04:00
Suneel Marthi
04449f33fe [HUDI-743]: Remove FileIOUtils.close() (#1461) 2020-03-28 18:03:15 +08:00
Suneel Marthi
8c3001363d HUDI-479: Eliminate or Minimize use of Guava if possible (#1159) 2020-03-28 03:11:32 -04:00
Zhiyuan Zhao
0241b21f77 [HUDI-65] commitTime rename to instantTime (#1431) 2020-03-22 18:06:00 -07:00
Zhiyuan Zhao
14e0c95206 [HUDI-400] Check upgrade from old plan to new plan for compaction (#1422)
* Fix NPE when DataFile is null
* Check from old plan upgrade to new plan
2020-03-20 15:13:17 +08:00
Suneel Marthi
99b7e9eb9e [HUDI-629]: Replace Guava's Hashing with an equivalent in NumericUtils.java (#1350)
* [HUDI-629]: Replace Guava's Hashing with an equivalent in NumericUtils.java
2020-03-13 20:28:05 -04:00
lamber-ken
170ee88457 [HUDI-553] Building/Running Hudi on higher java versions (#1369) 2020-03-07 01:27:40 -08:00
Ramachandran M S
9d46ce380a [HUDI -409] Match header and footer block length to improve corrupted block detection (#1332) 2020-03-03 13:26:54 -08:00
hongdd
8306205d7a [HUDI-332]Add operation type (insert/upsert/bulkinsert/delete) to HoodieCommitMetadata (#1157)
[HUDI-332]Add operation type (insert/upsert/bulkinsert/delete) to HoodieCommitMetadata (#1157)
2020-03-03 10:10:29 -08:00
Ramachandran M S
b7f35be452 [HUDI-618] Adding unit tests for PriorityBasedFileSystemView (#1345)
[HUDI-618] Adding unit tests for PriorityBasedFileSystemView
2020-02-26 10:55:02 -08:00
lamber-ken
83c8ad5a38 [HUDI-625] Fixing performance issues around DiskBasedMap & kryo (#1352) 2020-02-24 22:40:37 -08:00
Suneel Marthi
078d4825d9 [HUDI-624]: Split some of the code from PR for HUDI-479 (#1344) 2020-02-21 14:22:21 +08:00
Nishith Agarwal
185ff646ad Refactoring getter to avoid double extrametadata in json representation 2020-02-20 09:52:02 -08:00
Suneel Marthi
f9d2f66dc1 [HUDI-622]: Remove VisibleForTesting annotation and import from code (#1343)
* HUDI:622: Remove VisibleForTesting annotation and import from code
2020-02-20 15:17:53 +08:00
Suneel Marthi
b8f9d0ec45 [HUDI-615]: Add some methods and test cases for StringUtils. (#1338) 2020-02-17 14:13:33 +08:00
Suneel Marthi
24e73816b2 [MINOR] Code Cleanup, remove redundant code (#1337) 2020-02-15 22:03:29 +08:00
lamber-ken
d2c872ede4 [HUDI-605] Avoid calculating the size of schema redundantly (#1317) 2020-02-12 19:40:52 +08:00
Balajee Nagasubramaniam
1fb0b001a3 [HUDI-570] - Improve test coverage for FSUtils.java 2020-02-05 14:25:24 -08:00