75 Commits

Author SHA1 Message Date
wenningd
bf1d36fa63 [HUDI-1087] Handle decimal type for realtime record reader with SparkSQL (#1831)
Co-authored-by: Wenning Ding <wenningd@amazon.com>
2020-07-15 07:30:58 -07:00
Satish Kotha
086853c004 [HUDI-1080] Fix backward compatibility for com.uber inputformats 2020-07-08 15:30:07 -07:00
andreitaleanu
37ea79566d [HUDI-539] Make HoodieROTablePathFilter implement Configurable (#1784)
Co-authored-by: Andrei Taleanu <taleanu@adobe.com>
2020-07-03 13:39:53 -07:00
Prashant Wason
2603cfb33e [HUDI-684] Introduced abstraction for writing and reading different types of base file formats. (#1687)
Notable changes:
    1. HoodieFileWriter and HoodieFileReader abstractions for writer/reader side of a base file format
    2. HoodieDataBlock abstraction for creating specific data blocks for base file formats. (e.g. Parquet has HoodieAvroDataBlock)
    3. All hardcoded references to Parquet / Parquet based classes have been abstracted to call methods which accept a base file format
    4. HiveSyncTool accepts the base file format as a CLI parameter
    5. HoodieDeltaStreamer accepts the base file format as a CLI parameter
    6. HoodieSparkSqlWriter accepts the base file format as a parameter
2020-06-25 23:46:55 -07:00
Shen Hong
89e37d5273 [HUDI-908] Add some data types to HoodieTestDataGenerator and fix some bugs. (#1690) 2020-06-22 08:13:28 -07:00
Satish Kotha
a7fd331624 Add unit test for snapshot reads in hadoop-mr 2020-06-13 10:23:05 -07:00
Gary Li
37838cea60 [HUDI-822] decouple Hudi related logics from HoodieInputFormat (#1592)
- Refactoring business logic out of InputFormat into Utils helpers.
2020-06-09 06:10:16 -07:00
lw0090
9e07cebece [HUDI-974] Fix fields out of order in MOR mode when using Hive (#1711) 2020-06-09 09:22:06 +08:00
Wenning Ding
7d40f19f39 HUDI-515 Resolve API conflict for Hive 2 & Hive 3 2020-06-08 14:18:38 -07:00
Shen Hong
2901f5423a [HUDI-1002] Ignore case when setting incremental mode in hive query (#1715) 2020-06-08 19:38:32 +08:00
hj2016
e0a5e0d343 [HUDI-1000] Fix incremental query for COW non-partitioned table with no data (#1708) 2020-06-08 15:34:42 +08:00
Yajun Luo
a9a97d6af4 [HUDI-934] Add processing logic for the decimal LogicalType (#1677) 2020-06-02 19:50:55 +08:00
Raymond Xu
03f136361a [HUDI-811] Restructure test packages in hudi-common (#1644)
* [HUDI-811] Restructure test packages in hudi-common
2020-05-27 16:28:17 +08:00
Raymond Xu
0d4848b68b [HUDI-811] Restructure test packages (#1607)
* restructure hudi-spark tests
* restructure hudi-timeline-service tests
* restructure hudi-hadoop-mr hudi-utilities tests
* restructure hudi-hive-sync tests
2020-05-13 15:37:03 -07:00
Raymond Xu
366bb10d8c [HUDI-812] Migrate hudi common tests to JUnit 5 (#1590)
* [HUDI-812] Migrate hudi-common tests to JUnit 5
2020-05-06 19:15:20 +08:00
bschell
e21441ad83 Add changes for presto mor queries (#1578)
Adds the necessary changes to hudi for support of presto querying hudi
merge-on-read table's realtime view.

Co-authored-by: Brandon Scheller <bschelle@amazon.com>
2020-05-04 11:27:14 -07:00
Raymond Xu
6e15eebd81 [HUDI-809] Migrate CommonTestHarness to JUnit 5 (#1530) 2020-04-22 14:10:25 +08:00
n3nash
332072bc6d [HUDI-371] Supporting hive combine input format for realtime tables (#1503) 2020-04-20 20:40:06 -07:00
Raymond Xu
acdc4a8d00 [HUDI-798] Migrate to Mockito Jupiter for JUnit 5 (#1521) 2020-04-16 16:07:32 +08:00
Raymond Xu
d65efe659d [HUDI-780] Migrate test cases to Junit 5 (#1504) 2020-04-15 12:35:01 -07:00
Pratyaksh Sharma
6d7ca2cf7e [HUDI-727]: Copy default values of fields if not present when rewriting incoming record with new schema (#1427) 2020-04-12 17:55:26 -07:00
satishkotha
c0f96e0726 [HUDI-687] Stop incremental reader on RO table when there is a pending compaction (#1396) 2020-04-10 10:45:41 -07:00
Ramachandran Madtas Subramaniam
f5f34bb1c1 [HUDI-568] Improve unit test coverage
Classes improved:
* HoodieTableMetaClient
* RocksDBDAO
* HoodieRealtimeFileSplit
2020-04-09 10:15:34 -07:00
Abhishek Modi
996f761232 Trying git merge --squash 2020-04-09 08:18:02 -07:00
Ramachandran Madtas Subramaniam
639ec20412 [HUDI-562] Enable testing at debug log level
This is to ensure that tests will execute all code paths, even the ones
written under DEBUG log levels. This will improve coverage as well as
ensure there are no surprises when DEBUG log level is enabled in
production.
2020-04-02 11:14:35 -07:00
Shaofeng Shi
78b3194e82 [HUDI-751] Fix some coding issues reported by FindBugs (#1470) 2020-03-31 21:19:32 +08:00
Suneel Marthi
fa36082554 [HUDI-746] Reduce build warnings < 10 (#1465) 2020-03-30 11:46:52 +08:00
vinoth chandar
e057c27603 [HUDI-744] Restructure hudi-common and clean up files under util packages (#1462)
- Brings more order and cohesion to the classes in hudi-common
- Utils classes related to a particular concept (avro, timeline,...) are placed near to the package
- common.fs package now contains all the filesystem level classes including wrapper filesystem
- bloom.filter package renamed to just bloom
- config package contains classes that help store properties
- common.fs.inline package contains all the inline filesystem classes/impl
- common.table.timeline now consolidates all timeline related classes
- common.table.view consolidates all the classes related to filesystem view metadata
- common.table.timeline.versioning contains all classes related to versioning of timeline
- Fix few unit tests as a result
- Moved the test packages around to match the source file move
- Rename AvroUtils to TimelineMetadataUtils & minor fixes/typos
2020-03-29 10:58:49 -07:00
Suneel Marthi
8c3001363d HUDI-479: Eliminate or Minimize use of Guava if possible (#1159) 2020-03-28 03:11:32 -04:00
Zhiyuan Zhao
0241b21f77 [HUDI-65] commitTime rename to instantTime (#1431) 2020-03-22 18:06:00 -07:00
vinoth chandar
e3019031d8 [HUDI-539] Make ROPathFilter conf member serializable (#1415) 2020-03-17 12:52:48 -07:00
bschell
418f9bb2e9 Add constructor to HoodieROTablePathFilter (#1413)
Allows HoodieROTablePathFilter to accept a configuration for
initializing the filesystem. This fixes a bug with Presto's use of this
pathfilter.

Co-authored-by: Brandon Scheller <bschelle@amazon.com>
2020-03-16 15:19:16 -07:00
Suneel Marthi
99b7e9eb9e [HUDI-629]: Replace Guava's Hashing with an equivalent in NumericUtils.java (#1350)
* [HUDI-629]: Replace Guava's Hashing with an equivalent in NumericUtils.java
2020-03-13 20:28:05 -04:00
yanghua
0dc8e493aa Moving to 0.6.0-SNAPSHOT on master branch. 2020-03-01 15:08:30 +08:00
Ramachandran M S
acf359c834 [HUDI-627] Aggregate code coverage and publish to codecov.io during CI (#1347) 2020-02-27 13:54:20 -08:00
Suneel Marthi
24e73816b2 [MINOR] Code Cleanup, remove redundant code (#1337) 2020-02-15 22:03:29 +08:00
Suneel Marthi
594da28fbf [HUDI-595] code cleanup, refactoring code out of PR# 1159 (#1302) 2020-02-04 21:52:03 +08:00
Suneel Marthi
5b7bb142dc [HUDI-583] Code Cleanup, remove redundant code, and other changes (#1237) 2020-02-02 18:03:44 +08:00
lamber-ken
c06ec8bfc7 [MINOR] Fix assigning to configuration more times (#1291) 2020-01-29 17:18:35 -05:00
leesf
6e59c1c777 Moving to 0.5.2-SNAPSHOT on master branch. 2020-01-20 10:51:33 -08:00
vinoth chandar
c2c0f6b13d [HUDI-509] Renaming code in sync with cWiki restructuring (#1212)
- Storage Type replaced with Table Type (remaining instances)
- View types replaced with query types;
- ReadOptimized view referred as Snapshot Query
- TableFileSystemView sub interfaces renamed to BaseFileOnly and Slice Views
- HoodieDataFile renamed to HoodieBaseFile
- Hive Sync tool will register RO tables for MOR with a `_ro` suffix
- Datasource/Deltastreamer options renamed accordingly
- Support fallback to old config values as well, so migration is painless
- Config for controlling _ro suffix addition
- Renaming DataFile to BaseFile across DTOs, HoodieFileSlice and AbstractTableFileSystemView
2020-01-16 23:58:47 -08:00
Bhavani Sudha Saktheeswaran
d09eacdc13 [HUDI-25] Optimize HoodieInputformat.listStatus() for faster Hive incremental queries on Hoodie
Summary:
    - InputPathHandler class classifies inputPaths into incremental, non incremental and non hoodie paths.
    - Incremental queries leverage HoodieCommitMetadata to get partitions that are affected and only lists those partitions as opposed to listing all partitions
    - listStatus() processes each category separately
2020-01-08 14:53:05 -08:00
vinoth chandar
9706f659db [HUDI-508] Standardizing on "Table" instead of "Dataset" across code (#1197)
- Docs were talking about storage types before, cWiki moved to "Table"
- Most of code already has HoodieTable, HoodieTableMetaClient - correct naming
- Replacing renaming use of dataset across code/comments
- Few usages in comments and use of Spark SQL DataSet remain unscathed
2020-01-07 12:52:32 -08:00
Abhishek Modi
b5df6723a2 [HUDI-464] Use Hive Exec Core for tests (#1125) 2020-01-06 16:32:55 -08:00
Pratyaksh Sharma
dde21e7315 [HUDI-402]: code clean up in test cases 2019-12-31 11:10:49 -08:00
lamber-ken
ab6ae5cebb [HUDI-482] Fix missing @Override annotation on methods (#1156)
* [HUDI-482] Fix missing @Override annotation on methods
2019-12-31 11:44:56 +08:00
Balaji Varadarajan
9a1f698eef [HUDI-308] Avoid Renames for tracking state transitions of all actions on dataset 2019-12-15 21:26:30 -08:00
lamber-ken
ba514cfea0 [MINOR] Remove redundant plus operator (#1097) 2019-12-12 05:42:05 +08:00
lamber-ken
d447e2d751 [checkstyle] Unify LOG form (#1092) 2019-12-10 19:23:38 +08:00
lamber-ken
2745b7552f [HUDI-379] Refactor the codes based on new JavadocStyle code style rule (#1079) 2019-12-06 12:59:28 +08:00