Commit Graph

56 Commits

Author SHA1 Message Date
lamber-ken
c06ec8bfc7 [MINOR] Fix assigning to configuration more times (#1291) 2020-01-29 17:18:35 -05:00
Balaji Varadarajan
ba54a7e973 [HUDI-559] : Make the timeline layout version default to be null version 2020-01-20 00:02:55 -08:00
vinoth chandar
c2c0f6b13d [HUDI-509] Renaming code in sync with cWiki restructuring (#1212)
- Storage Type replaced with Table Type (remaining instances)
- View types replaced with query types
- ReadOptimized view referred as Snapshot Query
- TableFileSystemView sub interfaces renamed to BaseFileOnly and Slice Views
- HoodieDataFile renamed to HoodieBaseFile
- Hive Sync tool will register RO tables for MOR with a `_ro` suffix
- Datasource/Deltastreamer options renamed accordingly
- Support fallback to old config values as well, so migration is painless
- Config for controlling _ro suffix addition
- Renaming DataFile to BaseFile across DTOs, HoodieFileSlice and AbstractTableFileSystemView
2020-01-16 23:58:47 -08:00
Y Ethan Guo
b39458b008 [MINOR] Make constant fields final in HoodieTestDataGenerator (#1234) 2020-01-16 12:42:30 +08:00
Mehrotra
2bb0c21a3d Fix conversion of Spark struct type to Avro schema
cr https://code.amazon.com/reviews/CR-17184364
2020-01-14 00:27:56 -08:00
lamber-ken
fd8f1c70c0 [MINOR] Reuse random object (#1222) 2020-01-13 18:26:04 -08:00
lamber-ken
d9675c4ec0 [HUDI-522] Use the same version jcommander uniformly (#1214) 2020-01-12 10:48:52 -08:00
lamber-ken
017ee8e661 [MINOR] Fix partition typo (#1209) 2020-01-12 09:35:55 +08:00
vinoth chandar
9706f659db [HUDI-508] Standardizing on "Table" instead of "Dataset" across code (#1197)
- Docs were talking about storage types before, cWiki moved to "Table"
- Most of the code already has HoodieTable, HoodieTableMetaClient - correct naming
- Replacing remaining uses of dataset across code/comments
- A few usages in comments and uses of Spark SQL DataSet remain unscathed
2020-01-07 12:52:32 -08:00
Sivabalan Narayanan
7031445eb3 [HUDI-377] Adding Delete() support to DeltaStreamer (#1073)
- Provides ability to perform hard deletes by writing delete marker records into the source data
- if the record contains a special field _hoodie_delete_marker set to true, deletes are performed
2020-01-04 11:07:31 -08:00
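The delete-marker behaviour described in HUDI-377 can be illustrated with a minimal sketch. This is not the DeltaStreamer implementation: the class `DeleteMarkerSketch`, the `Split` holder, and the use of plain maps as records are all hypothetical. Only the field name `_hoodie_delete_marker` comes from the commit message.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not Hudi code.
public class DeleteMarkerSketch {
    // Field name taken from the commit message.
    static final String DELETE_MARKER_FIELD = "_hoodie_delete_marker";

    // Hypothetical holder for the two groups of records.
    static final class Split {
        final List<Map<String, Object>> upserts = new ArrayList<>();
        final List<Map<String, Object>> deletes = new ArrayList<>();
    }

    // Routes a record to "deletes" only when the marker field is present and true.
    static Split split(List<Map<String, Object>> records) {
        Split result = new Split();
        for (Map<String, Object> record : records) {
            if (Boolean.TRUE.equals(record.get(DELETE_MARKER_FIELD))) {
                result.deletes.add(record);
            } else {
                result.upserts.add(record);
            }
        }
        return result;
    }
}
```

Records without the field, or with it set to false, are treated as ordinary upserts in this sketch.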
SteNicholas
726ae47ce2 [MINOR]Optimize hudi-client module (#1139) 2020-01-04 10:57:08 -08:00
Pratyaksh Sharma
dde21e7315 [HUDI-402]: code clean up in test cases 2019-12-31 11:10:49 -08:00
dengziming
2a823f32ee [MINOR]: alter some wrong params which bring fatal exception 2019-12-30 16:50:12 -08:00
Sivabalan Narayanan
9c4217a3e1 [HUDI-389] Fixing Index look up to return right partitions for a given key along with fileId with Global Bloom (#1091)
* Fixing Index look up to return partitions for a given key along with fileId with Global Bloom
* Addressing some of the comments
* Fixing test in TestHoodieGlobalBloomIndex to test the fix
2019-12-24 20:56:30 -08:00
Sivabalan Narayanan
14881e99e0 [HUDI-106] Adding support for DynamicBloomFilter (#976)
- Introduced configs for bloom filter type
- Implemented dynamic bloom filter with configurable max number of keys
- BloomFilterFactory abstractions; Defaults to current simple bloom filter
2019-12-17 19:06:24 -08:00
Balaji Varadarajan
9a1f698eef [HUDI-308] Avoid Renames for tracking state transitions of all actions on dataset 2019-12-15 21:26:30 -08:00
lamber-ken
ba514cfea0 [MINOR] Remove redundant plus operator (#1097) 2019-12-12 05:42:05 +08:00
Pratyaksh Sharma
3790b75e05 [HUDI-368] code clean up in TestAsyncCompaction class (#1050) 2019-12-11 05:52:41 +08:00
lamber-ken
d447e2d751 [checkstyle] Unify LOG form (#1092) 2019-12-10 19:23:38 +08:00
lamber-ken
2745b7552f [HUDI-379] Refactor the codes based on new JavadocStyle code style rule (#1079) 2019-12-06 12:59:28 +08:00
lamber-ken
c06d89b648 [HUDI-378] Refactor the rest codes based on new ImportOrder code style rule (#1078) 2019-12-05 17:25:03 +08:00
lamber-ken
b3e0ebbc4a [checkstyle] Add ConstantName java checkstyle rule (#1066)
* add SimplifyBooleanExpression java checkstyle rule
* collapse empty tags in scalastyle file
2019-12-04 18:59:15 +08:00
leesf
98ab33bb6e [HUDI-294] Delete Paths written in Cleaner plan needs to be relative to partition-path (#1062)
[HUDI-294] Delete Paths written in Cleaner plan needs to be relative to partition-path
2019-12-03 10:11:03 -08:00
ForwardXu
0b52ae3ac2 [HUDI-209] Implement JMX metrics reporter (#1045) 2019-11-28 19:17:34 +08:00
lamber-ken
da8d1334ee [HUDI-373] Refactor hudi-client based on new ImportOrder code style rule (#1056) 2019-11-28 09:25:56 +08:00
Sivabalan Narayanan
c3355109b1 [HUDI-328] Adding delete api to HoodieWriteClient (#1004)
[HUDI-328]  Adding delete api to HoodieWriteClient and Spark DataSource
2019-11-22 15:05:25 -08:00
hongdd
7bc08cbfdc [HUDI-345] Fix used deprecated function (#1024)
- Replace Schema.parse() with new Schema.Parser().parse()
- Replace deprecated FSDataOutputStream constructor
2019-11-22 03:32:09 -08:00
谢磊
804e348d0e [HUDI-346] Set allowMultipleEmptyLines to false for EmptyLineSeparator rule (#1025) 2019-11-19 18:44:42 +08:00
lamber-ken
045fa87a3d [HUDI-330] add EmptyStatement java checkstyle rule 2019-11-13 14:11:11 -08:00
Balaji Varadarajan
8ff06ddb0f [HUDI-80] Leverage Commit metadata to figure out partitions to be cleaned for Cleaning by commits mode (#1008) 2019-11-12 06:12:44 -08:00
Balaji Varadarajan
1032fc3e54 [HUDI-137] Hudi cleaning state changes should be consistent with compaction actions
Before this change, Cleaner performs cleaning of old file versions and then stores the deleted files in .clean files.
With this setup, we will not be able to track file deletions if a cleaner fails after deleting files but before writing .clean metadata.
This is fine for regular file-system view generation but Incremental timeline syncing relies on clean/commit/compaction metadata to keep a consistent file-system view.

Cleaner state transitions are now similar to those of compaction.

1. Requested : HoodieWriteClient.scheduleClean() selects the list of files that need to be deleted and stores them in metadata
2. Inflight : HoodieWriteClient marks the state as inflight before it starts deleting
3. Completed : HoodieWriteClient marks the state as completed after finishing the deletion according to the cleaner plan
2019-11-11 10:40:16 -08:00
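The three cleaner states listed in HUDI-137 can be sketched as a small state machine. This is an illustration under stated assumptions, not Hudi code: `CleanerStateSketch`, `CleanAction`, and their methods are hypothetical names, and file deletion is simulated by recording file names in a list.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not the HoodieWriteClient implementation.
public class CleanerStateSketch {
    enum State { REQUESTED, INFLIGHT, COMPLETED }

    static final class CleanAction {
        private State state = State.REQUESTED;
        private final List<String> plan;                  // files chosen when the clean is requested
        private final List<String> deleted = new ArrayList<>();

        // Requested: the plan is stored before anything is deleted.
        CleanAction(List<String> filesToDelete) {
            this.plan = new ArrayList<>(filesToDelete);
        }

        State state() { return state; }
        List<String> plan() { return plan; }
        List<String> deleted() { return deleted; }

        // Inflight: marked before deletion starts.
        void markInflight() {
            if (state != State.REQUESTED) {
                throw new IllegalStateException("expected REQUESTED, was " + state);
            }
            state = State.INFLIGHT;
        }

        // Completed: marked only after every file in the plan has been handled.
        void execute() {
            if (state != State.INFLIGHT) {
                throw new IllegalStateException("expected INFLIGHT, was " + state);
            }
            for (String file : plan) {
                deleted.add(file); // stand-in for a real file-system delete
            }
            state = State.COMPLETED;
        }
    }
}
```

Because the plan is persisted in the requested state, a failure partway through `execute()` still leaves a record of which files were meant to be deleted, which is the gap the commit message describes.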
vinoth chandar
e4c91ed13f [HUDI-290] Normalize test class name of all test classes (#951) 2019-10-22 20:19:11 -07:00
Balaji Varadarajan
77f4e73615 [HUDI-121] Fix licensing issues found during RC voting by general incubator group 2019-10-16 02:09:02 -07:00
Udit Mehrotra
12523c379f [HUDI-298] Fix issue with incorrect column mapping causing bad data, during on-the-fly merge of Real Time tables (#956)
* Fix issue with incorrect column mapping causing bad data, during on-the-fly merge of Real Time tables
2019-10-16 02:05:53 -07:00
leesf
b19bed442d [HUDI-296] Explore use of spotless to auto fix formatting errors (#945)
- Add spotless format fixing to project
- One time reformatting for conformity
- Build fails for formatting changes and mvn spotless:apply autofixes them
2019-10-10 05:19:40 -07:00
leesf
d050d98071 [HUDI-232] Implement sealing/unsealing for HoodieRecord class (#938) 2019-10-07 10:56:46 -07:00
leesf
7dd9c74b1b [HUDI-285] Implement HoodieStorageWriter based on actual file type (#936) 2019-10-04 07:45:16 -07:00
leesf
3dedc7e5fd [HUDI-265] Failed to delete tmp dirs created in unit tests (#928) 2019-10-03 09:48:13 -07:00
vinoyang
01e803b00e [HUDI-247] Unify the re-initialization of HoodieTableMetaClient in test for hoodie-client module (#930) 2019-09-30 05:38:52 -07:00
Balaji Varadarajan
2ea8b0c3f1 [HUDI-279] Fix regression in Schema Evolution due to PR-755 2019-09-25 22:53:43 -07:00
vinoyang
f020d029c4 HUDI-267 Refactor bad method name HoodieTestUtils#initTableType and HoodieTableMetaClient#initializePathAsHoodieDataset (#916) 2019-09-21 09:05:02 -07:00
Balaji Varadarajan
2c6da09d9d [HUDI-257] Fix Bloom Index unit-test failures 2019-09-17 09:41:15 -07:00
Nishith Agarwal
0b032b2761 Fix requested compaction rollback during restore command 2019-09-13 12:40:13 -07:00
yanghua
895d732a14 refactor code 2019-09-12 05:15:07 -07:00
yanghua
5f04241fce refactor code: add docs and init/cleanup resource group for hoodie client test base 2019-09-12 05:15:07 -07:00
yanghua
80c27f2351 Optimize hoodie client after implementing AutoCloseable interface 2019-09-12 05:15:07 -07:00
yanghua
90bfb900aa revert setting jsc spark configuration 2019-09-12 05:15:07 -07:00
yanghua
6f2b166005 [HUDI-217] Provide a unified resource management class to standardize the resource allocation and release for hudi client test cases 2019-09-12 05:15:07 -07:00
Bhavani Sudha Saktheeswaran
64df98fc4a [HUDI-164] Fixes incorrect averageBytesPerRecord
When the number of records written is zero, averageBytesPerRecord results in a huge size (division by zero, ceiled to Long.MAX_VALUE), causing OOM. This commit fixes the issue by traversing the commits in reverse until a more reasonable average record size can be computed; if that is not possible, it returns the default configured record size.
2019-09-11 15:20:25 -07:00
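The fallback described in HUDI-164 can be sketched as follows. This is a minimal illustration, not the actual fix: `AvgRecordSizeSketch` and `CommitStats` are hypothetical names, and "reasonable" is assumed here to mean a commit that wrote at least one record and at least one byte.

```java
import java.util.List;

// Hypothetical illustration only; not Hudi code.
public class AvgRecordSizeSketch {
    // Hypothetical per-commit write statistics.
    static final class CommitStats {
        final long bytesWritten;
        final long recordsWritten;

        CommitStats(long bytesWritten, long recordsWritten) {
            this.bytesWritten = bytesWritten;
            this.recordsWritten = recordsWritten;
        }
    }

    // Walks commits from newest to oldest and returns the first usable average.
    // Falls back to defaultSize when no commit has written any records.
    static long averageBytesPerRecord(List<CommitStats> commitsOldestFirst, long defaultSize) {
        for (int i = commitsOldestFirst.size() - 1; i >= 0; i--) {
            CommitStats stats = commitsOldestFirst.get(i);
            // Assumption: skip commits that would cause a division by zero.
            if (stats.recordsWritten > 0 && stats.bytesWritten > 0) {
                return (long) Math.ceil((double) stats.bytesWritten / stats.recordsWritten);
            }
        }
        return defaultSize;
    }
}
```

Skipping empty commits avoids the division by zero, and the default keeps the estimate bounded when the history holds no usable commit.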
Balaji Varadarajan
93bc5e2153 HUDI-243 Rename HoodieInputFormat and HoodieRealtimeInputFormat to HoodieParquetInputFormat and HoodieParquetRealtimeInputFormat 2019-09-11 14:03:01 -07:00