60 Commits

Author SHA1 Message Date
vinoth chandar
9706f659db [HUDI-508] Standardizing on "Table" instead of "Dataset" across code (#1197)
- Docs were talking about storage types before, cWiki moved to "Table"
- Most of code already has HoodieTable, HoodieTableMetaClient - correct naming
- Replaced uses of "dataset" across code/comments
- Few usages in comments and use of Spark SQL DataSet remain unscathed
2020-01-07 12:52:32 -08:00
Pratyaksh Sharma
dde21e7315 [HUDI-402]: code clean up in test cases 2019-12-31 11:10:49 -08:00
lamber-ken
ab6ae5cebb [HUDI-482] Fix missing @Override annotation on methods (#1156)
* [HUDI-482] Fix missing @Override annotation on methods
2019-12-31 11:44:56 +08:00
lamber-ken
e4ea7a2971 Update comment 2019-12-29 19:03:56 -08:00
lamber-ken
8440482977 Fix empty content clean plan 2019-12-29 19:03:56 -08:00
Mathieu
01c25d6aff [MINOR] Update the java doc of HoodieTableType (#1148) 2019-12-29 09:57:19 +08:00
hongdd
8affdf8bcb [HUDI-416] Improve hint information for cli (#1110) 2019-12-25 20:19:12 +08:00
dengziming
94aec965f5 [minor] Fix few typos in the java docs (#1132) 2019-12-24 20:44:11 -08:00
comsir
dd06660183 [MINOR] fix typo 2019-12-24 20:40:00 -08:00
vinoth chandar
350b0ecb4d [HUDI-311] : Support for AWS Database Migration Service in DeltaStreamer
- Add a transformer class that adds the `Op` field if not found in the input frame
- Add a payload implementation that issues deletes when Op=D
- Remove Parquet as a top level source type, consolidate with RowSource
- Made delta streamer work without a property file, simply using overridden cli options
- Unit tests for transformer/payload classes
2019-12-23 20:56:55 -08:00
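The transformer and payload behaviour described in HUDI-311 can be illustrated with a minimal sketch. It is not Hudi's implementation: the class, methods, and constants are hypothetical, rows are modelled as plain maps, and the empty-string default for a missing `Op` value is an assumption made for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch; none of these names come from Hudi's API.
final class DmsOpSketch {
    static final String OP_FIELD = "Op";
    static final String DELETE_OP = "D";
    // Assumption: rows that arrive without an Op column get an empty marker.
    static final String DEFAULT_OP = "";

    // Transformer step: add the Op field only when the input row lacks it.
    static Map<String, Object> withOpField(Map<String, Object> row) {
        Map<String, Object> out = new HashMap<>(row); // leave the input untouched
        out.putIfAbsent(OP_FIELD, DEFAULT_OP);
        return out;
    }

    // Payload step: an empty result stands for "issue a delete for this key".
    static Optional<Map<String, Object>> valueToWrite(Map<String, Object> row) {
        if (DELETE_OP.equals(row.get(OP_FIELD))) {
            return Optional.empty();
        }
        return Optional.of(row);
    }
}
```

In this sketch a row carrying `Op=D` yields an empty `Optional`, which a writer could treat as a delete, while every other row is passed through unchanged.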
Sivabalan Narayanan
14881e99e0 [HUDI-106] Adding support for DynamicBloomFilter (#976)
- Introduced configs for bloom filter type
- Implemented dynamic bloom filter with configurable max number of keys
- BloomFilterFactory abstractions; Defaults to current simple bloom filter
2019-12-17 19:06:24 -08:00
Balaji Varadarajan
9a1f698eef [HUDI-308] Avoid Renames for tracking state transitions of all actions on dataset 2019-12-15 21:26:30 -08:00
lamber-ken
ba514cfea0 [MINOR] Remove redundant plus operator (#1097) 2019-12-12 05:42:05 +08:00
lamber-ken
d447e2d751 [checkstyle] Unify LOG form (#1092) 2019-12-10 19:23:38 +08:00
lamber-ken
2745b7552f [HUDI-379] Refactor the codes based on new JavadocStyle code style rule (#1079) 2019-12-06 12:59:28 +08:00
lamber-ken
c06d89b648 [HUDI-378] Refactor the rest codes based on new ImportOrder code style rule (#1078) 2019-12-05 17:25:03 +08:00
lamber-ken
b3e0ebbc4a [checkstyle] Add ConstantName java checkstyle rule (#1066)
* add SimplifyBooleanExpression java checkstyle rule
* collapse empty tags in scalastyle file
2019-12-04 18:59:15 +08:00
vinoyang
84602c8882 [HUDI-355] Refactor hudi-common based on new comment and code style rules (#1049)
2019-12-03 20:49:13 -08:00
leesf
98ab33bb6e [HUDI-294] Delete Paths written in Cleaner plan needs to be relative to partition-path (#1062)
2019-12-03 10:11:03 -08:00
lamber-ken
784e3ad0b6 [HUDI-370] Refactor hudi-common based on new ImportOrder code style rule (#1063) 2019-12-02 06:59:09 +08:00
谢磊
b77fad39b5 [HUDI-364] Refactor hudi-hive based on new ImportOrder code style rule (#1048)
2019-11-27 16:30:37 +08:00
wenningd
d6e83e8f49 [HUDI-325] Fix Hive partition error for updated HDFS Hudi table (#1001) 2019-11-26 21:18:39 -08:00
bschell
60fed21dc7 [HUDI-327] Add null/empty checks to key generators (#1040)
* Adds null and empty checks to all key generators. 
* Also improves error messaging for key generator issues.
2019-11-26 02:37:16 -08:00
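A null/empty check of the kind HUDI-327 describes might look like the sketch below. The `KeyFieldValidator` class, its method, and the error wording are hypothetical and are not taken from Hudi's key generators; records are modelled as plain maps for brevity.

```java
import java.util.Map;

// Hypothetical sketch; not the actual Hudi key generator code.
final class KeyFieldValidator {

    // Returns the key field as a string, or fails with a message naming the field.
    static String requireKeyField(Map<String, Object> record, String field) {
        Object value = record.get(field);
        if (value == null) {
            throw new IllegalArgumentException(
                "Key field '" + field + "' is null or missing from the record");
        }
        String text = value.toString();
        if (text.isEmpty()) {
            throw new IllegalArgumentException(
                "Key field '" + field + "' is empty");
        }
        return text;
    }
}
```

Failing early with the field name in the message is what makes such errors easier to diagnose than a later NullPointerException.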
Sivabalan Narayanan
c3355109b1 [HUDI-328] Adding delete api to HoodieWriteClient (#1004)
[HUDI-328]  Adding delete api to HoodieWriteClient and Spark DataSource
2019-11-22 15:05:25 -08:00
hongdd
7bc08cbfdc [HUDI-345] Fix used deprecated function (#1024)
- Replace Schema.parse() with new Schema.Parser().parse()
- Replace the deprecated FSDataOutputStream constructor
2019-11-22 03:32:09 -08:00
谢磊
804e348d0e [HUDI-346] Set allowMultipleEmptyLines to false for EmptyLineSeparator rule (#1025) 2019-11-19 18:44:42 +08:00
b_rousseau
e806eb797f [HUDI-339] Add support of Azure cloud storage (#1019)
- Add Azure WASB (BLOB) and ADLS storage in StorageSchemes enum
- Update testStorageSchemes to test new added storage
2019-11-17 14:29:24 -08:00
Nishith Agarwal
f82e58994e - Ensure that rollback instant is always created before the next commit instant. This especially affects IncrementalPull for MOR tables since we can end up pulling in log blocks for uncommitted data
- Ensure that generated commit instants are 1 second apart
2019-11-17 14:11:26 -08:00
Balaji Varadarajan
8ff06ddb0f [HUDI-80] Leverage Commit metadata to figure out partitions to be cleaned for Cleaning by commits mode (#1008) 2019-11-12 06:12:44 -08:00
Balaji Varadarajan
1032fc3e54 [HUDI-137] Hudi cleaning state changes should be consistent with compaction actions
Before this change, Cleaner performs cleaning of old file versions and then stores the deleted files in .clean files.
With this setup, we will not be able to track file deletions if a cleaner fails after deleting files but before writing .clean metadata.
This is fine for regular file-system view generation but Incremental timeline syncing relies on clean/commit/compaction metadata to keep a consistent file-system view.

Cleaner state transitions are now similar to those of compaction.

1. Requested : HoodieWriteClient.scheduleClean() selects the list of files that need to be deleted and stores them in metadata
2. Inflight : HoodieWriteClient marks the state to be inflight before it starts deleting
3. Completed : HoodieWriteClient marks the state after completing the deletion according to the cleaner plan
2019-11-11 10:40:16 -08:00
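The three cleaner states listed in HUDI-137 can be sketched as a small state machine. This is an illustration only: `CleanState`, `CleanInstantSketch`, and their methods are hypothetical names, and file deletion is simulated with an in-memory list rather than real storage calls.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical names throughout; this is not Hudi's actual API.
enum CleanState { REQUESTED, INFLIGHT, COMPLETED }

final class CleanInstantSketch {
    private final List<String> plannedFiles;          // the persisted cleaner plan
    private final List<String> deletedFiles = new ArrayList<>();
    private CleanState state = CleanState.REQUESTED;

    // Step 1 (Requested): the plan is fixed and recorded before anything is deleted.
    CleanInstantSketch(List<String> filesToDelete) {
        this.plannedFiles = Collections.unmodifiableList(new ArrayList<>(filesToDelete));
    }

    // Step 2 (Inflight): mark the instant before the first delete happens.
    void start() {
        require(CleanState.REQUESTED);
        state = CleanState.INFLIGHT;
    }

    // Step 3 (Completed): delete according to the plan, then mark completion.
    void complete() {
        require(CleanState.INFLIGHT);
        deletedFiles.addAll(plannedFiles); // stand-in for real file deletion
        state = CleanState.COMPLETED;
    }

    CleanState state() { return state; }
    List<String> plannedFiles() { return plannedFiles; }
    List<String> deletedFiles() { return Collections.unmodifiableList(deletedFiles); }

    private void require(CleanState expected) {
        if (state != expected) {
            throw new IllegalStateException("Expected " + expected + " but was " + state);
        }
    }
}
```

Because the plan exists before any delete runs, a cleaner that fails while inflight still leaves a record of which files it intended to remove, which is the gap the commit message describes.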
pratyakshsharma
20871a17b2 [HUDI-302]: simplified countInstants() method in HoodieDefaultTimeline (#997) 2019-11-06 12:56:09 -08:00
Guru107
eda472adb0 [MINOR] Fix avro schema warnings in build 2019-10-31 21:49:38 -07:00
Balaji Varadarajan
d8be818ac9 [HUDI-130] Paths written in compaction plan needs to be relative to base-path 2019-10-23 02:52:24 -07:00
vinoth chandar
e4c91ed13f [HUDI-290] Normalize test class name of all test classes (#951) 2019-10-22 20:19:11 -07:00
Balaji Varadarajan
77f4e73615 [HUDI-121] Fix licensing issues found during RC voting by general incubator group 2019-10-16 02:09:02 -07:00
Udit Mehrotra
12523c379f [HUDI-298] Fix issue with incorrect column mapping causing bad data, during on-the-fly merge of Real Time tables (#956)
* Fix issue with incorrect column mapping causing bad data, during on-the-fly merge of Real Time tables
2019-10-16 02:05:53 -07:00
leesf
b19bed442d [HUDI-296] Explore use of spotless to auto fix formatting errors (#945)
- Add spotless format fixing to project
- One time reformatting for conformity
- Build fails for formatting changes and mvn spotless:apply autofixes them
2019-10-10 05:19:40 -07:00
leesf
d050d98071 [HUDI-232] Implement sealing/unsealing for HoodieRecord class (#938) 2019-10-07 10:56:46 -07:00
Balaji Varadarajan
9b66ea41fd [HUDI-121] Remove leftover notice file and replace com.uber.hoodie with org.apache.hudi in log4j properties 2019-10-04 09:18:57 -07:00
leesf
3dedc7e5fd [HUDI-265] Failed to delete tmp dirs created in unit tests (#928) 2019-10-03 09:48:13 -07:00
Balaji Varadarajan
6da2f9ac7c [HUDI-287] Address comments during review of release candidate
1. Remove LICENSE and NOTICE files in hoodie child modules.
2. Remove developers and contributor section from pom
3. Also ensure any failures in validation script are reported appropriately
4. Make hoodie parent pom consistent with that of its parent apache-21 (https://github.com/apache/maven-apache-parent/blob/apache-21/pom.xml)
2019-10-03 09:00:07 -07:00
Balaji Varadarajan
6e8a28bcae HUDI-121 : Address comments during RC2 voting
1. Remove dnl utils jar from git
2. Add LICENSE Headers in missing files
3. Fix NOTICE and LICENSE in all HUDI packages and in top-level
4. Fix License wording in certain HUDI source files
5. Include non java/scala code in RAT licensing check
6. Use whitelist to include dependencies as part of timeline-server bundling
2019-09-30 15:42:15 -07:00
vinoyang
01e803b00e [HUDI-247] Unify the re-initialization of HoodieTableMetaClient in test for hoodie-client module (#930) 2019-09-30 05:38:52 -07:00
Balaji Varadarajan
2ea8b0c3f1 [HUDI-279] Fix regression in Schema Evolution due to PR-755 2019-09-25 22:53:43 -07:00
vinoyang
f020d029c4 HUDI-267 Refactor bad method name HoodieTestUtils#initTableType and HoodieTableMetaClient#initializePathAsHoodieDataset (#916) 2019-09-21 09:05:02 -07:00
Balaji Varadarajan
c1e7d0e5a6 [HUDI-121] Update Release notes and fix master version 2019-09-17 09:50:30 -07:00
Balaji Varadarajan
7190c022bb [HUDI-249] Updating Notice files 2019-09-13 13:50:58 -07:00
Balaji Varadarajan
d2525c31b7 Moving to 0.6.0-SNAPSHOT on master branch. 2019-09-13 09:58:29 -07:00
Balaji Varadarajan
58623631d4 [HUDI-249] Update Release-notes. Add sign-artifacts to POM and release related scripts. Add missing license headers 2019-09-13 08:41:29 -07:00
Bhavani Sudha Saktheeswaran
64df98fc4a [HUDI-164] Fixes incorrect averageBytesPerRecord
When the number of records written is zero, averageBytesPerRecord evaluates to a huge size (a division by zero, ceiled to Long.MAX_VALUE), causing an OOM. This commit fixes the issue by traversing the commits in reverse until a more reasonable average record size can be computed; if that is not possible, it returns the default configured record size.
2019-09-11 15:20:25 -07:00
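The fallback described in HUDI-164 can be sketched roughly as follows. The `CommitStat` and `RecordSizeEstimator` types are hypothetical, and the rule used here for a "reasonable" commit (more than zero records and more than zero bytes written) is an assumption, since the commit message does not state the exact criterion.

```java
import java.util.List;

// Hypothetical per-commit write statistics; not a Hudi class.
final class CommitStat {
    final long bytesWritten;
    final long recordsWritten;

    CommitStat(long bytesWritten, long recordsWritten) {
        this.bytesWritten = bytesWritten;
        this.recordsWritten = recordsWritten;
    }
}

final class RecordSizeEstimator {

    // Walks commits from newest to oldest and returns the first usable average.
    // Falls back to defaultRecordSize when no commit qualifies.
    static long averageBytesPerRecord(List<CommitStat> commitsOldestFirst,
                                      long defaultRecordSize) {
        for (int i = commitsOldestFirst.size() - 1; i >= 0; i--) {
            CommitStat stat = commitsOldestFirst.get(i);
            // Assumption: skip commits that wrote nothing, which would
            // otherwise divide by zero.
            if (stat.recordsWritten > 0 && stat.bytesWritten > 0) {
                return (long) Math.ceil((double) stat.bytesWritten / stat.recordsWritten);
            }
        }
        return defaultRecordSize;
    }
}
```

Guarding on `recordsWritten > 0` is what prevents the division by zero; the default size covers tables whose history contains no usable commit at all.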