61 Commits

Author SHA1 Message Date
vinoth chandar
c2c0f6b13d [HUDI-509] Renaming code in sync with cWiki restructuring (#1212)
- Storage Type replaced with Table Type (remaining instances)
- View types replaced with query types
- ReadOptimized view referred to as Snapshot Query
- TableFileSystemView sub interfaces renamed to BaseFileOnly and Slice Views
- HoodieDataFile renamed to HoodieBaseFile
- Hive Sync tool will register RO tables for MOR with a `_ro` suffix
- Datasource/Deltastreamer options renamed accordingly
- Support fallback to old config values as well, so migration is painless
- Config for controlling `_ro` suffix addition
- Renaming DataFile to BaseFile across DTOs, HoodieFileSlice and AbstractTableFileSystemView
2020-01-16 23:58:47 -08:00
Balajee Nagasubramaniam
dd09abb56d [HUDI-335] Improvements to DiskBasedMap used by ExternalSpillableMap, for write and random/sequential read paths, by introducing BufferedRandomAccessFile 2020-01-15 16:45:45 -08:00
lamber-ken
7aa3ce31e6 [MINOR] Fix redundant judgment statement (#1231) 2020-01-15 16:30:14 -08:00
liujianhui
c1f8acab34 [HUDI-526] fix the HoodieAppendHandle 2020-01-13 10:44:31 -08:00
pratyakshsharma
3c90d252cc [HUDI-114]: added option to overwrite payload implementation in hoodie.properties file 2020-01-09 22:34:40 -08:00
hongdd
5af3dc6aed [HUDI-331] Fix java docs for all public apis in HoodieWriteClient (#1111) 2020-01-09 16:00:53 +08:00
Wenning Ding
aba83876e7 Update deprecated HBase API 2020-01-08 10:26:47 -08:00
vinoth chandar
9706f659db [HUDI-508] Standardizing on "Table" instead of "Dataset" across code (#1197)
- Docs were talking about storage types before, cWiki moved to "Table"
- Most of the code already has HoodieTable, HoodieTableMetaClient - correct naming
- Replacing remaining uses of dataset across code/comments
- A few usages in comments and uses of Spark SQL DataSet remain untouched
2020-01-07 12:52:32 -08:00
Balaji Varadarajan
8306f749a2 [HUDI-417] Refactor HoodieWriteClient so that commit logic can be shareable by both bootstrap and normal write operations (#1166) 2020-01-06 20:11:48 -08:00
hejinbiao123
b9fab0b933 Revert "[HUDI-455] Redo hudi-client log statements using SLF4J (#1145)" (#1181)
This reverts commit e637d9ed26.
2020-01-06 21:13:29 +08:00
SteNicholas
726ae47ce2 [MINOR] Optimize hudi-client module (#1139) 2020-01-04 10:57:08 -08:00
hejinbiao123
e637d9ed26 [HUDI-455] Redo hudi-client log statements using SLF4J (#1145)
2019-12-31 13:49:34 +08:00
lamber-ken
ab6ae5cebb [HUDI-482] Fix missing @Override annotation on methods (#1156)
2019-12-31 11:44:56 +08:00
lamber-ken
8440482977 Fix empty content clean plan 2019-12-29 19:03:56 -08:00
lamber-ken
2f254163d4 Skip setting commit metadata 2019-12-29 19:03:56 -08:00
lamber-ken
179837e8ef Fix checkstyle 2019-12-29 19:03:56 -08:00
lamber-ken
58c5bed40a [HUDI-453] Fix throw failed to archive commits error when writing data to MOR/COW table 2019-12-29 19:03:56 -08:00
Mathieu
3c811ec29b [MINOR] fix typos 2019-12-25 20:26:16 +08:00
Sivabalan Narayanan
9c4217a3e1 [HUDI-389] Fixing Index look up to return right partitions for a given key along with fileId with Global Bloom (#1091)
* Fixing Index look up to return partitions for a given key along with fileId with Global Bloom
* Addressing some of the comments
* Fixing test in TestHoodieGlobalBloomIndex to test the fix
2019-12-24 20:56:30 -08:00
dengziming
94aec965f5 [minor] Fix few typos in the java docs (#1132) 2019-12-24 20:44:11 -08:00
Mathieu
41f36770e0 [MINOR] fix typo 2019-12-25 06:48:15 +08:00
Thinking Chen
8172197c35 Fix Error: java.lang.IllegalArgumentException: Can not create a Path from an empty string in HoodieCopyOnWrite#deleteFilesFunc (#1126)
Same issue as in https://github.com/apache/incubator-hudi/pull/771;
this time it occurs in the HoodieCopyOnWrite deleteFilesFunc method.
2019-12-24 14:29:28 +08:00
Sivabalan Narayanan
14881e99e0 [HUDI-106] Adding support for DynamicBloomFilter (#976)
- Introduced configs for bloom filter type
- Implemented dynamic bloom filter with configurable max number of keys
- BloomFilterFactory abstractions; Defaults to current simple bloom filter
2019-12-17 19:06:24 -08:00
Balaji Varadarajan
9a1f698eef [HUDI-308] Avoid Renames for tracking state transitions of all actions on dataset 2019-12-15 21:26:30 -08:00
lamber-ken
ba514cfea0 [MINOR] Remove redundant plus operator (#1097) 2019-12-12 05:42:05 +08:00
lamber-ken
d447e2d751 [checkstyle] Unify LOG form (#1092) 2019-12-10 19:23:38 +08:00
Wenning Ding
e555aa516d [HUDI-353] Add hive style partitioning path 2019-12-09 12:29:53 -08:00
lamber-ken
2745b7552f [HUDI-379] Refactor the codes based on new JavadocStyle code style rule (#1079) 2019-12-06 12:59:28 +08:00
lamber-ken
c06d89b648 [HUDI-378] Refactor the rest codes based on new ImportOrder code style rule (#1078) 2019-12-05 17:25:03 +08:00
lamber-ken
b3e0ebbc4a [checkstyle] Add ConstantName java checkstyle rule (#1066)
* add SimplifyBooleanExpression java checkstyle rule
* collapse empty tags in scalastyle file
2019-12-04 18:59:15 +08:00
leesf
98ab33bb6e [HUDI-294] Delete Paths written in Cleaner plan needs to be relative to partition-path (#1062)
2019-12-03 10:11:03 -08:00
ForwardXu
0b52ae3ac2 [HUDI-209] Implement JMX metrics reporter (#1045) 2019-11-28 19:17:34 +08:00
lamber-ken
da8d1334ee [HUDI-373] Refactor hudi-client based on new ImportOrder code style rule (#1056) 2019-11-28 09:25:56 +08:00
Sivabalan Narayanan
c3355109b1 [HUDI-328] Adding delete api to HoodieWriteClient (#1004)
[HUDI-328]  Adding delete api to HoodieWriteClient and Spark DataSource
2019-11-22 15:05:25 -08:00
Pratyaksh Sharma
1e14390719 [HUDI-350]: updated default value of config.getCleanerCommitsRetained() in javadocs 2019-11-20 06:50:04 -08:00
谢磊
804e348d0e [HUDI-346] Set allowMultipleEmptyLines to false for EmptyLineSeparator rule (#1025) 2019-11-19 18:44:42 +08:00
Nishith Agarwal
f82e58994e - Ensure that rollback instant is always created before the next commit instant.
  This especially affects IncrementalPull for MOR tables, since we can end up pulling in log blocks for uncommitted data
- Ensure that generated commit instants are 1 second apart
2019-11-17 14:11:26 -08:00
Balaji Varadarajan
8ff06ddb0f [HUDI-80] Leverage Commit metadata to figure out partitions to be cleaned for Cleaning by commits mode (#1008) 2019-11-12 06:12:44 -08:00
Balaji Varadarajan
1032fc3e54 [HUDI-137] Hudi cleaning state changes should be consistent with compaction actions
Before this change, Cleaner performs cleaning of old file versions and then stores the deleted files in .clean files.
With this setup, we will not be able to track file deletions if a cleaner fails after deleting files but before writing .clean metadata.
This is fine for regular file-system view generation but Incremental timeline syncing relies on clean/commit/compaction metadata to keep a consistent file-system view.

Cleaner state transitions are now similar to those of compaction.

1. Requested : HoodieWriteClient.scheduleClean() selects the list of files that need to be deleted and stores them in metadata
2. Inflight : HoodieWriteClient marks the state to be inflight before it starts deleting
3. Completed : HoodieWriteClient marks the state as completed after finishing the deletion according to the cleaner plan
2019-11-11 10:40:16 -08:00
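Note on 1032fc3e54: the three cleaner states described in that commit message can be read as a small state machine. The sketch below is only an illustration of that reading, not Hudi's implementation. `CleanState`, `CleanInstantSketch`, `markInflight` and `executeAndComplete` are hypothetical names, and file deletion is simulated by recording paths in a list. The point it shows is that the delete list is fixed at the Requested stage, so a failure after deletion starts still leaves a record of what was planned.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical names throughout; this is not Hudi's actual API.
enum CleanState { REQUESTED, INFLIGHT, COMPLETED }

final class CleanInstantSketch {
    private CleanState state = CleanState.REQUESTED;
    // The plan is captured up front (Requested), before anything is deleted.
    private final List<String> plannedDeletes;
    private final List<String> deleted = new ArrayList<>();

    CleanInstantSketch(List<String> plannedDeletes) {
        this.plannedDeletes = new ArrayList<>(plannedDeletes);
    }

    CleanState state() { return state; }
    List<String> plannedDeletes() { return plannedDeletes; }
    List<String> deleted() { return deleted; }

    // Requested -> Inflight: marked before any deletion starts.
    void markInflight() {
        if (state != CleanState.REQUESTED) {
            throw new IllegalStateException("expected REQUESTED but was " + state);
        }
        state = CleanState.INFLIGHT;
    }

    // Inflight -> Completed: delete according to the plan, then mark completion.
    void executeAndComplete() {
        if (state != CleanState.INFLIGHT) {
            throw new IllegalStateException("expected INFLIGHT but was " + state);
        }
        for (String path : plannedDeletes) {
            deleted.add(path); // stand-in for the real file deletion
        }
        state = CleanState.COMPLETED;
    }
}
```

If the process stops between `markInflight()` and the end of `executeAndComplete()`, the instance is still `INFLIGHT` and `plannedDeletes()` still lists every file the cleaner intended to remove, which is the property the commit message is after.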
pratyakshsharma
0863b1cfd9 [HUDI-245]: replaced instances of getInstants() and reverse() with getReverseOrderedInstants() (#1000) 2019-11-07 08:42:48 -08:00
Balaji Varadarajan
c23da694cc [HUDI-169] Speed up rolling back of instants (#968) 2019-10-24 19:34:00 -07:00
Balaji Varadarajan
d8be818ac9 [HUDI-130] Paths written in compaction plan needs to be relative to base-path 2019-10-23 02:52:24 -07:00
vinoth chandar
dfdc0e40e1 [HUDI-283] : Ensure a sane minimum for merge buffer memory (#964)
- Some environments e.g spark-shell provide 0 for memory size
- This causes unnecessary performance degradation
2019-10-20 21:00:04 -07:00
Wenning Ding
1c09f5b055 [HUDI-301] Fix path error when updating a non-partitioned MOR table 2019-10-16 02:07:33 -07:00
leesf
b19bed442d [HUDI-296] Explore use of spotless to auto fix formatting errors (#945)
- Add spotless format fixing to project
- One time reformatting for conformity
- Build fails for formatting changes and mvn spotless:apply autofixes them
2019-10-10 05:19:40 -07:00
leesf
d050d98071 [HUDI-232] Implement sealing/unsealing for HoodieRecord class (#938) 2019-10-07 10:56:46 -07:00
leesf
7dd9c74b1b [HUDI-285] Implement HoodieStorageWriter based on actual file type (#936) 2019-10-04 07:45:16 -07:00
Balaji Varadarajan
6da2f9ac7c [HUDI-287] Address comments during review of release candidate
1. Remove LICENSE and NOTICE files in hoodie child modules.
2. Remove developers and contributor section from pom
3. Also ensure any failures in the validation script are reported appropriately
4. Make hoodie parent pom consistent with that of its parent apache-21 (https://github.com/apache/maven-apache-parent/blob/apache-21/pom.xml)
2019-10-03 09:00:07 -07:00
Balaji Varadarajan
6e8a28bcae HUDI-121 : Address comments during RC2 voting
1. Remove dnl utils jar from git
2. Add LICENSE Headers in missing files
3. Fix NOTICE and LICENSE in all HUDI packages and in top-level
4. Fix License wording in certain HUDI source files
5. Include non java/scala code in RAT licensing check
6. Use whitelist to include dependencies as part of timeline-server bundling
2019-09-30 15:42:15 -07:00
Balaji Varadarajan
2ea8b0c3f1 [HUDI-279] Fix regression in Schema Evolution due to PR-755 2019-09-25 22:53:43 -07:00