Commit Graph

634 Commits

Author SHA1 Message Date
bschell
60fed21dc7 [HUDI-327] Add null/empty checks to key generators (#1040)
* Adds null and empty checks to all key generators. 
* Also improves error messaging for key generator issues.
2019-11-26 02:37:16 -08:00
Pratyaksh Sharma
2a4cfb47c7 [HUDI-340]: made max events to read from kafka source configurable (#1039) 2019-11-26 18:34:02 +08:00
hongdd
a7e07cd910 [HUDI-359] Add hudi-env for hudi-cli module (#1042) 2019-11-25 13:25:42 -08:00
谢磊
212282c8aa [HUDI-358] Add Java-doc and importOrder checkstyle rule (#1043)
- import groups are separated by one blank line
- org.apache.hudi.* at the top location
2019-11-25 11:36:23 -08:00
hongdd
44823041a3 [HUDI-362] Adds a check for the existence of field (#1047) 2019-11-25 11:31:07 -08:00
filippo balicchia
845a0509b3 [MINOR] Some minor optimizations in HoodieJavaStreamingApp (#1046) 2019-11-25 18:49:13 +08:00
Sivabalan Narayanan
c3355109b1 [HUDI-328] Adding delete api to HoodieWriteClient (#1004)
[HUDI-328] Adding delete api to HoodieWriteClient and Spark DataSource
2019-11-22 15:05:25 -08:00
hongdd
7bc08cbfdc [HUDI-345] Fix used deprecated function (#1024)
- Replace Schema.parse() with new Schema.Parser().parse()
- Replace the deprecated FSDataOutputStream constructor
2019-11-22 03:32:09 -08:00
Gurudatt Kulkarni
17eaf41c54 [HUDI-348] Add Issue template for the project (#1029) 2019-11-22 03:25:52 -08:00
Pratyaksh Sharma
1e14390719 [HUDI-350]: updated default value of config.getCleanerCommitsRetained() in javadocs 2019-11-20 06:50:04 -08:00
lamber-ken
d9fbe33339 [HOTFIX] Fix error configuration item of dockerfile-maven-plugin 2019-11-19 16:30:03 -08:00
谢磊
804e348d0e [HUDI-346] Set allowMultipleEmptyLines to false for EmptyLineSeparator rule (#1025) 2019-11-19 18:44:42 +08:00
谢磊
66492498f7 [HUDI-342] add pull request template for hudi project (#1022) 2019-11-19 14:06:01 +08:00
b_rousseau
e806eb797f [HUDI-339] Add support of Azure cloud storage (#1019)
- Add Azure WASB (BLOB) and ADLS storage in StorageSchemes enum
- Update testStorageSchemes to test the newly added storage schemes
2019-11-17 14:29:24 -08:00
Nishith Agarwal
f82e58994e - Ensure that rollback instant is always created before the next commit instant.
  This especially affects IncrementalPull for MOR tables, since we can end up pulling in log blocks for uncommitted data
- Ensure that generated commit instants are 1 second apart
2019-11-17 14:11:26 -08:00
Nishith Agarwal
3a05edab01 - Fixing RT queries for HiveOnSpark that causes race conditions
- Adding more comments to understand usage of reader/writer schema
2019-11-16 13:46:47 -08:00
谢磊
22315a887f [HOTFIX] fix missing version of rat-plugin (#1015) 2019-11-13 21:18:46 -08:00
Balaji Varadarajan
f7c2f8cedc [HUDI-329] Presto Containers for integration test must allow newly built local jars to override 2019-11-13 17:35:34 -08:00
Mehrotra
92c69f5703 Migrate integration tests to spark 2.4.4 2019-11-13 16:53:24 -08:00
lamber-ken
045fa87a3d [HUDI-330] add EmptyStatement java checkstyle rule 2019-11-13 14:11:11 -08:00
Balaji Varadarajan
8ff06ddb0f [HUDI-80] Leverage Commit metadata to figure out partitions to be cleaned for Cleaning by commits mode (#1008) 2019-11-12 06:12:44 -08:00
Udit Mehrotra
0bb5999f79 [HUDI-306] Support Glue catalog and other hive metastore implementations (#961)
- Support Glue catalog and other metastore implementations
- Remove shading from hudi utilities bundle
- Add maven profile to optionally shade hive in utilities bundle
2019-11-11 17:27:31 -08:00
Balaji Varadarajan
1032fc3e54 [HUDI-137] Hudi cleaning state changes should be consistent with compaction actions
Before this change, the Cleaner deleted old file versions and then recorded the deleted files in .clean files.
With that setup, file deletions could not be tracked if the cleaner failed after deleting files but before writing the .clean metadata.
This is fine for regular file-system view generation, but incremental timeline syncing relies on clean/commit/compaction metadata to keep a consistent file-system view.

Cleaner state transitions are now similar to those of compaction.

1. Requested : HoodieWriteClient.scheduleClean() selects the list of files that need to be deleted and stores them in metadata
2. Inflight : HoodieWriteClient marks the state as inflight before it starts deleting
3. Completed : HoodieWriteClient marks the state as completed after finishing the deletion according to the cleaner plan
2019-11-11 10:40:16 -08:00
Sivabalan Narayanan
23b303e4b1 [HUDI-218] Adding Presto support to Integration Test (#1003) 2019-11-11 06:21:49 -08:00
Pratyaksh Sharma
5f1309407a [HUDI-253]: added validations for schema provider class (#995) 2019-11-11 06:03:44 -08:00
vinoth chandar
1483b97018 [DOCS] Change Hudi acronyms to plural 2019-11-10 12:39:58 -08:00
Jeff G
1ce3d891ce [DOCS] Update to align with original Uber whitepaper (#999) 2019-11-10 12:38:13 -08:00
pratyakshsharma
0863b1cfd9 [HUDI-245]: replaced instances of getInstants() and reverse() with getReverseOrderedInstants() (#1000) 2019-11-07 08:42:48 -08:00
pratyakshsharma
20871a17b2 [HUDI-302]: simplified countInstants() method in HoodieDefaultTimeline (#997) 2019-11-06 12:56:09 -08:00
Gurudatt Kulkarni
71ac2c0d5e [HUDI-324] TimestampKeyGenerator should support milliseconds (#993) 2019-11-05 04:22:14 -08:00
Bhavani Sudha Saktheeswaran
04834817c8 [MINOR] Add features and instructions to build Hudi in README (#992) 2019-11-03 01:48:06 -08:00
Raymond Xu
91740635b2 [HUDI-321] Support bulkinsert in HDFSParquetImporter (#987)
- Add bulk insert feature
- Fix some minor issues
2019-11-02 23:12:44 -07:00
Wenning Ding
bd77dc792c Add MOR integration testing 2019-11-02 19:49:04 -07:00
Wenning Ding
b6057c5e0e [HUDI-314] Fix multi partition keys error when querying a realtime table 2019-11-02 19:49:04 -07:00
Balaji Varadarajan
a6390aefc4 [HUDI-312] Make docker hdfs cluster ephemeral. This is needed to fix flakiness in integration tests. Also, Fix DeltaStreamer hanging issue due to uncaught exception 2019-11-01 11:49:59 -07:00
dependabot[bot]
144ea4eedf Bump httpclient from 4.3.2 to 4.3.6 (#980)
Bumps httpclient from 4.3.2 to 4.3.6.

Signed-off-by: dependabot[bot] <support@github.com>
2019-11-01 05:22:31 -07:00
dependabot[bot]
74d8e625c5 Bump checkstyle from 8.8 to 8.18 (#981)
Bumps [checkstyle](https://github.com/checkstyle/checkstyle) from 8.8 to 8.18.
- [Release notes](https://github.com/checkstyle/checkstyle/releases)
- [Commits](https://github.com/checkstyle/checkstyle/compare/checkstyle-8.8...checkstyle-8.18)

Signed-off-by: dependabot[bot] <support@github.com>
2019-11-01 05:06:03 -07:00
Wenning Ding
ee0fd06de7 synchronized lock on conf object instead of class 2019-10-31 21:54:27 -07:00
Wenning Ding
3251d62bd3 [HUDI-313] Fix select count star error when querying a realtime table 2019-10-31 21:54:27 -07:00
Guru107
eda472adb0 [MINOR] Fix avro schema warnings in build 2019-10-31 21:49:38 -07:00
leesf
7c7403a59d [MINOR] fix annotation in teardown (#990) 2019-10-31 07:59:35 -07:00
leesf
b0838d25f7 [MINOR] Fix no output in travis (#984) 2019-10-29 21:17:45 -07:00
leesf
ef5001e432 [MINOR] Fix vm crashes (#979) 2019-10-28 16:25:07 -07:00
Balaji Varadarajan
c23da694cc [HUDI-169] Speed up rolling back of instants (#968) 2019-10-24 19:34:00 -07:00
Balaji Varadarajan
d8be818ac9 [HUDI-130] Paths written in compaction plan needs to be relative to base-path 2019-10-23 02:52:24 -07:00
vinoth chandar
e4c91ed13f [HUDI-290] Normalize test class name of all test classes (#951) 2019-10-22 20:19:11 -07:00
Gurudatt Kulkarni
031b067a3a [MINOR] Move all repository declarations to parent pom (#966) 2019-10-22 20:17:13 -07:00
Amit Prabhu
4529f535b2 [MINOR] Add backtick escape while syncing partition fields (#967) 2019-10-22 20:16:16 -07:00
Balaji Varadarajan
14dd649d06 [MINOR] Remove release notes and move confluent repository to hoodie parent pom 2019-10-21 14:16:05 -07:00
vinoth chandar
dfdc0e40e1 [HUDI-283] : Ensure a sane minimum for merge buffer memory (#964)
- Some environments, e.g. spark-shell, provide 0 for memory size
- This causes unnecessary performance degradation
2019-10-20 21:00:04 -07:00