105 Commits

Author SHA1 Message Date
Suneel Marthi
fa36082554 [HUDI-746] Reduce build warnings < 10 (#1465) 2020-03-30 11:46:52 +08:00
vinoth chandar
e057c27603 [HUDI-744] Restructure hudi-common and clean up files under util packages (#1462)
- Brings more order and cohesion to the classes in hudi-common
- Utils classes related to a particular concept (avro, timeline, ...) are placed near the relevant package
- common.fs package now contains all the filesystem level classes including wrapper filesystem
- bloom.filter package renamed to just bloom
- config package contains classes that help store properties
- common.fs.inline package contains all the inline filesystem classes/impl
- common.table.timeline now consolidates all timeline related classes
- common.table.view consolidates all the classes related to filesystem view metadata
- common.table.timeline.versioning contains all classes related to versioning of timeline
- Fix a few unit tests as a result
- Moved the test packages around to match the source file move
- Rename AvroUtils to TimelineMetadataUtils & minor fixes/typos
2020-03-29 10:58:49 -07:00
Sivabalan Narayanan
ac73bdcdc3 [HUDI-430] Adding InlineFileSystem to support embedding any file format as an InlineFile (#1176)
* Adding InlineFileSystem to support embedding any file format (parquet, hfile, etc). Supports reading the embedded file using respective readers.
2020-03-28 12:13:35 -04:00
Suneel Marthi
8c3001363d HUDI-479: Eliminate or Minimize use of Guava if possible (#1159) 2020-03-28 03:11:32 -04:00
Raymond Xu
bc82e2be6c [HUDI-711] Refactor exporter main logic (#1436)
* Refactor exporter main logic
* break main method into multiple readable methods
* fix bug of passing wrong file list
* avoid deleting output path when exists
* throw exception to early abort on multiple cases
* use JavaSparkContext instead of SparkSession
* improve unit test for expected exceptions
2020-03-25 18:02:24 +08:00
Zhiyuan Zhao
0241b21f77 [HUDI-65] commitTime rename to instantTime (#1431) 2020-03-22 18:06:00 -07:00
lamber-ken
38c3ccc51a [HUDI-663] Fix HoodieDeltaStreamer offset not handled correctly (#1377) 2020-03-22 10:31:48 -07:00
Pratyaksh Sharma
1e1d9e1d34 [HUDI-616] Fixed parquet files getting created on local FS (#1434) 2020-03-22 22:19:47 +08:00
Zhiyuan Zhao
06652aa935 [MINOR] Add missing param descriptions to method docs and clean up redundant code (#1437) 2020-03-22 21:39:33 +08:00
Zhiyuan Zhao
8b00791ef4 [MINOR] cleanup redundant comment and unused variable and fix typo (#1435) 2020-03-21 20:12:06 -07:00
Mathieu
eeab532d79 [HUDI-725] Remove init log in the constructor of DeltaSync (#1425) 2020-03-20 17:47:59 +08:00
Mathieu
21c45e1051 [HUDI-726] Delete unused method in HoodieDeltaStreamer (#1426) 2020-03-20 17:44:16 +08:00
Sivabalan Narayanan
a752b7b18c Merge pull request #1165 from yihua/HUDI-76-deltastreamer-csv-source
[HUDI-76] Add CSV Source support for Hudi Delta Streamer
2020-03-19 10:00:53 -04:00
Raymond Xu
779edc0688 [HUDI-344] Add partitioner param to Exporter (#1405) 2020-03-18 19:24:04 +08:00
Y Ethan Guo
cf765df606 [HUDI-76] Add CSV Source support for Hudi Delta Streamer 2020-03-15 19:03:37 -07:00
Raymond Xu
14323cb100 [HUDI-344] Improve exporter tests (#1404) 2020-03-15 20:24:30 +08:00
Suneel Marthi
99b7e9eb9e [HUDI-629]: Replace Guava's Hashing with an equivalent in NumericUtils.java (#1350)
2020-03-13 20:28:05 -04:00
Sivabalan Narayanan
1ca912af09 [HUDI-667] Fixing delete tests for DeltaStreamer (#1395) 2020-03-11 16:19:23 -07:00
openopen2
44700d531a [HUDI-344] Hudi Dataset Snapshot Exporter (#1360)
Co-authored-by: jason1993 <261049174@qq.com>
2020-03-10 09:17:51 +08:00
hongdd
f93e64fee4 [HUDI-681] Remove embeddedTimelineService from HoodieReadClient (#1388)
2020-03-09 18:31:04 +08:00
lamber-ken
ccbf543607 [HUDI-654] Rename hudi-hive to hudi-hive-sync 2020-03-06 22:13:16 +08:00
yanghua
0dc8e493aa Moving to 0.6.0-SNAPSHOT on master branch. 2020-03-01 15:08:30 +08:00
vinoth chandar
71170fafe7 [HUDI-554] Cleanup package structure in hudi-client (#1346)
- Just package, class moves and renames with the following intent
 - `client` now has all the various client classes, that do the transaction management
 - `func` renamed to `execution` and some helpers moved to `client/utils`
 - All compaction code under `io` now under `table/compact`
 - Rollback code under `table/rollback` and in general all code for individual operations under `table`
 - `exception`, `config`, `metrics` left untouched
 - Moved the tests also accordingly
 - Fixed some flaky tests
2020-02-27 08:05:58 -08:00
Suneel Marthi
078d4825d9 [HUDI-624]: Split some of the code from PR for HUDI-479 (#1344) 2020-02-21 14:22:21 +08:00
Suneel Marthi
f9d2f66dc1 [HUDI-622]: Remove VisibleForTesting annotation and import from code (#1343)
2020-02-20 15:17:53 +08:00
amitsingh-10
c2b08cdfc9 [HUDI-617] Add support for types implementing CharSequence (#1339)
- Data types extending CharSequence implement a #toString method which provides an easy way to convert them to String. 
- For example, org.apache.avro.util.Utf8 is easily convertible into String if we use the toString() method. It's better to make the support more generic to support a wider range of data types as partitionKey.
2020-02-18 11:19:44 -08:00
Mathieu
8c6138cb01 [MINOR] Add javadoc to SchedulerConfGenerator and code clean (#1340) 2020-02-18 11:15:02 -08:00
wangxianghu
aaa6cf9a98 [MINOR] Fix some typos 2020-02-15 09:49:25 +08:00
openopen2
dfbee673ef [HUDI-514] A schema provider to get metadata through Jdbc (#1200) 2020-02-13 18:06:06 -08:00
Mathieu
175de0db7b [MINOR] Fix typo (#1331) 2020-02-13 10:46:10 -08:00
Mathieu
5fdf5a1927 [HUDI-560] Remove legacy IdentityTransformer (#1264) 2020-02-10 10:04:58 +08:00
lamber-ken
46842f4e92 [MINOR] Remove the declaration of thrown RuntimeException (#1305) 2020-02-05 23:23:20 +08:00
lamber-ken
425e3e6c78 [HUDI-585] Optimize the steps of building with scala-2.12 (#1293) 2020-02-05 23:13:10 +08:00
Suneel Marthi
594da28fbf [HUDI-595] code cleanup, refactoring code out of PR# 1159 (#1302) 2020-02-04 21:52:03 +08:00
dengziming
347e297ac1 [HUDI-596] Close KafkaConsumer every time (#1303) 2020-02-03 23:42:21 -08:00
Suneel Marthi
5b7bb142dc [HUDI-583] Code Cleanup, remove redundant code, and other changes (#1237) 2020-02-02 18:03:44 +08:00
leesf
ed54eb20a5 [MINOR] Add missing licenses (#1271) 2020-01-22 08:06:45 -05:00
leesf
6e59c1c777 Moving to 0.5.2-SNAPSHOT on master branch. 2020-01-20 10:51:33 -08:00
Y Ethan Guo
9489d0fb84 [HUDI-551] Abstract a test case class for DFS Source to make it extensible (#1239) 2020-01-19 18:50:12 +08:00
Y Ethan Guo
d0ee95ed16 [HUDI-552] Fix the schema mismatch in Row-to-Avro conversion (#1246) 2020-01-18 16:40:56 -08:00
wenningd
292c1e2ff4 [HUDI-238] Make Hudi support Scala 2.12 (#1226)
* [HUDI-238] Rename scala related artifactId & add maven profile to support Scala 2.12
2020-01-17 14:02:21 -08:00
vinoth chandar
c2c0f6b13d [HUDI-509] Renaming code in sync with cWiki restructuring (#1212)
- Storage Type replaced with Table Type (remaining instances)
- View types replaced with query types
- ReadOptimized view referred to as Snapshot Query
- TableFileSystemView sub-interfaces renamed to BaseFileOnly and Slice Views
- HoodieDataFile renamed to HoodieBaseFile
- Hive Sync tool will register RO tables for MOR with a `_ro` suffix
- Datasource/Deltastreamer options renamed accordingly
- Support fallback to old config values as well, so migration is painless
- Config for controlling `_ro` suffix addition
- Renaming DataFile to BaseFile across DTOs, HoodieFileSlice and AbstractTableFileSystemView
2020-01-16 23:58:47 -08:00
Y Ethan Guo
b39458b008 [MINOR] Make constant fields final in HoodieTestDataGenerator (#1234) 2020-01-16 12:42:30 +08:00
Scheller
1daba24065 Add GlobalDeleteKeyGenerator
Adds new GlobalDeleteKeyGenerator for record_key deletes with global indices. Also refactors key generators into their own package.
2020-01-15 17:01:29 -08:00
Mehrotra
2bb0c21a3d Fix conversion of Spark struct type to Avro schema
cr https://code.amazon.com/reviews/CR-17184364
2020-01-14 00:27:56 -08:00
lamber-ken
fd8f1c70c0 [MINOR] Reuse random object (#1222) 2020-01-13 18:26:04 -08:00
openopen2
a44c61b813 [HUDI-502] provide a custom time zone definition for TimestampBasedKeyGenerator (#1188) 2020-01-12 15:45:23 -08:00
harveyyue
971c7d41bd [HUDI-322] DeltaStreamer should pick checkpoints off only deltacommits for MOR tables 2020-01-12 15:11:47 -08:00
Udit Mehrotra
ad50008a59 [HUDI-91][HUDI-12] Migrate to spark 2.4.4, migrate to spark-avro library instead of databricks-avro, add support for Decimal/Date types
- Upgrade Spark to 2.4.4, Parquet to 1.10.1, Avro to 1.8.2
- Remove spark-avro from hudi-spark-bundle. Users need to provide --packages org.apache.spark:spark-avro:2.4.4 when running spark-shell or spark-submit
- Replace com.databricks:spark-avro with org.apache.spark:spark-avro
- Shade avro in hudi-hadoop-mr-bundle to make sure it does not conflict with hive's avro version.
2020-01-12 15:03:11 -08:00
lamber-ken
d9675c4ec0 [HUDI-522] Use the same version jcommander uniformly (#1214) 2020-01-12 10:48:52 -08:00