Commit Graph

57 Commits

Author SHA1 Message Date
lamber-ken
425e3e6c78 [HUDI-585] Optimize the steps of building with scala-2.12 (#1293) 2020-02-05 23:13:10 +08:00
Suneel Marthi
5b7bb142dc [HUDI-583] Code Cleanup, remove redundant code, and other changes (#1237) 2020-02-02 18:03:44 +08:00
leesf
652224edc8 [HUDI-578] Trim recordKeyFields and partitionPathFields in ComplexKeyGenerator (#1281)
* [HUDI-578] Trim recordKeyFields and partitionPathFields in ComplexKeyGenerator

* add tests
2020-01-29 16:26:26 -08:00
leesf
6e59c1c777 Moving to 0.5.2-SNAPSHOT on master branch. 2020-01-20 10:51:33 -08:00
Y Ethan Guo
d0ee95ed16 [HUDI-552] Fix the schema mismatch in Row-to-Avro conversion (#1246) 2020-01-18 16:40:56 -08:00
wenningd
292c1e2ff4 [HUDI-238] Make Hudi support Scala 2.12 (#1226)
* [HUDI-238] Rename scala related artifactId & add maven profile to support Scala 2.12
2020-01-17 14:02:21 -08:00
Prashant Wason
0a07752dc0 [HUDI-527] scalastyle-maven-plugin moved to pluginManagement as it is only used in hoodie-spark and hoodie-cli modules.
This fixes compile warnings as well as unnecessary plugin invocation for most of the modules which do not have scala code.
2020-01-17 10:46:10 -08:00
vinoth chandar
c2c0f6b13d [HUDI-509] Renaming code in sync with cWiki restructuring (#1212)
- Storage Type replaced with Table Type (remaining instances)
- View types replaced with query types
- ReadOptimized view referred as Snapshot Query
- TableFileSystemView sub interfaces renamed to BaseFileOnly and Slice Views
- HoodieDataFile renamed to HoodieBaseFile
- Hive Sync tool will register RO tables for MOR with a `_ro` suffix
- Datasource/Deltastreamer options renamed accordingly
- Support fallback to old config values as well, so migration is painless
- Config for controlling `_ro` suffix addition
- Renaming DataFile to BaseFile across DTOs, HoodieFileSlice and AbstractTableFileSystemView
2020-01-16 23:58:47 -08:00
Scheller
1daba24065 Add GlobalDeleteKeyGenerator
Adds new GlobalDeleteKeyGenerator for record_key deletes with global indices. Also refactors key generators into their own package.
2020-01-15 17:01:29 -08:00
Sivabalan Narayanan
2248fd9aea Fixing checkstyle issues 2020-01-15 14:21:26 -08:00
Sivabalan Narayanan
2b2f23aa60 Fixing delete util method 2020-01-15 14:21:26 -08:00
Sivabalan Narayanan
87fdb769f0 Adding util methods to assist in adding deletion support to Quick Start 2020-01-15 14:21:26 -08:00
Mehrotra
2bb0c21a3d Fix conversion of Spark struct type to Avro schema
cr https://code.amazon.com/reviews/CR-17184364
2020-01-14 00:27:56 -08:00
Udit Mehrotra
ad50008a59 [HUDI-91][HUDI-12] Migrate to Spark 2.4.4, migrate to spark-avro library instead of databricks-avro, add support for Decimal/Date types
- Upgrade Spark to 2.4.4, Parquet to 1.10.1, Avro to 1.8.2
- Remove spark-avro from hudi-spark-bundle. Users need to provide --packages org.apache.spark:spark-avro:2.4.4 when running spark-shell or spark-submit
- Replace com.databricks:spark-avro with org.apache.spark:spark-avro
- Shade avro in hudi-hadoop-mr-bundle to make sure it does not conflict with hive's avro version.
2020-01-12 15:03:11 -08:00
lamber-ken
d9675c4ec0 [HUDI-522] Use the same version jcommander uniformly (#1214) 2020-01-12 10:48:52 -08:00
pratyakshsharma
3c90d252cc [HUDI-114]: added option to overwrite payload implementation in hoodie.properties file 2020-01-09 22:34:40 -08:00
Y Ethan Guo
480fc7869d [HUDI-319] Add a new maven profile to generate unified Javadoc for all Java and Scala classes (#1195)
* Add javadoc build command in README, links to javadoc plugin and rename profile.
* Make java version configurable in one place.
2020-01-08 10:38:09 -08:00
vinoth chandar
9706f659db [HUDI-508] Standardizing on "Table" instead of "Dataset" across code (#1197)
- Docs were talking about storage types before, cWiki moved to "Table"
- Most of code already has HoodieTable, HoodieTableMetaClient - correct naming
- Replacing remaining uses of dataset across code/comments
- Few usages in comments and use of Spark SQL DataSet remain unscathed
2020-01-07 12:52:32 -08:00
lamber-ken
75c3f630d4 [HUDI-405] Remove HIVE_ASSUME_DATE_PARTITION_OPT_KEY config from DataSource 2020-01-06 14:25:38 -08:00
Pratyaksh Sharma
8f935e779a [HUDI-406]: added default partition path in TimestampBasedKeyGenerator 2020-01-06 09:38:06 -08:00
hongdd
2d5b79d96f [HUDI-438] Merge duplicated code fragment in HoodieSparkSqlWriter (#1114) 2020-01-06 22:51:22 +08:00
Sivabalan Narayanan
7031445eb3 [HUDI-377] Adding Delete() support to DeltaStreamer (#1073)
- Provides ability to perform hard deletes by writing delete marker records into the source data
- if the record contains a special field _hoodie_delete_marker set to true, deletes are performed
2020-01-04 11:07:31 -08:00
Pratyaksh Sharma
dde21e7315 [HUDI-402]: code clean up in test cases 2019-12-31 11:10:49 -08:00
vinoth chandar
350b0ecb4d [HUDI-311] : Support for AWS Database Migration Service in DeltaStreamer
- Add a transformer class that adds the `Op` field if not found in the input frame
- Add a payload implementation that issues deletes when Op=D
- Remove Parquet as a top level source type, consolidate with RowSource
- Made delta streamer work without a property file, simply using overridden cli options
- Unit tests for transformer/payload classes
2019-12-23 20:56:55 -08:00
lamber-ken
313fab5fd1 [HUDI-444] Refactor the codes based on scala codestyle ReturnChecker rule (#1121) 2019-12-24 07:05:54 +08:00
YanJia-Gary-Li
36b3b6f5dd [HUDI-415] Get commit time when Spark start (#1113) 2019-12-19 22:19:06 -08:00
lamber-ken
a405d3873b [MINOR] replace scala map add operator (#1093)
replace ++: with ++
2019-12-12 11:29:17 +08:00
lamber-ken
ba514cfea0 [MINOR] Remove redundant plus operator (#1097) 2019-12-12 05:42:05 +08:00
lamber-ken
d447e2d751 [checkstyle] Unify LOG form (#1092) 2019-12-10 19:23:38 +08:00
Wenning Ding
e555aa516d [HUDI-353] Add hive style partitioning path 2019-12-09 12:29:53 -08:00
lamber-ken
2745b7552f [HUDI-379] Refactor the codes based on new JavadocStyle code style rule (#1079) 2019-12-06 12:59:28 +08:00
hongdd
b65a897856 [HUDI-374] Unable to generateUpdates in QuickstartUtils (#1059) 2019-11-30 11:11:00 -08:00
lamber-ken
024230fbd2 [HUDI-372] Support the shortName for Hudi DataSource (#1054)
- Ability to do `spark.write.format("hudi")...`
2019-11-30 08:02:33 -08:00
谢磊
f9139c0f61 [HUDI-366] Refactor some module codes based on new ImportOrder code style rule (#1055)
[HUDI-366] Refactor hudi-hadoop-mr / hudi-timeline-service / hudi-spark / hudi-integ-test / hudi-utilities based on new ImportOrder code style rule
2019-11-27 21:32:43 +08:00
bschell
60fed21dc7 [HUDI-327] Add null/empty checks to key generators (#1040)
* Adds null and empty checks to all key generators. 
* Also improves error messaging for key generator issues.
2019-11-26 02:37:16 -08:00
filippo balicchia
845a0509b3 [MINOR] Some minor optimizations in HoodieJavaStreamingApp (#1046) 2019-11-25 18:49:13 +08:00
Sivabalan Narayanan
c3355109b1 [HUDI-328] Adding delete api to HoodieWriteClient (#1004)
[HUDI-328] Adding delete api to HoodieWriteClient and Spark DataSource
2019-11-22 15:05:25 -08:00
hongdd
7bc08cbfdc [HUDI-345] Fix used deprecated function (#1024)
- Replace Schema.parse() with new Schema.Parser().parse()
- FSDataOutputStream constructor
2019-11-22 03:32:09 -08:00
谢磊
804e348d0e [HUDI-346] Set allowMultipleEmptyLines to false for EmptyLineSeparator rule (#1025) 2019-11-19 18:44:42 +08:00
vinoth chandar
e4c91ed13f [HUDI-290] Normalize test class name of all test classes (#951) 2019-10-22 20:19:11 -07:00
Balaji Varadarajan
77f4e73615 [HUDI-121] Fix licensing issues found during RC voting by general incubator group 2019-10-16 02:09:02 -07:00
leesf
b19bed442d [HUDI-296] Explore use of spotless to auto fix formatting errors (#945)
- Add spotless format fixing to project
- One time reformatting for conformity
- Build fails for formatting changes and mvn spotless:apply autofixes them
2019-10-10 05:19:40 -07:00
Balaji Varadarajan
9b66ea41fd [HUDI-121] Remove leftover notice file and replace com.uber.hoodie with org.apache.hudi in log4j properties 2019-10-04 09:18:57 -07:00
Balaji Varadarajan
6da2f9ac7c [HUDI-287] Address comments during review of release candidate
1. Remove LICENSE and NOTICE files in hoodie child modules.
2. Remove developers and contributor section from pom
3. Also ensure any failures in the validation script are reported appropriately
4. Make hoodie parent pom consistent with that of its parent apache-21 (https://github.com/apache/maven-apache-parent/blob/apache-21/pom.xml)
2019-10-03 09:00:07 -07:00
Balaji Varadarajan
6e8a28bcae [HUDI-121] Address comments during RC2 voting
1. Remove dnl utils jar from git
2. Add LICENSE Headers in missing files
3. Fix NOTICE and LICENSE in all HUDI packages and in top-level
4. Fix License wording in certain HUDI source files
5. Include non java/scala code in RAT licensing check
6. Use whitelist to include dependencies as part of timeline-server bundling
2019-09-30 15:42:15 -07:00
Bhavani Sudha Saktheeswaran
50a073ff57 [HUDI-271] Create QuickstartUtils for simplifying quickstart guide
- This will be used in the Quickstart guide (doc changes to follow in a separate PR). The intention is to simplify the quickstart to showcase hudi APIs by writing and reading using spark datasources.
- This is located in the hudi-spark module intentionally, so that all the necessary classes end up in hudi-spark-bundle.
2019-09-30 15:22:18 -07:00
Vinoth Chandar
e217db56ab [HUDI-254]: Bundle and shade databricks/avro with spark bundle
- From Spark 2.4 onwards, Spark has built-in support; shading to avoid conflicts
- Spark 2.3 still needs this bundled, so that dropping the bundle into the jars folder would work
2019-09-17 12:38:51 -07:00
Balaji Varadarajan
c1e7d0e5a6 [HUDI-121] Update Release notes and fix master version 2019-09-17 09:50:30 -07:00
Balaji Varadarajan
7190c022bb [HUDI-249] Updating Notice files 2019-09-13 13:50:58 -07:00
Balaji Varadarajan
d2525c31b7 Moving to 0.6.0-SNAPSHOT on master branch. 2019-09-13 09:58:29 -07:00