leesf
ed54eb20a5
[MINOR] Add missing licenses ( #1271 )
2020-01-22 08:06:45 -05:00
Y Ethan Guo
9489d0fb84
[HUDI-551] Abstract a test case class for DFS Source to make it extensible ( #1239 )
2020-01-19 18:50:12 +08:00
Y Ethan Guo
d0ee95ed16
[HUDI-552] Fix the schema mismatch in Row-to-Avro conversion ( #1246 )
2020-01-18 16:40:56 -08:00
wenningd
292c1e2ff4
[HUDI-238] Make Hudi support Scala 2.12 ( #1226 )
...
* [HUDI-238] Rename scala related artifactId & add maven profile to support Scala 2.12
2020-01-17 14:02:21 -08:00
vinoth chandar
c2c0f6b13d
[HUDI-509] Renaming code in sync with cWiki restructuring ( #1212 )
...
- Storage Type replaced with Table Type (remaining instances)
- View types replaced with query types;
- ReadOptimized view referred as Snapshot Query
- TableFileSystemView sub interfaces renamed to BaseFileOnly and Slice Views
- HoodieDataFile renamed to HoodieBaseFile
- Hive Sync tool will register RO tables for MOR with a `_ro` suffix
- Datasource/Deltastreamer options renamed accordingly
- Support fallback to old config values as well, so migration is painless
- Config for controlling _ro suffix addition
- Renaming DataFile to BaseFile across DTOs, HoodieFileSlice and AbstractTableFileSystemView
2020-01-16 23:58:47 -08:00
Y Ethan Guo
b39458b008
[MINOR] Make constant fields final in HoodieTestDataGenerator ( #1234 )
2020-01-16 12:42:30 +08:00
Scheller
1daba24065
Add GlobalDeleteKeyGenerator
...
Adds new GlobalDeleteKeyGenerator for record_key deletes with global indices. Also refactors key generators into their own package.
2020-01-15 17:01:29 -08:00
Mehrotra
2bb0c21a3d
Fix conversion of Spark struct type to Avro schema
...
cr https://code.amazon.com/reviews/CR-17184364
2020-01-14 00:27:56 -08:00
lamber-ken
fd8f1c70c0
[MINOR] Reuse random object ( #1222 )
2020-01-13 18:26:04 -08:00
openopen2
a44c61b813
[HUDI-502] provide a custom time zone definition for TimestampBasedKeyGenerator ( #1188 )
2020-01-12 15:45:23 -08:00
harveyyue
971c7d41bd
[HUDI-322] DeltaStreamer should pick checkpoints off only deltacommits for MOR tables
2020-01-12 15:11:47 -08:00
lamber-ken
d9675c4ec0
[HUDI-522] Use the same version jcommander uniformly ( #1214 )
2020-01-12 10:48:52 -08:00
pratyakshsharma
3c90d252cc
[HUDI-114]: added option to overwrite payload implementation in hoodie.properties file
2020-01-09 22:34:40 -08:00
vinoth chandar
9706f659db
[HUDI-508] Standardizing on "Table" instead of "Dataset" across code ( #1197 )
...
- Docs were talking about storage types before, cWiki moved to "Table"
- Most of code already has HoodieTable, HoodieTableMetaClient - correct naming
- Replacing remaining uses of dataset across code/comments
- Few usages in comments and use of Spark SQL DataSet remain unscathed
2020-01-07 12:52:32 -08:00
lamber-ken
75c3f630d4
[HUDI-405] Remove HIVE_ASSUME_DATE_PARTITION_OPT_KEY config from DataSource
2020-01-06 14:25:38 -08:00
Pratyaksh Sharma
8f935e779a
[HUDI-406]: added default partition path in TimestampBasedKeyGenerator
2020-01-06 09:38:06 -08:00
lamber-ken
28ccf8c521
[HUDI-484] Fix NPE when reading IncrementalPull.sqltemplate in HiveIncrementalPuller ( #1167 )
2020-01-04 23:53:47 -08:00
Sivabalan Narayanan
7031445eb3
[HUDI-377] Adding Delete() support to DeltaStreamer ( #1073 )
...
- Provides ability to perform hard deletes by writing delete marker records into the source data
- if the record contains a special field _hoodie_delete_marker set to true, deletes are performed
2020-01-04 11:07:31 -08:00
Pratyaksh Sharma
290278fc6c
[HUDI-118]: Options provided for passing properties to Cleaner, compactor and importer commands
2020-01-03 16:00:57 -08:00
lamber-ken
e1e5fe3324
[MINOR] Fix error usage of String.format ( #1169 )
2020-01-02 09:11:15 +08:00
Pratyaksh Sharma
dde21e7315
[HUDI-402]: code clean up in test cases
2019-12-31 11:10:49 -08:00
lamber-ken
ab6ae5cebb
[HUDI-482] Fix missing @Override annotation on methods ( #1156 )
...
* [HUDI-482] Fix missing @Override annotation on methods
2019-12-31 11:44:56 +08:00
yungthuis66
f20a130e3a
[MINOR] typo fix ( #1142 )
2019-12-26 09:03:43 -08:00
vinoth chandar
350b0ecb4d
[HUDI-311] : Support for AWS Database Migration Service in DeltaStreamer
...
- Add a transformer class that adds an `Op` field if not found in the input frame
- Add a payload implementation, that issues deletes when Op=D
- Remove Parquet as a top level source type, consolidate with RowSource
- Made delta streamer work without a property file, simply using overridden cli options
- Unit tests for transformer/payload classes
2019-12-23 20:56:55 -08:00
lamber-ken
ba514cfea0
[MINOR] Remove redundant plus operator ( #1097 )
2019-12-12 05:42:05 +08:00
lamber-ken
d447e2d751
[checkstyle] Unify LOG form ( #1092 )
2019-12-10 19:23:38 +08:00
Wenning Ding
e555aa516d
[HUDI-353] Add hive style partitioning path
2019-12-09 12:29:53 -08:00
lamber-ken
2745b7552f
[HUDI-379] Refactor the codes based on new JavadocStyle code style rule ( #1079 )
2019-12-06 12:59:28 +08:00
lamber-ken
b3e0ebbc4a
[checkstyle] Add ConstantName java checkstyle rule ( #1066 )
...
* add SimplifyBooleanExpression java checkstyle rule
* collapse empty tags in scalastyle file
2019-12-04 18:59:15 +08:00
谢磊
f9139c0f61
[HUDI-366] Refactor some module codes based on new ImportOrder code style rule ( #1055 )
...
[HUDI-366] Refactor hudi-hadoop-mr / hudi-timeline-service / hudi-spark / hudi-integ-test / hudi-utilities based on new ImportOrder code style rule
2019-11-27 21:32:43 +08:00
谢磊
b77fad39b5
[HUDI-364] Refactor hudi-hive based on new ImportOrder code style rule ( #1048 )
...
[HUDI-364] Refactor hudi-hive based on new ImportOrder code style rule
2019-11-27 16:30:37 +08:00
bschell
60fed21dc7
[HUDI-327] Add null/empty checks to key generators ( #1040 )
...
* Adds null and empty checks to all key generators.
* Also improves error messaging for key generator issues.
2019-11-26 02:37:16 -08:00
Pratyaksh Sharma
2a4cfb47c7
[HUDI-340]: made max events to read from kafka source configurable ( #1039 )
2019-11-26 18:34:02 +08:00
谢磊
804e348d0e
[HUDI-346] Set allowMultipleEmptyLines to false for EmptyLineSeparator rule ( #1025 )
2019-11-19 18:44:42 +08:00
Pratyaksh Sharma
5f1309407a
[HUDI-253]: added validations for schema provider class ( #995 )
2019-11-11 06:03:44 -08:00
Gurudatt Kulkarni
71ac2c0d5e
[HUDI-324] TimestampKeyGenerator should support milliseconds ( #993 )
2019-11-05 04:22:14 -08:00
Raymond Xu
91740635b2
[HUDI-321] Support bulkinsert in HDFSParquetImporter ( #987 )
...
- Add bulk insert feature
- Fix some minor issues
2019-11-02 23:12:44 -07:00
Balaji Varadarajan
a6390aefc4
[HUDI-312] Make docker hdfs cluster ephemeral. This is needed to fix flakiness in integration tests. Also, Fix DeltaStreamer hanging issue due to uncaught exception
2019-11-01 11:49:59 -07:00
vinoth chandar
e4c91ed13f
[HUDI-290] Normalize test class name of all test classes ( #951 )
2019-10-22 20:19:11 -07:00
YanJia-Gary-Li
ed745dfdbf
[HUDI-40] Add parquet support for the Delta Streamer ( #949 )
2019-10-16 21:11:59 -07:00
Balaji Varadarajan
77f4e73615
[HUDI-121] Fix licensing issues found during RC voting by general incubator group
2019-10-16 02:09:02 -07:00
leesf
e10e06918e
[HUDI-292] Avoid consuming more entries from kafka than specified sourceLimit. ( #947 )
...
- Special handling when allocedEvents > numEvents
- Added unit tests
2019-10-11 05:28:45 -07:00
leesf
b19bed442d
[HUDI-296] Explore use of spotless to auto fix formatting errors ( #945 )
...
- Add spotless format fixing to project
- One time reformatting for conformity
- Build fails for formatting changes and mvn spotless:apply autofixes them
2019-10-10 05:19:40 -07:00
Balaji Varadarajan
9b66ea41fd
[HUDI-121] Remove leftover notice file and replace com.uber.hoodie with org.apache.hudi in log4j properties
2019-10-04 09:18:57 -07:00
leesf
3dedc7e5fd
[HUDI-265] Failed to delete tmp dirs created in unit tests ( #928 )
2019-10-03 09:48:13 -07:00
Balaji Varadarajan
6da2f9ac7c
[HUDI-287] Address comments during review of release candidate
...
1. Remove LICENSE and NOTICE files in hoodie child modules.
2. Remove developers and contributor section from pom
3. Also ensure any failures in the validation script are reported appropriately
4. Make hoodie parent pom consistent with that of its parent apache-21 (https://github.com/apache/maven-apache-parent/blob/apache-21/pom.xml )
2019-10-03 09:00:07 -07:00
Balaji Varadarajan
6e8a28bcae
HUDI-121 : Address comments during RC2 voting
...
1. Remove dnl utils jar from git
2. Add LICENSE Headers in missing files
3. Fix NOTICE and LICENSE in all HUDI packages and in top-level
4. Fix License wording in certain HUDI source files
5. Include non java/scala code in RAT licensing check
6. Use whitelist to include dependencies as part of timeline-server bundling
2019-09-30 15:42:15 -07:00
Balaji Varadarajan
2ea8b0c3f1
[HUDI-279] Fix regression in Schema Evolution due to PR-755
2019-09-25 22:53:43 -07:00
Xing Pan
bf05f95413
[HUDI-269] Limit sync frequency ( #921 )
...
* [HUDI-269] Throttle DeltaStreamer sync runs
2019-09-24 05:30:35 -07:00
PanXing
635154c439
[MINOR] support reading cfg file in another s3 bucket ( #914 )
2019-09-22 06:47:23 -07:00