Y Ethan Guo
b39458b008
[MINOR] Make constant fields final in HoodieTestDataGenerator ( #1234 )
2020-01-16 12:42:30 +08:00
Scheller
1daba24065
Add GlobalDeleteKeyGenerator
...
Adds new GlobalDeleteKeyGenerator for record_key deletes with global indices. Also refactors key generators into their own package.
2020-01-15 17:01:29 -08:00
Mehrotra
2bb0c21a3d
Fix conversion of Spark struct type to Avro schema
...
cr https://code.amazon.com/reviews/CR-17184364
2020-01-14 00:27:56 -08:00
lamber-ken
fd8f1c70c0
[MINOR] Reuse random object ( #1222 )
2020-01-13 18:26:04 -08:00
openopen2
a44c61b813
[HUDI-502] provide a custom time zone definition for TimestampBasedKeyGenerator ( #1188 )
2020-01-12 15:45:23 -08:00
pratyakshsharma
3c90d252cc
[HUDI-114]: added option to overwrite payload implementation in hoodie.properties file
2020-01-09 22:34:40 -08:00
vinoth chandar
9706f659db
[HUDI-508] Standardizing on "Table" instead of "Dataset" across code ( #1197 )
...
- Docs were talking about storage types before, cWiki moved to "Table"
- Most of code already has HoodieTable, HoodieTableMetaClient - correct naming
- Replaced/renamed uses of dataset across code/comments
- Few usages in comments and use of Spark SQL DataSet remain unscathed
2020-01-07 12:52:32 -08:00
lamber-ken
75c3f630d4
[HUDI-405] Remove HIVE_ASSUME_DATE_PARTITION_OPT_KEY config from DataSource
2020-01-06 14:25:38 -08:00
lamber-ken
28ccf8c521
[HUDI-484] Fix NPE when reading IncrementalPull.sqltemplate in HiveIncrementalPuller ( #1167 )
2020-01-04 23:53:47 -08:00
Sivabalan Narayanan
7031445eb3
[HUDI-377] Adding Delete() support to DeltaStreamer ( #1073 )
...
- Provides ability to perform hard deletes by writing delete marker records into the source data
- If the record contains a special field _hoodie_delete_marker set to true, deletes are performed
2020-01-04 11:07:31 -08:00
Pratyaksh Sharma
dde21e7315
[HUDI-402]: code clean up in test cases
2019-12-31 11:10:49 -08:00
vinoth chandar
350b0ecb4d
[HUDI-311] : Support for AWS Database Migration Service in DeltaStreamer
...
- Add a transformer class that adds the `Op` field if not found in the input frame
- Add a payload implementation, that issues deletes when Op=D
- Remove Parquet as a top level source type, consolidate with RowSource
- Made delta streamer work without a property file, simply using overridden cli options
- Unit tests for transformer/payload classes
2019-12-23 20:56:55 -08:00
lamber-ken
d447e2d751
[checkstyle] Unify LOG form ( #1092 )
2019-12-10 19:23:38 +08:00
lamber-ken
2745b7552f
[HUDI-379] Refactor the codes based on new JavadocStyle code style rule ( #1079 )
2019-12-06 12:59:28 +08:00
lamber-ken
b3e0ebbc4a
[checkstyle] Add ConstantName java checkstyle rule ( #1066 )
...
* add SimplifyBooleanExpression java checkstyle rule
* collapse empty tags in scalastyle file
2019-12-04 18:59:15 +08:00
谢磊
f9139c0f61
[HUDI-366] Refactor some module codes based on new ImportOrder code style rule ( #1055 )
...
[HUDI-366] Refactor hudi-hadoop-mr / hudi-timeline-service / hudi-spark / hudi-integ-test / hudi-utilities based on new ImportOrder code style rule
2019-11-27 21:32:43 +08:00
Pratyaksh Sharma
2a4cfb47c7
[HUDI-340]: made max events to read from kafka source configurable ( #1039 )
2019-11-26 18:34:02 +08:00
谢磊
804e348d0e
[HUDI-346] Set allowMultipleEmptyLines to false for EmptyLineSeparator rule ( #1025 )
2019-11-19 18:44:42 +08:00
Pratyaksh Sharma
5f1309407a
[HUDI-253]: added validations for schema provider class ( #995 )
2019-11-11 06:03:44 -08:00
vinoth chandar
e4c91ed13f
[HUDI-290] Normalize test class name of all test classes ( #951 )
2019-10-22 20:19:11 -07:00
YanJia-Gary-Li
ed745dfdbf
[HUDI-40] Add parquet support for the Delta Streamer ( #949 )
2019-10-16 21:11:59 -07:00
Balaji Varadarajan
77f4e73615
[HUDI-121] Fix licensing issues found during RC voting by general incubator group
2019-10-16 02:09:02 -07:00
leesf
e10e06918e
[HUDI-292] Avoid consuming more entries from kafka than specified sourceLimit. ( #947 )
...
- Special handling when allocedEvents > numEvents
- Added unit tests
2019-10-11 05:28:45 -07:00
leesf
b19bed442d
[HUDI-296] Explore use of spotless to auto fix formatting errors ( #945 )
...
- Add spotless format fixing to project
- One time reformatting for conformity
- Build fails on formatting violations; mvn spotless:apply auto-fixes them
2019-10-10 05:19:40 -07:00
Balaji Varadarajan
9b66ea41fd
[HUDI-121] Remove leftover notice file and replace com.uber.hoodie with org.apache.hudi in log4j properties
2019-10-04 09:18:57 -07:00
leesf
3dedc7e5fd
[HUDI-265] Failed to delete tmp dirs created in unit tests ( #928 )
2019-10-03 09:48:13 -07:00
Balaji Varadarajan
58623631d4
[HUDI-249] Update Release-notes. Add sign-artifacts to POM and release related scripts. Add missing license headers
2019-09-13 08:41:29 -07:00
vinoth chandar
7a973a6944
[HUDI-159] Redesigning bundles for lighter-weight integrations
...
- Documented principles applied for redesign at packaging/README.md
- No longer depends on commons-codec, commons-io, commons-pool, commons-dbcp, commons-lang, commons-logging, avro-mapred
- Introduce new FileIOUtils & added checkstyle rule for illegal import of above
- Parquet, Avro dependencies moved to provided scope to enable being picked up from Hive/Spark/Presto instead
- Pickup jackson jars for Hive sync tool from HIVE_HOME & unbundling jackson everywhere
- Remove hive-jdbc standalone jar from being bundled in Spark/Hive/Utilities bundles
- Reduced the number of classes across bundles by 6.5x
2019-09-11 11:08:27 -07:00
Balaji Varadarajan
376b59ae5f
[HUDI-227] : DeltaStreamer Improvements : Commit empty input batch with progressing checkpoints and allow users to override configs through properties. Original PR : PR-805 and PR-806 ( #863 )
2019-08-30 09:13:34 -07:00
Alexander Filipchik
e0ab89b3ac
[HUDI-223] Adding a way to infer target schema from the dataset after the transformation ( #854 )
...
- Adding a way to decouple target and source schema providers
- Adding flattening transformer
2019-08-28 04:48:38 -07:00
Balaji Varadarajan
a4f9d7575f
HUDI-123 Rename code packages/constants to org.apache.hudi ( #830 )
...
- Rename com.uber.hoodie to org.apache.hudi
- Flag to pass com.uber.hoodie Input formats for hoodie-sync
- Works with HUDI demo.
- Also tested for backwards compatibility with datasets built by com.uber.hoodie packages
- Migration guide : https://cwiki.apache.org/confluence/display/HUDI/Migration+Guide+From+com.uber.hoodie+to+org.apache.hudi
2019-08-11 17:48:17 -07:00