vinoth chandar
9706f659db
[HUDI-508] Standardizing on "Table" instead of "Dataset" across code ( #1197 )
...
- Docs were talking about storage types before, cWiki moved to "Table"
- Most of code already has HoodieTable, HoodieTableMetaClient - correct naming
- Replacing/renaming uses of dataset across code/comments
- A few usages in comments and uses of Spark SQL DataSet remain unchanged
2020-01-07 12:52:32 -08:00
Abhishek Modi
b5df6723a2
[HUDI-464] Use Hive Exec Core for tests ( #1125 )
2020-01-06 16:32:55 -08:00
lamber-ken
75c3f630d4
[HUDI-405] Remove HIVE_ASSUME_DATE_PARTITION_OPT_KEY config from DataSource
2020-01-06 14:25:38 -08:00
Pratyaksh Sharma
8f935e779a
[HUDI-406]: added default partition path in TimestampBasedKeyGenerator
2020-01-06 09:38:06 -08:00
lamber-ken
28ccf8c521
[HUDI-484] Fix NPE when reading IncrementalPull.sqltemplate in HiveIncrementalPuller ( #1167 )
2020-01-04 23:53:47 -08:00
Sivabalan Narayanan
7031445eb3
[HUDI-377] Adding Delete() support to DeltaStreamer ( #1073 )
...
- Provides ability to perform hard deletes by writing delete marker records into the source data
- If a record contains a special field _hoodie_delete_marker set to true, the record is deleted
2020-01-04 11:07:31 -08:00
Pratyaksh Sharma
290278fc6c
[HUDI-118]: Options provided for passing properties to Cleaner, compactor and importer commands
2020-01-03 16:00:57 -08:00
lamber-ken
e1e5fe3324
[MINOR] Fix error usage of String.format ( #1169 )
2020-01-02 09:11:15 +08:00
Pratyaksh Sharma
dde21e7315
[HUDI-402]: code clean up in test cases
2019-12-31 11:10:49 -08:00
lamber-ken
ab6ae5cebb
[HUDI-482] Fix missing @Override annotation on methods ( #1156 )
2019-12-31 11:44:56 +08:00
yungthuis66
f20a130e3a
[MINOR] typo fix ( #1142 )
2019-12-26 09:03:43 -08:00
vinoth chandar
350b0ecb4d
[HUDI-311] : Support for AWS Database Migration Service in DeltaStreamer
...
- Add a transformer class that adds the `Op` field if not found in the input frame
- Add a payload implementation, that issues deletes when Op=D
- Remove Parquet as a top level source type, consolidate with RowSource
- Made delta streamer work without a property file, simply using overridden cli options
- Unit tests for transformer/payload classes
2019-12-23 20:56:55 -08:00
lamber-ken
ba514cfea0
[MINOR] Remove redundant plus operator ( #1097 )
2019-12-12 05:42:05 +08:00
lamber-ken
d447e2d751
[checkstyle] Unify LOG form ( #1092 )
2019-12-10 19:23:38 +08:00
Wenning Ding
e555aa516d
[HUDI-353] Add hive style partitioning path
2019-12-09 12:29:53 -08:00
lamber-ken
2745b7552f
[HUDI-379] Refactor the codes based on new JavadocStyle code style rule ( #1079 )
2019-12-06 12:59:28 +08:00
lamber-ken
b3e0ebbc4a
[checkstyle] Add ConstantName java checkstyle rule ( #1066 )
...
* add SimplifyBooleanExpression java checkstyle rule
* collapse empty tags in scalastyle file
2019-12-04 18:59:15 +08:00
谢磊
f9139c0f61
[HUDI-366] Refactor some module codes based on new ImportOrder code style rule ( #1055 )
...
[HUDI-366] Refactor hudi-hadoop-mr / hudi-timeline-service / hudi-spark / hudi-integ-test / hudi-utilities based on new ImportOrder code style rule
2019-11-27 21:32:43 +08:00
谢磊
b77fad39b5
[HUDI-364] Refactor hudi-hive based on new ImportOrder code style rule ( #1048 )
2019-11-27 16:30:37 +08:00
bschell
60fed21dc7
[HUDI-327] Add null/empty checks to key generators ( #1040 )
...
* Adds null and empty checks to all key generators.
* Also improves error messaging for key generator issues.
2019-11-26 02:37:16 -08:00
Pratyaksh Sharma
2a4cfb47c7
[HUDI-340]: made max events to read from kafka source configurable ( #1039 )
2019-11-26 18:34:02 +08:00
谢磊
804e348d0e
[HUDI-346] Set allowMultipleEmptyLines to false for EmptyLineSeparator rule ( #1025 )
2019-11-19 18:44:42 +08:00
Pratyaksh Sharma
5f1309407a
[HUDI-253]: added validations for schema provider class ( #995 )
2019-11-11 06:03:44 -08:00
Gurudatt Kulkarni
71ac2c0d5e
[HUDI-324] TimestampKeyGenerator should support milliseconds ( #993 )
2019-11-05 04:22:14 -08:00
Raymond Xu
91740635b2
[HUDI-321] Support bulkinsert in HDFSParquetImporter ( #987 )
...
- Add bulk insert feature
- Fix some minor issues
2019-11-02 23:12:44 -07:00
Balaji Varadarajan
a6390aefc4
[HUDI-312] Make docker hdfs cluster ephemeral. This is needed to fix flakiness in integration tests. Also, Fix DeltaStreamer hanging issue due to uncaught exception
2019-11-01 11:49:59 -07:00
vinoth chandar
e4c91ed13f
[HUDI-290] Normalize test class name of all test classes ( #951 )
2019-10-22 20:19:11 -07:00
Balaji Varadarajan
14dd649d06
[MINOR] Remove release notes and move confluent repository to hoodie parent pom
2019-10-21 14:16:05 -07:00
YanJia-Gary-Li
ed745dfdbf
[HUDI-40] Add parquet support for the Delta Streamer ( #949 )
2019-10-16 21:11:59 -07:00
Balaji Varadarajan
77f4e73615
[HUDI-121] Fix licensing issues found during RC voting by general incubator group
2019-10-16 02:09:02 -07:00
leesf
e10e06918e
[HUDI-292] Avoid consuming more entries from kafka than specified sourceLimit. ( #947 )
...
- Special handling when allocedEvents > numEvents
- Added unit tests
2019-10-11 05:28:45 -07:00
leesf
b19bed442d
[HUDI-296] Explore use of spotless to auto fix formatting errors ( #945 )
...
- Add spotless format fixing to project
- One time reformatting for conformity
- Build fails on formatting violations; mvn spotless:apply auto-fixes them
2019-10-10 05:19:40 -07:00
Balaji Varadarajan
9b66ea41fd
[HUDI-121] Remove leftover notice file and replace com.uber.hoodie with org.apache.hudi in log4j properties
2019-10-04 09:18:57 -07:00
leesf
3dedc7e5fd
[HUDI-265] Failed to delete tmp dirs created in unit tests ( #928 )
2019-10-03 09:48:13 -07:00
Balaji Varadarajan
6da2f9ac7c
[HUDI-287] Address comments during review of release candidate
...
1. Remove LICENSE and NOTICE files in hoodie child modules.
2. Remove developers and contributor section from pom
3. Also ensure any failures in the validation script are reported appropriately
4. Make hoodie parent pom consistent with that of its parent apache-21 (https://github.com/apache/maven-apache-parent/blob/apache-21/pom.xml)
2019-10-03 09:00:07 -07:00
Balaji Varadarajan
6e8a28bcae
HUDI-121 : Address comments during RC2 voting
...
1. Remove dnl utils jar from git
2. Add LICENSE Headers in missing files
3. Fix NOTICE and LICENSE in all HUDI packages and in top-level
4. Fix License wording in certain HUDI source files
5. Include non java/scala code in RAT licensing check
6. Use whitelist to include dependencies as part of timeline-server bundling
2019-09-30 15:42:15 -07:00
Balaji Varadarajan
2ea8b0c3f1
[HUDI-279] Fix regression in Schema Evolution due to PR-755
2019-09-25 22:53:43 -07:00
Xing Pan
bf05f95413
[HUDI-269] Limit sync frequency ( #921 )
...
* [HUDI-269] Throttle DeltaStreamer sync runs
2019-09-24 05:30:35 -07:00
PanXing
635154c439
[MINOR] support reading cfg file in another s3 bucket ( #914 )
2019-09-22 06:47:23 -07:00
vinoyang
f020d029c4
HUDI-267 Refactor bad method name HoodieTestUtils#initTableType and HoodieTableMetaClient#initializePathAsHoodieDataset ( #916 )
2019-09-21 09:05:02 -07:00
Balaji Varadarajan
c1e7d0e5a6
[HUDI-121] Update Release notes and fix master version
2019-09-17 09:50:30 -07:00
Balaji Varadarajan
7190c022bb
[HUDI-249] Updating Notice files
2019-09-13 13:50:58 -07:00
Balaji Varadarajan
d2525c31b7
Moving to 0.6.0-SNAPSHOT on master branch.
2019-09-13 09:58:29 -07:00
Balaji Varadarajan
58623631d4
[HUDI-249] Update Release-notes. Add sign-artifacts to POM and release related scripts. Add missing license headers
2019-09-13 08:41:29 -07:00
vinoth chandar
7a973a6944
[HUDI-159] Redesigning bundles for lighter-weight integrations
...
- Documented principles applied for redesign at packaging/README.md
- No longer depends on or includes commons-codec, commons-io, commons-pool, commons-dbcp, commons-lang, commons-logging, avro-mapred
- Introduce new FileIOUtils & added checkstyle rule for illegal import of above
- Parquet, Avro dependencies moved to provided scope to enable being picked up from Hive/Spark/Presto instead
- Pick up jackson jars for Hive sync tool from HIVE_HOME and unbundle jackson everywhere
- Remove hive-jdbc standalone jar from being bundled in Spark/Hive/Utilities bundles
- Reduced the number of classes across bundles by 6.5x
2019-09-11 11:08:27 -07:00
leesf
821e0dcffc
[HUDI-236] Failed to close stream
2019-09-03 19:24:11 -07:00
Alex Filipchik
555dd55c16
Support nested ordering fields
2019-08-30 13:41:16 -07:00
Balaji Varadarajan
376b59ae5f
[HUDI-227] : DeltaStreamer Improvements : Commit empty input batch with progressing checkpoints and allow users to override configs through properties. Original PR : PR-805 and PR-806 ( #863 )
2019-08-30 09:13:34 -07:00
Balaji Varadarajan
5f9fa82f47
HUDI-124 : Exclude jdk.tools from hadoop-common and update Notice files ( #858 )
2019-08-28 16:20:47 -07:00
Alexander Filipchik
e0ab89b3ac
[HUDI-223] Adding a way to infer target schema from the dataset after the transformation ( #854 )
...
- Adding a way to decouple target and source schema providers
- Adding flattening transformer
2019-08-28 04:48:38 -07:00