lanyuanxiaoyao/hudi - hudi - Gitea: Git with a cup of tea

Author	SHA1	Message	Date
Dongwook	8d19ebfd0f	[HUDI-993] Let delete API use "hoodie.delete.shuffle.parallelism" (#1703 ) For Delete API, "hoodie.delete.shuffle.parallelism" isn't used as opposed to "hoodie.upsert.shuffle.parallelism" is used for upsert, this creates the performance difference between delete by upsert API with "EmptyHoodieRecordPayload" and delete API for certain cases. This patch makes the following fixes in this regard. - Let deduplicateKeys method use "hoodie.delete.shuffle.parallelism" - Repartition inputRDD as "hoodie.delete.shuffle.parallelism" in case "hoodie.combine.before.delete=false"	2020-09-01 12:55:31 -04:00
Balaji Varadarajan	b8f4a30efd	Fix Integration test flakiness in HoodieJavaStreamingApp (#1967 )	2020-08-14 01:42:15 -07:00
Sivabalan Narayanan	379cf0786f	[HUDI-1013] Adding Bulk Insert V2 implementation (#1834 ) - Adding ability to use native spark row writing for bulk_insert - Controlled by `ENABLE_ROW_WRITER_OPT_KEY` datasource write option - Introduced KeyGeneratorInterface in hudi-client, moved KeyGenerator back to hudi-spark - Simplified the new API additions to just two new methods : getRecordKey(row), getPartitionPath(row) - Fixed all built-in key generators with new APIs - Made the field position map lazily created upon the first call to row based apis - Implemented native row based key generators for CustomKeyGenerator - Fixed all the tests, with these new APIs Co-authored-by: Balaji Varadarajan <varadarb@uber.com> Co-authored-by: Vinoth Chandar <vinoth@apache.org>	2020-08-13 00:33:39 -07:00
Balaji Varadarajan	7a2429f5ba	[HUDI-575] Spark Streaming with async compaction support (#1752 )	2020-08-05 07:50:15 -07:00
Raymond Xu	10e4268792	[HUDI-995] Use Transformations, Assertions and SchemaTestUtil (#1884 ) - Consolidate transform functions for tests in Transformations.java - Consolidate assertion functions for tests in Assertions.java - Make use of SchemaTestUtil for loading schema from resource	2020-08-01 20:57:18 +08:00
Raymond Xu	ca36c44cb3	[HUDI-995] Move TestRawTripPayload and HoodieTestDataGenerator to hudi-common (#1873 )	2020-07-27 19:21:45 +08:00
Raymond Xu	742c204099	[HUDI-811] Restructure test packages in hudi-client/cli (#1689 )	2020-06-02 10:25:42 +08:00
Raymond Xu	0d4848b68b	[HUDI-811] Restructure test packages (#1607 ) * restructure hudi-spark tests * restructure hudi-timeline-service tests * restructure hudi-hadoop-mr hudi-utilities tests * restructure hudi-hive-sync tests	2020-05-13 15:37:03 -07:00
vinoth chandar	c2c0f6b13d	[HUDI-509] Renaming code in sync with cWiki restructuring (#1212 ) - Storage Type replaced with Table Type (remaining instances) - View types replaced with query types; - ReadOptimized view referred as Snapshot Query - TableFileSystemView sub interfaces renamed to BaseFileOnly and Slice Views - HoodieDataFile renamed to HoodieBaseFile - Hive Sync tool will register RO tables for MOR with a `_ro` suffix - Datasource/Deltastreamer options renamed accordingly - Support fallback to old config values as well, so migration is painless - Config for controlling _ro suffix addition - Renaming DataFile to BaseFile across DTOs, HoodieFileSlice and AbstractTableFileSystemView	2020-01-16 23:58:47 -08:00
Mehrotra	2bb0c21a3d	Fix conversion of Spark struct type to Avro schema cr https://code.amazon.com/reviews/CR-17184364	2020-01-14 00:27:56 -08:00
lamber-ken	d9675c4ec0	[HUDI-522] Use the same version jcommander uniformly (#1214 )	2020-01-12 10:48:52 -08:00
vinoth chandar	9706f659db	[HUDI-508] Standardizing on "Table" instead of "Dataset" across code (#1197 ) - Docs were talking about storage types before, cWiki moved to "Table" - Most of code already has HoodieTable, HoodieTableMetaClient - correct naming - Replacing renaming use of dataset across code/comments - Few usages in comments and use of Spark SQL DataSet remain unscathed	2020-01-07 12:52:32 -08:00
lamber-ken	d447e2d751	[checkstyle] Unify LOG form (#1092 )	2019-12-10 19:23:38 +08:00
lamber-ken	2745b7552f	[HUDI-379] Refactor the codes based on new JavadocStyle code style rule (#1079 )	2019-12-06 12:59:28 +08:00
谢磊	f9139c0f61	[HUDI-366] Refactor some module codes based on new ImportOrder code style rule (#1055 ) [HUDI-366] Refactor hudi-hadoop-mr / hudi-timeline-service / hudi-spark / hudi-integ-test / hudi- utilities based on new ImportOrder code style rule	2019-11-27 21:32:43 +08:00
filippo balicchia	845a0509b3	[MINOR] Some minor optimizations in HoodieJavaStreamingApp (#1046 )	2019-11-25 18:49:13 +08:00
leesf	b19bed442d	[HUDI-296] Explore use of spotless to auto fix formatting errors (#945 ) - Add spotless format fixing to project - One time reformatting for conformity - Build fails for formatting changes and mvn spotless:apply autofixes them	2019-10-10 05:19:40 -07:00
Balaji Varadarajan	a4f9d7575f	HUDI-123 Rename code packages/constants to org.apache.hudi (#830 ) - Rename com.uber.hoodie to org.apache.hudi - Flag to pass com.uber.hoodie Input formats for hoodie-sync - Works with HUDI demo. - Also tested for backwards compatibility with datasets built by com.uber.hoodie packages - Migration guide : https://cwiki.apache.org/confluence/display/HUDI/Migration+Guide+From+com.uber.hoodie+to+org.apache.hudi	2019-08-11 17:48:17 -07:00

18 Commits