lanyuanxiaoyao/hudi

Author	SHA1	Message	Date
Gary Li	4f74a84607	[HUDI-69] Support Spark Datasource for MOR table - RDD approach (#1848 ) - This PR implements Spark Datasource for MOR table in the RDD approach. - Implemented SnapshotRelation - Implemented HudiMergeOnReadRDD - Implemented separate Iterator to handle merge and unmerge record reader. - Added TestMORDataSource to verify this feature. - Clean up test file name, add tests for mixed query type tests - We can now revert the change made in DefaultSource Co-authored-by: Vinoth Chandar <vchandar@confluent.io>	2020-08-07 00:28:14 -07:00
Balaji Varadarajan	9bcd3221fd	[HUDI-1144] Speedup spark read queries by caching metaclient in HoodieROPathFilter (#1919 )	2020-08-05 09:19:10 -07:00
vinoth chandar	539621bd33	[HUDI-242] Support for RFC-12/Bootstrapping of external datasets to hudi (#1876 ) - [HUDI-418] Bootstrap Index Implementation using HFile with unit-test - [HUDI-421] FileSystem View Changes to support Bootstrap with unit-tests - [HUDI-424] Implement Query Side Integration for querying tables containing bootstrap file slices - [HUDI-423] Implement upsert functionality for handling updates to these bootstrap file slices - [HUDI-421] Bootstrap Write Client with tests - [HUDI-425] Added HoodieDeltaStreamer support - [HUDI-899] Add a knob to change partition-path style while performing metadata bootstrap - [HUDI-900] Metadata Bootstrap Key Generator needs to handle complex keys correctly - [HUDI-424] Simplify Record reader implementation - [HUDI-423] Implement upsert functionality for handling updates to these bootstrap file slices - [HUDI-420] Hoodie Demo working with hive and sparkSQL. Also, Hoodie CLI working with bootstrap tables Co-authored-by: Mehrotra <uditme@amazon.com> Co-authored-by: Vinoth Chandar <vinoth@apache.org> Co-authored-by: Balaji Varadarajan <varadarb@uber.com>	2020-08-03 20:19:21 -07:00
Bhavani Sudha Saktheeswaran	d5b593b7d9	[MINOR] change log.info to log.debug (#1883 )	2020-07-28 09:49:03 -07:00
wenningd	bf1d36fa63	[HUDI-1087] Handle decimal type for realtime record reader with SparkSQL (#1831 ) Co-authored-by: Wenning Ding <wenningd@amazon.com>	2020-07-15 07:30:58 -07:00
Satish Kotha	086853c004	[HUDI-1080] Fix backward compatibility for com.uber inputformats	2020-07-08 15:30:07 -07:00
andreitaleanu	37ea79566d	[HUDI-539] Make HoodieROTablePathFilter implement Configurable (#1784 ) Co-authored-by: Andrei Taleanu <taleanu@adobe.com>	2020-07-03 13:39:53 -07:00
Prashant Wason	2603cfb33e	[HUDI-684] Introduced abstraction for writing and reading different types of base file formats. (#1687 ) Notable changes: 1. HoodieFileWriter and HoodieFileReader abstractions for writer/reader side of a base file format 2. HoodieDataBlock abstraction for creating specific data blocks for base file formats. (e.g. Parquet has HoodieAvroDataBlock) 3. All hardcoded references to Parquet / Parquet based classes have been abstracted to call methods which accept a base file format 4. HiveSyncTool accepts the base file format as a CLI parameter 5. HoodieDeltaStreamer accepts the base file format as a CLI parameter 6. HoodieSparkSqlWriter accepts the base file format as a parameter	2020-06-25 23:46:55 -07:00
Shen Hong	89e37d5273	[HUDI-908] Add some data types to HoodieTestDataGenerator and fix some bugs. (#1690 )	2020-06-22 08:13:28 -07:00
Gary Li	37838cea60	[HUDI-822] decouple Hudi related logics from HoodieInputFormat (#1592 ) - Refactoring business logic out of InputFormat into Utils helpers.	2020-06-09 06:10:16 -07:00
lw0090	9e07cebece	[HUDI-974] Fix fields out of order in MOR mode when using Hive (#1711 )	2020-06-09 09:22:06 +08:00
Wenning Ding	7d40f19f39	HUDI-515 Resolve API conflict for Hive 2 & Hive 3	2020-06-08 14:18:38 -07:00
Shen Hong	2901f5423a	[HUDI-1002] Ignore case when setting incremental mode in hive query (#1715 )	2020-06-08 19:38:32 +08:00
hj2016	e0a5e0d343	[HUDI-1000] Fix incremental query for COW non-partitioned table with no data (#1708 )	2020-06-08 15:34:42 +08:00
Yajun Luo	a9a97d6af4	[HUDI-934] Add processing logic for the decimal LogicalType (#1677 )	2020-06-02 19:50:55 +08:00
bschell	e21441ad83	Add changes for presto mor queries (#1578 ) Adds the necessary changes to hudi to support presto querying a hudi merge-on-read table's realtime view. Co-authored-by: Brandon Scheller <bschelle@amazon.com>	2020-05-04 11:27:14 -07:00
n3nash	332072bc6d	[HUDI-371] Supporting hive combine input format for realtime tables (#1503 )	2020-04-20 20:40:06 -07:00
Pratyaksh Sharma	6d7ca2cf7e	[HUDI-727]: Copy default values of fields if not present when rewriting incoming record with new schema (#1427 )	2020-04-12 17:55:26 -07:00
satishkotha	c0f96e0726	[HUDI-687] Stop incremental reader on RO table when there is a pending compaction (#1396 )	2020-04-10 10:45:41 -07:00
Shaofeng Shi	78b3194e82	[HUDI-751] Fix some coding issues reported by FindBugs (#1470 )	2020-03-31 21:19:32 +08:00
Suneel Marthi	fa36082554	[HUDI-746] Reduce build warnings < 10 (#1465 )	2020-03-30 11:46:52 +08:00
vinoth chandar	e057c27603	[HUDI-744] Restructure hudi-common and clean up files under util packages (#1462 ) - Brings more order and cohesion to the classes in hudi-common - Utils classes related to a particular concept (avro, timeline,...) are placed near to the package - common.fs package now contains all the filesystem level classes including wrapper filesystem - bloom.filter package renamed to just bloom - config package contains classes that help store properties - common.fs.inline package contains all the inline filesystem classes/impl - common.table.timeline now consolidates all timeline related classes - common.table.view consolidates all the classes related to filesystem view metadata - common.table.timeline.versioning contains all classes related to versioning of timeline - Fix few unit tests as a result - Moved the test packages around to match the source file move - Rename AvroUtils to TimelineMetadataUtils & minor fixes/typos	2020-03-29 10:58:49 -07:00
Suneel Marthi	8c3001363d	HUDI-479: Eliminate or Minimize use of Guava if possible (#1159 )	2020-03-28 03:11:32 -04:00
vinoth chandar	e3019031d8	[HUDI-539] Make ROPathFilter conf member serializable (#1415 )	2020-03-17 12:52:48 -07:00
bschell	418f9bb2e9	Add constructor to HoodieROTablePathFilter (#1413 ) Allows HoodieROTablePathFilter to accept a configuration for initializing the filesystem. This fixes a bug with Presto's use of this pathfilter. Co-authored-by: Brandon Scheller <bschelle@amazon.com>	2020-03-16 15:19:16 -07:00
Suneel Marthi	99b7e9eb9e	[HUDI-629]: Replace Guava's Hashing with an equivalent in NumericUtils.java (#1350 ) * [HUDI-629]: Replace Guava's Hashing with an equivalent in NumericUtils.java	2020-03-13 20:28:05 -04:00
Suneel Marthi	24e73816b2	[MINOR] Code Cleanup, remove redundant code (#1337 )	2020-02-15 22:03:29 +08:00
Suneel Marthi	594da28fbf	[HUDI-595] code cleanup, refactoring code out of PR# 1159 (#1302 )	2020-02-04 21:52:03 +08:00
Suneel Marthi	5b7bb142dc	[HUDI-583] Code Cleanup, remove redundant code, and other changes (#1237 )	2020-02-02 18:03:44 +08:00
lamber-ken	c06ec8bfc7	[MINOR] Fix assigning to configuration more times (#1291 )	2020-01-29 17:18:35 -05:00
vinoth chandar	c2c0f6b13d	[HUDI-509] Renaming code in sync with cWiki restructuring (#1212 ) - Storage Type replaced with Table Type (remaining instances) - View types replaced with query types; - ReadOptimized view referred as Snapshot Query - TableFileSystemView sub interfaces renamed to BaseFileOnly and Slice Views - HoodieDataFile renamed to HoodieBaseFile - Hive Sync tool will register RO tables for MOR with a `_ro` suffix - Datasource/Deltastreamer options renamed accordingly - Support fallback to old config values as well, so migration is painless - Config for controlling _ro suffix addition - Renaming DataFile to BaseFile across DTOs, HoodieFileSlice and AbstractTableFileSystemView	2020-01-16 23:58:47 -08:00
Bhavani Sudha Saktheeswaran	d09eacdc13	[HUDI-25] Optimize HoodieInputformat.listStatus() for faster Hive incremental queries on Hoodie Summary: - InputPathHandler class classifies inputPaths into incremental, non incremental and non hoodie paths. - Incremental queries leverage HoodieCommitMetadata to get partitions that are affected and only lists those partitions as opposed to listing all partitions - listStatus() processes each category separately	2020-01-08 14:53:05 -08:00
vinoth chandar	9706f659db	[HUDI-508] Standardizing on "Table" instead of "Dataset" across code (#1197 ) - Docs were talking about storage types before, cWiki moved to "Table" - Most of code already has HoodieTable, HoodieTableMetaClient - correct naming - Replacing renaming use of dataset across code/comments - Few usages in comments and use of Spark SQL DataSet remain unscathed	2020-01-07 12:52:32 -08:00
lamber-ken	ab6ae5cebb	[HUDI-482] Fix missing @Override annotation on methods (#1156 ) * [HUDI-482] Fix missing @Override annotation on methods	2019-12-31 11:44:56 +08:00
lamber-ken	ba514cfea0	[MINOR] Remove redundant plus operator (#1097 )	2019-12-12 05:42:05 +08:00
lamber-ken	d447e2d751	[checkstyle] Unify LOG form (#1092 )	2019-12-10 19:23:38 +08:00
lamber-ken	2745b7552f	[HUDI-379] Refactor the codes based on new JavadocStyle code style rule (#1079 )	2019-12-06 12:59:28 +08:00
谢磊	f9139c0f61	[HUDI-366] Refactor some module codes based on new ImportOrder code style rule (#1055 ) [HUDI-366] Refactor hudi-hadoop-mr / hudi-timeline-service / hudi-spark / hudi-integ-test / hudi- utilities based on new ImportOrder code style rule	2019-11-27 21:32:43 +08:00
谢磊	804e348d0e	[HUDI-346] Set allowMultipleEmptyLines to false for EmptyLineSeparator rule (#1025 )	2019-11-19 18:44:42 +08:00
Nishith Agarwal	3a05edab01	- Fixing RT queries for HiveOnSpark that causes race conditions - Adding more comments to understand usage of reader/writer schema	2019-11-16 13:46:47 -08:00
Wenning Ding	b6057c5e0e	[HUDI-314] Fix multi partition keys error when querying a realtime table	2019-11-02 19:49:04 -07:00
Wenning Ding	ee0fd06de7	synchronized lock on conf object instead of class	2019-10-31 21:54:27 -07:00
Wenning Ding	3251d62bd3	[HUDI-313] Fix select count star error when querying a realtime table	2019-10-31 21:54:27 -07:00
Udit Mehrotra	12523c379f	[HUDI-298] Fix issue with incorrect column mapping causing bad data, during on-the-fly merge of Real Time tables (#956 ) * Fix issue with incorrect column mapping causing bad data, during on-the-fly merge of Real Time tables	2019-10-16 02:05:53 -07:00
leesf	b19bed442d	[HUDI-296] Explore use of spotless to auto fix formatting errors (#945 ) - Add spotless format fixing to project - One time reformatting for conformity - Build fails for formatting changes and mvn spotless:apply autofixes them	2019-10-10 05:19:40 -07:00
Balaji Varadarajan	6da2f9ac7c	[HUDI-287] Address comments during review of release candidate 1. Remove LICENSE and NOTICE files in hoodie child modules. 2. Remove developers and contributor section from pom 3. Also ensure any failures in validation script is reported appropriately 4. Make hoodie parent pom consistent with that of its parent apache-21 (https://github.com/apache/maven-apache-parent/blob/apache-21/pom.xml)	2019-10-03 09:00:07 -07:00
Balaji Varadarajan	6e8a28bcae	HUDI-121 : Address comments during RC2 voting 1. Remove dnl utils jar from git 2. Add LICENSE Headers in missing files 3. Fix NOTICE and LICENSE in all HUDI packages and in top-level 4. Fix License wording in certain HUDI source files 5. Include non java/scala code in RAT licensing check 6. Use whitelist to include dependencies as part of timeline-server bundling	2019-09-30 15:42:15 -07:00
Balaji Varadarajan	7190c022bb	[HUDI-249] Updating Notice files	2019-09-13 13:50:58 -07:00
Balaji Varadarajan	93bc5e2153	HUDI-243 Rename HoodieInputFormat and HoodieRealtimeInputFormat to HoodieParquetInputFormat and HoodieParquetRealtimeInputFormat	2019-09-11 14:03:01 -07:00
vinoth chandar	7a973a6944	[HUDI-159] Redesigning bundles for lighter-weight integrations - Documented principles applied for redesign at packaging/README.md - No longer depends on incl commons-codec, commons-io, commons-pool, commons-dbcp, commons-lang, commons-logging, avro-mapred - Introduce new FileIOUtils & added checkstyle rule for illegal import of above - Parquet, Avro dependencies moved to provided scope to enable being picked up from Hive/Spark/Presto instead - Pickup jackson jars for Hive sync tool from HIVE_HOME & unbundling jackson everywhere - Remove hive-jdbc standalone jar from being bundled in Spark/Hive/Utilities bundles - 6.5x reduced number of classes across bundles	2019-09-11 11:08:27 -07:00

53 Commits