lanyuanxiaoyao/hudi - hudi

Author	SHA1	Message	Date
jshmchenxi	c3e9243ea1	[MINOR] Add maven profile to support skipping shade sources jars (#2358 ) Co-authored-by: Xi Chen <chenxi07@qiyi.com>	2021-01-03 23:19:48 -05:00
Danny Chan	76faf59652	[HUDI-1495] Upgrade Flink version to 1.12.0 (#2384 )	2020-12-29 10:15:43 +08:00
wenningd	fce1453fa6	[HUDI-1040] Make Hudi support Spark 3 (#2208 ) * Fix flaky MOR unit test * Update Spark APIs to make it be compatible with both spark2 & spark3 * Refactor bulk insert v2 part to make Hudi be able to compile with Spark3 * Add spark3 profile to handle fasterxml & spark version * Create hudi-spark-common module & refactor hudi-spark related modules Co-authored-by: Wenning Ding <wenningd@amazon.com>	2020-12-09 15:52:23 -08:00
wangxianghu	4d05680038	[HUDI-1327] Introduce base implementation of hudi-flink-client (#2176 )	2020-11-18 17:57:11 +08:00
Bhavani Sudha Saktheeswaran	6490b029dd	[HUDI-1345] Remove Hbase and htrace relocation from utilities bundle (#2185 )	2020-10-19 16:11:08 -05:00
rmpifer	a44f66869f	[HUDI-1289] Remove relocation of pattern for hbase dependencies and add shading of guava in hadoop, spark, and presto bundles (#2147 ) - Update hudi-spark-bundle pom to not relocate hbase and htrace pattern - Remove codec relocation as this is not included in bundle which was causing error	2020-10-14 17:04:35 -07:00
Pratyaksh Sharma	080ba3ed54	[HUDI-1199] relocated jetty in hudi-utilities-bundle pom (#1990 ) * [HUDI-1199]: relocated jetty in hudi-utilities-bundle pom * [HUDI-1199]: re trigger travis build	2020-10-04 11:22:01 -07:00
Mathieu	1f7add9291	[HUDI-1089] Refactor hudi-client to support multi-engine (#1827 ) - This change breaks `hudi-client` into `hudi-client-common` and `hudi-spark-client` modules - Simple usages of Spark using jsc.parallelize() has been redone using EngineContext#map, EngineContext#flatMap etc - Code changes in the PR, break classes into `BaseXYZ` parent classes with no spark dependencies living in `hudi-client-common` - Classes on `hudi-spark-client` are named `SparkXYZ` extending the parent classes with all the Spark dependencies - To simplify/cleanup, HoodieIndex#fetchRecordLocation has been removed and its usages in tests replaced with alternatives Co-authored-by: Vinoth Chandar <vinoth@apache.org>	2020-10-01 14:25:29 -07:00
Abhishek Modi	53d1e55110	Test Suite should work with Docker + Unit Tests	2020-09-08 22:41:14 -07:00
chuangehh	51b16bd36f	[MINOR] fix typo	2020-09-08 11:55:38 +08:00
Prashant Wason	6461927eac	[HUDI-960] Implementation of the HFile base and log file format. (#1804 ) * [HUDI-960] Implementation of the HFile base and log file format. 1. Includes HFileWriter and HFileReader 2. Includes HFileInputFormat for both snapshot and realtime input format for Hive 3. Unit test for new code 4. IT for using HFile format and querying using Hive (Presto and SparkSQL are not supported) Advantage: HFile file format saves data as binary key-value pairs. This implementation chooses the following values: 1. Key = Hoodie Record Key (as bytes) 2. Value = Avro encoded GenericRecord (as bytes) HFile allows efficient lookup of a record by key or range of keys. Hence, this base file format is well suited to applications like RFC-15, RFC-08 which will benefit from the ability to lookup records by key or search in a range of keys without having to read the entire data/log format. Limitations: HFile storage format has certain limitations when used as a general purpose data storage format. 1. Does not have an implemented reader for Presto and SparkSQL 2. Is not a columnar file format and hence may lead to lower compression levels and greater IO on query side due to lack of column pruning Other changes: - Remove databricks/avro from pom - Fix HoodieClientTestUtils from not using scala imports/reflection based conversion etc - Breaking up limitFileSize(), per parquet and hfile base files - Added three new configs for HoodieHFileConfig - prefetchBlocksOnOpen, cacheDataInL1, dropBehindCacheCompaction - Throw UnsupportedException in HFileReader.getRecordKeys() - Updated HoodieCopyOnWriteTable to create the correct merge handle (HoodieSortedMergeHandle for HFile and HoodieMergeHandle otherwise) * Fixing checkstyle Co-authored-by: Vinoth Chandar <vinoth@apache.org>	2020-08-31 08:05:59 -07:00
Mathieu	b883b6d268	[HUDI-1122] Introduce a kafka implementation of hoodie write commit ca… (#1886 )	2020-08-20 23:00:59 +08:00
Bhavani Sudha Saktheeswaran	4226d75144	Moving to 0.6.1-SNAPSHOT on master branch.	2020-08-14 12:54:15 -07:00
Udit Mehrotra	8d04268264	[HUDI-1174] Changes for bootstrapped tables to work with presto (#1944 ) The purpose of this pull request is to implement changes required on Hudi side to get Bootstrapped tables integrated with Presto. The testing was done against presto 0.232 and following changes were identified to make it work: Annotation UseRecordReaderFromInputFormat is required on HoodieParquetInputFormat as well, because the reading for bootstrapped tables needs to happen through record reader to be able to perform the merge. On presto side, this annotation is already handled. We need to internally maintain VIRTUAL_COLUMN_NAMES because presto's internal hive version hive-apache-1.2.2 has VirtualColumn as a class, versus the one we depend on in hudi which is an enum. Dependency changes in hudi-presto-bundle to avoid runtime exceptions.	2020-08-12 17:51:31 -07:00
liujinhui	934f00b689	[HUDI-1173] fix hudi-prometheus pom dependency (#1942 )	2020-08-11 09:06:17 +08:00
lw0090	51ea27d665	[HUDI-875] Abstract hudi-sync-common, and support hudi-hive-sync, hudi-dla-sync (#1810 ) - Generalize the hive-sync module for syncing to multiple metastores - Added new options for datasource - Added new command line for delta streamer Co-authored-by: Vinoth Chandar <vinoth@apache.org>	2020-08-05 21:34:55 -07:00
vinoth chandar	539621bd33	[HUDI-242] Support for RFC-12/Bootstrapping of external datasets to hudi (#1876 ) - [HUDI-418] Bootstrap Index Implementation using HFile with unit-test - [HUDI-421] FileSystem View Changes to support Bootstrap with unit-tests - [HUDI-424] Implement Query Side Integration for querying tables containing bootstrap file slices - [HUDI-423] Implement upsert functionality for handling updates to these bootstrap file slices - [HUDI-421] Bootstrap Write Client with tests - [HUDI-425] Added HoodieDeltaStreamer support - [HUDI-899] Add a knob to change partition-path style while performing metadata bootstrap - [HUDI-900] Metadata Bootstrap Key Generator needs to handle complex keys correctly - [HUDI-424] Simplify Record reader implementation - [HUDI-423] Implement upsert functionality for handling updates to these bootstrap file slices - [HUDI-420] Hoodie Demo working with hive and sparkSQL. Also, Hoodie CLI working with bootstrap tables Co-authored-by: Mehrotra <uditme@amazon.com> Co-authored-by: Vinoth Chandar <vinoth@apache.org> Co-authored-by: Balaji Varadarajan <varadarb@uber.com>	2020-08-03 20:19:21 -07:00
Nishith Agarwal	2fc2b01d86	[HUDI-394] Provide a basic implementation of test suite	2020-07-30 21:21:15 -07:00
Mathieu	da106803b6	[HUDI-1037] Introduce a write committed callback hook and give a default http callback implementation (#1842 )	2020-07-23 19:07:05 +08:00
Cory Locklear	574dcf920c	[MINOR] Relocate jetty during shading/packaging for Databricks runtime (#1781 )	2020-07-03 16:22:52 -07:00
Raymond Xu	31247e9b34	[HUDI-896] Report test coverage by modules & parallelize CI (#1753 ) - use codecov flags for each module to report coverage - parallelize CI jobs for shorter time - add a testcase for MetricsReporterFactory (to trigger codecov comment)	2020-06-27 23:16:12 -07:00
Raymond Xu	f34de3fb27	[HUDI-836] Implement datadog metrics reporter (#1572 ) - Adds support for emitting metrics to datadog - Tests, configs..	2020-05-22 09:14:21 -07:00
Udit Mehrotra	404c7e82d9	[HUDI-884] Shade avro and parquet-avro in hudi-hive-sync-bundle (#1618 ) Co-authored-by: Mehrotra <uditme@amazon.com>	2020-05-12 11:40:31 -07:00
bschell	e21441ad83	Add changes for presto mor queries (#1578 ) Adds the necessary changes to hudi for support of presto querying hudi merge-on-read table's realtime view. Co-authored-by: Brandon Scheller <bschelle@amazon.com>	2020-05-04 11:27:14 -07:00
Trevor	2a611f4ad3	[HUDI-749] Fix hudi-timeline-server-bundle run_server.sh start error (#1477 )	2020-04-01 22:19:54 +08:00
lamber-ken	90227eeda7	[HUDI-673] Rename hudi-hive-bundle to hudi-hive-sync-bundle	2020-03-07 21:44:35 +08:00
lamber-ken	ccbf543607	[HUDI-654] Rename hudi-hive to hudi-hive-sync	2020-03-06 22:13:16 +08:00
Bhavani Sudha Saktheeswaran	5f85c26704	[HUDI-584] Relocate spark-avro dependency by maven-shade-plugin (#1290 )	2020-03-04 11:01:49 -08:00
yanghua	0dc8e493aa	Moving to 0.6.0-SNAPSHOT on master branch.	2020-03-01 15:08:30 +08:00
Ramachandran M S	acf359c834	[HUDI-627] Aggregate code coverage and publish to codecov.io during CI (#1347 )	2020-02-27 13:54:20 -08:00
lamber-ken	11fb2c2614	[HUDI-580] Fix incorrect license header in files	2020-02-25 08:54:26 -08:00
lamber-ken	425e3e6c78	[HUDI-585] Optimize the steps of building with scala-2.12 (#1293 )	2020-02-05 23:13:10 +08:00
leesf	6e59c1c777	Moving to 0.5.2-SNAPSHOT on master branch.	2020-01-20 10:51:33 -08:00
wenningd	292c1e2ff4	[HUDI-238] Make Hudi support Scala 2.12 (#1226 ) * [HUDI-238] Rename scala related artifactId & add maven profile to support Scala 2.12	2020-01-17 14:02:21 -08:00
Udit Mehrotra	ad50008a59	[HUDI-91][HUDI-12]Migrate to spark 2.4.4, migrate to spark-avro library instead of databricks-avro, add support for Decimal/Date types - Upgrade Spark to 2.4.4, Parquet to 1.10.1, Avro to 1.8.2 - Remove spark-avro from hudi-spark-bundle. Users need to provide --packages org.apache.spark:spark-avro:2.4.4 when running spark-shell or spark-submit - Replace com.databricks:spark-avro with org.apache.spark:spark-avro - Shade avro in hudi-hadoop-mr-bundle to make sure it does not conflict with hive's avro version.	2020-01-12 15:03:11 -08:00
lamber-ken	d9675c4ec0	[HUDI-522] Use the same jcommander version uniformly (#1214 )	2020-01-12 10:48:52 -08:00
Udit Mehrotra	0bb5999f79	[HUDI-306] Support Glue catalog and other hive metastore implementations (#961 ) - Support Glue catalog and other metastore implementations - Remove shading from hudi utilities bundle - Add maven profile to optionally shade hive in utilities bundle	2019-11-11 17:27:31 -08:00
Gurudatt Kulkarni	031b067a3a	[MINOR] Move all repository declarations to parent pom (#966 )	2019-10-22 20:17:13 -07:00
Mehrotra	8c13340062	Shade and relocate Avro dependency in hadoop-mr-bundle	2019-10-16 02:08:12 -07:00
leesf	b19bed442d	[HUDI-296] Explore use of spotless to auto fix formatting errors (#945 ) - Add spotless format fixing to project - One time reformatting for conformity - Build fails for formatting changes and mvn spotless:apply autofixes them	2019-10-10 05:19:40 -07:00
Balaji Varadarajan	9b66ea41fd	[HUDI-121] Remove leftover notice file and replace com.uber.hoodie with org.apache.hudi in log4j properties	2019-10-04 09:18:57 -07:00
Balaji Varadarajan	6da2f9ac7c	[HUDI-287] Address comments during review of release candidate 1. Remove LICENSE and NOTICE files in hoodie child modules. 2. Remove developers and contributor section from pom 3. Also ensure any failures in validation script are reported appropriately 4. Make hoodie parent pom consistent with that of its parent apache-21 (https://github.com/apache/maven-apache-parent/blob/apache-21/pom.xml)	2019-10-03 09:00:07 -07:00
Balaji Varadarajan	6e8a28bcae	HUDI-121 : Address comments during RC2 voting 1. Remove dnl utils jar from git 2. Add LICENSE Headers in missing files 3. Fix NOTICE and LICENSE in all HUDI packages and in top-level 4. Fix License wording in certain HUDI source files 5. Include non java/scala code in RAT licensing check 6. Use whitelist to include dependencies as part of timeline-server bundling	2019-09-30 15:42:15 -07:00
Vinoth Chandar	e217db56ab	[HUDI-254]: Bundle and shade databricks/avro with spark bundle - spark 2.4 onwards, spark has built in support. shading to avoid conflicts - spark 2.3 still needs this bundled, so that dropping bundle into jars folder would work	2019-09-17 12:38:51 -07:00
Balaji Varadarajan	c1e7d0e5a6	[HUDI-121] Update Release notes and fix master version	2019-09-17 09:50:30 -07:00
Balaji Varadarajan	7190c022bb	[HUDI-249] Updating Notice files	2019-09-13 13:50:58 -07:00
Balaji Varadarajan	d2525c31b7	Moving to 0.6.0-SNAPSHOT on master branch.	2019-09-13 09:58:29 -07:00
Vinoth Chandar	d0b9b56b7d	[HUDI-143] Excluding javax.* from utilities and spark bundles - Plus minor code review comments	2019-09-11 11:08:27 -07:00
vinoth chandar	7a973a6944	[HUDI-159] Redesigning bundles for lighter-weight integrations - Documented principles applied for redesign at packaging/README.md - No longer depends on incl commons-codec, commons-io, commons-pool, commons-dbcp, commons-lang, commons-logging, avro-mapred - Introduce new FileIOUtils & added checkstyle rule for illegal import of above - Parquet, Avro dependencies moved to provided scope to enable being picked up from Hive/Spark/Presto instead - Pickup jackson jars for Hive sync tool from HIVE_HOME & unbundling jackson everywhere - Remove hive-jdbc standalone jar from being bundled in Spark/Hive/Utilities bundles - 6.5x reduced number of classes across bundles	2019-09-11 11:08:27 -07:00
leesf	5c2da6051e	[HUDI-225] Create Hudi Timeline Server Fat Jar	2019-08-29 20:03:06 -07:00

91 Commits