lanyuanxiaoyao/hudi - hudi - Gitea: Git with a cup of tea

Author	SHA1	Message	Date
yuzhao.cyz	a1d0ff4209	Moving to 0.11.0-SNAPSHOT on master branch.	2021-11-27 17:22:10 +08:00
Udit Mehrotra	3e301196bf	Moving to 0.10.0-SNAPSHOT on master branch.	2021-08-14 18:51:09 -07:00
garyli1019	6e803e08b1	Moving to 0.9.0-SNAPSHOT on master branch.	2021-03-24 21:37:14 +08:00
Vinoth Chandar	3719e7b388	Moving to 0.8.0-SNAPSHOT on master branch.	2021-01-20 11:31:22 -08:00
Sivabalan Narayanan	a43e191d6c	[MINOR] Bumping snapshot version to 0.7.0 (#2435 )	2021-01-16 09:56:28 -05:00
Shen Hong	d9411c38db	[HUDI-1364] Add HoodieJavaEngineContext to hudi-java-client (#2222 )	2020-11-23 10:06:28 -08:00
wangxianghu	c7d962efff	[HUDI-1328] Introduce HoodieFlinkEngineContext to hudi-flink-client (#2161 )	2020-10-14 09:30:49 +08:00
Mathieu	1f7add9291	[HUDI-1089] Refactor hudi-client to support multi-engine (#1827 ) - This change breaks `hudi-client` into `hudi-client-common` and `hudi-spark-client` modules - Simple usages of Spark using jsc.parallelize() has been redone using EngineContext#map, EngineContext#flatMap etc - Code changes in the PR, break classes into `BaseXYZ` parent classes with no spark dependencies living in `hudi-client-common` - Classes on `hudi-spark-client` are named `SparkXYZ` extending the parent classes with all the Spark dependencies - To simplify/cleanup, HoodieIndex#fetchRecordLocation has been removed and its usages in tests replaced with alternatives Co-authored-by: Vinoth Chandar <vinoth@apache.org>	2020-10-01 14:25:29 -07:00
Prashant Wason	6461927eac	[HUDI-960] Implementation of the HFile base and log file format. (#1804 ) * [HUDI-960] Implementation of the HFile base and log file format. 1. Includes HFileWriter and HFileReader 2. Includes HFileInputFormat for both snapshot and realtime input format for Hive 3. Unit test for new code 4. IT for using HFile format and querying using Hive (Presto and SparkSQL are not supported) Advantage: HFile file format saves data as binary key-value pairs. This implementation chooses the following values: 1. Key = Hoodie Record Key (as bytes) 2. Value = Avro encoded GenericRecord (as bytes) HFile allows efficient lookup of a record by key or range of keys. Hence, this base file format is well suited to applications like RFC-15, RFC-08 which will benefit from the ability to lookup records by key or search in a range of keys without having to read the entire data/log format. Limitations: HFile storage format has certain limitations when used as a general purpose data storage format. 1. Does not have a implemented reader for Presto and SparkSQL 2. Is not a columnar file format and hence may lead to lower compression levels and greater IO on query side due to lack of column pruning Other changes: - Remove databricks/avro from pom - Fix HoodieClientTestUtils from not using scala imports/reflection based conversion etc - Breaking up limitFileSize(), per parquet and hfile base files - Added three new configs for HoodieHFileConfig - prefetchBlocksOnOpen, cacheDataInL1, dropBehindCacheCompaction - Throw UnsupportedException in HFileReader.getRecordKeys() - Updated HoodieCopyOnWriteTable to create the correct merge handle (HoodieSortedMergeHandle for HFile and HoodieMergeHandle otherwise) * Fixing checkstyle Co-authored-by: Vinoth Chandar <vinoth@apache.org>	2020-08-31 08:05:59 -07:00
Bhavani Sudha Saktheeswaran	4226d75144	Moving to 0.6.1-SNAPSHOT on master branch.	2020-08-14 12:54:15 -07:00
Udit Mehrotra	e4a2d98f79	[HUDI-426] Bootstrap datasource integration (#1702 )	2020-08-09 14:06:13 -07:00
liujinhui	6b349b7711	[HUDI-210] Hudi Supports Prometheus Pushgateway (#1931 ) Co-authored-by: leesf <leesf@apache.org>	2020-08-09 15:29:54 +08:00
Raymond Xu	b399b4ad43	[HUDI-996] Add functional test in hudi-client (#1824 ) - Add functional test suite in hudi-client - Tag TestHBaseIndex as functional	2020-07-15 08:28:50 +08:00
Raymond Xu	acdc4a8d00	[HUDI-798] Migrate to Mockito Jupiter for JUnit 5 (#1521 )	2020-04-16 16:07:32 +08:00
Raymond Xu	d65efe659d	[HUDI-780] Migrate test cases to Junit 5 (#1504 )	2020-04-15 12:35:01 -07:00
Sivabalan Narayanan	ac73bdcdc3	[HUDI-430] Adding InlineFileSystem to support embedding any file format as an InlineFile (#1176 ) * Adding InlineFileSystem to support embedding any file format (parquet, hfile, etc). Supports reading the embedded file using respective readers.	2020-03-28 12:13:35 -04:00
ForwardXu	1e321c2fc0	[HUDI-209] Implement JMX metrics reporter (#1106 )	2020-03-19 20:10:35 +08:00
Suneel Marthi	99b7e9eb9e	[HUDI-629]: Replace Guava's Hashing with an equivalent in NumericUtils.java (#1350 ) * [HUDI-629]: Replace Guava's Hashing with an equivalent in NumericUtils.java	2020-03-13 20:28:05 -04:00
yanghua	0dc8e493aa	Moving to 0.6.0-SNAPSHOT on master branch.	2020-03-01 15:08:30 +08:00
leesf	6e59c1c777	Moving to 0.5.2-SNAPSHOT on master branch.	2020-01-20 10:51:33 -08:00
wenningd	292c1e2ff4	[HUDI-238] Make Hudi support Scala 2.12 (#1226 ) * [HUDI-238] Rename scala related artifactId & add maven profile to support Scala 2.12	2020-01-17 14:02:21 -08:00
lamber-ken	d9675c4ec0	[HUDI-522] Use the same version jcommander uniformly (#1214 )	2020-01-12 10:48:52 -08:00
Abhishek Modi	b5df6723a2	[HUDI-464] Use Hive Exec Core for tests (#1125 )	2020-01-06 16:32:55 -08:00
hejinbiao123	b9fab0b933	Revert "[HUDI-455] Redo hudi-client log statements using SLF4J (#1145 )" (#1181 ) This reverts commit `e637d9ed26`.	2020-01-06 21:13:29 +08:00
hejinbiao123	e637d9ed26	[HUDI-455] Redo hudi-client log statements using SLF4J (#1145 ) * [HUDI-455] Redo hudi-client log statements using SLF4J	2019-12-31 13:49:34 +08:00
leesf	b19bed442d	[HUDI-296] Explore use of spotless to auto fix formatting errors (#945 ) - Add spotless format fixing to project - One time reformatting for conformity - Build fails for formatting changes and mvn spotless:apply autofixes them	2019-10-10 05:19:40 -07:00
Balaji Varadarajan	9b66ea41fd	[HUDI-121] Remove leftover notice file and replace com.uber.hoodie with org.apache.hudi in log4j properties	2019-10-04 09:18:57 -07:00
Balaji Varadarajan	c1e7d0e5a6	[HUDI-121] Update Release notes and fix master version	2019-09-17 09:50:30 -07:00
Balaji Varadarajan	d2525c31b7	Moving to 0.6.0-SNAPSHOT on master branch.	2019-09-13 09:58:29 -07:00
vinoth chandar	cd090871a1	[HUDI-159]: Pom cleanup and removal of com.twitter.parquet - Redo all classes based on org.parquet only - remove unuused dependencies like parquet-hadoop, common-configuration2 - timeline-service does not build a fat jar anymore - Fix utilities and hadoop-mr bundles based on above	2019-08-25 16:01:14 -07:00
vinoth chandar	6edf0b9def	[HUDI-68] Pom cleanup & demo automation (#846 ) - [HUDI-172] Cleanup Maven POM/Classpath - Fix ordering of dependencies in poms, to enable better resolution - Idea is to place more specific ones at the top - And place dependencies which use them below them - [HUDI-68] : Automate demo steps on docker setup - Move hive queries from hive cli to beeline - Standardize on taking query input from text command files - Deltastreamer ingest, also does hive sync in a single step - Spark Incremental Query materialized as a derived Hive table using datasource - Fix flakiness in HDFS spin up and output comparison - Code cleanup around streamlining and loc reduction - Also fixed pom to not shade some hive classs in spark, to enable hive sync	2019-08-22 20:18:50 -07:00
Balaji Varadarajan	a4f9d7575f	HUDI-123 Rename code packages/constants to org.apache.hudi (#830 ) - Rename com.uber.hoodie to org.apache.hudi - Flag to pass com.uber.hoodie Input formats for hoodie-sync - Works with HUDI demo. - Also tested for backwards compatibility with datasets built by com.uber.hoodie packages - Migration guide : https://cwiki.apache.org/confluence/display/HUDI/Migration+Guide+From+com.uber.hoodie+to+org.apache.hudi	2019-08-11 17:48:17 -07:00

32 Commits