lanyuanxiaoyao/hudi - hudi

Author	SHA1	Message	Date
vinothchandar	9ca6f91e97	Perform consistency checks during write finalize - Check to ensure written files are listable on storage - Docs reflected to capture how this helps with s3 storage - Unit tests added, corrections to existing tests - Fix DeltaStreamer to manage archived commits in a separate folder	2018-09-28 08:04:41 +05:30
Balaji Varadarajan	4c74dd4cad	Travis CI tests need to be run in quieter mode (WARN log level) to avoid max log-size errors	2018-09-26 21:10:20 +05:30
Vinoth Chandar	bd5af89f12	[maven-release-plugin] rollback the release of hoodie-0.4.4	2018-09-13 15:01:53 +05:30
Vinoth Chandar	d1cc864a43	[maven-release-plugin] prepare for next development iteration	2018-09-12 23:59:47 +05:30
Vinoth Chandar	b748bc836d	[maven-release-plugin] prepare release hoodie-0.4.4	2018-09-12 23:59:34 +05:30
Vinoth Chandar	a5359662be	Moving dependencies off cdh to apache + Hive2 support - Tests redone in the process - Main changes are to RealtimeRecordReader and how it treats maps/arrays - Make hive sync work with Hive 1/2 and CDH environments - Fixes to make corner cases for Hive queries - Spark Hive integration - Working version across Apache and CDH versions - Known Issue - https://github.com/uber/hudi/issues/439	2018-09-11 11:03:30 +05:30
Vinoth Chandar	d58ddbd999	Reworking the deltastreamer tool - Standardize version of jackson - DFSPropertiesConfiguration replaces usage of commons PropertiesConfiguration - Remove dependency on ConstructorUtils - Throw error if ordering value is not present, during key generation - Switch to shade plugin for hoodie-utilities - Added support for consumption for Confluent avro kafka serdes - Support for Confluent schema registry - KafkaSource now deals with skews nicely, by doing round robin allocation of source limit across partitions - Added support for BULK_INSERT operations as well - Pass in the payload class config properly into HoodieWriteClient - Fix documentation based on new usage - Adding tests on deltastreamer, sources and all new util classes.	2018-09-08 10:24:32 +08:00
Balaji Varadarajan	e2dee68ccd	Simplify and fix CLI to schedule and run compactions	2018-09-07 05:28:13 +08:00
Nishith Agarwal	459e523d9e	1. Small file size handling for inserts into log files. In summary, the total size of the log file is compared with the parquet max file size and if there is scope to add inserts then add them.	2018-09-06 08:52:08 +08:00
Vinoth Chandar	89cd6b0726	[maven-release-plugin] prepare for next development iteration	2018-08-22 21:30:05 -07:00
Vinoth Chandar	8d305c5a86	[maven-release-plugin] prepare release hoodie-0.4.3	2018-08-22 21:29:53 -07:00
Balaji Varadarajan	594059a19c	Add CLI support inspect, schedule and run compaction	2018-08-07 08:19:50 -07:00
Vinoth Chandar	34827d50e1	[maven-release-plugin] prepare for next development iteration	2018-06-11 08:59:13 -07:00
Vinoth Chandar	43ef385730	[maven-release-plugin] prepare release hoodie-0.4.2	2018-06-11 08:59:02 -07:00
Balaji Varadarajan	788e4f2d2e	CodeStyle formatting to conform to basic Checkstyle rules. The code-style rules follow google style with some changes: 1. Increase line length from 100 to 120 2. Disable JavaDoc related checkstyles as this needs more manual work. Both source and test code are checked for code-style	2018-03-30 11:09:40 -07:00
Jian Xu	7f079632a6	Use hadoopConf in HoodieTableMetaClient and related tests	2018-03-12 11:47:55 -07:00
Vinoth Chandar	73534d467f	[maven-release-plugin] prepare for next development iteration	2018-03-07 21:04:10 -08:00
Vinoth Chandar	f2e5c6f9f8	[maven-release-plugin] prepare release hoodie-0.4.1	2018-03-07 21:04:00 -08:00
vinothchandar	21ce846f18	Remove stateful fs member from HoodieTestUtils & FSUtils	2018-01-17 23:34:21 -08:00
Vinoth Chandar	0cd186c899	Multi FS Support - Reviving PR 191, to make FileSystem creation off actual path - Streamline all filesystem access to HoodieTableMetaClient - Hadoop Conf from Spark Context serialized & passed to executor code too - Pick up env vars prefixed with HOODIE_ENV_ into Configuration object - Cleanup usage of FSUtils.getFS, piggybacking off HoodieTableMetaClient.getFS - Adding s3a to supported schemes & support escaping "." in env vars - Tests use HoodieTestUtils.getDefaultHadoopConf	2018-01-17 23:34:21 -08:00
Nishith Agarwal	44839b88c6	Removing compaction action type and associated compaction timeline operations, replace with commit action type	2018-01-09 09:56:15 -08:00
Vinoth Chandar	e45679f5e2	Reformatting code per Google Code Style all over	2017-11-12 23:19:02 -08:00
Nishith Agarwal	abe964bebd	Implementing custom payload/merge hooks abstractions for application specific merge logic	2017-11-07 18:55:55 -08:00
Vinoth Chandar	e1fe3ab937	[maven-release-plugin] prepare for next development iteration	2017-10-02 22:42:54 -07:00
Vinoth Chandar	50139fe904	[maven-release-plugin] prepare release hoodie-0.4.0	2017-10-02 22:42:32 -07:00
Vinoth Chandar	64e0573aca	Adding hoodie-spark to support Spark Datasource for Hoodie - Write with COW/MOR paths work fully - Read with RO view works on both storages* - Incremental view supported on COW - Refactored out HoodieReadClient methods, to just contain key based access - HoodieDataSourceHelpers class can be now used to construct inputs to datasource - Tests in hoodie-client using new helpers and mechanisms - Basic tests around save modes & insert/upserts (more to follow) - Bumped up scala to 2.11, since 2.10 is deprecated & complains with scalatest - Updated documentation to describe usage - New sample app written using the DataSource API	2017-10-02 20:44:53 -07:00
Nishith Agarwal	e2d13c6305	Fix build failing issues	2017-09-07 10:54:36 -07:00
Prasanna Rajaperumal	7d3963b4ab	Pushing master to 0.4.0 as we continue to make minor releases over 0.3.8 (MVP for MOR)	2017-06-30 11:41:23 -07:00
Nishith Agarwal	3eba812a1b	[maven-release-plugin] prepare for next development iteration	2017-06-30 11:17:07 -07:00
Nishith Agarwal	06d44daea3	[maven-release-plugin] prepare release hoodie-0.3.9	2017-06-30 11:16:58 -07:00
Jian Xu	29b906b763	Fix TimestampBasedKeyGenerator when DATE_STRING is used for partitionpath.field	2017-06-27 13:02:06 -07:00
Vinoth Chandar	754ab88a2d	Introduce ReadOptimizedView & RealtimeView out of TableFileSystemView - Usage now marks code as clearly using either RO or RT views, for future evolution - Tests on all of FileGroups and FileSlices	2017-06-22 17:16:13 -07:00
Vinoth Chandar	c00f1a9ed9	Refactoring HoodieTableFileSystemView using FileGroups/FileSlices - Merged all filter* and get* methods - new constructor takes filestatus[] - All existing tests pass - FileGroup is all files that belong to a fileID within a partition - FileSlice is a generation of data and log files, starting at a base commit	2017-06-22 17:16:13 -07:00
Prasanna Rajaperumal	0ed3fac5e3	[maven-release-plugin] prepare for next development iteration	2017-06-16 11:03:17 -07:00
Prasanna Rajaperumal	45732e440c	[maven-release-plugin] prepare release hoodie-0.3.8	2017-06-16 10:59:58 -07:00
Prasanna Rajaperumal	933cc8071f	[maven-release-plugin] prepare for next development iteration	2017-05-24 14:02:50 -07:00
Prasanna Rajaperumal	bebae06b5b	[maven-release-plugin] prepare release hoodie-0.3.7	2017-05-24 14:02:41 -07:00
Vinoth Chandar	da17c5c607	Introduce getCommitsAndCompactionsTimeline() explicitly & adjust usage across code base	2017-05-01 21:48:27 -07:00
Vinoth Chandar	bae0528013	Cleanup calls to HoodieTimeline.compareTimeStamps	2017-05-01 21:48:27 -07:00
Vinoth Chandar	7b1446548f	Initial impl of HoodieRealtimeInputFormat - Works end-end for flat schemas - Schema evolution & hardening remains - HoodieClientExample can now write mor tables as well	2017-05-01 21:48:27 -07:00
Prasanna Rajaperumal	c3258039f0	[maven-release-plugin] prepare for next development iteration	2017-04-27 11:00:56 -07:00
Prasanna Rajaperumal	de1bdad756	[maven-release-plugin] prepare release hoodie-0.3.6	2017-04-27 11:00:45 -07:00
Vinoth Chandar	dce35ff0d7	Adding a config to control whether date partitioning can be assumed - false by default - CAUTION: If you have existing tables without partition metadata, you need to set this to "true"	2017-04-03 18:28:01 -07:00
Vinoth Chandar	f9fd16069d	FSUtils.getAllPartitionsPaths() works based on .hoodie_partition_metadata - clean/rollback/write paths covered by existing tests - Snapshot copier fixed to copy metadata file also, and test fixed - Existing tables need to be repaired by addition of metadata, before this can be rolled out	2017-04-03 18:28:01 -07:00
Prasanna Rajaperumal	57ab7a2405	[maven-release-plugin] prepare for next development iteration	2017-03-31 14:58:55 -07:00
Prasanna Rajaperumal	803c635098	[maven-release-plugin] prepare release hoodie-0.3.5	2017-03-31 14:58:46 -07:00
Prasanna Rajaperumal	f4bb44c1b1	Update snapshot version to 0.3.5-SNAPSHOT	2017-03-31 14:54:54 -07:00
ovj	21898907c1	tool for importing hive tables (in parquet format) into hoodie dataset (#89 ) * tool for importing hive tables (in parquet format) into hoodie dataset * review fixes * review fixes * review fixes	2017-03-21 14:42:13 -07:00
vinoth chandar	69d3950a32	Revamped Deltastreamer (#93 ) * Add analytics to site * Fix ugly favicon * New & Improved HoodieDeltaStreamer - Can incrementally consume from HDFS or Kafka, with exactly-once semantics! - Supports Json/Avro data, Source can also do custom things - Source is totally pluggable, via reflection - Key generation is pluggable, currently added SimpleKeyGenerator - Schema provider is pluggable, currently Filebased schemas - Configurable field to break ties during preCombine - Finally, can also plugin the HoodieRecordPayload, to get other merge types than overwriting - Handles efficient avro serialization in Spark Pending : - Rewriting of HiveIncrPullSource - Hive sync via hoodie-hive - Cleanup & tests * Minor fixes from master rebase * Implementation of HiveIncrPullSource - Copies commit by commit from source to target * Adding TimestampBasedKeyGenerator - Supports unix time & date strings	2017-03-13 12:41:29 -07:00
prazanna	eb46e7c72b	Implement Merge on Read Storage (#76 ) 1. Create HoodieTable abstraction for commits and fileSystemView 2. HoodieMergeOnReadTable created 3. View is now always obtained from the table and the correct view based on the table type is returned	2017-02-21 16:24:38 -08:00

74 Commits