lanyuanxiaoyao/hudi - hudi - Gitea: Git with a cup of tea

Author	SHA1	Message	Date
Nishith Agarwal	6a3c94aaa3	suppressing logs (under 4MB) for jenkins	2017-08-15 16:30:51 -07:00
Vinoth Chandar	86209640f7	Adding range based pruning to bloom index - keys compared lexicographically using String::compareTo - Range metadata additionally written into parquet file footers - Trim fat & few optimizations to speed up indexing - Add param to control whether input shall be cached, to speed up lookup - Add param to turn on/off range pruning - Auto compute of parallelism now simply factors in amount of comparisons done - More accurate parallelism computation when range pruning is on - tests added & hardened, docs updated	2017-08-04 13:22:13 -07:00
Nishith Agarwal	0b26b60a5c	fix for cleaning log files(mor)	2017-08-02 11:54:42 -07:00
Nishith Agarwal	19c22b231e	1. Use HoodieLogFormat to archive commits and other actions 2. Introduced avro schema for commits and compactions and an avro wrapper schema	2017-07-26 14:27:44 -07:00
Nishith Agarwal	616c9a68c3	Enabled deletes in merge_on_read	2017-07-26 13:37:27 -07:00
Prasanna Rajaperumal	7d3963b4ab	Pushing master to 0.4.0 as we continue to make minor releases over 0.3.8 (MVP for MOR)	2017-06-30 11:41:23 -07:00
Nishith Agarwal	3eba812a1b	[maven-release-plugin] prepare for next development iteration	2017-06-30 11:17:07 -07:00
Nishith Agarwal	06d44daea3	[maven-release-plugin] prepare release hoodie-0.3.9	2017-06-30 11:16:58 -07:00
Nishith Agarwal	348250d960	Using FsUtils instead of Files API to extract file extension	2017-06-29 19:26:31 -07:00
Prasanna Rajaperumal	5cc071f74e	Savepoint should not create a hole in the commit timeline	2017-06-27 16:36:09 -07:00
Vinoth Chandar	754ab88a2d	Introduce ReadOptimizedView & RealtimeView out of TableFileSystemView - Usage now marks code as clearly using either RO or RT views, for future evolution - Tests on all of FileGroups and FileSlices	2017-06-22 17:16:13 -07:00
Vinoth Chandar	c00f1a9ed9	Refactoring HoodieTableFileSystemView using FileGroups/FileSlices - Merged all filter* and get* methods - new constructor takes filestatus[] - All existing tests pass - FileGroup is all files that belong to a fileID within a partition - FileSlice is a generation of data and log files, starting at a base commit	2017-06-22 17:16:13 -07:00
gekath	52c507f83e	Writes relative paths to .commit files Handle case where path is read in as null from commit file Merged with updated release	2017-06-16 12:51:19 -07:00
gekath	db7311f85e	Writes relative paths to .commit files instead of absolute paths Clean up code Removed commented out code Fixed merge conflict with master	2017-06-16 12:51:19 -07:00
Prasanna Rajaperumal	0ed3fac5e3	[maven-release-plugin] prepare for next development iteration	2017-06-16 11:03:17 -07:00
Prasanna Rajaperumal	45732e440c	[maven-release-plugin] prepare release hoodie-0.3.8	2017-06-16 10:59:58 -07:00
Prasanna Rajaperumal	4b26be9f61	Fixes to RealtimeInputFormat and RealtimeRecordReader and update documentation for HiveSyncTool	2017-06-15 18:21:07 -07:00
Kaushik Devarajaiah	521555c576	Parallelize file version deletes during clean and related tests	2017-06-15 18:20:42 -07:00
Prasanna Rajaperumal	db6150c5ef	Refactor hoodie-hive	2017-06-09 13:06:33 -07:00
Danny Chen	c192dd60b4	Change from deprecated closeQuietly to try with resources	2017-06-05 19:11:53 -07:00
Nishith Agarwal	ba050973e3	updated HoodieRealtimeRecordReader to use HoodieCompactedLogRecordScanner, added test for recordreader	2017-06-02 11:33:59 -07:00
Prasanna Rajaperumal	933cc8071f	[maven-release-plugin] prepare for next development iteration	2017-05-24 14:02:50 -07:00
Prasanna Rajaperumal	bebae06b5b	[maven-release-plugin] prepare release hoodie-0.3.7	2017-05-24 14:02:41 -07:00
Prasanna Rajaperumal	240c91241b	Implement HoodieLogFormat replacing Avro as the default log format	2017-05-23 08:35:11 -07:00
Vinoth Chandar	da17c5c607	Introduce getCommitsAndCompactionsTimeline() explicitly & adjust usage across code base	2017-05-01 21:48:27 -07:00
Vinoth Chandar	bae0528013	Cleanup calls to HoodieTimeline.compareTimeStamps	2017-05-01 21:48:27 -07:00
Vinoth Chandar	7b1446548f	Initial impl of HoodieRealtimeInputFormat - Works end-end for flat schemas - Schema evolution & hardening remains - HoodieClientExample can now write mor tables as well	2017-05-01 21:48:27 -07:00
Vinoth Chandar	9f526396a0	Add support for merge_on_read tables to HoodieClientExample	2017-05-01 21:48:27 -07:00
Prasanna Rajaperumal	c3258039f0	[maven-release-plugin] prepare for next development iteration	2017-04-27 11:00:56 -07:00
Prasanna Rajaperumal	de1bdad756	[maven-release-plugin] prepare release hoodie-0.3.6	2017-04-27 11:00:45 -07:00
Prasanna Rajaperumal	8974e11161	Make sure properties set in HoodieWriteConfig is propogated down to individual configs. Fix a race condition which lets InputFormat to think file size is 0 when it is actually not	2017-04-27 10:52:25 -07:00
Prasanna Rajaperumal	91b088f29f	Implement Compaction policy abstraction. Implement LogSizeBased Bounded IO Compaction as the default strategy	2017-04-20 16:59:06 -07:00
Vinoth Chandar	2b6322318c	CR feedback	2017-04-03 18:28:01 -07:00
Vinoth Chandar	dce35ff0d7	Adding a config to control whether date partitioning can be assumed - false by default - CAUTION: If you have an existing tables without partition metadata, you need to set this to "true"	2017-04-03 18:28:01 -07:00
Vinoth Chandar	f9fd16069d	FSUtils.getAllPartitionsPaths() works based on .hoodie_partition_metadata - clean/rollback/write paths covered by existing tests - Snapshot copier fixed to copy metadata file also, and test fixed - Existing tables need to be repaired by addition of metadata, before this can be rolled out	2017-04-03 18:28:01 -07:00
Vinoth Chandar	3129770fd0	Create .hoodie_partition_metadata in each partition, linking back to basepath - Concurreny handled via taskID, failure recovery handled via renames - Falls back to search 3 levels up - Cli tool has command to add this to existing tables	2017-04-03 18:28:01 -07:00
Prasanna Rajaperumal	1e802ad4f2	Move HoodieAvroReader to hoodie-common, it will be used for compaction and in the record reader	2017-04-03 13:58:35 -07:00
Prasanna Rajaperumal	aee136777b	Fixes needed to run merge-on-read testing on production scale data	2017-04-02 22:25:47 -07:00
Prasanna Rajaperumal	57ab7a2405	[maven-release-plugin] prepare for next development iteration	2017-03-31 14:58:55 -07:00
Prasanna Rajaperumal	803c635098	[maven-release-plugin] prepare release hoodie-0.3.5	2017-03-31 14:58:46 -07:00
Prasanna Rajaperumal	f4bb44c1b1	Update snapshot version to 0.3.5-SNAPSHOT	2017-03-31 14:54:54 -07:00
Prasanna Rajaperumal	77e54e78f8	Create the partition path if it does not exist when listing data files in a partition	2017-03-28 05:20:15 -07:00
Yash Sharma	bca7e7dae4	improve documentations	2017-03-28 05:08:54 -07:00
ovj	21898907c1	tool for importing hive tables (in parquet format) into hoodie dataset (#89 ) * tool for importing hive tables (in parquet format) into hoodie dataset * review fixes * review fixes * review fixes	2017-03-21 14:42:13 -07:00
prazanna	d835710c51	Metadata timeline marks an already complete instant as complete again (#98 )	2017-03-17 12:42:26 -07:00
Prasanna Rajaperumal	d83b671ada	Implement Savepoints and required metadata timeline - Part 2	2017-03-13 23:09:29 -07:00
prazanna	6f36e1eaaf	Implement Savepoints and required metadata timeline (#86 ) - Introduce avro to save clean metadata with details about the last commit that was retained - Save rollback metadata in the meta timeline - Create savepoint metadata and add API to createSavepoint, deleteSavepoint and rollbackToSavepoint - Savepointed commit should not be rolledback or cleaned or archived - introduce cli commands to show, create and rollback to savepoints - Write unit tests to test savepoints and rollbackToSavepoints	2017-03-13 15:12:03 -07:00
vinoth chandar	69d3950a32	Revamped Deltastreamer (#93 ) * Add analytics to site * Fix ugly favicon * New & Improved HoodieDeltaStreamer - Can incrementally consume from HDFS or Kafka, with exactly-once semantics! - Supports Json/Avro data, Source can also do custom things - Source is totally pluggable, via reflection - Key generation is pluggable, currently added SimpleKeyGenerator - Schema provider is pluggable, currently Filebased schemas - Configurable field to break ties during preCombine - Finally, can also plugin the HoodieRecordPayload, to get other merge types than overwriting - Handles efficient avro serialization in Spark Pending : - Rewriting of HiveIncrPullSource - Hive sync via hoodie-hive - Cleanup & tests * Minor fixes from master rebase * Implementation of HiveIncrPullSource - Copies commit by commit from source to target * Adding TimestampBasedKeyGenerator - Supports unix time & date strings	2017-03-13 12:41:29 -07:00
siddharthagunda	348a48aa80	Add delete support to Hoodie (#85 )	2017-03-04 01:33:49 -08:00
Prasanna Rajaperumal	fe5c5e8021	Test Failure in Travis-ci	2017-02-21 20:25:01 -08:00

1 2

87 Commits