lanyuanxiaoyao/hudi - hudi - Gitea: Git with a cup of tea

Author	SHA1	Message	Date
Vinoth Chandar	f2980052cd	Revert effects of PR #259	2017-09-28 10:29:58 -07:00
Vinoth Chandar	9f98ae643b	Adding canIndexLogFiles(), isImplicitWithStorage(), isGlobal() to HoodieIndex	2017-09-28 10:19:29 -07:00
Eric Sayle	6230e15191	Update deprecated hash function: Guava deprecated hashString(String) in v15 and removed it in v16. Replace the call with hashUnencodedString(String), which replaces it, to be compatible with newer versions of Guava.	2017-09-18 17:39:19 -07:00
Omkar Joshi	5c639c0b05	Adding support for UserDefinedBulkInsertPartitioner	2017-09-08 20:55:13 -07:00
Omkar Joshi	ec40d04d51	Fixing UpsertPartitioner to ensure that input records are deterministically assigned to output partitions	2017-09-07 17:03:56 -07:00
Nishith Agarwal	e484e91807	adding new config to separate shuffle and write parallelism	2017-08-18 16:05:25 -07:00
Nishith Agarwal	5ee4ac40ae	Use CompletedFileSystemView instead of CompactedView considering deltacommits	2017-08-07 12:26:42 -07:00
Vinoth Chandar	86209640f7	Adding range based pruning to bloom index - keys compared lexicographically using String::compareTo - Range metadata additionally written into parquet file footers - Trim fat & few optimizations to speed up indexing - Add param to control whether input shall be cached, to speed up lookup - Add param to turn on/off range pruning - Auto compute of parallelism now simply factors in amount of comparisons done - More accurate parallelism computation when range pruning is on - tests added & hardened, docs updated	2017-08-04 13:22:13 -07:00
Nishith Agarwal	0b26b60a5c	fix for cleaning log files(mor)	2017-08-02 11:54:42 -07:00
Nishith Agarwal	19c22b231e	1. Use HoodieLogFormat to archive commits and other actions 2. Introduced avro schema for commits and compactions and an avro wrapper schema	2017-07-26 14:27:44 -07:00
Nishith Agarwal	616c9a68c3	Enabled deletes in merge_on_read	2017-07-26 13:37:27 -07:00
Prasanna Rajaperumal	5cc071f74e	Savepoint should not create a hole in the commit timeline	2017-06-27 16:36:09 -07:00
Vinoth Chandar	754ab88a2d	Introduce ReadOptimizedView & RealtimeView out of TableFileSystemView - Usage now marks code as clearly using either RO or RT views, for future evolution - Tests on all of FileGroups and FileSlices	2017-06-22 17:16:13 -07:00
Vinoth Chandar	c00f1a9ed9	Refactoring HoodieTableFileSystemView using FileGroups/FileSlices - Merged all filter* and get* methods - new constructor takes filestatus[] - All existing tests pass - FileGroup is all files that belong to a fileID within a partition - FileSlice is a generation of data and log files, starting at a base commit	2017-06-22 17:16:13 -07:00
Vinoth Chandar	23e7badd8a	Rename IO Handles & introduce stub for BucketedIndex - UpdateHandle -> MergeHandle, InsertHandle -> CreateHandle - Also bunch of code cleanup in different places	2017-06-22 17:16:13 -07:00
Kaushik Devarajaiah	3aa8083913	Correct clean bug that causes clean failure when partitionPaths are empty	2017-06-20 15:45:32 -07:00
gekath	52c507f83e	Writes relative paths to .commit files Handle case where path is read in as null from commit file Merged with updated release	2017-06-16 12:51:19 -07:00
gekath	db7311f85e	Writes relative paths to .commit files instead of absolute paths Clean up code Removed commented out code Fixed merge conflict with master	2017-06-16 12:51:19 -07:00
Kaushik Devarajaiah	521555c576	Parallelize file version deletes during clean and related tests	2017-06-15 18:20:42 -07:00
Prasanna Rajaperumal	dda28c0b4b	Rollback inflight commits as well when rolling back to savepoint	2017-06-14 11:03:27 -07:00
Prasanna Rajaperumal	db6150c5ef	Refactor hoodie-hive	2017-06-09 13:06:33 -07:00
Prasanna Rajaperumal	bae98efeee	Delete other instant files (.clean) as well during commit archival	2017-05-24 13:51:49 -07:00
Prasanna Rajaperumal	240c91241b	Implement HoodieLogFormat replacing Avro as the default log format	2017-05-23 08:35:11 -07:00
Nishith Agarwal	3c984447da	view scheme added	2017-05-22 12:27:40 -07:00
Prasanna Rajaperumal	70dd7a25ea	Clean should not create a .inflight file	2017-05-22 10:48:35 -07:00
Zeeshan Qureshi	43a55b09fd	Add GCS to supported filesystems	2017-05-18 10:30:34 -07:00
Vinoth Chandar	da17c5c607	Introduce getCommitsAndCompactionsTimeline() explicitly & adjust usage across code base	2017-05-01 21:48:27 -07:00
Vinoth Chandar	bae0528013	Cleanup calls to HoodieTimeline.compareTimeStamps	2017-05-01 21:48:27 -07:00
Prasanna Rajaperumal	8974e11161	Make sure properties set in HoodieWriteConfig are propagated down to individual configs. Fix a race condition which lets InputFormat think file size is 0 when it is actually not	2017-04-27 10:52:25 -07:00
Prasanna Rajaperumal	91b088f29f	Implement Compaction policy abstraction. Implement LogSizeBased Bounded IO Compaction as the default strategy	2017-04-20 16:59:06 -07:00
Vinoth Chandar	2b6322318c	CR feedback	2017-04-03 18:28:01 -07:00
Vinoth Chandar	e0fc4ec38e	Documentation update + helper method for WriteConfig builder	2017-04-03 18:28:01 -07:00
Vinoth Chandar	dce35ff0d7	Adding a config to control whether date partitioning can be assumed - false by default - CAUTION: If you have existing tables without partition metadata, you need to set this to "true"	2017-04-03 18:28:01 -07:00
Vinoth Chandar	3129770fd0	Create .hoodie_partition_metadata in each partition, linking back to basepath - Concurrency handled via taskID, failure recovery handled via renames - Falls back to search 3 levels up - Cli tool has command to add this to existing tables	2017-04-03 18:28:01 -07:00
Prasanna Rajaperumal	1e802ad4f2	Move HoodieAvroReader to hoodie-common, it will be used for compaction and in the record reader	2017-04-03 13:58:35 -07:00
Prasanna Rajaperumal	aee136777b	Fixes needed to run merge-on-read testing on production scale data	2017-04-02 22:25:47 -07:00
Yash Sharma	d6f94b998d	Hoodie operability with S3	2017-03-28 05:08:54 -07:00
prazanna	0e3f635adb	remove hardcoding of autoClean	2017-03-23 15:54:26 -07:00
fishie9	b7047ab4fb	Pass in String StorageLevel for WriteStatus (#113 )	2017-03-23 04:31:30 -07:00
prazanna	f1b7afad21	Add config for index parallelism and make clean public (#109 ) * Add config for index parallelism and make clean public * Review comments on clean api modification	2017-03-21 17:36:46 -07:00
ovj	21898907c1	tool for importing hive tables (in parquet format) into hoodie dataset (#89 ) * tool for importing hive tables (in parquet format) into hoodie dataset * review fixes * review fixes * review fixes	2017-03-21 14:42:13 -07:00
prazanna	d835710c51	Metadata timeline marks an already complete instant as complete again (#98 )	2017-03-17 12:42:26 -07:00
Prasanna Rajaperumal	d83b671ada	Implement Savepoints and required metadata timeline - Part 2	2017-03-13 23:09:29 -07:00
prazanna	6f36e1eaaf	Implement Savepoints and required metadata timeline (#86 ) - Introduce avro to save clean metadata with details about the last commit that was retained - Save rollback metadata in the meta timeline - Create savepoint metadata and add API to createSavepoint, deleteSavepoint and rollbackToSavepoint - Savepointed commit should not be rolledback or cleaned or archived - introduce cli commands to show, create and rollback to savepoints - Write unit tests to test savepoints and rollbackToSavepoints	2017-03-13 15:12:03 -07:00
vinoth chandar	69d3950a32	Revamped Deltastreamer (#93 ) * Add analytics to site * Fix ugly favicon * New & Improved HoodieDeltaStreamer - Can incrementally consume from HDFS or Kafka, with exactly-once semantics! - Supports Json/Avro data, Source can also do custom things - Source is totally pluggable, via reflection - Key generation is pluggable, currently added SimpleKeyGenerator - Schema provider is pluggable, currently Filebased schemas - Configurable field to break ties during preCombine - Finally, can also plugin the HoodieRecordPayload, to get other merge types than overwriting - Handles efficient avro serialization in Spark Pending : - Rewriting of HiveIncrPullSource - Hive sync via hoodie-hive - Cleanup & tests * Minor fixes from master rebase * Implementation of HiveIncrPullSource - Copies commit by commit from source to target * Adding TimestampBasedKeyGenerator - Supports unix time & date strings	2017-03-13 12:41:29 -07:00
siddharthagunda	348a48aa80	Add delete support to Hoodie (#85 )	2017-03-04 01:33:49 -08:00
vinoth chandar	116a78094f	Cleanup code based on Java8 Lambdas (#84 )	2017-02-27 15:52:13 -08:00
Prasanna Rajaperumal	1132f3533d	Merge and pull master commits	2017-02-21 17:53:28 -08:00
prazanna	eb46e7c72b	Implement Merge on Read Storage (#76 ) 1. Create HoodieTable abstraction for commits and fileSystemView 2. HoodieMergeOnReadTable created 3. View is now always obtained from the table and the correct view based on the table type is returned	2017-02-21 16:24:38 -08:00
prazanna	11d2fd3428	Introduce RealtimeTableView and Implement HoodieRealtimeTableCompactor (#73 )	2017-02-21 16:24:18 -08:00


111 Commits