lanyuanxiaoyao/hudi - hudi - Gitea: Git with a cup of tea

Author	SHA1	Message	Date
vinothchandar	cf7f7aabb9	Nicer handling of timeline archival for Cloud storage - When append() is not supported, rollover to new file always (instead of failing) - Provide way to configure archive log folder (avoids small files inside .hoodie) - Datasets written via Spark datasource archive to .hoodie/archived - HoodieClientExample will now retain only 2,3 commits to exercise archival path during dev cycles - Few tweaks to code structure around CommitArchiveLog	2018-01-17 23:34:21 -08:00
Vinoth Chandar	0cd186c899	Multi FS Support - Reviving PR 191, to make FileSystem creation off actual path - Streamline all filesystem access to HoodieTableMetaClient - Hadoop Conf from Spark Context serialized & passed to executor code too - Pick up env vars prefixed with HOODIE_ENV_ into Configuration object - Cleanup usage of FSUtils.getFS, piggybacking off HoodieTableMetaClient.getFS - Adding s3a to supported schemes & support escaping "." in env vars - Tests use HoodieTestUtils.getDefaultHadoopConf	2018-01-17 23:34:21 -08:00
Nishith Agarwal	44839b88c6	Removing compaction action type and associated compaction timeline operations, replace with commit action type	2018-01-09 09:56:15 -08:00
Vinoth Chandar	e45679f5e2	Reformatting code per Google Code Style all over	2017-11-12 23:19:02 -08:00
Nishith Agarwal	abe964bebd	Implementing custom payload/merge hooks abstractions for application specific merge logic	2017-11-07 18:55:55 -08:00
Nishith Agarwal	c7d63a7622	1) Separated rollback as a table operation 2) Implement rollback for MOR	2017-10-12 07:36:46 -07:00
Vinoth Chandar	e1fe3ab937	[maven-release-plugin] prepare for next development iteration	2017-10-02 22:42:54 -07:00
Vinoth Chandar	50139fe904	[maven-release-plugin] prepare release hoodie-0.4.0	2017-10-02 22:42:32 -07:00
Vinoth Chandar	274aaf49fe	Incorporating code review feedback for DataSource	2017-10-02 20:44:53 -07:00
Vinoth Chandar	64e0573aca	Adding hoodie-spark to support Spark Datasource for Hoodie - Write with COW/MOR paths work fully - Read with RO view works on both storages* - Incremental view supported on COW - Refactored out HoodieReadClient methods, to just contain key based access - HoodieDataSourceHelpers class can be now used to construct inputs to datasource - Tests in hoodie-client using new helpers and mechanisms - Basic tests around save modes & insert/upserts (more to follow) - Bumped up scala to 2.11, since 2.10 is deprecated & complains with scalatest - Updated documentation to describe usage - New sample app written using the DataSource API	2017-10-02 20:44:53 -07:00
Kaushik Devarajaiah	c98ee057fc	capture record metadata before deflating for record counting	2017-10-02 10:46:06 -07:00
Jian Xu	7e9a4a89dd	Use getFileStatus to get single FileStatus for single file	2017-09-11 11:24:44 -07:00
Jian Xu	b1cf097b0c	Add nested fields support for MOR tables	2017-08-16 10:35:26 -07:00
Nishith Agarwal	6a3c94aaa3	suppressing logs (under 4MB) for jenkins	2017-08-15 16:30:51 -07:00
Vinoth Chandar	86209640f7	Adding range based pruning to bloom index - keys compared lexicographically using String::compareTo - Range metadata additionally written into parquet file footers - Trim fat & few optimizations to speed up indexing - Add param to control whether input shall be cached, to speed up lookup - Add param to turn on/off range pruning - Auto compute of parallelism now simply factors in amount of comparisons done - More accurate parallelism computation when range pruning is on - tests added & hardened, docs updated	2017-08-04 13:22:13 -07:00
Nishith Agarwal	0b26b60a5c	fix for cleaning log files(mor)	2017-08-02 11:54:42 -07:00
Nishith Agarwal	19c22b231e	1. Use HoodieLogFormat to archive commits and other actions 2. Introduced avro schema for commits and compactions and an avro wrapper schema	2017-07-26 14:27:44 -07:00
Nishith Agarwal	616c9a68c3	Enabled deletes in merge_on_read	2017-07-26 13:37:27 -07:00
Prasanna Rajaperumal	7d3963b4ab	Pushing master to 0.4.0 as we continue to make minor releases over 0.3.8 (MVP for MOR)	2017-06-30 11:41:23 -07:00
Nishith Agarwal	3eba812a1b	[maven-release-plugin] prepare for next development iteration	2017-06-30 11:17:07 -07:00
Nishith Agarwal	06d44daea3	[maven-release-plugin] prepare release hoodie-0.3.9	2017-06-30 11:16:58 -07:00
Nishith Agarwal	348250d960	Using FsUtils instead of Files API to extract file extension	2017-06-29 19:26:31 -07:00
Prasanna Rajaperumal	5cc071f74e	Savepoint should not create a hole in the commit timeline	2017-06-27 16:36:09 -07:00
Vinoth Chandar	754ab88a2d	Introduce ReadOptimizedView & RealtimeView out of TableFileSystemView - Usage now marks code as clearly using either RO or RT views, for future evolution - Tests on all of FileGroups and FileSlices	2017-06-22 17:16:13 -07:00
Vinoth Chandar	c00f1a9ed9	Refactoring HoodieTableFileSystemView using FileGroups/FileSlices - Merged all filter* and get* methods - new constructor takes filestatus[] - All existing tests pass - FileGroup is all files that belong to a fileID within a partition - FileSlice is a generation of data and log files, starting at a base commit	2017-06-22 17:16:13 -07:00
gekath	52c507f83e	Writes relative paths to .commit files Handle case where path is read in as null from commit file Merged with updated release	2017-06-16 12:51:19 -07:00
gekath	db7311f85e	Writes relative paths to .commit files instead of absolute paths Clean up code Removed commented out code Fixed merge conflict with master	2017-06-16 12:51:19 -07:00
Prasanna Rajaperumal	0ed3fac5e3	[maven-release-plugin] prepare for next development iteration	2017-06-16 11:03:17 -07:00
Prasanna Rajaperumal	45732e440c	[maven-release-plugin] prepare release hoodie-0.3.8	2017-06-16 10:59:58 -07:00
Prasanna Rajaperumal	4b26be9f61	Fixes to RealtimeInputFormat and RealtimeRecordReader and update documentation for HiveSyncTool	2017-06-15 18:21:07 -07:00
Kaushik Devarajaiah	521555c576	Parallelize file version deletes during clean and related tests	2017-06-15 18:20:42 -07:00
Prasanna Rajaperumal	db6150c5ef	Refactor hoodie-hive	2017-06-09 13:06:33 -07:00
Danny Chen	c192dd60b4	Change from deprecated closeQuietly to try with resources	2017-06-05 19:11:53 -07:00
Nishith Agarwal	ba050973e3	updated HoodieRealtimeRecordReader to use HoodieCompactedLogRecordScanner, added test for recordreader	2017-06-02 11:33:59 -07:00
Prasanna Rajaperumal	933cc8071f	[maven-release-plugin] prepare for next development iteration	2017-05-24 14:02:50 -07:00
Prasanna Rajaperumal	bebae06b5b	[maven-release-plugin] prepare release hoodie-0.3.7	2017-05-24 14:02:41 -07:00
Prasanna Rajaperumal	240c91241b	Implement HoodieLogFormat replacing Avro as the default log format	2017-05-23 08:35:11 -07:00
Vinoth Chandar	da17c5c607	Introduce getCommitsAndCompactionsTimeline() explicitly & adjust usage across code base	2017-05-01 21:48:27 -07:00
Vinoth Chandar	bae0528013	Cleanup calls to HoodieTimeline.compareTimeStamps	2017-05-01 21:48:27 -07:00
Vinoth Chandar	7b1446548f	Initial impl of HoodieRealtimeInputFormat - Works end-end for flat schemas - Schema evolution & hardening remains - HoodieClientExample can now write mor tables as well	2017-05-01 21:48:27 -07:00
Vinoth Chandar	9f526396a0	Add support for merge_on_read tables to HoodieClientExample	2017-05-01 21:48:27 -07:00
Prasanna Rajaperumal	c3258039f0	[maven-release-plugin] prepare for next development iteration	2017-04-27 11:00:56 -07:00
Prasanna Rajaperumal	de1bdad756	[maven-release-plugin] prepare release hoodie-0.3.6	2017-04-27 11:00:45 -07:00
Prasanna Rajaperumal	8974e11161	Make sure properties set in HoodieWriteConfig is propogated down to individual configs. Fix a race condition which lets InputFormat to think file size is 0 when it is actually not	2017-04-27 10:52:25 -07:00
Prasanna Rajaperumal	91b088f29f	Implement Compaction policy abstraction. Implement LogSizeBased Bounded IO Compaction as the default strategy	2017-04-20 16:59:06 -07:00
Vinoth Chandar	2b6322318c	CR feedback	2017-04-03 18:28:01 -07:00
Vinoth Chandar	dce35ff0d7	Adding a config to control whether date partitioning can be assumed - false by default - CAUTION: If you have an existing tables without partition metadata, you need to set this to "true"	2017-04-03 18:28:01 -07:00
Vinoth Chandar	f9fd16069d	FSUtils.getAllPartitionsPaths() works based on .hoodie_partition_metadata - clean/rollback/write paths covered by existing tests - Snapshot copier fixed to copy metadata file also, and test fixed - Existing tables need to be repaired by addition of metadata, before this can be rolled out	2017-04-03 18:28:01 -07:00
Vinoth Chandar	3129770fd0	Create .hoodie_partition_metadata in each partition, linking back to basepath - Concurreny handled via taskID, failure recovery handled via renames - Falls back to search 3 levels up - Cli tool has command to add this to existing tables	2017-04-03 18:28:01 -07:00
Prasanna Rajaperumal	1e802ad4f2	Move HoodieAvroReader to hoodie-common, it will be used for compaction and in the record reader	2017-04-03 13:58:35 -07:00

1 2

100 Commits