lanyuanxiaoyao/hudi

Author	SHA1	Message	Date
Nishith Agarwal	e10100fe32	Reducing list status calls from listing logfile versions, some associated refactoring	2018-01-29 08:26:39 -08:00
Nishith Agarwal	937ae322ba	Reducing memory footprint required in HoodieAvroDataBlock and HoodieAppendHandle	2018-01-29 08:22:29 -08:00
Vinoth Chandar	85d32930cd	Update Gemfile.lock	2018-01-18 00:07:23 -08:00
vinothchandar	21ce846f18	Remove stateful fs member from HoodieTestUtils & FSUtils	2018-01-17 23:34:21 -08:00
vinothchandar	cf7f7aabb9	Nicer handling of timeline archival for Cloud storage - When append() is not supported, rollover to new file always (instead of failing) - Provide way to configure archive log folder (avoids small files inside .hoodie) - Datasets written via Spark datasource archive to .hoodie/archived - HoodieClientExample will now retain only 2,3 commits to exercise archival path during dev cycles - Few tweaks to code structure around CommitArchiveLog	2018-01-17 23:34:21 -08:00
Vinoth Chandar	0cd186c899	Multi FS Support - Reviving PR 191, to make FileSystem creation off actual path - Streamline all filesystem access to HoodieTableMetaClient - Hadoop Conf from Spark Context serialized & passed to executor code too - Pick up env vars prefixed with HOODIE_ENV_ into Configuration object - Cleanup usage of FSUtils.getFS, piggybacking off HoodieTableMetaClient.getFS - Adding s3a to supported schemes & support escaping "." in env vars - Tests use HoodieTestUtils.getDefaultHadoopConf	2018-01-17 23:34:21 -08:00
Nishith Agarwal	44839b88c6	Removing compaction action type and associated compaction timeline operations, replace with commit action type	2018-01-09 09:56:15 -08:00
vinoth chandar	a1c0d0dbad	Update README.md Reflect hudi	2017-12-10 07:50:37 -08:00
Nishith Agarwal	4aed5c7338	Adding a new Partition/Time based compaction strategy	2017-12-05 16:30:38 -08:00
Nishith Agarwal	051f600b7f	Enable hive sync even if there is no compaction commit	2017-11-30 18:22:58 -08:00
Nishith Agarwal	9b610f82c7	Separating out compaction() API	2017-11-14 22:56:29 -08:00
Vinoth Chandar	e45679f5e2	Reformatting code per Google Code Style all over	2017-11-12 23:19:02 -08:00
Vinoth Chandar	5a62480a92	Update docs on code style setup	2017-11-12 23:19:02 -08:00
Nishith Agarwal	abe964bebd	Implementing custom payload/merge hooks abstractions for application specific merge logic	2017-11-07 18:55:55 -08:00
Nishith Agarwal	c7d63a7622	1) Separated rollback as a table operation 2) Implement rollback for MOR	2017-10-12 07:36:46 -07:00
Vinoth Chandar	e1fe3ab937	[maven-release-plugin] prepare for next development iteration	2017-10-02 22:42:54 -07:00
Vinoth Chandar	50139fe904	[maven-release-plugin] prepare release hoodie-0.4.0	2017-10-02 22:42:32 -07:00
Vinoth Chandar	3768ad45fb	Release notes for 0.4.0	2017-10-02 22:26:22 -07:00
Vinoth Chandar	274aaf49fe	Incorporating code review feedback for DataSource	2017-10-02 20:44:53 -07:00
Vinoth Chandar	64e0573aca	Adding hoodie-spark to support Spark Datasource for Hoodie - Write with COW/MOR paths work fully - Read with RO view works on both storages* - Incremental view supported on COW - Refactored out HoodieReadClient methods, to just contain key based access - HoodieDataSourceHelpers class can be now used to construct inputs to datasource - Tests in hoodie-client using new helpers and mechanisms - Basic tests around save modes & insert/upserts (more to follow) - Bumped up scala to 2.11, since 2.10 is deprecated & complains with scalatest - Updated documentation to describe usage - New sample app written using the DataSource API	2017-10-02 20:44:53 -07:00
Kaushik Devarajaiah	c98ee057fc	capture record metadata before deflating for record counting	2017-10-02 10:46:06 -07:00
Vinoth Chandar	f2980052cd	Revert effects of PR #259	2017-09-28 10:29:58 -07:00
Vinoth Chandar	9f98ae643b	Adding canIndexLogFiles(), isImplicitWithStorage(), isGlobal() to HoodieIndex	2017-09-28 10:19:29 -07:00
Eric Sayle	6230e15191	Update deprecated hash function - Guava deprecated hashString(String) in v15 and removed it in v16. Replace the call with hashUnencodedString(String), which replaces it, to be compatible with newer versions of Guava.	2017-09-18 17:39:19 -07:00
Jian Xu	7e9a4a89dd	Use getFileStatus to get single FileStatus for single file	2017-09-11 11:24:44 -07:00
Omkar Joshi	5c639c0b05	Adding support for UserDefinedBulkInsertPartitioner	2017-09-08 20:55:13 -07:00
Omkar Joshi	ec40d04d51	Fixing UpsertPartitioner to ensure that input records are deterministically assigned to output partitions	2017-09-07 17:03:56 -07:00
Nishith Agarwal	e2d13c6305	Fix build failing issues	2017-09-07 10:54:36 -07:00
Nishith Agarwal	63f1b12355	adding ability to read archived files written in log format	2017-08-25 14:40:07 -07:00
Nishith Agarwal	e484e91807	adding new config to separate shuffle and write parallelism	2017-08-18 16:05:25 -07:00
Jian Xu	b1cf097b0c	Add nested fields support for MOR tables	2017-08-16 10:35:26 -07:00
Nishith Agarwal	6a3c94aaa3	suppressing logs (under 4MB) for jenkins	2017-08-15 16:30:51 -07:00
Nishith Agarwal	5ee4ac40ae	Use CompletedFileSystemView instead of CompactedView considering deltacommits	2017-08-07 12:26:42 -07:00
Vinoth Chandar	45dd8980c3	Temporary fix for build break after rebase	2017-08-04 17:36:39 -07:00
Vinoth Chandar	86209640f7	Adding range based pruning to bloom index - keys compared lexicographically using String::compareTo - Range metadata additionally written into parquet file footers - Trim fat & few optimizations to speed up indexing - Add param to control whether input shall be cached, to speed up lookup - Add param to turn on/off range pruning - Auto compute of parallelism now simply factors in amount of comparisons done - More accurate parallelism computation when range pruning is on - tests added & hardened, docs updated	2017-08-04 13:22:13 -07:00
Nishith Agarwal	0b26b60a5c	fix for cleaning log files(mor)	2017-08-02 11:54:42 -07:00
Nishith Agarwal	19c22b231e	1. Use HoodieLogFormat to archive commits and other actions 2. Introduced avro schema for commits and compactions and an avro wrapper schema	2017-07-26 14:27:44 -07:00
Nishith Agarwal	616c9a68c3	Enabled deletes in merge_on_read	2017-07-26 13:37:27 -07:00
Vinoth Chandar	cf1dde0323	Add recent talks/presentations to documentation	2017-07-08 22:47:15 -07:00
Vinoth Chandar	e8b3ddd7cb	Add note on community engagement to committership guidelines	2017-07-08 22:47:15 -07:00
Prasanna Rajaperumal	7d3963b4ab	Pushing master to 0.4.0 as we continue to make minor releases over 0.3.8 (MVP for MOR)	2017-06-30 11:41:23 -07:00
Nishith Agarwal	3eba812a1b	[maven-release-plugin] prepare for next development iteration	2017-06-30 11:17:07 -07:00
Nishith Agarwal	06d44daea3	[maven-release-plugin] prepare release hoodie-0.3.9	2017-06-30 11:16:58 -07:00
Nishith Agarwal	348250d960	Using FsUtils instead of Files API to extract file extension	2017-06-29 19:26:31 -07:00
Nishith Agarwal	e5d9b818bc	Sync Tool registers 2 tables, RO and RT Tables	2017-06-28 15:41:36 -07:00
Prasanna Rajaperumal	5cc071f74e	Savepoint should not create a hole in the commit timeline	2017-06-27 16:36:09 -07:00
Jian Xu	29b906b763	Fix TimestampBasedKeyGenerator when DATE_STRING is used for partitionpath.field	2017-06-27 13:02:06 -07:00
Vinoth Chandar	754ab88a2d	Introduce ReadOptimizedView & RealtimeView out of TableFileSystemView - Usage now marks code as clearly using either RO or RT views, for future evolution - Tests on all of FileGroups and FileSlices	2017-06-22 17:16:13 -07:00
Vinoth Chandar	c00f1a9ed9	Refactoring HoodieTableFileSystemView using FileGroups/FileSlices - Merged all filter* and get* methods - new constructor takes filestatus[] - All existing tests pass - FileGroup is all files that belong to a fileID within a partition - FileSlice is a generation of data and log files, starting at a base commit	2017-06-22 17:16:13 -07:00
Vinoth Chandar	23e7badd8a	Rename IO Handles & introduce stub for BucketedIndex - UpdateHandle -> MergeHandle, InsertHandle -> CreateHandle - Also bunch of code cleanup in different places	2017-06-22 17:16:13 -07:00

569 Commits