lanyuanxiaoyao/hudi - hudi - Gitea: Git with a cup of tea

Author	SHA1	Message	Date
Jian Xu	d3df32fa03	Add back UseTempFolder changes in HoodieMergeHandle	2018-03-15 17:11:15 -07:00
Omkar Joshi	c5b4cb1b75	Spawning parallel writer thread to separate reading records from spark and writing records to parquet file	2018-03-15 16:58:14 -07:00
Nishith Agarwal	9dff8c2326	Adding a tool to read/inspect a HoodieLogFile	2018-03-15 16:48:28 -07:00
Jian Xu	ba7c258c61	Add more options in HoodieWriteConfig	2018-03-13 23:26:36 -07:00
Jian Xu	7f079632a6	Use hadoopConf in HoodieTableMetaClient and related tests	2018-03-12 11:47:55 -07:00
Vinoth Chandar	73534d467f	[maven-release-plugin] prepare for next development iteration	2018-03-07 21:04:10 -08:00
Vinoth Chandar	f2e5c6f9f8	[maven-release-plugin] prepare release hoodie-0.4.1	2018-03-07 21:04:00 -08:00
Nishith Agarwal	0eaa21111a	Re-factoring Compaction as first level API in WriteClient similar to upsert/insert	2018-03-07 16:16:39 -08:00
Nishith Agarwal	5405a6287b	Introducing HoodieLogFormat V2 with versioning support - HoodieLogFormat V2 has support for LogFormat evolution through versioning - LogVersion is associated with a LogBlock not a LogFile - Based on a version for a LogBlock, approporiate code path is executed - Implemented LazyReading of Hoodie Log Blocks with Memory / IO tradeoff - Implemented Reverse pointer to be able to traverse the log in reverse - Introduce new MAGIC for backwards compatibility with logs without versions	2018-03-06 21:14:11 -08:00
Jian Xu	dfd1979c51	Handle inflight clean instants during Hoodie instants archiving	2018-03-05 15:01:58 -08:00
Jian Xu	5d5c306e64	Add new APIs in HoodieReadClient and HoodieWriteClient	2018-02-28 13:58:12 -08:00
Nishith Agarwal	6fec9655a8	Added support for Disk Spillable Compaction to prevent OOM issues	2018-02-26 16:00:35 -08:00
Nishith Agarwal	d495484399	Write smaller sized multiple blocks to log file instead of a large one - Use SizeEstimator to size number of records to write - Configurable block size - Configurable log file size	2018-02-23 07:31:39 -08:00
Vinoth Chandar	eb3d0c470f	Fix formatting in HoodieWriteClient	2018-02-14 10:03:20 -08:00
Jian Xu	3bdd750982	Use FastDateFormat for thread safety Use FastDateFormat for thread safety, this is to fix an exception when a job is used to ingest multiple tables. An example exception: ``` Caused by: java.lang.NumberFormatException: multiple points at sun.misc.FloatingDecimal.readJavaFormatString(FloatingDecimal.java:1890) at sun.misc.FloatingDecimal.parseDouble(FloatingDecimal.java:110) at java.lang.Double.parseDouble(Double.java:538) at java.text.DigitList.getDouble(DigitList.java:169) at java.text.DecimalFormat.parse(DecimalFormat.java:2056) at java.text.SimpleDateFormat.subParse(SimpleDateFormat.java:1867) at java.text.SimpleDateFormat.parse(SimpleDateFormat.java:1514) at java.text.DateFormat.parse(DateFormat.java:364) at com.uber.hoodie.HoodieWriteClient.commit(HoodieWriteClient.java:442) ```	2018-02-12 11:43:57 -08:00
Nishith Agarwal	7076c2e9f0	refactor classes to accept Map passed by RealtimeCompactor to avoid multiple map creations in HoodieMergeHandle	2018-02-07 11:16:01 -08:00
Nishith Agarwal	30049383f5	Small File Size correction handling for MOR table type	2018-02-07 11:01:10 -08:00
Nishith Agarwal	2116815261	Fixing Rollback for compaction/commit operation, added check for null commit - Fallback to old way of rollback by listing all partitions - Added null check to ensure only partitions which are to be rolledback are considered - Added location (committime) to workload stat - Added checks in CompactedScanner to guard against task retries - Introduce new logic for rollback (bounded by instant_time and target_instant time) - Reversed logfiles order	2018-02-06 16:55:23 -08:00
Nishith Agarwal	be0b1f3e57	Adding global indexing to HbaseIndex implementation - Adding tests or HbaseIndex - Enabling global index functionality	2018-02-05 15:21:22 -08:00
Jian Xu	15e669c60c	Incorporating code review feedback for finalizeWrite for COW #4	2018-02-02 11:38:25 -08:00
Jian Xu	3736243fb3	Rebases with latest upstream	2018-02-02 11:38:25 -08:00
Jian Xu	363e35bb0f	Add finalizeWrite support for HoodieMergeHandle	2018-02-02 11:38:25 -08:00
Jian Xu	acae6586f3	Incorporating code review feedback for finalizeWrite for COW #3	2018-02-02 11:38:25 -08:00
Jian Xu	37f2cdd7e4	Incorporating code review feedback for finalizeWrite for COW #2	2018-02-02 11:38:25 -08:00
Jian Xu	2fe4fef625	Incorporating code review feedback for finalizeWrite for COW	2018-02-02 11:38:25 -08:00
Jian Xu	c874248f23	Add FinalizeWrite in HoodieCreateHandle for COW tables	2018-02-02 11:38:25 -08:00
Nishith Agarwal	e10100fe32	Reducing list status calls from listing logfile versions, some associated refactoring	2018-01-29 08:26:39 -08:00
Nishith Agarwal	937ae322ba	Reducing memory footprint required in HoodieAvroDataBlock and HoodieAppendHandle	2018-01-29 08:22:29 -08:00
Vinoth Chandar	85d32930cd	Update Gemfile.lock	2018-01-18 00:07:23 -08:00
vinothchandar	21ce846f18	Remove stateful fs member from HoodieTestUtils & FSUtils	2018-01-17 23:34:21 -08:00
vinothchandar	cf7f7aabb9	Nicer handling of timeline archival for Cloud storage - When append() is not supported, rollover to new file always (instead of failing) - Provide way to configure archive log folder (avoids small files inside .hoodie) - Datasets written via Spark datasource archive to .hoodie/archived - HoodieClientExample will now retain only 2,3 commits to exercise archival path during dev cycles - Few tweaks to code structure around CommitArchiveLog	2018-01-17 23:34:21 -08:00
Vinoth Chandar	0cd186c899	Multi FS Support - Reviving PR 191, to make FileSystem creation off actual path - Streamline all filesystem access to HoodieTableMetaClient - Hadoop Conf from Spark Context serialized & passed to executor code too - Pick up env vars prefixed with HOODIE_ENV_ into Configuration object - Cleanup usage of FSUtils.getFS, piggybacking off HoodieTableMetaClient.getFS - Adding s3a to supported schemes & support escaping "." in env vars - Tests use HoodieTestUtils.getDefaultHadoopConf	2018-01-17 23:34:21 -08:00
Nishith Agarwal	44839b88c6	Removing compaction action type and associated compaction timeline operations, replace with commit action type	2018-01-09 09:56:15 -08:00
vinoth chandar	a1c0d0dbad	Update README.md Reflect hudi	2017-12-10 07:50:37 -08:00
Nishith Agarwal	4aed5c7338	Adding a new Partition/Time based compaction strategy	2017-12-05 16:30:38 -08:00
Nishith Agarwal	051f600b7f	Enable hive sync even if there is no compaction commit	2017-11-30 18:22:58 -08:00
Nishith Agarwal	9b610f82c7	Separating out compaction() API	2017-11-14 22:56:29 -08:00
Vinoth Chandar	e45679f5e2	Reformatting code per Google Code Style all over	2017-11-12 23:19:02 -08:00
Vinoth Chandar	5a62480a92	Update docs on code style setup	2017-11-12 23:19:02 -08:00
Nishith Agarwal	abe964bebd	Implementing custom payload/merge hooks abstractions for application specific merge logic	2017-11-07 18:55:55 -08:00
Nishith Agarwal	c7d63a7622	1) Separated rollback as a table operation 2) Implement rollback for MOR	2017-10-12 07:36:46 -07:00
Vinoth Chandar	e1fe3ab937	[maven-release-plugin] prepare for next development iteration	2017-10-02 22:42:54 -07:00
Vinoth Chandar	50139fe904	[maven-release-plugin] prepare release hoodie-0.4.0	2017-10-02 22:42:32 -07:00
Vinoth Chandar	3768ad45fb	Release notes for 0.4.0	2017-10-02 22:26:22 -07:00
Vinoth Chandar	274aaf49fe	Incorporating code review feedback for DataSource	2017-10-02 20:44:53 -07:00
Vinoth Chandar	64e0573aca	Adding hoodie-spark to support Spark Datasource for Hoodie - Write with COW/MOR paths work fully - Read with RO view works on both storages* - Incremental view supported on COW - Refactored out HoodieReadClient methods, to just contain key based access - HoodieDataSourceHelpers class can be now used to construct inputs to datasource - Tests in hoodie-client using new helpers and mechanisms - Basic tests around save modes & insert/upserts (more to follow) - Bumped up scala to 2.11, since 2.10 is deprecated & complains with scalatest - Updated documentation to describe usage - New sample app written using the DataSource API	2017-10-02 20:44:53 -07:00
Kaushik Devarajaiah	c98ee057fc	capture record metadata before deflating for record counting	2017-10-02 10:46:06 -07:00
Vinoth Chandar	f2980052cd	Revert effects of PR #259	2017-09-28 10:29:58 -07:00
Vinoth Chandar	9f98ae643b	Adding canIndexLogFiles(), isImplicitWithStorage(), isGlobal() to HoodieIndex	2017-09-28 10:19:29 -07:00
Eric Sayle	6230e15191	Update deprecated hash function Guava deprecated hashString(String) in v15, and removed it in v16. Replace call with hashUnencodedString(String), which replace it, to be compatible with newer versions of Guava.	2017-09-18 17:39:19 -07:00

1 2 3 4 5

245 Commits