lanyuanxiaoyao/hudi

Author	SHA1	Message	Date
Xavier Jodoin	8ad8030f2a	Fix wrong use of TemporaryFolder junit rule	2018-06-10 23:31:42 -07:00
vinothchandar	8f1d362015	Fixing deps & serialization for RTView - hoodie-hadoop-mr now needs objectsize bundled - Also updated docs with additional tuning tips	2018-06-10 19:16:44 -07:00
Vinoth Chandar	85dd265b7b	Improving out of box experience for data source - Fixes #246 - Bump up default parallelism to 1500, to handle large upserts - Add docs on s3 configuration & tuning tips with tested spark knobs - Fix bug to not duplicate hoodie metadata fields when input dataframe is another hoodie dataset - Improve speed of ROTablePathFilter by removing directory check - Move to spark-avro 4.0 to handle issue with nested fields with same name - Keep AvroConversionUtils in sync with spark-avro 4.0	2018-06-10 19:16:44 -07:00
Sunil Ramaiah	a97814462d	Added a filter function to filter the record keys in a parquet file	2018-05-17 19:01:11 -07:00
Nishith Agarwal	23d53763c4	enabling global index for MOR	2018-05-16 10:36:25 -07:00
Balaji Varadarajan	dfc0c61eb7	Support union mode in HoodieRealtimeRecordReader for pure insert workloads Also Replace BufferedIteratorPayload abstraction with function passing	2018-05-10 17:39:56 -07:00
Nishith Agarwal	93f345a032	Minor fixes for MergeOnRead MVP release readiness	2018-05-09 07:23:58 -07:00
Nishith Agarwal	75df72f575	Adding a fix/workaround when fs.append() unable to return a valid outputstream	2018-05-08 18:46:17 -07:00
Nishith Agarwal	04655e9e85	Adding metrics for MOR and COW	2018-04-26 09:32:45 -07:00
Balaji Varadarajan	c66004d79a	Add Support for ordering and limiting results in CLI show commands	2018-04-26 09:30:05 -07:00
Sunil Ramaiah	b9b9b24993	Added more comments and removed the extra new lines	2018-04-25 13:09:15 -07:00
Sunil Ramaiah	4d1fba24c9	Fix for updating duplicate records in same/different files in same partition	2018-04-25 13:09:15 -07:00
vinoth chandar	fa73a911cc	Update Gemfile.lock	2018-04-19 14:20:50 -07:00
Nishith Agarwal	c3c205fc02	Using BufferedFsInputStream to wrap FSInputStream for FSDataInputStream	2018-04-18 08:05:19 -07:00
Nishith Agarwal	720e42f52a	Parallelized read-write operations in Hoodie Merge phase	2018-04-12 11:46:42 -07:00
Balaji Varadarajan	6c226ca21a	Issue-329 : Refactoring TestHoodieClientOnCopyOnWriteStorage and adding test-cases	2018-04-09 16:34:58 -07:00
Vinoth Chandar	a4049329a5	Update release notes for 0.4.1 (post)	2018-04-02 09:31:01 -07:00
Balaji Varadarajan	788e4f2d2e	CodeStyle formatting to conform to basic Checkstyle rules. The code-style rules follow google style with some changes: 1. Increase line length from 100 to 120 2. Disable JavaDoc related checkstyles as this needs more manual work. Both source and test code are checked for code-style	2018-03-30 11:09:40 -07:00
Nishith Agarwal	987f5d6b96	Making ExternalSpillableMap generic for any datatype - Introduced concept of converters to be able to serde generic datatype for SpillableMap - Fixed/Added configs to Hoodie Configs - Changed HoodieMergeHandle to start using SpillableMap	2018-03-28 07:56:07 -07:00
Xavier Jodoin	fa787ab5ab	Replace deprecated jackson version	2018-03-27 14:27:20 -07:00
Nishith Agarwal	1b756db221	Adding config for parquet compression ratio	2018-03-25 22:17:36 -07:00
Jian Xu	48643795b8	Checking storage level before persisting preppedRecords	2018-03-22 22:15:52 -07:00
Kaushik Devarajaiah	291a88ba94	DeduplicateRecords based on recordKey if global index is used	2018-03-22 09:15:44 -07:00
Nishith Agarwal	123da020e2	- Fixing memory leak due to HoodieLogFileReader holding on to a logblock - Removed inMemory HashMap usage in merge(..) code in LogScanner	2018-03-16 12:43:31 -07:00
Jian Xu	d3df32fa03	Add back UseTempFolder changes in HoodieMergeHandle	2018-03-15 17:11:15 -07:00
Omkar Joshi	c5b4cb1b75	Spawning parallel writer thread to separate reading records from spark and writing records to parquet file	2018-03-15 16:58:14 -07:00
Nishith Agarwal	9dff8c2326	Adding a tool to read/inspect a HoodieLogFile	2018-03-15 16:48:28 -07:00
Jian Xu	ba7c258c61	Add more options in HoodieWriteConfig	2018-03-13 23:26:36 -07:00
Jian Xu	7f079632a6	Use hadoopConf in HoodieTableMetaClient and related tests	2018-03-12 11:47:55 -07:00
Vinoth Chandar	73534d467f	[maven-release-plugin] prepare for next development iteration	2018-03-07 21:04:10 -08:00
Vinoth Chandar	f2e5c6f9f8	[maven-release-plugin] prepare release hoodie-0.4.1	2018-03-07 21:04:00 -08:00
Nishith Agarwal	0eaa21111a	Re-factoring Compaction as first level API in WriteClient similar to upsert/insert	2018-03-07 16:16:39 -08:00
Nishith Agarwal	5405a6287b	Introducing HoodieLogFormat V2 with versioning support - HoodieLogFormat V2 has support for LogFormat evolution through versioning - LogVersion is associated with a LogBlock not a LogFile - Based on a version for a LogBlock, appropriate code path is executed - Implemented LazyReading of Hoodie Log Blocks with Memory / IO tradeoff - Implemented Reverse pointer to be able to traverse the log in reverse - Introduce new MAGIC for backwards compatibility with logs without versions	2018-03-06 21:14:11 -08:00
Jian Xu	dfd1979c51	Handle inflight clean instants during Hoodie instants archiving	2018-03-05 15:01:58 -08:00
Jian Xu	5d5c306e64	Add new APIs in HoodieReadClient and HoodieWriteClient	2018-02-28 13:58:12 -08:00
Nishith Agarwal	6fec9655a8	Added support for Disk Spillable Compaction to prevent OOM issues	2018-02-26 16:00:35 -08:00
Nishith Agarwal	d495484399	Write smaller sized multiple blocks to log file instead of a large one - Use SizeEstimator to size number of records to write - Configurable block size - Configurable log file size	2018-02-23 07:31:39 -08:00
Vinoth Chandar	eb3d0c470f	Fix formatting in HoodieWriteClient	2018-02-14 10:03:20 -08:00
Jian Xu	3bdd750982	Use FastDateFormat for thread safety Use FastDateFormat for thread safety, this is to fix an exception when a job is used to ingest multiple tables. An example exception: ``` Caused by: java.lang.NumberFormatException: multiple points at sun.misc.FloatingDecimal.readJavaFormatString(FloatingDecimal.java:1890) at sun.misc.FloatingDecimal.parseDouble(FloatingDecimal.java:110) at java.lang.Double.parseDouble(Double.java:538) at java.text.DigitList.getDouble(DigitList.java:169) at java.text.DecimalFormat.parse(DecimalFormat.java:2056) at java.text.SimpleDateFormat.subParse(SimpleDateFormat.java:1867) at java.text.SimpleDateFormat.parse(SimpleDateFormat.java:1514) at java.text.DateFormat.parse(DateFormat.java:364) at com.uber.hoodie.HoodieWriteClient.commit(HoodieWriteClient.java:442) ```	2018-02-12 11:43:57 -08:00
Nishith Agarwal	7076c2e9f0	refactor classes to accept Map passed by RealtimeCompactor to avoid multiple map creations in HoodieMergeHandle	2018-02-07 11:16:01 -08:00
Nishith Agarwal	30049383f5	Small File Size correction handling for MOR table type	2018-02-07 11:01:10 -08:00
Nishith Agarwal	2116815261	Fixing Rollback for compaction/commit operation, added check for null commit - Fallback to old way of rollback by listing all partitions - Added null check to ensure only partitions which are to be rolledback are considered - Added location (committime) to workload stat - Added checks in CompactedScanner to guard against task retries - Introduce new logic for rollback (bounded by instant_time and target_instant time) - Reversed logfiles order	2018-02-06 16:55:23 -08:00
Nishith Agarwal	be0b1f3e57	Adding global indexing to HbaseIndex implementation - Adding tests for HbaseIndex - Enabling global index functionality	2018-02-05 15:21:22 -08:00
Jian Xu	15e669c60c	Incorporating code review feedback for finalizeWrite for COW #4	2018-02-02 11:38:25 -08:00
Jian Xu	3736243fb3	Rebases with latest upstream	2018-02-02 11:38:25 -08:00
Jian Xu	363e35bb0f	Add finalizeWrite support for HoodieMergeHandle	2018-02-02 11:38:25 -08:00
Jian Xu	acae6586f3	Incorporating code review feedback for finalizeWrite for COW #3	2018-02-02 11:38:25 -08:00
Jian Xu	37f2cdd7e4	Incorporating code review feedback for finalizeWrite for COW #2	2018-02-02 11:38:25 -08:00
Jian Xu	2fe4fef625	Incorporating code review feedback for finalizeWrite for COW	2018-02-02 11:38:25 -08:00
Jian Xu	c874248f23	Add FinalizeWrite in HoodieCreateHandle for COW tables	2018-02-02 11:38:25 -08:00

569 Commits