lanyuanxiaoyao/hudi - hudi

Author	SHA1	Message	Date
Nishith Agarwal	129e433641	- Upgrading to Hive 2.x - Eliminating in-memory deltaRecordsMap - Use writerSchema to generate generic record needed by custom payloads - changes to make tests work with hive 2.x	2019-06-13 12:46:14 -07:00
Balaji Varadarajan	1c943ab230	Ensure log files are consistently ordered when scanning	2019-06-12 16:16:37 -07:00
Balaji Varadarajan	479908fd20	HUDI-125 : Change License for all source files and update RAT configurations	2019-06-09 11:41:55 -07:00
Balaji Varadarajan	30b0f2636f	Changes related to Licensing work 1. Go through dependencies list one round to ensure compliance. Generated current NOTICE list in all submodules (other apache projects like flink does this). To be on conservative side regarding licensing, NOTICE.txt lists all dependencies including transitive. Pending Compliance questions reported in https://issues.apache.org/jira/browse/LEGAL-461 2. Automate generating NOTICE.txt files to allow future package compliance issues be identified early as part of code-review process. 3. Added NOTICE.txt and LICENSE.txt to all HUDI jars	2019-06-07 17:58:57 -07:00
vinothchandar	66c0b81b49	[maven-release-plugin] prepare for next development iteration	2019-05-28 19:17:26 -07:00
vinothchandar	227785c022	[maven-release-plugin] prepare release hoodie-0.4.7	2019-05-28 19:17:15 -07:00
Balaji Varadarajan	145034c5fa	Spark Stage retry handling	2019-05-21 14:49:51 -07:00
vinothchandar	446f99aa0f	[maven-release-plugin] prepare for next development iteration	2019-05-14 07:29:22 -07:00
vinothchandar	cc38abecc8	[maven-release-plugin] prepare release hoodie-0.4.6	2019-05-14 07:29:11 -07:00
Nishith Agarwal	af46078a82	converting map task memory from mb to bytes	2019-05-13 21:23:30 -07:00
vinothchandar	687395e40f	[maven-release-plugin] prepare for next development iteration	2019-02-27 07:16:27 -08:00
vinothchandar	bbf40ef987	[maven-release-plugin] prepare release hoodie-0.4.5	2019-02-27 07:16:15 -08:00
Bhavani Sudha Saktheeswaran	639c287cab	Close FSDataInputStream for meta file open in HoodiePartitionMetadata	2019-02-15 22:16:31 -08:00
Balaji Varadarajan	defcf6a0b9	Fix Hoodie Record Reader to work with non-partitioned dataset	2019-02-11 18:29:23 -08:00
Balaji Varadarajan	3a0044216c	New Features in DeltaStreamer : (1) Apply transformation when using delta-streamer to ingest data. (2) Add Hudi Incremental Source for Delta Streamer (3) Allow delta-streamer config-property to be passed as command-line (4) Add Hive Integration to Delta-Streamer and address Review comments (5) Ensure MultiPartKeysValueExtractor handle hive style partition description (6) Reuse same spark session on both source and transformer (7) Support extracting partition fields from _hoodie_partition_path for HoodieIncrSource (8) Reuse Binary Avro coders (9) Add push down filter for Incremental source (10) Add Hoodie DeltaStreamer metrics to track total time taken	2019-02-11 18:22:05 -08:00
Nishith Agarwal	110df7190b	Enabling hard deletes for MergeOnRead table type	2018-12-31 12:49:58 -08:00
arukavytsia	6946dd7557	General enhancements	2018-12-18 12:52:39 -08:00
Balaji Varadarajan	8485b9e263	Fix regression which broke HudiInputFormat handling of non-hoodie datasets	2018-10-16 18:39:56 +01:00
Balaji Varadarajan	9710b5a3a6	Ensure Hoodie metadata folder and files are filtered out when constructing Parquet Data Source	2018-10-01 14:27:14 +05:30
vinothchandar	7ba842c0fe	[maven-release-plugin] prepare for next development iteration	2018-09-28 11:27:00 +05:30
vinothchandar	5847b61f44	[maven-release-plugin] prepare release hoodie-0.4.4	2018-09-28 11:26:15 +05:30
Balaji Varadarajan	4c74dd4cad	Travis CI tests needs to be run in quieter mode (WARN log level) to avoid max log-size errors	2018-09-26 21:10:20 +05:30
Yishuang Lu	faf93b6340	Fix the name of avro schema file in Test Fixed the name of avro schema file in Test Signed-off-by: Yishuang Lu <luystu@gmail.com>	2018-09-24 21:58:34 +05:30
Balaji Varadarajan	5cb28e7b1f	Explicitly release resources in LogFileReader and TestHoodieClientBase	2018-09-20 13:24:57 +05:30
Vinoth Chandar	bd5af89f12	[maven-release-plugin] rollback the release of hoodie-0.4.4	2018-09-13 15:01:53 +05:30
Vinoth Chandar	d1cc864a43	[maven-release-plugin] prepare for next development iteration	2018-09-12 23:59:47 +05:30
Vinoth Chandar	b748bc836d	[maven-release-plugin] prepare release hoodie-0.4.4	2018-09-12 23:59:34 +05:30
Vinoth Chandar	eca49a255e	Rebasing and fixing conflicts against master	2018-09-11 11:03:30 +05:30
Vinoth Chandar	a5359662be	Moving dependencies off cdh to apache + Hive2 support - Tests redone in the process - Main changes are to RealtimeRecordReader and how it treats maps/arrays - Make hive sync work with Hive 1/2 and CDH environments - Fixes to make corner cases for Hive queries - Spark Hive integration - Working version across Apache and CDH versions - Known Issue - https://github.com/uber/hudi/issues/439	2018-09-11 11:03:30 +05:30
Vinoth Chandar	89cd6b0726	[maven-release-plugin] prepare for next development iteration	2018-08-22 21:30:05 -07:00
Vinoth Chandar	8d305c5a86	[maven-release-plugin] prepare release hoodie-0.4.3	2018-08-22 21:29:53 -07:00
Balaji Varadarajan	2e12c86d01	Ensure Compaction Operation compacts the data file as defined in the workload	2018-08-07 08:19:50 -07:00
Balaji Varadarajan	2f8ce93030	Async Compaction Main API changes	2018-08-07 08:19:50 -07:00
Balaji Varadarajan	6d01ae8ca0	FileSystemView and Timeline level changes to support Async Compaction	2018-08-07 08:19:50 -07:00
Vinoth Chandar	34827d50e1	[maven-release-plugin] prepare for next development iteration	2018-06-11 08:59:13 -07:00
Vinoth Chandar	43ef385730	[maven-release-plugin] prepare release hoodie-0.4.2	2018-06-11 08:59:02 -07:00
Xavier Jodoin	8ad8030f2a	Fix wrong use of TemporaryFolder junit rule	2018-06-10 23:31:42 -07:00
vinothchandar	8f1d362015	Fixing deps & serialization for RTView - hoodie-hadoop-mr now needs objectsize bundled - Also updated docs with additional tuning tips	2018-06-10 19:16:44 -07:00
Vinoth Chandar	85dd265b7b	Improving out of box experience for data source - Fixes #246 - Bump up default parallelism to 1500, to handle large upserts - Add docs on s3 configuration & tuning tips with tested spark knobs - Fix bug to not duplicate hoodie metadata fields when input dataframe is another hoodie dataset - Improve speed of ROTablePathFilter by removing directory check - Move to spark-avro 4.0 to handle issue with nested fields with same name - Keep AvroConversionUtils in sync with spark-avro 4.0	2018-06-10 19:16:44 -07:00
Balaji Varadarajan	dfc0c61eb7	Support union mode in HoodieRealtimeRecordReader for pure insert workloads Also Replace BufferedIteratorPayload abstraction with function passing	2018-05-10 17:39:56 -07:00
Nishith Agarwal	93f345a032	Minor fixes for MergeOnRead MVP release readiness	2018-05-09 07:23:58 -07:00
Nishith Agarwal	c3c205fc02	Using BufferedFsInputStream to wrap FSInputStream for FSDataInputStream	2018-04-18 08:05:19 -07:00
Balaji Varadarajan	788e4f2d2e	CodeStyle formatting to conform to basic Checkstyle rules. The code-style rules follow google style with some changes: 1. Increase line length from 100 to 120 2. Disable JavaDoc related checkstyles as this needs more manual work. Both source and test code are checked for code-style	2018-03-30 11:09:40 -07:00
Jian Xu	7f079632a6	Use hadoopConf in HoodieTableMetaClient and related tests	2018-03-12 11:47:55 -07:00
Vinoth Chandar	73534d467f	[maven-release-plugin] prepare for next development iteration	2018-03-07 21:04:10 -08:00
Vinoth Chandar	f2e5c6f9f8	[maven-release-plugin] prepare release hoodie-0.4.1	2018-03-07 21:04:00 -08:00
Nishith Agarwal	5405a6287b	Introducing HoodieLogFormat V2 with versioning support - HoodieLogFormat V2 has support for LogFormat evolution through versioning - LogVersion is associated with a LogBlock not a LogFile - Based on a version for a LogBlock, appropriate code path is executed - Implemented LazyReading of Hoodie Log Blocks with Memory / IO tradeoff - Implemented Reverse pointer to be able to traverse the log in reverse - Introduce new MAGIC for backwards compatibility with logs without versions	2018-03-06 21:14:11 -08:00
Nishith Agarwal	6fec9655a8	Added support for Disk Spillable Compaction to prevent OOM issues	2018-02-26 16:00:35 -08:00
Nishith Agarwal	2116815261	Fixing Rollback for compaction/commit operation, added check for null commit - Fallback to old way of rollback by listing all partitions - Added null check to ensure only partitions which are to be rolledback are considered - Added location (committime) to workload stat - Added checks in CompactedScanner to guard against task retries - Introduce new logic for rollback (bounded by instant_time and target_instant time) - Reversed logfiles order	2018-02-06 16:55:23 -08:00
vinothchandar	21ce846f18	Remove stateful fs member from HoodieTestUtils & FSUtils	2018-01-17 23:34:21 -08:00

118 Commits