lanyuanxiaoyao/hudi

Author	SHA1	Message	Date
vinoth chandar	a16aa2a78f	Create CNAME	2019-02-15 21:53:08 -08:00
Balaji Varadarajan	3a0044216c	New Features in DeltaStreamer : (1) Apply transformation when using delta-streamer to ingest data. (2) Add Hudi Incremental Source for Delta Streamer (3) Allow delta-streamer config-property to be passed as command-line (4) Add Hive Integration to Delta-Streamer and address Review comments (5) Ensure MultiPartKeysValueExtractor handle hive style partition description (6) Reuse same spark session on both source and transformer (7) Support extracting partition fields from _hoodie_partition_path for HoodieIncrSource (8) Reuse Binary Avro coders (9) Add push down filter for Incremental source (10) Add Hoodie DeltaStreamer metrics to track total time taken	2019-02-11 18:22:05 -08:00
Vinoth Chandar	c70dbc13e9	Updating new slack signup link	2019-02-06 13:52:00 -08:00
Vinoth Chandar	1362942aa3	Enabling auto tuning of insert splits by default	2018-11-08 09:48:23 -08:00
Balaji Varadarajan	07324e7a20	Compaction validate, unschedule and repair	2018-10-25 14:12:47 -07:00
Nishith Agarwal	48aa026dc4	Adding documentation for migration guide and COW vs MOR tradeoffs, moving some docs around for more clarity	2018-10-19 15:00:38 -07:00
Balaji Varadarajan	f3418e4718	Docker Container Build and Run setup with foundations for adding docker integration tests. Docker images built with Hadoop 2.8.4 Hive 2.3.3 and Spark 2.3.1 and published to docker-hub Look at quickstart document for how to setup docker and run demo	2018-10-02 09:28:21 +05:30
vinoth chandar	06bdba3cef	Update Gemfile.lock with newer jekyll version	2018-09-29 20:50:03 +05:30
vinothchandar	9ca6f91e97	Perform consistency checks during write finalize - Check to ensure written files are listable on storage - Docs reflected to capture how this helps with s3 storage - Unit tests added, corrections to existing tests - Fix DeltaStreamer to manage archived commits in a separate folder	2018-09-28 08:04:41 +05:30
Vinoth Chandar	a5359662be	Moving dependencies off cdh to apache + Hive2 support - Tests redone in the process - Main changes are to RealtimeRecordReader and how it treats maps/arrays - Make hive sync work with Hive 1/2 and CDH environments - Fixes to make corner cases for Hive queries - Spark Hive integration - Working version across Apache and CDH versions - Known Issue - https://github.com/uber/hudi/issues/439	2018-09-11 11:03:30 +05:30
Balaji Varadarajan	ea7823a9dd	Docs for describing async compaction and how to operate it	2018-09-10 11:52:20 +08:00
Vinoth Chandar	d58ddbd999	Reworking the deltastreamer tool - Standardize version of jackson - DFSPropertiesConfiguration replaces usage of commons PropertiesConfiguration - Remove dependency on ConstructorUtils - Throw error if ordering value is not present, during key generation - Switch to shade plugin for hoodie-utilities - Added support for consumption for Confluent avro kafka serdes - Support for Confluent schema registry - KafkaSource now deals with skews nicely, by doing round robin allocation of source limit across partitions - Added support for BULK_INSERT operations as well - Pass in the payload class config properly into HoodieWriteClient - Fix documentation based on new usage - Adding tests on deltastreamer, sources and all new util classes.	2018-09-08 10:24:32 +08:00
vinoth chandar	fad4b513ea	Update Gemfile.lock with higher ffi version	2018-09-06 08:54:32 +08:00
vinothchandar	8f1d362015	Fixing deps & serialization for RTView - hoodie-hadoop-mr now needs objectsize bundled - Also updated docs with additional tuning tips	2018-06-10 19:16:44 -07:00
Vinoth Chandar	85dd265b7b	Improving out of box experience for data source - Fixes #246 - Bump up default parallelism to 1500, to handle large upserts - Add docs on s3 configuration & tuning tips with tested spark knobs - Fix bug to not duplicate hoodie metadata fields when input dataframe is another hoodie dataset - Improve speed of ROTablePathFilter by removing directory check - Move to spark-avro 4.0 to handle issue with nested fields with same name - Keep AvroConversionUtils in sync with spark-avro 4.0	2018-06-10 19:16:44 -07:00
Balaji Varadarajan	c66004d79a	Add Support for ordering and limiting results in CLI show commands	2018-04-26 09:30:05 -07:00
vinoth chandar	fa73a911cc	Update Gemfile.lock	2018-04-19 14:20:50 -07:00
Balaji Varadarajan	788e4f2d2e	CodeStyle formatting to conform to basic Checkstyle rules. The code-style rules follow google style with some changes: 1. Increase line length from 100 to 120 2. Disable JavaDoc related checkstyles as this needs more manual work. Both source and test code are checked for code-style	2018-03-30 11:09:40 -07:00
Nishith Agarwal	987f5d6b96	Making ExternalSpillableMap generic for any datatype - Introduced concept of converters to be able to serde generic datatype for SpillableMap - Fixed/Added configs to Hoodie Configs - Changed HoodieMergeHandle to start using SpillableMap	2018-03-28 07:56:07 -07:00
Nishith Agarwal	5405a6287b	Introducing HoodieLogFormat V2 with versioning support - HoodieLogFormat V2 has support for LogFormat evolution through versioning - LogVersion is associated with a LogBlock not a LogFile - Based on a version for a LogBlock, appropriate code path is executed - Implemented LazyReading of Hoodie Log Blocks with Memory / IO tradeoff - Implemented Reverse pointer to be able to traverse the log in reverse - Introduce new MAGIC for backwards compatibility with logs without versions	2018-03-06 21:14:11 -08:00
Nishith Agarwal	d495484399	Write smaller sized multiple blocks to log file instead of a large one - Use SizeEstimator to size number of records to write - Configurable block size - Configurable log file size	2018-02-23 07:31:39 -08:00
Vinoth Chandar	85d32930cd	Update Gemfile.lock	2018-01-18 00:07:23 -08:00
Vinoth Chandar	5a62480a92	Update docs on code style setup	2017-11-12 23:19:02 -08:00
Vinoth Chandar	274aaf49fe	Incorporating code review feedback for DataSource	2017-10-02 20:44:53 -07:00
Vinoth Chandar	64e0573aca	Adding hoodie-spark to support Spark Datasource for Hoodie - Write with COW/MOR paths work fully - Read with RO view works on both storages* - Incremental view supported on COW - Refactored out HoodieReadClient methods, to just contain key based access - HoodieDataSourceHelpers class can be now used to construct inputs to datasource - Tests in hoodie-client using new helpers and mechanisms - Basic tests around save modes & insert/upserts (more to follow) - Bumped up scala to 2.11, since 2.10 is deprecated & complains with scalatest - Updated documentation to describe usage - New sample app written using the DataSource API	2017-10-02 20:44:53 -07:00
Vinoth Chandar	86209640f7	Adding range based pruning to bloom index - keys compared lexicographically using String::compareTo - Range metadata additionally written into parquet file footers - Trim fat & few optimizations to speed up indexing - Add param to control whether input shall be cached, to speed up lookup - Add param to turn on/off range pruning - Auto compute of parallelism now simply factors in amount of comparisons done - More accurate parallelism computation when range pruning is on - tests added & hardened, docs updated	2017-08-04 13:22:13 -07:00
Vinoth Chandar	cf1dde0323	Add recent talks/presentations to documentation	2017-07-08 22:47:15 -07:00
Vinoth Chandar	e8b3ddd7cb	Add note on community engagement to committership guidelines	2017-07-08 22:47:15 -07:00
Prasanna Rajaperumal	e44f9b889b	Added CHANGELOG.md and updated community contributions guideline	2017-06-16 10:48:37 -07:00
Prasanna Rajaperumal	4b26be9f61	Fixes to RealtimeInputFormat and RealtimeRecordReader and update documentation for HiveSyncTool	2017-06-15 18:21:07 -07:00
Zeeshan Qureshi	43a55b09fd	Add GCS to supported filesystems	2017-05-18 10:30:34 -07:00
vinoth chandar	1b0a027942	Update community.md with committership guidelines	2017-05-04 17:25:54 -07:00
Vinoth Chandar	b4e787ce1d	Update docs	2017-05-01 21:48:27 -07:00
Vinoth Chandar	848814bece	Adding docs for deltastreamer, hivesync tool usage	2017-04-03 21:27:49 -07:00
Vinoth Chandar	2b6322318c	CR feedback	2017-04-03 18:28:01 -07:00
Vinoth Chandar	e0fc4ec38e	Documentation update + helper method for WriteConfig builder	2017-04-03 18:28:01 -07:00
Yash Sharma	e3b273e9fd	formatting for docs	2017-03-28 05:08:54 -07:00
Yash Sharma	bca7e7dae4	improve documentations	2017-03-28 05:08:54 -07:00
Yash Sharma	d6f94b998d	Hoodie operability with S3	2017-03-28 05:08:54 -07:00
prazanna	a7cd021f26	Update incremental pull query documentation	2017-03-23 16:20:54 -07:00
ovj	b02910c588	few fixes to quick start document (#112)	2017-03-22 18:25:26 -07:00
Vinoth Chandar	c3257b9680	Fix ugly favicon	2017-03-12 20:30:42 -07:00
Vinoth Chandar	b252633fab	Add analytics to site	2017-03-12 20:30:42 -07:00
Prasanna Rajaperumal	41e08018fc	Fixing minor documentation fixes	2017-03-02 11:42:04 -08:00
Prasanna Rajaperumal	d84aea3512	Fixing minor documentation fixes	2017-03-02 11:39:40 -08:00
prazanna	8a2a9ae764	Making minor documentation fixes	2017-03-02 11:35:09 -08:00
Vinoth Chandar	33a85900f8	Adding admin guide, guide for sql queries and incr processing	2017-02-19 20:33:21 -08:00
Vinoth Chandar	dcc15d5d6f	Adding docs for running sql queries on hoodie datasets	2017-02-19 20:33:21 -08:00
vinoth chandar	66e272e9eb	Docs for performance section (#80) * Adding performance section * minor edit to perf section	2017-02-17 18:30:56 -08:00
vinoth chandar	c7a8e15c78	Docs for impl & comparison (#79) * Initial version of comparison, implementation * Finished doc for comparison to other systems	2017-02-17 08:25:17 -08:00

62 Commits