lanyuanxiaoyao/hudi - hudi

Author	SHA1	Message	Date
Ho Tien Vu	11c4121f73	Fixed TableNotFoundException when writing with structured streaming (#778) - When writing to a new hoodie table, if the checkpoint dir is under the target path, Spark will create the base path and thus skip initializing .hoodie, which results in an error - apply the .hoodie existence check for all save modes	2019-07-12 09:17:16 -07:00
Jaimin Shah	17e878f721	Adding support for complex keys (#728) - Resolving the issue related to ambiguity in recordKey by creating and parsing a JSON object as a string - Added unit test for ComplexKeyGenerator - Minor changes	2019-06-21 00:25:06 -07:00
Balaji Varadarajan	a0d7ab2384	HUDI-70 : Making DeltaStreamer run in continuous mode with concurrent compaction	2019-06-18 17:48:14 -07:00
Balaji Varadarajan	51d122b5c3	Close Hoodie Clients which are opened to properly shutdown embedded timeline service	2019-06-11 20:22:14 -07:00
Balaji Varadarajan	479908fd20	HUDI-125 : Change License for all source files and update RAT configurations	2019-06-09 11:41:55 -07:00
Balaji Varadarajan	30b0f2636f	Changes related to Licensing work 1. Go through dependencies list one round to ensure compliance. Generated current NOTICE list in all submodules (other Apache projects like Flink do this). To be on the conservative side regarding licensing, NOTICE.txt lists all dependencies including transitive ones. Pending compliance questions reported in https://issues.apache.org/jira/browse/LEGAL-461 2. Automate generating NOTICE.txt files to allow future package compliance issues to be identified early as part of the code-review process. 3. Added NOTICE.txt and LICENSE.txt to all HUDI jars	2019-06-07 17:58:57 -07:00
Vinoth Chandar	7b4a28ecf8	Move dependency repos to https urls	2019-05-31 20:37:03 -07:00
Vinoth Chandar	a5e2439514	Turn off noisy test	2019-05-30 21:35:12 -07:00
guanjianhui	6b5abb5d92	fix maven pom	2019-05-29 16:16:29 -07:00
vinothchandar	66c0b81b49	[maven-release-plugin] prepare for next development iteration	2019-05-28 19:17:26 -07:00
vinothchandar	227785c022	[maven-release-plugin] prepare release hoodie-0.4.7	2019-05-28 19:17:15 -07:00
Balaji Varadarajan	a7e6cf5197	Support nested types for recordKey, partitionPath and combineKey	2019-05-18 07:14:58 -07:00
vinothchandar	446f99aa0f	[maven-release-plugin] prepare for next development iteration	2019-05-14 07:29:22 -07:00
vinothchandar	cc38abecc8	[maven-release-plugin] prepare release hoodie-0.4.6	2019-05-14 07:29:11 -07:00
Balaji Varadarajan	978470af33	Rollback inflights when using Spark [Streaming] write	2019-05-02 12:51:02 -07:00
vinothchandar	57a8b9cc8c	Making DataSource/DeltaStreamer use defaults for combining - Addresses issue where insert will combine and remove duplicates within batch - Setting default insert combining to false (write client default) - Set to true if filtering duplicates on insert/bulk_insert	2019-05-01 13:21:21 -07:00
Naoki Takezoe	461ce18bd1	Fix to enable hoodie.datasource.read.incr.filters	2019-04-26 11:14:06 -07:00
lyogev	9ef51deb84	Add empty payload class to support deletes via apache spark	2019-04-17 23:00:20 -07:00
Vinoth Chandar	b34a204a52	Fixing small file handling, inline compaction defaults - Small file limit is now 100MB by default - Turned on inline compaction by default for MOR - Changes take effect on DataSource and DeltaStreamer	2019-04-03 10:56:10 -07:00
Vinoth Chandar	e56c1612e4	Fixed HUDI-87 : Remove schemastr from BaseAvroPayload	2019-03-27 23:03:25 -07:00
Nishith Agarwal	3d9041e216	Fixing source schema and writer schema distinction in payloads	2019-03-26 19:44:27 -07:00
kaka11chen	b514e1ab18	Fix: Avro doesn't have short and byte types.	2019-03-06 16:09:24 -08:00
vinothchandar	687395e40f	[maven-release-plugin] prepare for next development iteration	2019-02-27 07:16:27 -08:00
vinothchandar	bbf40ef987	[maven-release-plugin] prepare release hoodie-0.4.5	2019-02-27 07:16:15 -08:00
Kent Yao	8dddecf00f	handle no such element exception in HoodieSparkSqlWriter	2019-02-15 22:11:48 -08:00
Balaji Varadarajan	3a0044216c	New Features in DeltaStreamer : (1) Apply transformation when using delta-streamer to ingest data. (2) Add Hudi Incremental Source for Delta Streamer (3) Allow delta-streamer config-property to be passed as command-line (4) Add Hive Integration to Delta-Streamer and address Review comments (5) Ensure MultiPartKeysValueExtractor handle hive style partition description (6) Reuse same spark session on both source and transformer (7) Support extracting partition fields from _hoodie_partition_path for HoodieIncrSource (8) Reuse Binary Avro coders (9) Add push down filter for Incremental source (10) Add Hoodie DeltaStreamer metrics to track total time taken	2019-02-11 18:22:05 -08:00
Kent Yao	2b55f0751f	Using immutable map instead of mutables to generate parameters	2019-01-30 16:09:40 -08:00
arukavytsia	6946dd7557	General enhancements	2018-12-18 12:52:39 -08:00
Balaji Varadarajan	30c5f8b7bd	Ensure Hoodie works for non-partitioned Hive table	2018-12-12 13:35:16 -08:00
xubo245	466ff73ffb	fix some spelling errors in Hudi	2018-12-12 13:06:25 -08:00
jiale.tan	bf65219b73	feat(SparkDataSource): add structured streaming	2018-12-04 16:33:00 -08:00
Nishith Agarwal	7243ce40c9	Serializing the complete payload object instead of serializing just the GenericRecord - Removing Converter hierarchy as we now depend purely on JavaSerialization and require the payload to be java serializable	2018-12-04 11:43:41 -08:00
jiale.tan	1628d044ac	feat(SparkDataSource): add additional feature to drop later arriving dups	2018-10-16 11:52:50 -07:00
Vinoth Chandar	1fca9b21cc	Add --filter-dupes to DeltaStreamer - Optionally filter out duplicates before inserting data - Unit tests	2018-10-04 11:25:18 +05:30
Balaji Varadarajan	f3418e4718	Docker Container Build and Run setup with foundations for adding docker integration tests. Docker images built with Hadoop 2.8.4, Hive 2.3.3 and Spark 2.3.1 and published to docker-hub. Look at the quickstart document for how to set up docker and run the demo	2018-10-02 09:28:21 +05:30
vinothchandar	7ba842c0fe	[maven-release-plugin] prepare for next development iteration	2018-09-28 11:27:00 +05:30
vinothchandar	5847b61f44	[maven-release-plugin] prepare release hoodie-0.4.4	2018-09-28 11:26:15 +05:30
vinothchandar	9ca6f91e97	Perform consistency checks during write finalize - Check to ensure written files are listable on storage - Docs reflected to capture how this helps with s3 storage - Unit tests added, corrections to existing tests - Fix DeltaStreamer to manage archived commits in a separate folder	2018-09-28 08:04:41 +05:30
Balaji Varadarajan	4c74dd4cad	Travis CI tests need to be run in quieter mode (WARN log level) to avoid max log-size errors	2018-09-26 21:10:20 +05:30
Vinoth Chandar	f44bcc5b03	Fix bug with incrementally pulling older data	2018-09-18 02:34:00 +05:30
Vinoth Chandar	bd5af89f12	[maven-release-plugin] rollback the release of hoodie-0.4.4	2018-09-13 15:01:53 +05:30
Vinoth Chandar	d1cc864a43	[maven-release-plugin] prepare for next development iteration	2018-09-12 23:59:47 +05:30
Vinoth Chandar	b748bc836d	[maven-release-plugin] prepare release hoodie-0.4.4	2018-09-12 23:59:34 +05:30
Balaji Varadarajan	18a39715c9	Bump up versions in packaging modules and remove commons-lang3 dep	2018-09-11 11:03:30 +05:30
Vinoth Chandar	a5359662be	Moving dependencies off cdh to apache + Hive2 support - Tests redone in the process - Main changes are to RealtimeRecordReader and how it treats maps/arrays - Make hive sync work with Hive 1/2 and CDH environments - Fixes to make corner cases for Hive queries - Spark Hive integration - Working version across Apache and CDH versions - Known Issue - https://github.com/uber/hudi/issues/439	2018-09-11 11:03:30 +05:30
Vinoth Chandar	d58ddbd999	Reworking the deltastreamer tool - Standardize version of jackson - DFSPropertiesConfiguration replaces usage of commons PropertiesConfiguration - Remove dependency on ConstructorUtils - Throw error if ordering value is not present, during key generation - Switch to shade plugin for hoodie-utilities - Added support for consumption for Confluent avro kafka serdes - Support for Confluent schema registry - KafkaSource now deals with skews nicely, by doing round robin allocation of source limit across partitions - Added support for BULK_INSERT operations as well - Pass in the payload class config properly into HoodieWriteClient - Fix documentation based on new usage - Adding tests on deltastreamer, sources and all new util classes.	2018-09-08 10:24:32 +08:00
Nishith Agarwal	459e523d9e	1. Small file size handling for inserts into log files. In summary, the total size of the log file is compared with the parquet max file size and if there is scope to add inserts, then add them.	2018-09-06 08:52:08 +08:00
Nishith Agarwal	324de298bc	Removing dependency on apache-commons lang 3, adding necessary classes as needed	2018-09-06 08:26:48 +08:00
Vinoth Chandar	89cd6b0726	[maven-release-plugin] prepare for next development iteration	2018-08-22 21:30:05 -07:00
Vinoth Chandar	8d305c5a86	[maven-release-plugin] prepare release hoodie-0.4.3	2018-08-22 21:29:53 -07:00


71 Commits