lanyuanxiaoyao/hudi

Author	SHA1	Message	Date
wangxianghu	537502a8ef	[MINOR] Add apacheflink label (#2268 )	2020-11-22 10:41:11 +08:00
Gary Li	c8d5ea2752	[MINOR] clean up and add comments to flink client (#2261 )	2020-11-19 15:27:52 +08:00
pengzhiwei	d7af8caa45	[HUDI-1384] Decoupling hive jdbc dependency when HIVE_USE_JDBC_OPT_KEY set false (#2241 )	2020-11-19 13:44:03 +08:00
wangxianghu	a23230c8c2	[HUDI-1400] Replace Operation enum with WriteOperationType (#2259 )	2020-11-19 13:40:04 +08:00
wangxianghu	4d05680038	[HUDI-1327] Introduce base implementation of hudi-flink-client (#2176 )	2020-11-18 17:57:11 +08:00
Karl_Wang	430d4b428e	[HUDI-1377] remove duplicate code (#2235 )	2020-11-10 10:08:08 -08:00
Balaji Varadarajan	42b6aeca28	[HUDI-1358] Fix Memory Leak in HoodieLogFormatWriter (#2217 )	2020-11-09 19:26:13 -08:00
wenningd	0364498ae3	[HUDI-1375] Fix bug in HoodieAvroUtils.removeMetadataFields() method (#2232 ) Co-authored-by: Wenning Ding <wenningd@amazon.com>	2020-11-05 17:30:17 -08:00
satishkotha	33ec88fc38	[HUDI-1352] Add FileSystemView APIs to query pending clustering operations (#2202 )	2020-11-05 08:49:58 -08:00
lw0090	5f5c15b0d9	[HUDI-892] RealtimeParquetInputFormat skip adding projection columns if there are no log files (#2190 ) * [HUDI-892] RealtimeParquetInputFormat skip adding projection columns if there are no log files * [HUDI-892] for test * [HUDI-892] fix bug generate array from split * [HUDI-892] revert test log	2020-11-02 20:00:12 -08:00
wangxianghu	d160abb437	[HUDI-912] Refactor and relocate KeyGenerator to support more engines (#2200 ) * [HUDI-912] Refactor and relocate KeyGenerator to support more engines * Rename KeyGenerators	2020-11-02 13:12:51 -08:00
Venkatesh Rudraraju	59f995a3f5	Use RateLimiter instead of sleep. Repartition WriteStatus to optimize Hbase index writes (#1484 )	2020-11-02 08:33:27 -08:00
Sivabalan Narayanan	a205dd10fa	[HUDI-1338] Adding Delete support to test suite framework (#2172 ) - Adding Delete support to test suite. Added DeleteNode Added support to generate delete records	2020-11-01 00:15:41 -04:00
Prashant Wason	6310a2307a	[HUDI-1351] Improvements to the hudi test suite for scalability and repeated testing. (#2197 ) 1. Added the --clean-input and --clean-output parameters to clean the input and output directories before starting the job 2. Added the --delete-old-input parameter to delete older batches for data already ingested. This helps keep the number of redundant files low. 3. Added the --input-parallelism parameter to restrict the parallelism when generating input data. This helps keep the number of generated input files low. 4. Added an option start_offset to Dag Nodes. Without the ability to specify start offsets, data is generated into existing partitions. With a start offset, the DAG can control which partition the data is written to. 5. Fixed generation of records for correct number of partitions - In the existing implementation, the partition is chosen as a random long. This does not guarantee that the exact number of requested partitions is created. 6. Changed variable blacklistedFields to be a Set as that is faster than List for membership checks. 7. Fixed integer division for Math.ceil. If two integers are divided, the result is not a double unless one of the integers is cast to double.	2020-10-29 06:50:37 -07:00
liujinhui	736a940854	[HUDI-1274] Make hive synchronization supports hourly partition (#2122 )	2020-10-29 11:29:50 +08:00
n3nash	e109a61803	1. Fix merge on read DAG to make docker demo pass (#2092 ) 2. Fix repeat_count, rollback node	2020-10-28 22:34:26 -04:00
wangxianghu	e206ddd431	[MINOR] Private the NoArgsConstructor of SparkMergeHelper and code clean (#2194 )	2020-10-26 12:22:11 +08:00
lw0090	8545ea3856	[HUDI-1118] Cleanup rollback files residing in .hoodie folder (#2205 )	2020-10-25 21:04:56 -07:00
Prashant Wason	49e855c348	[HUDI-1326] Added an API to force publish metrics and flush them. (#2152 ) * [HUDI-1326] Added an API to force publish metrics and flush them. Using the added API, publish metrics after each level of the DAG completed in hudi-test-suite. * Code cleanups Co-authored-by: Vinoth Chandar <vinoth@apache.org>	2020-10-24 16:47:24 -07:00
Raymond Xu	14c4611857	[MINOR] Fix caller to SparkBulkInsertCommitActionExecutor (#2195 ) Fixed calling the wrong constructor	2020-10-21 19:50:10 -07:00
Shen Hong	49407169ac	[HUDI-1209] Properties File must be optional when running deltastreamer (#2085 )	2020-10-21 17:49:28 -07:00
Pratyaksh Sharma	e4931744eb	[HUDI-1200] fixed NPE in CustomKeyGenerator (#2093 ) - config field is no longer transient in key generator - verified that the key generator object is shipped from the driver to executors, just the one time and reused for each record	2020-10-20 23:36:25 -07:00
Ho Tien Vu	af5ef4d49d	[HUDI-1330] handle prefix filtering at directory level (#2157 ) The current DFSPathSelector only ignores prefixes (_, .) at the file level, while files under subdirectories, e.g. (.checkpoint/*), are still considered, which results in a bad-format exception during reading.	2020-10-20 23:20:19 -07:00
Ho Tien Vu	fd269ddeb0	[MINOR] Make sure factory method is used to instantiate DFSPathSelector (#2187 ) * Move createSourceSelector into DFSPathSelector factory method * Replace constructor call with factory method * Added some javadoc	2020-10-20 17:52:31 +08:00
Bhavani Sudha Saktheeswaran	6490b029dd	[HUDI-1345] Remove Hbase and htrace relocation from utilities bundle (#2185 )	2020-10-19 16:11:08 -05:00
lw0090	4d80e1e221	[HUDI-284] add more test for UpdateSchemaEvolution (#2127 ) Unit test different schema evolution scenarios.	2020-10-19 07:38:04 -07:00
Guy Khazma	35d406de40	[HUDI-1344] IBM Cloud Object Storage Support (#2182 )	2020-10-18 17:24:53 +08:00
lw0090	ec6267c303	[HUDI-307] add test to check timestamp date decimal type write and read consistent (#2177 )	2020-10-18 17:18:50 +08:00
rmpifer	a44f66869f	[HUDI-1289] Remove relocation of pattern for hbase dependencies and add shading of guava in hadoop, spark, and presto bundles (#2147 ) - Update hudi-spark-bundle pom to not relocate hbase and htrace pattern - Remove codec relocation as this is not included in bundle which was causing error	2020-10-14 17:04:35 -07:00
satishkotha	7fa641ea9a	[HUDI-1302] Add support for timestamp field in HiveSync (#2129 )	2020-10-13 22:58:00 -07:00
wangxianghu	c7d962efff	[HUDI-1328] Introduce HoodieFlinkEngineContext to hudi-flink-client (#2161 )	2020-10-14 09:30:49 +08:00
lw0090	b66c3ef23a	[HUDI-1298] Add better error messages when IOException occurs during log file reading (#2133 )	2020-10-13 00:45:10 -07:00
satishkotha	0d407342ef	[HUDI-1304] Add unit test for testing compaction on replaced file groups (#2150 )	2020-10-12 16:48:29 -07:00
Raymond Xu	c5e10d668f	[HUDI-995] Migrate HoodieTestUtils APIs to HoodieTestTable (#2167 ) Remove APIs in `HoodieTestUtils` - `createCommitFiles` - `createDataFile` - `createNewLogFile` - `createCompactionRequest` Migrated usages in `TestCleaner#testPendingCompactions`. Also improved some API names in `HoodieTestTable`.	2020-10-12 14:39:10 +08:00
hj2016	c0472d3317	[HUDI-1184] Fix the support of hbase index partition path change (#1978 ) When the hbase index is used, when the record partition is changed to another partition, the path does not change according to the value of the partition column Co-authored-by: huangjing <huangjing@clinbrain.com>	2020-10-11 19:05:57 -07:00
dugenkui	b58daf29ba	[MINOR] remove unused generics type (#2163 )	2020-10-11 18:38:42 -07:00
lw0090	2126f13e13	[HUDI-791] Replace null by Option in Delta Streamer (#2171 )	2020-10-11 18:29:57 -07:00
dugenkui	032bc3b08f	[MINOR] NPE Optimization for Option (#2158 )	2020-10-11 17:55:41 -07:00
dugenkui	d4d4c8c899	[MINOR] Fix typo and others (#2164 ) * remove HoodieSerializationException that will never be thrown * remove unused method, make HoodieException more readable * fix typo	2020-10-11 17:52:44 -07:00
lw0090	86db4da33c	[HUDI-1339] delete useless import in hudi-spark module (#2173 )	2020-10-11 17:10:52 -07:00
lw0090	585ce0094d	[HUDI-1301] use spark INCREMENTAL mode query hudi dataset support schema version. (#2125 )	2020-10-10 20:53:41 +08:00
vinoyang	eafd7bf289	[MINOR] Fix wrong javadoc and refactor some naming issues (#2156 )	2020-10-09 15:09:26 -07:00
dugenkui	00271af64e	[MINOR] Fix typo (#2159 ) * fix typo * fix typo	2020-10-09 14:52:55 -07:00
Raymond Xu	1d1d91d444	[HUDI-995] Migrate HoodieTestUtils APIs to HoodieTestTable (#2143 ) * [HUDI-995] Migrate HoodieTestUtils APIs to HoodieTestTable Remove APIs in `HoodieTestUtils` - listAllDataFilesAndLogFilesInPath - listAllLogFilesInPath - listAllDataFilesInPath - writeRecordsToLogFiles - createCleanFiles - createPendingCleanFiles Migrate the callers to use `HoodieTestTable` and `HoodieWriteableTestTable` with new APIs added - listAllBaseAndLogFiles - listAllLogFiles - listAllBaseFiles - withLogAppends - addClean - addInflightClean Also added related APIs in `FileCreateUtils` - createCleanFile - createRequestedCleanFile - createInflightCleanFile	2020-10-09 10:21:27 +08:00
Prashant Wason	788d236c44	[HUDI-1303] Some improvements for the HUDI Test Suite. (#2128 ) 1. Use the DAG Node's label from the yaml as its name instead of UUID names which are not descriptive when debugging issues from logs. 2. Fix CleanNode constructor which is not correctly implemented 3. When generating upserts, allow more granular control over the number of inserts and upserts - zero or more inserts and upserts can be specified instead of always requiring both inserts and upserts. 4. Fixed generation of records of specific size - The current code was using a class variable "shouldAddMore" which was reset to false after the first record generation causing subsequent records to be of minimum size. - In this change, we pre-calculate the extra size of the complex fields. When generating records, for complex fields we read the field size from this map. 5. Refresh the timeline of the DeltaSync service before calling readFromSource. This ensures that only the newest generated data is read and data generated in the older Dag Nodes is ignored (as their AVRO files will have an older timestamp). 6. Making --workload-generator-classname an optional parameter as most probably the default will be used	2020-10-07 08:33:51 -04:00
Pratyaksh Sharma	524193eb4b	[HUDI-603]: DeltaStreamer can now fetch schema before every run in continuous mode (#1566 ) Co-authored-by: Balaji Varadarajan <balaji.varadarajan@robinhood.com>	2020-10-06 20:34:03 -07:00
rmpifer	fed01cd3c9	[MINOR] Update spark master default to yarn (#2148 )	2020-10-05 15:22:28 -07:00
lw0090	fdae388626	[HUDI-1203] add port configuration for EmbeddedTimelineService (#2142 )	2020-10-05 11:36:54 -07:00
Shen Hong	b335459c80	[HUDI-1208] Ordering Field should be optional when precombine is turned off (#2088 )	2020-10-04 11:34:21 -07:00
Pratyaksh Sharma	080ba3ed54	[HUDI-1199] relocated jetty in hudi-utilities-bundle pom (#1990 ) * [HUDI-1199]: relocated jetty in hudi-utilities-bundle pom * [HUDI-1199]: re trigger travis build	2020-10-04 11:22:01 -07:00


1215 Commits