lanyuanxiaoyao/hudi - hudi

Author	SHA1	Message	Date
Raymond Xu	5ee676e34f	[MINOR] Move a test method to Transformations (#1934 ) - Move TestHoodieKeyLocationFetchHandle#getRecordsPerPartition to Transformations - Improve some var namings	2020-08-08 18:25:55 +08:00
cheshta2904	1072f2748a	[HUDI-1026] Removed slf4j dependency from HoodieClientTestHarness (#1928 )	2020-08-08 12:07:22 +08:00
Gary Li	4f74a84607	[HUDI-69] Support Spark Datasource for MOR table - RDD approach (#1848 ) - This PR implements Spark Datasource for MOR table in the RDD approach. - Implemented SnapshotRelation - Implemented HudiMergeOnReadRDD - Implemented separate Iterator to handle merge and unmerge record reader. - Added TestMORDataSource to verify this feature. - Clean up test file name, add tests for mixed query type tests - We can now revert the change made in DefaultSource Co-authored-by: Vinoth Chandar <vchandar@confluent.io>	2020-08-07 00:28:14 -07:00
Udit Mehrotra	ab453f2623	[HUDI-999] [RFC-12] Parallelize fetching of source data files/partitions (#1924 )	2020-08-06 23:44:57 -07:00
Prashant Wason	c21209cb58	[HUDI-1149] Added a console metrics reporter and associated unit tests.	2020-08-05 10:31:46 -07:00
Balaji Varadarajan	7a2429f5ba	[HUDI-575] Spark Streaming with async compaction support (#1752 )	2020-08-05 07:50:15 -07:00
liujianhui	d3711a2641	[HUDI-525] lack of insert info in delta_commit inflight	2020-08-04 17:43:57 -07:00
Sivabalan Narayanan	ab11ba43e1	[REVERT] "[HUDI-1058] Make delete marker configurable (#1819 )" (#1914 ) This reverts commit `433d7d2c98`.	2020-08-04 15:20:38 -07:00
vinoth chandar	539621bd33	[HUDI-242] Support for RFC-12/Bootstrapping of external datasets to hudi (#1876 ) - [HUDI-418] Bootstrap Index Implementation using HFile with unit-test - [HUDI-421] FileSystem View Changes to support Bootstrap with unit-tests - [HUDI-424] Implement Query Side Integration for querying tables containing bootstrap file slices - [HUDI-423] Implement upsert functionality for handling updates to these bootstrap file slices - [HUDI-421] Bootstrap Write Client with tests - [HUDI-425] Added HoodieDeltaStreamer support - [HUDI-899] Add a knob to change partition-path style while performing metadata bootstrap - [HUDI-900] Metadata Bootstrap Key Generator needs to handle complex keys correctly - [HUDI-424] Simplify Record reader implementation - [HUDI-423] Implement upsert functionality for handling updates to these bootstrap file slices - [HUDI-420] Hoodie Demo working with hive and sparkSQL. Also, Hoodie CLI working with bootstrap tables Co-authored-by: Mehrotra <uditme@amazon.com> Co-authored-by: Vinoth Chandar <vinoth@apache.org> Co-authored-by: Balaji Varadarajan <varadarb@uber.com>	2020-08-03 20:19:21 -07:00
Sivabalan Narayanan	266bce12b3	[MINOR] Fixing usage of right config value for parallelism to dedup in Bulk Insert (#1905 )	2020-08-03 10:38:36 -07:00
Shen Hong	433d7d2c98	[HUDI-1058] Make delete marker configurable (#1819 )	2020-08-03 11:06:31 -04:00
Raymond Xu	10e4268792	[HUDI-995] Use Transformations, Assertions and SchemaTestUtil (#1884 ) - Consolidate transform functions for tests in Transformations.java - Consolidate assertion functions for tests in Assertions.java - Make use of SchemaTestUtil for loading schema from resource	2020-08-01 20:57:18 +08:00
Udit Mehrotra	e79fbc07fe	[HUDI-1054] Several performance fixes during finalizing writes (#1768 ) Co-authored-by: Udit Mehrotra <uditme@amazon.com>	2020-07-31 20:10:28 -07:00
Y Ethan Guo	ccd70a7e48	[HUDI-472] Introduce configurations and new modes of sorting for bulk_insert (#1149 ) * [HUDI-472] Introduce the configuration and new modes of record sorting for bulk_insert(#1149). Three sorting modes are implemented: global sort ("global_sort"), local sort inside each RDD partition ("partition_sort") and no sort ("none")	2020-07-31 09:52:42 -04:00
Sivabalan Narayanan	b2763f433b	[MINOR] Fixing default index parallelism for simple index (#1882 )	2020-07-28 08:22:09 -07:00
Raymond Xu	ca36c44cb3	[HUDI-995] Move TestRawTripPayload and HoodieTestDataGenerator to hudi-common (#1873 )	2020-07-27 19:21:45 +08:00
Shen Hong	c3279cd598	[HUDI-1082] Fix minor bug in deciding the insert buckets (#1838 )	2020-07-23 08:31:49 -04:00
Mathieu	da106803b6	[HUDI-1037] Introduce a write committed callback hook and given a default http callback implementation (#1842 )	2020-07-23 19:07:05 +08:00
zherenyu831	c39778c150	[HUDI-1113] Add user define metrics reporter (#1851 )	2020-07-23 13:46:36 +08:00
vinoth chandar	3dd189ec7d	[MINOR] Fix checkstyle issue on TestHoodieClientOnCopyOnWriteStorage (#1865 )	2020-07-22 21:54:45 -07:00
vinoth chandar	a8bd76c299	[HUDI-1029] In inline compaction mode, previously failed compactions needs to be retried before new compactions (#1857 ) - Prevents failed compactions from causing issues with future commits	2020-07-22 21:22:06 -07:00
vinoth chandar	9bd37ef291	[MINOR] Fix flaky testUpsertsUpdatePartitionPath* tests (#1863 )	2020-07-22 22:52:34 -04:00
Sivabalan Narayanan	5b6026ba43	[HUDI-802] Fixing deletes for inserts in same batch in write path (#1792 ) * Fixing deletes for inserts in same batch in write path * Fixing delta streamer tests * Adding tests for OverwriteWithLatestAvroPayload	2020-07-22 19:39:57 -07:00
Raymond Xu	5e7ab11e2e	[HUDI-994] Move TestHoodieIndex test cases to unit tests (#1850 )	2020-07-21 10:23:43 -07:00
lw0090	1ec89e9a94	[HUDI-839] Introducing support for rollbacks using marker files (#1756 ) * [HUDI-839] Introducing rollback strategy using marker files - Adds a new mechanism for rollbacks where it's based on the marker files generated during the write - Consequently, marker file/dir deletion now happens post commit, instead of during finalize - Marker files are also generated for AppendHandle, making it consistent throughout the write path - Until upgrade-downgrade mechanism can upgrade non-marker based inflight writes to marker based, this should only be turned on for new datasets. - Added marker dir deletion after successful commit/rollback, individual files are not deleted during finalize - Fail safe for deleting marker directories, now during timeline archival process - Added check to ensure completed instants are not rolled back using marker based strategy. This will be incorrect - Reworked tests to rollback inflight instants, instead of completed instants whenever necessary - Added a unit test for MarkerBasedRollbackStrategy Co-authored-by: Vinoth Chandar <vinoth@apache.org>	2020-07-20 22:41:42 -07:00
Prashant Wason	b71f25f210	[HUDI-92] Provide reasonable names for Spark DAG stages in HUDI. (#1289 )	2020-07-19 10:29:25 -07:00
Raymond Xu	b399b4ad43	[HUDI-996] Add functional test in hudi-client (#1824 ) - Add functional test suite in hudi-client - Tag TestHBaseIndex as functional	2020-07-15 08:28:50 +08:00
Raymond Xu	f5dc8ca733	[HUDI-994] Split TestHBaseIndex to unit tests (#1818 ) - Refactor and improve TestHBaseIndex for performance - Move HBaseIndex unit tests to different test classes	2020-07-13 20:32:01 -07:00
Sivabalan Narayanan	21bb1b505a	[HUDI-1068] Fixing deletes in global bloom when update partition path is set (#1793 )	2020-07-13 22:34:07 -04:00
Raymond Xu	20ac7c3337	[HUDI-994] Make TestHBaseQPSResourceAllocator a unit test (#1820 )	2020-07-11 09:15:05 -07:00
Raymond Xu	7b2a947aed	[HUDI-1069] Remove duplicate assertNoWriteErrors() (#1797 )	2020-07-08 13:58:15 +08:00
Shen Hong	be85a6c32b	[HUDI-1004] Support update metrics in HoodieDeltaStreamerMetrics (#1732 )	2020-07-06 09:44:02 -07:00
Raymond Xu	3b9a30528b	[HUDI-996] Add functional test suite for hudi-utilities (#1746 ) - Share resources for functional tests - Add suite for functional test classes from hudi-utilities	2020-07-05 16:44:31 -07:00
baobaoyeye	2be924fd3a	[HUDI-760]Remove Rolling Stat management from Hudi Writer (#1739 )	2020-06-30 20:07:09 -07:00
Balaji Varadarajan	8919be6a5d	[HUDI-855] Run Cleaner async with writing (#1577 ) - Cleaner can now run concurrently with write operation - Configs to turn on/off Co-authored-by: Vinoth Chandar <vinoth@apache.org>	2020-06-28 02:04:50 -07:00
Raymond Xu	31247e9b34	[HUDI-896] Report test coverage by modules & parallelize CI (#1753 ) - use codecov flags for each module to report coverage - parallelize CI jobs for shorter time - add a testcase for MetricsReporterFactory (to trigger codecov comment)	2020-06-27 23:16:12 -07:00
Prashant Wason	2603cfb33e	[HUDI-684] Introduced abstraction for writing and reading different types of base file formats. (#1687 ) Notable changes: 1. HoodieFileWriter and HoodieFileReader abstractions for writer/reader side of a base file format 2. HoodieDataBlock abstraction for creation specific data blocks for base file formats. (e.g. Parquet has HoodieAvroDataBlock) 3. All hardcoded references to Parquet / Parquet based classes have been abstracted to call methods which accept a base file format 4. HiveSyncTool accepts the base file format as a CLI parameter 5. HoodieDeltaStreamer accepts the base file format as a CLI parameter 6. HoodieSparkSqlWriter accepts the base file format as a parameter	2020-06-25 23:46:55 -07:00
wangxianghu	5e47673341	[HUDI-1035] Remove unused class KeyLookupResult (#1754 )	2020-06-23 17:01:03 -07:00
Shen Hong	89e37d5273	[HUDI-908] Add some data types to HoodieTestDataGenerator and fix some bugs. (#1690 )	2020-06-22 08:13:28 -07:00
wangxianghu	68a656b016	[HUDI-1032] Remove unused code in HoodieCopyOnWriteTable and code clean (#1750 )	2020-06-21 07:34:47 -07:00
Raymond Xu	8a9fdd603e	[HUDI-1023] Add validation error messages in delta sync (#1710 ) - Remove explicitly specifying BLOOM_INDEX since thats the default anyway	2020-06-19 12:12:35 -07:00
Satish Kotha	a7fd331624	Add unit test for snapshot reads in hadoop-mr	2020-06-13 10:23:05 -07:00
sathyaprakashg	df2e0c760e	HUDI-942 Increase default value number of delta commits for inline compaction (#1664 ) Co-authored-by: Sathyaprakash Govindasamy <sathyaprakashg@zillowgroup.com>	2020-06-10 16:16:44 -07:00
Gary Li	37838cea60	[HUDI-822] decouple Hudi related logics from HoodieInputFormat (#1592 ) - Refactoring business logic out of InputFormat into Utils helpers.	2020-06-09 06:10:16 -07:00
shenhong	3387b3841f	[HUDI-1005] fix NPE in HoodieWriteClient.clean	2020-06-09 05:57:04 -07:00
Shen Hong	6318e943d1	[HUDI-1016] Code optimization in MergeOnReadRollbackActionExecutor(#1718 )	2020-06-09 19:14:26 +08:00
garyli1019	22cd824d99	HUDI-494 fix incorrect record size estimation	2020-06-08 20:29:29 -07:00
garyli1019	e9cab67b80	[HUDI-988] Fix More Unit Test Flakiness	2020-06-07 23:14:46 -07:00
Balaji Varadarajan	fb283934a3	[HUDI-990] Timeline API : filterCompletedAndCompactionInstants needs to handle requested state correctly. Also ensure timeline gets reloaded after we revert committed transactions	2020-06-04 02:52:21 -07:00
Balaji Varadarajan	a68180b179	[HUDI-988] Fix Unit Test Flakiness : Ensure all instantiations of HoodieWriteClient are closed properly. Fix bug in TestRollbacks. Make CLI unit tests for Hudi CLI check skip rendering strings	2020-06-04 02:52:21 -07:00

731 Commits