lanyuanxiaoyao/hudi - hudi - Gitea: Git with a cup of tea

Author	SHA1	Message	Date
Ron Barabash	1b61eb45e0	Adding support for optional skipping single archiving failures	2019-06-20 22:54:45 -07:00
Balaji Varadarajan	66c7fa2322	Reword confusing message and reducing the severity level	2019-06-20 22:46:09 -07:00
Balaji Varadarajan	2c40e8419e	Ensure TableMetaClient and FileSystem instances have exclusive copy of Configuration	2019-06-20 14:05:00 -07:00
Balaji Varadarajan	a0d7ab2384	HUDI-70 : Making DeltaStreamer run in continuous mode with concurrent compaction	2019-06-18 17:48:14 -07:00
Balaji Varadarajan	a1483f2c5f	HUDI-148 Small File selection logic for MOR must skip fileIds selected for pending compaction correctly	2019-06-17 18:35:17 -07:00
Nishith Agarwal	8e08d498c9	Reading baseCommitTime from the latest file slice as opposed to the tagged record value	2019-06-17 16:46:16 -07:00
Balaji Varadarajan	cd7623e216	All opened hoodie clients in tests need to be closed; TestMergeOnReadTable must use embedded timeline server	2019-06-13 12:30:07 -07:00
Balaji Varadarajan	136f8478a3	TestMergeOnReadTable must use embedded timeline server	2019-06-12 19:08:09 -07:00
Balaji Varadarajan	04fc86b43d	Turn on embedded server for all client tests	2019-06-12 18:14:55 -07:00
Vinoth Chandar	b791473a6d	Introduce HoodieReadHandle abstraction into index - Generalized BloomIndex to work with file ids instead of paths - Abstracted away Bloom filter checking into HoodieLookupHandle - Abstracted away range information retrieval into HoodieRangeInfoHandle	2019-06-12 10:46:14 -07:00
Balaji Varadarajan	065173211e	HUDI-147 Compaction Inflight Rollback not deleting Marker directory	2019-06-09 11:45:54 -07:00
Balaji Varadarajan	479908fd20	HUDI-125 : Change License for all source files and update RAT configurations	2019-06-09 11:41:55 -07:00
Balaji Varadarajan	30b0f2636f	Changes related to Licensing work 1. Go through dependencies list one round to ensure compliance. Generated current NOTICE list in all submodules (other apache projects like flink does this). To be on conservative side regarding licensing, NOTICE.txt lists all dependencies including transitive. Pending Compliance questions reported in https://issues.apache.org/jira/browse/LEGAL-461 2. Automate generating NOTICE.txt files to allow future package compliance issues be identified early as part of code-review process. 3. Added NOTICE.txt and LICENSE.txt to all HUDI jars	2019-06-07 17:58:57 -07:00
guanjianhui	6b5abb5d92	fix maven pom	2019-05-29 16:16:29 -07:00
Balaji Varadarajan	d860fb18b6	HUDI-139 Compaction running twice due to duplicate "map" transformation while finalizing compaction	2019-05-29 15:12:30 -07:00
vinothchandar	66c0b81b49	[maven-release-plugin] prepare for next development iteration	2019-05-28 19:17:26 -07:00
vinothchandar	227785c022	[maven-release-plugin] prepare release hoodie-0.4.7	2019-05-28 19:17:15 -07:00
Balaji Varadarajan	33f5208c1e	Only inflight commit timeline (.commit/.deltacommit) must be used when checking for sanity during compaction scheduling	2019-05-28 16:54:20 -07:00
Balaji Varadarajan	d0d2fa0337	Reduce logging in unit-test runs	2019-05-24 23:43:54 -07:00
Venkat	f2d91a455e	default implementation for HBase index qps allocator (#685 ) * default implementation and configs for HBase index qps allocator * Test for QPS allocator and address CR * fix QPS allocator test	2019-05-24 18:43:46 -07:00
Balaji Varadarajan	99b0c72aa6	HUDI-131 Zero File Listing in Compactor run	2019-05-24 18:34:14 -07:00
Vinoth Chandar	4074c5eb23	Fixed HUDI-116 : Handle duplicate record keys across partitions - Join based on HoodieKey and not RecordKey during tagging - Unit tests changed to run with duplicate keys - Special casing GlobalBloom to still join by recordkey	2019-05-24 18:32:49 -07:00
Balaji Varadarajan	145034c5fa	Spark Stage retry handling	2019-05-21 14:49:51 -07:00
David Muto (pseudomuto)	3fd2fd6e9d	Remove redundant string from file comp rdd	2019-05-21 13:07:32 -07:00
Balaji Varadarajan	64fec64097	Timeline Service with Incremental View Syncing support	2019-05-16 13:25:33 -07:00
vinothchandar	446f99aa0f	[maven-release-plugin] prepare for next development iteration	2019-05-14 07:29:22 -07:00
vinothchandar	cc38abecc8	[maven-release-plugin] prepare release hoodie-0.4.6	2019-05-14 07:29:11 -07:00
Balaji Varadarajan	9cce9abf4d	Fix various errors found by long running delta-streamer tests 1. Parquet Avro schema mismatch errors when ingesting are sometimes silently ignored due to race-condition in BoundedInMemoryExecutor. This was reproducible when running long-running delta-streamer with wrong schema and it caused data-loss 2. Fix behavior of Delta-Streamer to error out by default if there are any error records 3. Fix a bug in tracking write errors in WriteStats. Earlier the write errors were tracking sampled errors as opposed to total errors. 4. Delta Streamer does not commit the changes done as part of inline compaction as auto-commit is force disabled. Fix this behavior to always auto-commit inline compaction as it would not otherwise commit.	2019-05-13 10:47:34 -07:00
Vinoth Chandar	a0e62b7919	Bucketized Bloom Filter checking - Tackles the skew seen in sort based partitioning/checking - Parameterized the HoodieBloomIndex test - Config to turn on/off (on by default) - Unit tests & also tested at scale	2019-05-11 16:38:28 -07:00
Vinoth Chandar	ea20d47248	Introduce config to control interval tree pruning - turned on by default - Minor code refactoring/restructuring	2019-04-29 11:38:23 -07:00
Sivabalan Narayanan	7129dc5bb7	Improving Tag location using interval trees for index files Adding interface for index look up Adding index filtering implementations for global bloom index too	2019-04-29 11:38:23 -07:00
Nishith Agarwal	a8feee9293	Performing commit archiving in batches to avoid keeping a huge chunk in memory	2019-04-10 15:17:04 -07:00
Nishith Agarwal	2577014617	1. Minor changes to fix compaction 2. Adding 2 compaction policies	2019-04-03 17:38:17 -07:00
Vinoth Chandar	b34a204a52	Fixing small file handling, inline compaction defaults - Small file limit is now 100MB by default - Turned on inline compaction by default for MOR - Changes take effect on DataSource and DeltaStreamer	2019-04-03 10:56:10 -07:00
Vinoth Chandar	51f4908989	Follow up HUDI-27 : Call super.close() in HoodieWrapperFileSystem::close()	2019-04-02 21:31:41 -07:00
Vinoth Chandar	5847f0c934	Fix HUDI-27 : Support num_cores > 1 for writing through spark - Users using spark.executor.cores > 1 used to fail due to "FileSystem closed" - This is due to HoodieWrapperFileSystem closing the wrapped filesystem obj - FileSystem.getInternal caching code races threads and closes the extra fs instance(s) - Bumped up num cores in tests to 8, speeds up tests by 3-4 mins	2019-03-28 15:56:21 -07:00
Vinoth Chandar	f1410bfdcd	Fixes HUDI-38: Reduce memory overhead of WriteStatus - For implicit indexes (e.g BloomIndex), don't buffer up written records - By default, only collect 10% of failing records to avoid OOMs - Improves debuggability via above, since data errors can now show up in collect() - Unit tests & fixing subclasses & adjusting tests	2019-03-28 10:32:59 -07:00
Vinoth Chandar	e56c1612e4	Fixed HUDI-87 : Remove schemastr from BaseAvroPayload	2019-03-27 23:03:25 -07:00
Vinoth Chandar	372fbc4733	Fixes HUDI-9 : Check precondition minInstantsToKeep > cleanerCommitsRetained - Added a precondition check, otherwise incr pull could miss commits - Lowered default cleaner retention to 10, to enable simpler understanding for newbies - Bumped down min/max instants to retain as well	2019-03-27 11:02:17 -07:00
Nishith Agarwal	3d9041e216	Fixing source schema and writer schema distinction in payloads	2019-03-26 19:44:27 -07:00
ambition119	395806fc68	[HUDI-63] Removed unused BucketedIndex code	2019-03-26 10:12:47 -07:00
Nishith Agarwal	9e59da7fd9	Refactor HoodieTable Rollback to write one rollback instant for a batch of commits to rollback	2019-03-19 10:10:16 -07:00
Nishith Agarwal	0dd4a90b03	Enable multi/nested rollbacks for MOR table type	2019-03-19 10:10:16 -07:00
kaka11chen	48797b1ae1	Add compression codec configurations for HoodieParquetWriter.	2019-03-18 07:48:20 -07:00
Omkar Joshi	4a8bec7ea5	Handling duplicate record update for single partition (duplicates in single or different parquet files)	2019-03-10 20:15:17 -07:00
Balaji Varadarajan	3ae6cb4ed5	FileSystem View must treat same fileIds present in different partitions as different file-groups and handle pending compaction correctly	2019-03-01 10:49:04 -08:00
vinothchandar	687395e40f	[maven-release-plugin] prepare for next development iteration	2019-02-27 07:16:27 -08:00
vinothchandar	bbf40ef987	[maven-release-plugin] prepare release hoodie-0.4.5	2019-02-27 07:16:15 -08:00
Balaji Varadarajan	8adaca3454	Table rollback for inflight compactions MUST not delete instant files at any time to avoid race conditions	2019-02-11 18:30:21 -08:00
Balaji Varadarajan	3a0044216c	New Features in DeltaStreamer : (1) Apply transformation when using delta-streamer to ingest data. (2) Add Hudi Incremental Source for Delta Streamer (3) Allow delta-streamer config-property to be passed as command-line (4) Add Hive Integration to Delta-Streamer and address Review comments (5) Ensure MultiPartKeysValueExtractor handle hive style partition description (6) Reuse same spark session on both source and transformer (7) Support extracting partition fields from _hoodie_partition_path for HoodieIncrSource (8) Reuse Binary Avro coders (9) Add push down filter for Incremental source (10) Add Hoodie DeltaStreamer metrics to track total time taken	2019-02-11 18:22:05 -08:00

268 Commits