董可伦
017ddbbfac
[MINOR] Fix typos (#4567)
2022-01-11 23:17:10 -08:00
Town
4b0111974f
[HUDI-3184] hudi-flink support timestamp-micros (#4548)
* Support both Avro and Parquet code paths
* String RowData conversion is also supported
2022-01-12 10:53:51 +08:00
Pratyaksh Sharma
a392e9ba46
[HUDI-485] Corrected the check for incremental sql (#2768)
* [HUDI-485]: corrected the check for incremental sql
* [HUDI-485]: added tests
* code review comments addressed
* [HUDI-485]: added happy flow test case
2022-01-12 08:22:07 +05:30
Alexey Kudinkin
6cdcd89afa
[HUDI-3094] Unify Hive's InputFormat implementations to avoid duplication (#4417)
2022-01-11 15:02:13 -08:00
xuzifu666
4b2fd37fb4
[MINOR] Remove unused static var in HoodieAvroWriteSupport (#4543)
2022-01-11 11:53:45 -08:00
Todd Gao
c9bc626299
[HUDI-3211] Claim RFC number for RFC for Hudi Connector for Presto (#4562)
2022-01-11 14:08:27 +05:30
Raymond Xu
f74cd57320
[HUDI-3195] Fix spark 3 pom (#4554)
- drop 3.0.x profile
- update readme
- update build CI bot.yml
- fix spark 3 bundle name
2022-01-10 19:11:22 -08:00
Sivabalan Narayanan
67ad4992e1
Removing extraneous warn logs in ClusteringUtils (#4553)
2022-01-11 08:20:14 +05:30
Alexey Kudinkin
f1e3762a94
[HUDI-2950] Addressing performance traps in Bulk Insert/Layout Optimization (#4234)
* Cleaned up Z-curve/Hilbert ordering seqs:
- Streamlined flow
- Removed unnecessary operations (double-mapping, boxing, etc)
- Updated `CollectionUtils::combine` to avoid `ArrayList` resizing
* Tidying up
* Reducing small objects churn due to Scala/Java conversions by re-using `RowFactory`, passing `Object[]`
* Fixing name resolution (disambiguating overloads)
* `lint`
* Replaced `OverwriteAvroPayloadRecord` w/ `RewriteRecordPayload` to avoid unnecessary Avro ser/de loop
* Added `PathCachingFileName` to avoid re-computing substrings every time the file name is fetched;
Injected `PathCachingFileName` into `HoodieWrapperFileSystem.convertPathWithScheme`
* Drastically reducing size of the `ArrayDeque` allocated by `ObjectSizeCalculator`
* XXX
* Missing license
* Fixed refs (after rebase)
* Fixing compilation failure in Scala 2.11
* Renamed `PathCachingFileName` to `FileNameCachingPath`
* Tidying up
2022-01-10 18:23:22 -08:00
t0il3ts0ap
c8df9b09d7
[HUDI-3148] Create pushgateway client based on port (#4497)
Co-authored-by: anoop narang <anoop.narang@navi.com>
Co-authored-by: sivabalan narayanan <n.siva.b@gmail.com>
2022-01-10 18:09:47 -05:00
Y Ethan Guo
f230eca9b5
[MINOR] Fix port number in setupKafka.sh (#4546)
2022-01-10 16:07:52 -05:00
Sivabalan Narayanan
7a8b94c82d
[HUDI-3180] Include files from completed commits while bootstrapping metadata table (#4519)
2022-01-10 15:33:15 -05:00
Y Ethan Guo
bc95571caa
[HUDI-2735] Allow empty commits in Kafka Connect Sink for Hudi (#4544)
2022-01-10 15:31:25 -05:00
Manoj Govindassamy
251d4eb3b6
[HUDI-3030] InProcessLockProvider as default when any async services are enabled with no lock provider override (#4406)
* [HUDI-3030] InProcessLockProvider as default when any async services are enabled with no lock provider override
- Make InProcessLockProvider the default lock provider when
any async services are enabled and no lock provider is
explicitly set.
- This is a workaround for metadata table updates racing with
async table service operations
* [HUDI-3030] InProcessLockProvider as default when any async services are enabled with no lock provider override
- Renaming isAnyTableServicesInline/Async() to areAnyTableServicesInline/Async()
* [HUDI-3030] InProcessLockProvider as default when any async services are enabled with no lock provider override
- Additionally checking for write config properties when verifying
the lock provider override. Updated the unit test for this case.
2022-01-10 08:40:24 +05:30
Sivabalan Narayanan
56f93f4ebd
Removing rollback instants from timeline for restore operation (#4518)
2022-01-10 07:44:28 +05:30
Thinking Chen
e9a7f49f55
[HUDI-3112] Fix KafkaConnect failing to sync to Hive (#4458)
2022-01-09 15:31:57 -08:00
Sivabalan Narayanan
604d9885f1
[HUDI-3009] making some fixes to S3 incremental source (#4517)
2022-01-09 12:46:52 -05:00
RexAn
977d3c6dad
[HUDI-3157] Remove aws jars from hudi bundles (#4542)
Co-authored-by: Hui An <hui.an@shopee.com>
2022-01-09 02:23:46 -08:00
YueZhang
cf362fb2d5
[MINOR] Fix some code style issues based on check-style plugin (#4532)
Co-authored-by: yuezhang <yuezhang@freewheel.tv>
2022-01-09 01:14:56 -08:00
Yann Byron
36790709f7
[HUDI-3125] spark-sql write timestamp directly (#4471)
2022-01-08 23:43:25 -08:00
Thinking Chen
0d8ca8da4e
[HUDI-3104] Kafka-connect support for Hadoop config environments and properties (#4451)
2022-01-08 23:10:17 -08:00
Sivabalan Narayanan
98ec215079
[HUDI-3178] Fixing metadata table compaction so as to not include uncommitted data (#4530)
- There is a chance that the actual write eventually fails in the data table while the commit succeeds in the metadata table (MDT). If compaction is then triggered in the MDT, it could include the uncommitted data, and once compacted, that data can no longer be ignored when reading from the metadata table. This patch fixes the bug by triggering metadata table compaction before applying the commit to the metadata table.
2022-01-08 10:34:47 -05:00
Sagar Sumit
46bb00e4df
[HUDI-3139] Shade htrace and parquet-avro in presto bundle (#4495)
Filter out unnecessary classes
2022-01-08 10:29:36 -05:00
Sagar Sumit
827549949c
[HUDI-2909] Handle logical type in TimestampBasedKeyGenerator (#4203)
* [HUDI-2909] Handle logical type in TimestampBasedKeyGenerator
The timestamp-based key generator was returning different values for the row-writer and non-row-writer paths. This patch fixes it, guarded by a config flag (`hoodie.datasource.write.keygenerator.consistent.logical.timestamp.enabled`).
2022-01-08 10:22:44 -05:00
Yann Byron
03a83ffeb5
[HUDI-3195] optimize spark3 pom and modify build command (#4538)
2022-01-07 23:21:39 -08:00
董可伦
4f6cdd73a3
[HUDI-3192] Spark metastore schema evolution broken (#4533)
2022-01-08 10:48:37 +08:00
Sagar Sumit
518488c633
[HUDI-3185] HoodieConfig#getBoolean should return false when default not set (#4536)
Remove unnecessary config
2022-01-07 16:20:11 -05:00
Sivabalan Narayanan
2e561defe9
[HUDI-2947] Fixing checkpoint fetch in DeltaStreamer (#4485)
* Fixing checkpoint fetch in DeltaStreamer
* Addressing comments
2022-01-07 22:08:58 +05:30
董可伦
b1df60672b
[MINOR] fix typos in DDLExecutor (#4534)
2022-01-07 07:59:55 -05:00
Y Ethan Guo
76a72641f1
[HUDI-3188] Update quick start guide for Kafka Connect Sink for Hudi (#4527)
2022-01-07 07:56:08 -05:00
Raymond Xu
2467c137e4
[HUDI-3100] Add config for hive conditional sync (#4440)
2022-01-06 23:26:35 -08:00
YueZhang
b2b23f5d3a
[HUDI-3183] Wrong result of HoodieArchivedTimeline loadInstants with TimeRangeFilter (#4521)
Co-authored-by: yuezhang <yuezhang@freewheel.tv>
2022-01-06 21:16:29 -05:00
Thinking Chen
d7afc58d0c
[HUDI-3118] Add default HUDI_DIR in setupKafka.sh (#4460)
2022-01-06 15:46:51 -08:00
xuzifu666
f0c2912d35
[MINOR] Remove unused methods in HoodieColumnProjectionUtils (#4408)
2022-01-06 15:36:13 -08:00
Sivabalan Narayanan
8718c30324
[HUDI-3165] Enabling InProcessLockProvider for all multi-writer tests instead of FileSystemBasedLockProviderTestClass (#4427)
2022-01-06 13:04:10 -05:00
Sivabalan Narayanan
2954027b92
[HUDI-52] Enabling savepoint and restore for MOR table (#4507)
* Enabling restore for MOR table
* Fixing savepoint for compaction commits in MOR
2022-01-06 21:26:08 +05:30
Sivabalan Narayanan
b6891d253f
[HUDI-44] Adding support to preserve commit metadata for compaction (#4428)
2022-01-06 20:27:37 +05:30
hehexiaoduantui
50fa5a6aa7
Update HiveIncrementalPuller to configure filesystem (#4431)
* Update HiveIncrementalPuller.java
fix FileSystem retrieval bug
* Update HiveIncrementalPuller.java
fix error
* Update HiveIncrementalPuller.java
fix error
2022-01-06 13:19:30 +05:30
fengli
205e48f53f
[HUDI-3132] Minor fixes for HoodieCatalog
close apache/hudi#4486
2022-01-06 11:17:23 +08:00
Vinish Reddy
eee715b3ff
[HUDI-3168] Fixing null schema with empty commit in incremental relation (#4513)
2022-01-05 11:43:10 -05:00
Sagar Sumit
75133f9942
[HUDI-3170] Do not preserve filename when preserveCommitMetadata enabled (#4512)
2022-01-05 08:09:58 -05:00
Danny Chan
0e297c0c4c
[HUDI-3171] Sync empty table to hive metastore (#4511)
2022-01-05 16:41:33 +08:00
Sivabalan Narayanan
a66212d204
[HUDI-2966] Closing LogRecordScanner in compactor (#4478)
* Closing LogRecordScanner in compactor
* Addressing comments
2022-01-05 10:57:18 +08:00
Nicolas Paris
37b15ff458
[HUDI-3147] Add endpoint_url to dynamodb lock provider (#4500)
Co-authored-by: Nicolas Paris <nicolas.paris@adevinta.com>
2022-01-04 16:42:28 -05:00
Manoj Govindassamy
bf4e3d63e7
[HUDI-3141] Metadata merged log record reader - avoiding NullPointerException when getting records by keys (#4505)
- HoodieMetadataMergedLogRecordReader#getRecordsByKeys() and its parent class methods
are not thread-safe. When multiple queries come in for getting log records
by keys, they all operate on the same log record reader instance provided by
HoodieBackedTableMetadata#openReadersIfNeeded(), and they trip over each other
as they clear/put/get the same class member records.
- The fix is to serialize the mutation of the class member records by making
HoodieMetadataMergedLogRecordReader#getRecordsByKeys() a synchronized method,
so that concurrent log record readers do not run into an NPE.
2022-01-04 16:41:33 -05:00
Sagar Sumit
aaf5727495
[HUDI-2774] Handle duplicate instants when fetching pending clustering plans (#4118)
2022-01-04 16:32:05 -05:00
Sivabalan Narayanan
7329d229d5
Adding tests to validate different key generators (#4473)
2022-01-04 10:48:04 +05:30
leesf
29ab6fb9ad
[HUDI-3140] Fix bulk_insert failure on Spark 3.2.0 (#4498)
2022-01-04 09:59:59 +08:00
harshal
2b2ae34cb9
[HUDI-2558] Fixing Clustering failure with sort columns containing null values (#4404)
2022-01-03 12:19:43 +05:30
Raymond Xu
0273f2e65d
[MINOR] Update README.md (#4492)
Update Spark 3 build instructions
2022-01-02 20:34:37 -08:00