1
0
Commit Graph

162 Commits

Author SHA1 Message Date
喻兆靖
c20db99a7b [HUDI-2207] Support independent flink hudi clustering function 2022-05-24 20:16:48 +08:00
Danny Chan
676d5cefe0 [HUDI-4138] Fix the concurrency modification of hoodie table config for flink (#5660)
* Remove the metadata cleaning strategy for flink, that means the multi-modal index may be affected
* Improve the HoodieTable#clearMetadataTablePartitionsConfig to only update table config when necessary
* Remove the modification of read code path in HoodieTableConfig
2022-05-24 13:07:55 +08:00
Heap
47b764ec33 [HUDI-4134] Fix Method naming consistency issues in FSUtils (#5655) 2022-05-23 15:28:48 -07:00
Danny Chan
c7576f7613 [HUDI-4130] Remove the upgrade/downgrade for flink #initTable (#5642) 2022-05-20 21:31:23 +08:00
Danny Chan
ebbe56e862 [minor] Some code refactoring for LogFileComparator and Instant instantiation (#5600) 2022-05-18 09:30:09 +08:00
BruceLin
99555c897a [HUDI-4110] Clean the marker files for flink compaction (#5604) 2022-05-17 21:09:27 +08:00
Danny Chan
43e08193ef [HUDI-4098] Metadata table heartbeat for instant has expired, last heartbeat 0 (#5583) 2022-05-16 17:40:08 +08:00
wqwl611
52e63b39d6 [HUDI-4097] add table info to jobStatus (#5529)
Co-authored-by: wqwl611 <wqwl611@gmail.com>
2022-05-13 21:01:15 -04:00
guanziyue
abb4893b25 [HUDI-2875] Make HoodieParquetWriter Thread safe and memory executor exit gracefully (#4264) 2022-05-05 13:49:34 -07:00
董可伦
b8e465fdfc [MINOR] Fix typos in log4j-surefire.properties (#5212) 2022-04-15 13:33:37 -07:00
Danny Chan
e33149be9a [HUDI-3808] Flink bulk_insert timestamp(3) can not be read by Spark (#5236) 2022-04-07 15:17:39 +08:00
Sagar Sumit
898be6174a [HUDI-3782] Fixing table config when any of the index is disabled (#5222) 2022-04-05 23:06:52 -04:00
Prashant Wason
b28f0d6ceb [HUDI-3290] Different file formats for the partition metadata file. (#5179)
* [HUDI-3290] Different file formats for the partition metadata file.

Partition metadata files are stored in each partition to help identify the base path of a table. These files are saved in the properties file format. Some query engines do not work when non Parquet/ORC files are found in a partition.

Added a new table config 'hoodie.partition.metafile.use.data.format' which when enabled (default false for backward compatibility) ensures that partition metafiles will be saved in the same format as the base files of a dataset.

For new datasets, the config can be set via hudi-cli. Deltastreamer has a new parameter --partition-metafile-use-data-format which will create a table with this setting.

* Code review comments

- Adding a new command to migrate from text to base file formats for meta file.
- Reimplementing readFromFS() to first read the text format, then base format
- Avoid extra exists() checks in readFromFS()
- Added unit tests, enabled parquet format across hoodie-hadoop-mr
- Code cleanup, restructuring, naming consistency.

* Wiring in all the other Spark code paths to respect this config

 - Turned on parquet meta format for COW data source tests
 - Removed the deltastreamer command line to keep it shorter

* populate HoodiePartitionMetadata#format after readFromFS()

Co-authored-by: Vinoth Chandar <vinoth@apache.org>
Co-authored-by: Raymond Xu <2701446+xushiyan@users.noreply.github.com>
2022-04-04 08:08:20 -07:00
YueZhang
020786a5f9 [HUDI-3451] Delete metadata table when the write client disables MDT (#5186)
* Add checks for metadata table init to avoid possible out-of-sync

* Revise the logic to reuse existing table config

* Revise docs and naming

Co-authored-by: yuezhang <yuezhang@freewheel.tv>
Co-authored-by: Y Ethan Guo <ethan.guoyihua@gmail.com>
2022-04-02 16:31:06 +05:30
Danny Chan
6df14f15a3 [HUDI-2752] The MOR DELETE block breaks the event time sequence of CDC (#4880) 2022-04-01 20:46:51 +08:00
Sagar Sumit
a048e940fd [HUDI-3743] Support DELETE_PARTITION for metadata table (#5169)
In order to drop any metadata partition (index), we can reuse the DELETE_PARTITION operation in metadata table. Subsequent to this, we can support drop index (with table config update) for async metadata indexer.

- Add a new API in HoodieTableMetadataWriter
- Current only supported for Spark metadata writer
2022-03-31 21:29:17 -04:00
Sagar Sumit
28dafa774e [HUDI-2488][HUDI-3175] Implement async metadata indexing (#4693)
- Add a new action called INDEX, whose state transition is described in the RFC.
- Changes in timeline to support the new action.
- Add an index planner in ScheduleIndexActionExecutor.
- Add index plan executor in RunIndexActionExecutor.
- Add 3 APIs in HoodieTableMetadataWriter; a) scheduleIndex: will generate an index plan based on latest completed instant, initialize file groups and add a requested INDEX instant, b) index: executes the index plan and also takes care of writes that happened after indexing was requested, c) dropIndex: will drop index by removing the given metadata partition.
- Add 2 new table configs to serve as the source of truth for inflight and completed indexes.
- Support upgrade/downgrade taking care of the newly added configs.
- Add tool to trigger indexing in HoodieIndexer.
- Handle corner cases related to partial failures.
- Abort gracefully after deleting partition and instant.
- Handle other actions in timeline to consider before catching up
2022-04-01 01:33:12 +05:30
Sivabalan Narayanan
4569734d60 [HUDI-3713] Guarding archival for multi-writer (#5138) 2022-03-31 01:44:31 -04:00
YueZhang
2dbb273d26 [HUDI-3721] Delete MDT if necessary when trigger rollback to savepoint (#5173)
Co-authored-by: yuezhang <yuezhang@freewheel.tv>
2022-03-30 20:26:37 -07:00
Danny Chan
b9fbada2f2 [minor] Follow 3178, fix the flink metadata table compaction (#5175) 2022-03-30 20:45:29 +08:00
Alexey Kudinkin
e5a2baeed0 [HUDI-3549] Removing dependency on "spark-avro" (#4955)
Hudi will be taking on promise for it bundles to stay compatible with Spark minor versions (for ex 2.4, 3.1, 3.2): meaning that single build of Hudi (for ex "hudi-spark3.2-bundle") will be compatible with ALL patch versions in that minor branch (in that case 3.2.1, 3.2.0, etc)

To achieve that we'll have to remove (and ban) "spark-avro" as a dependency, which on a few occasions was the root-cause of incompatibility b/w consecutive Spark patch versions (most recently 3.2.1 and 3.2.0, due to this PR).

Instead of bundling "spark-avro" as dependency, we will be copying over some of the classes Hudi depends on and maintain them along the Hudi code-base to make sure we're able to provide for the aforementioned guarantee. To workaround arising compatibility issues we will be applying local patches to guarantee compatibility of Hudi bundles w/in the Spark minor version branches.

Following Hudi modules to Spark minor branches is currently maintained:

"hudi-spark3" -> 3.2.x
"hudi-spark3.1.x" -> 3.1.x
"hudi-spark2" -> 2.4.x
Following classes hierarchies (borrowed from "spark-avro") are maintained w/in these Spark-specific modules to guarantee compatibility with respective minor version branches:

AvroSerializer
AvroDeserializer
AvroUtils
Each of these classes has been correspondingly copied from Spark 3.2.1 (for 3.2.x branch), 3.1.2 (for 3.1.x branch), 2.4.4 (for 2.4.x branch) into their respective modules.

SchemaConverters class in turn is shared across all those modules given its relative stability (there're only cosmetical changes from 2.4.4 to 3.2.1).
All of the aforementioned classes have their corresponding scope of visibility limited to corresponding packages (org.apache.spark.sql.avro, org.apache.spark.sql) to make sure broader code-base does not become dependent on them and instead relies on facades abstracting them.

Additionally, given that Hudi plans on supporting all the patch versions of Spark w/in aforementioned minor versions branches of Spark, additional build steps were added to validate that Hudi could be properly compiled against those versions. Testing, however, is performed against the most recent patch versions of Spark with the help of Azure CI.

Brief change log:
- Removing spark-avro bundling from Hudi by default
- Scaffolded Spark 3.2.x hierarchy
- Bootstrapped Spark 3.1.x Avro serializer/deserializer hierarchy
- Bootstrapped Spark 2.4.x Avro serializer/deserializer hierarchy
- Moved ExpressionCodeGen,ExpressionPayload into hudi-spark module
- Fixed AvroDeserializer to stay compatible w/ both Spark 3.2.1 and 3.2.0
- Modified bot.yml to build full matrix of support Spark versions
- Removed "spark-avro" dependency from all modules
- Fixed relocation of spark-avro classes in bundles to assist in running integ-tests.
2022-03-29 14:44:47 -04:00
Y Ethan Guo
9b3dd2e0b7 [HUDI-3624] Check all instants before starting a commit in metadata table (#5098) 2022-03-24 17:13:58 -07:00
wxp4532
26e5d2e6fc [HUDI-3559] Flink bucket index with COW table throws NoSuchElementException
Actually method FlinkWriteHelper#deduplicateRecords does not guarantee the records sequence, but there is a
implicit constraint: all the records in one bucket should have the same bucket type(instant time here),
the BucketStreamWriteFunction breaks the rule and fails to comply with this constraint.

close apache/hudi#5018
2022-03-21 17:34:54 +08:00
Raymond Xu
7446ff95a7 [HUDI-2439] Replace RDD with HoodieData in HoodieSparkTable and commit executors (#4856)
- Adopt HoodieData in Spark action commit executors
- Make Spark independent DeleteHelper, WriteHelper, MergeHelper in hudi-client-common
- Make HoodieTable in WriteClient APIs have raw type to decouple with Client's generic types
2022-03-17 04:17:56 -07:00
Sagar Sumit
575bc63468 [HUDI-3356][HUDI-3203] HoodieData for metadata index records; BloomFilter construction from index based on the type param (#4848)
Rework of #4761 
This diff introduces following changes:

- Write stats are converted to metadata index records during the commit. Making them use the HoodieData type so that the record generation scales up with needs. 
- Metadata index init support for bloom filter and column stats partitions.
- When building the BloomFilter from the index records, using the type param stored in the payload instead of hardcoded type.
- Delta writes can change column ranges and the column stats index need to be properly updated with new ranges to be consistent with the table dataset. This fix add column stats index update support for the delta writes.

Co-authored-by: Manoj Govindassamy <manoj.govindassamy@gmail.com>
2022-03-08 10:39:04 -05:00
Alexey Kudinkin
a66fd40692 [HUDI-3365] Make sure Metadata Table records are updated appropriately on HDFS (#4739)
- This change makes sure MT records are updated appropriately on HDFS: previously after Log File append operations MT records were updated w/ just the size of the deltas being appended to the original files, which have been found to be the cause of issues in case of Rollbacks that were instead updating MT with records bearing the full file-size.

- To make sure that we hedge against similar issues going f/w, this PR alleviates this discrepancy and streamlines the flow of MT table always ingesting records bearing full file-sizes.
2022-03-07 15:38:27 -05:00
Raymond Xu
c77b2591d0 [HUDI-2439] Remove SparkBoundedInMemoryExecutor (#4860) 2022-02-26 08:02:12 -05:00
YueZhang
359fbfde79 [HUDI-2648] Retry FileSystem action instead of failed directly. (#3887)
Co-authored-by: yuezhang <yuezhang@freewheel.tv>
2022-02-20 15:31:31 -05:00
Raymond Xu
27bd7b538e [HUDI-1576] Make archiving an async service (#4795) 2022-02-14 21:15:06 -05:00
YueZhang
76e2faa28d [HUDI-3370] The files recorded in the commit may not match the actual ones for MOR Compaction (#4753)
* use HoodieCommitMetadata to replace writeStatuses computation

Co-authored-by: yuezhang <yuezhang@freewheel.tv>
2022-02-14 11:12:52 +08:00
Sivabalan Narayanan
e7ec3a82dc [HUDI-2432] Adding restore.requested instant and restore plan for restore action (#4605)
- This adds a restore plan and serializes it to restore.requested meta file in timeline. This also means that we are introducing schedule and execution phases for restore which was not present before.
2022-02-10 08:06:23 -05:00
Y Ethan Guo
b8601a9f58 [HUDI-2656] Generalize HoodieIndex for flexible record data type (#3893)
Co-authored-by: Raymond Xu <2701446+xushiyan@users.noreply.github.com>
2022-02-03 20:24:04 -08:00
Manoj Govindassamy
5927bdd1c0 [HUDI-1295] Metadata Index - Bloom filter and Column stats index to speed up index lookups (#4352)
* [HUDI-1295] Metadata Index - Bloom filter and Column stats index to speed up index lookups

- Today, base files have bloom filter at their footers and index lookups
  have to load the base file to perform any bloom lookups. Though we have
  interval tree based file purging, we still end up in significant amount
  of base file read for the bloom filter for the end index lookups for the
  keys. This index lookup operation can be made more performant by having
  all the bloom filters in a new metadata partition and doing pointed
  lookups based on keys.

* [HUDI-1295] Metadata Index - Bloom filter and Column stats index to speed up index lookups

 - Adding indexing support for clean, restore and rollback operations.
   Each of these operations will now be converted to index records for
   bloom filter and column stats additionally.

* [HUDI-1295] Metadata Index - Bloom filter and Column stats index to speed up index lookups

 - Making hoodie key consistent for both column stats and bloom index by
   including fileId instead of fileName, in both read and write paths.

 - Performance optimization for looking up records in the metadata table.

 - Avoiding multi column sorting needed for HoodieBloomMetaIndexBatchCheckFunction

* [HUDI-1295] Metadata Index - Bloom filter and Column stats index to speed up index lookups

 - HoodieBloomMetaIndexBatchCheckFunction cleanup to remove unused classes

 - Base file checking before reading the file footer for bloom or column stats

* [HUDI-1295] Metadata Index - Bloom filter and Column stats index to speed up index lookups

 - Updating the bloom index and column stats index to have full file name
   included in the key instead of just file id.

 - Minor test fixes.

* [HUDI-1295] Metadata Index - Bloom filter and Column stats index to speed up index lookups

 - Fixed flink commit method to handle metadata table all partition update records

 - TestBloomIndex fixes

* [HUDI-1295] Metadata Index - Bloom filter and Column stats index to speed up index lookups

 - SparkHoodieBloomIndexHelper code simplification for various config modes

 - Signature change for getBloomFilters() and getColumnStats(). Callers can
   just pass in interested partition and file names, the index key is then
   constructed internally based on the passed in parameters.

 - KeyLookupHandle and KeyLookupResults code refactoring

 - Metadata schema changes - removed the reserved field

* [HUDI-1295] Metadata Index - Bloom filter and Column stats index to speed up index lookups

 - Removing HoodieColumnStatsMetadata and using HoodieColumnRangeMetadata instead.
   Fixed the users of the the removed class.

* [HUDI-1295] Metadata Index - Bloom filter and Column stats index to speed up index lookups

 - Extending meta index test to cover deletes, compactions, clean
   and restore table operations. Also, fixed the getBloomFilters()
   and getColumnStats() to account for deleted entries.

* [HUDI-1295] Metadata Index - Bloom filter and Column stats index to speed up index lookups

 - Addressing review comments - java doc for new classes, keys sorting for
   lookup, index methods renaming.

* [HUDI-1295] Metadata Index - Bloom filter and Column stats index to speed up index lookups

 - Consolidated the bloom filter checking for keys in to one
   HoodieMetadataBloomIndexCheckFunction instead of a spearate batch
   and lazy mode. Removed all the configs around it.

 - Made the metadata table partition file group count configurable.

 - Fixed the HoodieKeyLookupHandle to have auto closable file reader
   when checking bloom filter and range keys.

 - Config property renames. Test fixes.

* [HUDI-1295] Metadata Index - Bloom filter and Column stats index to speed up index lookups

 - Enabling column stats indexing for all columns by default

 - Handling column stat generation errors and test update

* [HUDI-1295] Metadata Index - Bloom filter and Column stats index to speed up index lookups

 - Metadata table partition file group count taken from the slices when
   the table is bootstrapped.

 - Prep records for the commit refactored to the base class

 - HoodieFileReader interface changes for filtering keys

 - Multi column and data types support for colums stats index

* [HUDI-1295] Metadata Index - Bloom filter and Column stats index to speed up index lookups

 - rebase to latest master and merge fixes for the build and test failures

* [HUDI-1295] Metadata Index - Bloom filter and Column stats index to speed up index lookups

 - Extending the metadata column stats type payload schema to include
   more statistics about the column ranges to help query integration.

* [HUDI-1295] Metadata Index - Bloom filter and Column stats index to speed up index lookups

 - Addressing review comments
2022-02-03 18:12:48 +05:30
Alexey Kudinkin
a68e1dc2db [HUDI-431] Adding support for Parquet in MOR LogBlocks (#4333)
- Adding support for Parquet in MOR tables Log blocks

Co-authored-by: Sivabalan Narayanan <n.siva.b@gmail.com>
2022-02-02 14:35:05 -05:00
Raymond Xu
0bd38f26ca [HUDI-2596] Make class names consistent in hudi-client (#4680) 2022-01-27 17:05:08 -08:00
Sivabalan Narayanan
e00a9042e9 [HUDI-3072] Fixing conflict resolution in transaction management code path for auto commit code path (#4588)
* Fixing conflict resolution in transaction management code path for auto commit code path

* Addressing comments

* Fixing test failures
2022-01-24 16:13:28 +05:30
Shawy Geng
a4e622ac61 [HUDI-1951] Add bucket hash index, compatible with the hive bucket (#3173)
* [HUDI-2154] Add index key field to HoodieKey

* [HUDI-2157] Add the bucket index and its read/write implemention of Spark engine.
* revert HUDI-2154 add index key field to HoodieKey
* fix all comments and introduce a new tricky way to get index key at runtime
support double insert for bucket index
* revert spark read optimizer based on bucket index
* add the storage layout
* index tag, hash function and add ut
* fix ut
* address partial comments
* Code review feedback
* add layout config and docs
* fix ut
* rename hoodie.layout and rebase master

Co-authored-by: Vinoth Chandar <vinoth@apache.org>
2021-12-30 12:38:26 -08:00
Ron
674c149234 [HUDI-3083] Support component data types for flink bulk_insert (#4470)
* [HUDI-3083] Support component data types for flink bulk_insert

* add nested row type test
2021-12-30 11:15:54 +08:00
Sivabalan Narayanan
5c0e4ce005 Revert "[HUDI-3043] Revert async cleaner leak commit to unblock CI failure (#4343)" (#4465)
This reverts commit 7e7ad1558c.
2021-12-30 10:45:09 +08:00
Danny Chan
f1286c2c76 [HUDI-3032] Do not clean the log files right after compaction for metadata table (#4336) 2021-12-22 11:10:27 +08:00
Manoj Govindassamy
d1d48ed494 [HUDI-3029] Transaction manager: avoid deadlock when doing begin and end transactions (#4363)
* [HUDI-3029] Transaction manager: avoid deadlock when doing begin and end transactions

 - Transaction manager has begin and end transactions as synchronized methods.
   Based on the lock provider implementaion, this can lead to deadlock
   situation when the underlying lock() calls are blocking or with a long timeout.

 - Fixing transaction manager begin and end transactions to not get to deadlock
   and to not assume anything on the lock provider implementation.
2021-12-18 09:43:17 -05:00
Sivabalan Narayanan
7e7ad1558c [HUDI-3043] Revert async cleaner leak commit to unblock CI failure (#4343)
* Revert "[HUDI-2959] Fix the thread leak of cleaning service (#4252)"
Reverting to unblock CI failure for now. will revisit this with the right fix
2021-12-16 21:51:28 -05:00
WangMinChao
9a2030ab31 [HUDI-3024] Add explicit write handler for flink (#4329)
Co-authored-by: wangminchao <wangminchao@asinking.com>
2021-12-15 20:16:48 +08:00
Manoj Govindassamy
b22c2c611b [HUDI-2938] Metadata table util to get latest file slices for reader/writers (#4218) 2021-12-11 20:42:36 -08:00
Danny Chan
9bdcee00c0 [HUDI-2959] Fix the thread leak of cleaning service (#4252) 2021-12-11 12:08:47 +08:00
Sivabalan Narayanan
1d4fb827e7 [HUDI-2923] Fixing metadata table reader when metadata compaction is inflight (#4206)
* [HUDI-2923] Fixing metadata table reader when metadata compaction is inflight

* Fixing retry of pending compaction in metadata table and enhancing tests
2021-12-03 21:44:50 -08:00
rmahindra123
91d2e61433 [HUDI-2904] Fix metadata table archival overstepping between regular writers and table services (#4186)
- Co-authored-by: Rajesh Mahindra <rmahindra@Rajeshs-MacBook-Pro.local>
- Co-authored-by: Sivabalan Narayanan <n.siva.b@gmail.com>
2021-12-02 13:32:26 -05:00
Manoj Govindassamy
2c7656c35f [HUDI-2475] [HUDI-2862] Metadata table creation and avoid bootstrapping race for write client & add locking for upgrade (#4114)
Co-authored-by: Sivabalan Narayanan <n.siva.b@gmail.com>
2021-11-26 23:19:26 -08:00
Alexey Kudinkin
5755ff25a4 [HUDI-2814] Addressing issues w/ Z-order Layout Optimization (#4060)
* `ZCurveOptimizeHelper` > `ZOrderingIndexHelper`;
Moved Z-index helper under `hudi.index.zorder` package

* Tidying up `ZOrderingIndexHelper`

* Fixing compilation

* Fixed index new/original table merging sequence to always prefer values from new index;
Cleaned up `HoodieSparkUtils`

* Added test for `mergeIndexSql`

* Abstracted Z-index name composition w/in `ZOrderingIndexHelper`;

* Fixed `DataSkippingUtils` to interrupt prunning in case data filter contains non-indexed column reference

* Properly handle exceptions origination during pruning in `HoodieFileIndex`

* Make sure no errors are logged upon encountering `AnalysisException`

* Cleaned up Z-index updating sequence;
Tidying up comments, java-docs;

* Fixed Z-index to properly handle changes of the list of clustered columns

* Tidying up

* `lint`

* Suppressing `JavaDocStyle` first sentence check

* Fixed compilation

* Fixing incorrect `DecimalType` conversion

* Refactored test `TestTableLayoutOptimization`
  - Added Z-index table composition test (against fixtures)
  - Separated out GC test;
Tidying up

* Fixed tests re-shuffling column order for Z-Index table `DataFrame` to align w/ the one by one loaded from JSON

* Scaffolded `DataTypeUtils` to do basic checks of Spark types;
Added proper compatibility checking b/w old/new index-tables

* Added test for Z-index tables merging

* Fixed import being shaded by creating internal `hudi.util` package

* Fixed packaging for `TestOptimizeTable`

* Revised `updateMetadataIndex` seq to provide Z-index updating process w/ source table schema

* Make sure existing Z-index table schema is sync'd to source table's one

* Fixed shaded refs

* Fixed tests

* Fixed type conversion of Parquet provided metadata values into Spark expected schemas

* Fixed `composeIndexSchema` utility to propose proper schema

* Added more tests for Z-index:
  - Checking that Z-index table is built correctly
  - Checking that Z-index tables are merged correctly (during update)

* Fixing source table

* Fixing tests to read from Parquet w/ proper schema

* Refactored `ParquetUtils` utility reading stats from Parquet footers

* Fixed incorrect handling of Decimals extracted from Parquet footers

* Worked around issues in javac failign to compile stream's collection

* Fixed handling of `Date` type

* Fixed handling of `DateType` to be parsed as `LocalDate`

* Updated fixture;
Make sure test loads Z-index fixture using proper schema

* Removed superfluous scheme adjusting when reading from Parquet, since Spark is actually able to perfectly restore schema (given Parquet was previously written by Spark as well)

* Fixing race-condition in Parquet's `DateStringifier` trying to share `SimpleDataFormat` object which is inherently not thread-safe

* Tidying up

* Make sure schema is used upon reading to validate input files are in the appropriate format;
Tidying up;

* Worked around javac (1.8) inability to infer expression type properly

* Updated fixtures;
Tidying up

* Fixing compilation after rebase

* Assert clustering have in Z-order layout optimization testing

* Tidying up exception messages

* XXX

* Added test validating Z-index lookup filter correctness

* Added more test-cases;
Tidying up

* Added tests for string expressions

* Fixed incorrect Z-index filter lookup translations

* Added more test-cases

* Added proper handling on complex negations of AND/OR expressions by pushing NOT operator down into inner expressions for appropriate handling

* Added `-target:jvm-1.8` for `hudi-spark` module

* Adding more tests

* Added tests for non-indexed columns

* Properly handle non-indexed columns by falling back to a re-write of containing expression as  `TrueLiteral` instead

* Fixed tests

* Removing the parquet test files and disabling corresponding tests

Co-authored-by: Vinoth Chandar <vinoth@apache.org>
2021-11-26 10:02:15 -08:00
Sivabalan Narayanan
7bb90e8caf [HUDI-2794] Guarding table service commits within a single lock to commit to both data table and metadata table (#4037)
* Fixing a single lock to commit table services across metadata table and data table

* Addressing comments

* rebasing with master
2021-11-25 11:19:30 -08:00