
[HUDI-1295] Metadata Index - Bloom filter and Column stats index to speed up index lookups (#4352)

* [HUDI-1295] Metadata Index - Bloom filter and Column stats index to speed up index lookups

- Today, base files carry bloom filters in their footers, and index lookups
  have to load each base file to perform any bloom filter checks. Even with
  interval-tree based file pruning, we still end up reading a significant
  number of base files just to fetch their bloom filters during index
  lookups for keys. This lookup can be made more performant by storing all
  the bloom filters in a new metadata table partition and doing point
  lookups by key.
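The core idea above can be sketched in a few lines. Everything here is an illustrative stand-in (`SimpleBloomFilter` and the in-memory map are not Hudi classes): index lookups consult a single key-value "metadata partition" of bloom filters instead of opening each base file's footer.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;

// Sketch: keep per-file bloom filters in one key-value "metadata partition"
// so index lookups never have to open base file footers.
public class BloomIndexSketch {
  static final class SimpleBloomFilter {
    private final BitSet bits = new BitSet(1024);

    void add(String key) {
      for (int seed = 0; seed < 3; seed++) {
        bits.set(Math.floorMod(key.hashCode() * 31 + seed * 0x9E3779B9, 1024));
      }
    }

    boolean mightContain(String key) {
      for (int seed = 0; seed < 3; seed++) {
        if (!bits.get(Math.floorMod(key.hashCode() * 31 + seed * 0x9E3779B9, 1024))) {
          return false; // definitely absent
        }
      }
      return true; // possibly present
    }
  }

  // "Metadata partition": (partition, fileName) -> bloom filter, point lookup by key.
  static final Map<String, SimpleBloomFilter> bloomIndex = new HashMap<>();

  static boolean fileMayContainKey(String partition, String fileName, String recordKey) {
    SimpleBloomFilter bf = bloomIndex.get(partition + "/" + fileName);
    return bf != null && bf.mightContain(recordKey);
  }

  public static void main(String[] args) {
    SimpleBloomFilter bf = new SimpleBloomFilter();
    bf.add("uuid-1");
    bloomIndex.put("2022/01/01" + "/" + "file-a.parquet", bf);
    // Added keys are never false negatives, so this is always true.
    System.out.println(fileMayContainKey("2022/01/01", "file-a.parquet", "uuid-1"));
  }
}
```

The point of the partition is exactly this map-like point lookup: one seek per key, rather than one base file footer read per candidate file.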

* [HUDI-1295] Metadata Index - Bloom filter and Column stats index to speed up index lookups

 - Adding indexing support for clean, restore and rollback operations.
   Each of these operations is now additionally converted into index
   records for the bloom filter and column stats partitions.

* [HUDI-1295] Metadata Index - Bloom filter and Column stats index to speed up index lookups

 - Making the Hoodie key consistent for both the column stats and bloom
   indexes by including the fileId instead of the fileName, in both the
   read and write paths.

 - Performance optimization for looking up records in the metadata table.

 - Avoiding the multi-column sorting previously needed for HoodieBloomMetaIndexBatchCheckFunction

* [HUDI-1295] Metadata Index - Bloom filter and Column stats index to speed up index lookups

 - HoodieBloomMetaIndexBatchCheckFunction cleanup to remove unused classes

 - Checking for the base file before reading its footer for bloom filters or column stats

* [HUDI-1295] Metadata Index - Bloom filter and Column stats index to speed up index lookups

 - Updating the bloom index and column stats index to include the full
   file name in the key instead of just the file id.

 - Minor test fixes.

* [HUDI-1295] Metadata Index - Bloom filter and Column stats index to speed up index lookups

 - Fixed the Flink commit method to handle metadata table all-partition update records

 - TestBloomIndex fixes

* [HUDI-1295] Metadata Index - Bloom filter and Column stats index to speed up index lookups

 - SparkHoodieBloomIndexHelper code simplification for various config modes

 - Signature change for getBloomFilters() and getColumnStats(). Callers
   can now just pass in the partitions and file names of interest; the
   index key is then constructed internally from the passed-in parameters.
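A rough sketch of what "the index key is constructed internally" can look like. The 64-bit partition hash and 128-bit file hash mirror the ID sizes in this PR, but the MD5-truncate-and-base64 scheme here is an assumption, not necessarily Hudi's HashID implementation:

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.Arrays;
import java.util.Base64;

// Sketch of building an index key from (partition, fileName), so callers of
// getBloomFilters()/getColumnStats() never deal with raw keys themselves.
public class IndexKeySketch {
  static byte[] hash(String message, int numBytes) {
    try {
      byte[] full = MessageDigest.getInstance("MD5").digest(message.getBytes(StandardCharsets.UTF_8));
      return Arrays.copyOf(full, numBytes); // truncate to the desired ID size
    } catch (Exception e) {
      throw new IllegalStateException(e);
    }
  }

  // Bloom filter index key = base64(64-bit partition hash) + base64(128-bit file hash)
  public static String bloomFilterIndexKey(String partition, String fileName) {
    return Base64.getEncoder().encodeToString(hash(partition, 8))
        + Base64.getEncoder().encodeToString(hash(fileName, 16));
  }

  public static void main(String[] args) {
    String k1 = bloomFilterIndexKey("2022/01/01", "f1-0_0-1-2_001.parquet");
    String k2 = bloomFilterIndexKey("2022/01/01", "f2-0_0-3-4_001.parquet");
    System.out.println(k1.equals(k2)); // false: different files map to different keys
  }
}
```

The hashing keeps keys fixed-width and sortable regardless of partition path or file name length, which matters for the sorted point lookups mentioned elsewhere in this commit.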

 - KeyLookupHandle and KeyLookupResults code refactoring

 - Metadata schema changes - removed the reserved field

* [HUDI-1295] Metadata Index - Bloom filter and Column stats index to speed up index lookups

 - Removing HoodieColumnStatsMetadata and using HoodieColumnRangeMetadata instead.
   Fixed the users of the removed class.

* [HUDI-1295] Metadata Index - Bloom filter and Column stats index to speed up index lookups

 - Extending the meta index tests to cover delete, compaction, clean
   and restore table operations. Also fixed getBloomFilters()
   and getColumnStats() to account for deleted entries.
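The "account for deleted entries" behavior can be sketched as follows; `IndexEntry` and the map are illustrative stand-ins for the metadata payloads, not Hudi classes:

```java
import java.util.HashMap;
import java.util.Map;

// Sketch: index payloads carry an isDeleted flag, and lookups must treat
// tombstoned entries as absent rather than return stale bloom filters/stats.
public class DeletedEntrySketch {
  static final class IndexEntry {
    final String payload;
    final boolean isDeleted;

    IndexEntry(String payload, boolean isDeleted) {
      this.payload = payload;
      this.isDeleted = isDeleted;
    }
  }

  public static Map<String, String> lookup(Map<String, IndexEntry> index) {
    Map<String, String> live = new HashMap<>();
    index.forEach((k, v) -> {
      if (!v.isDeleted) {
        live.put(k, v.payload); // skip tombstoned entries
      }
    });
    return live;
  }

  public static void main(String[] args) {
    Map<String, IndexEntry> index = new HashMap<>();
    index.put("file-a", new IndexEntry("bloom-a", false));
    index.put("file-b", new IndexEntry("bloom-b", true)); // tombstone after clean/rollback
    System.out.println(lookup(index).keySet()); // [file-a]
  }
}
```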

* [HUDI-1295] Metadata Index - Bloom filter and Column stats index to speed up index lookups

 - Addressing review comments - Javadoc for new classes, key sorting for
   lookups, renaming of index methods.

* [HUDI-1295] Metadata Index - Bloom filter and Column stats index to speed up index lookups

 - Consolidated the bloom filter checking for keys into one
   HoodieMetadataBloomIndexCheckFunction instead of separate batch
   and lazy modes. Removed all the configs around it.

 - Made the metadata table partition file group count configurable.

 - Fixed HoodieKeyLookupHandle to use an auto-closeable file reader
   when checking bloom filters and range keys.
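The auto-closeable reader fix boils down to try-with-resources; `FooReader` here is a hypothetical stand-in for Hudi's file reader, not the actual class:

```java
// Sketch: the lookup handle opens the file reader in try-with-resources so it
// is closed even if the bloom filter or range-key check throws.
public class KeyLookupSketch {
  static class FooReader implements AutoCloseable {
    boolean closed = false;

    boolean mightContain(String key) {
      return !key.isEmpty(); // stand-in for a real bloom filter / range check
    }

    @Override
    public void close() {
      closed = true;
    }
  }

  static FooReader lastReader; // exposed only so the usage below can verify closing

  public static boolean checkCandidate(String key) {
    try (FooReader reader = new FooReader()) { // closed automatically on exit
      lastReader = reader;
      return reader.mightContain(key);
    }
  }

  public static void main(String[] args) {
    System.out.println(checkCandidate("uuid-1")); // true
    System.out.println(lastReader.closed);        // true: reader was closed
  }
}
```

Without this pattern, an exception mid-check leaks the open reader; with it, the close happens on every exit path.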

 - Config property renames. Test fixes.

* [HUDI-1295] Metadata Index - Bloom filter and Column stats index to speed up index lookups

 - Enabling column stats indexing for all columns by default

 - Handling column stat generation errors and test update

* [HUDI-1295] Metadata Index - Bloom filter and Column stats index to speed up index lookups

 - Metadata table partition file group count is now taken from the file
   slices when the table is bootstrapped.

 - Refactored commit record preparation into the base class.

 - HoodieFileReader interface changes for filtering keys.

 - Multi-column and multiple data type support for the column stats index.

* [HUDI-1295] Metadata Index - Bloom filter and Column stats index to speed up index lookups

 - Rebased onto latest master and merged fixes for the build and test failures

* [HUDI-1295] Metadata Index - Bloom filter and Column stats index to speed up index lookups

 - Extending the metadata column stats type payload schema to include
   more statistics about the column ranges to help query integration.
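The merge semantics implied by the extended payload (min of mins, max of maxes, additive counts and sizes) can be sketched as below. `ColumnRange` is illustrative, not the Hudi class, and it skips the null min/max handling the real code needs:

```java
// Sketch: combining two files' column range stats into one entry, matching the
// extended payload fields (min/max, valueCount, nullCount, totalSize,
// totalUncompressedSize).
public class ColumnStatsMergeSketch {
  static final class ColumnRange {
    final String minValue;
    final String maxValue;
    final long valueCount;
    final long nullCount;
    final long totalSize;
    final long totalUncompressedSize;

    ColumnRange(String min, String max, long values, long nulls, long size, long rawSize) {
      this.minValue = min;
      this.maxValue = max;
      this.valueCount = values;
      this.nullCount = nulls;
      this.totalSize = size;
      this.totalUncompressedSize = rawSize;
    }
  }

  // min of mins, max of maxes; counts and sizes add up.
  public static ColumnRange merge(ColumnRange a, ColumnRange b) {
    String min = a.minValue.compareTo(b.minValue) <= 0 ? a.minValue : b.minValue;
    String max = a.maxValue.compareTo(b.maxValue) >= 0 ? a.maxValue : b.maxValue;
    return new ColumnRange(min, max,
        a.valueCount + b.valueCount,
        a.nullCount + b.nullCount,
        a.totalSize + b.totalSize,
        a.totalUncompressedSize + b.totalUncompressedSize);
  }

  public static void main(String[] args) {
    ColumnRange merged = merge(
        new ColumnRange("a", "m", 100, 2, 1024, 4096),
        new ColumnRange("c", "z", 50, 1, 512, 2048));
    System.out.println(merged.minValue + ".." + merged.maxValue); // a..z
    System.out.println(merged.valueCount); // 150
  }
}
```

These richer per-file ranges are what lets a query engine prune files whose [min, max] interval cannot contain a predicate's value.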

* [HUDI-1295] Metadata Index - Bloom filter and Column stats index to speed up index lookups

 - Addressing review comments
This commit is contained in:
Manoj Govindassamy
2022-02-03 04:42:48 -08:00
committed by GitHub
parent d681824982
commit 5927bdd1c0
49 changed files with 2304 additions and 522 deletions


@@ -30,27 +30,140 @@
"doc": "Type of the metadata record",
"type": "int"
},
-{ "name": "filesystemMetadata",
+{
"doc": "Contains information about partitions and files within the dataset",
-"type": ["null", {
-"type": "map",
-"values": {
+"name": "filesystemMetadata",
+"type": [
+"null",
+{
+"type": "map",
+"values": {
+"type": "record",
+"name": "HoodieMetadataFileInfo",
+"fields": [
+{
+"name": "size",
+"type": "long",
+"doc": "Size of the file"
+},
+{
+"name": "isDeleted",
+"type": "boolean",
+"doc": "True if this file has been deleted"
+}
+]
+}
+}
+]
+},
+{
+"doc": "Metadata Index of bloom filters for all data files in the user table",
+"name": "BloomFilterMetadata",
+"type": [
+"null",
+{
+"doc": "Data file bloom filter details",
+"name": "HoodieMetadataBloomFilter",
"type": "record",
-"name": "HoodieMetadataFileInfo",
"fields": [
{
-"name": "size",
-"type": "long",
-"doc": "Size of the file"
+"doc": "Bloom filter type code",
+"name": "type",
+"type": "string"
},
{
+"doc": "Instant timestamp when this metadata was created/updated",
+"name": "timestamp",
+"type": "string"
+},
+{
+"doc": "Bloom filter binary byte array",
+"name": "bloomFilter",
+"type": "bytes"
+},
+{
+"doc": "Bloom filter entry valid/deleted flag",
"name": "isDeleted",
-"type": "boolean",
-"doc": "True if this file has been deleted"
+"type": "boolean"
}
]
}
-}]
+]
+},
+{
+"doc": "Metadata Index of column statistics for all data files in the user table",
+"name": "ColumnStatsMetadata",
+"type": [
+"null",
+{
+"doc": "Data file column statistics",
+"name": "HoodieMetadataColumnStats",
+"type": "record",
+"fields": [
+{
+"doc": "File name for which this column statistics applies",
+"name": "fileName",
+"type": [
+"null",
+"string"
+]
+},
+{
+"doc": "Minimum value in the range. Based on user data table schema, we can convert this to appropriate type",
+"name": "minValue",
+"type": [
+"null",
+"string"
+]
+},
+{
+"doc": "Maximum value in the range. Based on user data table schema, we can convert it to appropriate type",
+"name": "maxValue",
+"type": [
+"null",
+"string"
+]
+},
+{
+"doc": "Total count of values",
+"name": "valueCount",
+"type": [
+"null",
+"long"
+]
+},
+{
+"doc": "Total count of null values",
+"name": "nullCount",
+"type": [
+"null",
+"long"
+]
+},
+{
+"doc": "Total storage size on disk",
+"name": "totalSize",
+"type": [
+"null",
+"long"
+]
+},
+{
+"doc": "Total uncompressed storage size on disk",
+"name": "totalUncompressedSize",
+"type": [
+"null",
+"long"
+]
+},
+{
+"doc": "Column range entry valid/deleted flag",
+"name": "isDeleted",
+"type": "boolean"
+}
+]
+}
+]
+}
]
}


@@ -63,7 +63,7 @@ public class HoodieDynamicBoundedBloomFilter implements BloomFilter {
* @param serString the serialized string which represents the {@link HoodieDynamicBoundedBloomFilter}
* @param typeCode type code of the bloom filter
*/
-HoodieDynamicBoundedBloomFilter(String serString, BloomFilterTypeCode typeCode) {
+public HoodieDynamicBoundedBloomFilter(String serString, BloomFilterTypeCode typeCode) {
// ignoring the type code for now, since we have just one version
byte[] bytes = Base64CodecUtil.decode(serString);
DataInputStream dis = new DataInputStream(new ByteArrayInputStream(bytes));


@@ -124,6 +124,47 @@ public final class HoodieMetadataConfig extends HoodieConfig {
.sinceVersion("0.10.0")
.withDocumentation("Enable full scanning of log files while reading log records. If disabled, Hudi looks up only the entries of interest.");
public static final ConfigProperty<Boolean> ENABLE_METADATA_INDEX_BLOOM_FILTER = ConfigProperty
.key(METADATA_PREFIX + ".index.bloom.filter.enable")
.defaultValue(false)
.sinceVersion("0.11.0")
.withDocumentation("Enable indexing of bloom filters from user data files under the metadata table. When "
+ "enabled, the metadata table will have a partition that stores the bloom filter index, which will be "
+ "used during index lookups.");
public static final ConfigProperty<Integer> METADATA_INDEX_BLOOM_FILTER_FILE_GROUP_COUNT = ConfigProperty
.key(METADATA_PREFIX + ".index.bloom.filter.file.group.count")
.defaultValue(4)
.sinceVersion("0.11.0")
.withDocumentation("Metadata bloom filter index partition file group count. This controls the size of the base and "
+ "log files and read parallelism in the bloom filter index partition. The recommendation is to size the "
+ "file group count such that the base files are under 1GB.");
public static final ConfigProperty<Boolean> ENABLE_METADATA_INDEX_COLUMN_STATS = ConfigProperty
.key(METADATA_PREFIX + ".index.column.stats.enable")
.defaultValue(false)
.sinceVersion("0.11.0")
.withDocumentation("Enable indexing of column ranges from user data files under the metadata table. When "
+ "enabled, the metadata table will have a partition that stores the column ranges, which will be "
+ "used for pruning files during the index lookups.");
public static final ConfigProperty<Integer> METADATA_INDEX_COLUMN_STATS_FILE_GROUP_COUNT = ConfigProperty
.key(METADATA_PREFIX + ".index.column.stats.file.group.count")
.defaultValue(2)
.sinceVersion("0.11.0")
.withDocumentation("Metadata column stats partition file group count. This controls the size of the base and "
+ "log files and read parallelism in the column stats index partition. The recommendation is to size the "
+ "file group count such that the base files are under 1GB.");
public static final ConfigProperty<Boolean> ENABLE_METADATA_INDEX_COLUMN_STATS_FOR_ALL_COLUMNS = ConfigProperty
.key(METADATA_PREFIX + ".index.column.stats.all_columns.enable")
.defaultValue(true)
.sinceVersion("0.11.0")
.withDocumentation("Enable indexing column ranges for all columns of the user data files under the "
+ "metadata table. Only applies if "
+ ENABLE_METADATA_INDEX_COLUMN_STATS.key() + " is enabled.");
public static final ConfigProperty<Boolean> POPULATE_META_FIELDS = ConfigProperty
.key(METADATA_PREFIX + ".populate.meta.fields")
.defaultValue(false)
@@ -157,6 +198,26 @@ public final class HoodieMetadataConfig extends HoodieConfig {
return getBoolean(ENABLE);
}
public boolean isBloomFilterIndexEnabled() {
return getBooleanOrDefault(ENABLE_METADATA_INDEX_BLOOM_FILTER);
}
public boolean isColumnStatsIndexEnabled() {
return getBooleanOrDefault(ENABLE_METADATA_INDEX_COLUMN_STATS);
}
public boolean isMetadataColumnStatsIndexForAllColumnsEnabled() {
return getBooleanOrDefault(ENABLE_METADATA_INDEX_COLUMN_STATS_FOR_ALL_COLUMNS);
}
public int getBloomFilterIndexFileGroupCount() {
return getIntOrDefault(METADATA_INDEX_BLOOM_FILTER_FILE_GROUP_COUNT);
}
public int getColumnStatsIndexFileGroupCount() {
return getIntOrDefault(METADATA_INDEX_COLUMN_STATS_FILE_GROUP_COUNT);
}
public boolean enableMetrics() {
return getBoolean(METRICS_ENABLE);
}
@@ -199,6 +260,31 @@ public final class HoodieMetadataConfig extends HoodieConfig {
return this;
}
public Builder withMetadataIndexBloomFilter(boolean enable) {
metadataConfig.setValue(ENABLE_METADATA_INDEX_BLOOM_FILTER, String.valueOf(enable));
return this;
}
public Builder withMetadataIndexBloomFilterFileGroups(int fileGroupCount) {
metadataConfig.setValue(METADATA_INDEX_BLOOM_FILTER_FILE_GROUP_COUNT, String.valueOf(fileGroupCount));
return this;
}
public Builder withMetadataIndexColumnStats(boolean enable) {
metadataConfig.setValue(ENABLE_METADATA_INDEX_COLUMN_STATS, String.valueOf(enable));
return this;
}
public Builder withMetadataIndexColumnStatsFileGroupCount(int fileGroupCount) {
metadataConfig.setValue(METADATA_INDEX_COLUMN_STATS_FILE_GROUP_COUNT, String.valueOf(fileGroupCount));
return this;
}
public Builder withMetadataIndexForAllColumns(boolean enable) {
metadataConfig.setValue(ENABLE_METADATA_INDEX_COLUMN_STATS_FOR_ALL_COLUMNS, String.valueOf(enable));
return this;
}
public Builder enableMetrics(boolean enableMetrics) {
metadataConfig.setValue(METRICS_ENABLE, String.valueOf(enableMetrics));
return this;


@@ -28,14 +28,21 @@ public class HoodieColumnRangeMetadata<T> {
private final String columnName;
private final T minValue;
private final T maxValue;
-private final long numNulls;
+private final long nullCount;
private final long valueCount;
private final long totalSize;
private final long totalUncompressedSize;
-public HoodieColumnRangeMetadata(final String filePath, final String columnName, final T minValue, final T maxValue, final long numNulls) {
+public HoodieColumnRangeMetadata(final String filePath, final String columnName, final T minValue, final T maxValue,
+final long nullCount, long valueCount, long totalSize, long totalUncompressedSize) {
this.filePath = filePath;
this.columnName = columnName;
this.minValue = minValue;
this.maxValue = maxValue;
-this.numNulls = numNulls;
+this.nullCount = nullCount;
this.valueCount = valueCount;
this.totalSize = totalSize;
this.totalUncompressedSize = totalUncompressedSize;
}
public String getFilePath() {
@@ -54,8 +61,20 @@ public class HoodieColumnRangeMetadata<T> {
return this.maxValue;
}
-public long getNumNulls() {
-return numNulls;
+public long getNullCount() {
+return nullCount;
}
public long getValueCount() {
return valueCount;
}
public long getTotalSize() {
return totalSize;
}
public long getTotalUncompressedSize() {
return totalUncompressedSize;
}
@Override
@@ -71,12 +90,15 @@ public class HoodieColumnRangeMetadata<T> {
&& Objects.equals(getColumnName(), that.getColumnName())
&& Objects.equals(getMinValue(), that.getMinValue())
&& Objects.equals(getMaxValue(), that.getMaxValue())
-&& Objects.equals(getNumNulls(), that.getNumNulls());
+&& Objects.equals(getNullCount(), that.getNullCount())
+&& Objects.equals(getValueCount(), that.getValueCount())
+&& Objects.equals(getTotalSize(), that.getTotalSize())
+&& Objects.equals(getTotalUncompressedSize(), that.getTotalUncompressedSize());
}
@Override
public int hashCode() {
-return Objects.hash(getColumnName(), getMinValue(), getMaxValue(), getNumNulls());
+return Objects.hash(getColumnName(), getMinValue(), getMaxValue(), getNullCount());
}
@Override
@@ -86,6 +108,10 @@ public class HoodieColumnRangeMetadata<T> {
+ "columnName='" + columnName + '\''
+ ", minValue=" + minValue
+ ", maxValue=" + maxValue
-+ ", numNulls=" + numNulls + '}';
++ ", nullCount=" + nullCount
++ ", valueCount=" + valueCount
++ ", totalSize=" + totalSize
++ ", totalUncompressedSize=" + totalUncompressedSize
++ '}';
}
}


@@ -175,11 +175,11 @@ public abstract class AbstractHoodieLogRecordReader {
return this.simpleKeyGenFields.get().getKey();
}
-public void scan() {
+public synchronized void scan() {
scan(Option.empty());
}
-public void scan(Option<List<String>> keys) {
+public synchronized void scan(Option<List<String>> keys) {
currentInstantLogBlocks = new ArrayDeque<>();
progress = 0.0f;
totalLogFiles = new AtomicLong(0);


@@ -30,6 +30,8 @@ import org.apache.hudi.common.model.HoodieRecord;
import org.apache.hudi.exception.HoodieIOException;
import org.apache.hudi.exception.MetadataNotFoundException;
import org.apache.hudi.keygen.BaseKeyGenerator;
import org.apache.log4j.LogManager;
import org.apache.log4j.Logger;
import org.apache.parquet.avro.AvroParquetReader;
import org.apache.parquet.avro.AvroReadSupport;
import org.apache.parquet.avro.AvroSchemaConverter;
@@ -62,6 +64,8 @@ import java.util.stream.Stream;
*/
public class ParquetUtils extends BaseFileUtils {
private static final Logger LOG = LogManager.getLogger(ParquetUtils.class);
/**
* Read the rowKey list matching the given filter, from the given parquet file. If the filter is empty, then this will
* return all the rowkeys.
@@ -300,18 +304,21 @@ public class ParquetUtils extends BaseFileUtils {
Map<String, List<HoodieColumnRangeMetadata<Comparable>>> columnToStatsListMap = metadata.getBlocks().stream().sequential()
.flatMap(blockMetaData -> blockMetaData.getColumns().stream()
.filter(f -> cols.contains(f.getPath().toDotString()))
-.map(columnChunkMetaData ->
-new HoodieColumnRangeMetadata<Comparable>(
-parquetFilePath.getName(),
-columnChunkMetaData.getPath().toDotString(),
-convertToNativeJavaType(
-columnChunkMetaData.getPrimitiveType(),
-columnChunkMetaData.getStatistics().genericGetMin()),
-convertToNativeJavaType(
-columnChunkMetaData.getPrimitiveType(),
-columnChunkMetaData.getStatistics().genericGetMax()),
-columnChunkMetaData.getStatistics().getNumNulls())))
-.collect(Collectors.groupingBy(HoodieColumnRangeMetadata::getColumnName));
+.map(columnChunkMetaData ->
+new HoodieColumnRangeMetadata<Comparable>(
+parquetFilePath.getName(),
+columnChunkMetaData.getPath().toDotString(),
+convertToNativeJavaType(
+columnChunkMetaData.getPrimitiveType(),
+columnChunkMetaData.getStatistics().genericGetMin()),
+convertToNativeJavaType(
+columnChunkMetaData.getPrimitiveType(),
+columnChunkMetaData.getStatistics().genericGetMax()),
+columnChunkMetaData.getStatistics().getNumNulls(),
+columnChunkMetaData.getValueCount(),
+columnChunkMetaData.getTotalSize(),
+columnChunkMetaData.getTotalUncompressedSize()))
+).collect(Collectors.groupingBy(HoodieColumnRangeMetadata::getColumnName));
// Combine those into file-level statistics
// NOTE: Inlining this var makes javac (1.8) upset (due to its inability to infer
@@ -355,13 +362,17 @@ public class ParquetUtils extends BaseFileUtils {
maxValue = one.getMaxValue().compareTo(another.getMaxValue()) < 0 ? another.getMaxValue() : one.getMaxValue();
} else if (one.getMaxValue() == null) {
maxValue = another.getMaxValue();
} else {
maxValue = one.getMaxValue();
}
return new HoodieColumnRangeMetadata<T>(
one.getFilePath(),
-one.getColumnName(), minValue, maxValue, one.getNumNulls() + another.getNumNulls());
+one.getColumnName(), minValue, maxValue,
+one.getNullCount() + another.getNullCount(),
+one.getValueCount() + another.getValueCount(),
+one.getTotalSize() + another.getTotalSize(),
+one.getTotalUncompressedSize() + another.getTotalUncompressedSize());
}
private static Comparable<?> convertToNativeJavaType(PrimitiveType primitiveType, Comparable val) {
@@ -408,7 +419,7 @@ public class ParquetUtils extends BaseFileUtils {
return BigDecimal.valueOf((Long) val, scale);
} else if (val instanceof Binary) {
// NOTE: Unscaled number is stored in BE format (most significant byte is 0th)
-return new BigDecimal(new BigInteger(((Binary)val).getBytesUnsafe()), scale);
+return new BigDecimal(new BigInteger(((Binary) val).getBytesUnsafe()), scale);
} else {
throw new UnsupportedOperationException(String.format("Unsupported value type (%s)", val.getClass().getName()));
}


@@ -24,14 +24,21 @@ import org.apache.hudi.common.util.Base64CodecUtil;
/**
* A stateful Hoodie object ID representing any table column.
*/
-public class ColumnID extends HoodieID {
+public class ColumnIndexID extends HoodieIndexID {
private static final Type TYPE = Type.COLUMN;
-private static final HashID.Size ID_COLUMN_HASH_SIZE = HashID.Size.BITS_64;
+public static final HashID.Size ID_COLUMN_HASH_SIZE = HashID.Size.BITS_64;
private final String column;
private final byte[] hash;
-public ColumnID(final String message) {
-this.hash = HashID.hash(message, ID_COLUMN_HASH_SIZE);
+public ColumnIndexID(final String column) {
+this.column = column;
+this.hash = HashID.hash(column, ID_COLUMN_HASH_SIZE);
}
@Override
public String getName() {
return column;
}
@Override


@@ -24,14 +24,21 @@ import org.apache.hudi.common.util.Base64CodecUtil;
/**
* Hoodie object ID representing any file.
*/
-public class FileID extends HoodieID {
+public class FileIndexID extends HoodieIndexID {
private static final Type TYPE = Type.FILE;
private static final HashID.Size ID_FILE_HASH_SIZE = HashID.Size.BITS_128;
private final String fileName;
private final byte[] hash;
-public FileID(final String message) {
-this.hash = HashID.hash(message, ID_FILE_HASH_SIZE);
+public FileIndexID(final String fileName) {
+this.fileName = fileName;
+this.hash = HashID.hash(fileName, ID_FILE_HASH_SIZE);
}
@Override
public String getName() {
return fileName;
}
@Override


@@ -24,9 +24,10 @@ import org.apache.hudi.exception.HoodieNotSupportedException;
import java.io.Serializable;
/**
-* A serializable ID that can be used to identify any Hoodie table fields and resources.
+* A serializable ID that can be used to identify any Hoodie table fields and
+* resources in the on-disk index.
*/
-public abstract class HoodieID implements Serializable {
+public abstract class HoodieIndexID implements Serializable {
private static final long serialVersionUID = 1L;
@@ -50,6 +51,13 @@ public abstract class HoodieID implements Serializable {
}
}
/**
* Get the resource name for which this index id is generated.
*
* @return The resource name
*/
public abstract String getName();
/**
* Get the number of bits representing this ID in memory.
* <p>
@@ -74,7 +82,7 @@ public abstract class HoodieID implements Serializable {
public abstract String toString();
/**
*
* Get the Base64 encoded version of the ID.
*/
public String asBase64EncodedString() {
throw new HoodieNotSupportedException("Unsupported hash for " + getType());


@@ -24,14 +24,21 @@ import org.apache.hudi.common.util.Base64CodecUtil;
/**
* Hoodie object ID representing any partition.
*/
-public class PartitionID extends HoodieID {
+public class PartitionIndexID extends HoodieIndexID {
private static final Type TYPE = Type.PARTITION;
private static final HashID.Size ID_PARTITION_HASH_SIZE = HashID.Size.BITS_64;
private final String partition;
private final byte[] hash;
-public PartitionID(final String message) {
-this.hash = HashID.hash(message, ID_PARTITION_HASH_SIZE);
+public PartitionIndexID(final String partition) {
+this.partition = partition;
+this.hash = HashID.hash(partition, ID_PARTITION_HASH_SIZE);
}
@Override
public String getName() {
return partition;
}
@Override


@@ -20,6 +20,8 @@ package org.apache.hudi.io.storage;
import java.io.IOException;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.Set;
import org.apache.avro.Schema;
@@ -35,6 +37,10 @@ public interface HoodieFileReader<R extends IndexedRecord> extends AutoCloseable
public Set<String> filterRowKeys(Set<String> candidateRowKeys);
default Map<String, R> getRecordsByKeys(List<String> rowKeys) throws IOException {
throw new UnsupportedOperationException();
}
public Iterator<R> getRecordIterator(Schema readerSchema) throws IOException;
default Iterator<R> getRecordIterator() throws IOException {


@@ -22,12 +22,14 @@ import java.io.IOException;
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.HashMap;
import java.util.Iterator;
import java.util.LinkedList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.stream.Collectors;
import org.apache.avro.Schema;
import org.apache.avro.generic.IndexedRecord;
@@ -53,8 +55,11 @@ import org.apache.hudi.common.util.ValidationUtils;
import org.apache.hudi.common.util.io.ByteBufferBackedInputStream;
import org.apache.hudi.exception.HoodieException;
import org.apache.hudi.exception.HoodieIOException;
import org.apache.log4j.LogManager;
import org.apache.log4j.Logger;
-public class HoodieHFileReader<R extends IndexedRecord> implements HoodieFileReader {
+public class HoodieHFileReader<R extends IndexedRecord> implements HoodieFileReader<R> {
private static final Logger LOG = LogManager.getLogger(HoodieHFileReader.class);
private Path path;
private Configuration conf;
private HFile.Reader reader;
@@ -133,23 +138,50 @@ public class HoodieHFileReader<R extends IndexedRecord> implements HoodieFileRea
}
}
/**
* Filter keys by availability.
* <p>
* Note: This method is performant when the caller passes in a sorted candidate keys.
*
* @param candidateRowKeys - Keys to check for the availability
* @return Subset of candidate keys that are available
*/
@Override
-public Set<String> filterRowKeys(Set candidateRowKeys) {
-// Current implementation reads all records and filters them. In certain cases, it many be better to:
-// 1. Scan a limited subset of keys (min/max range of candidateRowKeys)
-// 2. Lookup keys individually (if the size of candidateRowKeys is much less than the total keys in file)
-try {
-List<Pair<String, R>> allRecords = readAllRecords();
-Set<String> rowKeys = new HashSet<>();
-allRecords.forEach(t -> {
-if (candidateRowKeys.contains(t.getFirst())) {
-rowKeys.add(t.getFirst());
-}
-});
-return rowKeys;
-} catch (IOException e) {
-throw new HoodieIOException("Failed to read row keys from " + path, e);
-}
-}
+public Set<String> filterRowKeys(Set<String> candidateRowKeys) {
+return candidateRowKeys.stream().filter(k -> {
+try {
+return isKeyAvailable(k);
+} catch (IOException e) {
+LOG.error("Failed to check key availability: " + k);
+return false;
+}
+}).collect(Collectors.toSet());
+}
@Override
public Map<String, R> getRecordsByKeys(List<String> rowKeys) throws IOException {
return filterRecordsImpl(new TreeSet<>(rowKeys));
}
/**
* Filter records by sorted keys.
* <p>
* TODO: Implement single seek and sequential scan till the last candidate key
* instead of repeated seeks.
*
* @param sortedCandidateRowKeys - Sorted set of keys to fetch records for
* @return Map of keys to fetched records
* @throws IOException When the deserialization of records fail
*/
private synchronized Map<String, R> filterRecordsImpl(TreeSet<String> sortedCandidateRowKeys) throws IOException {
HashMap<String, R> filteredRecords = new HashMap<>();
for (String key : sortedCandidateRowKeys) {
Option<R> record = getRecordByKey(key);
if (record.isPresent()) {
filteredRecords.put(key, record.get());
}
}
return filteredRecords;
}
public List<Pair<String, R>> readAllRecords(Schema writerSchema, Schema readerSchema) throws IOException {
@@ -246,6 +278,19 @@ public class HoodieHFileReader<R extends IndexedRecord> implements HoodieFileRea
};
}
private boolean isKeyAvailable(String key) throws IOException {
final KeyValue kv = new KeyValue(key.getBytes(), null, null, null);
synchronized (this) {
if (keyScanner == null) {
keyScanner = reader.getScanner(false, false);
}
if (keyScanner.seekTo(kv) == 0) {
return true;
}
}
return false;
}
@Override
public Option getRecordByKey(String key, Schema readerSchema) throws IOException {
byte[] value = null;


@@ -19,6 +19,8 @@
package org.apache.hudi.metadata;
import org.apache.hudi.avro.model.HoodieMetadataColumnStats;
import org.apache.hudi.avro.model.HoodieMetadataBloomFilter;
import org.apache.hudi.common.config.HoodieMetadataConfig;
import org.apache.hudi.common.config.SerializableConfiguration;
import org.apache.hudi.common.engine.HoodieEngineContext;
@@ -30,7 +32,11 @@ import org.apache.hudi.common.table.HoodieTableMetaClient;
import org.apache.hudi.common.table.timeline.HoodieInstant;
import org.apache.hudi.common.util.HoodieTimer;
import org.apache.hudi.common.util.Option;
import org.apache.hudi.common.util.ValidationUtils;
import org.apache.hudi.common.util.collection.Pair;
import org.apache.hudi.common.util.hash.ColumnIndexID;
import org.apache.hudi.common.util.hash.FileIndexID;
import org.apache.hudi.common.util.hash.PartitionIndexID;
import org.apache.hudi.exception.HoodieMetadataException;
import org.apache.hadoop.fs.FileStatus;
@@ -39,12 +45,15 @@ import org.apache.log4j.LogManager;
import org.apache.log4j.Logger;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.stream.Collectors;
public abstract class BaseTableMetadata implements HoodieTableMetadata {
@@ -63,7 +72,9 @@ public abstract class BaseTableMetadata implements HoodieTableMetadata {
// Directory used for Spillable Map when merging records
protected final String spillableMapDirectory;
-protected boolean enabled;
+protected boolean isMetadataTableEnabled;
+protected boolean isBloomFilterIndexEnabled = false;
+protected boolean isColumnStatsIndexEnabled = false;
protected BaseTableMetadata(HoodieEngineContext engineContext, HoodieMetadataConfig metadataConfig,
String dataBasePath, String spillableMapDirectory) {
@@ -74,7 +85,7 @@ public abstract class BaseTableMetadata implements HoodieTableMetadata {
this.spillableMapDirectory = spillableMapDirectory;
this.metadataConfig = metadataConfig;
-this.enabled = metadataConfig.enabled();
+this.isMetadataTableEnabled = metadataConfig.enabled();
if (metadataConfig.enableMetrics()) {
this.metrics = Option.of(new HoodieMetadataMetrics(Registry.getRegistry("HoodieMetadata")));
} else {
@@ -84,16 +95,15 @@ public abstract class BaseTableMetadata implements HoodieTableMetadata {
/**
* Return the list of partitions in the dataset.
*
* <p>
* If the Metadata Table is enabled, the listing is retrieved from the stored metadata. Otherwise, the list of
* partitions is retrieved directly from the underlying {@code FileSystem}.
*
* <p>
* On any errors retrieving the listing from the metadata, defaults to using the file system listings.
*
*/
@Override
public List<String> getAllPartitionPaths() throws IOException {
-if (enabled) {
+if (isMetadataTableEnabled) {
try {
return fetchAllPartitionPaths();
} catch (Exception e) {
@@ -106,10 +116,10 @@ public abstract class BaseTableMetadata implements HoodieTableMetadata {
/**
* Return the list of files in a partition.
*
* <p>
* If the Metadata Table is enabled, the listing is retrieved from the stored metadata. Otherwise, the list of
* partitions is retrieved directly from the underlying {@code FileSystem}.
*
* <p>
* On any errors retrieving the listing from the metadata, defaults to using the file system listings.
*
* @param partitionPath The absolute path of the partition to list
@@ -117,7 +127,7 @@ public abstract class BaseTableMetadata implements HoodieTableMetadata {
@Override
public FileStatus[] getAllFilesInPartition(Path partitionPath)
throws IOException {
-if (enabled) {
+if (isMetadataTableEnabled) {
try {
return fetchAllFilesInPartition(partitionPath);
} catch (Exception e) {
@@ -132,7 +142,7 @@ public abstract class BaseTableMetadata implements HoodieTableMetadata {
@Override
public Map<String, FileStatus[]> getAllFilesInPartitions(List<String> partitions)
throws IOException {
-if (enabled) {
+if (isMetadataTableEnabled) {
try {
List<Path> partitionPaths = partitions.stream().map(entry -> new Path(entry)).collect(Collectors.toList());
Map<String, FileStatus[]> partitionsFilesMap = fetchAllFilesInPartitionPaths(partitionPaths);
@@ -146,12 +156,124 @@ public abstract class BaseTableMetadata implements HoodieTableMetadata {
.getAllFilesInPartitions(partitions);
}
@Override
public Option<ByteBuffer> getBloomFilter(final String partitionName, final String fileName)
throws HoodieMetadataException {
if (!isBloomFilterIndexEnabled) {
LOG.error("Metadata bloom filter index is disabled!");
return Option.empty();
}
final Pair<String, String> partitionFileName = Pair.of(partitionName, fileName);
Map<Pair<String, String>, ByteBuffer> bloomFilters = getBloomFilters(Collections.singletonList(partitionFileName));
if (bloomFilters.isEmpty()) {
LOG.error("Meta index: missing bloom filter for partition: " + partitionName + ", file: " + fileName);
return Option.empty();
}
ValidationUtils.checkState(bloomFilters.containsKey(partitionFileName));
return Option.of(bloomFilters.get(partitionFileName));
}
@Override
public Map<Pair<String, String>, ByteBuffer> getBloomFilters(final List<Pair<String, String>> partitionNameFileNameList)
throws HoodieMetadataException {
if (!isBloomFilterIndexEnabled) {
LOG.error("Metadata bloom filter index is disabled!");
return Collections.emptyMap();
}
if (partitionNameFileNameList.isEmpty()) {
return Collections.emptyMap();
}
HoodieTimer timer = new HoodieTimer().startTimer();
Set<String> partitionIDFileIDSortedStrings = new TreeSet<>();
Map<String, Pair<String, String>> fileToKeyMap = new HashMap<>();
partitionNameFileNameList.forEach(partitionNameFileNamePair -> {
final String bloomFilterIndexKey = HoodieMetadataPayload.getBloomFilterIndexKey(
new PartitionIndexID(partitionNameFileNamePair.getLeft()), new FileIndexID(partitionNameFileNamePair.getRight()));
partitionIDFileIDSortedStrings.add(bloomFilterIndexKey);
fileToKeyMap.put(bloomFilterIndexKey, partitionNameFileNamePair);
}
);
List<String> partitionIDFileIDStrings = new ArrayList<>(partitionIDFileIDSortedStrings);
List<Pair<String, Option<HoodieRecord<HoodieMetadataPayload>>>> hoodieRecordList =
getRecordsByKeys(partitionIDFileIDStrings, MetadataPartitionType.BLOOM_FILTERS.getPartitionPath());
metrics.ifPresent(m -> m.updateMetrics(HoodieMetadataMetrics.LOOKUP_BLOOM_FILTERS_METADATA_STR,
(timer.endTimer() / partitionIDFileIDStrings.size())));
Map<Pair<String, String>, ByteBuffer> partitionFileToBloomFilterMap = new HashMap<>();
for (final Pair<String, Option<HoodieRecord<HoodieMetadataPayload>>> entry : hoodieRecordList) {
if (entry.getRight().isPresent()) {
final Option<HoodieMetadataBloomFilter> bloomFilterMetadata =
entry.getRight().get().getData().getBloomFilterMetadata();
if (bloomFilterMetadata.isPresent()) {
if (!bloomFilterMetadata.get().getIsDeleted()) {
ValidationUtils.checkState(fileToKeyMap.containsKey(entry.getLeft()));
partitionFileToBloomFilterMap.put(fileToKeyMap.get(entry.getLeft()), bloomFilterMetadata.get().getBloomFilter());
}
} else {
LOG.error("Meta index bloom filter missing for: " + fileToKeyMap.get(entry.getLeft()));
}
}
}
return partitionFileToBloomFilterMap;
}
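The batched lookup above builds one index key per (partition, file) pair, sorts the keys so the HFile-backed metadata partition can be read sequentially, and keeps a reverse map to translate returned records back to their pair. A sketch of that bookkeeping, assuming a simplified key encoding (raw concatenation instead of Hudi's Base64-encoded PartitionIndexID/FileIndexID hashes):

```java
import java.util.*;

public class BloomKeyBatching {
  // Placeholder key encoding; Hudi concatenates Base64-encoded hash IDs instead.
  static String indexKey(String partition, String fileName) {
    return partition + "/" + fileName;
  }

  // Build the sorted key list plus the reverse mapping used to translate
  // returned metadata records back to their (partition, file) pair.
  static Map<String, String[]> keyToPair(List<String[]> pairs, List<String> sortedKeysOut) {
    TreeSet<String> sorted = new TreeSet<>();
    Map<String, String[]> reverse = new HashMap<>();
    for (String[] p : pairs) {
      String k = indexKey(p[0], p[1]);
      sorted.add(k);
      reverse.put(k, p);
    }
    sortedKeysOut.addAll(sorted);
    return reverse;
  }

  public static void main(String[] args) {
    List<String> sortedKeys = new ArrayList<>();
    keyToPair(Arrays.asList(
        new String[]{"2022/01/02", "f2.parquet"},
        new String[]{"2022/01/01", "f1.parquet"}), sortedKeys);
    // Keys come out sorted regardless of input order.
    System.out.println(sortedKeys);
  }
}
```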
@Override
public Map<Pair<String, String>, HoodieMetadataColumnStats> getColumnStats(final List<Pair<String, String>> partitionNameFileNameList, final String columnName)
throws HoodieMetadataException {
if (!isColumnStatsIndexEnabled) {
LOG.error("Metadata column stats index is disabled!");
return Collections.emptyMap();
}
Map<String, Pair<String, String>> columnStatKeyToFileNameMap = new HashMap<>();
TreeSet<String> sortedKeys = new TreeSet<>();
final ColumnIndexID columnIndexID = new ColumnIndexID(columnName);
for (Pair<String, String> partitionNameFileNamePair : partitionNameFileNameList) {
final String columnStatsIndexKey = HoodieMetadataPayload.getColumnStatsIndexKey(
new PartitionIndexID(partitionNameFileNamePair.getLeft()),
new FileIndexID(partitionNameFileNamePair.getRight()),
columnIndexID);
sortedKeys.add(columnStatsIndexKey);
columnStatKeyToFileNameMap.put(columnStatsIndexKey, partitionNameFileNamePair);
}
List<String> columnStatKeys = new ArrayList<>(sortedKeys);
HoodieTimer timer = new HoodieTimer().startTimer();
List<Pair<String, Option<HoodieRecord<HoodieMetadataPayload>>>> hoodieRecordList =
getRecordsByKeys(columnStatKeys, MetadataPartitionType.COLUMN_STATS.getPartitionPath());
metrics.ifPresent(m -> m.updateMetrics(HoodieMetadataMetrics.LOOKUP_COLUMN_STATS_METADATA_STR, timer.endTimer()));
Map<Pair<String, String>, HoodieMetadataColumnStats> fileToColumnStatMap = new HashMap<>();
for (final Pair<String, Option<HoodieRecord<HoodieMetadataPayload>>> entry : hoodieRecordList) {
if (entry.getRight().isPresent()) {
final Option<HoodieMetadataColumnStats> columnStatMetadata =
entry.getRight().get().getData().getColumnStatMetadata();
if (columnStatMetadata.isPresent()) {
if (!columnStatMetadata.get().getIsDeleted()) {
ValidationUtils.checkState(columnStatKeyToFileNameMap.containsKey(entry.getLeft()));
final Pair<String, String> partitionFileNamePair = columnStatKeyToFileNameMap.get(entry.getLeft());
ValidationUtils.checkState(!fileToColumnStatMap.containsKey(partitionFileNamePair));
fileToColumnStatMap.put(partitionFileNamePair, columnStatMetadata.get());
}
} else {
LOG.error("Meta index column stats missing for: " + entry.getLeft());
}
}
}
return fileToColumnStatMap;
}
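When the column stats map is assembled, missing records and tombstoned entries (isDeleted) are skipped, and each (partition, file) pair may appear at most once. A self-contained sketch of that filtering with simplified stand-in types:

```java
import java.util.*;

public class ColumnStatsAssembly {
  static class Stat {
    final String fileName;
    final boolean deleted;
    Stat(String fileName, boolean deleted) { this.fileName = fileName; this.deleted = deleted; }
  }

  // Drop missing and tombstoned entries; reject duplicate keys, mirroring the
  // ValidationUtils.checkState guard in getColumnStats().
  static Map<String, Stat> assemble(Map<String, Optional<Stat>> lookedUp) {
    Map<String, Stat> out = new HashMap<>();
    lookedUp.forEach((key, maybe) -> {
      if (maybe.isPresent() && !maybe.get().deleted) {
        if (out.putIfAbsent(key, maybe.get()) != null) {
          throw new IllegalStateException("Duplicate column stats for " + key);
        }
      }
    });
    return out;
  }

  public static void main(String[] args) {
    Map<String, Optional<Stat>> in = new HashMap<>();
    in.put("k1", Optional.of(new Stat("f1", false)));
    in.put("k2", Optional.of(new Stat("f2", true)));  // tombstone: skipped
    in.put("k3", Optional.empty());                   // record missing: skipped
    System.out.println(assemble(in).size());          // only k1 survives
  }
}
```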
/**
* Returns a list of all partitions.
*/
protected List<String> fetchAllPartitionPaths() throws IOException {
HoodieTimer timer = new HoodieTimer().startTimer();
Option<HoodieRecord<HoodieMetadataPayload>> hoodieRecord = getRecordByKey(RECORDKEY_PARTITION_LIST,
MetadataPartitionType.FILES.getPartitionPath());
metrics.ifPresent(m -> m.updateMetrics(HoodieMetadataMetrics.LOOKUP_PARTITIONS_STR, timer.endTimer()));
List<String> partitions = Collections.emptyList();
@@ -181,7 +303,8 @@ public abstract class BaseTableMetadata implements HoodieTableMetadata {
}
HoodieTimer timer = new HoodieTimer().startTimer();
Option<HoodieRecord<HoodieMetadataPayload>> hoodieRecord = getRecordByKey(partitionName,
MetadataPartitionType.FILES.getPartitionPath());
metrics.ifPresent(m -> m.updateMetrics(HoodieMetadataMetrics.LOOKUP_FILES_STR, timer.endTimer()));
FileStatus[] statuses = {};
@@ -215,7 +338,7 @@ public abstract class BaseTableMetadata implements HoodieTableMetadata {
HoodieTimer timer = new HoodieTimer().startTimer();
List<Pair<String, Option<HoodieRecord<HoodieMetadataPayload>>>> partitionsFileStatus =
getRecordsByKeys(new ArrayList<>(partitionInfo.keySet()), MetadataPartitionType.FILES.getPartitionPath());
metrics.ifPresent(m -> m.updateMetrics(HoodieMetadataMetrics.LOOKUP_FILES_STR, timer.endTimer()));
Map<String, FileStatus[]> result = new HashMap<>();

@@ -18,6 +18,7 @@
package org.apache.hudi.metadata;
import org.apache.hudi.avro.model.HoodieMetadataColumnStats;
import org.apache.hudi.common.config.SerializableConfiguration;
import org.apache.hudi.common.engine.HoodieEngineContext;
import org.apache.hudi.common.fs.FSUtils;
@@ -29,8 +30,10 @@ import org.apache.hudi.common.util.collection.Pair;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hudi.exception.HoodieMetadataException;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
@@ -139,4 +142,21 @@ public class FileSystemBackedTableMetadata implements HoodieTableMetadata {
public void reset() {
// no-op
}
public Option<ByteBuffer> getBloomFilter(final String partitionName, final String fileName)
throws HoodieMetadataException {
throw new HoodieMetadataException("Unsupported operation: getBloomFilter for " + fileName);
}
@Override
public Map<Pair<String, String>, ByteBuffer> getBloomFilters(final List<Pair<String, String>> partitionNameFileNameList)
throws HoodieMetadataException {
throw new HoodieMetadataException("Unsupported operation: getBloomFilters!");
}
@Override
public Map<Pair<String, String>, HoodieMetadataColumnStats> getColumnStats(final List<Pair<String, String>> partitionNameFileNameList, final String columnName)
throws HoodieMetadataException {
throw new HoodieMetadataException("Unsupported operation: getColumnStats!");
}
}

@@ -18,6 +18,9 @@
package org.apache.hudi.metadata;
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericRecord;
import org.apache.hadoop.fs.Path;
import org.apache.hudi.avro.HoodieAvroUtils;
import org.apache.hudi.avro.model.HoodieMetadataRecord;
import org.apache.hudi.avro.model.HoodieRestoreMetadata;
@@ -48,10 +51,6 @@ import org.apache.hudi.exception.HoodieMetadataException;
import org.apache.hudi.exception.TableNotFoundException;
import org.apache.hudi.io.storage.HoodieFileReader;
import org.apache.hudi.io.storage.HoodieFileReaderFactory;
import org.apache.log4j.LogManager;
import org.apache.log4j.Logger;
@@ -64,6 +63,7 @@ import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Collectors;
/**
@@ -80,8 +80,9 @@ public class HoodieBackedTableMetadata extends BaseTableMetadata {
// should we reuse the open file handles, across calls
private final boolean reuse;
// Readers for the latest file slice corresponding to file groups in the metadata partition
private Map<Pair<String, String>, Pair<HoodieFileReader, HoodieMetadataMergedLogRecordReader>> partitionReaders =
new ConcurrentHashMap<>();
public HoodieBackedTableMetadata(HoodieEngineContext engineContext, HoodieMetadataConfig metadataConfig,
String datasetBasePath, String spillableMapDirectory) {
@@ -97,7 +98,7 @@ public class HoodieBackedTableMetadata extends BaseTableMetadata {
private void initIfNeeded() {
this.metadataBasePath = HoodieTableMetadata.getMetadataTableBasePath(dataBasePath);
if (!isMetadataTableEnabled) {
if (!HoodieTableMetadata.isMetadataTable(metadataBasePath)) {
LOG.info("Metadata table is disabled.");
}
@@ -105,14 +106,16 @@ public class HoodieBackedTableMetadata extends BaseTableMetadata {
try {
this.metadataMetaClient = HoodieTableMetaClient.builder().setConf(hadoopConf.get()).setBasePath(metadataBasePath).build();
this.metadataTableConfig = metadataMetaClient.getTableConfig();
this.isBloomFilterIndexEnabled = metadataConfig.isBloomFilterIndexEnabled();
this.isColumnStatsIndexEnabled = metadataConfig.isColumnStatsIndexEnabled();
} catch (TableNotFoundException e) {
LOG.warn("Metadata table was not found at path " + metadataBasePath);
this.isMetadataTableEnabled = false;
this.metadataMetaClient = null;
this.metadataTableConfig = null;
} catch (Exception e) {
LOG.error("Failed to initialize metadata table at path " + metadataBasePath, e);
this.isMetadataTableEnabled = false;
this.metadataMetaClient = null;
this.metadataTableConfig = null;
}
@@ -125,30 +128,43 @@ public class HoodieBackedTableMetadata extends BaseTableMetadata {
return recordsByKeys.size() == 0 ? Option.empty() : recordsByKeys.get(0).getValue();
}
@Override
protected List<Pair<String, Option<HoodieRecord<HoodieMetadataPayload>>>> getRecordsByKeys(List<String> keys,
String partitionName) {
Map<Pair<String, FileSlice>, List<String>> partitionFileSliceToKeysMap = getPartitionFileSliceToKeysMapping(partitionName, keys);
List<Pair<String, Option<HoodieRecord<HoodieMetadataPayload>>>> result = new ArrayList<>();
AtomicInteger fileSlicesKeysCount = new AtomicInteger();
partitionFileSliceToKeysMap.forEach((partitionFileSlicePair, fileSliceKeys) -> {
Pair<HoodieFileReader, HoodieMetadataMergedLogRecordReader> readers = openReadersIfNeeded(partitionName,
partitionFileSlicePair.getRight());
try {
List<Long> timings = new ArrayList<>();
HoodieFileReader baseFileReader = readers.getKey();
HoodieMetadataMergedLogRecordReader logRecordScanner = readers.getRight();
if (baseFileReader == null && logRecordScanner == null) {
return;
}
// local map to assist in merging with base file records
Map<String, Option<HoodieRecord<HoodieMetadataPayload>>> logRecords = readLogRecords(logRecordScanner,
fileSliceKeys, timings);
result.addAll(readFromBaseAndMergeWithLogRecords(baseFileReader, fileSliceKeys, logRecords,
timings, partitionName));
LOG.debug(String.format("Metadata read for %s keys took [baseFileRead, logMerge] %s ms",
fileSliceKeys.size(), timings));
fileSlicesKeysCount.addAndGet(fileSliceKeys.size());
} catch (IOException ioe) {
throw new HoodieIOException("Error merging records from metadata table for " + keys.size() + " keys", ioe);
} finally {
if (!reuse) {
close(Pair.of(partitionFileSlicePair.getLeft(), partitionFileSlicePair.getRight().getFileId()));
}
}
}
});
ValidationUtils.checkState(keys.size() == fileSlicesKeysCount.get());
return result;
}
private Map<String, Option<HoodieRecord<HoodieMetadataPayload>>> readLogRecords(HoodieMetadataMergedLogRecordReader logRecordScanner,
@@ -190,11 +206,11 @@ public class HoodieBackedTableMetadata extends BaseTableMetadata {
// Retrieve record from base file
if (baseFileReader != null) {
HoodieTimer readTimer = new HoodieTimer();
Map<String, GenericRecord> baseFileRecords = baseFileReader.getRecordsByKeys(keys);
for (String key : keys) {
readTimer.startTimer();
if (baseFileRecords.containsKey(key)) {
hoodieRecord = getRecord(Option.of(baseFileRecords.get(key)), partitionName);
metrics.ifPresent(m -> m.updateMetrics(HoodieMetadataMetrics.BASEFILE_READ_STR, readTimer.endTimer()));
// merge base file record w/ log record if present
if (logRecords.containsKey(key) && logRecords.get(key).isPresent()) {
@@ -233,38 +249,52 @@ public class HoodieBackedTableMetadata extends BaseTableMetadata {
}
/**
* Get the latest file slices for the given keys in a partition.
*
* @param partitionName - Partition to get the file slices from
* @param keys          - Keys of interest
* @return FileSlices for the keys
*/
private Map<Pair<String, FileSlice>, List<String>> getPartitionFileSliceToKeysMapping(final String partitionName, final List<String> keys) {
// Metadata is in sync till the latest completed instant on the dataset
List<FileSlice> latestFileSlices =
HoodieTableMetadataUtil.getPartitionLatestMergedFileSlices(metadataMetaClient, partitionName);
Map<Pair<String, FileSlice>, List<String>> partitionFileSliceToKeysMap = new HashMap<>();
for (String key : keys) {
final FileSlice slice = latestFileSlices.get(HoodieTableMetadataUtil.mapRecordKeyToFileGroupIndex(key,
latestFileSlices.size()));
final Pair<String, FileSlice> partitionNameFileSlicePair = Pair.of(partitionName, slice);
partitionFileSliceToKeysMap.computeIfAbsent(partitionNameFileSlicePair, k -> new ArrayList<>()).add(key);
}
return partitionFileSliceToKeysMap;
}
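The grouping above relies on a stable hash of the record key modulo the number of file slices, so every key deterministically maps to one metadata file group and a batch of keys triggers at most one read per slice. A sketch of that bucketing; the hash below is a stand-in, not the actual function in HoodieTableMetadataUtil:

```java
// Hypothetical sketch of mapRecordKeyToFileGroupIndex-style bucketing: a stable
// string hash modulo the file group count selects the slice to read for a key.
public class KeyToFileGroup {
  static int fileGroupIndex(String key, int fileGroupCount) {
    int h = 0;
    for (int i = 0; i < key.length(); i++) {
      h = 31 * h + key.charAt(i);  // stand-in hash, not Hudi's
    }
    return Math.abs(h % fileGroupCount);
  }

  public static void main(String[] args) {
    // The same key always lands on the same file group, enabling per-slice batching.
    System.out.println(fileGroupIndex("2022/01/01", 4));
  }
}
```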
/**
* Create a file reader and the record scanner for a given partition and file slice
* if readers are not already available.
*
* @param partitionName - Partition name
* @param slice - The file slice to open readers for
* @return File reader and the record scanner pair for the requested file slice
*/
private Pair<HoodieFileReader, HoodieMetadataMergedLogRecordReader> openReadersIfNeeded(String partitionName, FileSlice slice) {
return partitionReaders.computeIfAbsent(Pair.of(partitionName, slice.getFileId()), k -> {
try {
HoodieTimer timer = new HoodieTimer().startTimer();
// Open base file reader
Pair<HoodieFileReader, Long> baseFileReaderOpenTimePair = getBaseFileReader(slice, timer);
HoodieFileReader baseFileReader = baseFileReaderOpenTimePair.getKey();
final long baseFileOpenMs = baseFileReaderOpenTimePair.getValue();
// Open the log record scanner using the log files from the latest file slice
Pair<HoodieMetadataMergedLogRecordReader, Long> logRecordScannerOpenTimePair = getLogRecordScanner(slice, partitionName);
HoodieMetadataMergedLogRecordReader logRecordScanner = logRecordScannerOpenTimePair.getKey();
final long logScannerOpenMs = logRecordScannerOpenTimePair.getValue();
metrics.ifPresent(metrics -> metrics.updateMetrics(HoodieMetadataMetrics.SCAN_STR,
    baseFileOpenMs + logScannerOpenMs));
return Pair.of(baseFileReader, logRecordScanner);
} catch (IOException e) {
throw new HoodieIOException("Error opening readers for metadata table partition " + partitionName, e);
@@ -382,14 +412,20 @@ public class HoodieBackedTableMetadata extends BaseTableMetadata {
@Override
public void close() {
for (Pair<String, String> partitionFileSlicePair : partitionReaders.keySet()) {
close(partitionFileSlicePair);
}
partitionReaders.clear();
}
/**
* Close the file reader and the record scanner for the given file slice.
*
* @param partitionFileSlicePair - Partition and FileSlice
*/
private synchronized void close(Pair<String, String> partitionFileSlicePair) {
Pair<HoodieFileReader, HoodieMetadataMergedLogRecordReader> readers =
partitionReaders.remove(partitionFileSlicePair);
if (readers != null) {
try {
if (readers.getKey() != null) {
@@ -405,7 +441,7 @@ public class HoodieBackedTableMetadata extends BaseTableMetadata {
}
public boolean enabled() {
return isMetadataTableEnabled;
}
public SerializableConfiguration getHadoopConf() {

@@ -116,7 +116,7 @@ public class HoodieMetadataMergedLogRecordReader extends HoodieMergedLogRecordSc
* @param key Key of the record to retrieve
* @return Singleton list pairing the key with the {@code HoodieRecord} if found, else {@code Option.empty()}
*/
public synchronized List<Pair<String, Option<HoodieRecord<HoodieMetadataPayload>>>> getRecordByKey(String key) {
return Collections.singletonList(Pair.of(key, Option.ofNullable((HoodieRecord) records.get(key))));
}

@@ -41,6 +41,8 @@ public class HoodieMetadataMetrics implements Serializable {
// Metric names
public static final String LOOKUP_PARTITIONS_STR = "lookup_partitions";
public static final String LOOKUP_FILES_STR = "lookup_files";
public static final String LOOKUP_BLOOM_FILTERS_METADATA_STR = "lookup_meta_index_bloom_filters";
public static final String LOOKUP_COLUMN_STATS_METADATA_STR = "lookup_meta_index_column_ranges";
public static final String SCAN_STR = "scan";
public static final String BASEFILE_READ_STR = "basefile_read";
public static final String INITIALIZE_STR = "initialize";
@@ -77,7 +79,7 @@ public class HoodieMetadataMetrics implements Serializable {
Map<String, String> stats = new HashMap<>();
// Total size of the metadata and count of base/log files
for (String metadataPartition : MetadataPartitionType.allPaths()) {
List<FileSlice> latestSlices = fsView.getLatestFileSlices(metadataPartition).collect(Collectors.toList());
// Total size of the metadata and count of base/log files

@@ -18,13 +18,22 @@
package org.apache.hudi.metadata;
import org.apache.hudi.avro.model.HoodieMetadataColumnStats;
import org.apache.hudi.avro.model.HoodieMetadataBloomFilter;
import org.apache.hudi.avro.model.HoodieMetadataFileInfo;
import org.apache.hudi.avro.model.HoodieMetadataRecord;
import org.apache.hudi.common.bloom.BloomFilterTypeCode;
import org.apache.hudi.common.fs.FSUtils;
import org.apache.hudi.common.model.HoodieColumnRangeMetadata;
import org.apache.hudi.common.model.HoodieKey;
import org.apache.hudi.common.model.HoodieRecord;
import org.apache.hudi.common.model.HoodieRecordPayload;
import org.apache.hudi.common.util.Option;
import org.apache.hudi.common.util.ValidationUtils;
import org.apache.hudi.common.util.hash.ColumnIndexID;
import org.apache.hudi.common.util.hash.FileIndexID;
import org.apache.hudi.common.util.hash.PartitionIndexID;
import org.apache.hudi.exception.HoodieMetadataException;
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericRecord;
@@ -36,7 +45,9 @@ import org.apache.hadoop.fs.Path;
import org.apache.hudi.io.storage.HoodieHFileReader;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.util.Arrays;
import java.util.Collection;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
@@ -46,36 +57,67 @@ import java.util.stream.Stream;
import static org.apache.hudi.metadata.HoodieTableMetadata.RECORDKEY_PARTITION_LIST;
/**
* MetadataTable records are persisted with the schema defined in HoodieMetadata.avsc.
* This class represents the payload for the MetadataTable.
* <p>
* This single metadata payload is shared by all the partitions under the metadata table.
* The partition specific records are determined by the field "type" saved within the record.
* The following types are supported:
* <p>
* METADATA_TYPE_PARTITION_LIST (1):
* -- List of all partitions. There is a single such record
* -- key = {@link HoodieTableMetadata#RECORDKEY_PARTITION_LIST}
* <p>
* METADATA_TYPE_FILE_LIST (2):
* -- List of all files in a partition. There is one such record for each partition
* -- key = partition name
* <p>
* METADATA_TYPE_COLUMN_STATS (3):
* -- This is an index for column stats in the table
* <p>
* METADATA_TYPE_BLOOM_FILTER (4):
* -- This is an index for base file bloom filters. This is a map of FileID to its BloomFilter byte[].
* <p>
* During compaction on the table, the deletions are merged with additions and hence records are pruned.
*/
public class HoodieMetadataPayload implements HoodieRecordPayload<HoodieMetadataPayload> {
// Type of the record. This can be an enum in the schema but Avro 1.8
// has a bug - https://issues.apache.org/jira/browse/AVRO-1810
protected static final int METADATA_TYPE_PARTITION_LIST = 1;
protected static final int METADATA_TYPE_FILE_LIST = 2;
protected static final int METADATA_TYPE_COLUMN_STATS = 3;
protected static final int METADATA_TYPE_BLOOM_FILTER = 4;
// HoodieMetadata schema field ids
public static final String KEY_FIELD_NAME = HoodieHFileReader.KEY_FIELD_NAME;
public static final String SCHEMA_FIELD_NAME_TYPE = "type";
public static final String SCHEMA_FIELD_NAME_METADATA = "filesystemMetadata";
private static final String SCHEMA_FIELD_ID_COLUMN_STATS = "ColumnStatsMetadata";
private static final String SCHEMA_FIELD_ID_BLOOM_FILTER = "BloomFilterMetadata";
// HoodieMetadata bloom filter payload field ids
private static final String FIELD_IS_DELETED = "isDeleted";
private static final String BLOOM_FILTER_FIELD_TYPE = "type";
private static final String BLOOM_FILTER_FIELD_TIMESTAMP = "timestamp";
private static final String BLOOM_FILTER_FIELD_BLOOM_FILTER = "bloomFilter";
private static final String BLOOM_FILTER_FIELD_IS_DELETED = FIELD_IS_DELETED;
// HoodieMetadata column stats payload field ids
private static final String COLUMN_STATS_FIELD_MIN_VALUE = "minValue";
private static final String COLUMN_STATS_FIELD_MAX_VALUE = "maxValue";
private static final String COLUMN_STATS_FIELD_NULL_COUNT = "nullCount";
private static final String COLUMN_STATS_FIELD_VALUE_COUNT = "valueCount";
private static final String COLUMN_STATS_FIELD_TOTAL_SIZE = "totalSize";
private static final String COLUMN_STATS_FIELD_RESOURCE_NAME = "fileName";
private static final String COLUMN_STATS_FIELD_TOTAL_UNCOMPRESSED_SIZE = "totalUncompressedSize";
private static final String COLUMN_STATS_FIELD_IS_DELETED = FIELD_IS_DELETED;
private String key = null;
private int type = 0;
private Map<String, HoodieMetadataFileInfo> filesystemMetadata = null;
private HoodieMetadataBloomFilter bloomFilterMetadata = null;
private HoodieMetadataColumnStats columnStatMetadata = null;
public HoodieMetadataPayload(GenericRecord record, Comparable<?> orderingVal) {
this(Option.of(record));
@@ -94,13 +136,60 @@ public class HoodieMetadataPayload implements HoodieRecordPayload<HoodieMetadata
filesystemMetadata.put(k.toString(), new HoodieMetadataFileInfo((Long) v.get("size"), (Boolean) v.get("isDeleted")));
});
}
if (type == METADATA_TYPE_BLOOM_FILTER) {
final GenericRecord metadataRecord = (GenericRecord) record.get().get(SCHEMA_FIELD_ID_BLOOM_FILTER);
if (metadataRecord == null) {
throw new HoodieMetadataException("Valid " + SCHEMA_FIELD_ID_BLOOM_FILTER + " record expected for type: " + METADATA_TYPE_BLOOM_FILTER);
}
bloomFilterMetadata = new HoodieMetadataBloomFilter(
(String) metadataRecord.get(BLOOM_FILTER_FIELD_TYPE),
(String) metadataRecord.get(BLOOM_FILTER_FIELD_TIMESTAMP),
(ByteBuffer) metadataRecord.get(BLOOM_FILTER_FIELD_BLOOM_FILTER),
(Boolean) metadataRecord.get(BLOOM_FILTER_FIELD_IS_DELETED)
);
}
if (type == METADATA_TYPE_COLUMN_STATS) {
GenericRecord v = (GenericRecord) record.get().get(SCHEMA_FIELD_ID_COLUMN_STATS);
if (v == null) {
throw new HoodieMetadataException("Valid " + SCHEMA_FIELD_ID_COLUMN_STATS + " record expected for type: " + METADATA_TYPE_COLUMN_STATS);
}
columnStatMetadata = new HoodieMetadataColumnStats(
(String) v.get(COLUMN_STATS_FIELD_RESOURCE_NAME),
(String) v.get(COLUMN_STATS_FIELD_MIN_VALUE),
(String) v.get(COLUMN_STATS_FIELD_MAX_VALUE),
(Long) v.get(COLUMN_STATS_FIELD_NULL_COUNT),
(Long) v.get(COLUMN_STATS_FIELD_VALUE_COUNT),
(Long) v.get(COLUMN_STATS_FIELD_TOTAL_SIZE),
(Long) v.get(COLUMN_STATS_FIELD_TOTAL_UNCOMPRESSED_SIZE),
(Boolean) v.get(COLUMN_STATS_FIELD_IS_DELETED)
);
}
}
}
private HoodieMetadataPayload(String key, int type, Map<String, HoodieMetadataFileInfo> filesystemMetadata) {
this(key, type, filesystemMetadata, null, null);
}
private HoodieMetadataPayload(String key, int type, HoodieMetadataBloomFilter metadataBloomFilter) {
this(key, type, null, metadataBloomFilter, null);
}
private HoodieMetadataPayload(String key, int type, HoodieMetadataColumnStats columnStats) {
this(key, type, null, null, columnStats);
}
protected HoodieMetadataPayload(String key, int type,
Map<String, HoodieMetadataFileInfo> filesystemMetadata,
HoodieMetadataBloomFilter metadataBloomFilter,
HoodieMetadataColumnStats columnStats) {
this.key = key;
this.type = type;
this.filesystemMetadata = filesystemMetadata;
this.bloomFilterMetadata = metadataBloomFilter;
this.columnStatMetadata = columnStats;
}
/**
@@ -110,55 +199,97 @@ public class HoodieMetadataPayload implements HoodieRecordPayload<HoodieMetadata
*/
public static HoodieRecord<HoodieMetadataPayload> createPartitionListRecord(List<String> partitions) {
Map<String, HoodieMetadataFileInfo> fileInfo = new HashMap<>();
partitions.forEach(partition -> fileInfo.put(partition, new HoodieMetadataFileInfo(0L, false)));
HoodieKey key = new HoodieKey(RECORDKEY_PARTITION_LIST, MetadataPartitionType.FILES.getPartitionPath());
HoodieMetadataPayload payload = new HoodieMetadataPayload(key.getRecordKey(), METADATA_TYPE_PARTITION_LIST,
fileInfo);
return new HoodieRecord<>(key, payload);
}
/**
* Create and return a {@code HoodieMetadataPayload} to save list of files within a partition.
*
* @param partition The name of the partition
* @param filesAdded Mapping of files to their sizes for files which have been added to this partition
* @param partition The name of the partition
* @param filesAdded Mapping of files to their sizes for files which have been added to this partition
* @param filesDeleted List of files which have been deleted from this partition
*/
public static HoodieRecord<HoodieMetadataPayload> createPartitionFilesRecord(String partition,
Option<Map<String, Long>> filesAdded, Option<List<String>> filesDeleted) {
Option<Map<String, Long>> filesAdded,
Option<List<String>> filesDeleted) {
Map<String, HoodieMetadataFileInfo> fileInfo = new HashMap<>();
filesAdded.ifPresent(
m -> m.forEach((filename, size) -> fileInfo.put(filename, new HoodieMetadataFileInfo(size, false))));
filesDeleted.ifPresent(
m -> m.forEach(filename -> fileInfo.put(filename, new HoodieMetadataFileInfo(0L, true))));
HoodieKey key = new HoodieKey(partition, MetadataPartitionType.FILES.getPartitionPath());
HoodieMetadataPayload payload = new HoodieMetadataPayload(key.getRecordKey(), METADATA_TYPE_FILE_LIST, fileInfo);
return new HoodieRecord<>(key, payload);
}
/**
* Create bloom filter metadata record.
*
* @param partitionName - Partition name
* @param baseFileName - Base file name for which the bloom filter needs to be persisted
* @param timestamp - Instant timestamp responsible for this record
* @param bloomFilter - Bloom filter for the file
* @param isDeleted - Whether the bloom filter is no longer valid
* @return Metadata payload containing the fileID and its bloom filter record
*/
public static HoodieRecord<HoodieMetadataPayload> createBloomFilterMetadataRecord(final String partitionName,
final String baseFileName,
final String timestamp,
final ByteBuffer bloomFilter,
final boolean isDeleted) {
ValidationUtils.checkArgument(!baseFileName.contains(Path.SEPARATOR)
&& FSUtils.isBaseFile(new Path(baseFileName)),
"Invalid base file '" + baseFileName + "' for MetaIndexBloomFilter!");
final String bloomFilterIndexKey = new PartitionIndexID(partitionName).asBase64EncodedString()
.concat(new FileIndexID(baseFileName).asBase64EncodedString());
HoodieKey key = new HoodieKey(bloomFilterIndexKey, MetadataPartitionType.BLOOM_FILTERS.getPartitionPath());
// TODO: HUDI-3203 Get the bloom filter type from the file
HoodieMetadataBloomFilter metadataBloomFilter =
new HoodieMetadataBloomFilter(BloomFilterTypeCode.DYNAMIC_V0.name(),
timestamp, bloomFilter, isDeleted);
HoodieMetadataPayload metadataPayload = new HoodieMetadataPayload(key.getRecordKey(),
HoodieMetadataPayload.METADATA_TYPE_BLOOM_FILTER, metadataBloomFilter);
return new HoodieRecord<>(key, metadataPayload);
}
@Override
public HoodieMetadataPayload preCombine(HoodieMetadataPayload previousRecord) {
  ValidationUtils.checkArgument(previousRecord.type == type,
      "Cannot combine " + previousRecord.type + " with " + type);
  switch (type) {
    case METADATA_TYPE_PARTITION_LIST:
    case METADATA_TYPE_FILE_LIST:
      Map<String, HoodieMetadataFileInfo> combinedFileInfo = combineFilesystemMetadata(previousRecord);
      return new HoodieMetadataPayload(key, type, combinedFileInfo);
    case METADATA_TYPE_BLOOM_FILTER:
      HoodieMetadataBloomFilter combinedBloomFilterMetadata = combineBloomFilterMetadata(previousRecord);
      return new HoodieMetadataPayload(key, type, combinedBloomFilterMetadata);
    case METADATA_TYPE_COLUMN_STATS:
      return new HoodieMetadataPayload(key, type, combineColumnStatsMetadata(previousRecord));
    default:
      throw new HoodieMetadataException("Unknown type of HoodieMetadataPayload: " + type);
  }
}

private HoodieMetadataBloomFilter combineBloomFilterMetadata(HoodieMetadataPayload previousRecord) {
  return this.bloomFilterMetadata;
}

private HoodieMetadataColumnStats combineColumnStatsMetadata(HoodieMetadataPayload previousRecord) {
  return this.columnStatMetadata;
}
@Override
public Option<IndexedRecord> combineAndGetUpdateValue(IndexedRecord oldRecord, Schema schema) throws IOException {
HoodieMetadataPayload anotherPayload = new HoodieMetadataPayload(Option.of((GenericRecord) oldRecord));
HoodieRecordPayload combinedPayload = preCombine(anotherPayload);
return combinedPayload.getInsertValue(schema);
}
@@ -169,7 +300,8 @@ public class HoodieMetadataPayload implements HoodieRecordPayload<HoodieMetadata
return Option.empty();
}
HoodieMetadataRecord record = new HoodieMetadataRecord(key, type, filesystemMetadata, bloomFilterMetadata,
    columnStatMetadata);
return Option.of(record);
}
@@ -187,6 +319,28 @@ public class HoodieMetadataPayload implements HoodieRecordPayload<HoodieMetadata
return filterFileInfoEntries(true).map(Map.Entry::getKey).sorted().collect(Collectors.toList());
}
/**
* Get the bloom filter metadata from this payload.
*/
public Option<HoodieMetadataBloomFilter> getBloomFilterMetadata() {
if (bloomFilterMetadata == null) {
return Option.empty();
}
return Option.of(bloomFilterMetadata);
}
/**
* Get the column stats metadata from this payload.
*/
public Option<HoodieMetadataColumnStats> getColumnStatMetadata() {
if (columnStatMetadata == null) {
return Option.empty();
}
return Option.of(columnStatMetadata);
}
/**
* Returns the files added as part of this record.
*/
@@ -235,6 +389,70 @@ public class HoodieMetadataPayload implements HoodieRecordPayload<HoodieMetadata
return combinedFileInfo;
}
/**
* Get bloom filter index key.
*
* @param partitionIndexID - Partition index id
* @param fileIndexID - File index id
* @return Bloom filter index key
*/
public static String getBloomFilterIndexKey(PartitionIndexID partitionIndexID, FileIndexID fileIndexID) {
return partitionIndexID.asBase64EncodedString()
.concat(fileIndexID.asBase64EncodedString());
}
/**
* Get column stats index key.
*
* @param partitionIndexID - Partition index id
* @param fileIndexID - File index id
* @param columnIndexID - Column index id
* @return Column stats index key
*/
public static String getColumnStatsIndexKey(PartitionIndexID partitionIndexID, FileIndexID fileIndexID, ColumnIndexID columnIndexID) {
return columnIndexID.asBase64EncodedString()
.concat(partitionIndexID.asBase64EncodedString())
.concat(fileIndexID.asBase64EncodedString());
}
/**
* Get column stats index key from the column range metadata.
*
* @param partitionName - Partition name
* @param columnRangeMetadata - Column range metadata
* @return Column stats index key
*/
public static String getColumnStatsIndexKey(String partitionName, HoodieColumnRangeMetadata<Comparable> columnRangeMetadata) {
final PartitionIndexID partitionIndexID = new PartitionIndexID(partitionName);
final FileIndexID fileIndexID = new FileIndexID(new Path(columnRangeMetadata.getFilePath()).getName());
final ColumnIndexID columnIndexID = new ColumnIndexID(columnRangeMetadata.getColumnName());
return getColumnStatsIndexKey(partitionIndexID, fileIndexID, columnIndexID);
}
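As a standalone illustration of the key layouts built above, the sketch below concatenates Base64-encoded identifiers in the documented order. This is a simplified stand-in: the real PartitionIndexID/FileIndexID/ColumnIndexID classes hash their input before encoding, which this sketch skips, and all names used are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Sketch of the metadata index key layouts: the bloom filter key is
// base64(partition) + base64(file); the column stats key leads with the
// column id so that all stats for one column sort together and can be
// scanned with a single key prefix.
public class IndexKeySketch {
  static String id(String name) {
    return Base64.getEncoder().encodeToString(name.getBytes(StandardCharsets.UTF_8));
  }

  static String bloomFilterKey(String partition, String file) {
    return id(partition) + id(file);
  }

  static String columnStatsKey(String partition, String file, String column) {
    // Column id first: stats for one column across files share a key prefix.
    return id(column) + id(partition) + id(file);
  }

  public static void main(String[] args) {
    String k1 = columnStatsKey("2022/01/01", "f1-0_0.parquet", "price");
    String k2 = columnStatsKey("2022/01/01", "f2-0_0.parquet", "price");
    // Both keys share the column-id prefix, enabling a per-column prefix scan.
    System.out.println(k1.startsWith(id("price")) && k2.startsWith(id("price")));
  }
}
```

The column-first ordering is the design choice that makes a "stats for column X across all files" lookup a contiguous range read instead of a scatter of point reads.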
public static Stream<HoodieRecord> createColumnStatsRecords(
String partitionName, Collection<HoodieColumnRangeMetadata<Comparable>> columnRangeMetadataList, boolean isDeleted) {
return columnRangeMetadataList.stream().map(columnRangeMetadata -> {
HoodieKey key = new HoodieKey(getColumnStatsIndexKey(partitionName, columnRangeMetadata),
MetadataPartitionType.COLUMN_STATS.getPartitionPath());
HoodieMetadataPayload payload = new HoodieMetadataPayload(key.getRecordKey(), METADATA_TYPE_COLUMN_STATS,
HoodieMetadataColumnStats.newBuilder()
.setFileName(new Path(columnRangeMetadata.getFilePath()).getName())
.setMinValue(columnRangeMetadata.getMinValue() == null ? null :
columnRangeMetadata.getMinValue().toString())
.setMaxValue(columnRangeMetadata.getMaxValue() == null ? null :
columnRangeMetadata.getMaxValue().toString())
.setNullCount(columnRangeMetadata.getNullCount())
.setValueCount(columnRangeMetadata.getValueCount())
.setTotalSize(columnRangeMetadata.getTotalSize())
.setTotalUncompressedSize(columnRangeMetadata.getTotalUncompressedSize())
.setIsDeleted(isDeleted)
.build());
return new HoodieRecord<>(key, payload);
});
}
@Override
public String toString() {
final StringBuilder sb = new StringBuilder("HoodieMetadataPayload {");
@@ -242,6 +460,20 @@ public class HoodieMetadataPayload implements HoodieRecordPayload<HoodieMetadata
sb.append(SCHEMA_FIELD_NAME_TYPE + "=").append(type).append(", ");
sb.append("creations=").append(Arrays.toString(getFilenames().toArray())).append(", ");
sb.append("deletions=").append(Arrays.toString(getDeletions().toArray())).append(", ");
if (type == METADATA_TYPE_BLOOM_FILTER) {
ValidationUtils.checkState(getBloomFilterMetadata().isPresent());
sb.append("BloomFilter: {");
sb.append("bloom size: " + getBloomFilterMetadata().get().getBloomFilter().array().length).append(", ");
sb.append("timestamp: " + getBloomFilterMetadata().get().getTimestamp()).append(", ");
sb.append("deleted: " + getBloomFilterMetadata().get().getIsDeleted());
sb.append("}");
}
if (type == METADATA_TYPE_COLUMN_STATS) {
ValidationUtils.checkState(getColumnStatMetadata().isPresent());
sb.append("ColStats: {");
sb.append(getColumnStatMetadata().get());
sb.append("}");
}
sb.append('}');
return sb.toString();
}


@@ -18,6 +18,7 @@
package org.apache.hudi.metadata;
import org.apache.hudi.avro.model.HoodieMetadataColumnStats;
import org.apache.hudi.common.config.HoodieMetadataConfig;
import org.apache.hudi.common.config.SerializableConfiguration;
import org.apache.hudi.common.engine.HoodieEngineContext;
@@ -25,9 +26,12 @@ import org.apache.hudi.common.table.HoodieTableMetaClient;
import org.apache.hudi.common.util.Option;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.Path;
import org.apache.hudi.common.util.collection.Pair;
import org.apache.hudi.exception.HoodieMetadataException;
import java.io.IOException;
import java.io.Serializable;
import java.nio.ByteBuffer;
import java.util.List;
import java.util.Map;
@@ -104,6 +108,38 @@ public interface HoodieTableMetadata extends Serializable, AutoCloseable {
*/
Map<String, FileStatus[]> getAllFilesInPartitions(List<String> partitionPaths) throws IOException;
/**
* Get the bloom filter for the FileID from the metadata table.
*
* @param partitionName - Partition name
* @param fileName - File name for which bloom filter needs to be retrieved
* @return BloomFilter byte buffer if available, otherwise empty
* @throws HoodieMetadataException
*/
Option<ByteBuffer> getBloomFilter(final String partitionName, final String fileName)
throws HoodieMetadataException;
/**
* Get bloom filters for files from the metadata table index.
*
* @param partitionNameFileNameList - List of partition and file name pair for which bloom filters need to be retrieved
* @return Map of partition file name pair to its bloom filter byte buffer
* @throws HoodieMetadataException
*/
Map<Pair<String, String>, ByteBuffer> getBloomFilters(final List<Pair<String, String>> partitionNameFileNameList)
throws HoodieMetadataException;
/**
* Get column stats for files from the metadata table index.
*
* @param partitionNameFileNameList - List of partition and file name pair for which column stats need to be retrieved
* @param columnName - Column name for which stats are needed
* @return Map of partition and file name pair to its column stats
* @throws HoodieMetadataException
*/
Map<Pair<String, String>, HoodieMetadataColumnStats> getColumnStats(final List<Pair<String, String>> partitionNameFileNameList, final String columnName)
throws HoodieMetadataException;
/**
* Get the instant time to which the metadata is synced w.r.t data timeline.
*/


@@ -18,29 +18,44 @@
package org.apache.hudi.metadata;
import org.apache.avro.generic.IndexedRecord;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hudi.avro.model.HoodieCleanMetadata;
import org.apache.hudi.avro.model.HoodieRestoreMetadata;
import org.apache.hudi.avro.model.HoodieRollbackMetadata;
import org.apache.hudi.common.bloom.BloomFilter;
import org.apache.hudi.common.config.HoodieMetadataConfig;
import org.apache.hudi.common.data.HoodieData;
import org.apache.hudi.common.engine.HoodieEngineContext;
import org.apache.hudi.common.fs.FSUtils;
import org.apache.hudi.common.model.FileSlice;
import org.apache.hudi.common.model.HoodieColumnRangeMetadata;
import org.apache.hudi.common.model.HoodieCommitMetadata;
import org.apache.hudi.common.model.HoodieDeltaWriteStat;
import org.apache.hudi.common.model.HoodieFileFormat;
import org.apache.hudi.common.model.HoodieRecord;
import org.apache.hudi.common.model.HoodieWriteStat;
import org.apache.hudi.common.table.HoodieTableMetaClient;
import org.apache.hudi.common.table.TableSchemaResolver;
import org.apache.hudi.common.table.timeline.HoodieActiveTimeline;
import org.apache.hudi.common.table.timeline.HoodieDefaultTimeline;
import org.apache.hudi.common.table.timeline.HoodieInstant;
import org.apache.hudi.common.table.timeline.HoodieTimeline;
import org.apache.hudi.common.table.view.HoodieTableFileSystemView;
import org.apache.hudi.common.util.Option;
import org.apache.hudi.common.util.ParquetUtils;
import org.apache.hudi.common.util.ValidationUtils;
import org.apache.hudi.common.util.collection.Pair;
import org.apache.hudi.exception.HoodieException;
import org.apache.hudi.exception.HoodieMetadataException;
import org.apache.hudi.io.storage.HoodieFileReader;
import org.apache.hudi.io.storage.HoodieFileReaderFactory;
import org.apache.log4j.LogManager;
import org.apache.log4j.Logger;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
@@ -62,12 +77,17 @@ public class HoodieTableMetadataUtil {
private static final Logger LOG = LogManager.getLogger(HoodieTableMetadataUtil.class);
protected static final String PARTITION_NAME_FILES = "files";
protected static final String PARTITION_NAME_COLUMN_STATS = "column_stats";
protected static final String PARTITION_NAME_BLOOM_FILTERS = "bloom_filters";
/**
* Delete the metadata table for the dataset. This will be invoked during upgrade/downgrade operation during which
* no other process should be running.
*
* @param basePath base path of the dataset
* @param context  instance of {@link HoodieEngineContext}.
*/
public static void deleteMetadataTable(String basePath, HoodieEngineContext context) {
final String metadataTablePath = HoodieTableMetadata.getMetadataTableBasePath(basePath);
@@ -79,14 +99,53 @@ public class HoodieTableMetadataUtil {
}
}
/**
* Convert commit action to metadata records for the enabled partition types.
*
* @param commitMetadata - Commit action metadata
* @param dataMetaClient - Meta client for the data table
* @param isMetaIndexColumnStatsForAllColumns - Do all columns need meta indexing?
* @param instantTime - Action instant time
* @return Map of partition to metadata records for the commit action
*/
public static Map<MetadataPartitionType, HoodieData<HoodieRecord>> convertMetadataToRecords(
HoodieEngineContext context, List<MetadataPartitionType> enabledPartitionTypes,
HoodieCommitMetadata commitMetadata, HoodieTableMetaClient dataMetaClient,
boolean isMetaIndexColumnStatsForAllColumns, String instantTime) {
final Map<MetadataPartitionType, HoodieData<HoodieRecord>> partitionToRecordsMap = new HashMap<>();
final HoodieData<HoodieRecord> filesPartitionRecordsRDD = context.parallelize(
convertMetadataToFilesPartitionRecords(commitMetadata, instantTime), 1);
partitionToRecordsMap.put(MetadataPartitionType.FILES, filesPartitionRecordsRDD);
if (enabledPartitionTypes.contains(MetadataPartitionType.BLOOM_FILTERS)) {
final List<HoodieRecord> metadataBloomFilterRecords = convertMetadataToBloomFilterRecords(commitMetadata,
dataMetaClient, instantTime);
if (!metadataBloomFilterRecords.isEmpty()) {
final HoodieData<HoodieRecord> metadataBloomFilterRecordsRDD = context.parallelize(metadataBloomFilterRecords, 1);
partitionToRecordsMap.put(MetadataPartitionType.BLOOM_FILTERS, metadataBloomFilterRecordsRDD);
}
}
if (enabledPartitionTypes.contains(MetadataPartitionType.COLUMN_STATS)) {
final List<HoodieRecord> metadataColumnStats = convertMetadataToColumnStatsRecords(commitMetadata, context,
dataMetaClient, isMetaIndexColumnStatsForAllColumns, instantTime);
if (!metadataColumnStats.isEmpty()) {
final HoodieData<HoodieRecord> metadataColumnStatsRDD = context.parallelize(metadataColumnStats, 1);
partitionToRecordsMap.put(MetadataPartitionType.COLUMN_STATS, metadataColumnStatsRDD);
}
}
return partitionToRecordsMap;
}
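The method above follows an enable-then-populate dispatch that recurs for the clean, restore, and rollback conversions in this file: the FILES partition is always written, while each optional index partition is added only when it is enabled and actually produced records. A reduced sketch of that pattern, using simplified stand-in types and hypothetical names:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;

// Sketch of the partition-type dispatch: FILES is mandatory, optional index
// partitions (bloom filters, column stats) are gated on both an enable list
// and a non-empty record set.
public class PartitionDispatchSketch {
  enum PartitionType { FILES, BLOOM_FILTERS, COLUMN_STATS }

  static Map<PartitionType, List<String>> dispatch(List<PartitionType> enabled,
                                                   Supplier<List<String>> bloomRecords,
                                                   Supplier<List<String>> statsRecords) {
    Map<PartitionType, List<String>> out = new HashMap<>();
    out.put(PartitionType.FILES, List.of("files-record")); // always present
    if (enabled.contains(PartitionType.BLOOM_FILTERS)) {
      List<String> records = bloomRecords.get();
      if (!records.isEmpty()) {
        out.put(PartitionType.BLOOM_FILTERS, records);
      }
    }
    if (enabled.contains(PartitionType.COLUMN_STATS)) {
      List<String> records = statsRecords.get();
      if (!records.isEmpty()) {
        out.put(PartitionType.COLUMN_STATS, records);
      }
    }
    return out;
  }
}
```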
/**
* Finds all new files/partitions created as part of commit and creates metadata table records for them.
*
* @param commitMetadata - Commit action metadata
* @param instantTime    - Commit action instant time
* @return List of metadata table records
*/
public static List<HoodieRecord> convertMetadataToFilesPartitionRecords(HoodieCommitMetadata commitMetadata,
                                                                        String instantTime) {
List<HoodieRecord> records = new LinkedList<>();
List<String> allPartitions = new LinkedList<>();
commitMetadata.getPartitionToWriteStats().forEach((partitionStatName, writeStats) -> {
@@ -124,6 +183,102 @@ public class HoodieTableMetadataUtil {
return records;
}
/**
* Convert commit action metadata to bloom filter records.
*
* @param commitMetadata - Commit action metadata
* @param dataMetaClient - Meta client for the data table
* @param instantTime - Action instant time
* @return List of metadata table records
*/
public static List<HoodieRecord> convertMetadataToBloomFilterRecords(HoodieCommitMetadata commitMetadata,
HoodieTableMetaClient dataMetaClient,
String instantTime) {
List<HoodieRecord> records = new LinkedList<>();
commitMetadata.getPartitionToWriteStats().forEach((partitionStatName, writeStats) -> {
final String partition = partitionStatName.equals(EMPTY_PARTITION_NAME) ? NON_PARTITIONED_NAME : partitionStatName;
Map<String, Long> newFiles = new HashMap<>(writeStats.size());
writeStats.forEach(hoodieWriteStat -> {
// No action for delta logs
if (hoodieWriteStat instanceof HoodieDeltaWriteStat) {
return;
}
String pathWithPartition = hoodieWriteStat.getPath();
if (pathWithPartition == null) {
// Empty partition
LOG.error("Failed to find path in write stat to update metadata table " + hoodieWriteStat);
return;
}
int offset = partition.equals(NON_PARTITIONED_NAME) ? (pathWithPartition.startsWith("/") ? 1 : 0) :
partition.length() + 1;
final String fileName = pathWithPartition.substring(offset);
if (!FSUtils.isBaseFile(new Path(fileName))) {
return;
}
ValidationUtils.checkState(!newFiles.containsKey(fileName), "Duplicate files in HoodieCommitMetadata");
final Path writeFilePath = new Path(dataMetaClient.getBasePath(), pathWithPartition);
try {
  HoodieFileReader<IndexedRecord> fileReader =
      HoodieFileReaderFactory.getFileReader(dataMetaClient.getHadoopConf(), writeFilePath);
  try {
    final BloomFilter fileBloomFilter = fileReader.readBloomFilter();
    if (fileBloomFilter == null) {
      LOG.error("Failed to read bloom filter for " + writeFilePath);
      return;
    }
    ByteBuffer bloomByteBuffer = ByteBuffer.wrap(fileBloomFilter.serializeToString().getBytes());
    HoodieRecord record = HoodieMetadataPayload.createBloomFilterMetadataRecord(
        partition, fileName, instantTime, bloomByteBuffer, false);
    records.add(record);
  } catch (Exception e) {
    LOG.error("Failed to read bloom filter for " + writeFilePath, e);
  } finally {
    // Close the reader even when the bloom filter read fails, to avoid leaking a file handle
    fileReader.close();
  }
} catch (IOException e) {
  LOG.error("Failed to get bloom filter for file: " + writeFilePath + ", write stat: " + hoodieWriteStat, e);
}
});
});
return records;
}
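The offset arithmetic in the method above strips the partition prefix from a write-stat path to recover the bare file name. A minimal, self-contained sketch of just that logic (the sentinel value and example paths are hypothetical; the real code maps an empty partition name to NON_PARTITIONED_NAME):

```java
// Sketch of file-name extraction from "<partition>/<fileName>": partitioned
// paths skip "<partition>/", non-partitioned paths only skip an optional
// leading slash.
public class PathOffsetSketch {
  static final String NON_PARTITIONED_NAME = ".";

  static String fileName(String partition, String pathWithPartition) {
    int offset = partition.equals(NON_PARTITIONED_NAME)
        ? (pathWithPartition.startsWith("/") ? 1 : 0)
        : partition.length() + 1; // +1 skips the '/' after the partition
    return pathWithPartition.substring(offset);
  }

  public static void main(String[] args) {
    System.out.println(fileName("2022/01/01", "2022/01/01/abc.parquet")); // abc.parquet
    System.out.println(fileName(".", "/abc.parquet"));                    // abc.parquet
  }
}
```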
/**
* Convert the clean action to metadata records.
*/
public static Map<MetadataPartitionType, HoodieData<HoodieRecord>> convertMetadataToRecords(
HoodieEngineContext engineContext, List<MetadataPartitionType> enabledPartitionTypes,
HoodieCleanMetadata cleanMetadata, HoodieTableMetaClient dataMetaClient, String instantTime) {
final Map<MetadataPartitionType, HoodieData<HoodieRecord>> partitionToRecordsMap = new HashMap<>();
final HoodieData<HoodieRecord> filesPartitionRecordsRDD = engineContext.parallelize(
convertMetadataToFilesPartitionRecords(cleanMetadata, instantTime), 1);
partitionToRecordsMap.put(MetadataPartitionType.FILES, filesPartitionRecordsRDD);
if (enabledPartitionTypes.contains(MetadataPartitionType.BLOOM_FILTERS)) {
final List<HoodieRecord> metadataBloomFilterRecords = convertMetadataToBloomFilterRecords(cleanMetadata,
engineContext, instantTime);
if (!metadataBloomFilterRecords.isEmpty()) {
final HoodieData<HoodieRecord> metadataBloomFilterRecordsRDD = engineContext.parallelize(metadataBloomFilterRecords, 1);
partitionToRecordsMap.put(MetadataPartitionType.BLOOM_FILTERS, metadataBloomFilterRecordsRDD);
}
}
if (enabledPartitionTypes.contains(MetadataPartitionType.COLUMN_STATS)) {
final List<HoodieRecord> metadataColumnStats = convertMetadataToColumnStatsRecords(cleanMetadata, engineContext,
dataMetaClient);
if (!metadataColumnStats.isEmpty()) {
final HoodieData<HoodieRecord> metadataColumnStatsRDD = engineContext.parallelize(metadataColumnStats, 1);
partitionToRecordsMap.put(MetadataPartitionType.COLUMN_STATS, metadataColumnStatsRDD);
}
}
return partitionToRecordsMap;
}
/**
* Finds all files that were deleted as part of a clean and creates metadata table records for them.
*
@@ -131,7 +286,8 @@ public class HoodieTableMetadataUtil {
* @param instantTime
* @return a list of metadata table records
*/
public static List<HoodieRecord> convertMetadataToRecords(HoodieCleanMetadata cleanMetadata, String instantTime) {
public static List<HoodieRecord> convertMetadataToFilesPartitionRecords(HoodieCleanMetadata cleanMetadata,
String instantTime) {
List<HoodieRecord> records = new LinkedList<>();
int[] fileDeleteCount = {0};
cleanMetadata.getPartitionMetadata().forEach((partitionName, partitionMetadata) -> {
@@ -150,48 +306,187 @@ public class HoodieTableMetadataUtil {
return records;
}
/**
* Convert clean metadata to bloom filter index records.
*
* @param cleanMetadata - Clean action metadata
* @param engineContext - Engine context
* @param instantTime - Clean action instant time
* @return List of bloom filter index records for the clean metadata
*/
public static List<HoodieRecord> convertMetadataToBloomFilterRecords(HoodieCleanMetadata cleanMetadata,
HoodieEngineContext engineContext,
String instantTime) {
List<Pair<String, String>> deleteFileList = new ArrayList<>();
cleanMetadata.getPartitionMetadata().forEach((partition, partitionMetadata) -> {
// Files deleted from a partition
List<String> deletedFiles = partitionMetadata.getDeletePathPatterns();
deletedFiles.forEach(entry -> {
final Path deletedFilePath = new Path(entry);
if (FSUtils.isBaseFile(deletedFilePath)) {
deleteFileList.add(Pair.of(partition, deletedFilePath.getName()));
}
});
});
return engineContext.map(deleteFileList, deleteFileInfo -> {
return HoodieMetadataPayload.createBloomFilterMetadataRecord(
deleteFileInfo.getLeft(), deleteFileInfo.getRight(), instantTime, ByteBuffer.allocate(0), true);
}, 1).stream().collect(Collectors.toList());
}
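The clean-action conversion above emits a tombstone per deleted base file: an index record carrying an empty bloom filter buffer with isDeleted=true, so a later merge drops the stale entry. A reduced sketch of the filtering step, with ".parquet" standing in for FSUtils.isBaseFile and a String array standing in for the partition/file Pair:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of collecting tombstone candidates from a clean action: only base
// files get bloom filter delete records; log files are skipped.
public class CleanToBloomSketch {
  static List<String[]> deletedBaseFiles(String partition, List<String> deletedPaths) {
    List<String[]> result = new ArrayList<>();
    for (String path : deletedPaths) {
      String name = path.substring(path.lastIndexOf('/') + 1);
      if (name.endsWith(".parquet")) { // stand-in for FSUtils.isBaseFile
        result.add(new String[] {partition, name});
      }
    }
    return result;
  }

  public static void main(String[] args) {
    List<String[]> files = deletedBaseFiles("p1", List.of("p1/a.parquet", "p1/b.log.1"));
    System.out.println(files.size()); // only the base file survives the filter
  }
}
```

Each surviving pair would then become a record with ByteBuffer.allocate(0) as its payload; the empty buffer is never read, the deleted flag is what the merge honors.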
/**
* Convert clean metadata to column stats index records.
*
* @param cleanMetadata - Clean action metadata
* @param engineContext - Engine context
* @param datasetMetaClient - Data table meta client
* @return List of column stats index records for the clean metadata
*/
public static List<HoodieRecord> convertMetadataToColumnStatsRecords(HoodieCleanMetadata cleanMetadata,
HoodieEngineContext engineContext,
HoodieTableMetaClient datasetMetaClient) {
List<Pair<String, String>> deleteFileList = new ArrayList<>();
cleanMetadata.getPartitionMetadata().forEach((partition, partitionMetadata) -> {
// Files deleted from a partition
List<String> deletedFiles = partitionMetadata.getDeletePathPatterns();
deletedFiles.forEach(entry -> deleteFileList.add(Pair.of(partition, entry)));
});
List<String> latestColumns = getLatestColumns(datasetMetaClient);
return engineContext.flatMap(deleteFileList,
deleteFileInfo -> {
if (deleteFileInfo.getRight().endsWith(HoodieFileFormat.PARQUET.getFileExtension())) {
return getColumnStats(deleteFileInfo.getKey(), deleteFileInfo.getValue(), datasetMetaClient,
latestColumns, true);
}
return Stream.empty();
}, 1).stream().collect(Collectors.toList());
}
/**
* Convert restore action metadata to metadata table records.
*/
public static Map<MetadataPartitionType, HoodieData<HoodieRecord>> convertMetadataToRecords(
HoodieEngineContext engineContext, List<MetadataPartitionType> enabledPartitionTypes,
HoodieActiveTimeline metadataTableTimeline, HoodieRestoreMetadata restoreMetadata,
HoodieTableMetaClient dataMetaClient, String instantTime, Option<String> lastSyncTs) {
final Map<MetadataPartitionType, HoodieData<HoodieRecord>> partitionToRecordsMap = new HashMap<>();
final Map<String, Map<String, Long>> partitionToAppendedFiles = new HashMap<>();
final Map<String, List<String>> partitionToDeletedFiles = new HashMap<>();
processRestoreMetadata(metadataTableTimeline, restoreMetadata,
partitionToAppendedFiles, partitionToDeletedFiles, lastSyncTs);
final HoodieData<HoodieRecord> filesPartitionRecordsRDD = engineContext.parallelize(
convertFilesToFilesPartitionRecords(partitionToDeletedFiles,
partitionToAppendedFiles, instantTime, "Restore"), 1);
partitionToRecordsMap.put(MetadataPartitionType.FILES, filesPartitionRecordsRDD);
if (enabledPartitionTypes.contains(MetadataPartitionType.BLOOM_FILTERS)) {
final List<HoodieRecord> metadataBloomFilterRecords = convertFilesToBloomFilterRecords(
engineContext, dataMetaClient, partitionToDeletedFiles, partitionToAppendedFiles, instantTime);
if (!metadataBloomFilterRecords.isEmpty()) {
final HoodieData<HoodieRecord> metadataBloomFilterRecordsRDD = engineContext.parallelize(metadataBloomFilterRecords, 1);
partitionToRecordsMap.put(MetadataPartitionType.BLOOM_FILTERS, metadataBloomFilterRecordsRDD);
}
}
if (enabledPartitionTypes.contains(MetadataPartitionType.COLUMN_STATS)) {
final List<HoodieRecord> metadataColumnStats = convertFilesToColumnStatsRecords(
engineContext, dataMetaClient, partitionToDeletedFiles, partitionToAppendedFiles, instantTime);
if (!metadataColumnStats.isEmpty()) {
final HoodieData<HoodieRecord> metadataColumnStatsRDD = engineContext.parallelize(metadataColumnStats, 1);
partitionToRecordsMap.put(MetadataPartitionType.COLUMN_STATS, metadataColumnStatsRDD);
}
}
return partitionToRecordsMap;
}
/**
* Aggregates all files deleted and appended to from all rollbacks associated with a restore operation then
* creates metadata table records for them.
*
* @param restoreMetadata - Restore action metadata
*/
private static void processRestoreMetadata(HoodieActiveTimeline metadataTableTimeline,
                                           HoodieRestoreMetadata restoreMetadata,
                                           Map<String, Map<String, Long>> partitionToAppendedFiles,
                                           Map<String, List<String>> partitionToDeletedFiles,
                                           Option<String> lastSyncTs) {
  restoreMetadata.getHoodieRestoreMetadata().values().forEach(rms -> {
    rms.forEach(rm -> processRollbackMetadata(metadataTableTimeline, rm,
        partitionToDeletedFiles, partitionToAppendedFiles, lastSyncTs));
  });
}
/**
 * Convert rollback action metadata to metadata table records.
 */
public static Map<MetadataPartitionType, HoodieData<HoodieRecord>> convertMetadataToRecords(
    HoodieEngineContext engineContext, List<MetadataPartitionType> enabledPartitionTypes,
    HoodieActiveTimeline metadataTableTimeline, HoodieRollbackMetadata rollbackMetadata,
    HoodieTableMetaClient dataMetaClient, String instantTime, Option<String> lastSyncTs, boolean wasSynced) {
  final Map<MetadataPartitionType, HoodieData<HoodieRecord>> partitionToRecordsMap = new HashMap<>();
  Map<String, List<String>> partitionToDeletedFiles = new HashMap<>();
  Map<String, Map<String, Long>> partitionToAppendedFiles = new HashMap<>();
  List<HoodieRecord> filesPartitionRecords = convertMetadataToRollbackRecords(metadataTableTimeline, rollbackMetadata,
      partitionToDeletedFiles, partitionToAppendedFiles, instantTime, lastSyncTs, wasSynced);
final HoodieData<HoodieRecord> rollbackRecordsRDD = engineContext.parallelize(filesPartitionRecords, 1);
partitionToRecordsMap.put(MetadataPartitionType.FILES, rollbackRecordsRDD);
if (enabledPartitionTypes.contains(MetadataPartitionType.BLOOM_FILTERS)) {
final List<HoodieRecord> metadataBloomFilterRecords = convertFilesToBloomFilterRecords(
engineContext, dataMetaClient, partitionToDeletedFiles, partitionToAppendedFiles, instantTime);
if (!metadataBloomFilterRecords.isEmpty()) {
final HoodieData<HoodieRecord> metadataBloomFilterRecordsRDD = engineContext.parallelize(metadataBloomFilterRecords, 1);
partitionToRecordsMap.put(MetadataPartitionType.BLOOM_FILTERS, metadataBloomFilterRecordsRDD);
}
}
if (enabledPartitionTypes.contains(MetadataPartitionType.COLUMN_STATS)) {
final List<HoodieRecord> metadataColumnStats = convertFilesToColumnStatsRecords(
engineContext, dataMetaClient, partitionToDeletedFiles, partitionToAppendedFiles, instantTime);
if (!metadataColumnStats.isEmpty()) {
final HoodieData<HoodieRecord> metadataColumnStatsRDD = engineContext.parallelize(metadataColumnStats, 1);
partitionToRecordsMap.put(MetadataPartitionType.COLUMN_STATS, metadataColumnStatsRDD);
}
}
return partitionToRecordsMap;
}
/**
* Convert rollback action metadata to files partition records.
*/
private static List<HoodieRecord> convertMetadataToRollbackRecords(HoodieActiveTimeline metadataTableTimeline,
HoodieRollbackMetadata rollbackMetadata,
Map<String, List<String>> partitionToDeletedFiles,
Map<String, Map<String, Long>> partitionToAppendedFiles,
String instantTime,
Option<String> lastSyncTs, boolean wasSynced) {
processRollbackMetadata(metadataTableTimeline, rollbackMetadata, partitionToDeletedFiles,
partitionToAppendedFiles, lastSyncTs);
if (!wasSynced) {
// Since the instant-being-rolled-back was never committed to the metadata table, the files added there
// need not be deleted. For MOR Table, the rollback appends logBlocks so we need to keep the appended files.
partitionToDeletedFiles.clear();
}
return convertFilesToFilesPartitionRecords(partitionToDeletedFiles, partitionToAppendedFiles, instantTime, "Rollback");
}
/**
* Extracts information about the deleted and appended files from the {@code HoodieRollbackMetadata}.
*
* <p>
* During a rollback files may be deleted (COW, MOR) or rollback blocks be appended (MOR only) to files. This
* function will extract this change file for each partition.
*
* @param metadataTableTimeline    Current timeline of the Metadata Table
* @param rollbackMetadata         {@code HoodieRollbackMetadata}
* @param partitionToDeletedFiles  The {@code Map} to fill with files deleted per partition.
* @param partitionToAppendedFiles The {@code Map} to fill with files appended per partition and their sizes.
*/
private static void processRollbackMetadata(HoodieActiveTimeline metadataTableTimeline,
@@ -268,9 +563,12 @@ public class HoodieTableMetadataUtil {
});
}
/**
* Convert rollback action metadata to files partition records.
*/
private static List<HoodieRecord> convertFilesToFilesPartitionRecords(Map<String, List<String>> partitionToDeletedFiles,
Map<String, Map<String, Long>> partitionToAppendedFiles,
String instantTime, String operation) {
List<HoodieRecord> records = new LinkedList<>();
int[] fileChangeCount = {0, 0}; // deletes, appends
@@ -309,9 +607,88 @@ public class HoodieTableMetadataUtil {
return records;
}
/**
* Convert rollback action metadata to bloom filter index records.
*/
private static List<HoodieRecord> convertFilesToBloomFilterRecords(HoodieEngineContext engineContext,
HoodieTableMetaClient dataMetaClient,
Map<String, List<String>> partitionToDeletedFiles,
Map<String, Map<String, Long>> partitionToAppendedFiles,
String instantTime) {
List<HoodieRecord> records = new LinkedList<>();
partitionToDeletedFiles.forEach((partitionName, deletedFileList) -> deletedFileList.forEach(deletedFile -> {
if (!FSUtils.isBaseFile(new Path(deletedFile))) {
return;
}
final String partition = partitionName.equals(EMPTY_PARTITION_NAME) ? NON_PARTITIONED_NAME : partitionName;
records.add(HoodieMetadataPayload.createBloomFilterMetadataRecord(
partition, deletedFile, instantTime, ByteBuffer.allocate(0), true));
}));
partitionToAppendedFiles.forEach((partitionName, appendedFileMap) -> {
final String partition = partitionName.equals(EMPTY_PARTITION_NAME) ? NON_PARTITIONED_NAME : partitionName;
appendedFileMap.forEach((appendedFile, length) -> {
if (!FSUtils.isBaseFile(new Path(appendedFile))) {
return;
}
final String pathWithPartition = partitionName + "/" + appendedFile;
final Path appendedFilePath = new Path(dataMetaClient.getBasePath(), pathWithPartition);
try {
HoodieFileReader<IndexedRecord> fileReader =
HoodieFileReaderFactory.getFileReader(dataMetaClient.getHadoopConf(), appendedFilePath);
final BloomFilter fileBloomFilter = fileReader.readBloomFilter();
if (fileBloomFilter == null) {
LOG.error("Failed to read bloom filter for " + appendedFilePath);
return;
}
ByteBuffer bloomByteBuffer = ByteBuffer.wrap(fileBloomFilter.serializeToString().getBytes());
HoodieRecord record = HoodieMetadataPayload.createBloomFilterMetadataRecord(
partition, appendedFile, instantTime, bloomByteBuffer, false);
records.add(record);
fileReader.close();
} catch (IOException e) {
LOG.error("Failed to get bloom filter for file: " + appendedFilePath, e);
}
});
});
return records;
}
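The records built above wrap the bloom filter's string serialization in a ByteBuffer (`fileBloomFilter.serializeToString().getBytes()`). A minimal self-contained sketch of that round trip, using a toy BitSet-based filter and a hypothetical hex encoding rather than Hudi's actual BloomFilter class and wire format:

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.BitSet;

public class ToyBloomRoundTrip {
  // Toy bloom filter: one String.hashCode()-based probe per key into a BitSet.
  static BitSet buildFilter(int bits, String... keys) {
    BitSet filter = new BitSet(bits);
    for (String key : keys) {
      filter.set((key.hashCode() & Integer.MAX_VALUE) % bits);
    }
    return filter;
  }

  // Serialize to a String (hex of the backing bytes) and wrap it in a
  // ByteBuffer, mirroring the serializeToString().getBytes() pattern above.
  static ByteBuffer serialize(BitSet filter) {
    StringBuilder hex = new StringBuilder();
    for (byte b : filter.toByteArray()) {
      hex.append(String.format("%02x", b));
    }
    return ByteBuffer.wrap(hex.toString().getBytes(StandardCharsets.UTF_8));
  }

  // Decode the ByteBuffer back into the same BitSet.
  static BitSet deserialize(ByteBuffer buf) {
    String hex = StandardCharsets.UTF_8.decode(buf).toString();
    byte[] bytes = new byte[hex.length() / 2];
    for (int i = 0; i < bytes.length; i++) {
      bytes[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
    }
    return BitSet.valueOf(bytes);
  }
}
```

A deleted base file gets an empty buffer (`ByteBuffer.allocate(0)`) and the `isDeleted` flag instead, so readers can drop the stale filter without deserializing anything.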
/**
* Convert rollback action metadata to column stats index records.
*/
private static List<HoodieRecord> convertFilesToColumnStatsRecords(HoodieEngineContext engineContext,
HoodieTableMetaClient datasetMetaClient,
Map<String, List<String>> partitionToDeletedFiles,
Map<String, Map<String, Long>> partitionToAppendedFiles,
String instantTime) {
List<HoodieRecord> records = new LinkedList<>();
List<String> latestColumns = getLatestColumns(datasetMetaClient);
partitionToDeletedFiles.forEach((partitionName, deletedFileList) -> deletedFileList.forEach(deletedFile -> {
final String partition = partitionName.equals(EMPTY_PARTITION_NAME) ? NON_PARTITIONED_NAME : partitionName;
if (deletedFile.endsWith(HoodieFileFormat.PARQUET.getFileExtension())) {
final String filePathWithPartition = partitionName + "/" + deletedFile;
records.addAll(getColumnStats(partition, filePathWithPartition, datasetMetaClient,
latestColumns, true).collect(Collectors.toList()));
}
}));
partitionToAppendedFiles.forEach((partitionName, appendedFileMap) -> appendedFileMap.forEach(
(appendedFile, size) -> {
final String partition = partitionName.equals(EMPTY_PARTITION_NAME) ? NON_PARTITIONED_NAME : partitionName;
if (appendedFile.endsWith(HoodieFileFormat.PARQUET.getFileExtension())) {
final String filePathWithPartition = partitionName + "/" + appendedFile;
records.addAll(getColumnStats(partition, filePathWithPartition, datasetMetaClient,
latestColumns, false).collect(Collectors.toList()));
}
}));
return records;
}
/**
* Map a record key to a file group in partition of interest.
*
* <p>
* Note: For hashing, the algorithm is the same as String.hashCode(), but it is defined here because the hashCode()
* implementation is not guaranteed by the JVM to be consistent across JVM versions and implementations.
*
@@ -339,7 +716,7 @@ public class HoodieTableMetadataUtil {
*/
public static List<FileSlice> getPartitionLatestMergedFileSlices(HoodieTableMetaClient metaClient, String partition) {
LOG.info("Loading latest merged file slices for metadata table partition " + partition);
return getPartitionFileSlices(metaClient, Option.empty(), partition, true);
}
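The key-to-file-group mapping described in the javadoc above can be sketched as follows. This is a hypothetical standalone helper, not Hudi's actual method: it re-implements the String.hashCode() recurrence (h = 31*h + c) locally so the key-to-file-group assignment stays stable across JVM versions:

```java
public class KeyToFileGroup {
  // Stable hash identical to the String.hashCode() contract, re-implemented
  // here so the mapping never shifts even if a JVM changed its hashCode().
  static int stableHash(String key) {
    int h = 0;
    for (int i = 0; i < key.length(); i++) {
      h = 31 * h + key.charAt(i);
    }
    return h;
  }

  // Map a record key to one of fileGroupCount file groups in a metadata
  // partition; masking with Integer.MAX_VALUE keeps the index non-negative.
  static int mapKeyToFileGroup(String recordKey, int fileGroupCount) {
    return (stableHash(recordKey) & Integer.MAX_VALUE) % fileGroupCount;
  }
}
```

Because the hash is deterministic, both the writer (deciding which file group a bloom filter or column stats record lands in) and the reader (doing a pointed lookup by key) compute the same index.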
/**
@@ -347,12 +724,33 @@ public class HoodieTableMetadataUtil {
* returned is sorted in the correct order of file group name.
*
* @param metaClient - Instance of {@link HoodieTableMetaClient}.
* @param fsView - Metadata table filesystem view
* @param partition - The name of the partition whose file groups are to be loaded.
* @return List of latest file slices for all file groups in a given partition.
*/
public static List<FileSlice> getPartitionLatestFileSlices(HoodieTableMetaClient metaClient,
Option<HoodieTableFileSystemView> fsView, String partition) {
LOG.info("Loading latest file slices for metadata table partition " + partition);
return getPartitionFileSlices(metaClient, fsView, partition, false);
}
/**
* Get metadata table file system view.
*
* @param metaClient - Metadata table meta client
* @return Filesystem view for the metadata table
*/
public static HoodieTableFileSystemView getFileSystemView(HoodieTableMetaClient metaClient) {
// If there are no commits on the metadata table then the table's
// default FileSystemView will not return any file slices even
// though we may have initialized them.
HoodieTimeline timeline = metaClient.getActiveTimeline();
if (timeline.empty()) {
final HoodieInstant instant = new HoodieInstant(false, HoodieTimeline.DELTA_COMMIT_ACTION,
HoodieActiveTimeline.createNewInstantTime());
timeline = new HoodieDefaultTimeline(Arrays.asList(instant).stream(), metaClient.getActiveTimeline()::getInstantDetails);
}
return new HoodieTableFileSystemView(metaClient, timeline);
}
/**
@@ -366,27 +764,161 @@ public class HoodieTableMetadataUtil {
* slices without any merging, and this is needed for the writers.
* @return List of latest file slices for all file groups in a given partition.
*/
private static List<FileSlice> getPartitionFileSlices(HoodieTableMetaClient metaClient,
Option<HoodieTableFileSystemView> fileSystemView,
String partition,
boolean mergeFileSlices) {
HoodieTableFileSystemView fsView = fileSystemView.orElse(getFileSystemView(metaClient));
Stream<FileSlice> fileSliceStream;
if (mergeFileSlices) {
fileSliceStream = fsView.getLatestMergedFileSlicesBeforeOrOn(
partition, metaClient.getActiveTimeline().filterCompletedInstants().lastInstant().get().getTimestamp());
} else {
fileSliceStream = fsView.getLatestFileSlices(partition);
}
return fileSliceStream.sorted((s1, s2) -> s1.getFileId().compareTo(s2.getFileId())).collect(Collectors.toList());
}
public static List<HoodieRecord> convertMetadataToColumnStatsRecords(HoodieCommitMetadata commitMetadata,
HoodieEngineContext engineContext,
HoodieTableMetaClient dataMetaClient,
boolean isMetaIndexColumnStatsForAllColumns,
String instantTime) {
try {
List<HoodieWriteStat> allWriteStats = commitMetadata.getPartitionToWriteStats().values().stream()
.flatMap(entry -> entry.stream()).collect(Collectors.toList());
return HoodieTableMetadataUtil.createColumnStatsFromWriteStats(engineContext, dataMetaClient, allWriteStats,
isMetaIndexColumnStatsForAllColumns);
} catch (Exception e) {
throw new HoodieException("Failed to generate column stats records for metadata table ", e);
}
}
/**
* Create column stats from write status.
*
* @param engineContext - Engine context
* @param datasetMetaClient - Dataset meta client
* @param allWriteStats - Write stats to convert
* @param isMetaIndexColumnStatsForAllColumns - Are all columns enabled for indexing
*/
public static List<HoodieRecord> createColumnStatsFromWriteStats(HoodieEngineContext engineContext,
HoodieTableMetaClient datasetMetaClient,
List<HoodieWriteStat> allWriteStats,
boolean isMetaIndexColumnStatsForAllColumns) throws Exception {
if (allWriteStats.isEmpty()) {
return Collections.emptyList();
}
List<HoodieWriteStat> prunedWriteStats = allWriteStats.stream()
.filter(writeStat -> !(writeStat instanceof HoodieDeltaWriteStat))
.collect(Collectors.toList());
if (prunedWriteStats.isEmpty()) {
return Collections.emptyList();
}
return engineContext.flatMap(prunedWriteStats,
writeStat -> translateWriteStatToColumnStats(writeStat, datasetMetaClient,
getLatestColumns(datasetMetaClient, isMetaIndexColumnStatsForAllColumns)),
prunedWriteStats.size());
}
/**
* Get the latest columns for the table for column stats indexing.
*
* @param datasetMetaClient - Data table meta client
* @param isMetaIndexColumnStatsForAllColumns - Is column stats indexing enabled for all columns
*/
private static List<String> getLatestColumns(HoodieTableMetaClient datasetMetaClient, boolean isMetaIndexColumnStatsForAllColumns) {
if (!isMetaIndexColumnStatsForAllColumns
|| datasetMetaClient.getCommitsTimeline().filterCompletedInstants().countInstants() < 1) {
return Collections.singletonList(datasetMetaClient.getTableConfig().getRecordKeyFieldProp());
}
TableSchemaResolver schemaResolver = new TableSchemaResolver(datasetMetaClient);
// consider nested fields as well. if column stats is enabled only for a subset of columns,
// directly use them instead of all columns from the latest table schema
try {
return schemaResolver.getTableAvroSchema().getFields().stream()
.map(entry -> entry.name()).collect(Collectors.toList());
} catch (Exception e) {
throw new HoodieException("Failed to get latest columns for " + datasetMetaClient.getBasePath(), e);
}
}
private static List<String> getLatestColumns(HoodieTableMetaClient datasetMetaClient) {
return getLatestColumns(datasetMetaClient, false);
}
public static Stream<HoodieRecord> translateWriteStatToColumnStats(HoodieWriteStat writeStat,
HoodieTableMetaClient datasetMetaClient,
List<String> latestColumns) {
return getColumnStats(writeStat.getPartitionPath(), writeStat.getPath(), datasetMetaClient, latestColumns, false);
}
private static Stream<HoodieRecord> getColumnStats(final String partitionPath, final String filePathWithPartition,
HoodieTableMetaClient datasetMetaClient,
List<String> columns, boolean isDeleted) {
final String partition = partitionPath.equals(EMPTY_PARTITION_NAME) ? NON_PARTITIONED_NAME : partitionPath;
final int offset = partition.equals(NON_PARTITIONED_NAME) ? (filePathWithPartition.startsWith("/") ? 1 : 0)
: partition.length() + 1;
final String fileName = filePathWithPartition.substring(offset);
if (!FSUtils.isBaseFile(new Path(fileName))) {
return Stream.empty();
}
if (filePathWithPartition.endsWith(HoodieFileFormat.PARQUET.getFileExtension())) {
List<HoodieColumnRangeMetadata<Comparable>> columnRangeMetadataList = new ArrayList<>();
final Path fullFilePath = new Path(datasetMetaClient.getBasePath(), filePathWithPartition);
if (!isDeleted) {
try {
columnRangeMetadataList = new ParquetUtils().readRangeFromParquetMetadata(
datasetMetaClient.getHadoopConf(), fullFilePath, columns);
} catch (Exception e) {
LOG.error("Failed to read column stats for " + fullFilePath, e);
}
} else {
columnRangeMetadataList =
columns.stream().map(entry -> new HoodieColumnRangeMetadata<Comparable>(fileName,
entry, null, null, 0, 0, 0, 0))
.collect(Collectors.toList());
}
return HoodieMetadataPayload.createColumnStatsRecords(partitionPath, columnRangeMetadataList, isDeleted);
} else {
throw new HoodieException("Column range index not supported for filePathWithPartition " + fileName);
}
}
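The offset arithmetic at the top of getColumnStats strips the partition prefix from the partition-relative path. A standalone sketch of just that logic, with the constants inlined under the assumption that EMPTY_PARTITION_NAME is "" and NON_PARTITIONED_NAME is ".":

```java
public class PartitionPathOffset {
  static final String EMPTY_PARTITION_NAME = "";   // assumed constant value
  static final String NON_PARTITIONED_NAME = "."; // assumed constant value

  // Mirrors the offset computation in getColumnStats: for a non-partitioned
  // table the path may carry a leading '/'; otherwise skip "partition/".
  static String fileName(String partitionPath, String filePathWithPartition) {
    String partition = partitionPath.equals(EMPTY_PARTITION_NAME) ? NON_PARTITIONED_NAME : partitionPath;
    int offset = partition.equals(NON_PARTITIONED_NAME)
        ? (filePathWithPartition.startsWith("/") ? 1 : 0)
        : partition.length() + 1;
    return filePathWithPartition.substring(offset);
  }
}
```

For example, `fileName("2021/01", "2021/01/f1.parquet")` yields `f1.parquet`, which is what the base-file check (`FSUtils.isBaseFile`) then inspects.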
/**
* Get file group count for a metadata table partition.
*
* @param partitionType - Metadata table partition type
* @param metaClient - Metadata table meta client
* @param fsView - Filesystem view
* @param metadataConfig - Metadata config
* @param isBootstrapCompleted - Is bootstrap completed for the metadata table
* @return File group count for the requested metadata partition type
*/
public static int getPartitionFileGroupCount(final MetadataPartitionType partitionType,
final Option<HoodieTableMetaClient> metaClient,
final Option<HoodieTableFileSystemView> fsView,
final HoodieMetadataConfig metadataConfig, boolean isBootstrapCompleted) {
if (isBootstrapCompleted) {
final List<FileSlice> latestFileSlices = HoodieTableMetadataUtil
.getPartitionLatestFileSlices(metaClient.get(), fsView, partitionType.getPartitionPath());
return Math.max(latestFileSlices.size(), 1);
}
switch (partitionType) {
case BLOOM_FILTERS:
return metadataConfig.getBloomFilterIndexFileGroupCount();
case COLUMN_STATS:
return metadataConfig.getColumnStatsIndexFileGroupCount();
default:
return 1;
}
}
}


@@ -22,19 +22,23 @@ import java.util.Arrays;
import java.util.List;
public enum MetadataPartitionType {
FILES(HoodieTableMetadataUtil.PARTITION_NAME_FILES, "files-"),
COLUMN_STATS(HoodieTableMetadataUtil.PARTITION_NAME_COLUMN_STATS, "col-stats-"),
BLOOM_FILTERS(HoodieTableMetadataUtil.PARTITION_NAME_BLOOM_FILTERS, "bloom-filters-");
// Partition path in metadata table.
private final String partitionPath;
// FileId prefix used for all file groups in this partition.
private final String fileIdPrefix;
// Total file groups
private int fileGroupCount = 1;
MetadataPartitionType(final String partitionPath, final String fileIdPrefix) {
this.partitionPath = partitionPath;
this.fileIdPrefix = fileIdPrefix;
}
public String getPartitionPath() {
return partitionPath;
}
@@ -42,7 +46,28 @@ public enum MetadataPartitionType {
return fileIdPrefix;
}
void setFileGroupCount(final int fileGroupCount) {
this.fileGroupCount = fileGroupCount;
}
public int getFileGroupCount() {
return this.fileGroupCount;
}
public static List<String> allPaths() {
return Arrays.asList(
FILES.getPartitionPath(),
COLUMN_STATS.getPartitionPath(),
BLOOM_FILTERS.getPartitionPath()
);
}
@Override
public String toString() {
return "Metadata partition {"
+ "name: " + getPartitionPath()
+ ", prefix: " + getFileIdPrefix()
+ ", groups: " + getFileGroupCount()
+ "}";
}
}