
[HUDI-1296] Support Metadata Table in Spark Datasource (#4789)

* Bootstrapping initial support for Metadata Table in Spark Datasource

- Consolidated Avro/Row conversion utilities around Spark's AvroDeserializer; removed duplication
- Bootstrapped HoodieBaseRelation
- Updated HoodieMergeOnReadRDD to be able to handle Metadata Table
- Modified MOR relations to be able to read different Base File formats (Parquet, HFile)
Alexey Kudinkin
2022-02-24 13:23:13 -08:00
committed by GitHub
parent 521338b4d9
commit 85e8a5c4de
56 changed files with 1634 additions and 1010 deletions

@@ -80,6 +80,23 @@ public class RawTripTestPayload implements HoodieRecordPayload<RawTripTestPayloa
this.isDeleted = false;
}
/**
* @deprecated PLEASE READ THIS CAREFULLY
*
* Converting properly typed records into JSON inevitably loses information, since JSON
* encodes only the representation of the record (with no schema accompanying it), and therefore
* occasionally loses nuances of the original data types provided by the schema (e.g., given the
* literal 1.23 it's impossible to tell whether the original type was Double or Decimal).
*
* Compounded by the fact that Spark 2 JSON schema inference has substantial gaps (see REFs below),
* it's **NOT RECOMMENDED** to use this method. Instead, please consider using the {@link AvroConversionUtils#createDataframe()}
* method accepting a list of {@link HoodieRecord} (as produced by the {@link HoodieTestDataGenerator})
* to create Spark's {@code Dataframe}s directly.
*
* REFs
* https://medium.com/swlh/notes-about-json-schema-handling-in-spark-sql-be1e7f13839d
*/
@Deprecated
public static List<String> recordsToStrings(List<HoodieRecord> records) {
return records.stream().map(RawTripTestPayload::recordToString).filter(Option::isPresent).map(Option::get)
.collect(Collectors.toList());
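
The information loss described in the deprecation note can be illustrated with a minimal, self-contained sketch. It is not part of the commit: the class `JsonTypeLossDemo`, the helper `toJson`, and the field name `fare` are hypothetical, and the JSON is assembled by hand rather than through Hudi or Spark. It only shows that a Double and a Decimal holding 1.23 produce identical JSON text, so a reader of the JSON has no way to recover the original type.

```java
import java.math.BigDecimal;

public class JsonTypeLossDemo {

    // Hypothetical helper (not a Hudi API): renders a one-field JSON object by hand.
    // JSON has a single "number" kind, so the Java type of `value` is not recorded.
    static String toJson(String field, Number value) {
        return "{\"" + field + "\": " + value.toString() + "}";
    }

    public static void main(String[] args) {
        Number asDouble = Double.valueOf(1.23);        // binary floating point
        Number asDecimal = new BigDecimal("1.23");     // exact decimal, scale 2

        String fromDouble = toJson("fare", asDouble);
        String fromDecimal = toJson("fare", asDecimal);

        System.out.println(fromDouble);
        System.out.println(fromDecimal);
        // Both strings are identical, so schema inference over the JSON
        // has to guess which type produced the literal.
        System.out.println("identical: " + fromDouble.equals(fromDecimal));
    }
}
```

Keeping the records in a typed form and building the DataFrame from them directly, as the note recommends, avoids this guesswork.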

@@ -20,7 +20,43 @@
"type": "record",
"name": "User",
"fields": [
{"name": "field1", "type": ["null", "string"], "default": null},
{"name": "createTime", "type": ["null", "long"], "default": null}
{
"name": "field1",
"type": [
"null",
"string"
],
"default": null
},
{
"name": "createTime",
"type": [
"null",
"long"
],
"default": null
},
{
"name": "createTimeString",
"type": [
"null",
"string"
],
"default": null
},
{
"name": "createTimeDecimal",
"type": [
"null",
{
"name": "decimalFixed",
"type": "fixed",
"logicalType": "decimal",
"precision": 20,
"scale": 4,
"size": 10
}
]
}
]
}
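
The `createTimeDecimal` field above uses Avro's `decimal` logical type on top of a 10-byte `fixed`. As a rough sketch of what that declaration implies, the snippet below encodes a value the way the Avro specification describes for decimals (the unscaled integer as big-endian two's complement), sign-extended to the fixed size, and checks that precision 20 fits in 10 bytes. The class `DecimalFixedDemo` and its methods are hypothetical illustrations written against plain `java.math`, not the Avro library's or Hudi's own conversion code.

```java
import java.math.BigDecimal;
import java.math.BigInteger;
import java.util.Arrays;

public class DecimalFixedDemo {
    // Values taken from the schema above.
    static final int PRECISION = 20;
    static final int SCALE = 4;
    static final int SIZE = 10;

    // Largest precision that always fits in n bytes of two's complement:
    // floor(log10(2^(8n-1) - 1)), computed via the digit count.
    static int maxPrecision(int sizeBytes) {
        BigInteger max = BigInteger.ONE.shiftLeft(8 * sizeBytes - 1).subtract(BigInteger.ONE);
        return max.toString().length() - 1;
    }

    // Encodes the unscaled value as big-endian two's complement, sign-extended to SIZE bytes.
    static byte[] encode(BigDecimal value) {
        BigDecimal scaled = value.setScale(SCALE); // throws ArithmeticException if rounding is needed
        if (scaled.precision() > PRECISION) {
            throw new IllegalArgumentException("precision exceeds " + PRECISION);
        }
        byte[] unscaled = scaled.unscaledValue().toByteArray(); // minimal-length representation
        byte[] out = new byte[SIZE];
        Arrays.fill(out, (byte) (scaled.signum() < 0 ? 0xFF : 0x00)); // sign extension
        System.arraycopy(unscaled, 0, out, SIZE - unscaled.length, unscaled.length);
        return out;
    }

    // Reverses encode(): the scale comes from the schema, not from the bytes.
    static BigDecimal decode(byte[] fixed) {
        return new BigDecimal(new BigInteger(fixed), SCALE);
    }

    public static void main(String[] args) {
        System.out.println("max precision for " + SIZE + " bytes: " + maxPrecision(SIZE));
        BigDecimal original = new BigDecimal("1645737793.1234");
        byte[] bytes = encode(original);
        System.out.println(bytes.length + " bytes, round trip: " + decode(bytes));
    }
}
```

Because the scale lives only in the schema, the same bytes decode to a different number under a different scale, which is one reason a schema-less JSON rendering cannot stand in for the typed record.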