JIRA link: https://issues.apache.org/jira/browse/HUDI-101
Issue link: https://github.com/apache/incubator-hudi/issues/516#issue-386048519

When using spark-shell with Hudi to save data like:

```
./spark-shell --master yarn \
  --jars /home/hdfs/software/spark/hoodie/hoodie-spark-bundle-0.4.8-SNAPSHOT.jar \
  --conf spark.sql.hive.convertMetastoreParquet=false \
  --packages com.databricks:spark-avro_2.11:4.0.0
```

and

```
inputDF.write.format("com.uber.hoodie")
  .option("hoodie.insert.shuffle.parallelism", "1") // any hoodie client config can be passed like this
  .option("hoodie.upsert.shuffle.parallelism", "1") // full list in HoodieWriteConfig & its package
  .option(DataSourceWriteOptions.STORAGE_TYPE_OPT_KEY, HoodieTableType.COPY_ON_WRITE.name())
  .option(DataSourceWriteOptions.OPERATION_OPT_KEY, DataSourceWriteOptions.UPSERT_OPERATION_OPT_VAL) // insert
  .option(DataSourceWriteOptions.RECORDKEY_FIELD_OPT_KEY, "_row_key")
  .option(DataSourceWriteOptions.PARTITIONPATH_FIELD_OPT_KEY, "partition")
  .option(DataSourceWriteOptions.PRECOMBINE_FIELD_OPT_KEY, "extend_deal_date")
  .option(HoodieWriteConfig.TABLE_NAME, "c_upload_code")
  .mode(SaveMode.Overwrite)
  .save("/tmp/test/hoodie")
```

it reports the error `Invalid signature file digest for Manifest main attributes`. All affected dependencies need to be scanned.
Hudi
Hudi (pronounced "Hoodie") stands for Hadoop Upserts anD Incrementals. Hudi manages storage of large analytical datasets on HDFS and serves them out via two types of tables:
- Read Optimized Table - Provides excellent query performance via purely columnar storage (e.g. Parquet)
- Near-Real-time Table (WIP) - Provides queries on real-time data, using a combination of columnar & row-based storage (e.g. Parquet + Avro)
For more, head over here