Thinking 66893bfef2 fix spark-shell add jar problem
jira link https://issues.apache.org/jira/browse/HUDI-101
issue link https://github.com/apache/incubator-hudi/issues/516#issue-386048519

When saving data with Hoodie from spark-shell, started like this:
```
./spark-shell --master yarn --jars /home/hdfs/software/spark/hoodie/hoodie-spark-bundle-0.4.8-SNAPSHOT.jar --conf spark.sql.hive.convertMetastoreParquet=false --packages com.databricks:spark-avro_2.11:4.0.0
```
and writing like this:
```
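// Imports needed by the write below (package names as of the com.uber.hoodie 0.4.x line)
import com.uber.hoodie.DataSourceWriteOptions
import com.uber.hoodie.common.model.HoodieTableType
import com.uber.hoodie.config.HoodieWriteConfig
import org.apache.spark.sql.SaveMode

inputDF.write.format("com.uber.hoodie")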
        .option("hoodie.insert.shuffle.parallelism", "1") // any hoodie client config can be passed like this
        .option("hoodie.upsert.shuffle.parallelism", "1") // full list in HoodieWriteConfig & its package
        .option(DataSourceWriteOptions.STORAGE_TYPE_OPT_KEY, HoodieTableType.COPY_ON_WRITE.name())
        .option(DataSourceWriteOptions.OPERATION_OPT_KEY, DataSourceWriteOptions.UPSERT_OPERATION_OPT_VAL) // insert
        .option(DataSourceWriteOptions.RECORDKEY_FIELD_OPT_KEY, "_row_key")
        .option(DataSourceWriteOptions.PARTITIONPATH_FIELD_OPT_KEY, "partition")
        .option(DataSourceWriteOptions.PRECOMBINE_FIELD_OPT_KEY, "extend_deal_date")
        .option(HoodieWriteConfig.TABLE_NAME, "c_upload_code")
        .mode(SaveMode.Overwrite)
        .save("/tmp/test/hoodie")
```
It also reports the error `Invalid signature file digest for Manifest main attributes`. All affected dependencies need to be scanned.
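
This error generally appears when signature files from a signed dependency (`META-INF/*.SF`, `*.DSA`, `*.RSA`, `*.EC`) end up inside a repackaged bundle jar whose contents no longer match those signatures. As a minimal sketch of the scan mentioned above, assuming that is the cause here, the following lists such entries in one or more jars. `SignatureFileScanner` is a hypothetical helper written for illustration and is not part of Hudi.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

// Hypothetical helper (not part of Hudi): lists jar signature entries that
// can trigger "Invalid signature file digest for Manifest main attributes"
// once a signed dependency has been merged into a bundle jar.
public class SignatureFileScanner {

    // True for signature-related entries located directly under META-INF/.
    public static boolean isSignatureFile(String entryName) {
        String upper = entryName.toUpperCase(Locale.ROOT);
        if (!upper.startsWith("META-INF/")) {
            return false;
        }
        String rest = upper.substring("META-INF/".length());
        if (rest.contains("/")) {
            return false; // nested paths are not treated as signature files
        }
        return rest.endsWith(".SF") || rest.endsWith(".DSA")
            || rest.endsWith(".RSA") || rest.endsWith(".EC");
    }

    // Returns the names of all signature entries found in the given jar.
    public static List<String> findSignatureFiles(Path jar) throws IOException {
        List<String> hits = new ArrayList<>();
        try (ZipFile zip = new ZipFile(jar.toFile())) {
            zip.stream()
               .map(ZipEntry::getName)
               .filter(SignatureFileScanner::isSignatureFile)
               .forEach(hits::add);
        }
        return hits;
    }

    public static void main(String[] args) throws IOException {
        // Each argument is a path to a jar, e.g. the bundle passed to --jars.
        for (String arg : args) {
            for (String name : findSignatureFiles(Paths.get(arg))) {
                System.out.println(arg + ": " + name);
            }
        }
    }
}
```

With Java 11 or later this can be run directly as `java SignatureFileScanner.java <path-to-jar>`. Any entries it prints are candidates to exclude when the bundle is built.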

Hudi

Hudi (pronounced Hoodie) stands for Hadoop Upserts anD Incrementals. Hudi manages the storage of large analytical datasets on HDFS and serves them via two types of tables:

  • Read Optimized Table - Provides excellent query performance via purely columnar storage (e.g. Parquet)
  • Near-Real-Time Table (WIP) - Provides queries on real-time data, using a combination of columnar and row-based storage (e.g. Parquet + Avro)

For more, head over here

Description
Internal version
Readme 43 MiB
Languages
Java 81.4%
Scala 16.7%
ANTLR 0.9%
Shell 0.8%
Dockerfile 0.2%