
487 Commits

Author SHA1 Message Date
Balaji Varadarajan
6e0ff3a235 Generate Source Jars for bundle packages (#810) 2019-07-30 18:17:14 -07:00
Vinoth Chandar
e20b77be3b HUDI-92 : Making deltastreamer with DistributedTestSource also run locally
- Separating out the test data generators per partition
- Minor logging improvements on IOHandle performance
2019-07-30 16:30:47 -07:00
vinoyang
68464c7d02 [HUDI-181] Fix the Bold markdown grammar issue of README file (#808) 2019-07-30 03:47:53 -07:00
eisig
e0648de2ef HUDI-175 - add an option to manually override the DeltaStreamer checkpoint (#798)
- Add a CLI option to allow overriding the checkpoint using `--checkpoint`
- Persist overridden checkpoint into commit metadata
2019-07-29 10:40:02 -07:00
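One way to read the two bullets above, sketched in Scala under loose assumptions: if the commit that used an override also records it, the override applies only once, so a later run started with the same `--checkpoint` value resumes from the stored checkpoint instead of rewinding again. The object `CheckpointOverrideSketch`, its metadata keys and both functions are hypothetical and are not Hudi's DeltaStreamer API.

```scala
// Hypothetical sketch: resolving the checkpoint a DeltaStreamer-style sync resumes from.
object CheckpointOverrideSketch {
  // Illustrative commit-metadata keys (assumed, not Hudi's actual key names).
  val CheckpointKey = "checkpoint"
  val OverrideKey = "checkpoint.override"

  /** Picks the checkpoint to resume from, given an optional CLI override and the
    * metadata of the last completed commit. */
  def resolve(cliOverride: Option[String], lastCommit: Map[String, String]): Option[String] =
    cliOverride match {
      // An override that the last commit has not recorded yet takes effect.
      case Some(cp) if !lastCommit.get(OverrideKey).contains(cp) => Some(cp)
      // Otherwise resume from the checkpoint stored by the last commit.
      case _ => lastCommit.get(CheckpointKey)
    }

  /** Metadata for the next commit: the checkpoint reached, plus the override used (if any). */
  def commitMetadata(reached: String, cliOverride: Option[String]): Map[String, String] = {
    val base = Map(CheckpointKey -> reached)
    cliOverride.fold(base)(cp => base + (OverrideKey -> cp))
  }
}
```

With this design, rerunning the job with an unchanged `--checkpoint` flag continues from the stored progress, while passing a different value rewinds again.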
Balaji Varadarajan
9265c7cc36 Add balaji gpg key to KEYS file 2019-07-29 06:41:41 -07:00
Balaji Varadarajan
83dab21ae1 Allow HoodieWrapperFileSystem to wrap other proxy file-system implementations with no getScheme implementation (#793) 2019-07-24 21:31:46 -07:00
Balaji Varadarajan
0b451b3a58 HUDI-140 : GCS: Log File Reading not working due to difference in seek() behavior for EOF 2019-07-19 12:38:28 -07:00
eisig
9857c4b21c add jssc.stop() (#797) 2019-07-19 05:01:45 -07:00
n3nash
6efa16317c Fixing default value for avro 1.7 which assumes NULL value instead of a jsonnode that is null (#792) 2019-07-17 03:25:54 -07:00
Balaji Varadarajan
3d408ee96b HUDI-168 Ensure getFileStatus calls for files being written are made after close() is called (#788) 2019-07-16 17:33:34 -07:00
eisig
c0593e7a13 fix HoodieLogFileReader (#787) 2019-07-15 13:25:55 -07:00
Balaji Varadarajan
ae3c02fb3f HUDI-162 : File System view must be built with correct timeline actions 2019-07-14 00:48:09 -07:00
Balaji Varadarajan
5823c1ebd7 HUDI-138 - Meta Files handling also needs to support consistency guard 2019-07-13 22:02:55 -07:00
Yihua Guo
621c246fa9 [HUDI-161] Remove --key-generator-class CLI arg in HoodieDeltaStreamer and use key generator class specified in datasource properties. (#781) 2019-07-12 13:45:49 -07:00
Ho Tien Vu
11c4121f73 Fixed TableNotFoundException when writing with structured streaming (#778)
- When writing to a new hoodie table with the checkpoint dir under the target path, Spark creates the base path first, so initialization of .hoodie is skipped, which results in an error
- Apply the .hoodie existence check for all save modes
2019-07-12 09:17:16 -07:00
Thinking Chen
62ecb2da62 when column type is decimal, should add precision and scale (#753) 2019-07-08 16:13:22 -07:00
Balaji Varadarajan
9f18a1ca80 Fixing bugs found during running hoodie demo (#760) 2019-06-28 17:49:23 -07:00
Ho Tien Vu
e48e35385a Added preemptive check for 'spark.scheduler.mode'
When running the docker demo, a NoSuchElementException was thrown because spark.scheduler.mode was not set.
We also want to check before initializing the Spark Context to avoid polluting the SparkConf
with unused config.
2019-06-25 13:39:41 -07:00
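A minimal sketch of such a preemptive check, assuming the extra config in question is a FAIR-scheduler allocation file: the mode is read with an `Option` lookup (Spark's `SparkConf.get(key)` throws `NoSuchElementException` for unset keys, while `SparkConf.getOption(key)` does not), and nothing is added unless FAIR scheduling is enabled. A plain `Map` stands in for `SparkConf` so the sketch runs without Spark; `SchedulingConfSketch` and `extraConfigs` are hypothetical names, while `spark.scheduler.mode` and `spark.scheduler.allocation.file` are standard Spark config keys.

```scala
// Hypothetical sketch: decide on scheduler-related configs before any SparkContext exists.
object SchedulingConfSketch {
  val SchedulerModeKey = "spark.scheduler.mode"             // standard Spark key
  val AllocationFileKey = "spark.scheduler.allocation.file" // standard Spark key

  /** Returns extra configs to add; `conf` stands in for SparkConf.
    * `conf(SchedulerModeKey)` would throw NoSuchElementException when the key is
    * unset, so the Option-returning lookup is used instead. */
  def extraConfigs(conf: Map[String, String], poolFile: String): Map[String, String] =
    conf.get(SchedulerModeKey) match {
      case Some(mode) if mode.equalsIgnoreCase("FAIR") => Map(AllocationFileKey -> poolFile)
      case _ => Map.empty // leave the conf untouched when FAIR scheduling is off
    }
}
```

Running the check before the context is built keeps an unused allocation-file setting out of the SparkConf entirely.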
Jaimin Shah
17e878f721 adding support for complex keys (#728)
- Resolving the issue related to ambiguity in recordKey by creating and parsing a JSON object as a string.
- added unit test for ComplexKeyGenerator
- minor changes
2019-06-21 00:25:06 -07:00
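The recordKey ambiguity mentioned above arises when several key fields are simply joined with a separator: values that themselves contain the separator can collide. A minimal sketch of the encoding side, assuming the key is a JSON object of field name to value; `ComplexKeySketch` and its hand-rolled escaping are hypothetical and are not the code of `ComplexKeyGenerator`.

```scala
// Hypothetical sketch: build an unambiguous record key from several fields.
object ComplexKeySketch {
  // Minimal JSON string quoting: escapes quotes, backslashes and control characters.
  private def quote(s: String): String = {
    val sb = new StringBuilder("\"")
    s.foreach {
      case '"'          => sb.append("\\\"")
      case '\\'         => sb.append("\\\\")
      case c if c < ' ' => sb.append("\\u%04x".format(c.toInt))
      case c            => sb.append(c)
    }
    sb.append('"').toString
  }

  /** Encodes (field name, value) pairs, in order, as a JSON object string. */
  def recordKey(fields: Seq[(String, String)]): String =
    fields.map { case (k, v) => s"${quote(k)}:${quote(v)}" }.mkString("{", ",", "}")

  /** The ambiguous alternative: join the values with a comma. */
  def naiveKey(values: Seq[String]): String = values.mkString(",")
}
```

Here `naiveKey(Seq("a,b", "c"))` and `naiveKey(Seq("a", "b,c"))` both yield `a,b,c`, while the JSON-encoded keys differ.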
Ron Barabash
1b61eb45e0 Adding support for optionally skipping single archiving failures 2019-06-20 22:54:45 -07:00
Balaji Varadarajan
66c7fa2322 Reword confusing message and reduce the severity level 2019-06-20 22:46:09 -07:00
Balaji Varadarajan
8223127611 Add maprfs to storage schemes 2019-06-20 22:45:35 -07:00
Balaji Varadarajan
2c40e8419e Ensure TableMetaClient and FileSystem instances have an exclusive copy of Configuration 2019-06-20 14:05:00 -07:00
Balaji Varadarajan
a0d7ab2384 HUDI-70 : Making DeltaStreamer run in continuous mode with concurrent compaction 2019-06-18 17:48:14 -07:00
Balaji Varadarajan
3a210ef08e Disable Notice Plugin 2019-06-18 11:33:26 -07:00
Balaji Varadarajan
a1483f2c5f HUDI-148 Small File selection logic for MOR must skip fileIds selected for pending compaction correctly 2019-06-17 18:35:17 -07:00
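The rule in this commit amounts to a filter over the candidate files. A sketch under assumed types: `FileSlice`, `smallFileCandidates` and the byte-size threshold are hypothetical stand-ins, not Hudi classes.

```scala
// Hypothetical sketch: pick small files for new inserts while skipping file groups
// that a pending compaction has already claimed.
object SmallFileSketch {
  final case class FileSlice(fileId: String, sizeBytes: Long)

  def smallFileCandidates(
      slices: Seq[FileSlice],
      pendingCompactionFileIds: Set[String],
      smallFileLimitBytes: Long): Seq[FileSlice] =
    slices.filter { s =>
      s.sizeBytes < smallFileLimitBytes &&         // only files below the size limit
      !pendingCompactionFileIds.contains(s.fileId) // never files awaiting compaction
    }
}
```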
vinoth chandar
8c9980f4f5 Update README.md 2019-06-17 18:19:34 -07:00
Nishith Agarwal
8e08d498c9 Reading baseCommitTime from the latest file slice as opposed to the tagged record value 2019-06-17 16:46:16 -07:00
Nishith Agarwal
129e433641 - Upgrading to Hive 2.x
- Eliminating in-memory deltaRecordsMap
- Use writerSchema to generate generic record needed by custom payloads
- changes to make tests work with hive 2.x
2019-06-13 12:46:14 -07:00
Balaji Varadarajan
cd7623e216 All opened hoodie clients in tests need to be closed
TestMergeOnReadTable must use embedded timeline server
2019-06-13 12:30:07 -07:00
Balaji Varadarajan
136f8478a3 TestMergeOnReadTable must use embedded timeline server 2019-06-12 19:08:09 -07:00
Balaji Varadarajan
04fc86b43d Turn on embedded server for all client tests 2019-06-12 18:14:55 -07:00
Balaji Varadarajan
1c943ab230 Ensure log files are consistently ordered when scanning 2019-06-12 16:16:37 -07:00
Vinoth Chandar
b791473a6d Introduce HoodieReadHandle abstraction into index
- Generalized BloomIndex to work with file ids instead of paths
- Abstracted away Bloom filter checking into HoodieLookupHandle
- Abstracted away range information retrieval into HoodieRangeInfoHandle
2019-06-12 10:46:14 -07:00
Balaji Varadarajan
51d122b5c3 Close opened Hoodie Clients to properly shut down the embedded timeline service 2019-06-11 20:22:14 -07:00
Balaji Varadarajan
065173211e HUDI-147 Compaction Inflight Rollback not deleting Marker directory 2019-06-09 11:45:54 -07:00
Balaji Varadarajan
479908fd20 HUDI-125 : Change License for all source files and update RAT configurations 2019-06-09 11:41:55 -07:00
Balaji Varadarajan
30b0f2636f Changes related to Licensing work
1. Went through the dependency list once to ensure compliance. Generated the current NOTICE list in all submodules (other Apache projects like Flink do this).
   To be on the conservative side regarding licensing, NOTICE.txt lists all dependencies, including transitive ones. Pending compliance questions are reported in https://issues.apache.org/jira/browse/LEGAL-461
2. Automated the generation of NOTICE.txt files so that future package compliance issues can be identified early as part of the code-review process.
3. Added NOTICE.txt and LICENSE.txt to all HUDI jars
2019-06-07 17:58:57 -07:00
guanjianhui
173e0b6be4 exclude fasterxml and parquet from presto bundle 2019-06-07 11:33:43 -07:00
guanjianhui
b325cbff10 set codehaus.jackson modules to the same version 1.9.13 2019-06-07 11:33:43 -07:00
Balaji Varadarajan
45e65cc2f7 Auto generated Slack Channel Notifications setup 2019-06-07 06:46:00 -07:00
Balaji Varadarajan
5ae34db764 Replace Non-Compliant dnl.utils package with Apache 2.0 licensed alternative 2019-06-06 22:33:33 -07:00
Balaji Varadarajan
a0391b7c01 LogFile comparator must handle log file names without write token for backwards compatibility 2019-06-06 10:00:31 -07:00
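A sketch of one way such a comparator can stay backwards compatible, assuming the log file name has already been parsed into fields and that a missing write token is modeled as `None`; Scala's built-in `Ordering` for `Option` places `None` before any `Some`. The case class, field order and ordering are hypothetical and are not Hudi's actual log file comparator.

```scala
// Hypothetical sketch: order parsed log files so that names without a write token
// (written before write tokens existed) still compare consistently.
object LogFileOrderingSketch {
  final case class LogFile(fileId: String, baseCommitTime: String, version: Int, writeToken: Option[String])

  // Compare by file id, base commit time and version, then write token;
  // the standard Ordering for Option puts None before any Some(...).
  implicit val logFileOrdering: Ordering[LogFile] =
    Ordering.by((lf: LogFile) => (lf.fileId, lf.baseCommitTime, lf.version, lf.writeToken))
}
```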
Thinking
66893bfef2 fix spark-shell add jar problem
jira link https://issues.apache.org/jira/browse/HUDI-101
issue link https://github.com/apache/incubator-hudi/issues/516#issue-386048519

When using spark-shell with hoodie to save data like this:
```shell
./spark-shell --master yarn --jars /home/hdfs/software/spark/hoodie/hoodie-spark-bundle-0.4.8-SNAPSHOT.jar --conf spark.sql.hive.convertMetastoreParquet=false --packages com.databricks:spark-avro_2.11:4.0.0
```
and
```scala
import com.uber.hoodie.DataSourceWriteOptions
import com.uber.hoodie.common.model.HoodieTableType
import com.uber.hoodie.config.HoodieWriteConfig
import org.apache.spark.sql.SaveMode

// inputDF: an existing DataFrame with _row_key, partition and extend_deal_date columns
inputDF.write.format("com.uber.hoodie")
        .option("hoodie.insert.shuffle.parallelism", "1") // any hoodie client config can be passed like this
        .option("hoodie.upsert.shuffle.parallelism", "1") // full list in HoodieWriteConfig & its package
        .option(DataSourceWriteOptions.STORAGE_TYPE_OPT_KEY, HoodieTableType.COPY_ON_WRITE.name())
        .option(DataSourceWriteOptions.OPERATION_OPT_KEY, DataSourceWriteOptions.UPSERT_OPERATION_OPT_VAL) // or INSERT_OPERATION_OPT_VAL for plain inserts
        .option(DataSourceWriteOptions.RECORDKEY_FIELD_OPT_KEY, "_row_key")
        .option(DataSourceWriteOptions.PARTITIONPATH_FIELD_OPT_KEY, "partition")
        .option(DataSourceWriteOptions.PRECOMBINE_FIELD_OPT_KEY, "extend_deal_date")
        .option(HoodieWriteConfig.TABLE_NAME, "c_upload_code")
        .mode(SaveMode.Overwrite)
        .save("/tmp/test/hoodie")
```
It also reports the error `Invalid signature file digest for Manifest main attributes`. All affected dependencies need to be scanned.
2019-06-03 15:01:43 -07:00
Vinoth Chandar
7b4a28ecf8 Move dependency repos to https URLs 2019-05-31 20:37:03 -07:00
Vinoth Chandar
acd74129cd Create hoodie-utilities-bundle to host the shaded jar
- hoodie-utilities can now be pulled in as compile time dependency
  - Lets users test their DeltaStreamer transformers, for example
  - Tested that the docker demo works and takes in the bundle
  - Doc changes to follow, to move DeltaStreamer commands to bundle jar
2019-05-30 22:46:24 -07:00
Vinoth Chandar
a5e2439514 Turn off noisy test 2019-05-30 21:35:12 -07:00
Vinoth Chandar
3b916ec1af Add support for maven deploy plugin to make snapshot releases 2019-05-30 21:35:12 -07:00
guanjianhui
6b5abb5d92 fix maven pom 2019-05-29 16:16:29 -07:00
Balaji Varadarajan
d860fb18b6 HUDI-139 Compaction running twice due to duplicate "map" transformation while finalizing compaction 2019-05-29 15:12:30 -07:00
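Spark transformations such as `map` are lazy, so a mapped RDD that is consumed by two actions runs its function twice unless the result is persisted (and even cached partitions can be recomputed if evicted). The sketch below shows the same pitfall without Spark, using a Scala collection view as a stand-in for the RDD; the file ids and counters are illustrative, and the exact code touched by this commit is not shown in the log.

```scala
// Illustrative sketch: a lazy map consumed twice runs its side effect twice.
object RecomputationSketch {
  /** Counts how many times the "compaction" function runs in each variant. */
  def run(): (Int, Int) = {
    val fileIds = Seq("f1", "f2", "f3")

    // Lazy pipeline: like an RDD.map, nothing runs until it is consumed.
    var lazyRuns = 0
    val lazyResults = fileIds.view.map { id => lazyRuns += 1; s"compacted-$id" }
    lazyResults.toList // first consumer (e.g. writing commit metadata)
    lazyResults.toList // second consumer re-runs the map for every element

    // Materialized once (analogous to persisting or collecting), then reused.
    var eagerRuns = 0
    val eagerResults = fileIds.map { id => eagerRuns += 1; s"compacted-$id" }
    eagerResults.toList
    eagerResults.toList

    (lazyRuns, eagerRuns)
  }
}
```

`RecomputationSketch.run()` returns `(6, 3)`: the lazy pipeline does the work once per consumer, the materialized one only once.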