476 Commits

Author SHA1 Message Date
Balaji Varadarajan
ae3c02fb3f HUDI-162 : File System view must be built with correct timeline actions 2019-07-14 00:48:09 -07:00
Balaji Varadarajan
5823c1ebd7 HUDI-138 - Meta Files handling also need to support consistency guard 2019-07-13 22:02:55 -07:00
Yihua Guo
621c246fa9 [HUDI-161] Remove --key-generator-class CLI arg in HoodieDeltaStreamer and use key generator class specified in datasource properties. (#781) 2019-07-12 13:45:49 -07:00
Ho Tien Vu
11c4121f73 Fixed TableNotFoundException when writing with structured streaming (#778)
- When writing to a new hoodie table, if the checkpoint dir is under the target path, Spark will create the base path and thus skip initializing .hoodie, which results in an error

- Apply the .hoodie existence check for all save modes
2019-07-12 09:17:16 -07:00
Thinking Chen
62ecb2da62 when column type is decimal, should add precision and scale (#753) 2019-07-08 16:13:22 -07:00
Balaji Varadarajan
9f18a1ca80 Fixing bugs found during running hoodie demo (#760) 2019-06-28 17:49:23 -07:00
Ho Tien Vu
e48e35385a Added preemptive check for 'spark.scheduler.mode'
When running the docker demo, a NoSuchElementException was thrown because spark.scheduler.mode is not set.
Also, we want to check before initializing the Spark Context to avoid polluting the SparkConf
with unused config.
2019-06-25 13:39:41 -07:00
Jaimin Shah
17e878f721 adding support for complex keys (#728)
- Resolving the issue related to ambiguity in recordKey by creating and parsing json object as string.
- added unit test for ComplexKeyGenerator
- minor changes
2019-06-21 00:25:06 -07:00
Ron Barabash
1b61eb45e0 Adding support for optional skipping single archiving failures 2019-06-20 22:54:45 -07:00
Balaji Varadarajan
66c7fa2322 Reword confusing message and reducing the severity level 2019-06-20 22:46:09 -07:00
Balaji Varadarajan
8223127611 Add maprfs to storage schemes 2019-06-20 22:45:35 -07:00
Balaji Varadarajan
2c40e8419e Ensure TableMetaClient and FileSystem instances have exclusive copy of Configuration 2019-06-20 14:05:00 -07:00
Balaji Varadarajan
a0d7ab2384 HUDI-70 : Making DeltaStreamer run in continuous mode with concurrent compaction 2019-06-18 17:48:14 -07:00
Balaji Varadarajan
3a210ef08e Disable Notice Plugin 2019-06-18 11:33:26 -07:00
Balaji Varadarajan
a1483f2c5f HUDI-148 Small File selection logic for MOR must skip fileIds selected for pending compaction correctly 2019-06-17 18:35:17 -07:00
vinoth chandar
8c9980f4f5 Update README.md 2019-06-17 18:19:34 -07:00
Nishith Agarwal
8e08d498c9 Reading baseCommitTime from the latest file slice as opposed to the tagged record value 2019-06-17 16:46:16 -07:00
Nishith Agarwal
129e433641 - Upgrading to Hive 2.x
- Eliminating in-memory deltaRecordsMap
- Use writerSchema to generate generic record needed by custom payloads
- changes to make tests work with hive 2.x
2019-06-13 12:46:14 -07:00
Balaji Varadarajan
cd7623e216 All opened hoodie clients in tests need to be closed
TestMergeOnReadTable must use embedded timeline server
2019-06-13 12:30:07 -07:00
Balaji Varadarajan
136f8478a3 TestMergeOnReadTable must use embedded timeline server 2019-06-12 19:08:09 -07:00
Balaji Varadarajan
04fc86b43d Turn on embedded server for all client tests 2019-06-12 18:14:55 -07:00
Balaji Varadarajan
1c943ab230 Ensure log files are consistently ordered when scanning 2019-06-12 16:16:37 -07:00
Vinoth Chandar
b791473a6d Introduce HoodieReadHandle abstraction into index
- Generalized BloomIndex to work with file ids instead of paths
 - Abstracted away Bloom filter checking into HoodieLookupHandle
 - Abstracted away range information retrieval into HoodieRangeInfoHandle
2019-06-12 10:46:14 -07:00
Balaji Varadarajan
51d122b5c3 Close Hoodie Clients which are opened to properly shutdown embedded timeline service 2019-06-11 20:22:14 -07:00
Balaji Varadarajan
065173211e HUDI-147 Compaction Inflight Rollback not deleting Marker directory 2019-06-09 11:45:54 -07:00
Balaji Varadarajan
479908fd20 HUDI-125 : Change License for all source files and update RAT configurations 2019-06-09 11:41:55 -07:00
Balaji Varadarajan
30b0f2636f Changes related to Licensing work
1. Go through the dependencies list one round to ensure compliance. Generated current NOTICE list in all submodules (other Apache projects like Flink do this).
   To be on the conservative side regarding licensing, NOTICE.txt lists all dependencies including transitive ones. Pending compliance questions reported in https://issues.apache.org/jira/browse/LEGAL-461
2. Automate generating NOTICE.txt files to allow future package compliance issues to be identified early as part of the code-review process.
3. Added NOTICE.txt and LICENSE.txt to all HUDI jars
2019-06-07 17:58:57 -07:00
guanjianhui
173e0b6be4 exclude fasterxml and parquet from presto bundle 2019-06-07 11:33:43 -07:00
guanjianhui
b325cbff10 set codehaus.jackson modules to the same version 1.9.13 2019-06-07 11:33:43 -07:00
Balaji Varadarajan
45e65cc2f7 Auto generated Slack Channel Notifications setup 2019-06-07 06:46:00 -07:00
Balaji Varadarajan
5ae34db764 Replace Non-Compliant dnl.utils package with Apache 2.0 licensed alternative 2019-06-06 22:33:33 -07:00
Balaji Varadarajan
a0391b7c01 LogFile comparator must handle log file names without write token for backwards compatibility 2019-06-06 10:00:31 -07:00
Thinking
66893bfef2 fix spark-shell add jar problem
jira link https://issues.apache.org/jira/browse/HUDI-101
issue link https://github.com/apache/incubator-hudi/issues/516#issue-386048519

When using spark-shell with hoodie to save data like:
```
./spark-shell --master yarn --jars /home/hdfs/software/spark/hoodie/hoodie-spark-bundle-0.4.8-SNAPSHOT.jar --conf spark.sql.hive.convertMetastoreParquet=false --packages com.databricks:spark-avro_2.11:4.0.0
```
and
```
inputDF.write.format("com.uber.hoodie")
        .option("hoodie.insert.shuffle.parallelism", "1") // any hoodie client config can be passed like this
        .option("hoodie.upsert.shuffle.parallelism", "1") // full list in HoodieWriteConfig & its package
        .option(DataSourceWriteOptions.STORAGE_TYPE_OPT_KEY, HoodieTableType.COPY_ON_WRITE.name())
        .option(DataSourceWriteOptions.OPERATION_OPT_KEY, DataSourceWriteOptions.UPSERT_OPERATION_OPT_VAL) // insert
        .option(DataSourceWriteOptions.RECORDKEY_FIELD_OPT_KEY, "_row_key")
        .option(DataSourceWriteOptions.PARTITIONPATH_FIELD_OPT_KEY, "partition")
        .option(DataSourceWriteOptions.PRECOMBINE_FIELD_OPT_KEY, "extend_deal_date")
        .option(HoodieWriteConfig.TABLE_NAME, "c_upload_code")
        .mode(SaveMode.Overwrite)
        .save("/tmp/test/hoodie")
```
It also reports the error `Invalid signature file digest for Manifest main attributes`. Need to scan all affected dependencies.
2019-06-03 15:01:43 -07:00
Vinoth Chandar
7b4a28ecf8 Move dependency repos to https urls 2019-05-31 20:37:03 -07:00
Vinoth Chandar
acd74129cd Create hoodie-utilities-bundle to host the shaded jar
- hoodie-utilities can now be pulled in as compile time dependency
  - Lets users test their DeltaStreamer transformers for e.g
  - Tested the docker demo works & takes in the bundle
  - Doc changes to follow, to move DeltaStreamer commands to bundle jar
2019-05-30 22:46:24 -07:00
Vinoth Chandar
a5e2439514 Turn off noisy test 2019-05-30 21:35:12 -07:00
Vinoth Chandar
3b916ec1af Add support for maven deploy plugin to make snapshot releases 2019-05-30 21:35:12 -07:00
guanjianhui
6b5abb5d92 fix maven pom 2019-05-29 16:16:29 -07:00
Balaji Varadarajan
d860fb18b6 HUDI-139 Compaction running twice due to duplicate "map" transformation while finalizing compaction 2019-05-29 15:12:30 -07:00
vinothchandar
66c0b81b49 [maven-release-plugin] prepare for next development iteration 2019-05-28 19:17:26 -07:00
vinothchandar
227785c022 [maven-release-plugin] prepare release hoodie-0.4.7 2019-05-28 19:17:15 -07:00
Vinoth Chandar
a1f287d359 Release notes for 0.4.7 2019-05-28 18:28:59 -07:00
Balaji Varadarajan
93f8f12a30 HUDI-135 - Skip Meta folder when looking for partitions 2019-05-28 17:37:10 -07:00
Balaji Varadarajan
33f5208c1e Only inflight commit timeline (.commit/.deltacommit) must be used when checking for sanity during compaction scheduling 2019-05-28 16:54:20 -07:00
Balaji Varadarajan
9c8f8212ef HUDI-134 - Disable inline compaction for Hoodie Demo 2019-05-28 11:19:48 -07:00
Balaji Varadarajan
d0d2fa0337 Reduce logging in unit-test runs 2019-05-24 23:43:54 -07:00
Venkat
f2d91a455e default implementation for HBase index qps allocator (#685)
* default implementation and configs for HBase index qps allocator

* Test for QPS allocator and address CR

* fix QPS allocator test
2019-05-24 18:43:46 -07:00
Balaji Varadarajan
99b0c72aa6 HUDI-131 Zero File Listing in Compactor run 2019-05-24 18:34:14 -07:00
Vinoth Chandar
4074c5eb23 Fixed HUDI-116 : Handle duplicate record keys across partitions
- Join based on HoodieKey and not RecordKey during tagging
 - Unit tests changed to run with duplicate keys
 - Special casing GlobalBloom to still join by recordkey
2019-05-24 18:32:49 -07:00
leiline
f120427607 HUDI-105 : Fix up offsets not available on leader exception (#650)
* Fix up offsets not available on leader exception
2019-05-23 19:32:31 -07:00