457 Commits

Author SHA1 Message Date
Balaji Varadarajan
136f8478a3 TestMergeOnReadTable must use embedded timeline server 2019-06-12 19:08:09 -07:00
Balaji Varadarajan
04fc86b43d Turn on embedded server for all client tests 2019-06-12 18:14:55 -07:00
Balaji Varadarajan
1c943ab230 Ensure log files are consistently ordered when scanning 2019-06-12 16:16:37 -07:00
Vinoth Chandar
b791473a6d Introduce HoodieReadHandle abstraction into index
- Generalized BloomIndex to work with file ids instead of paths
 - Abstracted away Bloom filter checking into HoodieLookupHandle
 - Abstracted away range information retrieval into HoodieRangeInfoHandle
2019-06-12 10:46:14 -07:00
Balaji Varadarajan
51d122b5c3 Close Hoodie Clients which are opened to properly shutdown embedded timeline service 2019-06-11 20:22:14 -07:00
Balaji Varadarajan
065173211e HUDI-147 Compaction Inflight Rollback not deleting Marker directory 2019-06-09 11:45:54 -07:00
Balaji Varadarajan
479908fd20 HUDI-125 : Change License for all source files and update RAT configurations 2019-06-09 11:41:55 -07:00
Balaji Varadarajan
30b0f2636f Changes related to Licensing work
1. Go through the dependencies list one round to ensure compliance. Generated current NOTICE list in all submodules (other Apache projects like Flink do this).
   To be on the conservative side regarding licensing, NOTICE.txt lists all dependencies, including transitive ones. Pending compliance questions are reported in https://issues.apache.org/jira/browse/LEGAL-461
2. Automate generating NOTICE.txt files so that future package compliance issues can be identified early as part of the code-review process.
3. Added NOTICE.txt and LICENSE.txt to all HUDI jars
2019-06-07 17:58:57 -07:00
guanjianhui
173e0b6be4 exclude fasterxml and parquet from presto bundle 2019-06-07 11:33:43 -07:00
guanjianhui
b325cbff10 set codehaus.jackson modules to the same version 1.9.13 2019-06-07 11:33:43 -07:00
Balaji Varadarajan
45e65cc2f7 Auto generated Slack Channel Notifications setup 2019-06-07 06:46:00 -07:00
Balaji Varadarajan
5ae34db764 Replace Non-Compliant dnl.utils package with Apache 2.0 licensed alternative 2019-06-06 22:33:33 -07:00
Balaji Varadarajan
a0391b7c01 LogFile comparator must handle log file names without write token for backwards compatibility 2019-06-06 10:00:31 -07:00
Thinking
66893bfef2 fix spark-shell add jar problem
jira link https://issues.apache.org/jira/browse/HUDI-101
issue link https://github.com/apache/incubator-hudi/issues/516#issue-386048519

When using spark-shell with hoodie to save data, launched like this:
```
./spark-shell --master yarn --jars /home/hdfs/software/spark/hoodie/hoodie-spark-bundle-0.4.8-SNAPSHOT.jar --conf spark.sql.hive.convertMetastoreParquet=false --packages com.databricks:spark-avro_2.11:4.0.0
```
and
```
inputDF.write.format("com.uber.hoodie")
        .option("hoodie.insert.shuffle.parallelism", "1") // any hoodie client config can be passed like this
        .option("hoodie.upsert.shuffle.parallelism", "1") // full list in HoodieWriteConfig & its package
        .option(DataSourceWriteOptions.STORAGE_TYPE_OPT_KEY, HoodieTableType.COPY_ON_WRITE.name())
        .option(DataSourceWriteOptions.OPERATION_OPT_KEY, DataSourceWriteOptions.UPSERT_OPERATION_OPT_VAL) // insert
        .option(DataSourceWriteOptions.RECORDKEY_FIELD_OPT_KEY, "_row_key")
        .option(DataSourceWriteOptions.PARTITIONPATH_FIELD_OPT_KEY, "partition")
        .option(DataSourceWriteOptions.PRECOMBINE_FIELD_OPT_KEY, "extend_deal_date")
        .option(HoodieWriteConfig.TABLE_NAME, "c_upload_code")
        .mode(SaveMode.Overwrite)
        .save("/tmp/test/hoodie")
```
it also reports the error `Invalid signature file digest for Manifest main attributes`. All affected dependencies need to be scanned.
2019-06-03 15:01:43 -07:00
Vinoth Chandar
7b4a28ecf8 Move dependency repos to https urls 2019-05-31 20:37:03 -07:00
Vinoth Chandar
acd74129cd Create hoodie-utilities-bundle to host the shaded jar
- hoodie-utilities can now be pulled in as compile time dependency
  - Lets users test their DeltaStreamer transformers, for example
  - Tested the docker demo works & takes in the bundle
  - Doc changes to follow, to move DeltaStreamer commands to bundle jar
2019-05-30 22:46:24 -07:00
Vinoth Chandar
a5e2439514 Turn off noisy test 2019-05-30 21:35:12 -07:00
Vinoth Chandar
3b916ec1af Add support for maven deploy plugin to make snapshot releases 2019-05-30 21:35:12 -07:00
guanjianhui
6b5abb5d92 fix maven pom 2019-05-29 16:16:29 -07:00
Balaji Varadarajan
d860fb18b6 HUDI-139 Compaction running twice due to duplicate "map" transformation while finalizing compaction 2019-05-29 15:12:30 -07:00
vinothchandar
66c0b81b49 [maven-release-plugin] prepare for next development iteration 2019-05-28 19:17:26 -07:00
vinothchandar
227785c022 [maven-release-plugin] prepare release hoodie-0.4.7 2019-05-28 19:17:15 -07:00
Vinoth Chandar
a1f287d359 Release notes for 0.4.7 2019-05-28 18:28:59 -07:00
Balaji Varadarajan
93f8f12a30 HUDI-135 - Skip Meta folder when looking for partitions 2019-05-28 17:37:10 -07:00
Balaji Varadarajan
33f5208c1e Only inflight commit timeline (.commit/.deltacommit) must be used when checking for sanity during compaction scheduling 2019-05-28 16:54:20 -07:00
Balaji Varadarajan
9c8f8212ef HUDI-134 - Disable inline compaction for Hoodie Demo 2019-05-28 11:19:48 -07:00
Balaji Varadarajan
d0d2fa0337 Reduce logging in unit-test runs 2019-05-24 23:43:54 -07:00
Venkat
f2d91a455e default implementation for HBase index qps allocator (#685)
* default implementation and configs for HBase index qps allocator

* Test for QPS allocator and address CR

* fix QPS allocator test
2019-05-24 18:43:46 -07:00
Balaji Varadarajan
99b0c72aa6 HUDI-131 Zero File Listing in Compactor run 2019-05-24 18:34:14 -07:00
Vinoth Chandar
4074c5eb23 Fixed HUDI-116 : Handle duplicate record keys across partitions
- Join based on HoodieKey and not RecordKey during tagging
 - Unit tests changed to run with duplicate keys
 - Special casing GlobalBloom to still join by recordkey
2019-05-24 18:32:49 -07:00
leiline
f120427607 HUDI-105 : Fix up offsets not available on leader exception (#650)
* Fix up offsets not available on leader exception
2019-05-23 19:32:31 -07:00
Balaji Varadarajan
2fe526d548 Allow users to set hoodie configs for Compactor, Cleaner and HDFSParquetImporter utility scripts 2019-05-23 17:35:53 -07:00
Balaji Varadarajan
145034c5fa Spark Stage retry handling 2019-05-21 14:49:51 -07:00
David Muto (pseudomuto)
3fd2fd6e9d Remove redundant string from file comp rdd 2019-05-21 13:07:32 -07:00
Balaji Varadarajan
a7e6cf5197 Support nested types for recordKey, partitionPath and combineKey 2019-05-18 07:14:58 -07:00
Vinoth Chandar
e43efa042f Downgrading fasterxml jackson to 2.6.7 to be spark compatible 2019-05-16 13:53:54 -07:00
Balaji Varadarajan
64fec64097 Timeline Service with Incremental View Syncing support 2019-05-16 13:25:33 -07:00
vinothchandar
446f99aa0f [maven-release-plugin] prepare for next development iteration 2019-05-14 07:29:22 -07:00
vinothchandar
cc38abecc8 [maven-release-plugin] prepare release hoodie-0.4.6 2019-05-14 07:29:11 -07:00
Vinoth Chandar
7002ca6775 Update release notes for 0.4.6 release 2019-05-14 05:16:58 -07:00
Balaji Varadarajan
6e1e626357 Minor CLI documentation change in delta-streamer 2019-05-14 04:05:47 -07:00
Nishith Agarwal
af46078a82 converting map task memory from mb to bytes 2019-05-13 21:23:30 -07:00
Balaji Varadarajan
9cce9abf4d Fix various errors found by long running delta-streamer tests
1. Parquet Avro schema mismatch errors when ingesting are sometimes silently ignored due to race-condition in BoundedInMemoryExecutor. This was reproducible when running long-running delta-streamer with wrong schema and it caused data-loss
2. Fix behavior of Delta-Streamer to error out by default if there are any error records
3. Fix a bug in tracking write errors in WriteStats. Earlier the write errors were tracking sampled errors as opposed to total errors.
4. Delta Streamer does not commit the changes done as part of inline compaction as auto-commit is force disabled. Fix this behavior to always auto-commit inline compaction as it would not otherwise commit.
2019-05-13 10:47:34 -07:00
Vinoth Chandar
a0e62b7919 Bucketized Bloom Filter checking
- Tackles the skew seen in sort based partitioning/checking
 - Parameterized the HoodieBloomIndex test
 - Config to turn on/off (on by default)
 - Unit tests & also tested at scale
2019-05-11 16:38:28 -07:00
David Muto (pseudomuto)
4b27cc72bb Don't raise when spark-defaults.conf doesn't exist 2019-05-08 17:30:23 -07:00
Abhishek Sharma
e2dcef8606 HUDI-101: added exclusion filters for signature files. 2019-05-07 18:35:18 -07:00
Omkar Joshi
738635306b migrating kryo's dependency from twitter chill to plain kryo library 2019-05-06 20:32:00 -07:00
Nishith Agarwal
a33a55fcb5 Caching Avro Binary encoder/decoder to avoid creating new one for every record 2019-05-06 11:28:08 -07:00
Balaji Varadarajan
ee1feb7c75 Revert "HUDI-101: added mevn-shade plugin with filters."
Creates fat jars for all hoodie packages

This reverts commit f47f0eb6cb.
2019-05-05 18:39:38 -07:00
Abhishek Sharma
f47f0eb6cb HUDI-101: added mevn-shade plugin with filters. 2019-05-03 13:49:51 -07:00