Commit Graph

569 Commits

Author SHA1 Message Date
leesf
8b150a3c6b [HUDI-230] Add missing Apache License in some files 2019-08-30 09:38:28 -07:00
Balaji Varadarajan
376b59ae5f [HUDI-227] : DeltaStreamer Improvements : Commit empty input batch with progressing checkpoints and allow users to override configs through properties. Original PR : PR-805 and PR-806 (#863) 2019-08-30 09:13:34 -07:00
Balaji Varadarajan
a6908ef44d HUDI-170 Updating hoodie record before inserting it into ExternalSpillableMap (#866) 2019-08-30 09:03:37 -07:00
leesf
40dd4dd637 [HUDI-229] Fix mvn notice:generate issue in windows 2019-08-30 00:16:24 -07:00
leesf
5c2da6051e [HUDI-225] Create Hudi Timeline Server Fat Jar 2019-08-29 20:03:06 -07:00
Balaji Varadarajan
5f9fa82f47 HUDI-124 : Exclude jdk.tools from hadoop-common and update Notice files (#858) 2019-08-28 16:20:47 -07:00
leesf
00cfe72c5d [hotfix] change hoodie-timeline-*.jar to hudi-timeline-*.jar 2019-08-28 13:59:33 -07:00
leesf
b44f8521f2 [HUDI-222] Rename main class path to org.apache.hudi.timeline.service.TimelineService in run_server.sh 2019-08-28 13:59:33 -07:00
Alex Filipchik
41dbac6903 Fixed unit test 2019-08-28 06:19:43 -07:00
Alex Filipchik
b5d4da7958 Addressing comments 2019-08-28 06:19:43 -07:00
Alex Filipchik
baea4f3b82 Ignore duplicate of a compaction file 2019-08-28 06:19:43 -07:00
Alexander Filipchik
e0ab89b3ac [HUDI-223] Adding a way to infer target schema from the dataset after the transformation (#854)
- Adding a way to decouple target and source schema providers
- Adding flattening transformer
2019-08-28 04:48:38 -07:00
Vinoth Chandar
78e0721507 [HUDI-159] Precursor cleanup to reduce build warnings 2019-08-26 19:41:00 -07:00
Balaji Varadarajan
c265b4948f HUDI-128 Preparing POM for release and snapshot builds (#851) 2019-08-26 08:52:36 -07:00
vinoth chandar
cd090871a1 [HUDI-159]: Pom cleanup and removal of com.twitter.parquet
- Redo all classes based on org.parquet only
 - remove unused dependencies like parquet-hadoop, common-configuration2
 - timeline-service does not build a fat jar anymore
 - Fix utilities and hadoop-mr bundles based on above
2019-08-25 16:01:14 -07:00
vinoth chandar
6edf0b9def [HUDI-68] Pom cleanup & demo automation (#846)
- [HUDI-172] Cleanup Maven POM/Classpath
  - Fix ordering of dependencies in poms, to enable better resolution
  - Idea is to place more specific ones at the top
  - And place dependencies which use them below them
- [HUDI-68] : Automate demo steps on docker setup
 - Move hive queries from hive cli to beeline
 - Standardize on taking query input from text command files
 - Deltastreamer ingest, also does hive sync in a single step
 - Spark Incremental Query materialized as a derived Hive table using datasource
 - Fix flakiness in HDFS spin up and output comparison
 - Code cleanup around streamlining and loc reduction
 - Also fixed pom to not shade some hive classes in spark, to enable hive sync
2019-08-22 20:18:50 -07:00
Bhavani Sudha Saktheeswaran
92eed6aca8 [HUDI-82] Adds Presto integration in Docker demo (#847) 2019-08-22 19:40:36 -07:00
leesf
1b79ef7672 HUDI-212: Specify Charset to UTF-8 for IOUtils.toString (#837) 2019-08-16 08:27:19 -07:00
vinoyang
8f5e7ad5d9 [HUDI-205] Let checkstyle ban Java and Guava Optional instead of using Option provided by Hudi (#834) 2019-08-13 17:13:52 -07:00
Balaji Varadarajan
4787076c6d HUDI-204 : Make MOR rollback idempotent and disable using rolling stats for small file selection (#833) 2019-08-13 17:13:30 -07:00
Nishith Agarwal
8d37fbf0db Adding GPG Keys 2019-08-12 12:49:10 -07:00
Balaji Varadarajan
a4f9d7575f HUDI-123 Rename code packages/constants to org.apache.hudi (#830)
- Rename com.uber.hoodie to org.apache.hudi
- Flag to pass com.uber.hoodie Input formats for hoodie-sync
- Works with HUDI demo. 
- Also tested for backwards compatibility with datasets built by com.uber.hoodie packages
- Migration guide : https://cwiki.apache.org/confluence/display/HUDI/Migration+Guide+From+com.uber.hoodie+to+org.apache.hudi
2019-08-11 17:48:17 -07:00
yanghua
722b6be04a [HUDI-153] Use com.uber.hoodie.common.util.Option instead of Java and Guava Optional 2019-08-07 11:53:59 -07:00
garyli1019
d288e32833 HUDI-171 delete tmp file in addShutDownHook 2019-08-05 17:24:29 -07:00
Balaji Varadarajan
ec965892b0 HUDI-149 - Remove platform dependencies and update NOTICE plugin 2019-08-05 08:57:15 -07:00
n3nash
a066865bd6 - Adding HoodieCombineHiveInputFormat for COW tables (#811)
- Combine input format helps to reduce large scans into smaller ones by combining map tasks
- Implementation to support Hive 2.x and above
2019-08-03 08:44:01 -07:00
n3nash
1a29d46a57 - Fix realtime queries by removing COLUMN_ID and COLUMN_NAME cache in inputformat (#814)
- Hive on Spark will NOT work for RT tables after this patch
2019-08-02 16:06:34 -07:00
venkatr
86b5fcdd33 Cache RDD to avoid recomputing data ingestion. Return result RDD after updating index so that this step is not skipped by chained actions on the same RDD 2019-08-02 12:40:14 -07:00
Balaji Varadarajan
8139ffd94c HUDI-197 Hive Sync and other CLIs using bundle picking sources jar instead of binary jar 2019-08-02 09:07:45 -07:00
vinothchandar
8ddfa2ecda HUDI-178 : Add keys for vinoth to KEYS file 2019-08-02 05:25:44 -07:00
Anbu Cheeralan
69d2afd0a9 Update Keys with anchee@apache.org 2019-08-01 12:30:02 -07:00
Luke Zhu
171901a9d0 Fix typo in hoodie-presto-bundle (#818) 2019-08-01 08:51:57 -07:00
Balaji Varadarajan
6e0ff3a235 Generate Source Jars for bundle packages (#810) 2019-07-30 18:17:14 -07:00
Vinoth Chandar
e20b77be3b HUDI-92 : Making deltastreamer with DistributedTestSource also run locally
- Separating out the test data generators per partition
 - Minor logging improvements on IOHandle performance
2019-07-30 16:30:47 -07:00
vinoyang
68464c7d02 [HUDI-181] Fix the Bold markdown grammar issue of README file (#808) 2019-07-30 03:47:53 -07:00
eisig
e0648de2ef HUDI-175 - add an option to manually override the DeltaStreamer checkpoint (#798)
- Add cli option to allow overriding the checkpoint using `--checkpoint`
- Persist overridden checkpoint into commit metadata
2019-07-29 10:40:02 -07:00
Balaji Varadarajan
9265c7cc36 Add balaji gpg key to KEYS file 2019-07-29 06:41:41 -07:00
Balaji Varadarajan
83dab21ae1 Allow HoodieWrapperFileSystem to wrap other proxy file-system implementations with no getScheme implementation (#793) 2019-07-24 21:31:46 -07:00
Balaji Varadarajan
0b451b3a58 HUDI-140 : GCS: Log File Reading not working due to difference in seek() behavior for EOF 2019-07-19 12:38:28 -07:00
eisig
9857c4b21c add jssc.stop() (#797) 2019-07-19 05:01:45 -07:00
n3nash
6efa16317c Fixing default value for avro 1.7 which assumes NULL value instead of a jsonnode that is null (#792) 2019-07-17 03:25:54 -07:00
Balaji Varadarajan
3d408ee96b HUDI-168 Ensure getFileStatus calls for files getting written is done after close() is called (#788) 2019-07-16 17:33:34 -07:00
eisig
c0593e7a13 fix HoodieLogFileReader (#787) 2019-07-15 13:25:55 -07:00
Balaji Varadarajan
ae3c02fb3f HUDI-162 : File System view must be built with correct timeline actions 2019-07-14 00:48:09 -07:00
Balaji Varadarajan
5823c1ebd7 HUDI-138 - Meta Files handling also need to support consistency guard 2019-07-13 22:02:55 -07:00
Yihua Guo
621c246fa9 [HUDI-161] Remove --key-generator-class CLI arg in HoodieDeltaStreamer and use key generator class specified in datasource properties. (#781) 2019-07-12 13:45:49 -07:00
Ho Tien Vu
11c4121f73 Fixed TableNotFoundException when write with structured streaming (#778)
- When writing to a new hoodie table, if the checkpoint dir is under the target path, Spark will create the base path and thus skip initializing .hoodie, which results in an error
- Apply the .hoodie existence check for all save modes
2019-07-12 09:17:16 -07:00
Thinking Chen
62ecb2da62 when column type is decimal, should add precision and scale (#753) 2019-07-08 16:13:22 -07:00
Balaji Varadarajan
9f18a1ca80 Fixing bugs found during running hoodie demo (#760) 2019-06-28 17:49:23 -07:00
Ho Tien Vu
e48e35385a Added preemptive check for 'spark.scheduler.mode'
When running the docker demo, a NoSuchElementException was thrown because spark.scheduler.mode is not set.
Also, we want to check before initializing the Spark Context to avoid polluting the SparkConf
with unused config.
2019-06-25 13:39:41 -07:00