529 Commits

Author SHA1 Message Date
yanghua
90bfb900aa revert setting jsc spark configuration 2019-09-12 05:15:07 -07:00
yanghua
6f2b166005 [HUDI-217] Provide a unified resource management class to standardize the resource allocation and release for hudi client test cases 2019-09-12 05:15:07 -07:00
Bhavani Sudha Saktheeswaran
64df98fc4a [HUDI-164] Fixes incorrect averageBytesPerRecord
When number of records written is zero, averageBytesPerRecord results in a huge size (division by zero and ceiled to Long.MAX_VALUE) causing OOM. This commit fixes this issue by reverse traversing the commits until a more reasonable average record size can be computed and if that is not possible returns the default configured record size.
2019-09-11 15:20:25 -07:00
Balaji Varadarajan
93bc5e2153 HUDI-243 Rename HoodieInputFormat and HoodieRealtimeInputFormat to HoodieParquetInputFormat and HoodieParquetRealtimeInputFormat 2019-09-11 14:03:01 -07:00
Vinoth Chandar
d0b9b56b7d [HUDI-143] Excluding javax.* from utilities and spark bundles
- Plus minor code review comments
2019-09-11 11:08:27 -07:00
vinoth chandar
7a973a6944 [HUDI-159] Redesigning bundles for lighter-weight integrations
- Documented principles applied for redesign at packaging/README.md
 - No longer depends on incl commons-codec, commons-io, commons-pool, commons-dbcp, commons-lang, commons-logging, avro-mapred
 - Introduce new FileIOUtils & added checkstyle rule for illegal import of above
 - Parquet, Avro dependencies moved to provided scope to enable being picked up from Hive/Spark/Presto instead
 - Pickup jackson jars for Hive sync tool from HIVE_HOME & unbundling jackson everywhere
 - Remove hive-jdbc standalone jar from being bundled in Spark/Hive/Utilities bundles
 - 6.5x reduced number of classes across bundles
2019-09-11 11:08:27 -07:00
Mehrotra
0e6f078ec4 Fix logging in HoodieSparkSqlWriter 2019-09-07 07:51:11 -07:00
leesf
07a0ea87ab [hotfix] fix typo 2019-09-06 08:31:30 -07:00
leesf
821e0dcffc [HUDI-236] Failed to close stream 2019-09-03 19:24:11 -07:00
Alex Filipchik
555dd55c16 Support nested ordering fields 2019-08-30 13:41:16 -07:00
leesf
8b150a3c6b [HUDI-230] Add missing Apache License in some files 2019-08-30 09:38:28 -07:00
Balaji Varadarajan
376b59ae5f [HUDI-227] : DeltaStreamer Improvements : Commit empty input batch with progressing checkpoints and allow users to override configs through properties. Original PR : PR-805 and PR-806 (#863) 2019-08-30 09:13:34 -07:00
Balaji Varadarajan
a6908ef44d HUDI-170 Updating hoodie record before inserting it into ExternalSpillableMap (#866) 2019-08-30 09:03:37 -07:00
leesf
40dd4dd637 [HUDI-229] Fix mvn notice:generate issue in windows 2019-08-30 00:16:24 -07:00
leesf
5c2da6051e [HUDI-225] Create Hudi Timeline Server Fat Jar 2019-08-29 20:03:06 -07:00
Balaji Varadarajan
5f9fa82f47 HUDI-124 : Exclude jdk.tools from hadoop-common and update Notice files (#858) 2019-08-28 16:20:47 -07:00
leesf
00cfe72c5d [hotfix] change hoodie-timeline-*.jar to hudi-timeline-*.jar 2019-08-28 13:59:33 -07:00
leesf
b44f8521f2 [HUDI-222] Rename main class path to org.apache.hudi.timeline.service.TimelineService in run_server.sh 2019-08-28 13:59:33 -07:00
Alex Filipchik
41dbac6903 Fixed unit test 2019-08-28 06:19:43 -07:00
Alex Filipchik
b5d4da7958 Addressing comments 2019-08-28 06:19:43 -07:00
Alex Filipchik
baea4f3b82 Ignore dublicate of a compaction file 2019-08-28 06:19:43 -07:00
Alexander Filipchik
e0ab89b3ac [HUDI-223] Adding a way to infer target schema from the dataset after the transformation (#854)
- Adding a way to decouple target and source schema providers
- Adding flattening transformer
2019-08-28 04:48:38 -07:00
Vinoth Chandar
78e0721507 [HUDI-159] Precursor cleanup to reduce build warnings 2019-08-26 19:41:00 -07:00
Balaji Varadarajan
c265b4948f HUDI-128 Preparing POM for release and snapshot builds (#851) 2019-08-26 08:52:36 -07:00
vinoth chandar
cd090871a1 [HUDI-159]: Pom cleanup and removal of com.twitter.parquet
- Redo all classes based on org.parquet only
 - remove unuused dependencies like parquet-hadoop, common-configuration2
 - timeline-service does not build a fat jar anymore
 - Fix utilities and hadoop-mr bundles based on above
2019-08-25 16:01:14 -07:00
vinoth chandar
6edf0b9def [HUDI-68] Pom cleanup & demo automation (#846)
- [HUDI-172] Cleanup Maven POM/Classpath
  - Fix ordering of dependencies in poms, to enable better resolution
  - Idea is to place more specific ones at the top
  - And place dependencies which use them below them
- [HUDI-68] : Automate demo steps on docker setup
 - Move hive queries from hive cli to beeline
 - Standardize on taking query input from text command files
 - Deltastreamer ingest, also does hive sync in a single step
 - Spark Incremental Query materialized as a derived Hive table using datasource
 - Fix flakiness in HDFS spin up and output comparison
 - Code cleanup around streamlining and loc reduction
 - Also fixed pom to not shade some hive classs in spark, to enable hive sync
2019-08-22 20:18:50 -07:00
Bhavani Sudha Saktheeswaran
92eed6aca8 [HUDI-82] Adds Presto integration in Docker demo (#847) 2019-08-22 19:40:36 -07:00
leesf
1b79ef7672 HUDI-212: Specify Charset to UTF-8 for IOUtils.toString (#837) 2019-08-16 08:27:19 -07:00
vinoyang
8f5e7ad5d9 [HUDI-205] Let checkstyle ban Java and Guava Optional instead of using Option provided by Hudi (#834) 2019-08-13 17:13:52 -07:00
Balaji Varadarajan
4787076c6d HUDI-204 : Make MOR rollback idempotent and disable using rolling stats for small file selection (#833) 2019-08-13 17:13:30 -07:00
Nishith Agarwal
8d37fbf0db Adding GPG Keys 2019-08-12 12:49:10 -07:00
Balaji Varadarajan
a4f9d7575f HUDI-123 Rename code packages/constants to org.apache.hudi (#830)
- Rename com.uber.hoodie to org.apache.hudi
- Flag to pass com.uber.hoodie Input formats for hoodie-sync
- Works with HUDI demo. 
- Also tested for backwards compatibility with datasets built by com.uber.hoodie packages
- Migration guide : https://cwiki.apache.org/confluence/display/HUDI/Migration+Guide+From+com.uber.hoodie+to+org.apache.hudi
2019-08-11 17:48:17 -07:00
yanghua
722b6be04a [HUDI-153] Use com.uber.hoodie.common.util.Option instead of Java and Guava Optional 2019-08-07 11:53:59 -07:00
garyli1019
d288e32833 HUDI-171 delete tmp file in addShutDownHook 2019-08-05 17:24:29 -07:00
Balaji Varadarajan
ec965892b0 HUDI-149 - Remove platform dependencies and update NOTICE plugin 2019-08-05 08:57:15 -07:00
n3nash
a066865bd6 - Adding HoodieCombineHiveInputFormat for COW tables (#811)
- Combine input format helps to reduce large scans into smaller ones by combining map tasks
- Implementation to support Hive 2.x and above
2019-08-03 08:44:01 -07:00
n3nash
1a29d46a57 - Fix realtime queries by removing COLUMN_ID and COLUMN_NAME cache in inputformat (#814)
- Hive on Spark will NOT work for RT tables after this patch
2019-08-02 16:06:34 -07:00
venkatr
86b5fcdd33 Cache RDD to avoid recomputing data ingestion. Return result RDD after updating index so that this step is not skipped by chained actions on the same RDD 2019-08-02 12:40:14 -07:00
Balaji Varadarajan
8139ffd94c HUDI-197 Hive Sync and othe CLIs using bundle picking sources jar instead of binary jar 2019-08-02 09:07:45 -07:00
vinothchandar
8ddfa2ecda HUDI-178 : Add keys for vinoth to KEYS file 2019-08-02 05:25:44 -07:00
Anbu Cheeralan
69d2afd0a9 Update Keys with anchee@apache.org 2019-08-01 12:30:02 -07:00
Luke Zhu
171901a9d0 Fix typo in hoodie-presto-bundle (#818) 2019-08-01 08:51:57 -07:00
Balaji Varadarajan
6e0ff3a235 Generate Source Jars for bundle packages (#810) 2019-07-30 18:17:14 -07:00
Vinoth Chandar
e20b77be3b HUDI-92 : Making deltastreamer with DistributedTestSource also run locally
- Separating out the test data generators per partition
 - Minor logging improvements on IOHandle performance
2019-07-30 16:30:47 -07:00
vinoyang
68464c7d02 [HUDI-181] Fix the Bold markdown grammar issue of README file (#808) 2019-07-30 03:47:53 -07:00
eisig
e0648de2ef HUDI-175 - add an option to manually override the DeltaStreamer checkpoint (#798)
- Add cli option to allow override the checkpoint using `--checkpoint` 
- Persist overridden checkpoint into commit metadata
2019-07-29 10:40:02 -07:00
Balaji Varadarajan
9265c7cc36 Add balaji gpg key to KEYS file 2019-07-29 06:41:41 -07:00
Balaji Varadarajan
83dab21ae1 Allow HoodieWrapperFileSystem to wrap other proxy file-system implementations with no getScheme implementation (#793) 2019-07-24 21:31:46 -07:00
Balaji Varadarajan
0b451b3a58 HUDI-140 : GCS: Log File Reading not working due to difference in seek() behavior for EOF 2019-07-19 12:38:28 -07:00
eisig
9857c4b21c add jssc.stop() (#797) 2019-07-19 05:01:45 -07:00