Balaji Varadarajan
c265b4948f
HUDI-128 Preparing POM for release and snapshot builds ( #851 )
2019-08-26 08:52:36 -07:00
vinoth chandar
cd090871a1
[HUDI-159]: Pom cleanup and removal of com.twitter.parquet
...
- Redo all classes based on org.parquet only
- remove unused dependencies like parquet-hadoop, common-configuration2
- timeline-service does not build a fat jar anymore
- Fix utilities and hadoop-mr bundles based on above
2019-08-25 16:01:14 -07:00
vinoth chandar
6edf0b9def
[HUDI-68] Pom cleanup & demo automation ( #846 )
...
- [HUDI-172] Cleanup Maven POM/Classpath
- Fix ordering of dependencies in poms, to enable better resolution
- Idea is to place more specific ones at the top
- And place dependencies which use them below them
- [HUDI-68] : Automate demo steps on docker setup
- Move hive queries from hive cli to beeline
- Standardize on taking query input from text command files
- Deltastreamer ingest, also does hive sync in a single step
- Spark Incremental Query materialized as a derived Hive table using datasource
- Fix flakiness in HDFS spin up and output comparison
- Code cleanup around streamlining and loc reduction
- Also fixed pom to not shade some hive classes in spark, to enable hive sync
2019-08-22 20:18:50 -07:00
Bhavani Sudha Saktheeswaran
92eed6aca8
[HUDI-82] Adds Presto integration in Docker demo ( #847 )
2019-08-22 19:40:36 -07:00
leesf
1b79ef7672
HUDI-212: Specify Charset to UTF-8 for IOUtils.toString ( #837 )
2019-08-16 08:27:19 -07:00
vinoyang
8f5e7ad5d9
[HUDI-205] Let checkstyle ban Java and Guava Optional in favor of the Option provided by Hudi ( #834 )
2019-08-13 17:13:52 -07:00
Balaji Varadarajan
4787076c6d
HUDI-204 : Make MOR rollback idempotent and disable using rolling stats for small file selection ( #833 )
2019-08-13 17:13:30 -07:00
Nishith Agarwal
8d37fbf0db
Adding GPG Keys
2019-08-12 12:49:10 -07:00
Balaji Varadarajan
a4f9d7575f
HUDI-123 Rename code packages/constants to org.apache.hudi ( #830 )
...
- Rename com.uber.hoodie to org.apache.hudi
- Flag to pass com.uber.hoodie Input formats for hoodie-sync
- Works with HUDI demo.
- Also tested for backwards compatibility with datasets built by com.uber.hoodie packages
- Migration guide : https://cwiki.apache.org/confluence/display/HUDI/Migration+Guide+From+com.uber.hoodie+to+org.apache.hudi
2019-08-11 17:48:17 -07:00
yanghua
722b6be04a
[HUDI-153] Use com.uber.hoodie.common.util.Option instead of Java and Guava Optional
2019-08-07 11:53:59 -07:00
garyli1019
d288e32833
HUDI-171 delete tmp file in addShutDownHook
2019-08-05 17:24:29 -07:00
Balaji Varadarajan
ec965892b0
HUDI-149 - Remove platform dependencies and update NOTICE plugin
2019-08-05 08:57:15 -07:00
n3nash
a066865bd6
- Adding HoodieCombineHiveInputFormat for COW tables ( #811 )
...
- Combine input format helps to reduce large scans into smaller ones by combining map tasks
- Implementation to support Hive 2.x and above
2019-08-03 08:44:01 -07:00
n3nash
1a29d46a57
- Fix realtime queries by removing COLUMN_ID and COLUMN_NAME cache in inputformat ( #814 )
...
- Hive on Spark will NOT work for RT tables after this patch
2019-08-02 16:06:34 -07:00
venkatr
86b5fcdd33
Cache RDD to avoid recomputing data ingestion. Return result RDD after updating index so that this step is not skipped by chained actions on the same RDD
2019-08-02 12:40:14 -07:00
Balaji Varadarajan
8139ffd94c
HUDI-197 Hive Sync and other CLIs using bundle picking sources jar instead of binary jar
2019-08-02 09:07:45 -07:00
vinothchandar
8ddfa2ecda
HUDI-178 : Add keys for vinoth to KEYS file
2019-08-02 05:25:44 -07:00
Anbu Cheeralan
69d2afd0a9
Update Keys with anchee@apache.org
2019-08-01 12:30:02 -07:00
Luke Zhu
171901a9d0
Fix typo in hoodie-presto-bundle ( #818 )
2019-08-01 08:51:57 -07:00
Balaji Varadarajan
6e0ff3a235
Generate Source Jars for bundle packages ( #810 )
2019-07-30 18:17:14 -07:00
Vinoth Chandar
e20b77be3b
HUDI-92 : Making deltastreamer with DistributedTestSource also run locally
...
- Separating out the test data generators per partition
- Minor logging improvements on IOHandle performance
2019-07-30 16:30:47 -07:00
vinoyang
68464c7d02
[HUDI-181] Fix the Bold markdown grammar issue of README file ( #808 )
2019-07-30 03:47:53 -07:00
eisig
e0648de2ef
HUDI-175 - add an option to manually override the DeltaStreamer checkpoint ( #798 )
...
- Add CLI option to allow overriding the checkpoint using `--checkpoint`
- Persist overridden checkpoint into commit metadata
2019-07-29 10:40:02 -07:00
Balaji Varadarajan
9265c7cc36
Add balaji gpg key to KEYS file
2019-07-29 06:41:41 -07:00
Balaji Varadarajan
83dab21ae1
Allow HoodieWrapperFileSystem to wrap other proxy file-system implementations with no getScheme implementation ( #793 )
2019-07-24 21:31:46 -07:00
Balaji Varadarajan
0b451b3a58
HUDI-140 : GCS: Log File Reading not working due to difference in seek() behavior for EOF
2019-07-19 12:38:28 -07:00
eisig
9857c4b21c
add jssc.stop() ( #797 )
2019-07-19 05:01:45 -07:00
n3nash
6efa16317c
Fixing default value for avro 1.7 which assumes NULL value instead of a jsonnode that is null ( #792 )
2019-07-17 03:25:54 -07:00
Balaji Varadarajan
3d408ee96b
HUDI-168 Ensure getFileStatus calls for files getting written are done after close() is called ( #788 )
2019-07-16 17:33:34 -07:00
eisig
c0593e7a13
fix HoodieLogFileReader ( #787 )
2019-07-15 13:25:55 -07:00
Balaji Varadarajan
ae3c02fb3f
HUDI-162 : File System view must be built with correct timeline actions
2019-07-14 00:48:09 -07:00
Balaji Varadarajan
5823c1ebd7
HUDI-138 - Meta Files handling also needs to support consistency guard
2019-07-13 22:02:55 -07:00
Yihua Guo
621c246fa9
[HUDI-161] Remove --key-generator-class CLI arg in HoodieDeltaStreamer and use key generator class specified in datasource properties. ( #781 )
2019-07-12 13:45:49 -07:00
Ho Tien Vu
11c4121f73
Fixed TableNotFoundException when writing with structured streaming ( #778 )
...
- When writing to a new hoodie table, if the checkpoint dir is under the target path, Spark will create the base path and thus skip initializing .hoodie, which results in an error
- Apply the .hoodie existence check for all save modes
2019-07-12 09:17:16 -07:00
Thinking Chen
62ecb2da62
when column type is decimal, should add precision and scale ( #753 )
2019-07-08 16:13:22 -07:00
Balaji Varadarajan
9f18a1ca80
Fixing bugs found during running hoodie demo ( #760 )
2019-06-28 17:49:23 -07:00
Ho Tien Vu
e48e35385a
Added preemptive check for 'spark.scheduler.mode'
...
When running docker demo, NoSuchElementException was thrown because spark.scheduler.mode is not set.
Also we want to check before initializing the Spark Context to avoid polluting the SparkConf
with unused config.
2019-06-25 13:39:41 -07:00
Jaimin Shah
17e878f721
adding support for complex keys ( #728 )
...
- Resolving the issue related to ambiguity in recordKey by creating and parsing json object as string.
- added unit test for ComplexKeyGenerator
- minor changes
2019-06-21 00:25:06 -07:00
Ron Barabash
1b61eb45e0
Adding support for optionally skipping single archiving failures
2019-06-20 22:54:45 -07:00
Balaji Varadarajan
66c7fa2322
Reword confusing message and reducing the severity level
2019-06-20 22:46:09 -07:00
Balaji Varadarajan
8223127611
Add maprfs to storage schemes
2019-06-20 22:45:35 -07:00
Balaji Varadarajan
2c40e8419e
Ensure TableMetaClient and FileSystem instances have exclusive copy of Configuration
2019-06-20 14:05:00 -07:00
Balaji Varadarajan
a0d7ab2384
HUDI-70 : Making DeltaStreamer run in continuous mode with concurrent compaction
2019-06-18 17:48:14 -07:00
Balaji Varadarajan
3a210ef08e
Disable Notice Plugin
2019-06-18 11:33:26 -07:00
Balaji Varadarajan
a1483f2c5f
HUDI-148 Small File selection logic for MOR must skip fileIds selected for pending compaction correctly
2019-06-17 18:35:17 -07:00
vinoth chandar
8c9980f4f5
Update README.md
2019-06-17 18:19:34 -07:00
Nishith Agarwal
8e08d498c9
Reading baseCommitTime from the latest file slice as opposed to the tagged record value
2019-06-17 16:46:16 -07:00
Nishith Agarwal
129e433641
- Upgrading to Hive 2.x
...
- Eliminating in-memory deltaRecordsMap
- Use writerSchema to generate generic record needed by custom payloads
- changes to make tests work with hive 2.x
2019-06-13 12:46:14 -07:00
Balaji Varadarajan
cd7623e216
All opened hoodie clients in tests need to be closed
...
TestMergeOnReadTable must use embedded timeline server
2019-06-13 12:30:07 -07:00
Balaji Varadarajan
136f8478a3
TestMergeOnReadTable must use embedded timeline server
2019-06-12 19:08:09 -07:00