Balaji Varadarajan
f3418e4718
Docker Container Build and Run setup with foundations for adding docker integration tests. Docker images built with Hadoop 2.8.4, Hive 2.3.3, and Spark 2.3.1 and published to docker-hub
See the quickstart document for how to set up docker and run the demo
2018-10-02 09:28:21 +05:30
vinothchandar
7ba842c0fe
[maven-release-plugin] prepare for next development iteration
2018-09-28 11:27:00 +05:30
vinothchandar
5847b61f44
[maven-release-plugin] prepare release hoodie-0.4.4
2018-09-28 11:26:15 +05:30
Balaji Varadarajan
4c74dd4cad
Travis CI tests need to be run in quieter mode (WARN log level) to avoid max log-size errors
2018-09-26 21:10:20 +05:30
Balaji Varadarajan
460e24e84b
Hive Sync handling must work for datasets with multi-partition keys
2018-09-20 16:53:26 +05:30
Balaji Varadarajan
5cb28e7b1f
Explicitly release resources in LogFileReader and TestHoodieClientBase
2018-09-20 13:24:57 +05:30
Vinoth Chandar
bd5af89f12
[maven-release-plugin] rollback the release of hoodie-0.4.4
2018-09-13 15:01:53 +05:30
Vinoth Chandar
d1cc864a43
[maven-release-plugin] prepare for next development iteration
2018-09-12 23:59:47 +05:30
Vinoth Chandar
b748bc836d
[maven-release-plugin] prepare release hoodie-0.4.4
2018-09-12 23:59:34 +05:30
Vinoth Chandar
eca49a255e
Rebasing and fixing conflicts against master
2018-09-11 11:03:30 +05:30
Vinoth Chandar
a5359662be
Moving dependencies off cdh to apache + Hive2 support
- Tests redone in the process
- Main changes are to RealtimeRecordReader and how it treats maps/arrays
- Make hive sync work with Hive 1/2 and CDH environments
- Fixes for corner cases in Hive queries
- Spark Hive integration - Working version across Apache and CDH versions
- Known Issue - https://github.com/uber/hudi/issues/439
2018-09-11 11:03:30 +05:30
Nishith Agarwal
459e523d9e
1. Small file size handling for inserts into log files. In summary, the total size of the log file is compared with the parquet max file size, and if there is room for more inserts, they are added.
2018-09-06 08:52:08 +08:00
Vinoth Chandar
89cd6b0726
[maven-release-plugin] prepare for next development iteration
2018-08-22 21:30:05 -07:00
Vinoth Chandar
8d305c5a86
[maven-release-plugin] prepare release hoodie-0.4.3
2018-08-22 21:29:53 -07:00
Balaji Varadarajan
2e12c86d01
Ensure Compaction Operation compacts the data file as defined in the workload
2018-08-07 08:19:50 -07:00
Balaji Varadarajan
2f8ce93030
Async Compaction Main API changes
2018-08-07 08:19:50 -07:00
Vinoth Chandar
34827d50e1
[maven-release-plugin] prepare for next development iteration
2018-06-11 08:59:13 -07:00
Vinoth Chandar
43ef385730
[maven-release-plugin] prepare release hoodie-0.4.2
2018-06-11 08:59:02 -07:00
Balaji Varadarajan
788e4f2d2e
CodeStyle formatting to conform to basic Checkstyle rules.
The code-style rules follow google style with some changes:
1. Increase line length from 100 to 120
2. Disable JavaDoc related checkstyles as this needs more manual work.
Both source and test code are checked for code-style
2018-03-30 11:09:40 -07:00
Nishith Agarwal
9dff8c2326
Adding a tool to read/inspect a HoodieLogFile
2018-03-15 16:48:28 -07:00
Jian Xu
7f079632a6
Use hadoopConf in HoodieTableMetaClient and related tests
2018-03-12 11:47:55 -07:00
Vinoth Chandar
73534d467f
[maven-release-plugin] prepare for next development iteration
2018-03-07 21:04:10 -08:00
Vinoth Chandar
f2e5c6f9f8
[maven-release-plugin] prepare release hoodie-0.4.1
2018-03-07 21:04:00 -08:00
Nishith Agarwal
5405a6287b
Introducing HoodieLogFormat V2 with versioning support
- HoodieLogFormat V2 has support for LogFormat evolution through versioning
- LogVersion is associated with a LogBlock not a LogFile
- Based on the version of a LogBlock, the appropriate code path is executed
- Implemented LazyReading of Hoodie Log Blocks with Memory / IO tradeoff
- Implemented Reverse pointer to be able to traverse the log in reverse
- Introduce new MAGIC for backwards compatibility with logs without versions
2018-03-06 21:14:11 -08:00
Vinoth Chandar
0cd186c899
Multi FS Support
- Reviving PR 191, to make FileSystem creation off actual path
- Streamline all filesystem access to HoodieTableMetaClient
- Hadoop Conf from Spark Context serialized & passed to executor code too
- Pick up env vars prefixed with HOODIE_ENV_ into Configuration object
- Cleanup usage of FSUtils.getFS, piggybacking off HoodieTableMetaClient.getFS
- Adding s3a to supported schemes & support escaping "." in env vars
- Tests use HoodieTestUtils.getDefaultHadoopConf
2018-01-17 23:34:21 -08:00
Nishith Agarwal
44839b88c6
Removing compaction action type and associated compaction timeline operations, replace with commit action type
2018-01-09 09:56:15 -08:00
Nishith Agarwal
051f600b7f
Enable hive sync even if there is no compaction commit
2017-11-30 18:22:58 -08:00
Vinoth Chandar
e45679f5e2
Reformatting code per Google Code Style all over
2017-11-12 23:19:02 -08:00
Nishith Agarwal
abe964bebd
Implementing custom payload/merge hooks abstractions for application specific merge logic
2017-11-07 18:55:55 -08:00
Nishith Agarwal
c7d63a7622
1) Separated rollback as a table operation 2) Implement rollback for MOR
2017-10-12 07:36:46 -07:00
Vinoth Chandar
e1fe3ab937
[maven-release-plugin] prepare for next development iteration
2017-10-02 22:42:54 -07:00
Vinoth Chandar
50139fe904
[maven-release-plugin] prepare release hoodie-0.4.0
2017-10-02 22:42:32 -07:00
Nishith Agarwal
19c22b231e
1. Use HoodieLogFormat to archive commits and other actions 2. Introduced avro schema for commits and compactions and an avro wrapper schema
2017-07-26 14:27:44 -07:00
Prasanna Rajaperumal
7d3963b4ab
Pushing master to 0.4.0 as we continue to make minor releases over 0.3.8 (MVP for MOR)
2017-06-30 11:41:23 -07:00
Nishith Agarwal
3eba812a1b
[maven-release-plugin] prepare for next development iteration
2017-06-30 11:17:07 -07:00
Nishith Agarwal
06d44daea3
[maven-release-plugin] prepare release hoodie-0.3.9
2017-06-30 11:16:58 -07:00
Nishith Agarwal
e5d9b818bc
Sync Tool registers 2 tables, RO and RT Tables
2017-06-28 15:41:36 -07:00
Vinoth Chandar
c00f1a9ed9
Refactoring HoodieTableFileSystemView using FileGroups/FileSlices
- Merged all filter* and get* methods
- new constructor takes filestatus[]
- All existing tests pass
- FileGroup is all files that belong to a fileID within a partition
- FileSlice is a generation of data and log files, starting at a base commit
2017-06-22 17:16:13 -07:00
gekath
db7311f85e
Writes relative paths to .commit files instead of absolute paths
Clean up code
Removed commented out code
Fixed merge conflict with master
2017-06-16 12:51:19 -07:00
Prasanna Rajaperumal
0ed3fac5e3
[maven-release-plugin] prepare for next development iteration
2017-06-16 11:03:17 -07:00
Prasanna Rajaperumal
45732e440c
[maven-release-plugin] prepare release hoodie-0.3.8
2017-06-16 10:59:58 -07:00
Prasanna Rajaperumal
4b26be9f61
Fixes to RealtimeInputFormat and RealtimeRecordReader and update documentation for HiveSyncTool
2017-06-15 18:21:07 -07:00
Prasanna Rajaperumal
db6150c5ef
Refactor hoodie-hive
2017-06-09 13:06:33 -07:00
Prasanna Rajaperumal
933cc8071f
[maven-release-plugin] prepare for next development iteration
2017-05-24 14:02:50 -07:00
Prasanna Rajaperumal
bebae06b5b
[maven-release-plugin] prepare release hoodie-0.3.7
2017-05-24 14:02:41 -07:00
Prasanna Rajaperumal
240c91241b
Implement HoodieLogFormat replacing Avro as the default log format
2017-05-23 08:35:11 -07:00
Prasanna Rajaperumal
c3258039f0
[maven-release-plugin] prepare for next development iteration
2017-04-27 11:00:56 -07:00
Prasanna Rajaperumal
de1bdad756
[maven-release-plugin] prepare release hoodie-0.3.6
2017-04-27 11:00:45 -07:00
Vinoth Chandar
82b211d2e6
Rebase with generic partition support
2017-04-03 21:27:49 -07:00
Vinoth Chandar
542d622e49
Adding HiveSyncTool to sync hoodie dataset schema/partitions to Hive
- Designed to be run by your workflow manager after hoodie upsert
- Assumes jdbc connectivity via HiveServer2, which should work with all major distros
2017-04-03 21:27:49 -07:00