Commit Graph

115 Commits

Author SHA1 Message Date
Nishith Agarwal
9dff8c2326 Adding a tool to read/inspect a HoodieLogFile 2018-03-15 16:48:28 -07:00
Jian Xu
7f079632a6 Use hadoopConf in HoodieTableMetaClient and related tests 2018-03-12 11:47:55 -07:00
Vinoth Chandar
73534d467f [maven-release-plugin] prepare for next development iteration 2018-03-07 21:04:10 -08:00
Vinoth Chandar
f2e5c6f9f8 [maven-release-plugin] prepare release hoodie-0.4.1 2018-03-07 21:04:00 -08:00
Nishith Agarwal
5405a6287b Introducing HoodieLogFormat V2 with versioning support
- HoodieLogFormat V2 has support for LogFormat evolution through versioning
  - LogVersion is associated with a LogBlock, not a LogFile
  - Based on a version for a LogBlock, appropriate code path is executed
- Implemented LazyReading of Hoodie Log Blocks with Memory / IO tradeoff
- Implemented Reverse pointer to be able to traverse the log in reverse
- Introduce new MAGIC for backwards compatibility with logs without versions
2018-03-06 21:14:11 -08:00
Jian Xu
dfd1979c51 Handle inflight clean instants during Hoodie instants archiving 2018-03-05 15:01:58 -08:00
Nishith Agarwal
6fec9655a8 Added support for Disk Spillable Compaction to prevent OOM issues 2018-02-26 16:00:35 -08:00
Jian Xu
3bdd750982 Use FastDateFormat for thread safety
Use FastDateFormat for thread safety. This fixes an exception when a
job is used to ingest multiple tables. An example exception:
```
Caused by: java.lang.NumberFormatException: multiple points
        at sun.misc.FloatingDecimal.readJavaFormatString(FloatingDecimal.java:1890)
        at sun.misc.FloatingDecimal.parseDouble(FloatingDecimal.java:110)
        at java.lang.Double.parseDouble(Double.java:538)
        at java.text.DigitList.getDouble(DigitList.java:169)
        at java.text.DecimalFormat.parse(DecimalFormat.java:2056)
        at java.text.SimpleDateFormat.subParse(SimpleDateFormat.java:1867)
        at java.text.SimpleDateFormat.parse(SimpleDateFormat.java:1514)
        at java.text.DateFormat.parse(DateFormat.java:364)
        at com.uber.hoodie.HoodieWriteClient.commit(HoodieWriteClient.java:442)
```
2018-02-12 11:43:57 -08:00
Nishith Agarwal
7076c2e9f0 refactor classes to accept Map passed by RealtimeCompactor to avoid multiple map creations in HoodieMergeHandle 2018-02-07 11:16:01 -08:00
Nishith Agarwal
2116815261 Fixing Rollback for compaction/commit operation, added check for null commit
- Fallback to old way of rollback by listing all partitions
- Added null check to ensure only partitions which are to be rolled back are considered
- Added location (committime) to workload stat
- Added checks in CompactedScanner to guard against task retries
- Introduce new logic for rollback (bounded by instant_time and target_instant time)
- Reversed logfiles order
2018-02-06 16:55:23 -08:00
Jian Xu
37f2cdd7e4 Incorporating code review feedback for finalizeWrite for COW #2 2018-02-02 11:38:25 -08:00
Jian Xu
c874248f23 Add FinalizeWrite in HoodieCreateHandle for COW tables 2018-02-02 11:38:25 -08:00
Nishith Agarwal
e10100fe32 Reducing list status calls from listing logfile versions, some associated refactoring 2018-01-29 08:26:39 -08:00
Nishith Agarwal
937ae322ba Reducing memory footprint required in HoodieAvroDataBlock and HoodieAppendHandle 2018-01-29 08:22:29 -08:00
vinothchandar
21ce846f18 Remove stateful fs member from HoodieTestUtils & FSUtils 2018-01-17 23:34:21 -08:00
vinothchandar
cf7f7aabb9 Nicer handling of timeline archival for Cloud storage
- When append() is not supported, rollover to new file always (instead of failing)
- Provide way to configure archive log folder (avoids small files inside .hoodie)
- Datasets written via Spark datasource archive to .hoodie/archived
- HoodieClientExample will now retain only 2,3 commits to exercise archival path during dev cycles
- Few tweaks to code structure around CommitArchiveLog
2018-01-17 23:34:21 -08:00
Vinoth Chandar
0cd186c899 Multi FS Support
- Reviving PR 191, to make FileSystem creation off actual path
- Streamline all filesystem access to HoodieTableMetaClient
- Hadoop Conf from Spark Context serialized & passed to executor code too
- Pick up env vars prefixed with HOODIE_ENV_ into Configuration object
- Cleanup usage of FSUtils.getFS, piggybacking off HoodieTableMetaClient.getFS
- Adding s3a to supported schemes & support escaping "." in env vars
- Tests use HoodieTestUtils.getDefaultHadoopConf
2018-01-17 23:34:21 -08:00
Nishith Agarwal
44839b88c6 Removing compaction action type and associated compaction timeline operations, replace with commit action type 2018-01-09 09:56:15 -08:00
Vinoth Chandar
e45679f5e2 Reformatting code per Google Code Style all over 2017-11-12 23:19:02 -08:00
Nishith Agarwal
abe964bebd Implementing custom payload/merge hooks abstractions for application specific merge logic 2017-11-07 18:55:55 -08:00
Nishith Agarwal
c7d63a7622 1) Separated rollback as a table operation 2) Implement rollback for MOR 2017-10-12 07:36:46 -07:00
Vinoth Chandar
e1fe3ab937 [maven-release-plugin] prepare for next development iteration 2017-10-02 22:42:54 -07:00
Vinoth Chandar
50139fe904 [maven-release-plugin] prepare release hoodie-0.4.0 2017-10-02 22:42:32 -07:00
Vinoth Chandar
274aaf49fe Incorporating code review feedback for DataSource 2017-10-02 20:44:53 -07:00
Vinoth Chandar
64e0573aca Adding hoodie-spark to support Spark Datasource for Hoodie
- Write with COW/MOR paths work fully
- Read with RO view works on both storages*
- Incremental view supported on COW
- Refactored out HoodieReadClient methods, to just contain key based access
- HoodieDataSourceHelpers class can be now used to construct inputs to datasource
- Tests in hoodie-client using new helpers and mechanisms
- Basic tests around save modes & insert/upserts (more to follow)
- Bumped up scala to 2.11, since 2.10 is deprecated & complains with scalatest
- Updated documentation to describe usage
- New sample app written using the DataSource API
2017-10-02 20:44:53 -07:00
Kaushik Devarajaiah
c98ee057fc capture record metadata before deflating for record counting 2017-10-02 10:46:06 -07:00
Jian Xu
7e9a4a89dd Use getFileStatus to get single FileStatus for single file 2017-09-11 11:24:44 -07:00
Jian Xu
b1cf097b0c Add nested fields support for MOR tables 2017-08-16 10:35:26 -07:00
Nishith Agarwal
6a3c94aaa3 suppressing logs (under 4MB) for jenkins 2017-08-15 16:30:51 -07:00
Vinoth Chandar
86209640f7 Adding range based pruning to bloom index
- keys compared lexicographically using String::compareTo
- Range metadata additionally written into parquet file footers
- Trim fat & few optimizations to speed up indexing
- Add param to control whether input shall be cached, to speed up lookup
- Add param to turn on/off range pruning
- Auto compute of parallelism now simply factors in amount of comparisons done
- More accurate parallelism computation when range pruning is on
- tests added & hardened, docs updated
2017-08-04 13:22:13 -07:00
Nishith Agarwal
0b26b60a5c fix for cleaning log files(mor) 2017-08-02 11:54:42 -07:00
Nishith Agarwal
19c22b231e 1. Use HoodieLogFormat to archive commits and other actions 2. Introduced avro schema for commits and compactions and an avro wrapper schema 2017-07-26 14:27:44 -07:00
Nishith Agarwal
616c9a68c3 Enabled deletes in merge_on_read 2017-07-26 13:37:27 -07:00
Prasanna Rajaperumal
7d3963b4ab Pushing master to 0.4.0 as we continue to make minor releases over 0.3.8 (MVP for MOR) 2017-06-30 11:41:23 -07:00
Nishith Agarwal
3eba812a1b [maven-release-plugin] prepare for next development iteration 2017-06-30 11:17:07 -07:00
Nishith Agarwal
06d44daea3 [maven-release-plugin] prepare release hoodie-0.3.9 2017-06-30 11:16:58 -07:00
Nishith Agarwal
348250d960 Using FsUtils instead of Files API to extract file extension 2017-06-29 19:26:31 -07:00
Prasanna Rajaperumal
5cc071f74e Savepoint should not create a hole in the commit timeline 2017-06-27 16:36:09 -07:00
Vinoth Chandar
754ab88a2d Introduce ReadOptimizedView & RealtimeView out of TableFileSystemView
- Usage now marks code as clearly using either RO or RT views, for future evolution
- Tests on all of FileGroups and FileSlices
2017-06-22 17:16:13 -07:00
Vinoth Chandar
c00f1a9ed9 Refactoring HoodieTableFileSystemView using FileGroups/FileSlices
- Merged all filter* and get* methods
- new constructor takes filestatus[]
- All existing tests pass
- FileGroup is all files that belong to a fileID within a partition
- FileSlice is a generation of data and log files, starting at a base commit
2017-06-22 17:16:13 -07:00
gekath
52c507f83e Writes relative paths to .commit files
Handle case where path is read in as null from commit file

Merged with updated release
2017-06-16 12:51:19 -07:00
gekath
db7311f85e Writes relative paths to .commit files instead of absolute paths
Clean up code

Removed commented out code

Fixed merge conflict with master
2017-06-16 12:51:19 -07:00
Prasanna Rajaperumal
0ed3fac5e3 [maven-release-plugin] prepare for next development iteration 2017-06-16 11:03:17 -07:00
Prasanna Rajaperumal
45732e440c [maven-release-plugin] prepare release hoodie-0.3.8 2017-06-16 10:59:58 -07:00
Prasanna Rajaperumal
4b26be9f61 Fixes to RealtimeInputFormat and RealtimeRecordReader and update documentation for HiveSyncTool 2017-06-15 18:21:07 -07:00
Kaushik Devarajaiah
521555c576 Parallelize file version deletes during clean and related tests 2017-06-15 18:20:42 -07:00
Prasanna Rajaperumal
db6150c5ef Refactor hoodie-hive 2017-06-09 13:06:33 -07:00
Danny Chen
c192dd60b4 Change from deprecated closeQuietly to try with resources 2017-06-05 19:11:53 -07:00
Nishith Agarwal
ba050973e3 updated HoodieRealtimeRecordReader to use HoodieCompactedLogRecordScanner, added test for recordreader 2017-06-02 11:33:59 -07:00
Prasanna Rajaperumal
933cc8071f [maven-release-plugin] prepare for next development iteration 2017-05-24 14:02:50 -07:00