Balaji Varadarajan
30c5f8b7bd
Ensure Hoodie works for non-partitioned Hive table
2018-12-12 13:35:16 -08:00
Nishith Agarwal
7243ce40c9
Serializing the complete payload object instead of serializing just the GenericRecord
Removing Converter hierarchy as we now depend purely on JavaSerialization and require the payload to be java serializable
2018-12-04 11:43:41 -08:00
Balaji Varadarajan
f999e4960c
Avoid WriteStatus collect() call when committing batch
2018-11-28 10:41:49 -08:00
Nishith Agarwal
88274b8261
Adding another metric to HoodieWriteStat to determine if there were inserts converted to updates, added one test for this
2018-08-14 06:22:16 -07:00
Balaji Varadarajan
2e12c86d01
Ensure Compaction Operation compacts the data file as defined in the workload
2018-08-07 08:19:50 -07:00
Nishith Agarwal
3da063f83b
Adding ability for inserts to be written to log files
2018-06-11 14:08:59 -07:00
Balaji Varadarajan
dfc0c61eb7
Support union mode in HoodieRealtimeRecordReader for pure insert workloads
Also Replace BufferedIteratorPayload abstraction with function passing
2018-05-10 17:39:56 -07:00
Nishith Agarwal
93f345a032
Minor fixes for MergeOnRead MVP release readiness
2018-05-09 07:23:58 -07:00
Nishith Agarwal
04655e9e85
Adding metrics for MOR and COW
2018-04-26 09:32:45 -07:00
Sunil Ramaiah
4d1fba24c9
Fix for updating duplicate records in same/different files in same partition
2018-04-25 13:09:15 -07:00
Nishith Agarwal
720e42f52a
Parallelized read-write operations in Hoodie Merge phase
2018-04-12 11:46:42 -07:00
Balaji Varadarajan
788e4f2d2e
CodeStyle formatting to conform to basic Checkstyle rules.
The code-style rules follow google style with some changes:
1. Increase line length from 100 to 120
2. Disable JavaDoc related checkstyles as this needs more manual work.
Both source and test code are checked for code-style
2018-03-30 11:09:40 -07:00
Nishith Agarwal
987f5d6b96
Making ExternalSpillableMap generic for any datatype
- Introduced concept of converters to be able to serde generic datatype for SpillableMap
- Fixed/Added configs to Hoodie Configs
- Changed HoodieMergeHandle to start using SpillableMap
2018-03-28 07:56:07 -07:00
Jian Xu
d3df32fa03
Add back UseTempFolder changes in HoodieMergeHandle
2018-03-15 17:11:15 -07:00
Omkar Joshi
c5b4cb1b75
Spawning parallel writer thread to separate reading records from spark and writing records to parquet file
2018-03-15 16:58:14 -07:00
Nishith Agarwal
6fec9655a8
Added support for Disk Spillable Compaction to prevent OOM issues
2018-02-26 16:00:35 -08:00
Nishith Agarwal
7076c2e9f0
refactor classes to accept Map passed by RealtimeCompactor to avoid multiple map creations in HoodieMergeHandle
2018-02-07 11:16:01 -08:00
Jian Xu
15e669c60c
Incorporating code review feedback for finalizeWrite for COW #4
2018-02-02 11:38:25 -08:00
Jian Xu
363e35bb0f
Add finalizeWrite support for HoodieMergeHandle
2018-02-02 11:38:25 -08:00
Nishith Agarwal
e10100fe32
Reducing list status calls from listing logfile versions, some associated refactoring
2018-01-29 08:26:39 -08:00
Vinoth Chandar
e45679f5e2
Reformatting code per Google Code Style all over
2017-11-12 23:19:02 -08:00
Kaushik Devarajaiah
c98ee057fc
capture record metadata before deflating for record counting
2017-10-02 10:46:06 -07:00
Vinoth Chandar
c00f1a9ed9
Refactoring HoodieTableFileSystemView using FileGroups/FileSlices
- Merged all filter* and get* methods
- new constructor takes filestatus[]
- All existing tests pass
- FileGroup is all files that belong to a fileID within a partition
- FileSlice is a generation of data and log files, starting at a base commit
2017-06-22 17:16:13 -07:00
Vinoth Chandar
23e7badd8a
Rename IO Handles & introduce stub for BucketedIndex
- UpdateHandle -> MergeHandle, InsertHandle -> CreateHandle
- Also bunch of code cleanup in different places
2017-06-22 17:16:13 -07:00