Balaji Varadarajan
2e12c86d01
Ensure Compaction Operation compacts the data file as defined in the workload
2018-08-07 08:19:50 -07:00
Balaji Varadarajan
2f8ce93030
Async Compaction Main API changes
2018-08-07 08:19:50 -07:00
Balaji Varadarajan
9b78523d62
Ensure Cleaner and Archiver do not delete file-slices and workload marked for compaction
2018-08-07 08:19:50 -07:00
Balaji Varadarajan
0a0451a765
Ensure Compaction workload is stored in write-once meta-data files separate from timeline files.
...
This avoids concurrency issues when compactor(s) and ingestor are running in parallel.
In the Next PR -> Safety concern regarding Cleaner retaining all meta-data and file-slices for pending compactions will be addressed
2018-08-07 08:19:50 -07:00
Balaji Varadarajan
9d99942564
Track fileIds with pending compaction in FileSystemView to provide correct API semantics
2018-08-07 08:19:50 -07:00
Balaji Varadarajan
1b61f04e05
(1) Define CompactionWorkload in avro to allow storing them in instant files.
...
(2) Split APIs in HoodieRealtimeCompactor to separate generating compaction workload from running compaction
2018-08-07 08:19:50 -07:00
Balaji Varadarajan
6d01ae8ca0
FileSystemView and Timeline level changes to support Async Compaction
2018-08-07 08:19:50 -07:00
Nishith Agarwal
44caf0d40c
Fixing missing hoodie record location in HoodieRecord when record is read from disk after being spilled
2018-07-18 12:53:35 -07:00
Omkar Joshi
f62890ca1f
adding setters so that subclasses can set it
2018-07-18 12:53:11 -07:00
Nishith Agarwal
34ab54a9d3
Fixing bug introduced in rollback for MOR table type with inserts into log files
2018-07-17 17:20:34 -07:00
Nishith Agarwal
a6fe96fdfe
Changing Day based compaction strategy to be IO agnostic
2018-06-18 15:22:56 -07:00
Nishith Agarwal
3da063f83b
Adding ability for inserts to be written to log files
2018-06-11 14:08:59 -07:00
Vinoth Chandar
34827d50e1
[maven-release-plugin] prepare for next development iteration
2018-06-11 08:59:13 -07:00
Vinoth Chandar
43ef385730
[maven-release-plugin] prepare release hoodie-0.4.2
2018-06-11 08:59:02 -07:00
vinoth chandar
4f76f2899e
Update Release notes for 0.4.2 release
2018-06-11 08:41:11 -07:00
Xavier Jodoin
8ad8030f2a
Fix wrong use of TemporaryFolder junit rule
2018-06-10 23:31:42 -07:00
vinothchandar
8f1d362015
Fixing deps & serialization for RTView
...
- hoodie-hadoop-mr now needs objectsize bundled
- Also updated docs with additional tuning tips
2018-06-10 19:16:44 -07:00
Vinoth Chandar
85dd265b7b
Improving out of box experience for data source
...
- Fixes #246
- Bump up default parallelism to 1500, to handle large upserts
- Add docs on s3 configuration & tuning tips with tested spark knobs
- Fix bug to not duplicate hoodie metadata fields when input dataframe is another hoodie dataset
- Improve speed of ROTablePathFilter by removing directory check
- Move to spark-avro 4.0 to handle issue with nested fields with same name
- Keep AvroConversionUtils in sync with spark-avro 4.0
2018-06-10 19:16:44 -07:00
Sunil Ramaiah
a97814462d
Added a filter function to filter the record keys in a parquet file
2018-05-17 19:01:11 -07:00
Nishith Agarwal
23d53763c4
enabling global index for MOR
2018-05-16 10:36:25 -07:00
Balaji Varadarajan
dfc0c61eb7
Support union mode in HoodieRealtimeRecordReader for pure insert workloads
...
Also Replace BufferedIteratorPayload abstraction with function passing
2018-05-10 17:39:56 -07:00
Nishith Agarwal
93f345a032
Minor fixes for MergeOnRead MVP release readiness
2018-05-09 07:23:58 -07:00
Nishith Agarwal
75df72f575
Adding a fix/workaround when fs.append() is unable to return a valid outputstream
2018-05-08 18:46:17 -07:00
Nishith Agarwal
04655e9e85
Adding metrics for MOR and COW
2018-04-26 09:32:45 -07:00
Balaji Varadarajan
c66004d79a
Add Support for ordering and limiting results in CLI show commands
2018-04-26 09:30:05 -07:00
Sunil Ramaiah
b9b9b24993
Added more comments and removed the extra new lines
2018-04-25 13:09:15 -07:00
Sunil Ramaiah
4d1fba24c9
Fix for updating duplicate records in same/different files in same partition
2018-04-25 13:09:15 -07:00
vinoth chandar
fa73a911cc
Update Gemfile.lock
2018-04-19 14:20:50 -07:00
Nishith Agarwal
c3c205fc02
Using BufferedFsInputStream to wrap FSInputStream for FSDataInputStream
2018-04-18 08:05:19 -07:00
Nishith Agarwal
720e42f52a
Parallelized read-write operations in Hoodie Merge phase
2018-04-12 11:46:42 -07:00
Balaji Varadarajan
6c226ca21a
Issue-329 : Refactoring TestHoodieClientOnCopyOnWriteStorage and adding test-cases
2018-04-09 16:34:58 -07:00
Vinoth Chandar
a4049329a5
Update release notes for 0.4.1 (post)
2018-04-02 09:31:01 -07:00
Balaji Varadarajan
788e4f2d2e
CodeStyle formatting to conform to basic Checkstyle rules.
...
The code-style rules follow google style with some changes:
1. Increase line length from 100 to 120
2. Disable JavaDoc related checkstyles as this needs more manual work.
Both source and test code are checked for code-style
2018-03-30 11:09:40 -07:00
Nishith Agarwal
987f5d6b96
Making ExternalSpillableMap generic for any datatype
...
- Introduced concept of converters to be able to serde generic datatype for SpillableMap
- Fixed/Added configs to Hoodie Configs
- Changed HoodieMergeHandle to start using SpillableMap
2018-03-28 07:56:07 -07:00
Xavier Jodoin
fa787ab5ab
Replace deprecated jackson version
2018-03-27 14:27:20 -07:00
Nishith Agarwal
1b756db221
Adding config for parquet compression ratio
2018-03-25 22:17:36 -07:00
Jian Xu
48643795b8
Checking storage level before persisting preppedRecords
2018-03-22 22:15:52 -07:00
Kaushik Devarajaiah
291a88ba94
DeduplicateRecords based on recordKey if global index is used
2018-03-22 09:15:44 -07:00
Nishith Agarwal
123da020e2
- Fixing memory leak due to HoodieLogFileReader holding on to a logblock
...
- Removed inMemory HashMap usage in merge(..) code in LogScanner
2018-03-16 12:43:31 -07:00
Jian Xu
d3df32fa03
Add back UseTempFolder changes in HoodieMergeHandle
2018-03-15 17:11:15 -07:00
Omkar Joshi
c5b4cb1b75
Spawning parallel writer thread to separate reading records from spark and writing records to parquet file
2018-03-15 16:58:14 -07:00
Nishith Agarwal
9dff8c2326
Adding a tool to read/inspect a HoodieLogFile
2018-03-15 16:48:28 -07:00
Jian Xu
ba7c258c61
Add more options in HoodieWriteConfig
2018-03-13 23:26:36 -07:00
Jian Xu
7f079632a6
Use hadoopConf in HoodieTableMetaClient and related tests
2018-03-12 11:47:55 -07:00
Vinoth Chandar
73534d467f
[maven-release-plugin] prepare for next development iteration
2018-03-07 21:04:10 -08:00
Vinoth Chandar
f2e5c6f9f8
[maven-release-plugin] prepare release hoodie-0.4.1
2018-03-07 21:04:00 -08:00
Nishith Agarwal
0eaa21111a
Re-factoring Compaction as first level API in WriteClient similar to upsert/insert
2018-03-07 16:16:39 -08:00
Nishith Agarwal
5405a6287b
Introducing HoodieLogFormat V2 with versioning support
...
- HoodieLogFormat V2 has support for LogFormat evolution through versioning
- LogVersion is associated with a LogBlock not a LogFile
- Based on a version for a LogBlock, appropriate code path is executed
- Implemented LazyReading of Hoodie Log Blocks with Memory / IO tradeoff
- Implemented Reverse pointer to be able to traverse the log in reverse
- Introduce new MAGIC for backwards compatibility with logs without versions
2018-03-06 21:14:11 -08:00
Jian Xu
dfd1979c51
Handle inflight clean instants during Hoodie instants archiving
2018-03-05 15:01:58 -08:00
Jian Xu
5d5c306e64
Add new APIs in HoodieReadClient and HoodieWriteClient
2018-02-28 13:58:12 -08:00