Vinoth Chandar
89cd6b0726
[maven-release-plugin] prepare for next development iteration
2018-08-22 21:30:05 -07:00
Vinoth Chandar
8d305c5a86
[maven-release-plugin] prepare release hoodie-0.4.3
2018-08-22 21:29:53 -07:00
Vinoth Chandar
6fffda5c70
Update Release notes for 0.4.3 release
2018-08-22 21:11:43 -07:00
Kaushik Devarajaiah
e624480259
Throttling to limit QPS from HbaseIndex
2018-08-21 21:10:38 -07:00
Nishith Agarwal
3746ace76a
Fixing Null pointer exception in finally block
2018-08-21 21:07:53 -07:00
Nishith Agarwal
88274b8261
Adding another metric to HoodieWriteStat to determine if there were inserts converted to updates, added one test for this
2018-08-14 06:22:16 -07:00
Balaji Varadarajan
989afddd54
BUGFIX - Use Guava Optional (which is Serializable) in CompactionOperation to avoid NotSerializableException
2018-08-08 06:00:55 -07:00
Balaji Varadarajan
ea23c9b7a0
Minor bug fixes found during testing
2018-08-07 08:19:50 -07:00
Balaji Varadarajan
594059a19c
Add CLI support to inspect, schedule and run compaction
2018-08-07 08:19:50 -07:00
Balaji Varadarajan
2e12c86d01
Ensure Compaction Operation compacts the data file as defined in the workload
2018-08-07 08:19:50 -07:00
Balaji Varadarajan
2f8ce93030
Async Compaction Main API changes
2018-08-07 08:19:50 -07:00
Balaji Varadarajan
9b78523d62
Ensure Cleaner and Archiver do not delete file-slices and workload marked for compaction
2018-08-07 08:19:50 -07:00
Balaji Varadarajan
0a0451a765
Ensure Compaction workload is stored in write-once meta-data files separate from timeline files.
...
This avoids concurrency issues when compactor(s) and ingestor are running in parallel.
In the Next PR -> Safety concern regarding Cleaner retaining all meta-data and file-slices for pending compactions will be addressed
2018-08-07 08:19:50 -07:00
Balaji Varadarajan
9d99942564
Track fileIds with pending compaction in FileSystemView to provide correct API semantics
2018-08-07 08:19:50 -07:00
Balaji Varadarajan
1b61f04e05
(1) Define CompactionWorkload in avro to allow storing them in instant files.
...
(2) Split APIs in HoodieRealtimeCompactor to separate generating compaction workload from running compaction
2018-08-07 08:19:50 -07:00
Balaji Varadarajan
6d01ae8ca0
FileSystemView and Timeline level changes to support Async Compaction
2018-08-07 08:19:50 -07:00
Nishith Agarwal
44caf0d40c
Fixing missing hoodie record location in HoodieRecord when record is read from disk after being spilled
2018-07-18 12:53:35 -07:00
Omkar Joshi
f62890ca1f
adding setters so that subclasses can set it
2018-07-18 12:53:11 -07:00
Nishith Agarwal
34ab54a9d3
Fixing bug introduced in rollback for MOR table type with inserts into log files
2018-07-17 17:20:34 -07:00
Nishith Agarwal
a6fe96fdfe
Changing Day based compaction strategy to be IO agnostic
2018-06-18 15:22:56 -07:00
Nishith Agarwal
3da063f83b
Adding ability for inserts to be written to log files
2018-06-11 14:08:59 -07:00
Vinoth Chandar
34827d50e1
[maven-release-plugin] prepare for next development iteration
2018-06-11 08:59:13 -07:00
Vinoth Chandar
43ef385730
[maven-release-plugin] prepare release hoodie-0.4.2
2018-06-11 08:59:02 -07:00
vinoth chandar
4f76f2899e
Update Release notes for 0.4.2 release
2018-06-11 08:41:11 -07:00
Xavier Jodoin
8ad8030f2a
Fix wrong use of TemporaryFolder junit rule
2018-06-10 23:31:42 -07:00
vinothchandar
8f1d362015
Fixing deps & serialization for RTView
...
- hoodie-hadoop-mr now needs objectsize bundled
- Also updated docs with additional tuning tips
2018-06-10 19:16:44 -07:00
Vinoth Chandar
85dd265b7b
Improving out of box experience for data source
...
- Fixes #246
- Bump up default parallelism to 1500, to handle large upserts
- Add docs on S3 configuration & tuning tips with tested spark knobs
- Fix bug to not duplicate hoodie metadata fields when input dataframe is another hoodie dataset
- Improve speed of ROTablePathFilter by removing directory check
- Move to spark-avro 4.0 to handle issue with nested fields with same name
- Keep AvroConversionUtils in sync with spark-avro 4.0
2018-06-10 19:16:44 -07:00
Sunil Ramaiah
a97814462d
Added a filter function to filter the record keys in a parquet file
2018-05-17 19:01:11 -07:00
Nishith Agarwal
23d53763c4
enabling global index for MOR
2018-05-16 10:36:25 -07:00
Balaji Varadarajan
dfc0c61eb7
Support union mode in HoodieRealtimeRecordReader for pure insert workloads
...
Also Replace BufferedIteratorPayload abstraction with function passing
2018-05-10 17:39:56 -07:00
Nishith Agarwal
93f345a032
Minor fixes for MergeOnRead MVP release readiness
2018-05-09 07:23:58 -07:00
Nishith Agarwal
75df72f575
Adding a fix/workaround for when fs.append() is unable to return a valid output stream
2018-05-08 18:46:17 -07:00
Nishith Agarwal
04655e9e85
Adding metrics for MOR and COW
2018-04-26 09:32:45 -07:00
Balaji Varadarajan
c66004d79a
Add Support for ordering and limiting results in CLI show commands
2018-04-26 09:30:05 -07:00
Sunil Ramaiah
b9b9b24993
Added more comments and removed the extra new lines
2018-04-25 13:09:15 -07:00
Sunil Ramaiah
4d1fba24c9
Fix for updating duplicate records in same/different files in same partition
2018-04-25 13:09:15 -07:00
vinoth chandar
fa73a911cc
Update Gemfile.lock
2018-04-19 14:20:50 -07:00
Nishith Agarwal
c3c205fc02
Using BufferedFsInputStream to wrap FSInputStream for FSDataInputStream
2018-04-18 08:05:19 -07:00
Nishith Agarwal
720e42f52a
Parallelized read-write operations in Hoodie Merge phase
2018-04-12 11:46:42 -07:00
Balaji Varadarajan
6c226ca21a
Issue-329 : Refactoring TestHoodieClientOnCopyOnWriteStorage and adding test-cases
2018-04-09 16:34:58 -07:00
Vinoth Chandar
a4049329a5
Update release notes for 0.4.1 (post)
2018-04-02 09:31:01 -07:00
Balaji Varadarajan
788e4f2d2e
CodeStyle formatting to conform to basic Checkstyle rules.
...
The code-style rules follow google style with some changes:
1. Increase line length from 100 to 120
2. Disable JavaDoc related checkstyles as this needs more manual work.
Both source and test code are checked for code-style
2018-03-30 11:09:40 -07:00
Nishith Agarwal
987f5d6b96
Making ExternalSpillableMap generic for any datatype
...
- Introduced concept of converters to be able to serde generic datatype for SpillableMap
- Fixed/Added configs to Hoodie Configs
- Changed HoodieMergeHandle to start using SpillableMap
2018-03-28 07:56:07 -07:00
Xavier Jodoin
fa787ab5ab
Replace deprecated jackson version
2018-03-27 14:27:20 -07:00
Nishith Agarwal
1b756db221
Adding config for parquet compression ratio
2018-03-25 22:17:36 -07:00
Jian Xu
48643795b8
Checking storage level before persisting preppedRecords
2018-03-22 22:15:52 -07:00
Kaushik Devarajaiah
291a88ba94
DeduplicateRecords based on recordKey if global index is used
2018-03-22 09:15:44 -07:00
Nishith Agarwal
123da020e2
- Fixing memory leak due to HoodieLogFileReader holding on to a logblock
...
- Removed inMemory HashMap usage in merge(..) code in LogScanner
2018-03-16 12:43:31 -07:00
Jian Xu
d3df32fa03
Add back UseTempFolder changes in HoodieMergeHandle
2018-03-15 17:11:15 -07:00
Omkar Joshi
c5b4cb1b75
Spawning parallel writer thread to separate reading records from spark and writing records to parquet file
2018-03-15 16:58:14 -07:00