529 Commits

Author SHA1 Message Date
Balaji Varadarajan
1b61f04e05 (1) Define CompactionWorkload in avro to allow storing them in instant files.
(2) Split APIs in HoodieRealtimeCompactor to separate generating compaction workload from running compaction
2018-08-07 08:19:50 -07:00
Balaji Varadarajan
6d01ae8ca0 FileSystemView and Timeline level changes to support Async Compaction 2018-08-07 08:19:50 -07:00
Nishith Agarwal
44caf0d40c Fixing missing hoodie record location in HoodieRecord when record is read from disk after being spilled 2018-07-18 12:53:35 -07:00
Omkar Joshi
f62890ca1f adding setters so that subclasses can set it 2018-07-18 12:53:11 -07:00
Nishith Agarwal
34ab54a9d3 Fixing bug introduced in rollback for MOR table type with inserts into log files 2018-07-17 17:20:34 -07:00
Nishith Agarwal
a6fe96fdfe Changing Day based compaction strategy to be IO agnostic 2018-06-18 15:22:56 -07:00
Nishith Agarwal
3da063f83b Adding ability for inserts to be written to log files 2018-06-11 14:08:59 -07:00
Vinoth Chandar
34827d50e1 [maven-release-plugin] prepare for next development iteration 2018-06-11 08:59:13 -07:00
Vinoth Chandar
43ef385730 [maven-release-plugin] prepare release hoodie-0.4.2 2018-06-11 08:59:02 -07:00
vinoth chandar
4f76f2899e Update Release notes for 0.4.2 release 2018-06-11 08:41:11 -07:00
Xavier Jodoin
8ad8030f2a Fix wrong use of TemporaryFolder junit rule 2018-06-10 23:31:42 -07:00
vinothchandar
8f1d362015 Fixing deps & serialization for RTView
- hoodie-hadoop-mr now needs objectsize bundled
- Also updated docs with additional tuning tips
2018-06-10 19:16:44 -07:00
Vinoth Chandar
85dd265b7b Improving out of box experience for data source
- Fixes #246
- Bump up default parallelism to 1500, to handle large upserts
- Add docs on S3 configuration & tuning tips with tested spark knobs
- Fix bug to not duplicate hoodie metadata fields when input dataframe is another hoodie dataset
- Improve speed of ROTablePathFilter by removing directory check
- Move to spark-avro 4.0 to handle issue with nested fields with same name
- Keep AvroConversionUtils in sync with spark-avro 4.0
2018-06-10 19:16:44 -07:00
Sunil Ramaiah
a97814462d Added a filter function to filter the record keys in a parquet file 2018-05-17 19:01:11 -07:00
Nishith Agarwal
23d53763c4 enabling global index for MOR 2018-05-16 10:36:25 -07:00
Balaji Varadarajan
dfc0c61eb7 Support union mode in HoodieRealtimeRecordReader for pure insert workloads
Also Replace BufferedIteratorPayload abstraction with function passing
2018-05-10 17:39:56 -07:00
Nishith Agarwal
93f345a032 Minor fixes for MergeOnRead MVP release readiness 2018-05-09 07:23:58 -07:00
Nishith Agarwal
75df72f575 Adding a fix/workaround when fs.append() unable to return a valid outputstream 2018-05-08 18:46:17 -07:00
Nishith Agarwal
04655e9e85 Adding metrics for MOR and COW 2018-04-26 09:32:45 -07:00
Balaji Varadarajan
c66004d79a Add Support for ordering and limiting results in CLI show commands 2018-04-26 09:30:05 -07:00
Sunil Ramaiah
b9b9b24993 Added more comments and removed the extra new lines 2018-04-25 13:09:15 -07:00
Sunil Ramaiah
4d1fba24c9 Fix for updating duplicate records in same/different files in same partition 2018-04-25 13:09:15 -07:00
vinoth chandar
fa73a911cc Update Gemfile.lock 2018-04-19 14:20:50 -07:00
Nishith Agarwal
c3c205fc02 Using BufferedFsInputStream to wrap FSInputStream for FSDataInputStream 2018-04-18 08:05:19 -07:00
Nishith Agarwal
720e42f52a Parallelized read-write operations in Hoodie Merge phase 2018-04-12 11:46:42 -07:00
Balaji Varadarajan
6c226ca21a Issue-329 : Refactoring TestHoodieClientOnCopyOnWriteStorage and adding test-cases 2018-04-09 16:34:58 -07:00
Vinoth Chandar
a4049329a5 Update release notes for 0.4.1 (post) 2018-04-02 09:31:01 -07:00
Balaji Varadarajan
788e4f2d2e CodeStyle formatting to conform to basic Checkstyle rules.
The code-style rules follow google style with some changes:

1. Increase line length from 100 to 120
2. Disable JavaDoc related checkstyles as this needs more manual work.

Both source and test code are checked for code-style
2018-03-30 11:09:40 -07:00
Nishith Agarwal
987f5d6b96 Making ExternalSpillableMap generic for any datatype
- Introduced concept of converters to be able to serde generic datatype for SpillableMap
- Fixed/Added configs to Hoodie Configs
- Changed HoodieMergeHandle to start using SpillableMap
2018-03-28 07:56:07 -07:00
Xavier Jodoin
fa787ab5ab Replace deprecated jackson version 2018-03-27 14:27:20 -07:00
Nishith Agarwal
1b756db221 Adding config for parquet compression ratio 2018-03-25 22:17:36 -07:00
Jian Xu
48643795b8 Checking storage level before persisting preppedRecords 2018-03-22 22:15:52 -07:00
Kaushik Devarajaiah
291a88ba94 DeduplicateRecords based on recordKey if global index is used 2018-03-22 09:15:44 -07:00
Nishith Agarwal
123da020e2 - Fixing memory leak due to HoodieLogFileReader holding on to a logblock
- Removed inMemory HashMap usage in merge(..) code in LogScanner
2018-03-16 12:43:31 -07:00
Jian Xu
d3df32fa03 Add back UseTempFolder changes in HoodieMergeHandle 2018-03-15 17:11:15 -07:00
Omkar Joshi
c5b4cb1b75 Spawning parallel writer thread to separate reading records from spark and writing records to parquet file 2018-03-15 16:58:14 -07:00
Nishith Agarwal
9dff8c2326 Adding a tool to read/inspect a HoodieLogFile 2018-03-15 16:48:28 -07:00
Jian Xu
ba7c258c61 Add more options in HoodieWriteConfig 2018-03-13 23:26:36 -07:00
Jian Xu
7f079632a6 Use hadoopConf in HoodieTableMetaClient and related tests 2018-03-12 11:47:55 -07:00
Vinoth Chandar
73534d467f [maven-release-plugin] prepare for next development iteration 2018-03-07 21:04:10 -08:00
Vinoth Chandar
f2e5c6f9f8 [maven-release-plugin] prepare release hoodie-0.4.1 2018-03-07 21:04:00 -08:00
Nishith Agarwal
0eaa21111a Re-factoring Compaction as first level API in WriteClient similar to upsert/insert 2018-03-07 16:16:39 -08:00
Nishith Agarwal
5405a6287b Introducing HoodieLogFormat V2 with versioning support
- HoodieLogFormat V2 has support for LogFormat evolution through versioning
  - LogVersion is associated with a LogBlock not a LogFile
  - Based on a version for a LogBlock, appropriate code path is executed
- Implemented LazyReading of Hoodie Log Blocks with Memory / IO tradeoff
- Implemented Reverse pointer to be able to traverse the log in reverse
- Introduce new MAGIC for backwards compatibility with logs without versions
2018-03-06 21:14:11 -08:00
Jian Xu
dfd1979c51 Handle inflight clean instants during Hoodie instants archiving 2018-03-05 15:01:58 -08:00
Jian Xu
5d5c306e64 Add new APIs in HoodieReadClient and HoodieWriteClient 2018-02-28 13:58:12 -08:00
Nishith Agarwal
6fec9655a8 Added support for Disk Spillable Compaction to prevent OOM issues 2018-02-26 16:00:35 -08:00
Nishith Agarwal
d495484399 Write smaller sized multiple blocks to log file instead of a large one
- Use SizeEstimator to size number of records to write
- Configurable block size
- Configurable log file size
2018-02-23 07:31:39 -08:00
Vinoth Chandar
eb3d0c470f Fix formatting in HoodieWriteClient 2018-02-14 10:03:20 -08:00
Jian Xu
3bdd750982 Use FastDateFormat for thread safety
Use FastDateFormat for thread safety. This fixes an exception when a
job is used to ingest multiple tables. An example exception:
```
Caused by: java.lang.NumberFormatException: multiple points
        at sun.misc.FloatingDecimal.readJavaFormatString(FloatingDecimal.java:1890)
        at sun.misc.FloatingDecimal.parseDouble(FloatingDecimal.java:110)
        at java.lang.Double.parseDouble(Double.java:538)
        at java.text.DigitList.getDouble(DigitList.java:169)
        at java.text.DecimalFormat.parse(DecimalFormat.java:2056)
        at java.text.SimpleDateFormat.subParse(SimpleDateFormat.java:1867)
        at java.text.SimpleDateFormat.parse(SimpleDateFormat.java:1514)
        at java.text.DateFormat.parse(DateFormat.java:364)
        at com.uber.hoodie.HoodieWriteClient.commit(HoodieWriteClient.java:442)
```
2018-02-12 11:43:57 -08:00
Nishith Agarwal
7076c2e9f0 refactor classes to accept Map passed by RealtimeCompactor to avoid multiple map creations in HoodieMergeHandle 2018-02-07 11:16:01 -08:00