Commit Graph

473 Commits

Author SHA1 Message Date
Nishith Agarwal
3da063f83b Adding ability for inserts to be written to log files 2018-06-11 14:08:59 -07:00
Vinoth Chandar
34827d50e1 [maven-release-plugin] prepare for next development iteration 2018-06-11 08:59:13 -07:00
Vinoth Chandar
43ef385730 [maven-release-plugin] prepare release hoodie-0.4.2 2018-06-11 08:59:02 -07:00
vinoth chandar
4f76f2899e Update Release notes for 0.4.2 release 2018-06-11 08:41:11 -07:00
Xavier Jodoin
8ad8030f2a Fix wrong use of TemporaryFolder junit rule 2018-06-10 23:31:42 -07:00
vinothchandar
8f1d362015 Fixing deps & serialization for RTView
- hoodie-hadoop-mr now needs objectsize bundled
- Also updated docs with additional tuning tips
2018-06-10 19:16:44 -07:00
Vinoth Chandar
85dd265b7b Improving out of box experience for data source
- Fixes #246
- Bump up default parallelism to 1500, to handle large upserts
- Add docs on S3 configuration & tuning tips with tested Spark knobs
- Fix bug to not duplicate hoodie metadata fields when input dataframe is another hoodie dataset
- Improve speed of ROTablePathFilter by removing directory check
- Move to spark-avro 4.0 to handle issue with nested fields with same name
- Keep AvroConversionUtils in sync with spark-avro 4.0
2018-06-10 19:16:44 -07:00
Sunil Ramaiah
a97814462d Added a filter function to filter the record keys in a parquet file 2018-05-17 19:01:11 -07:00
Nishith Agarwal
23d53763c4 enabling global index for MOR 2018-05-16 10:36:25 -07:00
Balaji Varadarajan
dfc0c61eb7 Support union mode in HoodieRealtimeRecordReader for pure insert workloads
Also Replace BufferedIteratorPayload abstraction with function passing
2018-05-10 17:39:56 -07:00
Nishith Agarwal
93f345a032 Minor fixes for MergeOnRead MVP release readiness 2018-05-09 07:23:58 -07:00
Nishith Agarwal
75df72f575 Adding a fix/workaround when fs.append() unable to return a valid outputstream 2018-05-08 18:46:17 -07:00
Nishith Agarwal
04655e9e85 Adding metrics for MOR and COW 2018-04-26 09:32:45 -07:00
Balaji Varadarajan
c66004d79a Add Support for ordering and limiting results in CLI show commands 2018-04-26 09:30:05 -07:00
Sunil Ramaiah
b9b9b24993 Added more comments and removed the extra new lines 2018-04-25 13:09:15 -07:00
Sunil Ramaiah
4d1fba24c9 Fix for updating duplicate records in same/different files in same partition 2018-04-25 13:09:15 -07:00
vinoth chandar
fa73a911cc Update Gemfile.lock 2018-04-19 14:20:50 -07:00
Nishith Agarwal
c3c205fc02 Using BufferedFsInputStream to wrap FSInputStream for FSDataInputStream 2018-04-18 08:05:19 -07:00
Nishith Agarwal
720e42f52a Parallelized read-write operations in Hoodie Merge phase 2018-04-12 11:46:42 -07:00
Balaji Varadarajan
6c226ca21a Issue-329 : Refactoring TestHoodieClientOnCopyOnWriteStorage and adding test-cases 2018-04-09 16:34:58 -07:00
Vinoth Chandar
a4049329a5 Update release notes for 0.4.1 (post) 2018-04-02 09:31:01 -07:00
Balaji Varadarajan
788e4f2d2e CodeStyle formatting to conform to basic Checkstyle rules.
The code-style rules follow google style with some changes:

1. Increase line length from 100 to 120
2. Disable JavaDoc related checkstyles as this needs more manual work.

Both source and test code are checked for code-style
2018-03-30 11:09:40 -07:00
Nishith Agarwal
987f5d6b96 Making ExternalSpillableMap generic for any datatype
- Introduced concept of converters to be able to serde generic datatype for SpillableMap
- Fixed/Added configs to Hoodie Configs
- Changed HoodieMergeHandle to start using SpillableMap
2018-03-28 07:56:07 -07:00
Xavier Jodoin
fa787ab5ab Replace deprecated jackson version 2018-03-27 14:27:20 -07:00
Nishith Agarwal
1b756db221 Adding config for parquet compression ratio 2018-03-25 22:17:36 -07:00
Jian Xu
48643795b8 Checking storage level before persisting preppedRecords 2018-03-22 22:15:52 -07:00
Kaushik Devarajaiah
291a88ba94 DeduplicateRecords based on recordKey if global index is used 2018-03-22 09:15:44 -07:00
Nishith Agarwal
123da020e2 - Fixing memory leak due to HoodieLogFileReader holding on to a logblock
- Removed inMemory HashMap usage in merge(..) code in LogScanner
2018-03-16 12:43:31 -07:00
Jian Xu
d3df32fa03 Add back UseTempFolder changes in HoodieMergeHandle 2018-03-15 17:11:15 -07:00
Omkar Joshi
c5b4cb1b75 Spawning parallel writer thread to separate reading records from spark and writing records to parquet file 2018-03-15 16:58:14 -07:00
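The idea in this commit message, decoupling the reader from the writer with a dedicated thread, can be sketched with a bounded queue. This is only an illustration under stated assumptions: the class `BufferedWriteSketch`, the method `copyThroughQueue`, and the in-memory list standing in for a parquet writer are all hypothetical and are not taken from the Hoodie code base.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical sketch: a reader feeds a bounded queue, a separate thread drains it.
public class BufferedWriteSketch {
    // Sentinel object that tells the writer thread there are no more records.
    private static final Object END = new Object();

    // Reads every record from `source` on the calling thread and "writes" it
    // (here: appends to a list) on a second thread. Records must be non-null,
    // because ArrayBlockingQueue rejects null elements.
    public static <T> List<T> copyThroughQueue(Iterator<T> source, int capacity) {
        BlockingQueue<Object> queue = new ArrayBlockingQueue<>(capacity);
        List<T> written = new ArrayList<>();

        Thread writer = new Thread(() -> {
            try {
                while (true) {
                    Object next = queue.take();   // blocks while the queue is empty
                    if (next == END) {
                        return;
                    }
                    @SuppressWarnings("unchecked")
                    T record = (T) next;
                    written.add(record);          // stand-in for the real file write
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        writer.start();

        try {
            while (source.hasNext()) {
                queue.put(source.next());         // blocks while the queue is full
            }
            queue.put(END);
            writer.join();                        // makes `written` safely visible
        } catch (InterruptedException e) {
            writer.interrupt();
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return written;
    }
}
```

The bounded capacity provides back-pressure: a slow writer stalls the reader instead of letting buffered records grow without limit. A production version would also need to propagate writer failures back to the reader, which this sketch omits.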
Nishith Agarwal
9dff8c2326 Adding a tool to read/inspect a HoodieLogFile 2018-03-15 16:48:28 -07:00
Jian Xu
ba7c258c61 Add more options in HoodieWriteConfig 2018-03-13 23:26:36 -07:00
Jian Xu
7f079632a6 Use hadoopConf in HoodieTableMetaClient and related tests 2018-03-12 11:47:55 -07:00
Vinoth Chandar
73534d467f [maven-release-plugin] prepare for next development iteration 2018-03-07 21:04:10 -08:00
Vinoth Chandar
f2e5c6f9f8 [maven-release-plugin] prepare release hoodie-0.4.1 2018-03-07 21:04:00 -08:00
Nishith Agarwal
0eaa21111a Re-factoring Compaction as first level API in WriteClient similar to upsert/insert 2018-03-07 16:16:39 -08:00
Nishith Agarwal
5405a6287b Introducing HoodieLogFormat V2 with versioning support
- HoodieLogFormat V2 has support for LogFormat evolution through versioning
  - LogVersion is associated with a LogBlock not a LogFile
  - Based on a version for a LogBlock, appropriate code path is executed
- Implemented LazyReading of Hoodie Log Blocks with Memory / IO tradeoff
- Implemented Reverse pointer to be able to traverse the log in reverse
- Introduce new MAGIC for backwards compatibility with logs without versions
2018-03-06 21:14:11 -08:00
Jian Xu
dfd1979c51 Handle inflight clean instants during Hoodie instants archiving 2018-03-05 15:01:58 -08:00
Jian Xu
5d5c306e64 Add new APIs in HoodieReadClient and HoodieWriteClient 2018-02-28 13:58:12 -08:00
Nishith Agarwal
6fec9655a8 Added support for Disk Spillable Compaction to prevent OOM issues 2018-02-26 16:00:35 -08:00
Nishith Agarwal
d495484399 Write smaller sized multiple blocks to log file instead of a large one
- Use SizeEstimator to size number of records to write
- Configurable block size
- Configurable log file size
2018-02-23 07:31:39 -08:00
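The sizing arithmetic this commit message describes can be sketched as follows, assuming an average per-record size estimate is already available. The class `LogBlockSizing` and both method names are hypothetical, and the code does not call Spark's SizeEstimator or any Hoodie API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: derive a record count per block from a size budget,
// then cut a batch of records into blocks of that many records.
public class LogBlockSizing {

    // Number of records that fit in one block, never less than one.
    public static int recordsPerBlock(long maxBlockSizeBytes, long estimatedRecordSizeBytes) {
        if (maxBlockSizeBytes <= 0 || estimatedRecordSizeBytes <= 0) {
            throw new IllegalArgumentException("sizes must be positive");
        }
        long count = Math.max(1L, maxBlockSizeBytes / estimatedRecordSizeBytes);
        return (int) Math.min(Integer.MAX_VALUE, count);
    }

    // Splits `records` into consecutive blocks of at most `perBlock` records.
    public static <T> List<List<T>> splitIntoBlocks(List<T> records, int perBlock) {
        if (perBlock <= 0) {
            throw new IllegalArgumentException("perBlock must be positive");
        }
        List<List<T>> blocks = new ArrayList<>();
        for (int start = 0; start < records.size(); start += perBlock) {
            int end = Math.min(start + perBlock, records.size());
            blocks.add(new ArrayList<>(records.subList(start, end)));
        }
        return blocks;
    }
}
```

With a 1024-byte block budget and records estimated at 100 bytes each, `recordsPerBlock` returns 10, so a batch of 25 records is split into blocks of 10, 10, and 5.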
Vinoth Chandar
eb3d0c470f Fix formatting in HoodieWriteClient 2018-02-14 10:03:20 -08:00
Jian Xu
3bdd750982 Use FastDateFormat for thread safety
Use FastDateFormat for thread safety, this is to fix an exception when a
job is used to ingest multiple tables.  An example exception:
```
Caused by: java.lang.NumberFormatException: multiple points
        at sun.misc.FloatingDecimal.readJavaFormatString(FloatingDecimal.java:1890)
        at sun.misc.FloatingDecimal.parseDouble(FloatingDecimal.java:110)
        at java.lang.Double.parseDouble(Double.java:538)
        at java.text.DigitList.getDouble(DigitList.java:169)
        at java.text.DecimalFormat.parse(DecimalFormat.java:2056)
        at java.text.SimpleDateFormat.subParse(SimpleDateFormat.java:1867)
        at java.text.SimpleDateFormat.parse(SimpleDateFormat.java:1514)
        at java.text.DateFormat.parse(DateFormat.java:364)
        at com.uber.hoodie.HoodieWriteClient.commit(HoodieWriteClient.java:442)
```
2018-02-12 11:43:57 -08:00
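The stack trace above is typical of a `java.text.SimpleDateFormat` instance shared across threads, since that class keeps mutable parsing state. The commit switches to FastDateFormat from Apache Commons Lang. To stay dependency-free, the sketch below swaps in the JDK's own `java.time.format.DateTimeFormatter`, which is immutable and thread-safe; it is not the code from the commit. The class name, method names, and the `yyyyMMddHHmmss` pattern are assumptions chosen for illustration.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch: one immutable formatter shared safely by many threads.
public class ThreadSafeCommitTime {
    // Assumed pattern for illustration only.
    private static final DateTimeFormatter COMMIT_FORMAT =
        DateTimeFormatter.ofPattern("yyyyMMddHHmmss");

    public static LocalDateTime parseCommitTime(String text) {
        return LocalDateTime.parse(text, COMMIT_FORMAT);
    }

    // Parses the same text from `tasks` concurrent tasks and returns how many
    // results equal the single-threaded result.
    public static int parseConcurrently(String text, int tasks) {
        LocalDateTime expected = parseCommitTime(text);
        ExecutorService pool = Executors.newFixedThreadPool(8);
        try {
            List<Future<LocalDateTime>> results = new ArrayList<>();
            for (int i = 0; i < tasks; i++) {
                results.add(pool.submit(() -> parseCommitTime(text)));
            }
            int matches = 0;
            for (Future<LocalDateTime> result : results) {
                if (expected.equals(result.get())) {
                    matches++;
                }
            }
            return matches;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

Because the formatter holds no per-call state, every concurrent parse yields the same value as the single-threaded one, whereas a shared `SimpleDateFormat` can intermittently throw or return wrong dates under the same load.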
Nishith Agarwal
7076c2e9f0 refactor classes to accept Map passed by RealtimeCompactor to avoid multiple map creations in HoodieMergeHandle 2018-02-07 11:16:01 -08:00
Nishith Agarwal
30049383f5 Small File Size correction handling for MOR table type 2018-02-07 11:01:10 -08:00
Nishith Agarwal
2116815261 Fixing Rollback for compaction/commit operation, added check for null commit
- Fallback to old way of rollback by listing all partitions
- Added null check to ensure only partitions which are to be rolledback are considered
- Added location (committime) to workload stat
- Added checks in CompactedScanner to guard against task retries
- Introduce new logic for rollback (bounded by instant_time and target_instant time)
- Reversed logfiles order
2018-02-06 16:55:23 -08:00
Nishith Agarwal
be0b1f3e57 Adding global indexing to HbaseIndex implementation
- Adding tests for HbaseIndex
- Enabling global index functionality
2018-02-05 15:21:22 -08:00
Jian Xu
15e669c60c Incorporating code review feedback for finalizeWrite for COW #4 2018-02-02 11:38:25 -08:00
Jian Xu
3736243fb3 Rebases with latest upstream 2018-02-02 11:38:25 -08:00
Jian Xu
363e35bb0f Add finalizeWrite support for HoodieMergeHandle 2018-02-02 11:38:25 -08:00