Nishith Agarwal
7243ce40c9
Serializing the complete payload object instead of serializing just the GenericRecord
...
Removing Converter hierarchy as we now depend purely on JavaSerialization and require the payload to be java serializable
2018-12-04 11:43:41 -08:00
Vinoth Chandar
fa65db9c4c
Explicitly handle lack of append() support during LogWriting
2018-11-27 17:58:43 -08:00
Balaji Varadarajan
25cd05b24e
Useful Hudi CLI commands to debug/analyze production workloads
2018-10-30 10:28:01 -07:00
Balaji Varadarajan
07324e7a20
Compaction validate, unschedule and repair
2018-10-25 14:12:47 -07:00
Xinli shang
d904fe69ca
Fix addMetadataFields() to carry over 'props'
2018-10-24 10:55:13 -07:00
Balaji Varadarajan
9710b5a3a6
Ensure Hoodie metadata folder and files are filtered out when constructing Parquet Data Source
2018-10-01 14:27:14 +05:30
jiale.tan
98fd97b65f
feature(HoodieGlobalBloomIndex): adds a new type of bloom index to allow global record key lookup
2018-09-29 19:55:20 +05:30
vinothchandar
7ba842c0fe
[maven-release-plugin] prepare for next development iteration
2018-09-28 11:27:00 +05:30
vinothchandar
5847b61f44
[maven-release-plugin] prepare release hoodie-0.4.4
2018-09-28 11:26:15 +05:30
vinothchandar
9ca6f91e97
Perform consistency checks during write finalize
...
- Check to ensure written files are listable on storage
- Docs reflected to capture how this helps with s3 storage
- Unit tests added, corrections to existing tests
- Fix DeltaStreamer to manage archived commits in a separate folder
2018-09-28 08:04:41 +05:30
Balaji Varadarajan
4c74dd4cad
Travis CI tests need to be run in quieter mode (WARN log level) to avoid max log-size errors
2018-09-26 21:10:20 +05:30
Yishuang Lu
faf93b6340
Fix the name of avro schema file in Test
...
Fixed the name of avro schema file in Test
Signed-off-by: Yishuang Lu <luystu@gmail.com>
2018-09-24 21:58:34 +05:30
Balaji Varadarajan
5cb28e7b1f
Explicitly release resources in LogFileReader and TestHoodieClientBase
2018-09-20 13:24:57 +05:30
Balaji Varadarajan
2728f96505
Add dummy classes to dump all classes loaded as part of packaging modules to ensure javadoc and sources jars are getting created
2018-09-18 09:24:33 +05:30
Vinoth Chandar
bd5af89f12
[maven-release-plugin] rollback the release of hoodie-0.4.4
2018-09-13 15:01:53 +05:30
Vinoth Chandar
d1cc864a43
[maven-release-plugin] prepare for next development iteration
2018-09-12 23:59:47 +05:30
Vinoth Chandar
b748bc836d
[maven-release-plugin] prepare release hoodie-0.4.4
2018-09-12 23:59:34 +05:30
Vinoth Chandar
a5359662be
Moving dependencies off cdh to apache + Hive2 support
...
- Tests redone in the process
- Main changes are to RealtimeRecordReader and how it treats maps/arrays
- Make hive sync work with Hive 1/2 and CDH environments
- Fixes to handle corner cases for Hive queries
- Spark Hive integration - Working version across Apache and CDH versions
- Known Issue - https://github.com/uber/hudi/issues/439
2018-09-11 11:03:30 +05:30
Vinoth Chandar
d58ddbd999
Reworking the deltastreamer tool
...
- Standardize version of jackson
- DFSPropertiesConfiguration replaces usage of commons PropertiesConfiguration
- Remove dependency on ConstructorUtils
- Throw error if ordering value is not present, during key generation
- Switch to shade plugin for hoodie-utilities
- Added support for consumption for Confluent avro kafka serdes
- Support for Confluent schema registry
- KafkaSource now deals with skews nicely, by doing round robin allocation of source limit across partitions
- Added support for BULK_INSERT operations as well
- Pass in the payload class config properly into HoodieWriteClient
- Fix documentation based on new usage
- Adding tests on deltastreamer, sources and all new util classes.
2018-09-08 10:24:32 +08:00
Balaji Varadarajan
fb95dbdedb
CLI to create and desc hoodie table
2018-09-08 10:03:38 +08:00
Nishith Agarwal
459e523d9e
1. Small file size handling for inserts into log files. In summary, the total size of the log file is compared with the parquet max file size, and if there is scope to add inserts, then add them.
2018-09-06 08:52:08 +08:00
Nishith Agarwal
324de298bc
Removing dependency on apache-commons lang 3, adding necessary classes as needed
2018-09-06 08:26:48 +08:00
Vinoth Chandar
89cd6b0726
[maven-release-plugin] prepare for next development iteration
2018-08-22 21:30:05 -07:00
Vinoth Chandar
8d305c5a86
[maven-release-plugin] prepare release hoodie-0.4.3
2018-08-22 21:29:53 -07:00
Nishith Agarwal
3746ace76a
Fixing Null pointer exception in finally block
2018-08-21 21:07:53 -07:00
Nishith Agarwal
88274b8261
Adding another metric to HoodieWriteStat to determine if there were inserts converted to updates, added one test for this
2018-08-14 06:22:16 -07:00
Balaji Varadarajan
989afddd54
BUGFIX - Use Guava Optional (which is Serializable) in CompactionOperation to avoid NotSerializableException
2018-08-08 06:00:55 -07:00
Balaji Varadarajan
ea23c9b7a0
Minor bug fixes found during testing
2018-08-07 08:19:50 -07:00
Balaji Varadarajan
594059a19c
Add CLI support to inspect, schedule and run compaction
2018-08-07 08:19:50 -07:00
Balaji Varadarajan
2e12c86d01
Ensure Compaction Operation compacts the data file as defined in the workload
2018-08-07 08:19:50 -07:00
Balaji Varadarajan
2f8ce93030
Async Compaction Main API changes
2018-08-07 08:19:50 -07:00
Balaji Varadarajan
9b78523d62
Ensure Cleaner and Archiver do not delete file-slices and workload marked for compaction
2018-08-07 08:19:50 -07:00
Balaji Varadarajan
0a0451a765
Ensure Compaction workload is stored in write-once meta-data files separate from timeline files.
...
This avoids concurrency issues when compactor(s) and ingestor are running in parallel.
In the Next PR -> Safety concern regarding Cleaner retaining all meta-data and file-slices for pending compactions will be addressed
2018-08-07 08:19:50 -07:00
Balaji Varadarajan
9d99942564
Track fileIds with pending compaction in FileSystemView to provide correct API semantics
2018-08-07 08:19:50 -07:00
Balaji Varadarajan
1b61f04e05
(1) Define CompactionWorkload in avro to allow storing them in instant files.
...
(2) Split APIs in HoodieRealtimeCompactor to separate generating compaction workload from running compaction
2018-08-07 08:19:50 -07:00
Balaji Varadarajan
6d01ae8ca0
FileSystemView and Timeline level changes to support Async Compaction
2018-08-07 08:19:50 -07:00
Nishith Agarwal
44caf0d40c
Fixing missing hoodie record location in HoodieRecord when record is read from disk after being spilled
2018-07-18 12:53:35 -07:00
Nishith Agarwal
34ab54a9d3
Fixing bug introduced in rollback for MOR table type with inserts into log files
2018-07-17 17:20:34 -07:00
Nishith Agarwal
3da063f83b
Adding ability for inserts to be written to log files
2018-06-11 14:08:59 -07:00
Vinoth Chandar
34827d50e1
[maven-release-plugin] prepare for next development iteration
2018-06-11 08:59:13 -07:00
Vinoth Chandar
43ef385730
[maven-release-plugin] prepare release hoodie-0.4.2
2018-06-11 08:59:02 -07:00
Xavier Jodoin
8ad8030f2a
Fix wrong use of TemporaryFolder junit rule
2018-06-10 23:31:42 -07:00
Vinoth Chandar
85dd265b7b
Improving out of box experience for data source
...
- Fixes #246
- Bump up default parallelism to 1500, to handle large upserts
- Add docs on s3 configuration & tuning tips with tested spark knobs
- Fix bug to not duplicate hoodie metadata fields when input dataframe is another hoodie dataset
- Improve speed of ROTablePathFilter by removing directory check
- Move to spark-avro 4.0 to handle issue with nested fields with same name
- Keep AvroConversionUtils in sync with spark-avro 4.0
2018-06-10 19:16:44 -07:00
Sunil Ramaiah
a97814462d
Added a filter function to filter the record keys in a parquet file
2018-05-17 19:01:11 -07:00
Balaji Varadarajan
dfc0c61eb7
Support union mode in HoodieRealtimeRecordReader for pure insert workloads
...
Also Replace BufferedIteratorPayload abstraction with function passing
2018-05-10 17:39:56 -07:00
Nishith Agarwal
93f345a032
Minor fixes for MergeOnRead MVP release readiness
2018-05-09 07:23:58 -07:00
Nishith Agarwal
75df72f575
Adding a fix/workaround when fs.append() is unable to return a valid outputstream
2018-05-08 18:46:17 -07:00
Nishith Agarwal
04655e9e85
Adding metrics for MOR and COW
2018-04-26 09:32:45 -07:00
Nishith Agarwal
c3c205fc02
Using BufferedFsInputStream to wrap FSInputStream for FSDataInputStream
2018-04-18 08:05:19 -07:00
Nishith Agarwal
720e42f52a
Parallelized read-write operations in Hoodie Merge phase
2018-04-12 11:46:42 -07:00