Commit Graph

  • f3418e4718 Docker Container Build and Run setup with foundations for adding docker integration tests. Docker images built with Hadoop 2.8.4 Hive 2.3.3 and Spark 2.3.1 and published to docker-hub Look at quickstart document for how to setup docker and run demo Balaji Varadarajan 2018-08-21 22:54:57 -07:00
  • 9710b5a3a6 Ensure Hoodie metadata folder and files are filtered out when constructing Parquet Data Source Balaji Varadarajan 2018-09-28 21:41:28 -07:00
  • 06bdba3cef Update Gemfile.lock with newer jekyll version vinoth chandar 2018-09-29 20:50:03 +05:30
  • b5a75fdd91 Adding Jiale & Anbu to contributors list vinothchandar 2018-09-29 20:14:35 +05:30
  • 98fd97b65f feature(HoodieGlobalBloomIndex): adds a new type of bloom index to allow global record key lookup jiale.tan 2018-08-13 17:35:21 -07:00
  • 7ba842c0fe [maven-release-plugin] prepare for next development iteration vinothchandar 2018-09-28 11:27:00 +05:30
  • 5847b61f44 [maven-release-plugin] prepare release hoodie-0.4.4 vinothchandar 2018-09-28 11:26:15 +05:30
  • 05bf14a42e Update RELEASE_NOTES for release 0.4.4 vinothchandar 2018-09-28 11:03:08 +05:30
  • 9ca6f91e97 Perform consistency checks during write finalize vinothchandar 2018-09-20 17:50:27 +05:30
  • 4c74dd4cad Travis CI tests need to be run in quieter mode (WARN log level) to avoid max log-size errors Balaji Varadarajan 2018-09-21 17:09:51 -07:00
  • faf93b6340 Fix the name of avro schema file in Test Yishuang Lu 2018-09-23 17:23:35 -07:00
  • 460e24e84b Hive Sync handling must work for datasets with multi-partition keys Balaji Varadarajan 2018-09-16 08:06:30 -07:00
  • 5cb28e7b1f Explicitly release resources in LogFileReader and TestHoodieClientBase Balaji Varadarajan 2018-09-19 13:13:04 -07:00
  • 2728f96505 Add dummy classes to dump all classes loaded as part of packaging modules to ensure javadoc and sources jars are getting created Balaji Varadarajan 2018-09-17 00:25:24 -07:00
  • f44bcc5b03 Fix bug with incrementally pulling older data Vinoth Chandar 2018-09-14 12:51:27 +05:30
  • bd5af89f12 [maven-release-plugin] rollback the release of hoodie-0.4.4 Vinoth Chandar 2018-09-13 15:01:53 +05:30
  • d1cc864a43 [maven-release-plugin] prepare for next development iteration Vinoth Chandar 2018-09-12 23:59:47 +05:30
  • b748bc836d [maven-release-plugin] prepare release hoodie-0.4.4 Vinoth Chandar 2018-09-12 23:59:34 +05:30
  • 0b1a949a87 Release notes for 0.4.4 Vinoth Chandar 2018-09-12 23:38:00 +05:30
  • cce88b36d2 Use spark Master from environment if set Balaji Varadarajan 2018-09-10 16:06:31 -07:00
  • 605af8a82f Reduce minimum delta-commits required for compaction Balaji Varadarajan 2018-09-10 10:59:59 -07:00
  • 18a39715c9 Bump up versions in packaging modules and remove commons-lang3 dep Balaji Varadarajan 2018-09-07 18:30:49 -07:00
  • eca49a255e Rebasing and fixing conflicts against master Vinoth Chandar 2018-09-08 03:52:49 +08:00
  • a5359662be Moving dependencies off cdh to apache + Hive2 support Vinoth Chandar 2018-07-15 22:34:02 -07:00
  • 2b1af18941 Adding check for rolling stats not present to handle backwards compatibility of existing timeline Nishith Agarwal 2018-09-08 12:26:54 -07:00
  • ea7823a9dd Docs for describing async compaction and how to operate it Balaji Varadarajan 2018-09-06 00:49:38 -07:00
  • d58ddbd999 Reworking the deltastreamer tool Vinoth Chandar 2018-08-04 03:35:30 -07:00
  • fb95dbdedb CLI to create and desc hoodie table Balaji Varadarajan 2018-09-06 10:24:32 -07:00
  • 0fe92dee55 Fix an intermittently failing test case in TestMergeOnRead due to incorrect prev commit time Nishith Agarwal 2018-09-07 00:17:49 -07:00
  • e2dee68ccd Simplify and fix CLI to schedule and run compactions Balaji Varadarajan 2018-09-06 12:02:09 -07:00
  • fad4b513ea Update Gemfile.lock with higher ffi version vinoth chandar 2018-09-06 08:54:32 +08:00
  • 459e523d9e 1. Small file size handling for inserts into log files. In summary, the total size of the log file is compared with the parquet max file size and if there is scope to add inserts then add them. Nishith Agarwal 2018-06-11 20:27:56 -07:00
  • 324de298bc Removing dependency on apache-commons lang 3, adding necessary classes as needed Nishith Agarwal 2018-09-04 02:05:15 -07:00
  • 2eaa42abde Updated jcommander version to fix NPE in HoodieDeltaStreamer tool Saravanan Elumalai 2018-08-29 23:42:06 +05:30
  • 89cd6b0726 [maven-release-plugin] prepare for next development iteration Vinoth Chandar 2018-08-22 21:30:05 -07:00
  • 8d305c5a86 [maven-release-plugin] prepare release hoodie-0.4.3 Vinoth Chandar 2018-08-22 21:29:53 -07:00
  • 6fffda5c70 Update Release notes for 0.4.3 release Vinoth Chandar 2018-08-22 21:11:43 -07:00
  • e624480259 Throttling to limit QPS from HbaseIndex Kaushik Devarajaiah 2018-07-19 13:46:33 -07:00
  • 3746ace76a Fixing Null pointer exception in finally block Nishith Agarwal 2018-08-21 17:27:56 -07:00
  • 88274b8261 Adding another metric to HoodieWriteStat to determine if there were inserts converted to updates, added one test for this Nishith Agarwal 2018-08-07 15:51:46 -07:00
  • 989afddd54 BUGFIX - Use Guava Optional (which is Serializable) in CompactionOperation wcached to avoid NoSerializableException Balaji Varadarajan 2018-08-07 15:57:01 -07:00
  • ea23c9b7a0 Minor bug fixes found during testing Balaji Varadarajan 2018-08-03 16:40:58 -07:00
  • 594059a19c Add CLI support to inspect, schedule and run compaction Balaji Varadarajan 2018-06-05 18:26:01 -07:00
  • 2e12c86d01 Ensure Compaction Operation compacts the data file as defined in the workload Balaji Varadarajan 2018-05-26 14:08:29 -07:00
  • 2f8ce93030 Async Compaction Main API changes Balaji Varadarajan 2018-05-23 23:09:25 -07:00
  • 9b78523d62 Ensure Cleaner and Archiver do not delete file-slices and workload marked for compaction Balaji Varadarajan 2018-05-31 14:16:19 -07:00
  • 0a0451a765 Ensure Compaction workload is stored in write-once meta-data files separate from timeline files. This avoids concurrency issues when compactor(s) and ingestor are running in parallel. In the Next PR -> Safety concern regarding Cleaner retaining all meta-data and file-slices for pending compactions will be addressed Balaji Varadarajan 2018-05-31 14:11:43 -07:00
  • 9d99942564 Track fileIds with pending compaction in FileSystemView to provide correct API semantics Balaji Varadarajan 2018-05-24 11:19:40 -07:00
  • 1b61f04e05 (1) Define CompactionWorkload in avro to allow storing them in instant files. (2) Split APIs in HoodieRealtimeCompactor to separate generating compaction workload from running compaction Balaji Varadarajan 2018-05-23 20:49:24 -07:00
  • 6d01ae8ca0 FileSystemView and Timeline level changes to support Async Compaction Balaji Varadarajan 2018-05-23 16:54:53 -07:00
  • 44caf0d40c Fixing missing hoodie record location in HoodieRecord when record is read from disk after being spilled Nishith Agarwal 2018-07-12 17:45:10 -07:00
  • f62890ca1f adding setters so that subclasses can set it Omkar Joshi 2018-07-18 11:11:32 -07:00
  • 34ab54a9d3 Fixing bug introduced in rollback for MOR table type with inserts into log files Nishith Agarwal 2018-07-09 16:58:05 -07:00
  • a6fe96fdfe Changing Day based compaction strategy to be IO agnostic Nishith Agarwal 2018-05-23 14:26:11 -07:00
  • 3da063f83b Adding ability for inserts to be written to log files Nishith Agarwal 2018-05-13 16:25:11 -07:00
  • 34827d50e1 [maven-release-plugin] prepare for next development iteration Vinoth Chandar 2018-06-11 08:59:13 -07:00
  • 43ef385730 [maven-release-plugin] prepare release hoodie-0.4.2 Vinoth Chandar 2018-06-11 08:59:02 -07:00
  • 4f76f2899e Update Release notes for 0.4.2 release vinoth chandar 2018-06-11 08:41:11 -07:00
  • 8ad8030f2a Fix wrong use of TemporaryFolder junit rule Xavier Jodoin 2018-03-28 13:09:57 -04:00
  • 8f1d362015 Fixing deps & serialization for RTView - hoodie-hadoop-mr now needs objectsize bundled - Also updated docs with additional tuning tips vinothchandar 2018-06-10 18:54:58 -07:00
  • 85dd265b7b Improving out of box experience for data source Vinoth Chandar 2018-01-05 14:06:18 -08:00
  • a97814462d Added a filter function to filter the record keys in a parquet file Sunil Ramaiah 2018-05-17 15:40:47 -07:00
  • 23d53763c4 enabling global index for MOR Nishith Agarwal 2018-05-02 00:52:37 -07:00
  • dfc0c61eb7 Support union mode in HoodieRealtimeRecordReader for pure insert workloads Also Replace BufferedIteratorPayload abstraction with function passing Balaji Varadarajan 2018-04-26 10:18:05 -07:00
  • 93f345a032 Minor fixes for MergeOnRead MVP release readiness Nishith Agarwal 2018-04-02 22:53:28 -07:00
  • 75df72f575 Adding a fix/workaround when fs.append() unable to return a valid outputstream Nishith Agarwal 2018-05-02 00:41:03 -07:00
  • 04655e9e85 Adding metrics for MOR and COW Nishith Agarwal 2018-03-25 11:12:41 -07:00
  • c66004d79a Add Support for ordering and limiting results in CLI show commands Balaji Varadarajan 2018-04-24 10:56:05 -07:00
  • b9b9b24993 Added more comments and removed the extra new lines Sunil Ramaiah 2018-04-25 12:40:24 -07:00
  • 4d1fba24c9 Fix for updating duplicate records in same/different files in same partition Sunil Ramaiah 2018-04-23 15:23:42 -07:00
  • fa73a911cc Update Gemfile.lock vinoth chandar 2018-04-19 14:20:50 -07:00
  • c3c205fc02 Using BufferedFsInputStream to wrap FSInputStream for FSDataInputStream Nishith Agarwal 2018-04-02 22:53:28 -07:00
  • 720e42f52a Parallelized read-write operations in Hoodie Merge phase Nishith Agarwal 2018-04-01 21:43:05 -07:00
  • 6c226ca21a Issue-329 : Refactoring TestHoodieClientOnCopyOnWriteStorage and adding test-cases Balaji Varadarajan 2018-04-02 10:08:06 -07:00
  • a4049329a5 Update release notes for 0.4.1 (post) Vinoth Chandar 2018-04-02 09:29:15 -07:00
  • 788e4f2d2e CodeStyle formatting to conform to basic Checkstyle rules. Balaji Varadarajan 2018-03-20 16:29:20 -07:00
  • 987f5d6b96 Making ExternalSpillableMap generic for any datatype - Introduced concept of converters to be able to serde generic datatype for SpillableMap - Fixed/Added configs to Hoodie Configs - Changed HoodieMergeHandle to start using SpillableMap Nishith Agarwal 2018-03-15 00:20:16 -07:00
  • fa787ab5ab Replace deprecated jackson version Xavier Jodoin 2018-03-27 11:36:01 -04:00
  • 1b756db221 Adding config for parquet compression ratio Nishith Agarwal 2018-03-23 21:50:11 -07:00
  • 48643795b8 Checking storage level before persisting preppedRecords Jian Xu 2018-03-20 12:06:15 -07:00
  • 291a88ba94 DeduplicateRecords based on recordKey if global index is used Kaushik Devarajaiah 2018-03-12 19:06:52 -07:00
  • 123da020e2 - Fixing memory leak due to HoodieLogFileReader holding on to a logblock - Removed inMemory HashMap usage in merge(..) code in LogScanner Nishith Agarwal 2018-03-13 22:56:29 -07:00
  • d3df32fa03 Add back UseTempFolder changes in HoodieMergeHandle Jian Xu 2018-03-14 12:53:12 -07:00
  • c5b4cb1b75 Spawning parallel writer thread to separate reading records from spark and writing records to parquet file Omkar Joshi 2018-03-14 16:00:47 -07:00
  • 9dff8c2326 Adding a tool to read/inspect a HoodieLogFile Nishith Agarwal 2018-01-21 21:09:52 -08:00
  • ba7c258c61 Add more options in HoodieWriteConfig Jian Xu 2018-03-06 11:25:26 -08:00
  • 7f079632a6 Use hadoopConf in HoodieTableMetaClient and related tests Jian Xu 2018-03-08 17:21:11 -08:00
  • 73534d467f [maven-release-plugin] prepare for next development iteration Vinoth Chandar 2018-03-07 21:04:10 -08:00
  • f2e5c6f9f8 [maven-release-plugin] prepare release hoodie-0.4.1 Vinoth Chandar 2018-03-07 21:04:00 -08:00
  • 0eaa21111a Re-factoring Compaction as first level API in WriteClient similar to upsert/insert Nishith Agarwal 2018-02-28 15:58:19 -08:00
  • 5405a6287b Introducing HoodieLogFormat V2 with versioning support - HoodieLogFormat V2 has support for LogFormat evolution through versioning - LogVersion is associated with a LogBlock not a LogFile - Based on a version for a LogBlock, appropriate code path is executed - Implemented LazyReading of Hoodie Log Blocks with Memory / IO tradeoff - Implemented Reverse pointer to be able to traverse the log in reverse - Introduce new MAGIC for backwards compatibility with logs without versions Nishith Agarwal 2018-02-15 11:01:25 -08:00
  • dfd1979c51 Handle inflight clean instants during Hoodie instants archiving Jian Xu 2018-03-01 14:04:16 -08:00
  • 5d5c306e64 Add new APIs in HoodieReadClient and HoodieWriteClient Jian Xu 2018-02-22 11:20:54 -08:00
  • 6fec9655a8 Added support for Disk Spillable Compaction to prevent OOM issues Nishith Agarwal 2017-12-06 13:11:27 -08:00
  • d495484399 Write smaller sized multiple blocks to log file instead of a large one - Use SizeEstimator to size number of records to write - Configurable block size - Configurable log file size Nishith Agarwal 2018-02-01 12:36:12 -08:00
  • eb3d0c470f Fix formatting in HoodieWriteClient Vinoth Chandar 2018-02-13 17:40:09 -08:00
  • 3bdd750982 Use FastDateFormat for thread safety Jian Xu 2018-01-30 17:13:54 -08:00
  • 7076c2e9f0 refactor classes to accept Map passed by RealtimeCompactor to avoid multiple map creations in HoodieMergeHandle Nishith Agarwal 2018-01-24 13:34:14 -08:00
  • 30049383f5 Small File Size correction handling for MOR table type Nishith Agarwal 2018-01-10 21:10:22 -08:00
  • 2116815261 Fixing Rollback for compaction/commit operation, added check for null commit - Fallback to old way of rollback by listing all partitions - Added null check to ensure only partitions which are to be rolledback are considered - Added location (committime) to workload stat - Added checks in CompactedScanner to guard against task retries - Introduce new logic for rollback (bounded by instant_time and target_instant time) - Reversed logfiles order Nishith Agarwal 2017-12-14 21:34:54 -08:00