
Importing Hoodie Client from internal repo

Abbreviated History:
* 25c6991 Removed non-opensource modules
* a62abf3 Removing email from pom.xml
* 0931b68 Misspelt in the copyright
* c1cac7d Preperation for OSS: Added License and rat plugin check. Also added meta information about the project in pom.xml
* 16b07b3 Preparation of OSS - Remove hoodie specific URL from hoodie cli
* fd3e0dd Small code cleanups
* 8aa7e34 Adding a de-duplication command to CLI
* b464842 Adding a de-duplication command to CLI
* 59265b1 RegisterDataset should pass the right zkNodeName after the support for multiple databases added
* b295f70 [maven-release-plugin] prepare for next development iteration
* 1006e4b [maven-release-plugin] prepare release hoodie-0.2.4
* 4c99437 Move to using hdrone release 0.7.4
* 1891939 Auto tuning the buckets needed for plain inserts also  - Off by default for now  - Enhanced an existing unit test
* b4563bd Change HoodieReadClient to use commit metadata for incremental pull
* ee20183 Add full file path onto HoodieWriteStat  - This will become an issue later on for incremental processing use cases  - Tested with cli, that is able to read older HoodieCommitMetadata
* 7dcd5d5 Address skew in cleaner work distribution
* 8d7c15d Fixing bug around partial failures of rollback
* d4ada1d Empty RDD should not throw java.lang.IllegalArgumentException: Positive number of slices required
* 076bea9 Dont clean if there are no partitions to clean
* c014f80 Minor changes to SQLStreamer
* a96d4df Minor changes to SQLStreamer
* bc289cc [maven-release-plugin] prepare for next development iteration
* 4160107 [maven-release-plugin] prepare release hoodie-0.2.3
* 409b07a [maven-release-plugin] prepare for next development iteration
* 3d71514 [maven-release-plugin] prepare release hoodie-0.2.2
* 4969d52 Fix test failures
* ac62609 Implement Review Comments for: Parallelize cleaning and including cleaning time and commit archival time in commit time graphite reporting
* cebe65a Parallelize cleaning and including cleaning time and commit archival time in commit time graphite reporting
* 2e5b372 Migrating to CDH 5.7.2
* 899ae12 Remove filtering of /tmp/hive/hive paths from HoodieInputFormat. This fixes Join with temporary tables with HoodieCombineHiveInputFormat
* 69a68f6 Implement equals and hashCode for HoodieTableMetadata, its used in hash based structures
* 12d29c6 Update hive staging url
* 1c5c88a Copy filterExists to WriteClient
* 76aee67 [maven-release-plugin] prepare for next development iteration
* 1f0a715 [maven-release-plugin] prepare release hoodie-0.2.1
* dbfd1d4 HoodieReadClient and HoodieWriteClient separation
* c39a98b Revamped HoodieRecordPayload API that supports merging of old & new values during update
* 79e5bbd Add a helper to configure SparkConf for SparkSQL on Hoodie tables
* f56f423 [maven-release-plugin] prepare for next development iteration
* 780fc44 [maven-release-plugin] prepare release hoodie-0.2
* 1ea2238 Modifying the git utl
* b0af8dc Depending on hdrone release version
* 7753693 Removing a System.out.println which got in by mistake
* 1f5b019 Adding HBase Config to HoodieClientConfig
* 2fce97f Implement Review comments and merge into master
* f389820 Bunch of API changes
* 909a856 HoodieClientConfig split up and revamp
* c2ad946 Fix TestHoodieClient to not double persist in testFilterExists
* 3ab0da6 Fix breaking test
* 2860542 CR feedback for small inserts turned to updates
* 0dfce57 Small inserts are now turned into upserts
* bb1a8b3 Add filterExist API for Hoodie Records
* d983c24 Implement review comments
* c0bd5d1 Implement HoodieClient.checkExists()
* db078f6 Pick up HoodieTable based on hoodie.properties
* ad023e9 Refactor upsert() using HoodieTable interface
* ee9b9b3 Refactor upsert() using HoodieTable interface
* 2d6fdc9 Adding a utility to generate the percentage of updates in commit
* ea3ad58 Adding additional optimizations to remove similar queries from the perf test (using levenshtein distance)
* 1e443a0 Add test case for the added support for SchemaEvolution during updates
* 1cadcbb Add more logging
* 6163dfe Parquet read of old file should have the right read schema specified
* 29c746a Few fixes in ReduceByKey parallelism, HoodieInputFormat.filterFiles for non-hoodie paths and more logging in upsert schema issues
* 5a33af6 Fixing an issue in HoodieReader, target temp directory not created
* 09a5e8e Adding more logging in HoodieReader
* 1474250 Adding more logging in HoodieReader
* a3b0567 Make targetDb not required in HoodieReader
* e9c08b9 Setting the inputformat as the CombineHiveInputFormat in the HoodieReader
* 61c75d2 Hoodie Query Performance: Add Support for CombineHiveInputFormat and implement CombineFileInputFormat
* 38c6e44 Improvements to Hoodie Reader
* ac7398a Add totalWriteErrors to HoodieCommitMetadata
* fc0536e Change archive location to be under .hoodie
* e313294 Implement Hive Perf comparison for Hoodie and non-Hoodie datasets
* 17cfe2a Fix bug in HoodieInputFormat, where it filters out files from archived commits
* 30de990 Add note about showpartitions command to README
* 8634ffb Add commits showpartitions command to show break down per partition
* 324b24e Adding a CLI command to print file size stats
* 56532ff T484792. Deterministically report metrics during shutdown
* 3571768 Fixes to Hoodie Cleaner. Upgrade HDrone version. Changes to HoodieReader.
* a02c97f Bumping  hdrone-api to 0.7.2
* b29ce67 Bug in RegisterDataset dataset creation
* 5a15a9a Fixing bug in cleaning up partial files
* dbf6669 Comment out predicate pushdown test
*   44ed4d1 Merge branch 'lazyitr-fixes-1'
|\
| * e913d3b Fixing bug in LazyInsertIterable
| * 8a1fecd Wrapping upsert() inside HoodieUpsertException
| * 39cfe39 Fixing bug in LazyInsertIterable  - Return a List<WriteStatus> to handle last record in itr, belonging to a separate file  - Remove insert() related code form UpsertMapFunction
| * 00252e5 Making TestHoodieBloomIndex less flaky
* | 6f2d417 Making TestHoodieBloomIndex less flaky
* | 63ebbdc fs.mkdirs does not honor permission umask passed. Need to use the static method FileSystem.mkdirs for that.
* | f49ef67 Adding more logging to Hoodie Reader
* | 9f5a699 Fixing permission on the base intermediate folder created in HoodieReader
|/
* 70e501f Fixing the drop table before create table in HoodieReader
* 120cda8 Hoodie tools jar should not require jars in the CDH classpath to be available. Needed for HoodieReader to run in Docker.
* 60b59de Adding client configurations. Needed to run the HoodieReader in Docker (where CDH is not installed)
* fece98d Merge conflicts w/ master
* 64e58b0 Auto tuning parallelism in BloomIndex & Upsert()
* 930199e Fixing skew in Index join when new partition paths dont exist yet
* 9a3e511 Adding subpartitioning to scale join in HoodieBloomIndex
* 57512a7 Changing sort key for IndexLookup to (filename, record) to split more evenly
* 3ede14c Major changes to BloomIndex & Upsert DAG
* 1c4071a Implement Dataset creation if a Hoodie dataset was not already registered
* 944f007 Implement Review comments
* 6a5b675 Implement Review Comments
* bfde3a9 Implement review comments
* d195ab3 Implementing Commit Archiving
* 8af656b Exception refactor - part 2
* 697a699 HoodieTableMetadata refactor and Exception refactor
* 7804ca3 Adding HoodieAppendLog (fork of SequenceFile) & Initial Impl of HoodieCommitArchiveLog
* 2db4931 Adjust partitionFileRDD parallelism to max(recordRDD partitions, total partitions)
* 23405c5 Config name changes
* 5e673ea Implementing more CLI commands
* 918cfce Moving to 0.1.1-SNAPSHOT
* afad497 Change the master branch to 0.2-SNAPSHOT
* 832c1a7 Make sure the bloom filter reading and tagging has a parellel factor >= group by parallelism
* 0a6a6d3 Prepare the v0.1 version
* 72cfbe2 The snapshoter should also copy hoodie.properties file
* 3b0ee45 Add one more metric
* 488f1c7 Add switch for cleaning out inflight commits
* a259b6f Adding textutils jar to hoodie build
* 36e3118 Fix Hoodie CLI - ClassNotFound and added more logging to JDBC Incremental pull
* 2c8f554 Fix Predicate pushdown during incremental pull
* 888ec20 Add one more graphite metrics
* a671dfc Ensure files picked for cleaning are part of some valid commit
* ba5cd65 Adding cleaning based on last X commits
* 7dc76d3 Organize config values by category
* 9da6474 Move cleaning logic into HoodieCleaner class
* 7becba9 Change the update metric name
* d32b1f3 Fix some graphite issues
* 365ee14 hot fix a stupid bug I made
* 93eab43 Adding a hoodie.table.type value to hoodie.properties on init
* 075c646 Add the database name to the sync
* 3bae059 Adding HoodieKey as metadata field into Record
* 61513fa Add stats and more cli commands
* b0cb112 New Hoodie CLI Framework. Implement CLI function parity with the current CLI
* aaa1bf8 New Hoodie CLI Framework. Implement CLI function parity with the current CLI
* 3a3db73 New Hoodie CLI Framework. Implement CLI function parity with the current CLI
* c413342 Fail the job if exception during writing old records
* 7304d3d Exclude javax.servlet from hive-jdbc
* 3d65b50 Add the datestr <> '0000-00-00' back to the incremental sql
* 0577661 HoodieIncrementalConfig not used anymore
* 5338004 Fixing multiple minor issues we found during the SQLStreamer demo preperation
* 0744283 Fix the Hive server and Spark Hive client mismatch by setting userClassPathFirst=true and creating a assembly jar with all hadoop related dependencies excluded
* c189dc0 Kickoff hdrone sync after SQLStreamer finishing committing to target hoodie dataset
* 1eb8da0 Check if the .commit file is empty
* f95386a Add support for rollbacking .inflight commit in Admin CLI
* 97595ea Update the record count when upserting
* 49139cd Remove table config and add _SUCCESS tag
* 8500a48 Catch the exception when upserting
*   10bcc19 Merge branch 'sqlload'
|\
| * 10fcc88 More log statements
| *   ca6b71d Merge with master
| |\
| | *   b33db25 Merge remote-tracking branch 'origin/sqlload' into sqlload
| | |\
| | | * 8fca7c6 insert() takes a JavaRDD<HoodieRecord> again
| | * | 63db8c6 Fix test breakage from javax.servlet pom dependency
| | * | b2cff33 insert() takes a JavaRDD<HoodieRecord> again
| | * | 0162930 Minor Fixes
| | * | a0eb0b8 Minor Fixes
| | * | 5853e7c Minor fixed to HoodieSQLStreamer
| | * | 379bbed HoodieSQLStreamer improvements
| | * | 22bf816 Remove setJsonPayload() and other non-generic calls from HoodieRecordPayload
| | * | 4cacde6 Remove setJsonPayload() and other non-generic calls from HoodieRecordPayload
| | * | 5f985f3 Refactor of AvroParquetIO and create proper abstraction for StorageWriter
| | * | 6b90bb0 Refactor to introduce proper abstractions for RawTripPayload and implement HoodieSQLStreamer
| | * | ff24ce8 Implementation of HoodieSQLStreamer
| | * | abae08a Implementation of HoodieSQLStreamer
| * | | c2d306d Fixes to HoodieSQLStreamer
| | |/
| |/|
| * | 70bad72 Minor Fixes
| * | 8da6abf Minor Fixes
| * | 6b9d16b Minor fixed to HoodieSQLStreamer
| * | f76f5b8 HoodieSQLStreamer improvements
| * | 5f1425e Remove setJsonPayload() and other non-generic calls from HoodieRecordPayload
| * | 616e2ee Remove setJsonPayload() and other non-generic calls from HoodieRecordPayload
| * | 9e77ef9 Refactor of AvroParquetIO and create proper abstraction for StorageWriter
| * | 14e4812 Refactor to introduce proper abstractions for RawTripPayload and implement HoodieSQLStreamer
| * | 3b05f04 Implementation of HoodieSQLStreamer
| * | 1484c34 Implementation of HoodieSQLStreamer
* | | b3b9754 Standardize UTF-8 for getBytes() calls
| |/
|/|
* | 8cde079 Add graphite metrics to HoodieClient
* | b94afad Add testcase for the snapshot copy
|/
* 8567225 T417977. WriteStatus for failed records
* 11d7cd2 Add code to deflate the HoodieRecord after writing it to storage
* 9edafb4 Add a daily snapshot job
* 2962bf6 Fix the last file non-closed issue
* d995b6b SizeAwareParquetWriter will now have a fixed compression ratio
* 6b5f67f HoodieWrapperFileSystem should initialize the underlying filesystem with default uri
* 2a607c2 Merging conflicts with master
* ac9852d Auto size parquet files to just under block size based on incoming records size
* 3c4c0d0 Remove client code leaks & add parallelism config for sorting
* 1e51e30 Add UpsertHandle
* 685ca1f Add hoodie cli
* ded7f6c CR feedback incorporated
* d532089 Change the return type to a RDD
* 22533c1 Fix bug in cleanup logic by using TaskContext.getPartitionId() in place of unitNumber
* 86532fb Implement insert() using sorting, to align file sizes easily
* 0967e1c Add hook to compare old record with new incoming record
*   f48b048 Merge branch 'sort-based-dag'
|\
| * 3614cec Rename write() -> upsert() and load() -> insert()
* | 65cf631 Parquet version mismatch in HoodieInputFormat
* | 160303b Formatting change
* | 2c079c8 Formatting change
|/
* e4eb658 Fix formatting
* 025114a Add test for HoodieAvroWriteSupport
* 6fd11ef Fix small bug in HoodieCommits & correct doc to reflect exclusivity of findCommitsInRange  - Added simple unit test
* 05659c9 Add tests around HoodieClient apis
* 8d3f73e Fix some small bugs
* 7f1c4bc Modify HoodieInputFormatTest to make it certain that incremental pull is only pulling the required records
* 2b73ba0 Remove direct versioning in pom
* dd5695f Comment change
* f62eef7 Unit test for predicate pushdown
* 9941dad Fixing an issue which results in unsorted commits
* 5e71506 Update README
* 219e103 InputFormat unit tests
* 8f1c7ba Enable cobertura coverage to be run with mvn test
* 01f76e3 Call out self-join limitation in README
* 4284a73 Defaulting to Google Java Style and reformatting existing code
* de2cbda Making sure that incremental does not send duplicate records
* f6a3833 Implement Review comments
* 1de5025 Refactor in HoodieTableMetadata, HoodieInputFormat
* 549ad9a Fixing broken test schemas
* fbb2190 update the unit number
* 9353ba9 Change the io number to 1 for old load data
* e28f0cf Add commit metadata fields to create_table.sql
* d06e93d Pull avroFn & dedupeFn into a single HoodieClientHooks class
* b6d387f Changes to sequence_no/commit metadata addition
* 212d237 Add some benchmark results to the code
* 70d7715 Add commit rollback logic
* 54a4d0f Use FSUtils helper to detemine fileId
* 4b672ad Core classes refactoring
* f705fab Move partitionPath back into HoodieKey
* 39b3ff3 Cleanup Sample job & add a detailed quickstart
* 981c6f7 fix the hoodie-query-meta pom
* 371ab34 Publish hoodie to uber internal artifactory
* b4e83bc improvement on the bloom index tag job
* 779b502 Change to use hadoop's bloom filter
* cfbd9e6 Add bloom filter indexing mechanism
* f519c47 Initial Implementation of storing the client metadata for hoodie queries
* d5eccea Initial Implementation of storing the client metadata for hoodie queries
* ef34482 Pass on the HDrone configuration profile as an argument
* 5578cd3 Implement initial incremental tailing support in InputFormat and provide a seperate module for Hdrone registration to be created as a oozie trigger
* b08e5ff Merge branch 'master' into AddBloomFilterWriteSupport
* 20b7e8e fix a typo
* 4c39407 Quick fix for the HBASE indx duplicates records issue
* 6dca38f Adding code to sync to hive using hdrone
* 55a1d44 Fixes to InputFormat. Created a placeholder OutputFormat.
* beda7ed Revise the globPartitions to avoid the bad partition paths
* 5d889c0 Fix a wrong config
* a60fbdf First version to add load function
* 4b90944 Adding detailed metadata to each commit
* 4a97a6c Changes to backfill script + enabling spark event log
* ada2b79 Discard records without partition path & move parquet writer to snappy
* 954c933 Adding backfill script  - Cleanups & additional cmd line options to job  - Changed iounit logic to special case 2010-2014 again
* 8b5e288 Breaking apart backfill job & single run into two classes
* ebdcbea Handle partial failures in update()
* 4bf6ffe Fixing an issue where file name is not present
* e468bff Fix couple of issues with Hbase indexing and commit ts checks
* 17da30c Changing de-dupe implementation to be a Spark reduceByKey
* 248c725 removed coalescing which was put in there for testing
* 1b3f929 Implement compression when storing large json strings in memory
* 5bada98 Changes to accomodate task failure handling, on top of cleaner
* 66f895a Clean out files generated by previous failed attempts
* 9cbe370 Implementing a rudimentary cleaner & avro conversion rewrite
* 3606658 Adding configs for iounits & reduce parallelism
* 066c2f5 Registering the Hoodie classes with Kryo
* 342eed1 Implementing a rudimentary cleaner
*   0d20d1d Merge branch 'trip-test-run'
|\
| * 6eafdbb Adding de-dupe step before writing/shuffling
* | 34baba7 Packaging hadoop-common with the hadoop-mr InputFormat JAR
|/
* d5856db Merge HoodieInputFormat with existing code. Factor out common logic into hadoop-common. Tune the partitions, spark executors, parquet parameters to be able to run on a single day of input data
* e8885ce Introduce IOUnit to split parallelize inserts
* ab1977a Pushing in a real Spark job that works off real data
* 0c86645 HoodirInputFormat with TestDataSimulator
* 6af483c Initial checkin for HoodieInputFormat
* 99c58f2 Implementing HBase backed index
* 4177529 First major chunk of Hoodie Spark Client Impl
* 29fad70 Benchmark bloom filter file read performance
* 18f52a4 Checking in the simulation code, measuring cost of trip's file-level updates
* 885f444 Adding basic datastructures for Client, key & record.
* 72e7b4d Initial commit
Vinoth Chandar
2016-12-16 14:34:42 -08:00
parent 0512da094b
commit 81874a8406
69 changed files with 10464 additions and 11 deletions


@@ -0,0 +1,299 @@
/*
* Copyright (c) 2016 Uber Technologies, Inc. (hoodie-dev-group@uber.com)
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package com.uber.hoodie;
import com.google.common.base.Optional;
import com.uber.hoodie.common.model.HoodieCommitMetadata;
import com.uber.hoodie.common.model.HoodieCommits;
import com.uber.hoodie.common.model.HoodieKey;
import com.uber.hoodie.common.model.HoodieRecord;
import com.uber.hoodie.common.model.HoodieTableMetadata;
import com.uber.hoodie.common.model.HoodieWriteStat;
import com.uber.hoodie.common.util.FSUtils;
import com.uber.hoodie.config.HoodieWriteConfig;
import com.uber.hoodie.exception.HoodieException;
import com.uber.hoodie.index.HoodieBloomIndex;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.log4j.LogManager;
import org.apache.log4j.Logger;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.Function;
import org.apache.spark.api.java.function.PairFunction;
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SQLContext;
import org.apache.spark.sql.types.StructType;
import java.io.IOException;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import scala.Tuple2;
/**
* Provides first class support for accessing Hoodie tables for data processing via Apache Spark.
*
*
* TODO: Need to move all read operations here, since Hoodie is a single writer & multiple reader
*/
public class HoodieReadClient implements Serializable {
private static Logger logger = LogManager.getLogger(HoodieReadClient.class);
private transient final JavaSparkContext jsc;
private transient final FileSystem fs;
/**
* TODO: We need to persist the index type into hoodie.properties & be able to access the index
* just with a simple basepath pointing to the dataset. Until, then just always assume a
* BloomIndex
*/
private transient final HoodieBloomIndex index;
private HoodieTableMetadata metadata;
private transient Optional<SQLContext> sqlContextOpt;
/**
* @param basePath path to Hoodie dataset
*/
public HoodieReadClient(JavaSparkContext jsc, String basePath) {
this.jsc = jsc;
this.fs = FSUtils.getFs();
this.metadata = new HoodieTableMetadata(fs, basePath);
this.index = new HoodieBloomIndex(HoodieWriteConfig.newBuilder().withPath(basePath).build(), jsc);
this.sqlContextOpt = Optional.absent();
}
/**
* @param jsc Java Spark context
* @param basePath path to Hoodie dataset
* @param sqlContext SQL context, needed for dataframe operations
*/
public HoodieReadClient(JavaSparkContext jsc, String basePath, SQLContext sqlContext) {
this(jsc, basePath);
this.sqlContextOpt = Optional.of(sqlContext);
}
/**
* Adds support for accessing Hoodie-built tables from SparkSQL, as you normally would.
*
* @return SparkConf object to be used to construct the SparkContext by caller
*/
public static SparkConf addHoodieSupport(SparkConf conf) {
conf.set("spark.sql.hive.convertMetastoreParquet", "false");
return conf;
}
private void assertSqlContext() {
if (!sqlContextOpt.isPresent()) {
throw new IllegalStateException("SQLContext must be set, when performing dataframe operations");
}
}
/**
* Given a bunch of hoodie keys, fetches all the individual records out as a data frame
*
* @return a dataframe
*/
public DataFrame read(JavaRDD<HoodieKey> hoodieKeys, int parallelism)
throws Exception {
assertSqlContext();
JavaPairRDD<HoodieKey, Optional<String>> keyToFileRDD =
index.fetchRecordLocation(hoodieKeys, metadata);
List<String> paths = keyToFileRDD
.filter(new Function<Tuple2<HoodieKey, Optional<String>>, Boolean>() {
@Override
public Boolean call(Tuple2<HoodieKey, Optional<String>> keyFileTuple) throws Exception {
return keyFileTuple._2().isPresent();
}
})
.map(new Function<Tuple2<HoodieKey, Optional<String>>, String>() {
@Override
public String call(Tuple2<HoodieKey, Optional<String>> keyFileTuple) throws Exception {
return keyFileTuple._2().get();
}
}).collect();
// record locations might be same for multiple keys, so need a unique list
Set<String> uniquePaths = new HashSet<>(paths);
DataFrame originalDF = sqlContextOpt.get().read()
.parquet(uniquePaths.toArray(new String[uniquePaths.size()]));
StructType schema = originalDF.schema();
JavaPairRDD<HoodieKey, Row> keyRowRDD = originalDF.javaRDD()
.mapToPair(new PairFunction<Row, HoodieKey, Row>() {
@Override
public Tuple2<HoodieKey, Row> call(Row row) throws Exception {
HoodieKey key = new HoodieKey(
row.<String>getAs(HoodieRecord.RECORD_KEY_METADATA_FIELD),
row.<String>getAs(HoodieRecord.PARTITION_PATH_METADATA_FIELD));
return new Tuple2<>(key, row);
}
});
// Now, further filter down to only the rows that match the supplied hoodie keys
JavaRDD<Row> rowRDD = keyRowRDD.join(keyToFileRDD, parallelism)
.map(new Function<Tuple2<HoodieKey, Tuple2<Row, Optional<String>>>, Row>() {
@Override
public Row call(Tuple2<HoodieKey, Tuple2<Row, Optional<String>>> tuple) throws Exception {
return tuple._2()._1();
}
});
return sqlContextOpt.get().createDataFrame(rowRDD, schema);
}
/**
* Reads the given paths under a hoodie dataset out as a DataFrame
*/
public DataFrame read(String... paths) {
assertSqlContext();
List<String> filteredPaths = new ArrayList<>();
try {
for (String path : paths) {
if (!path.contains(metadata.getBasePath())) {
throw new HoodieException("Path " + path
+ " does not seem to be a part of a Hoodie dataset at base path "
+ metadata.getBasePath());
}
FileStatus[] latestFiles = metadata.getLatestVersions(fs.globStatus(new Path(path)));
for (FileStatus file : latestFiles) {
filteredPaths.add(file.getPath().toString());
}
}
return sqlContextOpt.get().read()
.parquet(filteredPaths.toArray(new String[filteredPaths.size()]));
} catch (Exception e) {
throw new HoodieException("Error reading hoodie dataset as a dataframe", e);
}
}
/**
* Obtain all new data written into the Hoodie dataset since the given timestamp.
*
* If you made a prior call to {@link HoodieReadClient#latestCommit()}, it gives you all data in
* the time window (commitTimestamp, latestCommit)
*/
public DataFrame readSince(String lastCommitTimestamp) {
List<String> commitsToReturn = metadata.findCommitsAfter(lastCommitTimestamp, Integer.MAX_VALUE);
//TODO: we can potentially trim this down to only affected partitions, using CommitMetadata
try {
// Go over the commit metadata, and obtain the new files that need to be read.
HashMap<String, String> fileIdToFullPath = new HashMap<>();
for (String commit: commitsToReturn) {
// get files from each commit, and replace any previous versions
fileIdToFullPath.putAll(metadata.getCommitMetadata(commit).getFileIdAndFullPaths());
}
return sqlContextOpt.get().read()
.parquet(fileIdToFullPath.values().toArray(new String[fileIdToFullPath.size()]))
.filter(String.format("%s >'%s'", HoodieRecord.COMMIT_TIME_METADATA_FIELD, lastCommitTimestamp));
} catch (IOException e) {
throw new HoodieException("Error pulling data incrementally from commitTimestamp :" + lastCommitTimestamp, e);
}
}
/**
* Obtain all records written as part of the given commit, as a DataFrame.
*/
public DataFrame readCommit(String commitTime) {
assertSqlContext();
HoodieCommits commits = metadata.getAllCommits();
if (!commits.contains(commitTime)) {
throw new HoodieException("No commit exists at " + commitTime);
}
try {
HoodieCommitMetadata commitMetadata = metadata.getCommitMetadata(commitTime);
Collection<String> paths = commitMetadata.getFileIdAndFullPaths().values();
return sqlContextOpt.get().read()
.parquet(paths.toArray(new String[paths.size()]))
.filter(String.format("%s ='%s'", HoodieRecord.COMMIT_TIME_METADATA_FIELD, commitTime));
} catch (Exception e) {
throw new HoodieException("Error reading commit " + commitTime, e);
}
}
/**
* Checks if the given [Keys] exist in the hoodie table and returns [Key,
* Optional<FullFilePath>]. If the optional FullFilePath value is not present, the key was
* not found. If the FullFilePath value is present, it is the path component (without scheme) of
* the URI of the underlying file.
*/
public JavaPairRDD<HoodieKey, Optional<String>> checkExists(
JavaRDD<HoodieKey> hoodieKeys) {
return index.fetchRecordLocation(hoodieKeys, metadata);
}
/**
* Filter out HoodieRecords that already exists in the output folder. This is useful in
* deduplication.
*
* @param hoodieRecords Input RDD of Hoodie records.
* @return A subset of hoodieRecords RDD, with existing records filtered out.
*/
public JavaRDD<HoodieRecord> filterExists(JavaRDD<HoodieRecord> hoodieRecords) {
JavaRDD<HoodieRecord> recordsWithLocation = index.tagLocation(hoodieRecords, metadata);
return recordsWithLocation.filter(new Function<HoodieRecord, Boolean>() {
@Override
public Boolean call(HoodieRecord v1) throws Exception {
return !v1.isCurrentLocationKnown();
}
});
}
/**
* Checks if the Hoodie dataset has new data since given timestamp. This can be subsequently
* used to call {@link HoodieReadClient#readSince(String)} to perform incremental processing.
*/
public boolean hasNewCommits(String commitTimestamp) {
return listCommitsSince(commitTimestamp).size() > 0;
}
/**
* Lists all commits made after the given commit timestamp (exclusive).
*
* @param commitTimestamp commit timestamp to list commits after
* @return commit timestamps newer than commitTimestamp
*/
public List<String> listCommitsSince(String commitTimestamp) {
return metadata.getAllCommits().findCommitsAfter(commitTimestamp, Integer.MAX_VALUE);
}
/**
* Returns the last successful commit (a successful write operation) into a Hoodie table.
*/
public String latestCommit() {
return metadata.getAllCommits().lastCommit();
}
}
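A note on the incremental-pull methods above: `readSince()` filters rows with a plain string `>` predicate on the commit-time metadata field. That works because commit times are minted with the fixed-width `yyyyMMddHHmmss` format (see `FORMATTER` in HoodieWriteClient below), so lexicographic string order coincides with chronological order. A minimal, self-contained sketch of that property (the `CommitTimeOrdering` class name is illustrative only, not part of Hoodie):

```java
import java.text.SimpleDateFormat;
import java.util.Date;

public class CommitTimeOrdering {

    // Same pattern HoodieWriteClient uses to mint commit times.
    static final SimpleDateFormat FORMATTER = new SimpleDateFormat("yyyyMMddHHmmss");

    // Formats a commit time the way the write client would.
    public static String commitTime(Date d) {
        return FORMATTER.format(d);
    }

    // Strictly-newer check: plain String comparison suffices, because the
    // fixed-width yyyyMMddHHmmss encoding orders fields from most- to
    // least-significant, so lexicographic order equals chronological order.
    public static boolean isAfter(String commit, String sinceCommit) {
        return commit.compareTo(sinceCommit) > 0;
    }

    public static void main(String[] args) {
        String earlier = "20161216143442"; // 2016-12-16 14:34:42
        String later = "20170101000000";   // 2017-01-01 00:00:00
        System.out.println(isAfter(later, earlier));   // true
        System.out.println(isAfter(earlier, later));   // false
        System.out.println(isAfter(earlier, earlier)); // false: the window is exclusive
    }
}
```

This is also why `hasNewCommits(commitTimestamp)` and `findCommitsAfter(...)` can treat the window as exclusive of the supplied timestamp: equal strings compare as 0, not greater.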


@@ -0,0 +1,556 @@
/*
* Copyright (c) 2016 Uber Technologies, Inc. (hoodie-dev-group@uber.com)
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package com.uber.hoodie;
import com.codahale.metrics.Timer;
import com.uber.hoodie.common.model.HoodieCommitMetadata;
import com.uber.hoodie.common.model.HoodieKey;
import com.uber.hoodie.common.model.HoodieRecord;
import com.uber.hoodie.common.model.HoodieRecordLocation;
import com.uber.hoodie.common.model.HoodieRecordPayload;
import com.uber.hoodie.common.model.HoodieTableMetadata;
import com.uber.hoodie.common.model.HoodieWriteStat;
import com.uber.hoodie.common.util.FSUtils;
import com.uber.hoodie.config.HoodieWriteConfig;
import com.uber.hoodie.exception.HoodieCommitException;
import com.uber.hoodie.exception.HoodieIOException;
import com.uber.hoodie.exception.HoodieInsertException;
import com.uber.hoodie.exception.HoodieRollbackException;
import com.uber.hoodie.exception.HoodieUpsertException;
import com.uber.hoodie.func.InsertMapFunction;
import com.uber.hoodie.index.HoodieIndex;
import com.uber.hoodie.io.HoodieCleaner;
import com.uber.hoodie.io.HoodieCommitArchiveLog;
import com.uber.hoodie.metrics.HoodieMetrics;
import com.uber.hoodie.table.HoodieTable;
import com.uber.hoodie.table.WorkloadProfile;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.PathFilter;
import org.apache.log4j.LogManager;
import org.apache.log4j.Logger;
import org.apache.spark.Accumulator;
import org.apache.spark.Partitioner;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.FlatMapFunction;
import org.apache.spark.api.java.function.Function;
import org.apache.spark.api.java.function.Function2;
import org.apache.spark.api.java.function.PairFunction;
import org.apache.spark.api.java.function.VoidFunction;
import org.apache.spark.storage.StorageLevel;
import java.io.IOException;
import java.io.Serializable;
import java.nio.charset.StandardCharsets;
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Collections;
import java.util.Date;
import java.util.Iterator;
import java.util.List;
import scala.Option;
import scala.Tuple2;
/**
* Hoodie Write Client helps you build datasets on HDFS [insert()] and then
* perform efficient mutations on an HDFS dataset [upsert()]
*
* Note that, at any given time, there can only be one Spark job performing
* these operations on a Hoodie dataset.
*
*/
public class HoodieWriteClient<T extends HoodieRecordPayload> implements Serializable {
private static Logger logger = LogManager.getLogger(HoodieWriteClient.class);
private transient final FileSystem fs;
private transient final JavaSparkContext jsc;
private final HoodieWriteConfig config;
private transient final HoodieMetrics metrics;
private transient final HoodieIndex<T> index;
private transient final HoodieCommitArchiveLog archiveLog;
private transient Timer.Context writeContext = null;
private final SimpleDateFormat FORMATTER = new SimpleDateFormat("yyyyMMddHHmmss");
/**
* @param jsc Java Spark context
* @param clientConfig instance of HoodieWriteConfig
* @throws Exception
*/
public HoodieWriteClient(JavaSparkContext jsc, HoodieWriteConfig clientConfig) throws Exception {
this(jsc, clientConfig, false);
}
/**
* @param jsc Java Spark context
* @param clientConfig instance of HoodieWriteConfig
* @param rollbackInFlight whether to rollback any pending inflight commits first
*/
public HoodieWriteClient(JavaSparkContext jsc, HoodieWriteConfig clientConfig, boolean rollbackInFlight) {
this.fs = FSUtils.getFs();
this.jsc = jsc;
this.config = clientConfig;
this.index = HoodieIndex.createIndex(config, jsc);
this.metrics = new HoodieMetrics(config, config.getTableName());
this.archiveLog = new HoodieCommitArchiveLog(clientConfig);
if (rollbackInFlight) {
rollbackInflightCommits();
}
}
/**
* Filter out HoodieRecords that already exist in the output folder. This is useful in
* deduplication.
*
* @param hoodieRecords Input RDD of Hoodie records.
* @return A subset of hoodieRecords RDD, with existing records filtered out.
*/
public JavaRDD<HoodieRecord<T>> filterExists(JavaRDD<HoodieRecord<T>> hoodieRecords) {
final HoodieTableMetadata metadata =
new HoodieTableMetadata(fs, config.getBasePath(), config.getTableName());
JavaRDD<HoodieRecord<T>> recordsWithLocation = index.tagLocation(hoodieRecords, metadata);
return recordsWithLocation.filter(new Function<HoodieRecord<T>, Boolean>() {
@Override
public Boolean call(HoodieRecord<T> v1) throws Exception {
return !v1.isCurrentLocationKnown();
}
});
}
/**
* Upserts a batch of new records into the Hoodie table, at the supplied commitTime
*/
public JavaRDD<WriteStatus> upsert(JavaRDD<HoodieRecord<T>> records, final String commitTime) {
final HoodieTableMetadata metadata =
new HoodieTableMetadata(fs, config.getBasePath(), config.getTableName());
writeContext = metrics.getCommitCtx();
final HoodieTable table =
HoodieTable.getHoodieTable(metadata.getTableType(), commitTime, config, metadata);
try {
// De-dupe/merge if needed
JavaRDD<HoodieRecord<T>> dedupedRecords =
combineOnCondition(config.shouldCombineBeforeUpsert(), records,
config.getUpsertShuffleParallelism());
// perform index lookup to get the existing location of records
JavaRDD<HoodieRecord<T>> taggedRecords = index.tagLocation(dedupedRecords, metadata);
// Cache the tagged records, so we don't end up computing the RDD twice (once for profiling, once for writing)
taggedRecords.persist(StorageLevel.MEMORY_AND_DISK_SER());
WorkloadProfile profile = null;
if (table.isWorkloadProfileNeeded()) {
profile = new WorkloadProfile(taggedRecords);
logger.info("Workload profile :" + profile);
}
// obtain the upsert partitioner, run the tagged records through it, and get a partitioned RDD
final Partitioner upsertPartitioner = table.getUpsertPartitioner(profile);
JavaRDD<HoodieRecord<T>> partitionedRecords = taggedRecords.mapToPair(
new PairFunction<HoodieRecord<T>, Tuple2<HoodieKey, Option<HoodieRecordLocation>>, HoodieRecord<T>>() {
@Override
public Tuple2<Tuple2<HoodieKey, Option<HoodieRecordLocation>>, HoodieRecord<T>> call(
HoodieRecord<T> record) throws Exception {
return new Tuple2<>(new Tuple2<>(record.getKey(),
Option.apply(record.getCurrentLocation())), record);
}
}).partitionBy(upsertPartitioner).map(
new Function<Tuple2<Tuple2<HoodieKey, Option<HoodieRecordLocation>>, HoodieRecord<T>>, HoodieRecord<T>>() {
@Override
public HoodieRecord<T> call(
Tuple2<Tuple2<HoodieKey, Option<HoodieRecordLocation>>, HoodieRecord<T>> tuple)
throws Exception {
return tuple._2();
}
});
// Perform the actual writing.
JavaRDD<WriteStatus> upsertStatusRDD = partitionedRecords.mapPartitionsWithIndex(
new Function2<Integer, Iterator<HoodieRecord<T>>, Iterator<List<WriteStatus>>>() {
@Override
public Iterator<List<WriteStatus>> call(Integer partition,
Iterator<HoodieRecord<T>> recordItr) throws Exception {
return table.handleUpsertPartition(partition, recordItr, upsertPartitioner);
}
}, true).flatMap(new FlatMapFunction<List<WriteStatus>, WriteStatus>() {
@Override
public Iterable<WriteStatus> call(List<WriteStatus> writeStatuses)
throws Exception {
return writeStatuses;
}
});
// Update the index back.
JavaRDD<WriteStatus> resultRDD = index.updateLocation(upsertStatusRDD, metadata);
resultRDD = resultRDD.persist(config.getWriteStatusStorageLevel());
boolean commitResult = commit(commitTime, resultRDD);
if (!commitResult) {
throw new HoodieCommitException("Failed to commit " + commitTime);
}
return resultRDD;
} catch (Throwable e) {
if (e instanceof HoodieUpsertException) {
throw (HoodieUpsertException) e;
}
throw new HoodieUpsertException("Failed to upsert for commit time " + commitTime, e);
}
}
private JavaRDD<HoodieRecord<T>> combineOnCondition(boolean condition,
JavaRDD<HoodieRecord<T>> records, int parallelism) {
if(condition) {
return deduplicateRecords(records, parallelism);
}
return records;
}
/**
* Loads the given HoodieRecords, as inserts into the table.
* (This implementation uses sortBy and attempts to control the number of files using less memory)
*
* @param records HoodieRecords to insert
* @param commitTime Commit Time handle
* @return JavaRDD<WriteStatus> - RDD of WriteStatus to inspect errors and counts
*
*/
public JavaRDD<WriteStatus> insert(JavaRDD<HoodieRecord<T>> records, final String commitTime) {
final HoodieTableMetadata metadata =
new HoodieTableMetadata(fs, config.getBasePath(), config.getTableName());
writeContext = metrics.getCommitCtx();
try {
// De-dupe/merge if needed
JavaRDD<HoodieRecord<T>> dedupedRecords =
combineOnCondition(config.shouldCombineBeforeInsert(), records,
config.getInsertShuffleParallelism());
// Now, sort the records and line them up nicely for loading.
JavaRDD<HoodieRecord<T>> sortedRecords =
dedupedRecords.sortBy(new Function<HoodieRecord<T>, String>() {
@Override
public String call(HoodieRecord<T> record) {
// Let's use "partitionPath + key" as the sort key. Spark will ensure the records are
// split evenly across RDD partitions, such that small partitions fit into one RDD
// partition, while big ones spread evenly across multiple RDD partitions
return String
.format("%s+%s", record.getPartitionPath(), record.getRecordKey());
}
}, true, config.getInsertShuffleParallelism());
JavaRDD<WriteStatus> writeStatusRDD = sortedRecords
.mapPartitionsWithIndex(new InsertMapFunction<T>(commitTime, config, metadata),
true).flatMap(new FlatMapFunction<List<WriteStatus>, WriteStatus>() {
@Override
public Iterable<WriteStatus> call(List<WriteStatus> writeStatuses)
throws Exception {
return writeStatuses;
}
});
// Update the index back
JavaRDD<WriteStatus> statuses = index.updateLocation(writeStatusRDD, metadata);
// Trigger the insert and collect statuses
statuses = statuses.persist(config.getWriteStatusStorageLevel());
boolean commitResult = commit(commitTime, statuses);
if (!commitResult) {
throw new HoodieCommitException("Failed to commit " + commitTime);
}
return statuses;
} catch (Throwable e) {
if (e instanceof HoodieInsertException) {
throw (HoodieInsertException) e;
}
throw new HoodieInsertException("Failed to insert for commit time " + commitTime, e);
}
}
/**
* Commit changes performed at the given commitTime marker
*/
private boolean commit(String commitTime, JavaRDD<WriteStatus> writeStatuses) {
Path commitFile =
new Path(config.getBasePath() + "/.hoodie/" + FSUtils.makeCommitFileName(commitTime));
try {
if (fs.exists(commitFile)) {
throw new HoodieCommitException("Duplicate commit found. " + commitTime);
}
List<Tuple2<String, HoodieWriteStat>> stats =
writeStatuses.mapToPair(new PairFunction<WriteStatus, String, HoodieWriteStat>() {
@Override
public Tuple2<String, HoodieWriteStat> call(WriteStatus writeStatus)
throws Exception {
return new Tuple2<>(writeStatus.getPartitionPath(), writeStatus.getStat());
}
}).collect();
HoodieCommitMetadata metadata = new HoodieCommitMetadata();
for (Tuple2<String, HoodieWriteStat> stat : stats) {
metadata.addWriteStat(stat._1(), stat._2());
}
// open a new file and write the commit metadata in
Path inflightCommitFile = new Path(config.getBasePath() + "/.hoodie/" + FSUtils
.makeInflightCommitFileName(commitTime));
FSDataOutputStream fsout = fs.create(inflightCommitFile, true);
fsout.write(metadata.toJsonString().getBytes(StandardCharsets.UTF_8));
fsout.close();
boolean success = fs.rename(inflightCommitFile, commitFile);
if (success) {
// We cannot have unbounded commit files. Archive commits if we have to archive
archiveLog.archiveIfRequired();
// Call clean to cleanup if there is anything to cleanup after the commit,
clean();
if(writeContext != null) {
long durationInMs = metrics.getDurationInMs(writeContext.stop());
metrics.updateCommitMetrics(FORMATTER.parse(commitTime).getTime(), durationInMs,
metadata);
writeContext = null;
}
}
return success;
} catch (IOException e) {
throw new HoodieCommitException(
"Failed to commit " + config.getBasePath() + " at time " + commitTime, e);
} catch (ParseException e) {
throw new HoodieCommitException(
"Commit time is not of a valid format. Failed to commit " + config.getBasePath()
+ " at time " + commitTime, e);
}
}
/**
* Rollback the (inflight/committed) record changes with the given commit time.
* Four steps:
* (0) obtain the commit or inflight file,
* (1) clean indexing data,
* (2) clean newly generated parquet files,
* (3) finally, delete the .commit or .inflight file.
*/
public boolean rollback(final String commitTime) throws HoodieRollbackException {
final Timer.Context context = metrics.getRollbackCtx();
final HoodieTableMetadata metadata =
new HoodieTableMetadata(fs, config.getBasePath(), config.getTableName());
final String metaPath = config.getBasePath() + "/" + HoodieTableMetadata.METAFOLDER_NAME;
try {
// 0. Obtain the commit/.inflight file, to work on
FileStatus[] commitFiles =
fs.globStatus(new Path(metaPath + "/" + commitTime + ".*"));
if (commitFiles.length != 1) {
throw new HoodieRollbackException("Expected exactly one .commit or .inflight file for commitTime: " + commitTime);
}
// we first need to unpublish the commit by making it .inflight again. (this will ensure no future queries see this data)
Path filePath = commitFiles[0].getPath();
if (filePath.getName().endsWith(HoodieTableMetadata.COMMIT_FILE_SUFFIX)) {
if (metadata.findCommitsAfter(commitTime, Integer.MAX_VALUE).size() > 0) {
throw new HoodieRollbackException("Found commits after time: " + commitTime +
", please roll back those later commits first");
}
Path newInflightPath = new Path(metaPath + "/" + commitTime + HoodieTableMetadata.INFLIGHT_FILE_SUFFIX);
if (!fs.rename(filePath, newInflightPath)) {
throw new HoodieRollbackException("Unable to rename .commit file to .inflight for commitTime:" + commitTime);
}
filePath = newInflightPath;
}
// 1. Revert the index changes
logger.info("Clean out index changes at time: " + commitTime);
if (!index.rollbackCommit(commitTime)) {
throw new HoodieRollbackException("Clean out index changes failed for time: " + commitTime);
}
// 2. Delete the new generated parquet files
logger.info("Clean out all parquet files generated at time: " + commitTime);
final Accumulator<Integer> numFilesDeletedAccu = jsc.accumulator(0);
jsc.parallelize(FSUtils.getAllPartitionPaths(fs, metadata.getBasePath()))
.foreach(new VoidFunction<String>() {
@Override
public void call(String partitionPath) throws Exception {
// Scan all partitions files with this commit time
FileSystem fs = FSUtils.getFs();
FileStatus[] toBeDeleted =
fs.listStatus(new Path(config.getBasePath(), partitionPath),
new PathFilter() {
@Override
public boolean accept(Path path) {
return commitTime
.equals(FSUtils.getCommitTime(path.getName()));
}
});
for (FileStatus file : toBeDeleted) {
boolean success = fs.delete(file.getPath(), false);
logger.info("Delete file " + file.getPath() + "\t" + success);
if (success) {
numFilesDeletedAccu.add(1);
}
}
}
});
// 3. Clean out the metadata file (.commit or .inflight)
logger.info("Clean out metadata files at time: " + commitTime);
if (!fs.delete(filePath, false)) {
logger.error("Deleting file " + filePath + " failed.");
throw new HoodieRollbackException("Delete file " + filePath + " failed.");
}
if (context != null) {
long durationInMs = metrics.getDurationInMs(context.stop());
int numFilesDeleted = numFilesDeletedAccu.value();
metrics.updateRollbackMetrics(durationInMs, numFilesDeleted);
}
return true;
} catch (IOException e) {
throw new HoodieRollbackException("Failed to rollback " +
config.getBasePath() + " at commit time " + commitTime, e);
}
}
/**
* Releases any resources used by the client.
*/
public void close() {
// UNDER CONSTRUCTION
}
/**
* Clean up any stale/old files/data lying around (either on file storage or index storage)
*/
private void clean() throws HoodieIOException {
try {
logger.info("Cleaner started");
final Timer.Context context = metrics.getCleanCtx();
final HoodieTableMetadata metadata = new HoodieTableMetadata(fs, config.getBasePath(), config.getTableName());
List<String> partitionsToClean = FSUtils.getAllPartitionPaths(fs, metadata.getBasePath());
// shuffle to distribute cleaning work across partitions evenly
Collections.shuffle(partitionsToClean);
logger.info("Partitions to clean up : " + partitionsToClean + ", with policy " + config.getCleanerPolicy());
if (partitionsToClean.isEmpty()) {
logger.info("Nothing to clean, the dataset is already clean");
return;
}
int cleanerParallelism = Math.min(partitionsToClean.size(), config.getCleanerParallelism());
int numFilesDeleted = jsc.parallelize(partitionsToClean, cleanerParallelism)
.map(new Function<String, Integer>() {
@Override
public Integer call(String partitionPathToClean) throws Exception {
FileSystem fs = FSUtils.getFs();
HoodieCleaner cleaner = new HoodieCleaner(metadata, config, fs);
return cleaner.clean(partitionPathToClean);
}
}).reduce(new Function2<Integer, Integer, Integer>() {
@Override
public Integer call(Integer v1, Integer v2) throws Exception {
return v1 + v2;
}
});
logger.info("Cleaned " + numFilesDeleted + " files");
// Emit metrics (duration, numFilesDeleted) if needed
if (context != null) {
long durationInMs = metrics.getDurationInMs(context.stop());
logger.info("cleanerElapsedTime (Minutes): " + durationInMs / (1000 * 60));
metrics.updateCleanMetrics(durationInMs, numFilesDeleted);
}
} catch (IOException e) {
throw new HoodieIOException("Failed to clean up after commit", e);
}
}
/**
* Provides a new commit time for a write operation (insert/update)
*/
public String startCommit() {
String commitTime = FORMATTER.format(new Date());
startCommitWithTime(commitTime);
return commitTime;
}
public void startCommitWithTime(String commitTime) {
logger.info("Generate a new commit time " + commitTime);
// Create the in-flight commit file
Path inflightCommitFilePath = new Path(
config.getBasePath() + "/.hoodie/" + FSUtils.makeInflightCommitFileName(commitTime));
try {
if (fs.createNewFile(inflightCommitFilePath)) {
logger.info("Create an inflight commit file " + inflightCommitFilePath);
return;
}
throw new HoodieCommitException(
"Failed to create the inflight commit file " + inflightCommitFilePath);
} catch (IOException e) {
throw new HoodieCommitException(
"Failed to create the inflight commit file " + inflightCommitFilePath, e);
}
}
public static SparkConf registerClasses(SparkConf conf) {
conf.registerKryoClasses(new Class[]{HoodieWriteConfig.class, HoodieRecord.class, HoodieKey.class});
return conf;
}
/**
* Deduplicate Hoodie records, using the given deduplication function.
*/
private JavaRDD<HoodieRecord<T>> deduplicateRecords(JavaRDD<HoodieRecord<T>> records, int parallelism) {
return records.mapToPair(new PairFunction<HoodieRecord<T>, HoodieKey, HoodieRecord<T>>() {
@Override
public Tuple2<HoodieKey, HoodieRecord<T>> call(HoodieRecord<T> record) {
return new Tuple2<>(record.getKey(), record);
}
}).reduceByKey(new Function2<HoodieRecord<T>, HoodieRecord<T>, HoodieRecord<T>>() {
@Override
public HoodieRecord<T> call(HoodieRecord<T> rec1, HoodieRecord<T> rec2) {
@SuppressWarnings("unchecked")
T reducedData = (T) rec1.getData().preCombine(rec2.getData());
// we cannot allow the user to change the key or partitionPath, since that will affect everything
// so pick it from one of the records.
return new HoodieRecord<T>(rec1.getKey(), reducedData);
}
}, parallelism).map(new Function<Tuple2<HoodieKey, HoodieRecord<T>>, HoodieRecord<T>>() {
@Override
public HoodieRecord<T> call(Tuple2<HoodieKey, HoodieRecord<T>> recordTuple) {
return recordTuple._2();
}
});
}
/**
* Rollback all inflight commits
*/
private void rollbackInflightCommits() {
final HoodieTableMetadata metadata = new HoodieTableMetadata(fs, config.getBasePath(), config.getTableName());
for (String commit : metadata.getAllInflightCommits()) {
rollback(commit);
}
}
}
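deduplicateRecords above maps each record to its HoodieKey, reduces duplicates per key via the payload's preCombine, and maps back to plain records. Stripped of Spark, the same reduce can be sketched with a plain HashMap; SimpleRecord here is a hypothetical stand-in for HoodieRecord, with an integer payload whose preCombine keeps the larger value:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class DedupSketch {
    // Hypothetical stand-in for HoodieRecord<T>: a record key plus an integer payload.
    static class SimpleRecord {
        final String key;
        final int payload;
        SimpleRecord(String key, int payload) {
            this.key = key;
            this.payload = payload;
        }
    }

    // Mirrors mapToPair + reduceByKey + map: one surviving record per key.
    static List<SimpleRecord> deduplicate(List<SimpleRecord> records) {
        Map<String, SimpleRecord> byKey = new HashMap<>();
        for (SimpleRecord r : records) {
            // Like HoodieRecordPayload.preCombine: merge two records with the same key.
            byKey.merge(r.key, r, (a, b) -> a.payload >= b.payload ? a : b);
        }
        return new ArrayList<>(byKey.values());
    }
}
```

As in deduplicateRecords, the key (and hence the partition path) is never changed during the merge; only the payloads are combined.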

/*
* Copyright (c) 2016 Uber Technologies, Inc. (hoodie-dev-group@uber.com)
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package com.uber.hoodie;
import com.uber.hoodie.common.model.HoodieKey;
import com.uber.hoodie.common.model.HoodieRecord;
import com.uber.hoodie.common.model.HoodieWriteStat;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
/**
* Status of a write operation.
*/
public class WriteStatus implements Serializable {
private final HashMap<HoodieKey, Throwable> errors = new HashMap<>();
private final List<HoodieRecord> writtenRecords = new ArrayList<>();
private final List<HoodieRecord> failedRecords = new ArrayList<>();
private Throwable globalError = null;
private String fileId = null;
private String partitionPath = null;
private HoodieWriteStat stat = null;
private long totalRecords = 0;
private long totalErrorRecords = 0;
public void markSuccess(HoodieRecord record) {
writtenRecords.add(record);
totalRecords++;
}
public void markFailure(HoodieRecord record, Throwable t) {
failedRecords.add(record);
errors.put(record.getKey(), t);
totalRecords++;
totalErrorRecords++;
}
public String getFileId() {
return fileId;
}
public void setFileId(String fileId) {
this.fileId = fileId;
}
public boolean hasErrors() {
return totalErrorRecords > 0;
}
public boolean isErrored(HoodieKey key) {
return errors.containsKey(key);
}
public HashMap<HoodieKey, Throwable> getErrors() {
return errors;
}
public boolean hasGlobalError() {
return globalError != null;
}
public void setGlobalError(Throwable t) {
this.globalError = t;
}
public Throwable getGlobalError() {
return this.globalError;
}
public List<HoodieRecord> getWrittenRecords() {
return writtenRecords;
}
public List<HoodieRecord> getFailedRecords() {
return failedRecords;
}
public HoodieWriteStat getStat() {
return stat;
}
public void setStat(HoodieWriteStat stat) {
this.stat = stat;
}
public String getPartitionPath() {
return partitionPath;
}
public void setPartitionPath(String partitionPath) {
this.partitionPath = partitionPath;
}
public long getTotalRecords() {
return totalRecords;
}
@Override
public String toString() {
final StringBuilder sb = new StringBuilder("WriteStatus {");
sb.append("fileId=").append(fileId);
sb.append(", globalError='").append(globalError).append('\'');
sb.append(", hasErrors='").append(hasErrors()).append('\'');
sb.append(", errorCount='").append(totalErrorRecords).append('\'');
sb.append(", errorPct='").append((100.0 * totalErrorRecords) / totalRecords).append('\'');
sb.append('}');
return sb.toString();
}
}
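The success/failure bookkeeping above (totalRecords, totalErrorRecords) is what drives hasErrors() and the errorPct shown in toString(); the counting logic in isolation looks like this:

```java
public class StatusCounters {
    private long totalRecords = 0;
    private long totalErrorRecords = 0;

    // Mirrors WriteStatus.markSuccess(): every written record bumps the total.
    void markSuccess() {
        totalRecords++;
    }

    // Mirrors WriteStatus.markFailure(): failures bump both counters.
    void markFailure() {
        totalRecords++;
        totalErrorRecords++;
    }

    boolean hasErrors() {
        return totalErrorRecords > 0;
    }

    // Same expression as the errorPct computed in WriteStatus.toString().
    double errorPct() {
        return (100.0 * totalErrorRecords) / totalRecords;
    }
}
```

Note that, as in WriteStatus, errorPct divides by totalRecords, so it is only meaningful once at least one record has been marked.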

/*
* Copyright (c) 2016 Uber Technologies, Inc. (hoodie-dev-group@uber.com)
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package com.uber.hoodie.config;
import java.io.Serializable;
import java.util.Map;
import java.util.Properties;
/**
* Default Way to load Hoodie config through a java.util.Properties
*/
public class DefaultHoodieConfig implements Serializable {
protected final Properties props;
public DefaultHoodieConfig(Properties props) {
this.props = props;
}
public Properties getProps() {
return props;
}
public static void setDefaultOnCondition(Properties props, boolean condition, String propName,
String defaultValue) {
if (condition) {
props.setProperty(propName, defaultValue);
}
}
public static void setDefaultOnCondition(Properties props, boolean condition, DefaultHoodieConfig config) {
if (condition) {
props.putAll(config.getProps());
}
}
}
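The setDefaultOnCondition helpers above let each config builder fill in a default only when the caller did not set the property; a standalone sketch of the same pattern, using property names from the configs below:

```java
import java.util.Properties;

public class ConfigDefaults {
    // Same shape as DefaultHoodieConfig.setDefaultOnCondition(props, condition, name, value).
    static void setDefaultOnCondition(Properties props, boolean condition,
                                      String propName, String defaultValue) {
        if (condition) {
            props.setProperty(propName, defaultValue);
        }
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        // Caller explicitly sets one property...
        props.setProperty("hoodie.cleaner.commits.retained", "48");
        // ...so only the unset property picks up its default.
        setDefaultOnCondition(props, !props.containsKey("hoodie.cleaner.commits.retained"),
                "hoodie.cleaner.commits.retained", "24");
        setDefaultOnCondition(props, !props.containsKey("hoodie.cleaner.policy"),
                "hoodie.cleaner.policy", "KEEP_LATEST_COMMITS");
        System.out.println(props.getProperty("hoodie.cleaner.commits.retained"));
        System.out.println(props.getProperty("hoodie.cleaner.policy"));
    }
}
```

This is why every Builder.build() below checks !props.containsKey(...) before applying each default: explicit user settings always win.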

/*
* Copyright (c) 2016 Uber Technologies, Inc. (hoodie-dev-group@uber.com)
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package com.uber.hoodie.config;
import com.google.common.base.Preconditions;
import com.uber.hoodie.io.HoodieCleaner;
import javax.annotation.concurrent.Immutable;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import java.util.Properties;
/**
* Compaction related config
*/
@Immutable
public class HoodieCompactionConfig extends DefaultHoodieConfig {
public static final String CLEANER_POLICY_PROP = "hoodie.cleaner.policy";
private static final String DEFAULT_CLEANER_POLICY =
HoodieCleaner.CleaningPolicy.KEEP_LATEST_COMMITS.name();
public static final String CLEANER_FILE_VERSIONS_RETAINED_PROP =
"hoodie.cleaner.fileversions.retained";
private static final String DEFAULT_CLEANER_FILE_VERSIONS_RETAINED = "3";
public static final String CLEANER_COMMITS_RETAINED_PROP = "hoodie.cleaner.commits.retained";
private static final String DEFAULT_CLEANER_COMMITS_RETAINED = "24";
public static final String MAX_COMMITS_TO_KEEP = "hoodie.keep.max.commits";
private static final String DEFAULT_MAX_COMMITS_TO_KEEP = String.valueOf(128);
public static final String MIN_COMMITS_TO_KEEP = "hoodie.keep.min.commits";
private static final String DEFAULT_MIN_COMMITS_TO_KEEP = String.valueOf(96);
// Upsert uses this file size to compact new data onto existing files.
public static final String PARQUET_SMALL_FILE_LIMIT_BYTES = "hoodie.parquet.small.file.limit";
// Turned off by default
public static final String DEFAULT_PARQUET_SMALL_FILE_LIMIT_BYTES = String.valueOf(0);
/** Configs related to specific table types **/
// Number of inserts that will be put into each partition/bucket for writing
public static final String COPY_ON_WRITE_TABLE_INSERT_SPLIT_SIZE = "hoodie.copyonwrite.insert.split.size";
// The rationale for picking the insert parallelism is as follows: writing out 100MB files,
// with at least 1KB records, means 100K records per file. We just overprovision to 500K
public static final String DEFAULT_COPY_ON_WRITE_TABLE_INSERT_SPLIT_SIZE = String.valueOf(500000);
// Config to control whether we control insert split sizes automatically based on average record sizes
public static final String COPY_ON_WRITE_TABLE_AUTO_SPLIT_INSERTS = "hoodie.copyonwrite.insert.auto.split";
// It's off by default
public static final String DEFAULT_COPY_ON_WRITE_TABLE_AUTO_SPLIT_INSERTS = String.valueOf(false);
// This value is used as a rough estimate of the record size, if we can't determine it from previous commits
public static final String COPY_ON_WRITE_TABLE_RECORD_SIZE_ESTIMATE = "hoodie.copyonwrite.record.size.estimate";
// Used to determine how much more can be packed into a small file, before it exceeds the size limit.
public static final String DEFAULT_COPY_ON_WRITE_TABLE_RECORD_SIZE_ESTIMATE = String.valueOf(1024);
public static final String CLEANER_PARALLELISM = "hoodie.cleaner.parallelism";
public static final String DEFAULT_CLEANER_PARALLELISM = String.valueOf(200);
private HoodieCompactionConfig(Properties props) {
super(props);
}
public static HoodieCompactionConfig.Builder newBuilder() {
return new Builder();
}
public static class Builder {
private final Properties props = new Properties();
public Builder fromFile(File propertiesFile) throws IOException {
FileReader reader = new FileReader(propertiesFile);
try {
this.props.load(reader);
return this;
} finally {
reader.close();
}
}
public Builder withCleanerPolicy(HoodieCleaner.CleaningPolicy policy) {
props.setProperty(CLEANER_POLICY_PROP, policy.name());
return this;
}
public Builder retainFileVersions(int fileVersionsRetained) {
props.setProperty(CLEANER_FILE_VERSIONS_RETAINED_PROP,
String.valueOf(fileVersionsRetained));
return this;
}
public Builder retainCommits(int commitsRetained) {
props.setProperty(CLEANER_COMMITS_RETAINED_PROP, String.valueOf(commitsRetained));
return this;
}
public Builder archiveCommitsWith(int minToKeep, int maxToKeep) {
props.setProperty(MIN_COMMITS_TO_KEEP, String.valueOf(minToKeep));
props.setProperty(MAX_COMMITS_TO_KEEP, String.valueOf(maxToKeep));
return this;
}
public Builder compactionSmallFileSize(long smallFileLimitBytes) {
props.setProperty(PARQUET_SMALL_FILE_LIMIT_BYTES, String.valueOf(smallFileLimitBytes));
return this;
}
public Builder insertSplitSize(int insertSplitSize) {
props.setProperty(COPY_ON_WRITE_TABLE_INSERT_SPLIT_SIZE, String.valueOf(insertSplitSize));
return this;
}
public Builder autoTuneInsertSplits(boolean autoTuneInsertSplits) {
props.setProperty(COPY_ON_WRITE_TABLE_AUTO_SPLIT_INSERTS, String.valueOf(autoTuneInsertSplits));
return this;
}
public Builder approxRecordSize(int recordSizeEstimate) {
props.setProperty(COPY_ON_WRITE_TABLE_RECORD_SIZE_ESTIMATE, String.valueOf(recordSizeEstimate));
return this;
}
public Builder withCleanerParallelism(int cleanerParallelism) {
props.setProperty(CLEANER_PARALLELISM, String.valueOf(cleanerParallelism));
return this;
}
public HoodieCompactionConfig build() {
HoodieCompactionConfig config = new HoodieCompactionConfig(props);
setDefaultOnCondition(props, !props.containsKey(CLEANER_POLICY_PROP),
CLEANER_POLICY_PROP, DEFAULT_CLEANER_POLICY);
setDefaultOnCondition(props, !props.containsKey(CLEANER_FILE_VERSIONS_RETAINED_PROP),
CLEANER_FILE_VERSIONS_RETAINED_PROP, DEFAULT_CLEANER_FILE_VERSIONS_RETAINED);
setDefaultOnCondition(props, !props.containsKey(CLEANER_COMMITS_RETAINED_PROP),
CLEANER_COMMITS_RETAINED_PROP, DEFAULT_CLEANER_COMMITS_RETAINED);
setDefaultOnCondition(props, !props.containsKey(MAX_COMMITS_TO_KEEP),
MAX_COMMITS_TO_KEEP, DEFAULT_MAX_COMMITS_TO_KEEP);
setDefaultOnCondition(props, !props.containsKey(MIN_COMMITS_TO_KEEP),
MIN_COMMITS_TO_KEEP, DEFAULT_MIN_COMMITS_TO_KEEP);
setDefaultOnCondition(props, !props.containsKey(PARQUET_SMALL_FILE_LIMIT_BYTES),
PARQUET_SMALL_FILE_LIMIT_BYTES, DEFAULT_PARQUET_SMALL_FILE_LIMIT_BYTES);
setDefaultOnCondition(props, !props.containsKey(COPY_ON_WRITE_TABLE_INSERT_SPLIT_SIZE),
COPY_ON_WRITE_TABLE_INSERT_SPLIT_SIZE, DEFAULT_COPY_ON_WRITE_TABLE_INSERT_SPLIT_SIZE);
setDefaultOnCondition(props, !props.containsKey(COPY_ON_WRITE_TABLE_AUTO_SPLIT_INSERTS),
COPY_ON_WRITE_TABLE_AUTO_SPLIT_INSERTS, DEFAULT_COPY_ON_WRITE_TABLE_AUTO_SPLIT_INSERTS);
setDefaultOnCondition(props, !props.containsKey(COPY_ON_WRITE_TABLE_RECORD_SIZE_ESTIMATE),
COPY_ON_WRITE_TABLE_RECORD_SIZE_ESTIMATE, DEFAULT_COPY_ON_WRITE_TABLE_RECORD_SIZE_ESTIMATE);
setDefaultOnCondition(props, !props.containsKey(CLEANER_PARALLELISM),
CLEANER_PARALLELISM, DEFAULT_CLEANER_PARALLELISM);
// Throws IllegalArgumentException if the value set is not a known cleaning policy
HoodieCleaner.CleaningPolicy.valueOf(props.getProperty(CLEANER_POLICY_PROP));
Preconditions.checkArgument(
Integer.parseInt(props.getProperty(MAX_COMMITS_TO_KEEP)) > Integer
.parseInt(props.getProperty(MIN_COMMITS_TO_KEEP)),
"hoodie.keep.max.commits must be greater than hoodie.keep.min.commits");
return config;
}
}
}
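build() above enforces, via Guava's Preconditions, that hoodie.keep.max.commits (default 128) stays strictly above hoodie.keep.min.commits (default 96); the same guard written without Guava:

```java
public class ArchiveBounds {
    // Equivalent of the Preconditions.checkArgument call in
    // HoodieCompactionConfig.Builder.build().
    static void checkCommitBounds(int minToKeep, int maxToKeep) {
        if (maxToKeep <= minToKeep) {
            throw new IllegalArgumentException(
                "hoodie.keep.max.commits (" + maxToKeep
                    + ") must be greater than hoodie.keep.min.commits (" + minToKeep + ")");
        }
    }
}
```

The gap between the two bounds is what the commit archival logic has to work with: once more than maxToKeep commits accumulate, archiving trims back down toward minToKeep.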

/*
* Copyright (c) 2016 Uber Technologies, Inc. (hoodie-dev-group@uber.com)
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package com.uber.hoodie.config;
import com.google.common.base.Preconditions;
import com.uber.hoodie.index.HoodieIndex;
import javax.annotation.concurrent.Immutable;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import java.util.Properties;
/**
* Indexing related config
*/
@Immutable
public class HoodieIndexConfig extends DefaultHoodieConfig {
public static final String INDEX_TYPE_PROP = "hoodie.index.type";
public static final String DEFAULT_INDEX_TYPE = HoodieIndex.IndexType.BLOOM.name();
public static final String BLOOM_FILTER_NUM_ENTRIES = "hoodie.index.bloom.num_entries";
public static final String DEFAULT_BLOOM_FILTER_NUM_ENTRIES = "60000";
public static final String BLOOM_FILTER_FPP = "hoodie.index.bloom.fpp";
public static final String DEFAULT_BLOOM_FILTER_FPP = "0.000000001";
public final static String HBASE_ZKQUORUM_PROP = "hoodie.index.hbase.zkquorum";
public final static String HBASE_ZKPORT_PROP = "hoodie.index.hbase.zkport";
public final static String HBASE_TABLENAME_PROP = "hoodie.index.hbase.table";
private HoodieIndexConfig(Properties props) {
super(props);
}
public static HoodieIndexConfig.Builder newBuilder() {
return new Builder();
}
public static class Builder {
private final Properties props = new Properties();
public Builder fromFile(File propertiesFile) throws IOException {
FileReader reader = new FileReader(propertiesFile);
try {
this.props.load(reader);
return this;
} finally {
reader.close();
}
}
public Builder withIndexType(HoodieIndex.IndexType indexType) {
props.setProperty(INDEX_TYPE_PROP, indexType.name());
return this;
}
public Builder bloomFilterNumEntries(int numEntries) {
props.setProperty(BLOOM_FILTER_NUM_ENTRIES, String.valueOf(numEntries));
return this;
}
public Builder bloomFilterFPP(double fpp) {
props.setProperty(BLOOM_FILTER_FPP, String.valueOf(fpp));
return this;
}
public Builder hbaseZkQuorum(String zkString) {
props.setProperty(HBASE_ZKQUORUM_PROP, zkString);
return this;
}
public Builder hbaseZkPort(int port) {
props.setProperty(HBASE_ZKPORT_PROP, String.valueOf(port));
return this;
}
public Builder hbaseTableName(String tableName) {
props.setProperty(HBASE_TABLENAME_PROP, tableName);
return this;
}
public HoodieIndexConfig build() {
HoodieIndexConfig config = new HoodieIndexConfig(props);
setDefaultOnCondition(props, !props.containsKey(INDEX_TYPE_PROP),
INDEX_TYPE_PROP, DEFAULT_INDEX_TYPE);
setDefaultOnCondition(props, !props.containsKey(BLOOM_FILTER_NUM_ENTRIES),
BLOOM_FILTER_NUM_ENTRIES, DEFAULT_BLOOM_FILTER_NUM_ENTRIES);
setDefaultOnCondition(props, !props.containsKey(BLOOM_FILTER_FPP),
BLOOM_FILTER_FPP, DEFAULT_BLOOM_FILTER_FPP);
// Throws IllegalArgumentException if the value set is not a known Hoodie Index Type
HoodieIndex.IndexType.valueOf(props.getProperty(INDEX_TYPE_PROP));
return config;
}
}
}
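HoodieIndexConfig above defaults the bloom index to 60000 entries at a false-positive probability of 1e-9. The standard Bloom filter sizing formulas (illustrative only; not necessarily what Hoodie's bloom implementation uses internally) show what those defaults imply per file:

```java
public class BloomSizing {
    // m = -n * ln(p) / (ln 2)^2 : optimal number of bits for n entries
    // at false-positive probability p.
    static long optimalNumBits(long numEntries, double fpp) {
        return (long) Math.ceil(-numEntries * Math.log(fpp) / (Math.log(2) * Math.log(2)));
    }

    // k = (m / n) * ln 2 : optimal number of hash functions.
    static int optimalNumHashes(long numEntries, long numBits) {
        return (int) Math.max(1, Math.round((double) numBits / numEntries * Math.log(2)));
    }
}
```

With the defaults above (n = 60000, p = 1e-9) this works out to roughly 2.6 million bits (about 316 KiB of filter per file) and around 30 hash functions, which is the storage cost traded for near-certain duplicate detection during index lookup.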

/*
* Copyright (c) 2016 Uber Technologies, Inc. (hoodie-dev-group@uber.com)
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package com.uber.hoodie.config;
import com.uber.hoodie.metrics.MetricsReporterType;
import javax.annotation.concurrent.Immutable;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import java.util.Properties;
/**
* Fetch the configurations used by the Metrics system.
*/
@Immutable
public class HoodieMetricsConfig extends DefaultHoodieConfig {
public final static String METRIC_PREFIX = "hoodie.metrics";
public final static String METRICS_ON = METRIC_PREFIX + ".on";
public final static boolean DEFAULT_METRICS_ON = false;
public final static String METRICS_REPORTER_TYPE = METRIC_PREFIX + ".reporter.type";
public final static MetricsReporterType DEFAULT_METRICS_REPORTER_TYPE =
MetricsReporterType.GRAPHITE;
// Graphite
public final static String GRAPHITE_PREFIX = METRIC_PREFIX + ".graphite";
public final static String GRAPHITE_SERVER_HOST = GRAPHITE_PREFIX + ".host";
public final static String DEFAULT_GRAPHITE_SERVER_HOST = "localhost";
public final static String GRAPHITE_SERVER_PORT = GRAPHITE_PREFIX + ".port";
public final static int DEFAULT_GRAPHITE_SERVER_PORT = 4756;
public final static String GRAPHITE_METRIC_PREFIX = GRAPHITE_PREFIX + ".metric.prefix";
private HoodieMetricsConfig(Properties props) {
super(props);
}
public static HoodieMetricsConfig.Builder newBuilder() {
return new Builder();
}
public static class Builder {
private final Properties props = new Properties();
public Builder fromFile(File propertiesFile) throws IOException {
FileReader reader = new FileReader(propertiesFile);
try {
this.props.load(reader);
return this;
} finally {
reader.close();
}
}
public Builder on(boolean metricsOn) {
props.setProperty(METRICS_ON, String.valueOf(metricsOn));
return this;
}
public Builder withReporterType(String reporterType) {
props.setProperty(METRICS_REPORTER_TYPE, reporterType);
return this;
}
public Builder toGraphiteHost(String host) {
props.setProperty(GRAPHITE_SERVER_HOST, host);
return this;
}
public Builder onGraphitePort(int port) {
props.setProperty(GRAPHITE_SERVER_PORT, String.valueOf(port));
return this;
}
public Builder usePrefix(String prefix) {
props.setProperty(GRAPHITE_METRIC_PREFIX, prefix);
return this;
}
public HoodieMetricsConfig build() {
HoodieMetricsConfig config = new HoodieMetricsConfig(props);
setDefaultOnCondition(props, !props.containsKey(METRICS_ON), METRICS_ON,
String.valueOf(DEFAULT_METRICS_ON));
setDefaultOnCondition(props, !props.containsKey(METRICS_REPORTER_TYPE),
METRICS_REPORTER_TYPE, DEFAULT_METRICS_REPORTER_TYPE.name());
setDefaultOnCondition(props, !props.containsKey(GRAPHITE_SERVER_HOST),
GRAPHITE_SERVER_HOST, DEFAULT_GRAPHITE_SERVER_HOST);
setDefaultOnCondition(props, !props.containsKey(GRAPHITE_SERVER_PORT),
GRAPHITE_SERVER_PORT, String.valueOf(DEFAULT_GRAPHITE_SERVER_PORT));
return config;
}
}
}
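`Builder.fromFile` above is a thin wrapper over `java.util.Properties.load`. A self-contained sketch of the round-trip it performs (file name and key values here are illustrative, not Hoodie defaults):

```java
import java.io.File;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;
import java.util.Properties;

public class FromFileSketch {
    public static void main(String[] args) throws IOException {
        File f = File.createTempFile("hoodie-metrics", ".properties");
        try (FileWriter w = new FileWriter(f)) {
            w.write("hoodie.metrics.on=true\n");
            w.write("hoodie.metrics.graphite.host=graphite.example.com\n");
        }
        Properties props = new Properties();
        // mirrors Builder.fromFile: load the file, closing the reader in all cases
        try (FileReader reader = new FileReader(f)) {
            props.load(reader);
        }
        System.out.println(props.getProperty("hoodie.metrics.on"));
        System.out.println(props.getProperty("hoodie.metrics.graphite.host"));
        f.delete();
    }
}
```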


@@ -0,0 +1,85 @@
/*
* Copyright (c) 2016 Uber Technologies, Inc. (hoodie-dev-group@uber.com)
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package com.uber.hoodie.config;
import javax.annotation.concurrent.Immutable;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import java.util.Properties;
/**
* Storage-related config
*/
@Immutable
public class HoodieStorageConfig extends DefaultHoodieConfig {
public static final String PARQUET_FILE_MAX_BYTES = "hoodie.parquet.max.file.size";
public static final String DEFAULT_PARQUET_FILE_MAX_BYTES = String.valueOf(120 * 1024 * 1024);
public static final String PARQUET_BLOCK_SIZE_BYTES = "hoodie.parquet.block.size";
public static final String DEFAULT_PARQUET_BLOCK_SIZE_BYTES = DEFAULT_PARQUET_FILE_MAX_BYTES;
public static final String PARQUET_PAGE_SIZE_BYTES = "hoodie.parquet.page.size";
public static final String DEFAULT_PARQUET_PAGE_SIZE_BYTES = String.valueOf(1 * 1024 * 1024);
private HoodieStorageConfig(Properties props) {
super(props);
}
public static HoodieStorageConfig.Builder newBuilder() {
return new Builder();
}
public static class Builder {
private final Properties props = new Properties();
public Builder fromFile(File propertiesFile) throws IOException {
FileReader reader = new FileReader(propertiesFile);
try {
this.props.load(reader);
return this;
} finally {
reader.close();
}
}
public Builder limitFileSize(int maxFileSize) {
props.setProperty(PARQUET_FILE_MAX_BYTES, String.valueOf(maxFileSize));
return this;
}
public Builder parquetBlockSize(int blockSize) {
props.setProperty(PARQUET_BLOCK_SIZE_BYTES, String.valueOf(blockSize));
return this;
}
public Builder parquetPageSize(int pageSize) {
props.setProperty(PARQUET_PAGE_SIZE_BYTES, String.valueOf(pageSize));
return this;
}
public HoodieStorageConfig build() {
HoodieStorageConfig config = new HoodieStorageConfig(props);
setDefaultOnCondition(props, !props.containsKey(PARQUET_FILE_MAX_BYTES),
PARQUET_FILE_MAX_BYTES, DEFAULT_PARQUET_FILE_MAX_BYTES);
setDefaultOnCondition(props, !props.containsKey(PARQUET_BLOCK_SIZE_BYTES),
PARQUET_BLOCK_SIZE_BYTES, DEFAULT_PARQUET_BLOCK_SIZE_BYTES);
setDefaultOnCondition(props, !props.containsKey(PARQUET_PAGE_SIZE_BYTES),
PARQUET_PAGE_SIZE_BYTES, DEFAULT_PARQUET_PAGE_SIZE_BYTES);
return config;
}
}
}
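Note that `build()` in these config classes constructs the config object before filling in defaults. This presumably works because `DefaultHoodieConfig` holds a reference to the caller's `Properties` rather than a copy, so defaults set afterwards are still visible through the built config. A minimal standalone sketch of that behavior (the `Config` stand-in is illustrative):

```java
import java.util.Properties;

public class SharedPropsSketch {
    // Stand-in for DefaultHoodieConfig: keeps a reference to the caller's Properties.
    static class Config {
        private final Properties props;
        Config(Properties props) { this.props = props; }
        String get(String key) { return props.getProperty(key); }
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        Config config = new Config(props);                    // built first, as in build()
        if (!props.containsKey("hoodie.parquet.page.size")) { // setDefaultOnCondition
            props.setProperty("hoodie.parquet.page.size", String.valueOf(1024 * 1024));
        }
        // the default set after construction is visible through the built config
        System.out.println(config.get("hoodie.parquet.page.size"));
    }
}
```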


@@ -0,0 +1,308 @@
/*
* Copyright (c) 2016 Uber Technologies, Inc. (hoodie-dev-group@uber.com)
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package com.uber.hoodie.config;
import com.google.common.base.Preconditions;
import com.uber.hoodie.index.HoodieIndex;
import com.uber.hoodie.io.HoodieCleaner;
import com.uber.hoodie.metrics.MetricsReporterType;
import org.apache.spark.storage.StorageLevel;
import javax.annotation.concurrent.Immutable;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import java.util.Properties;
/**
* Class storing configs for the {@link com.uber.hoodie.HoodieWriteClient}
*/
@Immutable
public class HoodieWriteConfig extends DefaultHoodieConfig {
private static final String BASE_PATH_PROP = "hoodie.base.path";
private static final String AVRO_SCHEMA = "hoodie.avro.schema";
private static final String TABLE_NAME = "hoodie.table.name";
private static final String DEFAULT_PARALLELISM = "200";
private static final String INSERT_PARALLELISM = "hoodie.insert.shuffle.parallelism";
private static final String UPSERT_PARALLELISM = "hoodie.upsert.shuffle.parallelism";
private static final String COMBINE_BEFORE_INSERT_PROP = "hoodie.combine.before.insert";
private static final String DEFAULT_COMBINE_BEFORE_INSERT = "false";
private static final String COMBINE_BEFORE_UPSERT_PROP = "hoodie.combine.before.upsert";
private static final String DEFAULT_COMBINE_BEFORE_UPSERT = "true";
private static final String WRITE_STATUS_STORAGE_LEVEL = "hoodie.write.status.storage.level";
private static final String DEFAULT_WRITE_STATUS_STORAGE_LEVEL = "MEMORY_AND_DISK_SER";
private HoodieWriteConfig(Properties props) {
super(props);
}
/**
* base properties
**/
public String getBasePath() {
return props.getProperty(BASE_PATH_PROP);
}
public String getSchema() {
return props.getProperty(AVRO_SCHEMA);
}
public String getTableName() {
return props.getProperty(TABLE_NAME);
}
public int getInsertShuffleParallelism() {
return Integer.parseInt(props.getProperty(INSERT_PARALLELISM));
}
public int getUpsertShuffleParallelism() {
return Integer.parseInt(props.getProperty(UPSERT_PARALLELISM));
}
public boolean shouldCombineBeforeInsert() {
return Boolean.parseBoolean(props.getProperty(COMBINE_BEFORE_INSERT_PROP));
}
public boolean shouldCombineBeforeUpsert() {
return Boolean.parseBoolean(props.getProperty(COMBINE_BEFORE_UPSERT_PROP));
}
public StorageLevel getWriteStatusStorageLevel() {
return StorageLevel.fromString(props.getProperty(WRITE_STATUS_STORAGE_LEVEL));
}
/**
* compaction properties
**/
public HoodieCleaner.CleaningPolicy getCleanerPolicy() {
return HoodieCleaner.CleaningPolicy
.valueOf(props.getProperty(HoodieCompactionConfig.CLEANER_POLICY_PROP));
}
public int getCleanerFileVersionsRetained() {
return Integer.parseInt(
props.getProperty(HoodieCompactionConfig.CLEANER_FILE_VERSIONS_RETAINED_PROP));
}
public int getCleanerCommitsRetained() {
return Integer
.parseInt(props.getProperty(HoodieCompactionConfig.CLEANER_COMMITS_RETAINED_PROP));
}
public int getMaxCommitsToKeep() {
return Integer.parseInt(props.getProperty(HoodieCompactionConfig.MAX_COMMITS_TO_KEEP));
}
public int getMinCommitsToKeep() {
return Integer.parseInt(props.getProperty(HoodieCompactionConfig.MIN_COMMITS_TO_KEEP));
}
public int getParquetSmallFileLimit() {
return Integer.parseInt(props.getProperty(HoodieCompactionConfig.PARQUET_SMALL_FILE_LIMIT_BYTES));
}
public int getCopyOnWriteInsertSplitSize() {
return Integer.parseInt(
props.getProperty(HoodieCompactionConfig.COPY_ON_WRITE_TABLE_INSERT_SPLIT_SIZE));
}
public int getCopyOnWriteRecordSizeEstimate() {
return Integer.parseInt(
props.getProperty(HoodieCompactionConfig.COPY_ON_WRITE_TABLE_RECORD_SIZE_ESTIMATE));
}
public boolean shouldAutoTuneInsertSplits() {
return Boolean.parseBoolean(
props.getProperty(HoodieCompactionConfig.COPY_ON_WRITE_TABLE_AUTO_SPLIT_INSERTS));
}
public int getCleanerParallelism() {
return Integer.parseInt(props.getProperty(HoodieCompactionConfig.CLEANER_PARALLELISM));
}
/**
* index properties
**/
public HoodieIndex.IndexType getIndexType() {
return HoodieIndex.IndexType.valueOf(props.getProperty(HoodieIndexConfig.INDEX_TYPE_PROP));
}
public int getBloomFilterNumEntries() {
return Integer.parseInt(props.getProperty(HoodieIndexConfig.BLOOM_FILTER_NUM_ENTRIES));
}
public double getBloomFilterFPP() {
return Double.parseDouble(props.getProperty(HoodieIndexConfig.BLOOM_FILTER_FPP));
}
public String getHbaseZkQuorum() {
return props.getProperty(HoodieIndexConfig.HBASE_ZKQUORUM_PROP);
}
public int getHbaseZkPort() {
return Integer.parseInt(props.getProperty(HoodieIndexConfig.HBASE_ZKPORT_PROP));
}
public String getHbaseTableName() {
return props.getProperty(HoodieIndexConfig.HBASE_TABLENAME_PROP);
}
/**
* storage properties
**/
public int getParquetMaxFileSize() {
return Integer.parseInt(props.getProperty(HoodieStorageConfig.PARQUET_FILE_MAX_BYTES));
}
public int getParquetBlockSize() {
return Integer.parseInt(props.getProperty(HoodieStorageConfig.PARQUET_BLOCK_SIZE_BYTES));
}
public int getParquetPageSize() {
return Integer.parseInt(props.getProperty(HoodieStorageConfig.PARQUET_PAGE_SIZE_BYTES));
}
/**
* metrics properties
**/
public boolean isMetricsOn() {
return Boolean.parseBoolean(props.getProperty(HoodieMetricsConfig.METRICS_ON));
}
public MetricsReporterType getMetricsReporterType() {
return MetricsReporterType
.valueOf(props.getProperty(HoodieMetricsConfig.METRICS_REPORTER_TYPE));
}
public String getGraphiteServerHost() {
return props.getProperty(HoodieMetricsConfig.GRAPHITE_SERVER_HOST);
}
public int getGraphiteServerPort() {
return Integer.parseInt(props.getProperty(HoodieMetricsConfig.GRAPHITE_SERVER_PORT));
}
public String getGraphiteMetricPrefix() {
return props.getProperty(HoodieMetricsConfig.GRAPHITE_METRIC_PREFIX);
}
public static HoodieWriteConfig.Builder newBuilder() {
return new Builder();
}
public static class Builder {
private final Properties props = new Properties();
private boolean isIndexConfigSet = false;
private boolean isStorageConfigSet = false;
private boolean isCompactionConfigSet = false;
private boolean isMetricsConfigSet = false;
public Builder fromFile(File propertiesFile) throws IOException {
FileReader reader = new FileReader(propertiesFile);
try {
this.props.load(reader);
return this;
} finally {
reader.close();
}
}
public Builder withPath(String basePath) {
props.setProperty(BASE_PATH_PROP, basePath);
return this;
}
public Builder withSchema(String schemaStr) {
props.setProperty(AVRO_SCHEMA, schemaStr);
return this;
}
public Builder forTable(String tableName) {
props.setProperty(TABLE_NAME, tableName);
return this;
}
public Builder withParallelism(int insertShuffleParallelism, int upsertShuffleParallelism) {
props.setProperty(INSERT_PARALLELISM, String.valueOf(insertShuffleParallelism));
props.setProperty(UPSERT_PARALLELISM, String.valueOf(upsertShuffleParallelism));
return this;
}
public Builder combineInput(boolean onInsert, boolean onUpsert) {
props.setProperty(COMBINE_BEFORE_INSERT_PROP, String.valueOf(onInsert));
props.setProperty(COMBINE_BEFORE_UPSERT_PROP, String.valueOf(onUpsert));
return this;
}
public Builder withWriteStatusStorageLevel(StorageLevel level) {
props.setProperty(WRITE_STATUS_STORAGE_LEVEL, level.toString());
return this;
}
public Builder withIndexConfig(HoodieIndexConfig indexConfig) {
props.putAll(indexConfig.getProps());
isIndexConfigSet = true;
return this;
}
public Builder withStorageConfig(HoodieStorageConfig storageConfig) {
props.putAll(storageConfig.getProps());
isStorageConfigSet = true;
return this;
}
public Builder withCompactionConfig(HoodieCompactionConfig compactionConfig) {
props.putAll(compactionConfig.getProps());
isCompactionConfigSet = true;
return this;
}
public Builder withMetricsConfig(HoodieMetricsConfig metricsConfig) {
props.putAll(metricsConfig.getProps());
isMetricsConfigSet = true;
return this;
}
public HoodieWriteConfig build() {
HoodieWriteConfig config = new HoodieWriteConfig(props);
// Check for mandatory properties
Preconditions.checkArgument(config.getBasePath() != null, "hoodie.base.path must be set");
setDefaultOnCondition(props, !props.containsKey(INSERT_PARALLELISM), INSERT_PARALLELISM,
DEFAULT_PARALLELISM);
setDefaultOnCondition(props, !props.containsKey(UPSERT_PARALLELISM), UPSERT_PARALLELISM,
DEFAULT_PARALLELISM);
setDefaultOnCondition(props, !props.containsKey(COMBINE_BEFORE_INSERT_PROP),
COMBINE_BEFORE_INSERT_PROP, DEFAULT_COMBINE_BEFORE_INSERT);
setDefaultOnCondition(props, !props.containsKey(COMBINE_BEFORE_UPSERT_PROP),
COMBINE_BEFORE_UPSERT_PROP, DEFAULT_COMBINE_BEFORE_UPSERT);
setDefaultOnCondition(props, !props.containsKey(WRITE_STATUS_STORAGE_LEVEL),
WRITE_STATUS_STORAGE_LEVEL, DEFAULT_WRITE_STATUS_STORAGE_LEVEL);
setDefaultOnCondition(props, !isIndexConfigSet, HoodieIndexConfig.newBuilder().build());
setDefaultOnCondition(props, !isStorageConfigSet,
HoodieStorageConfig.newBuilder().build());
setDefaultOnCondition(props, !isCompactionConfigSet,
HoodieCompactionConfig.newBuilder().build());
setDefaultOnCondition(props, !isMetricsConfigSet,
HoodieMetricsConfig.newBuilder().build());
return config;
}
}
}
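The `with*Config` methods above flatten every sub-config into one `Properties` bag via `putAll`, which is why `HoodieWriteConfig` can later read index, storage, and metrics keys directly from its own `props`. A standalone sketch of that composition (key names taken from the classes above; values are illustrative):

```java
import java.util.Properties;

public class ComposeConfigSketch {
    public static void main(String[] args) {
        Properties indexProps = new Properties();
        indexProps.setProperty("hoodie.index.type", "BLOOM");

        Properties writeProps = new Properties();
        writeProps.setProperty("hoodie.base.path", "/tmp/hoodie/table");

        // mirrors withIndexConfig: copy every nested key into the flat bag
        writeProps.putAll(indexProps);

        System.out.println(writeProps.getProperty("hoodie.index.type"));
        System.out.println(writeProps.getProperty("hoodie.base.path"));
    }
}
```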


@@ -0,0 +1,32 @@
/*
* Copyright (c) 2016 Uber Technologies, Inc. (hoodie-dev-group@uber.com)
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package com.uber.hoodie.exception;
/**
* <p>
* Exception thrown for any higher level errors when <code>HoodieClient</code> is doing a Commit
* </p>
*/
public class HoodieCommitException extends HoodieException {
public HoodieCommitException(String msg) {
super(msg);
}
public HoodieCommitException(String msg, Throwable e) {
super(msg, e);
}
}


@@ -0,0 +1,35 @@
/*
* Copyright (c) 2016 Uber Technologies, Inc. (hoodie-dev-group@uber.com)
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package com.uber.hoodie.exception;
/**
* <p>
* Exception thrown when a dependent system is not available
* </p>
*/
public class HoodieDependentSystemUnavailableException extends HoodieException {
public static final String HBASE = "HBASE";
public HoodieDependentSystemUnavailableException(String system, String connectURL) {
super(getLogMessage(system, connectURL));
}
private static String getLogMessage(String system, String connectURL) {
return "System " + system + " unavailable. Tried to connect to " + connectURL;
}
}


@@ -0,0 +1,30 @@
/*
* Copyright (c) 2016 Uber Technologies, Inc. (hoodie-dev-group@uber.com)
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package com.uber.hoodie.exception;
/**
* <p>
* Exception thrown for any higher level errors when <code>HoodieClient</code> is doing a bulk insert
* </p>
*/
public class HoodieInsertException extends HoodieException {
public HoodieInsertException(String msg, Throwable e) {
super(msg, e);
}
}


@@ -0,0 +1,28 @@
/*
* Copyright (c) 2016 Uber Technologies, Inc. (hoodie-dev-group@uber.com)
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package com.uber.hoodie.exception;
public class HoodieRollbackException extends HoodieException {
public HoodieRollbackException(String msg, Throwable e) {
super(msg, e);
}
public HoodieRollbackException(String msg) {
super(msg);
}
}


@@ -0,0 +1,32 @@
/*
* Copyright (c) 2016 Uber Technologies, Inc. (hoodie-dev-group@uber.com)
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package com.uber.hoodie.exception;
/**
* <p>
* Exception thrown for any higher level errors when <code>HoodieClient</code> is doing an incremental upsert
* </p>
*/
public class HoodieUpsertException extends HoodieException {
public HoodieUpsertException(String msg, Throwable e) {
super(msg, e);
}
public HoodieUpsertException(String msg) {
super(msg);
}
}


@@ -0,0 +1,52 @@
/*
* Copyright (c) 2016 Uber Technologies, Inc. (hoodie-dev-group@uber.com)
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package com.uber.hoodie.func;
import com.uber.hoodie.config.HoodieWriteConfig;
import com.uber.hoodie.WriteStatus;
import com.uber.hoodie.common.model.HoodieRecord;
import com.uber.hoodie.common.model.HoodieRecordPayload;
import com.uber.hoodie.common.model.HoodieTableMetadata;
import org.apache.spark.api.java.function.Function2;
import java.util.Iterator;
import java.util.List;
/**
* Map function that handles a sorted stream of HoodieRecords
*/
public class InsertMapFunction<T extends HoodieRecordPayload>
implements Function2<Integer, Iterator<HoodieRecord<T>>, Iterator<List<WriteStatus>>> {
private String commitTime;
private HoodieWriteConfig config;
private HoodieTableMetadata metadata;
public InsertMapFunction(String commitTime, HoodieWriteConfig config,
HoodieTableMetadata metadata) {
this.commitTime = commitTime;
this.config = config;
this.metadata = metadata;
}
@Override
public Iterator<List<WriteStatus>> call(Integer partition, Iterator<HoodieRecord<T>> sortedRecordItr)
throws Exception {
return new LazyInsertIterable<>(sortedRecordItr, config, commitTime, metadata);
}
}


@@ -0,0 +1,114 @@
/*
* Copyright (c) 2016 Uber Technologies, Inc. (hoodie-dev-group@uber.com)
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package com.uber.hoodie.func;
import com.uber.hoodie.config.HoodieWriteConfig;
import com.uber.hoodie.WriteStatus;
import com.uber.hoodie.common.model.HoodieRecord;
import com.uber.hoodie.common.model.HoodieRecordPayload;
import com.uber.hoodie.common.model.HoodieTableMetadata;
import com.uber.hoodie.io.HoodieIOHandle;
import com.uber.hoodie.io.HoodieInsertHandle;
import org.apache.spark.TaskContext;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.Iterator;
import java.util.List;
import java.util.Set;
/**
* Lazy iterable that writes a stream of HoodieRecords, sorted by partitionPath, into new files.
*/
public class LazyInsertIterable<T extends HoodieRecordPayload> extends LazyIterableIterator<HoodieRecord<T>, List<WriteStatus>> {
private final HoodieWriteConfig hoodieConfig;
private final String commitTime;
private final HoodieTableMetadata tableMetadata;
private Set<String> partitionsCleaned;
private HoodieInsertHandle handle;
public LazyInsertIterable(Iterator<HoodieRecord<T>> sortedRecordItr, HoodieWriteConfig config,
String commitTime, HoodieTableMetadata metadata) {
super(sortedRecordItr);
this.partitionsCleaned = new HashSet<>();
this.hoodieConfig = config;
this.commitTime = commitTime;
this.tableMetadata = metadata;
}
@Override protected void start() {
}
@Override protected List<WriteStatus> computeNext() {
List<WriteStatus> statuses = new ArrayList<>();
while (inputItr.hasNext()) {
HoodieRecord record = inputItr.next();
// clean up any partial failures
if (!partitionsCleaned.contains(record.getPartitionPath())) {
// This insert task could fail multiple times, but Spark will faithfully retry with
// the same data again. Thus, before we open any files under a given partition, we
// first delete any files in the same partitionPath written by same Spark partition
HoodieIOHandle.cleanupTmpFilesFromCurrentCommit(hoodieConfig,
commitTime,
record.getPartitionPath(),
TaskContext.getPartitionId());
partitionsCleaned.add(record.getPartitionPath());
}
// lazily initialize the handle, for the first time
if (handle == null) {
handle =
new HoodieInsertHandle(hoodieConfig, commitTime, tableMetadata,
record.getPartitionPath());
}
if (handle.canWrite(record)) {
// write the record, if the handle has capacity
handle.write(record);
} else {
// handle is full.
statuses.add(handle.close());
// Need to handle the rejected record & open new handle
handle =
new HoodieInsertHandle(hoodieConfig, commitTime, tableMetadata,
record.getPartitionPath());
handle.write(record); // we should be able to write 1 record.
break;
}
}
// If we exited out, because we ran out of records, just close the pending handle.
if (!inputItr.hasNext()) {
if (handle != null) {
statuses.add(handle.close());
}
}
assert statuses.size() > 0; // should never return empty statuses
return statuses;
}
@Override protected void end() {
}
}
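The core of `computeNext()` above is a handle-rollover loop: write into a bounded handle until it rejects a record, then close it, open a fresh handle, and write the rejected record there. A standalone sketch of just that loop, ignoring the per-batch `break`/return and using lists as a stand-in for `HoodieInsertHandle` (all names here are illustrative):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class HandleRolloverSketch {
    static final int CAPACITY = 3; // stand-in for handle.canWrite()'s size limit

    public static void main(String[] args) {
        List<List<String>> closedHandles = new ArrayList<>();
        List<String> handle = new ArrayList<>();
        for (String record : Arrays.asList("a", "b", "c", "d", "e")) {
            if (handle.size() < CAPACITY) {       // handle.canWrite(record)
                handle.add(record);               // write, since there is capacity
            } else {                              // handle is full
                closedHandles.add(handle);        // statuses.add(handle.close())
                handle = new ArrayList<>();       // open a new handle
                handle.add(record);               // the rejected record goes first
            }
        }
        closedHandles.add(handle);                // close the pending handle at the end
        System.out.println(closedHandles);
    }
}
```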


@@ -0,0 +1,128 @@
/*
* Copyright (c) 2016 Uber Technologies, Inc. (hoodie-dev-group@uber.com)
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package com.uber.hoodie.func;
import java.util.Iterator;
/**
* (NOTE: Adapted from Apache SystemML) Generic base class for lazy, single-pass
* iterator classes, intended to simplify the implementation of lazy iterators for
* mapPartitions use cases. Note [SPARK-3369], which gives the reasons for backwards
* compatibility with regard to the iterable API despite Spark's single-pass nature.
*
* Provides a way to obtain an iterator of type O (output) out of an iterator of type I (input).
*
* Things to remember:
* - Assumes Spark calls hasNext() to check for elements, before calling next() to obtain them
* - Assumes hasNext() gets called at least once
* - Concrete implementations are responsible for calling inputItr.next() and doing the
*   processing in computeNext()
*/
public abstract class LazyIterableIterator<I, O> implements Iterable<O>, Iterator<O> {
protected Iterator<I> inputItr = null;
private boolean consumed = false;
private boolean startCalled = false;
private boolean endCalled = false;
public LazyIterableIterator(Iterator<I> in) {
inputItr = in;
}
/**
* Called once, before any elements are processed
*/
protected abstract void start();
/**
* Block computation to be overwritten by sub classes.
*/
protected abstract O computeNext();
/**
* Called once, after all elements are processed.
*/
protected abstract void end();
//////////////////
// iterable implementation
private void invokeStartIfNeeded() {
if (!startCalled) {
startCalled = true;
try {
start();
} catch (Exception e) {
throw new RuntimeException("Error in start()", e);
}
}
}
private void invokeEndIfNeeded() {
// make sure there is exactly one call out to end()
if (!endCalled) {
endCalled = true;
// if we are out of elements, and end has not been called yet
try {
end();
} catch (Exception e) {
throw new RuntimeException("Error in end()", e);
}
}
}
@Override
public Iterator<O> iterator() {
// check for an already-consumed iterator
if (consumed)
throw new RuntimeException("Invalid repeated iterator consumption.");
// hand out self as the iterator exactly once (note: do not hand out the input
// iterator, since it is consumed by this self-iterator implementation)
consumed = true;
return this;
}
//////////////////
// iterator implementation
@Override
public boolean hasNext() {
boolean ret = inputItr.hasNext();
// make sure there is exactly one call to start()
invokeStartIfNeeded();
if (!ret) {
// if we are out of elements, and end has not been called yet
invokeEndIfNeeded();
}
return ret;
}
@Override
public O next() {
try {
return computeNext();
} catch (Exception ex) {
throw new RuntimeException(ex);
}
}
@Override
public void remove() {
throw new RuntimeException("Unsupported remove operation.");
}
}
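The contract above, `start()` fired once around the first `hasNext()`, `end()` fired once when the input is exhausted, and `computeNext()` free to consume several input elements per output element, can be exercised with a minimal standalone re-implementation (a simplified copy for illustration, not the class above; the pairwise-sum body mirrors how `LazyInsertIterable` packs many records into one batch):

```java
import java.util.Arrays;
import java.util.Iterator;

public class LazyIterSketch {
    abstract static class LazyIter<I, O> implements Iterator<O> {
        protected final Iterator<I> in;
        private boolean started = false, ended = false;
        LazyIter(Iterator<I> in) { this.in = in; }
        protected abstract void start();
        protected abstract O computeNext();
        protected abstract void end();
        @Override public boolean hasNext() {
            boolean ret = in.hasNext();
            if (!started) { started = true; start(); }   // exactly one call to start()
            if (!ret && !ended) { ended = true; end(); } // end() once input runs out
            return ret;
        }
        @Override public O next() { return computeNext(); }
    }

    public static void main(String[] args) {
        Iterator<Integer> nums = Arrays.asList(1, 2, 3, 4).iterator();
        LazyIter<Integer, Integer> sums = new LazyIter<Integer, Integer>(nums) {
            @Override protected void start() { System.out.println("start"); }
            // consume input in pairs: one output element per two input elements
            @Override protected Integer computeNext() {
                int s = in.next();
                if (in.hasNext()) s += in.next();
                return s;
            }
            @Override protected void end() { System.out.println("end"); }
        };
        while (sums.hasNext()) System.out.println(sums.next());
    }
}
```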


@@ -0,0 +1,229 @@
/*
* Copyright (c) 2016 Uber Technologies, Inc. (hoodie-dev-group@uber.com)
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package com.uber.hoodie.index;
import com.google.common.base.Optional;
import com.uber.hoodie.config.HoodieWriteConfig;
import com.uber.hoodie.WriteStatus;
import com.uber.hoodie.common.model.HoodieKey;
import com.uber.hoodie.common.model.HoodieRecordLocation;
import com.uber.hoodie.common.model.HoodieRecordPayload;
import com.uber.hoodie.common.model.HoodieTableMetadata;
import com.uber.hoodie.common.model.HoodieRecord;
import com.uber.hoodie.config.HoodieIndexConfig;
import com.uber.hoodie.exception.HoodieDependentSystemUnavailableException;
import com.uber.hoodie.exception.HoodieIndexException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.*;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.log4j.LogManager;
import org.apache.log4j.Logger;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.Function2;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
/**
* Hoodie Index implementation backed by HBase
*/
public class HBaseIndex<T extends HoodieRecordPayload> extends HoodieIndex<T> {
private final static byte[] SYSTEM_COLUMN_FAMILY = Bytes.toBytes("_s");
private final static byte[] COMMIT_TS_COLUMN = Bytes.toBytes("commit_ts");
private final static byte[] FILE_NAME_COLUMN = Bytes.toBytes("file_name");
private final static byte[] PARTITION_PATH_COLUMN = Bytes.toBytes("partition_path");
private static Logger logger = LogManager.getLogger(HBaseIndex.class);
private final String tableName;
public HBaseIndex(HoodieWriteConfig config, JavaSparkContext jsc) {
super(config, jsc);
this.tableName = config.getProps().getProperty(HoodieIndexConfig.HBASE_TABLENAME_PROP);
}
@Override
public JavaPairRDD<HoodieKey, Optional<String>> fetchRecordLocation(
JavaRDD<HoodieKey> hoodieKeys, HoodieTableMetadata metadata) {
throw new UnsupportedOperationException("HBase index does not implement check exist yet");
}
private static Connection hbaseConnection = null;
private Connection getHBaseConnection() {
Configuration hbaseConfig = HBaseConfiguration.create();
String quorum = config.getProps().getProperty(HoodieIndexConfig.HBASE_ZKQUORUM_PROP);
hbaseConfig.set("hbase.zookeeper.quorum", quorum);
String port = config.getProps().getProperty(HoodieIndexConfig.HBASE_ZKPORT_PROP);
hbaseConfig.set("hbase.zookeeper.property.clientPort", port);
try {
return ConnectionFactory.createConnection(hbaseConfig);
} catch (IOException e) {
throw new HoodieDependentSystemUnavailableException(
HoodieDependentSystemUnavailableException.HBASE, quorum + ":" + port);
}
}
/**
* Function that tags each HoodieRecord with an existing location, if known.
*/
class LocationTagFunction
implements Function2<Integer, Iterator<HoodieRecord<T>>, Iterator<HoodieRecord<T>>> {
private final HoodieTableMetadata metadata;
LocationTagFunction(HoodieTableMetadata metadata) {
this.metadata = metadata;
}
@Override
public Iterator<HoodieRecord<T>> call(Integer partitionNum,
Iterator<HoodieRecord<T>> hoodieRecordIterator) {
// Grab the global HBase connection
synchronized (HBaseIndex.class) {
if (hbaseConnection == null) {
hbaseConnection = getHBaseConnection();
}
}
List<HoodieRecord<T>> taggedRecords = new ArrayList<>();
HTable hTable = null;
try {
hTable = (HTable) hbaseConnection.getTable(TableName.valueOf(tableName));
// Do the tagging.
while (hoodieRecordIterator.hasNext()) {
HoodieRecord rec = hoodieRecordIterator.next();
// TODO(vc): This may need to be a multi get.
Result result = hTable.get(
new Get(Bytes.toBytes(rec.getRecordKey())).setMaxVersions(1)
.addColumn(SYSTEM_COLUMN_FAMILY, COMMIT_TS_COLUMN)
.addColumn(SYSTEM_COLUMN_FAMILY, FILE_NAME_COLUMN)
.addColumn(SYSTEM_COLUMN_FAMILY, PARTITION_PATH_COLUMN));
// first, attempt to grab location from HBase
if (result.getRow() != null) {
String commitTs =
Bytes.toString(result.getValue(SYSTEM_COLUMN_FAMILY, COMMIT_TS_COLUMN));
String fileId =
Bytes.toString(result.getValue(SYSTEM_COLUMN_FAMILY, FILE_NAME_COLUMN));
// if the last commit ts for this row is less than the system commit ts
if (!metadata.isCommitsEmpty() && metadata.isCommitTsSafe(commitTs)) {
rec.setCurrentLocation(new HoodieRecordLocation(commitTs, fileId));
}
}
taggedRecords.add(rec);
}
} catch (IOException e) {
throw new HoodieIndexException(
"Failed to Tag indexed locations because of exception with HBase Client", e);
}
finally {
if (hTable != null) {
try {
hTable.close();
} catch (IOException e) {
// Ignore
}
}
}
return taggedRecords.iterator();
}
}
@Override
public JavaRDD<HoodieRecord<T>> tagLocation(JavaRDD<HoodieRecord<T>> recordRDD,
HoodieTableMetadata metadata) {
return recordRDD.mapPartitionsWithIndex(this.new LocationTagFunction(metadata), true);
}
class UpdateLocationTask implements Function2<Integer, Iterator<WriteStatus>, Iterator<WriteStatus>> {
@Override
public Iterator<WriteStatus> call(Integer partition, Iterator<WriteStatus> statusIterator) {
List<WriteStatus> writeStatusList = new ArrayList<>();
// Grab the global HBase connection
synchronized (HBaseIndex.class) {
if (hbaseConnection == null) {
hbaseConnection = getHBaseConnection();
}
}
HTable hTable = null;
try {
hTable = (HTable) hbaseConnection.getTable(TableName.valueOf(tableName));
while (statusIterator.hasNext()) {
WriteStatus writeStatus = statusIterator.next();
List<Put> puts = new ArrayList<>();
try {
for (HoodieRecord rec : writeStatus.getWrittenRecords()) {
if (!writeStatus.isErrored(rec.getKey())) {
Put put = new Put(Bytes.toBytes(rec.getRecordKey()));
HoodieRecordLocation loc = rec.getNewLocation();
put.addColumn(SYSTEM_COLUMN_FAMILY, COMMIT_TS_COLUMN,
Bytes.toBytes(loc.getCommitTime()));
put.addColumn(SYSTEM_COLUMN_FAMILY, FILE_NAME_COLUMN,
Bytes.toBytes(loc.getFileId()));
put.addColumn(SYSTEM_COLUMN_FAMILY, PARTITION_PATH_COLUMN,
Bytes.toBytes(rec.getPartitionPath()));
puts.add(put);
}
}
hTable.put(puts);
hTable.flushCommits();
} catch (Exception e) {
Exception we = new Exception("Error updating index for " + writeStatus, e);
logger.error(we);
writeStatus.setGlobalError(we);
}
writeStatusList.add(writeStatus);
}
} catch (IOException e) {
throw new HoodieIndexException(
"Failed to Update Index locations because of exception with HBase Client", e);
} finally {
if (hTable != null) {
try {
hTable.close();
} catch (IOException e) {
// Ignore
}
}
}
return writeStatusList.iterator();
}
}
@Override
public JavaRDD<WriteStatus> updateLocation(JavaRDD<WriteStatus> writeStatusRDD,
HoodieTableMetadata metadata) {
return writeStatusRDD.mapPartitionsWithIndex(new UpdateLocationTask(), true);
}
@Override
public boolean rollbackCommit(String commitTime) {
// TODO (weiy)
return true;
}
}
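To make the row layout above concrete, here is a dependency-free sketch of the contract HBaseIndex implements: one row per record key, carrying the `_s:commit_ts`, `_s:file_name` and `_s:partition_path` columns. The `Map` stands in for the HBase table, and all names in this sketch are illustrative, not part of the actual client API.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Dependency-free sketch of the HBase-backed index contract: one row per record key,
 * holding the _s:commit_ts, _s:file_name and _s:partition_path columns.
 * The Map stands in for the HBase table; all names are illustrative only.
 */
public class HBaseIndexSketch {
    private final Map<String, String[]> table = new HashMap<>();

    /** What UpdateLocationTask persists after a successful write. */
    public void updateLocation(String recordKey, String commitTs, String fileId, String partitionPath) {
        table.put(recordKey, new String[]{commitTs, fileId, partitionPath});
    }

    /** What LocationTagFunction reads back: {commitTs, fileId}, or null for new inserts. */
    public String[] tagLocation(String recordKey) {
        String[] row = table.get(recordKey);
        return row == null ? null : new String[]{row[0], row[1]};
    }
}
```

The real implementation replaces the map with HBase `Get`/`Put` calls, but the round trip (update after write, tag on the next upsert) is the same.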


@@ -0,0 +1,422 @@
/*
* Copyright (c) 2016 Uber Technologies, Inc. (hoodie-dev-group@uber.com)
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package com.uber.hoodie.index;
import com.google.common.annotations.VisibleForTesting;
import com.google.common.base.Optional;
import com.uber.hoodie.config.HoodieWriteConfig;
import com.uber.hoodie.WriteStatus;
import com.uber.hoodie.common.model.HoodieKey;
import com.uber.hoodie.common.model.HoodieRecord;
import com.uber.hoodie.common.model.HoodieRecordLocation;
import com.uber.hoodie.common.model.HoodieRecordPayload;
import com.uber.hoodie.common.model.HoodieTableMetadata;
import com.uber.hoodie.common.util.FSUtils;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.log4j.LogManager;
import org.apache.log4j.Logger;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.FlatMapFunction;
import org.apache.spark.api.java.function.Function;
import org.apache.spark.api.java.function.PairFlatMapFunction;
import org.apache.spark.api.java.function.PairFunction;
import scala.Tuple2;
import java.util.*;
/**
* Indexing mechanism based on bloom filter. Each parquet file includes its row_key bloom filter in
* its metadata.
*/
public class HoodieBloomIndex<T extends HoodieRecordPayload> extends HoodieIndex<T> {
private static Logger logger = LogManager.getLogger(HoodieBloomIndex.class);
// we need to limit the join such that it stays within 1.5GB per Spark partition. (SPARK-1476)
private static final int SPARK_MAXIMUM_BYTES_PER_PARTITION = 1500 * 1024 * 1024;
// this is how much a triplet of (partitionPath, fileId, recordKey) costs.
private static final int BYTES_PER_PARTITION_FILE_KEY_TRIPLET = 300;
private static int MAX_ITEMS_PER_JOIN_PARTITION = SPARK_MAXIMUM_BYTES_PER_PARTITION / BYTES_PER_PARTITION_FILE_KEY_TRIPLET;
public HoodieBloomIndex(HoodieWriteConfig config, JavaSparkContext jsc) {
super(config, jsc);
}
@Override
public JavaRDD<HoodieRecord<T>> tagLocation(JavaRDD<HoodieRecord<T>> recordRDD, final HoodieTableMetadata metadata) {
// Step 1: Extract out thinner JavaPairRDD of (partitionPath, recordKey)
JavaPairRDD<String, String> partitionRecordKeyPairRDD = recordRDD
.mapToPair(new PairFunction<HoodieRecord<T>, String, String>() {
@Override
public Tuple2<String, String> call(HoodieRecord<T> record) throws Exception {
return new Tuple2<>(record.getPartitionPath(), record.getRecordKey());
}
});
// Lookup indexes for all the partition/recordkey pair
JavaPairRDD<String, String> rowKeyFilenamePairRDD =
lookupIndex(partitionRecordKeyPairRDD, metadata);
// Cache the result, for subsequent stages.
rowKeyFilenamePairRDD.cache();
long totalTaggedRecords = rowKeyFilenamePairRDD.count();
logger.info("Number of update records (ones tagged with a fileID): " + totalTaggedRecords);
// Step 4: Tag the incoming records, as inserts or updates, by joining with existing record keys
// Cost: 4 sec.
return tagLocationBacktoRecords(rowKeyFilenamePairRDD, recordRDD);
}
public JavaPairRDD<HoodieKey, Optional<String>> fetchRecordLocation(
JavaRDD<HoodieKey> hoodieKeys, final HoodieTableMetadata metadata) {
JavaPairRDD<String, String> partitionRecordKeyPairRDD =
hoodieKeys.mapToPair(new PairFunction<HoodieKey, String, String>() {
@Override
public Tuple2<String, String> call(HoodieKey key) throws Exception {
return new Tuple2<>(key.getPartitionPath(), key.getRecordKey());
}
});
// Lookup indexes for all the partition/recordkey pair
JavaPairRDD<String, String> rowKeyFilenamePairRDD =
lookupIndex(partitionRecordKeyPairRDD, metadata);
JavaPairRDD<String, HoodieKey> rowKeyHoodieKeyPairRDD =
hoodieKeys.mapToPair(new PairFunction<HoodieKey, String, HoodieKey>() {
@Override
public Tuple2<String, HoodieKey> call(HoodieKey key) throws Exception {
return new Tuple2<>(key.getRecordKey(), key);
}
});
return rowKeyHoodieKeyPairRDD.leftOuterJoin(rowKeyFilenamePairRDD).mapToPair(
new PairFunction<Tuple2<String, Tuple2<HoodieKey, Optional<String>>>, HoodieKey, Optional<String>>() {
@Override
public Tuple2<HoodieKey, Optional<String>> call(
Tuple2<String, Tuple2<HoodieKey, Optional<String>>> keyPathTuple)
throws Exception {
Optional<String> recordLocationPath;
if (keyPathTuple._2._2.isPresent()) {
String fileName = keyPathTuple._2._2.get();
String partitionPath = keyPathTuple._2._1.getPartitionPath();
recordLocationPath = Optional
.of(new Path(new Path(metadata.getBasePath(), partitionPath), fileName)
.toUri().getPath());
} else {
recordLocationPath = Optional.absent();
}
return new Tuple2<>(keyPathTuple._2._1, recordLocationPath);
}
});
}
/**
* Looks up the location for each record key, returning <record_key, location> pairs for all
* record keys already present; record keys not present are dropped.
*
* @param partitionRecordKeyPairRDD
* @param metadata
* @return
*/
private JavaPairRDD<String, String> lookupIndex(
JavaPairRDD<String, String> partitionRecordKeyPairRDD, final HoodieTableMetadata metadata) {
// Obtain records per partition, in the incoming records
Map<String, Object> recordsPerPartition = partitionRecordKeyPairRDD.countByKey();
List<String> affectedPartitionPathList = new ArrayList<>(recordsPerPartition.keySet());
// Step 2: Load all involved files as <Partition, filename> pairs
JavaPairRDD<String, String> partitionFilePairRDD =
loadInvolvedFiles(affectedPartitionPathList, metadata);
Map<String, Object> filesPerPartition = partitionFilePairRDD.countByKey();
// Compute total subpartitions, to split partitions into.
Map<String, Long> subpartitionCountMap =
computeSubPartitions(recordsPerPartition, filesPerPartition);
// Step 3: Obtain an RDD mapping each incoming record that already exists to the file id that contains it.
return findMatchingFilesForRecordKeys(partitionFilePairRDD, partitionRecordKeyPairRDD,
subpartitionCountMap);
}
/**
* The index lookup can be skewed in three dimensions : #files, #partitions, #records
*
* To handle skews smoothly, we need to compute how to split each partition
* into subpartitions. We do it here, in a way that keeps each Spark join
* partition under 2GB.
*
* @param recordsPerPartition
* @param filesPerPartition
* @return
*/
private Map<String, Long> computeSubPartitions(Map<String, Object> recordsPerPartition, Map<String, Object> filesPerPartition) {
Map<String, Long> subpartitionCountMap = new HashMap<>();
long totalRecords = 0;
long totalFiles = 0;
for (String partitionPath : recordsPerPartition.keySet()) {
long numRecords = (Long) recordsPerPartition.get(partitionPath);
long numFiles = filesPerPartition.containsKey(partitionPath) ? (Long) filesPerPartition.get(partitionPath) : 1L;
subpartitionCountMap.put(partitionPath, ((numFiles * numRecords) / MAX_ITEMS_PER_JOIN_PARTITION) + 1);
totalFiles += filesPerPartition.containsKey(partitionPath) ? (Long) filesPerPartition.get(partitionPath) : 0L;
totalRecords += numRecords;
}
logger.info("TotalRecords: " + totalRecords + ", TotalFiles: " + totalFiles + ", TotalAffectedPartitions:" + recordsPerPartition.size());
logger.info("Sub Partition Counts : " + subpartitionCountMap);
return subpartitionCountMap;
}
/**
* Load the input records as <Partition, RowKeys> in memory.
*/
@VisibleForTesting
Map<String, Iterable<String>> getPartitionToRowKeys(JavaRDD<HoodieRecord<T>> recordRDD) {
// Have to wrap the map into a HashMap because of the need to broadcast (see: http://php.sabscape.com/blog/?p=671)
return recordRDD.mapToPair(new PairFunction<HoodieRecord<T>, String, String>() {
@Override
public Tuple2<String, String> call(HoodieRecord record) {
return new Tuple2<>(record.getPartitionPath(), record.getRecordKey());
}
}).groupByKey().collectAsMap();
}
/**
* Load all involved files as <Partition, filename> pair RDD.
*/
@VisibleForTesting
JavaPairRDD<String, String> loadInvolvedFiles(List<String> partitions, final HoodieTableMetadata metadata) {
return jsc.parallelize(partitions, Math.max(partitions.size(), 1))
.flatMapToPair(new PairFlatMapFunction<String, String, String>() {
@Override
public Iterable<Tuple2<String, String>> call(String partitionPath) {
FileSystem fs = FSUtils.getFs();
String latestCommitTime = metadata.getAllCommits().lastCommit();
FileStatus[] filteredStatus = metadata.getLatestVersionInPartition(fs, partitionPath, latestCommitTime);
List<Tuple2<String, String>> list = new ArrayList<>();
for (FileStatus fileStatus : filteredStatus) {
list.add(new Tuple2<>(partitionPath, fileStatus.getPath().getName()));
}
return list;
}
});
}
@Override
public boolean rollbackCommit(String commitTime) {
// Nope, don't need to do anything.
return true;
}
/**
* When we subpartition records going into a partition, we still need to check them against
* all the files within the partition. Thus, we need to explode the (partition, file) pairs
* to (partition_subpartnum, file), so we can later join.
*
*
* @param partitionFilePairRDD
* @param subpartitionCountMap
* @return
*/
private JavaPairRDD<String, String> explodePartitionFilePairRDD(JavaPairRDD<String, String> partitionFilePairRDD,
final Map<String, Long> subpartitionCountMap) {
return partitionFilePairRDD
.map(new Function<Tuple2<String, String>, List<Tuple2<String, String>>>() {
@Override
public List<Tuple2<String, String>> call(Tuple2<String, String> partitionFilePair) throws Exception {
List<Tuple2<String, String>> explodedPartitionFilePairs = new ArrayList<>();
for (long l = 0; l < subpartitionCountMap.get(partitionFilePair._1); l++) {
explodedPartitionFilePairs.add(new Tuple2<>(
String.format("%s#%d", partitionFilePair._1, l),
partitionFilePair._2));
}
return explodedPartitionFilePairs;
}
})
.flatMapToPair(new PairFlatMapFunction<List<Tuple2<String, String>>, String, String>() {
@Override
public Iterable<Tuple2<String, String>> call(List<Tuple2<String, String>> exploded) throws Exception {
return exploded;
}
});
}
/**
* To handle tons of incoming records to a partition, we need to split them into groups or create subpartitions.
* Here, we do a simple hash mod splitting, based on computed sub partitions.
*
* @param partitionRecordKeyPairRDD
* @param subpartitionCountMap
* @return
*/
private JavaPairRDD<String, String> splitPartitionRecordKeysPairRDD(JavaPairRDD<String, String> partitionRecordKeyPairRDD,
final Map<String, Long> subpartitionCountMap) {
return partitionRecordKeyPairRDD
.mapToPair(new PairFunction<Tuple2<String, String>, String, String>() {
@Override
public Tuple2<String, String> call(Tuple2<String, String> partitionRecordKeyPair) throws Exception {
long subpart = Math.abs(partitionRecordKeyPair._2.hashCode()) % subpartitionCountMap.get(partitionRecordKeyPair._1);
return new Tuple2<>(
String.format("%s#%d", partitionRecordKeyPair._1, subpart),
partitionRecordKeyPair._2);
}
});
}
/**
* It's crucial to pick the right parallelism.
*
* totalSubPartitions : the limit deemed safe, to play nice with Spark.
* inputParallelism : typically the number of input files.
*
* We pick the max of the two, so we are always safe, but can go higher when there are
* a lot of input files (otherwise, we would fall back to the number of partitions in
* the input and end up with slow performance).
*
*
* @param inputParallelism
* @param subpartitionCountMap
* @return
*/
private int determineParallelism(int inputParallelism, final Map<String, Long> subpartitionCountMap) {
// size the join parallelism to max(total number of sub partitions, total number of files).
int totalSubparts = 0;
for (long subparts : subpartitionCountMap.values()) {
totalSubparts += (int) subparts;
}
int joinParallelism = Math.max(totalSubparts, inputParallelism);
logger.info("InputParallelism: " + inputParallelism + ", " +
"TotalSubParts: " + totalSubparts + ", " +
"Join Parallelism set to : " + joinParallelism);
return joinParallelism;
}
/**
* Find <RowKey, filename> pairs. All workload is grouped at the file level.
*
* Join PairRDD(PartitionPath, RecordKey) and PairRDD(PartitionPath, File), then repartition
* such that each RDD partition is a file. For each file: (1) load its bloom filter,
* (2) load its rowKeys, (3) tag each rowKey. The parallelism is kept at least at the
* group-by parallelism for tagging locations.
*/
private JavaPairRDD<String, String> findMatchingFilesForRecordKeys(JavaPairRDD<String, String> partitionFilePairRDD,
JavaPairRDD<String, String> partitionRecordKeyPairRDD,
final Map<String, Long> subpartitionCountMap) {
// prepare the two RDDs and their join parallelism
JavaPairRDD<String, String> subpartitionFilePairRDD = explodePartitionFilePairRDD(partitionFilePairRDD, subpartitionCountMap);
JavaPairRDD<String, String> subpartitionRecordKeyPairRDD = splitPartitionRecordKeysPairRDD(partitionRecordKeyPairRDD,
subpartitionCountMap);
int joinParallelism = determineParallelism(partitionRecordKeyPairRDD.partitions().size(), subpartitionCountMap);
// Perform a join to bring all the files in each subpartition together with the record keys to be tested against them
JavaPairRDD<String, Tuple2<String, String>> joinedTripletRDD = subpartitionFilePairRDD.join(subpartitionRecordKeyPairRDD, joinParallelism);
// sort further based on filename, such that all checking for the file can happen within a single partition, on-the-fly
JavaPairRDD<String, Tuple2<String, HoodieKey>> fileSortedTripletRDD = joinedTripletRDD
.mapToPair(new PairFunction<Tuple2<String, Tuple2<String, String>>, String, Tuple2<String, HoodieKey>>() {
@Override
/**
* Incoming triplet is (partitionPath_subpart) => (file, recordKey)
*/
public Tuple2<String, Tuple2<String, HoodieKey>> call(Tuple2<String, Tuple2<String, String>> joinedTriplet) throws Exception {
String partitionPath = joinedTriplet._1.split("#")[0]; // throw away the subpart
String fileName = joinedTriplet._2._1;
String recordKey = joinedTriplet._2._2;
// make a sort key as <file>#<recordKey>, to handle skews
return new Tuple2<>(String.format("%s#%s", fileName, recordKey),
new Tuple2<>(fileName, new HoodieKey(recordKey, partitionPath)));
}
}).sortByKey(true, joinParallelism);
return fileSortedTripletRDD
.mapPartitionsWithIndex(new HoodieBloomIndexCheckFunction(config.getBasePath()), true)
.flatMap(new FlatMapFunction<List<IndexLookupResult>, IndexLookupResult>() {
@Override
public Iterable<IndexLookupResult> call(List<IndexLookupResult> indexLookupResults)
throws Exception {
return indexLookupResults;
}
}).filter(new Function<IndexLookupResult, Boolean>() {
@Override
public Boolean call(IndexLookupResult lookupResult) throws Exception {
return lookupResult.getMatchingRecordKeys().size() > 0;
}
}).flatMapToPair(new PairFlatMapFunction<IndexLookupResult, String, String>() {
@Override
public Iterable<Tuple2<String, String>> call(IndexLookupResult lookupResult)
throws Exception {
List<Tuple2<String, String>> vals = new ArrayList<>();
for (String recordKey : lookupResult.getMatchingRecordKeys()) {
vals.add(new Tuple2<>(recordKey, lookupResult.getFileName()));
}
return vals;
}
});
}
/**
* Tag the <rowKey, filename> back to the original HoodieRecord RDD.
*/
private JavaRDD<HoodieRecord<T>> tagLocationBacktoRecords(JavaPairRDD<String, String> rowKeyFilenamePairRDD,
JavaRDD<HoodieRecord<T>> recordRDD) {
JavaPairRDD<String, HoodieRecord<T>> rowKeyRecordPairRDD = recordRDD.mapToPair(
new PairFunction<HoodieRecord<T>, String, HoodieRecord<T>>() {
@Override
public Tuple2<String, HoodieRecord<T>> call(HoodieRecord<T> record) throws Exception {
return new Tuple2<>(record.getRecordKey(), record);
}
});
// The recordRDD may contain more data than rowKeyRDD (some rowKeys have no fileId), so we do a left outer join.
return rowKeyRecordPairRDD.leftOuterJoin(rowKeyFilenamePairRDD).values().map(
new Function<Tuple2<HoodieRecord<T>, Optional<String>>, HoodieRecord<T>>() {
@Override
public HoodieRecord<T> call(Tuple2<HoodieRecord<T>, Optional<String>> v1) throws Exception {
HoodieRecord<T> record = v1._1();
if (v1._2().isPresent()) {
String filename = v1._2().get();
if (filename != null && !filename.isEmpty()) {
record.setCurrentLocation(new HoodieRecordLocation(FSUtils.getCommitTime(filename),
FSUtils.getFileId(filename)));
}
}
return record;
}
});
}
@Override
public JavaRDD<WriteStatus> updateLocation(JavaRDD<WriteStatus> writeStatusRDD, HoodieTableMetadata metadata) {
return writeStatusRDD;
}
}
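The skew handling above (computeSubPartitions and splitPartitionRecordKeysPairRDD) can be sketched without Spark. The constants mirror SPARK_MAXIMUM_BYTES_PER_PARTITION and BYTES_PER_PARTITION_FILE_KEY_TRIPLET from the class above; the class and method names here are hypothetical.

```java
/**
 * Minimal sketch (no Spark) of HoodieBloomIndex's skew handling: size the number of
 * subpartitions so that each join partition stays under ~1.5GB worth of
 * (partition, file, key) triplets, then assign each record key by hash mod.
 */
public class SubpartitionSketch {
    // Mirrors SPARK_MAXIMUM_BYTES_PER_PARTITION / BYTES_PER_PARTITION_FILE_KEY_TRIPLET.
    static final long MAX_ITEMS_PER_JOIN_PARTITION = (1500L * 1024 * 1024) / 300;

    /** Same formula as computeSubPartitions: (numFiles * numRecords) / maxItems + 1. */
    static long subpartitionCount(long numFiles, long numRecords) {
        return (numFiles * numRecords) / MAX_ITEMS_PER_JOIN_PARTITION + 1;
    }

    /** Same hash-mod split as splitPartitionRecordKeysPairRDD: "partitionPath#subpart". */
    static String subpartitionKey(String partitionPath, String recordKey, long subparts) {
        long subpart = Math.abs(recordKey.hashCode()) % subparts;
        return String.format("%s#%d", partitionPath, subpart);
    }
}
```

For example, 100 files joined against 10 million incoming keys in one partition yields 10^9 triplets, which this formula splits into 191 subpartitions of at most ~5.24M items each.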


@@ -0,0 +1,193 @@
/*
* Copyright (c) 2016 Uber Technologies, Inc. (hoodie-dev-group@uber.com)
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package com.uber.hoodie.index;
import com.uber.hoodie.common.BloomFilter;
import com.uber.hoodie.common.model.HoodieKey;
import com.uber.hoodie.common.util.ParquetUtils;
import com.uber.hoodie.exception.HoodieException;
import com.uber.hoodie.exception.HoodieIndexException;
import com.uber.hoodie.func.LazyIterableIterator;
import org.apache.hadoop.fs.Path;
import org.apache.log4j.LogManager;
import org.apache.log4j.Logger;
import org.apache.spark.api.java.function.Function2;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Set;
import scala.Tuple2;
/**
* Function performing the actual check of an RDD partition containing (fileId, hoodieKeys)
* against the actual files
*/
public class HoodieBloomIndexCheckFunction implements Function2<Integer, Iterator<Tuple2<String, Tuple2<String, HoodieKey>>>, Iterator<List<IndexLookupResult>>> {
private static Logger logger = LogManager.getLogger(HoodieBloomIndexCheckFunction.class);
private final String basePath;
public HoodieBloomIndexCheckFunction(String basePath) {
this.basePath = basePath;
}
/**
* Given a list of row keys and one file, return only row keys existing in that file.
*/
public static List<String> checkCandidatesAgainstFile(List<String> candidateRecordKeys, Path filePath) throws HoodieIndexException {
List<String> foundRecordKeys = new ArrayList<>();
try {
// Load all rowKeys from the file, to double-confirm
if (!candidateRecordKeys.isEmpty()) {
Set<String> fileRowKeys = ParquetUtils.readRowKeysFromParquet(filePath);
logger.info("Loading " + fileRowKeys.size() + " row keys from " + filePath);
if (logger.isDebugEnabled()) {
logger.debug("Keys from " + filePath + " => " + fileRowKeys);
}
for (String rowKey : candidateRecordKeys) {
if (fileRowKeys.contains(rowKey)) {
foundRecordKeys.add(rowKey);
}
}
logger.info("After checking with row keys, we have " + foundRecordKeys.size() + " results, for file " + filePath + " => " + foundRecordKeys);
if (logger.isDebugEnabled()) {
logger.debug("Keys matching for file " + filePath + " => " + foundRecordKeys);
}
}
} catch (Exception e){
throw new HoodieIndexException("Error checking candidate keys against file.", e);
}
return foundRecordKeys;
}
class LazyKeyCheckIterator extends LazyIterableIterator<Tuple2<String, Tuple2<String, HoodieKey>>, List<IndexLookupResult>> {
private List<String> candidateRecordKeys;
private BloomFilter bloomFilter;
private String currentFile;
private String currentParitionPath;
LazyKeyCheckIterator(Iterator<Tuple2<String, Tuple2<String, HoodieKey>>> fileParitionRecordKeyTripletItr) {
super(fileParitionRecordKeyTripletItr);
currentFile = null;
candidateRecordKeys = new ArrayList<>();
bloomFilter = null;
currentParitionPath = null;
}
@Override
protected void start() {
}
private void initState(String fileName, String partitionPath) throws HoodieIndexException {
try {
Path filePath = new Path(basePath + "/" + partitionPath + "/" + fileName);
bloomFilter = ParquetUtils.readBloomFilterFromParquetMetadata(filePath);
candidateRecordKeys = new ArrayList<>();
currentFile = fileName;
currentParitionPath = partitionPath;
} catch (Exception e) {
throw new HoodieIndexException("Error checking candidate keys against file.", e);
}
}
@Override
protected List<IndexLookupResult> computeNext() {
List<IndexLookupResult> ret = new ArrayList<>();
try {
// process one file in each go.
while (inputItr.hasNext()) {
Tuple2<String, Tuple2<String, HoodieKey>> currentTuple = inputItr.next();
String fileName = currentTuple._2._1;
String partitionPath = currentTuple._2._2.getPartitionPath();
String recordKey = currentTuple._2._2.getRecordKey();
// lazily init state
if (currentFile == null) {
initState(fileName, partitionPath);
}
// if continuing with the current file
if (fileName.equals(currentFile)) {
// check record key against bloom filter of current file & add to possible keys if needed
if (bloomFilter.mightContain(recordKey)) {
if (logger.isDebugEnabled()) {
logger.debug("#1 Adding " + recordKey + " as candidate for file " + fileName);
}
candidateRecordKeys.add(recordKey);
}
} else {
// do the actual checking of file & break out
Path filePath = new Path(basePath + "/" + currentParitionPath + "/" + currentFile);
logger.info("#1 After bloom filter, the candidate row keys are reduced to " + candidateRecordKeys.size() + " for " + filePath);
if (logger.isDebugEnabled()) {
logger.debug("#The candidate row keys for " + filePath + " => " + candidateRecordKeys);
}
ret.add(new IndexLookupResult(currentFile, checkCandidatesAgainstFile(candidateRecordKeys, filePath)));
initState(fileName, partitionPath);
if (bloomFilter.mightContain(recordKey)) {
if (logger.isDebugEnabled()) {
logger.debug("#2 Adding " + recordKey + " as candidate for file " + fileName);
}
candidateRecordKeys.add(recordKey);
}
break;
}
}
// handle the case where we ran out of input: finish pending work and update the return value
if (!inputItr.hasNext()) {
Path filePath = new Path(basePath + "/" + currentParitionPath + "/" + currentFile);
logger.info("#2 After bloom filter, the candidate row keys are reduced to " + candidateRecordKeys.size() + " for " + filePath);
if (logger.isDebugEnabled()) {
logger.debug("#The candidate row keys for " + filePath + " => " + candidateRecordKeys);
}
ret.add(new IndexLookupResult(currentFile, checkCandidatesAgainstFile(candidateRecordKeys, filePath)));
}
} catch (Throwable e) {
if (e instanceof HoodieException) {
throw e;
}
throw new HoodieIndexException("Error checking bloom filter index. ", e);
}
return ret;
}
@Override
protected void end() {
}
}
@Override
public Iterator<List<IndexLookupResult>> call(Integer partition,
Iterator<Tuple2<String, Tuple2<String, HoodieKey>>> fileParitionRecordKeyTripletItr) throws Exception {
return new LazyKeyCheckIterator(fileParitionRecordKeyTripletItr);
}
}
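The two-phase check performed by LazyKeyCheckIterator can be sketched stand-alone: a cheap approximate membership test first, then an exact scan of the file's row keys, as in checkCandidatesAgainstFile. Here a `Set` stands in for the parquet footer's bloom filter (which may report false positives but never false negatives); class and method names are illustrative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/**
 * Sketch of LazyKeyCheckIterator's two-phase key check. bloomApprox stands in for the
 * bloom filter read from the parquet footer (it may over-approximate membership);
 * fileRowKeys are the keys actually stored in the file.
 */
public class TwoPhaseKeyCheck {
    public static List<String> check(Set<String> bloomApprox, Set<String> fileRowKeys,
                                     List<String> incomingKeys) {
        // Phase 1: bloom filter pass; keep only keys the filter says *might* be present.
        List<String> candidates = new ArrayList<>();
        for (String key : incomingKeys) {
            if (bloomApprox.contains(key)) {
                candidates.add(key);
            }
        }
        // Phase 2: exact confirmation against the keys actually stored in the file,
        // weeding out the bloom filter's false positives.
        List<String> found = new ArrayList<>();
        for (String key : candidates) {
            if (fileRowKeys.contains(key)) {
                found.add(key);
            }
        }
        return found;
    }
}
```

The point of the split is cost: phase 1 only needs the file's footer metadata, so the expensive full read of row keys in phase 2 happens only for files with at least one candidate.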


@@ -0,0 +1,101 @@
/*
* Copyright (c) 2016 Uber Technologies, Inc. (hoodie-dev-group@uber.com)
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package com.uber.hoodie.index;
import com.google.common.base.Optional;
import com.uber.hoodie.config.HoodieWriteConfig;
import com.uber.hoodie.WriteStatus;
import com.uber.hoodie.common.model.HoodieKey;
import com.uber.hoodie.common.model.HoodieRecordPayload;
import com.uber.hoodie.common.model.HoodieTableMetadata;
import com.uber.hoodie.common.model.HoodieRecord;
import com.uber.hoodie.exception.HoodieIndexException;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import java.io.Serializable;
/**
* Base class for the different types of indexes, which determine the mapping from record uuid to file
* <p/>
* TODO(vc): need methods for recovery and rollback
*/
public abstract class HoodieIndex<T extends HoodieRecordPayload> implements Serializable {
protected transient JavaSparkContext jsc = null;
public enum IndexType {
HBASE,
INMEMORY,
BLOOM
}
protected final HoodieWriteConfig config;
protected HoodieIndex(HoodieWriteConfig config, JavaSparkContext jsc) {
this.config = config;
this.jsc = jsc;
}
/**
* Checks if the given keys exist in the hoodie table and returns <Key, Optional<FullFilePath>>
* pairs. If the Optional FullFilePath value is not present, the key was not found. If it is
* present, it is the path component (without scheme) of the URI of the underlying file.
*
* @param hoodieKeys
* @param metadata
* @return
*/
public abstract JavaPairRDD<HoodieKey, Optional<String>> fetchRecordLocation(
JavaRDD<HoodieKey> hoodieKeys, final HoodieTableMetadata metadata);
/**
* Looks up the index and tags each incoming record with a location of a file that contains the
* row (if it is actually present)
*/
public abstract JavaRDD<HoodieRecord<T>> tagLocation(JavaRDD<HoodieRecord<T>> recordRDD,
HoodieTableMetadata metadata) throws
HoodieIndexException;
/**
* Extracts the location of written records, and updates the index.
* <p/>
* TODO(vc): We may need to propagate the record as well in a WriteStatus class
*/
public abstract JavaRDD<WriteStatus> updateLocation(JavaRDD<WriteStatus> writeStatusRDD,
HoodieTableMetadata metadata) throws
HoodieIndexException;
/**
* Rolls back the effects of the commit made at commitTime.
*/
public abstract boolean rollbackCommit(String commitTime);
public static <T extends HoodieRecordPayload> HoodieIndex<T> createIndex(
HoodieWriteConfig config, JavaSparkContext jsc) throws HoodieIndexException {
switch (config.getIndexType()) {
case HBASE:
return new HBaseIndex<>(config, jsc);
case INMEMORY:
return new InMemoryHashIndex<>(config, jsc);
case BLOOM:
return new HoodieBloomIndex<>(config, jsc);
default:
throw new HoodieIndexException("Unknown index type: " + config.getIndexType());
}
}
}
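The createIndex dispatch above is a plain enum-keyed factory. A stand-alone sketch of that pattern, with placeholder strings standing in for the concrete index classes (all names here are illustrative):

```java
/**
 * Stand-alone sketch of the createIndex factory dispatch. The enum mirrors
 * HoodieIndex.IndexType; the returned strings are placeholders for the
 * concrete index implementations.
 */
public class IndexFactorySketch {
    enum IndexType { HBASE, INMEMORY, BLOOM }

    static String createIndex(IndexType type) {
        switch (type) {
            case HBASE:
                return "HBaseIndex";
            case INMEMORY:
                return "InMemoryHashIndex";
            case BLOOM:
                return "HoodieBloomIndex";
            default:
                // unreachable for the three known types; guards future enum additions
                throw new IllegalArgumentException("Unknown index type: " + type);
        }
    }
}
```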


@@ -0,0 +1,109 @@
/*
* Copyright (c) 2016 Uber Technologies, Inc. (hoodie-dev-group@uber.com)
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package com.uber.hoodie.index;
import com.google.common.base.Optional;
import com.uber.hoodie.config.HoodieWriteConfig;
import com.uber.hoodie.WriteStatus;
import com.uber.hoodie.common.model.HoodieKey;
import com.uber.hoodie.common.model.HoodieRecord;
import com.uber.hoodie.common.model.HoodieRecordLocation;
import com.uber.hoodie.common.model.HoodieRecordPayload;
import com.uber.hoodie.common.model.HoodieTableMetadata;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.Function;
import org.apache.spark.api.java.function.Function2;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
/**
* Hoodie Index implementation backed by an in-memory Hash map.
*
* ONLY USE FOR LOCAL TESTING
*
*/
public class InMemoryHashIndex<T extends HoodieRecordPayload> extends HoodieIndex<T> {
private static ConcurrentMap<HoodieKey, HoodieRecordLocation> recordLocationMap;
public InMemoryHashIndex(HoodieWriteConfig config, JavaSparkContext jsc) {
super(config, jsc);
recordLocationMap = new ConcurrentHashMap<>();
}
@Override
public JavaPairRDD<HoodieKey, Optional<String>> fetchRecordLocation(
JavaRDD<HoodieKey> hoodieKeys, final HoodieTableMetadata metadata) {
throw new UnsupportedOperationException("InMemory index does not implement fetchRecordLocation yet");
}
/**
* Function that tags each HoodieRecord with an existing location, if known.
*/
class LocationTagFunction
implements Function2<Integer, Iterator<HoodieRecord<T>>, Iterator<HoodieRecord<T>>> {
@Override
public Iterator<HoodieRecord<T>> call(Integer partitionNum,
Iterator<HoodieRecord<T>> hoodieRecordIterator) {
List<HoodieRecord<T>> taggedRecords = new ArrayList<>();
while (hoodieRecordIterator.hasNext()) {
HoodieRecord<T> rec = hoodieRecordIterator.next();
if (recordLocationMap.containsKey(rec.getKey())) {
rec.setCurrentLocation(recordLocationMap.get(rec.getKey()));
}
taggedRecords.add(rec);
}
return taggedRecords.iterator();
}
}
@Override
public JavaRDD<HoodieRecord<T>> tagLocation(JavaRDD<HoodieRecord<T>> recordRDD,
HoodieTableMetadata metadata) {
return recordRDD.mapPartitionsWithIndex(this.new LocationTagFunction(), true);
}
@Override
public JavaRDD<WriteStatus> updateLocation(JavaRDD<WriteStatus> writeStatusRDD,
HoodieTableMetadata metadata) {
return writeStatusRDD.map(new Function<WriteStatus, WriteStatus>() {
@Override
public WriteStatus call(WriteStatus writeStatus) {
for (HoodieRecord record : writeStatus.getWrittenRecords()) {
if (!writeStatus.isErrored(record.getKey())) {
recordLocationMap.put(record.getKey(), record.getNewLocation());
}
}
return writeStatus;
}
});
}
@Override
public boolean rollbackCommit(String commitTime) {
// TODO (weiy)
return true;
}
}
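Stripped of the Spark plumbing, the index above is a shared concurrent map from record key to file location. A self-contained sketch of the tagLocation()/updateLocation() cycle, with plain strings standing in for HoodieRecord and HoodieRecordLocation:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Spark-free sketch of InMemoryHashIndex: tag incoming keys with their known
// file location (if any); record new locations after a successful write.
public class InMemoryIndexSketch {
    static final Map<String, String> recordLocationMap = new ConcurrentHashMap<>();

    // tagLocation: attach the known location (file id) to each record key, if any
    static Map<String, String> tagLocation(List<String> keys) {
        Map<String, String> tagged = new HashMap<>();
        for (String key : keys) {
            tagged.put(key, recordLocationMap.get(key)); // null => untagged, i.e. a new insert
        }
        return tagged;
    }

    // updateLocation: after a successful write, remember where each key landed
    static void updateLocation(String key, String fileId) {
        recordLocationMap.put(key, fileId);
    }

    public static void main(String[] args) {
        updateLocation("key-1", "file-A");
        Map<String, String> tagged = tagLocation(List.of("key-1", "key-2"));
        System.out.println(tagged.get("key-1")); // file-A
        System.out.println(tagged.get("key-2")); // null
    }
}
```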


@@ -0,0 +1,43 @@
/*
* Copyright (c) 2016 Uber Technologies, Inc. (hoodie-dev-group@uber.com)
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package com.uber.hoodie.index;
import java.util.List;
/**
* Encapsulates the result from an index lookup
*/
public class IndexLookupResult {
private String fileName;
private List<String> matchingRecordKeys;
public IndexLookupResult(String fileName, List<String> matchingRecordKeys) {
this.fileName = fileName;
this.matchingRecordKeys = matchingRecordKeys;
}
public String getFileName() {
return fileName;
}
public List<String> getMatchingRecordKeys() {
return matchingRecordKeys;
}
}


@@ -0,0 +1,224 @@
/*
* Copyright (c) 2016 Uber Technologies, Inc. (hoodie-dev-group@uber.com)
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package com.uber.hoodie.io;
import com.uber.hoodie.config.HoodieWriteConfig;
import com.uber.hoodie.common.model.HoodieCommits;
import com.uber.hoodie.common.model.HoodieTableMetadata;
import com.uber.hoodie.common.util.FSUtils;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.log4j.LogManager;
import org.apache.log4j.Logger;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
/**
* Cleaner is responsible for garbage collecting older files in a given partition path, such that
*
* 1) It provides sufficient time for existing queries running on older versions to finish
*
* 2) It bounds the growth of the files in the file system
*
* TODO: Should all cleaning be done based on {@link com.uber.hoodie.common.model.HoodieCommitMetadata}
*
*
*/
public class HoodieCleaner {
public enum CleaningPolicy {
KEEP_LATEST_FILE_VERSIONS,
KEEP_LATEST_COMMITS
}
private static Logger logger = LogManager.getLogger(HoodieCleaner.class);
private HoodieTableMetadata metadata;
private HoodieWriteConfig config;
private FileSystem fs;
public HoodieCleaner(HoodieTableMetadata metadata,
HoodieWriteConfig config,
FileSystem fs) {
this.metadata = metadata;
this.config = config;
this.fs = fs;
}
/**
* Selects older versions of files for cleaning, such that it bounds the number of versions
* retained for each file. This policy is useful if you are simply interested in querying the
* table and don't want too many versions of a single file (i.e. run it with versionsRetained = 1)
*
* @param partitionPath partition to clean
* @return list of file paths to delete
* @throws IOException
*/
private List<String> getFilesToCleanKeepingLatestVersions(String partitionPath) throws IOException {
logger.info("Cleaning " + partitionPath + ", retaining latest " + config.getCleanerFileVersionsRetained() + " file versions.");
Map<String, List<FileStatus>> fileVersions = metadata.getAllVersionsInPartition(fs, partitionPath);
List<String> deletePaths = new ArrayList<>();
for (String file : fileVersions.keySet()) {
List<FileStatus> commitList = fileVersions.get(file);
int keepVersions = config.getCleanerFileVersionsRetained();
Iterator<FileStatus> commitItr = commitList.iterator();
while (commitItr.hasNext() && keepVersions > 0) {
// Skip the most recent versions we are retaining
commitItr.next();
keepVersions--;
}
// Delete the remaining files
while (commitItr.hasNext()) {
deletePaths.add(String.format("%s/%s/%s",
config.getBasePath(),
partitionPath,
commitItr.next().getPath().getName()));
}
}
return deletePaths;
}
/**
* Selects file versions for cleaning, such that it
*
* - Leaves the latest version of each file untouched
* - For older versions,
*   - Leaves untouched all versions written within the last <code>config.getCleanerCommitsRetained()</code> commits
*   - Leaves ONE version before this window. We assume that max(query execution time) ==
*     commit_batch_time * config.getCleanerCommitsRetained() (12 hours by default). This is
*     essential to spare the file version that a maximally long-running query may still be reading.
*
* This provides the effect of having a lookback into all changes that happened in the last X
* commits. (e.g. if you retain 24 commits and the commit batch time is 30 mins, you have 12 hrs of lookback)
*
* This policy is the default.
*
* @param partitionPath partition to clean
* @return list of file paths to delete
* @throws IOException
*/
private List<String> getFilesToCleanKeepingLatestCommits(String partitionPath)
throws IOException {
int commitsRetained = config.getCleanerCommitsRetained();
logger.info(
"Cleaning " + partitionPath + ", retaining latest " + commitsRetained + " commits. ");
List<String> deletePaths = new ArrayList<>();
// Determine if we have enough commits to start cleaning.
HoodieCommits commits = metadata.getAllCommits();
if (commits.getNumCommits() > commitsRetained) {
String earliestCommitToRetain =
commits.nthCommit(commits.getNumCommits() - commitsRetained);
Map<String, List<FileStatus>> fileVersions =
metadata.getAllVersionsInPartition(fs, partitionPath);
for (String file : fileVersions.keySet()) {
List<FileStatus> fileList = fileVersions.get(file);
String lastVersion = FSUtils.getCommitTime(fileList.get(0).getPath().getName());
String lastVersionBeforeEarliestCommitToRetain =
getLatestVersionBeforeCommit(fileList, earliestCommitToRetain);
// Only clean older versions created by updates; always spare the latest version of each file.
for (FileStatus afile : fileList) {
String fileCommitTime = FSUtils.getCommitTime(afile.getPath().getName());
// Don't delete the latest version, nor the last version before the earliest commit we are retaining.
// The commit-retention window == max query run time, so a query could still be running
// that uses this file.
if (fileCommitTime.equals(lastVersion) || (
lastVersionBeforeEarliestCommitToRetain != null && fileCommitTime
.equals(lastVersionBeforeEarliestCommitToRetain))) {
// move on to the next file
continue;
}
if (HoodieCommits.isCommit1After(earliestCommitToRetain, fileCommitTime)) {
// this version is older than the earliest retained commit, and can be cleaned.
deletePaths.add(String
.format("%s/%s/%s", config.getBasePath(), partitionPath,
FSUtils.maskWithoutTaskPartitionId(fileCommitTime, file)));
}
}
}
}
return deletePaths;
}
/**
* Gets the latest version < commitTime. This version file could still be used by queries.
*/
private String getLatestVersionBeforeCommit(List<FileStatus> fileList, String commitTime) {
for (FileStatus file : fileList) {
String fileCommitTime = FSUtils.getCommitTime(file.getPath().getName());
if (HoodieCommits.isCommit1After(commitTime, fileCommitTime)) {
// fileList is sorted newest-first, so the first version found with commit time < commitTime is the one we want
return fileCommitTime;
}
}
// There is no version of this file which is <= commitTime
return null;
}
/**
* Performs cleaning of the partition path according to cleaning policy and returns the number
* of files cleaned.
*
* @throws IllegalArgumentException if unknown cleaning policy is provided
*/
public int clean(String partitionPath) throws IOException {
CleaningPolicy policy = config.getCleanerPolicy();
List<String> deletePaths;
if (policy == CleaningPolicy.KEEP_LATEST_COMMITS) {
deletePaths = getFilesToCleanKeepingLatestCommits(partitionPath);
} else if (policy == CleaningPolicy.KEEP_LATEST_FILE_VERSIONS) {
deletePaths = getFilesToCleanKeepingLatestVersions(partitionPath);
} else {
throw new IllegalArgumentException("Unknown cleaning policy : " + policy.name());
}
// perform the actual deletes
for (String deletePath : deletePaths) {
logger.info("Working on delete path :" + deletePath);
FileStatus[] deleteVersions = fs.globStatus(new Path(deletePath));
if (deleteVersions != null) {
for (FileStatus deleteVersion : deleteVersions) {
if (fs.delete(deleteVersion.getPath(), false)) {
logger.info("Cleaning file at path :" + deleteVersion.getPath());
}
}
}
}
logger.info(deletePaths.size() + " files deleted for partition path:" + partitionPath);
return deletePaths.size();
}
}
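The KEEP_LATEST_COMMITS selection above can be sketched standalone. Commit times here are plain strings compared lexicographically (as Hoodie's timestamp-format commit times allow); the helper keeps the newest version of a file, every version inside the retained-commit window, and one version just before the window, then marks the rest for deletion:

```java
import java.util.ArrayList;
import java.util.List;

// Standalone sketch of the KEEP_LATEST_COMMITS policy. Input is one file's
// versions (their commit times), sorted newest-first, as
// getAllVersionsInPartition() returns them.
public class CommitRetentionSketch {
    static List<String> versionsToClean(List<String> versionsNewestFirst,
                                        String earliestCommitToRetain) {
        String lastVersion = versionsNewestFirst.get(0);
        // the single newest version strictly older than the retention window, if any
        String lastBeforeWindow = null;
        for (String v : versionsNewestFirst) {
            if (earliestCommitToRetain.compareTo(v) > 0) {
                lastBeforeWindow = v;
                break;
            }
        }
        List<String> deletePaths = new ArrayList<>();
        for (String v : versionsNewestFirst) {
            if (v.equals(lastVersion) || v.equals(lastBeforeWindow)) {
                continue; // spared: latest version, or the one a long query may read
            }
            if (earliestCommitToRetain.compareTo(v) > 0) {
                deletePaths.add(v); // older than the window: clean it
            }
        }
        return deletePaths;
    }

    public static void main(String[] args) {
        // versions at commits 10..50; retention window starts at commit "35"
        List<String> versions = List.of("50", "40", "30", "20", "10");
        // keeps 50 (latest), 40 (in window), 30 (one before window); cleans 20, 10
        System.out.println(versionsToClean(versions, "35")); // [20, 10]
    }
}
```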


@@ -0,0 +1,144 @@
/*
* Copyright (c) 2016 Uber Technologies, Inc. (hoodie-dev-group@uber.com)
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package com.uber.hoodie.io;
import com.uber.hoodie.config.HoodieWriteConfig;
import com.uber.hoodie.common.file.HoodieAppendLog;
import com.uber.hoodie.common.model.HoodieTableMetadata;
import com.uber.hoodie.common.util.FSUtils;
import com.uber.hoodie.exception.HoodieCommitException;
import com.uber.hoodie.exception.HoodieIOException;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.compress.BZip2Codec;
import org.apache.log4j.LogManager;
import org.apache.log4j.Logger;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
/**
* Log to hold older historical commits, to bound the growth of .commit files
*/
public class HoodieCommitArchiveLog {
private static Logger log = LogManager.getLogger(HoodieCommitArchiveLog.class);
private static final String HOODIE_COMMIT_ARCHIVE_LOG_FILE = "commits.archived";
private final Path archiveFilePath;
private final FileSystem fs;
private final HoodieWriteConfig config;
public HoodieCommitArchiveLog(HoodieWriteConfig config) {
this.archiveFilePath =
new Path(config.getBasePath(),
HoodieTableMetadata.METAFOLDER_NAME + "/" + HOODIE_COMMIT_ARCHIVE_LOG_FILE);
this.fs = FSUtils.getFs();
this.config = config;
}
/**
* Check if commits need to be archived. If yes, archive commits.
*/
public boolean archiveIfRequired() {
HoodieTableMetadata metadata = new HoodieTableMetadata(fs, config.getBasePath());
List<String> commitsToArchive = getCommitsToArchive(metadata);
if (!commitsToArchive.isEmpty()) {
log.info("Archiving commits " + commitsToArchive);
archive(metadata, commitsToArchive);
return deleteCommits(metadata, commitsToArchive);
} else {
log.info("No Commits to archive");
return true;
}
}
private List<String> getCommitsToArchive(HoodieTableMetadata metadata) {
int maxCommitsToKeep = config.getMaxCommitsToKeep();
int minCommitsToKeep = config.getMinCommitsToKeep();
List<String> commits = metadata.getAllCommits().getCommitList();
List<String> commitsToArchive = new ArrayList<String>();
if (commits.size() > maxCommitsToKeep) {
// Over the limit: archive all but the newest minCommitsToKeep commits
commitsToArchive = commits.subList(0, commits.size() - minCommitsToKeep);
}
return commitsToArchive;
}
private boolean deleteCommits(HoodieTableMetadata metadata, List<String> commitsToArchive) {
log.info("Deleting commits " + commitsToArchive);
boolean success = true;
for (String commitToArchive : commitsToArchive) {
Path commitFile =
new Path(metadata.getBasePath() + "/" +
HoodieTableMetadata.METAFOLDER_NAME + "/" +
FSUtils.makeCommitFileName(commitToArchive));
try {
if (fs.exists(commitFile)) {
success &= fs.delete(commitFile, false);
log.info("Archived and deleted commit file " + commitFile);
}
} catch (IOException e) {
throw new HoodieIOException(
"Failed to delete archived commit " + commitToArchive, e);
}
}
return success;
}
private HoodieAppendLog.Writer openWriter() throws IOException {
log.info("Opening archive file at path: " + archiveFilePath);
return HoodieAppendLog
.createWriter(fs.getConf(), HoodieAppendLog.Writer.file(archiveFilePath),
HoodieAppendLog.Writer.keyClass(Text.class),
HoodieAppendLog.Writer.appendIfExists(true),
HoodieAppendLog.Writer.valueClass(Text.class), HoodieAppendLog.Writer
.compression(HoodieAppendLog.CompressionType.RECORD, new BZip2Codec()));
}
private void archive(HoodieTableMetadata metadata, List<String> commits)
throws HoodieCommitException {
HoodieAppendLog.Writer writer = null;
try {
writer = openWriter();
for (String commitTime : commits) {
Text k = new Text(commitTime);
Text v = new Text(metadata.getCommitMetadata(commitTime).toJsonString());
writer.append(k, v);
log.info("Wrote " + k);
}
} catch (IOException e) {
throw new HoodieCommitException("Could not archive commits " + commits, e);
} finally {
if (writer != null) {
try {
writer.hsync();
writer.close();
} catch (IOException e) {
throw new HoodieCommitException(
"Could not close the archive commits writer " + commits, e);
}
}
}
}
public Path getArchiveFilePath() {
return archiveFilePath;
}
}
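The selection in getCommitsToArchive() reduces to a sublist computation: once the commit list exceeds maxCommitsToKeep, everything but the newest minCommitsToKeep commits is archived. A standalone sketch:

```java
import java.util.Collections;
import java.util.List;

// Sketch of getCommitsToArchive(): commits are ordered oldest-first, as in
// HoodieCommits.getCommitList().
public class ArchiveSketch {
    static List<String> commitsToArchive(List<String> commitsOldestFirst,
                                         int maxCommitsToKeep, int minCommitsToKeep) {
        if (commitsOldestFirst.size() > maxCommitsToKeep) {
            // archive everything except the newest minCommitsToKeep commits
            return commitsOldestFirst.subList(0, commitsOldestFirst.size() - minCommitsToKeep);
        }
        return Collections.emptyList(); // under the limit: nothing to archive
    }

    public static void main(String[] args) {
        List<String> commits = List.of("c1", "c2", "c3", "c4", "c5", "c6");
        // max=5, min=3: over the limit, so archive all but the 3 newest
        System.out.println(commitsToArchive(commits, 5, 3)); // [c1, c2, c3]
    }
}
```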


@@ -0,0 +1,92 @@
/*
* Copyright (c) 2016 Uber Technologies, Inc. (hoodie-dev-group@uber.com)
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package com.uber.hoodie.io;
import com.uber.hoodie.config.HoodieWriteConfig;
import com.uber.hoodie.common.model.HoodieRecordPayload;
import com.uber.hoodie.common.model.HoodieTableMetadata;
import com.uber.hoodie.common.util.FSUtils;
import com.uber.hoodie.common.util.HoodieAvroUtils;
import com.uber.hoodie.exception.HoodieIOException;
import org.apache.avro.Schema;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.log4j.LogManager;
import org.apache.log4j.Logger;
import java.io.IOException;
public abstract class HoodieIOHandle<T extends HoodieRecordPayload> {
private static Logger logger = LogManager.getLogger(HoodieIOHandle.class);
protected final String commitTime;
protected final HoodieWriteConfig config;
protected final FileSystem fs;
protected final HoodieTableMetadata metadata;
protected final Schema schema;
public HoodieIOHandle(HoodieWriteConfig config, String commitTime,
HoodieTableMetadata metadata) {
this.commitTime = commitTime;
this.config = config;
this.fs = FSUtils.getFs();
this.metadata = metadata;
this.schema =
HoodieAvroUtils.addMetadataFields(new Schema.Parser().parse(config.getSchema()));
}
public Path makeNewPath(String partitionPath, int taskPartitionId, String fileName) {
Path path = new Path(config.getBasePath(), partitionPath);
try {
fs.mkdirs(path); // create a new partition as needed.
} catch (IOException e) {
throw new HoodieIOException("Failed to make dir " + path, e);
}
return new Path(path.toString(),
FSUtils.makeDataFileName(commitTime, taskPartitionId, fileName));
}
/**
* Deletes any partial files written into the partition by previous failed attempts of the current commit
*/
public static void cleanupTmpFilesFromCurrentCommit(HoodieWriteConfig config,
String commitTime,
String partitionPath,
int taskPartitionId) {
FileSystem fs = FSUtils.getFs();
try {
FileStatus[] prevFailedFiles = fs.globStatus(new Path(String
.format("%s/%s/%s", config.getBasePath(), partitionPath,
FSUtils.maskWithoutFileId(commitTime, taskPartitionId))));
if (prevFailedFiles != null) {
logger.info("Deleting " + prevFailedFiles.length
+ " files generated by previous failed attempts.");
for (FileStatus status : prevFailedFiles) {
fs.delete(status.getPath(), false);
}
}
} catch (IOException e) {
throw new HoodieIOException("Failed to cleanup Temp files from commit " + commitTime,
e);
}
}
public Schema getSchema() {
return schema;
}
}


@@ -0,0 +1,125 @@
/*
* Copyright (c) 2016 Uber Technologies, Inc. (hoodie-dev-group@uber.com)
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package com.uber.hoodie.io;
import com.uber.hoodie.config.HoodieWriteConfig;
import com.uber.hoodie.WriteStatus;
import com.uber.hoodie.common.model.HoodieRecord;
import com.uber.hoodie.common.model.HoodieRecordLocation;
import com.uber.hoodie.common.model.HoodieRecordPayload;
import com.uber.hoodie.common.model.HoodieTableMetadata;
import com.uber.hoodie.common.model.HoodieWriteStat;
import com.uber.hoodie.common.util.FSUtils;
import com.uber.hoodie.exception.HoodieInsertException;
import com.uber.hoodie.io.storage.HoodieStorageWriter;
import com.uber.hoodie.io.storage.HoodieStorageWriterFactory;
import org.apache.avro.generic.IndexedRecord;
import org.apache.hadoop.fs.Path;
import org.apache.log4j.LogManager;
import org.apache.log4j.Logger;
import org.apache.spark.TaskContext;
import java.io.IOException;
import java.util.UUID;
public class HoodieInsertHandle<T extends HoodieRecordPayload> extends HoodieIOHandle<T> {
private static Logger logger = LogManager.getLogger(HoodieInsertHandle.class);
private final WriteStatus status;
private final HoodieStorageWriter<IndexedRecord> storageWriter;
private final Path path;
private int recordsWritten = 0;
public HoodieInsertHandle(HoodieWriteConfig config, String commitTime,
HoodieTableMetadata metadata, String partitionPath) {
super(config, commitTime, metadata);
this.status = new WriteStatus();
status.setFileId(UUID.randomUUID().toString());
status.setPartitionPath(partitionPath);
this.path = makeNewPath(partitionPath, TaskContext.getPartitionId(), status.getFileId());
try {
this.storageWriter =
HoodieStorageWriterFactory.getStorageWriter(commitTime, path, metadata, config, schema);
} catch (IOException e) {
throw new HoodieInsertException(
"Failed to initialize HoodieStorageWriter for path " + path, e);
}
logger.info("New InsertHandle for partition :" + partitionPath);
}
/**
* Determines whether we can accept an incoming record into the current file, depending on
* <p/>
* - Whether the record belongs to the same partitionPath as the records already written
* - Whether the current file's written bytes are still below the max file size
*
* @return
*/
public boolean canWrite(HoodieRecord record) {
return storageWriter.canWrite() && record.getPartitionPath()
.equals(status.getPartitionPath());
}
/**
* Perform the actual writing of the given record into the backing file.
*
* @param record
*/
public void write(HoodieRecord record) {
try {
IndexedRecord avroRecord = record.getData().getInsertValue(schema);
storageWriter.writeAvroWithMetadata(avroRecord, record);
status.markSuccess(record);
// update the new location of record, so we know where to find it next
record.setNewLocation(new HoodieRecordLocation(commitTime, status.getFileId()));
record.deflate();
recordsWritten++;
} catch (Throwable t) {
status.markFailure(record, t);
logger.error("Error writing record " + record, t);
}
}
/**
* Durably persists the current changes, and returns a WriteStatus object with write statistics
*
* @return
*/
public WriteStatus close() {
logger.info(
"Closing the file " + status.getFileId() + " as we are done with all the records "
+ recordsWritten);
try {
storageWriter.close();
HoodieWriteStat stat = new HoodieWriteStat();
stat.setNumWrites(recordsWritten);
stat.setPrevCommit(HoodieWriteStat.NULL_COMMIT);
stat.setFileId(status.getFileId());
stat.setFullPath(path.toString());
stat.setTotalWriteBytes(FSUtils.getFileSize(fs, path));
stat.setTotalWriteErrors(status.getFailedRecords().size());
status.setStat(stat);
return status;
} catch (IOException e) {
throw new HoodieInsertException("Failed to close the Insert Handle for path " + path,
e);
}
}
}
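A caller is expected to drive the handle with the canWrite()/write()/close() loop implied above, rolling to a fresh handle (and hence a fresh file) once the size bound is hit. A simplified, filesystem-free sketch of that packing loop (the size limit and byte counting are placeholders, not the real Parquet accounting):

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of packing records into size-bounded files via canWrite()/write().
public class InsertPackingSketch {
    static final int MAX_BYTES_PER_FILE = 10; // placeholder bound

    static class Handle {
        int bytesWritten = 0;
        List<String> records = new ArrayList<>();

        boolean canWrite() { return bytesWritten < MAX_BYTES_PER_FILE; }

        void write(String record) {
            records.add(record);
            bytesWritten += record.length(); // stand-in for real byte accounting
        }
    }

    // Pack records into as few handles (files) as the size bound allows
    static List<Handle> pack(List<String> records) {
        List<Handle> handles = new ArrayList<>();
        Handle current = new Handle();
        handles.add(current);
        for (String r : records) {
            if (!current.canWrite()) { // roll over to a new file
                current = new Handle();
                handles.add(current);
            }
            current.write(r);
        }
        return handles;
    }

    public static void main(String[] args) {
        // four 4-byte records against a 10-byte bound => two files
        List<Handle> files = pack(List.of("aaaa", "bbbb", "cccc", "dddd"));
        System.out.println(files.size()); // 2
    }
}
```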


@@ -0,0 +1,193 @@
/*
* Copyright (c) 2016 Uber Technologies, Inc. (hoodie-dev-group@uber.com)
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package com.uber.hoodie.io;
import com.uber.hoodie.config.HoodieWriteConfig;
import com.uber.hoodie.WriteStatus;
import com.uber.hoodie.common.model.HoodieRecord;
import com.uber.hoodie.common.model.HoodieRecordLocation;
import com.uber.hoodie.common.model.HoodieRecordPayload;
import com.uber.hoodie.common.model.HoodieTableMetadata;
import com.uber.hoodie.common.model.HoodieWriteStat;
import com.uber.hoodie.common.util.FSUtils;
import com.uber.hoodie.exception.HoodieUpsertException;
import com.uber.hoodie.io.storage.HoodieStorageWriter;
import com.uber.hoodie.io.storage.HoodieStorageWriterFactory;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.generic.IndexedRecord;
import org.apache.hadoop.fs.Path;
import org.apache.log4j.LogManager;
import org.apache.log4j.Logger;
import org.apache.spark.TaskContext;
import java.io.IOException;
import java.util.HashMap;
import java.util.Iterator;
@SuppressWarnings("Duplicates")
public class HoodieUpdateHandle<T extends HoodieRecordPayload> extends HoodieIOHandle<T> {
private static Logger logger = LogManager.getLogger(HoodieUpdateHandle.class);
private final WriteStatus writeStatus;
private final HashMap<String, HoodieRecord<T>> keyToNewRecords;
private HoodieStorageWriter<IndexedRecord> storageWriter;
private Path newFilePath;
private Path oldFilePath;
private long recordsWritten = 0;
private long updatedRecordsWritten = 0;
private String fileId;
public HoodieUpdateHandle(HoodieWriteConfig config,
String commitTime,
HoodieTableMetadata metadata,
Iterator<HoodieRecord<T>> recordItr,
String fileId) {
super(config, commitTime, metadata);
WriteStatus writeStatus = new WriteStatus();
writeStatus.setStat(new HoodieWriteStat());
this.writeStatus = writeStatus;
this.fileId = fileId;
this.keyToNewRecords = new HashMap<>();
init(recordItr);
}
/**
* Load the new incoming records in a map, and extract the old file path.
*/
private void init(Iterator<HoodieRecord<T>> newRecordsItr) {
try {
// Load the new records in a map
while (newRecordsItr.hasNext()) {
HoodieRecord<T> record = newRecordsItr.next();
// On the first record, extract the old/new file paths and initialize the write status
if (oldFilePath == null) {
String latestValidFilePath = metadata.getFilenameForRecord(fs, record, fileId);
writeStatus.getStat().setPrevCommit(FSUtils.getCommitTime(latestValidFilePath));
oldFilePath = new Path(
config.getBasePath() + "/" + record.getPartitionPath() + "/"
+ latestValidFilePath);
newFilePath = new Path(
config.getBasePath() + "/" + record.getPartitionPath() + "/" + FSUtils
.makeDataFileName(commitTime, TaskContext.getPartitionId(), fileId));
// handle cases of partial failures, for update task
if (fs.exists(newFilePath)) {
fs.delete(newFilePath, false);
}
logger.info(String.format("Merging new data into oldPath %s, as newPath %s",
oldFilePath.toString(), newFilePath.toString()));
// file name is same for all records, in this bunch
writeStatus.setFileId(fileId);
writeStatus.setPartitionPath(record.getPartitionPath());
writeStatus.getStat().setFileId(fileId);
writeStatus.getStat().setFullPath(newFilePath.toString());
}
keyToNewRecords.put(record.getRecordKey(), record);
// update the new location of the record, so we know where to find it next
record.setNewLocation(new HoodieRecordLocation(commitTime, fileId));
}
// Create the writer for writing the new version file
storageWriter = HoodieStorageWriterFactory
.getStorageWriter(commitTime, newFilePath, metadata, config, schema);
} catch (Exception e) {
logger.error("Error in update task at commit " + commitTime, e);
writeStatus.setGlobalError(e);
}
}
private void writeUpdateRecord(HoodieRecord<T> hoodieRecord, IndexedRecord indexedRecord) {
try {
storageWriter.writeAvroWithMetadata(indexedRecord, hoodieRecord);
hoodieRecord.deflate();
writeStatus.markSuccess(hoodieRecord);
recordsWritten++;
updatedRecordsWritten++;
} catch (Exception e) {
logger.error("Error writing record " + hoodieRecord, e);
writeStatus.markFailure(hoodieRecord, e);
}
}
/**
* Processes a record from the old file. If a newer version of the record exists in the incoming batch, that version is written to the new file instead.
*/
public void write(GenericRecord oldRecord) {
String key = oldRecord.get(HoodieRecord.RECORD_KEY_METADATA_FIELD).toString();
HoodieRecord<T> hoodieRecord = keyToNewRecords.get(key);
if (hoodieRecord != null) {
try {
IndexedRecord avroRecord = hoodieRecord.getData().combineAndGetUpdateValue(oldRecord, schema);
writeUpdateRecord(hoodieRecord, avroRecord);
keyToNewRecords.remove(key);
} catch (Exception e) {
throw new HoodieUpsertException("Failed to combine/merge new record with old value in storage, for new record {"
+ keyToNewRecords.get(key) + "}, old value {" + oldRecord + "}", e);
}
} else {
// no newer version of this record in the incoming batch; rewrite the old record as-is
String errMsg = "Failed to merge old record into new file for key " + key + " from old file "
+ getOldFilePath() + " to new file " + newFilePath;
try {
storageWriter.writeAvro(key, oldRecord);
} catch (ClassCastException e) {
logger.error(
"Schema mismatch when rewriting old record " + oldRecord + " from file "
+ getOldFilePath() + " to file " + newFilePath + " with schema " + schema
.toString(true));
throw new HoodieUpsertException(errMsg, e);
} catch (IOException e) {
logger.error("Failed to merge old record into new file for key " + key + " from old file "
+ getOldFilePath() + " to new file " + newFilePath, e);
throw new HoodieUpsertException(errMsg, e);
}
recordsWritten++;
}
}
public void close() {
try {
// write out any pending records (this can happen when inserts are turned into updates)
Iterator<String> pendingRecordsItr = keyToNewRecords.keySet().iterator();
while (pendingRecordsItr.hasNext()) {
String key = pendingRecordsItr.next();
HoodieRecord<T> hoodieRecord = keyToNewRecords.get(key);
writeUpdateRecord(hoodieRecord, hoodieRecord.getData().getInsertValue(schema));
}
keyToNewRecords.clear();
if (storageWriter != null) {
storageWriter.close();
}
writeStatus.getStat().setTotalWriteBytes(FSUtils.getFileSize(fs, newFilePath));
writeStatus.getStat().setNumWrites(recordsWritten);
writeStatus.getStat().setNumUpdateWrites(updatedRecordsWritten);
writeStatus.getStat().setTotalWriteErrors(writeStatus.getFailedRecords().size());
} catch (IOException e) {
throw new HoodieUpsertException("Failed to close UpdateHandle", e);
}
}
public Path getOldFilePath() {
return oldFilePath;
}
public WriteStatus getWriteStatus() {
return writeStatus;
}
}
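The update path above is a merge: stream the old file's records, substituting any record whose key appears in the incoming batch, then flush the leftover incoming records at close() (the insert-turned-update case). A self-contained sketch with string key/value pairs standing in for Avro records:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch of the write()/close() merge performed by HoodieUpdateHandle.
public class UpdateMergeSketch {
    static List<String> merge(List<String[]> oldRecords, Map<String, String> newRecords) {
        Map<String, String> pending = new HashMap<>(newRecords);
        List<String> out = new ArrayList<>();
        for (String[] kv : oldRecords) {
            String key = kv[0];
            if (pending.containsKey(key)) {
                out.add(key + "=" + pending.remove(key)); // newer version wins
            } else {
                out.add(key + "=" + kv[1]);               // rewrite old record as-is
            }
        }
        // close(): flush incoming records that matched no old row
        for (Map.Entry<String, String> e : pending.entrySet()) {
            out.add(e.getKey() + "=" + e.getValue());
        }
        return out;
    }

    public static void main(String[] args) {
        List<String[]> old = List.of(new String[]{"k1", "v1"}, new String[]{"k2", "v2"});
        Map<String, String> incoming = Map.of("k2", "v2'", "k3", "v3");
        System.out.println(merge(old, incoming)); // [k1=v1, k2=v2', k3=v3]
    }
}
```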


@@ -0,0 +1,66 @@
/*
* Copyright (c) 2016 Uber Technologies, Inc. (hoodie-dev-group@uber.com)
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package com.uber.hoodie.io.storage;
import com.uber.hoodie.avro.HoodieAvroWriteSupport;
import org.apache.hadoop.conf.Configuration;
import org.apache.parquet.hadoop.metadata.CompressionCodecName;
public class HoodieParquetConfig {
private HoodieAvroWriteSupport writeSupport;
private CompressionCodecName compressionCodecName;
private int blockSize;
private int pageSize;
private int maxFileSize;
private Configuration hadoopConf;
public HoodieParquetConfig(HoodieAvroWriteSupport writeSupport,
CompressionCodecName compressionCodecName, int blockSize, int pageSize, int maxFileSize,
Configuration hadoopConf) {
this.writeSupport = writeSupport;
this.compressionCodecName = compressionCodecName;
this.blockSize = blockSize;
this.pageSize = pageSize;
this.maxFileSize = maxFileSize;
this.hadoopConf = hadoopConf;
}
public HoodieAvroWriteSupport getWriteSupport() {
return writeSupport;
}
public CompressionCodecName getCompressionCodecName() {
return compressionCodecName;
}
public int getBlockSize() {
return blockSize;
}
public int getPageSize() {
return pageSize;
}
public int getMaxFileSize() {
return maxFileSize;
}
public Configuration getHadoopConf() {
return hadoopConf;
}
}


@@ -0,0 +1,107 @@
/*
* Copyright (c) 2016 Uber Technologies, Inc. (hoodie-dev-group@uber.com)
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package com.uber.hoodie.io.storage;
import com.uber.hoodie.avro.HoodieAvroWriteSupport;
import com.uber.hoodie.common.model.HoodieRecord;
import com.uber.hoodie.common.model.HoodieRecordPayload;
import com.uber.hoodie.common.util.HoodieAvroUtils;
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.generic.IndexedRecord;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.parquet.hadoop.ParquetFileWriter;
import org.apache.parquet.hadoop.ParquetWriter;
import org.apache.spark.TaskContext;
import java.io.IOException;
import java.util.concurrent.atomic.AtomicLong;
/**
* HoodieParquetWriter extends ParquetWriter to help limit the size of the underlying file.
* It provides a way to check whether the current file can take more records via the
* <code>canWrite()</code> method.
*
* @param <T> the payload type of the records being written
*/
public class HoodieParquetWriter<T extends HoodieRecordPayload, R extends IndexedRecord>
extends ParquetWriter<IndexedRecord> implements HoodieStorageWriter<R> {
private static final double STREAM_COMPRESSION_RATIO = 0.1;
private static final AtomicLong recordIndex = new AtomicLong(1);
private final Path file;
private final HoodieWrapperFileSystem fs;
private final long maxFileSize;
private final HoodieAvroWriteSupport writeSupport;
private final String commitTime;
private final Schema schema;
private static Configuration registerFileSystem(Configuration conf) {
Configuration returnConf = new Configuration(conf);
String scheme = FileSystem.getDefaultUri(conf).getScheme();
returnConf.set("fs." + HoodieWrapperFileSystem.getHoodieScheme(scheme) + ".impl",
HoodieWrapperFileSystem.class.getName());
return returnConf;
}
public HoodieParquetWriter(String commitTime, Path file,
HoodieParquetConfig parquetConfig, Schema schema) throws IOException {
super(HoodieWrapperFileSystem.convertToHoodiePath(file, parquetConfig.getHadoopConf()),
ParquetFileWriter.Mode.CREATE, parquetConfig.getWriteSupport(),
parquetConfig.getCompressionCodecName(), parquetConfig.getBlockSize(),
parquetConfig.getPageSize(), parquetConfig.getPageSize(),
ParquetWriter.DEFAULT_IS_DICTIONARY_ENABLED,
ParquetWriter.DEFAULT_IS_VALIDATING_ENABLED, ParquetWriter.DEFAULT_WRITER_VERSION,
registerFileSystem(parquetConfig.getHadoopConf()));
this.file =
HoodieWrapperFileSystem.convertToHoodiePath(file, parquetConfig.getHadoopConf());
this.fs = (HoodieWrapperFileSystem) this.file
.getFileSystem(registerFileSystem(parquetConfig.getHadoopConf()));
// We cannot accurately measure the compressed output file size while streaming, so we allow a conservative 10% overshoot
// TODO - compute this compression ratio dynamically by looking at the bytes written to the stream and the actual file size reported by HDFS
this.maxFileSize = parquetConfig.getMaxFileSize() + Math
.round(parquetConfig.getMaxFileSize() * STREAM_COMPRESSION_RATIO);
this.writeSupport = parquetConfig.getWriteSupport();
this.commitTime = commitTime;
this.schema = schema;
}
@Override
public void writeAvroWithMetadata(R avroRecord, HoodieRecord record) throws IOException {
String seqId = HoodieRecord.generateSequenceId(commitTime, TaskContext.getPartitionId(),
recordIndex.getAndIncrement());
HoodieAvroUtils.addHoodieKeyToRecord((GenericRecord) avroRecord,
record.getRecordKey(),
record.getPartitionPath(),
file.getName());
HoodieAvroUtils.addCommitMetadataToRecord((GenericRecord) avroRecord, commitTime, seqId);
super.write(avroRecord);
writeSupport.add(record.getRecordKey());
}
@Override public boolean canWrite() {
return fs.getBytesWritten(file) < maxFileSize;
}
@Override public void writeAvro(String key, IndexedRecord object) throws IOException {
super.write(object);
writeSupport.add(key);
}
}
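
Because the compressed Parquet output cannot be measured exactly while streaming, the writer above pads the configured max file size by a conservative 10% and compares the bytes written so far against that budget in `canWrite()`. A minimal, self-contained sketch of that sizing rule (the class and method names here are illustrative, not Hudi APIs):

```java
// Self-contained sketch of the size-budget rule used by HoodieParquetWriter.
// Illustration only; not the Hudi implementation.
public class SizeBudget {
    // Conservative allowance for the gap between stream bytes and final file bytes.
    private static final double STREAM_COMPRESSION_RATIO = 0.1;

    // Effective cap on bytes written to the stream for a configured max file size.
    public static long effectiveMaxFileSize(long configuredMaxFileSize) {
        return configuredMaxFileSize
            + Math.round(configuredMaxFileSize * STREAM_COMPRESSION_RATIO);
    }

    // Mirrors canWrite(): the file can take more records while the bytes
    // written so far stay under the padded budget.
    public static boolean canWrite(long bytesWritten, long configuredMaxFileSize) {
        return bytesWritten < effectiveMaxFileSize(configuredMaxFileSize);
    }
}
```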


@@ -0,0 +1,29 @@
/*
* Copyright (c) 2016 Uber Technologies, Inc. (hoodie-dev-group@uber.com)
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package com.uber.hoodie.io.storage;
import com.uber.hoodie.common.model.HoodieRecord;
import org.apache.avro.generic.IndexedRecord;
import java.io.IOException;
public interface HoodieStorageWriter<R extends IndexedRecord> {
void writeAvroWithMetadata(R newRecord, HoodieRecord record) throws IOException;
boolean canWrite();
void close() throws IOException;
void writeAvro(String key, R oldRecord) throws IOException;
}
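
The interface above implies a usage pattern: keep writing while `canWrite()` holds, and roll over to a fresh file once it returns false. A toy, self-contained sketch of that loop (`ToyWriter` and `writeAll` are hypothetical stand-ins, with a record cap standing in for the byte budget):

```java
import java.util.ArrayList;
import java.util.List;

// Toy illustration of the HoodieStorageWriter contract: write records while
// canWrite() holds, then roll over to a new file. The real interface is
// parameterized over Avro records; plain strings stand in here.
public class WriterLoopSketch {
    // Minimal stand-in for a size-limited writer (hypothetical, not Hudi code).
    static class ToyWriter {
        final List<String> contents = new ArrayList<>();
        final int maxRecords;
        ToyWriter(int maxRecords) { this.maxRecords = maxRecords; }
        boolean canWrite() { return contents.size() < maxRecords; }
        void write(String record) { contents.add(record); }
    }

    // Write all records, rolling to a fresh writer whenever the current one is full.
    public static List<ToyWriter> writeAll(List<String> records, int maxRecords) {
        List<ToyWriter> files = new ArrayList<>();
        ToyWriter current = new ToyWriter(maxRecords);
        files.add(current);
        for (String r : records) {
            if (!current.canWrite()) {
                current = new ToyWriter(maxRecords); // roll over to a new file
                files.add(current);
            }
            current.write(r);
        }
        return files;
    }
}
```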


@@ -0,0 +1,56 @@
/*
* Copyright (c) 2016 Uber Technologies, Inc. (hoodie-dev-group@uber.com)
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package com.uber.hoodie.io.storage;
import com.uber.hoodie.config.HoodieWriteConfig;
import com.uber.hoodie.avro.HoodieAvroWriteSupport;
import com.uber.hoodie.common.BloomFilter;
import com.uber.hoodie.common.model.HoodieRecordPayload;
import com.uber.hoodie.common.model.HoodieTableMetadata;
import com.uber.hoodie.common.util.FSUtils;
import org.apache.avro.Schema;
import org.apache.avro.generic.IndexedRecord;
import org.apache.hadoop.fs.Path;
import org.apache.parquet.avro.AvroSchemaConverter;
import org.apache.parquet.hadoop.metadata.CompressionCodecName;
import java.io.IOException;
public class HoodieStorageWriterFactory {
public static <T extends HoodieRecordPayload, R extends IndexedRecord> HoodieStorageWriter<R> getStorageWriter(
String commitTime, Path path, HoodieTableMetadata metadata, HoodieWriteConfig config, Schema schema)
throws IOException {
//TODO - based on the metadata choose the implementation of HoodieStorageWriter
// Currently only parquet is supported
return newParquetStorageWriter(commitTime, path, config, schema);
}
private static <T extends HoodieRecordPayload, R extends IndexedRecord> HoodieStorageWriter<R> newParquetStorageWriter(
String commitTime, Path path, HoodieWriteConfig config, Schema schema) throws IOException {
BloomFilter filter =
new BloomFilter(config.getBloomFilterNumEntries(), config.getBloomFilterFPP());
HoodieAvroWriteSupport writeSupport =
new HoodieAvroWriteSupport(new AvroSchemaConverter().convert(schema), schema, filter);
HoodieParquetConfig parquetConfig =
new HoodieParquetConfig(writeSupport, CompressionCodecName.GZIP,
config.getParquetBlockSize(), config.getParquetPageSize(),
config.getParquetMaxFileSize(), FSUtils.getFs().getConf());
return new HoodieParquetWriter<>(commitTime, path, parquetConfig, schema);
}
}
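
The TODO above notes that the factory should eventually pick a writer implementation based on the table metadata. A hypothetical sketch of what that dispatch could look like once more than Parquet is supported (the enum and `Writer` interface here are illustrative, not Hudi APIs):

```java
// Hypothetical sketch of format-based dispatch for a storage writer factory.
// The format names and writer types are assumptions for illustration only.
public class WriterFactorySketch {
    enum StorageFormat { PARQUET, ORC }

    interface Writer { String kind(); }

    public static Writer getStorageWriter(StorageFormat format) {
        switch (format) {
            case PARQUET:
                return () -> "parquet";
            case ORC:
                return () -> "orc";
            default:
                throw new IllegalArgumentException("Unsupported format: " + format);
        }
    }
}
```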


@@ -0,0 +1,677 @@
/*
* Copyright (c) 2016 Uber Technologies, Inc. (hoodie-dev-group@uber.com)
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package com.uber.hoodie.io.storage;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.*;
import org.apache.hadoop.fs.permission.AclEntry;
import org.apache.hadoop.fs.permission.AclStatus;
import org.apache.hadoop.fs.permission.FsAction;
import org.apache.hadoop.fs.permission.FsPermission;
import org.apache.hadoop.security.AccessControlException;
import org.apache.hadoop.security.Credentials;
import org.apache.hadoop.security.token.Token;
import org.apache.hadoop.util.Progressable;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.net.URI;
import java.net.URISyntaxException;
import java.util.EnumSet;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
/**
* HoodieWrapperFileSystem wraps the default file system. It keeps track of the
* streams it opens, so that the number of bytes written to each open stream can be queried.
*/
public class HoodieWrapperFileSystem extends FileSystem {
private static final Set<String> SUPPORT_SCHEMES;
public static final String HOODIE_SCHEME_PREFIX = "hoodie-";
static {
SUPPORT_SCHEMES = new HashSet<>(2);
SUPPORT_SCHEMES.add("file");
SUPPORT_SCHEMES.add("hdfs");
}
private ConcurrentMap<String, SizeAwareFSDataOutputStream> openStreams =
new ConcurrentHashMap<>();
private FileSystem fileSystem;
private URI uri;
@Override public void initialize(URI uri, Configuration conf) throws IOException {
// Get the default filesystem to decorate
fileSystem = FileSystem.get(conf);
// No need to explicitly initialize the default filesystem; that is already done by the FileSystem.get call above
// fileSystem.initialize(FileSystem.getDefaultUri(conf), conf);
// fileSystem.setConf(conf);
this.uri = uri;
}
@Override public URI getUri() {
return uri;
}
@Override public FSDataInputStream open(Path f, int bufferSize) throws IOException {
return fileSystem.open(convertToDefaultPath(f), bufferSize);
}
@Override public FSDataOutputStream create(Path f, FsPermission permission, boolean overwrite,
int bufferSize, short replication, long blockSize, Progressable progress)
throws IOException {
final Path translatedPath = convertToDefaultPath(f);
return wrapOutputStream(f, fileSystem
.create(translatedPath, permission, overwrite, bufferSize, replication, blockSize,
progress));
}
private FSDataOutputStream wrapOutputStream(final Path path,
FSDataOutputStream fsDataOutputStream) throws IOException {
if (fsDataOutputStream instanceof SizeAwareFSDataOutputStream) {
return fsDataOutputStream;
}
SizeAwareFSDataOutputStream os =
new SizeAwareFSDataOutputStream(fsDataOutputStream, new Runnable() {
@Override public void run() {
openStreams.remove(path.getName());
}
});
openStreams.put(path.getName(), os);
return os;
}
@Override public FSDataOutputStream create(Path f, boolean overwrite) throws IOException {
return wrapOutputStream(f, fileSystem.create(convertToDefaultPath(f), overwrite));
}
@Override public FSDataOutputStream create(Path f) throws IOException {
return wrapOutputStream(f, fileSystem.create(convertToDefaultPath(f)));
}
@Override public FSDataOutputStream create(Path f, Progressable progress) throws IOException {
return fileSystem.create(convertToDefaultPath(f), progress);
}
@Override public FSDataOutputStream create(Path f, short replication) throws IOException {
return fileSystem.create(convertToDefaultPath(f), replication);
}
@Override public FSDataOutputStream create(Path f, short replication, Progressable progress)
throws IOException {
return fileSystem.create(convertToDefaultPath(f), replication, progress);
}
@Override public FSDataOutputStream create(Path f, boolean overwrite, int bufferSize)
throws IOException {
return fileSystem.create(convertToDefaultPath(f), overwrite, bufferSize);
}
@Override public FSDataOutputStream create(Path f, boolean overwrite, int bufferSize,
Progressable progress) throws IOException {
return fileSystem.create(convertToDefaultPath(f), overwrite, bufferSize, progress);
}
@Override
public FSDataOutputStream create(Path f, boolean overwrite, int bufferSize, short replication,
long blockSize, Progressable progress) throws IOException {
return fileSystem
.create(convertToDefaultPath(f), overwrite, bufferSize, replication, blockSize,
progress);
}
@Override
public FSDataOutputStream create(Path f, FsPermission permission, EnumSet<CreateFlag> flags,
int bufferSize, short replication, long blockSize, Progressable progress)
throws IOException {
return fileSystem
.create(convertToDefaultPath(f), permission, flags, bufferSize, replication, blockSize,
progress);
}
@Override
public FSDataOutputStream create(Path f, FsPermission permission, EnumSet<CreateFlag> flags,
int bufferSize, short replication, long blockSize, Progressable progress,
Options.ChecksumOpt checksumOpt) throws IOException {
return fileSystem
.create(convertToDefaultPath(f), permission, flags, bufferSize, replication, blockSize,
progress, checksumOpt);
}
@Override
public FSDataOutputStream create(Path f, boolean overwrite, int bufferSize, short replication,
long blockSize) throws IOException {
return fileSystem
.create(convertToDefaultPath(f), overwrite, bufferSize, replication, blockSize);
}
@Override public FSDataOutputStream append(Path f, int bufferSize, Progressable progress)
throws IOException {
return fileSystem.append(convertToDefaultPath(f), bufferSize, progress);
}
@Override public boolean rename(Path src, Path dst) throws IOException {
return fileSystem.rename(convertToDefaultPath(src), convertToDefaultPath(dst));
}
@Override public boolean delete(Path f, boolean recursive) throws IOException {
return fileSystem.delete(convertToDefaultPath(f), recursive);
}
@Override public FileStatus[] listStatus(Path f) throws FileNotFoundException, IOException {
return fileSystem.listStatus(convertToDefaultPath(f));
}
@Override public void setWorkingDirectory(Path new_dir) {
fileSystem.setWorkingDirectory(convertToDefaultPath(new_dir));
}
@Override public Path getWorkingDirectory() {
return convertToHoodiePath(fileSystem.getWorkingDirectory());
}
@Override public boolean mkdirs(Path f, FsPermission permission) throws IOException {
return fileSystem.mkdirs(convertToDefaultPath(f), permission);
}
@Override public FileStatus getFileStatus(Path f) throws IOException {
return fileSystem.getFileStatus(convertToDefaultPath(f));
}
@Override public String getScheme() {
return uri.getScheme();
}
@Override public String getCanonicalServiceName() {
return fileSystem.getCanonicalServiceName();
}
@Override public String getName() {
return fileSystem.getName();
}
@Override public Path makeQualified(Path path) {
return convertToHoodiePath(fileSystem.makeQualified(convertToDefaultPath(path)));
}
@Override public Token<?> getDelegationToken(String renewer) throws IOException {
return fileSystem.getDelegationToken(renewer);
}
@Override public Token<?>[] addDelegationTokens(String renewer, Credentials credentials)
throws IOException {
return fileSystem.addDelegationTokens(renewer, credentials);
}
@Override public FileSystem[] getChildFileSystems() {
return fileSystem.getChildFileSystems();
}
@Override public BlockLocation[] getFileBlockLocations(FileStatus file, long start, long len)
throws IOException {
return fileSystem.getFileBlockLocations(file, start, len);
}
@Override public BlockLocation[] getFileBlockLocations(Path p, long start, long len)
throws IOException {
return fileSystem.getFileBlockLocations(convertToDefaultPath(p), start, len);
}
@Override public FsServerDefaults getServerDefaults() throws IOException {
return fileSystem.getServerDefaults();
}
@Override public FsServerDefaults getServerDefaults(Path p) throws IOException {
return fileSystem.getServerDefaults(convertToDefaultPath(p));
}
@Override public Path resolvePath(Path p) throws IOException {
return convertToHoodiePath(fileSystem.resolvePath(convertToDefaultPath(p)));
}
@Override public FSDataInputStream open(Path f) throws IOException {
return fileSystem.open(convertToDefaultPath(f));
}
@Override
public FSDataOutputStream createNonRecursive(Path f, boolean overwrite, int bufferSize,
short replication, long blockSize, Progressable progress) throws IOException {
return fileSystem
.createNonRecursive(convertToDefaultPath(f), overwrite, bufferSize, replication,
blockSize, progress);
}
@Override
public FSDataOutputStream createNonRecursive(Path f, FsPermission permission, boolean overwrite,
int bufferSize, short replication, long blockSize, Progressable progress)
throws IOException {
return fileSystem
.createNonRecursive(convertToDefaultPath(f), permission, overwrite, bufferSize,
replication, blockSize, progress);
}
@Override public FSDataOutputStream createNonRecursive(Path f, FsPermission permission,
EnumSet<CreateFlag> flags, int bufferSize, short replication, long blockSize,
Progressable progress) throws IOException {
return fileSystem
.createNonRecursive(convertToDefaultPath(f), permission, flags, bufferSize, replication,
blockSize, progress);
}
@Override public boolean createNewFile(Path f) throws IOException {
return fileSystem.createNewFile(convertToDefaultPath(f));
}
@Override public FSDataOutputStream append(Path f) throws IOException {
return fileSystem.append(convertToDefaultPath(f));
}
@Override public FSDataOutputStream append(Path f, int bufferSize) throws IOException {
return fileSystem.append(convertToDefaultPath(f), bufferSize);
}
@Override public void concat(Path trg, Path[] psrcs) throws IOException {
Path[] psrcsNew = convertDefaults(psrcs);
fileSystem.concat(convertToDefaultPath(trg), psrcsNew);
}
@Override public short getReplication(Path src) throws IOException {
return fileSystem.getReplication(convertToDefaultPath(src));
}
@Override public boolean setReplication(Path src, short replication) throws IOException {
return fileSystem.setReplication(convertToDefaultPath(src), replication);
}
@Override public boolean delete(Path f) throws IOException {
return fileSystem.delete(convertToDefaultPath(f));
}
@Override public boolean deleteOnExit(Path f) throws IOException {
return fileSystem.deleteOnExit(convertToDefaultPath(f));
}
@Override public boolean cancelDeleteOnExit(Path f) {
return fileSystem.cancelDeleteOnExit(convertToDefaultPath(f));
}
@Override public boolean exists(Path f) throws IOException {
return fileSystem.exists(convertToDefaultPath(f));
}
@Override public boolean isDirectory(Path f) throws IOException {
return fileSystem.isDirectory(convertToDefaultPath(f));
}
@Override public boolean isFile(Path f) throws IOException {
return fileSystem.isFile(convertToDefaultPath(f));
}
@Override public long getLength(Path f) throws IOException {
return fileSystem.getLength(convertToDefaultPath(f));
}
@Override public ContentSummary getContentSummary(Path f) throws IOException {
return fileSystem.getContentSummary(convertToDefaultPath(f));
}
@Override public RemoteIterator<Path> listCorruptFileBlocks(Path path) throws IOException {
return fileSystem.listCorruptFileBlocks(convertToDefaultPath(path));
}
@Override public FileStatus[] listStatus(Path f, PathFilter filter)
throws FileNotFoundException, IOException {
return fileSystem.listStatus(convertToDefaultPath(f), filter);
}
@Override public FileStatus[] listStatus(Path[] files)
throws FileNotFoundException, IOException {
return fileSystem.listStatus(convertDefaults(files));
}
@Override public FileStatus[] listStatus(Path[] files, PathFilter filter)
throws FileNotFoundException, IOException {
return fileSystem.listStatus(convertDefaults(files), filter);
}
@Override public FileStatus[] globStatus(Path pathPattern) throws IOException {
return fileSystem.globStatus(convertToDefaultPath(pathPattern));
}
@Override public FileStatus[] globStatus(Path pathPattern, PathFilter filter)
throws IOException {
return fileSystem.globStatus(convertToDefaultPath(pathPattern), filter);
}
@Override public RemoteIterator<LocatedFileStatus> listLocatedStatus(Path f)
throws FileNotFoundException, IOException {
return fileSystem.listLocatedStatus(convertToDefaultPath(f));
}
@Override public RemoteIterator<LocatedFileStatus> listFiles(Path f, boolean recursive)
throws FileNotFoundException, IOException {
return fileSystem.listFiles(convertToDefaultPath(f), recursive);
}
@Override public Path getHomeDirectory() {
return convertToHoodiePath(fileSystem.getHomeDirectory());
}
@Override public boolean mkdirs(Path f) throws IOException {
return fileSystem.mkdirs(convertToDefaultPath(f));
}
@Override public void copyFromLocalFile(Path src, Path dst) throws IOException {
fileSystem.copyFromLocalFile(convertToDefaultPath(src), convertToDefaultPath(dst));
}
@Override public void moveFromLocalFile(Path[] srcs, Path dst) throws IOException {
fileSystem.moveFromLocalFile(convertDefaults(srcs), convertToDefaultPath(dst));
}
@Override public void moveFromLocalFile(Path src, Path dst) throws IOException {
fileSystem.moveFromLocalFile(convertToDefaultPath(src), convertToDefaultPath(dst));
}
@Override public void copyFromLocalFile(boolean delSrc, Path src, Path dst) throws IOException {
fileSystem.copyFromLocalFile(delSrc, convertToDefaultPath(src), convertToDefaultPath(dst));
}
@Override
public void copyFromLocalFile(boolean delSrc, boolean overwrite, Path[] srcs, Path dst)
throws IOException {
fileSystem
.copyFromLocalFile(delSrc, overwrite, convertDefaults(srcs), convertToDefaultPath(dst));
}
@Override public void copyFromLocalFile(boolean delSrc, boolean overwrite, Path src, Path dst)
throws IOException {
fileSystem.copyFromLocalFile(delSrc, overwrite, convertToDefaultPath(src),
convertToDefaultPath(dst));
}
@Override public void copyToLocalFile(Path src, Path dst) throws IOException {
fileSystem.copyToLocalFile(convertToDefaultPath(src), convertToDefaultPath(dst));
}
@Override public void moveToLocalFile(Path src, Path dst) throws IOException {
fileSystem.moveToLocalFile(convertToDefaultPath(src), convertToDefaultPath(dst));
}
@Override public void copyToLocalFile(boolean delSrc, Path src, Path dst) throws IOException {
fileSystem.copyToLocalFile(delSrc, convertToDefaultPath(src), convertToDefaultPath(dst));
}
@Override
public void copyToLocalFile(boolean delSrc, Path src, Path dst, boolean useRawLocalFileSystem)
throws IOException {
fileSystem.copyToLocalFile(delSrc, convertToDefaultPath(src), convertToDefaultPath(dst),
useRawLocalFileSystem);
}
@Override public Path startLocalOutput(Path fsOutputFile, Path tmpLocalFile)
throws IOException {
return convertToHoodiePath(fileSystem.startLocalOutput(convertToDefaultPath(fsOutputFile),
convertToDefaultPath(tmpLocalFile)));
}
@Override public void completeLocalOutput(Path fsOutputFile, Path tmpLocalFile)
throws IOException {
fileSystem.completeLocalOutput(convertToDefaultPath(fsOutputFile),
convertToDefaultPath(tmpLocalFile));
}
@Override public void close() throws IOException {
fileSystem.close();
}
@Override public long getUsed() throws IOException {
return fileSystem.getUsed();
}
@Override public long getBlockSize(Path f) throws IOException {
return fileSystem.getBlockSize(convertToDefaultPath(f));
}
@Override public long getDefaultBlockSize() {
return fileSystem.getDefaultBlockSize();
}
@Override public long getDefaultBlockSize(Path f) {
return fileSystem.getDefaultBlockSize(convertToDefaultPath(f));
}
@Override public short getDefaultReplication() {
return fileSystem.getDefaultReplication();
}
@Override public short getDefaultReplication(Path path) {
return fileSystem.getDefaultReplication(convertToDefaultPath(path));
}
@Override public void access(Path path, FsAction mode)
throws AccessControlException, FileNotFoundException, IOException {
fileSystem.access(convertToDefaultPath(path), mode);
}
@Override public void createSymlink(Path target, Path link, boolean createParent)
throws AccessControlException, FileAlreadyExistsException, FileNotFoundException,
ParentNotDirectoryException, UnsupportedFileSystemException, IOException {
fileSystem
.createSymlink(convertToDefaultPath(target), convertToDefaultPath(link), createParent);
}
@Override public FileStatus getFileLinkStatus(Path f)
throws AccessControlException, FileNotFoundException, UnsupportedFileSystemException,
IOException {
return fileSystem.getFileLinkStatus(convertToDefaultPath(f));
}
@Override public boolean supportsSymlinks() {
return fileSystem.supportsSymlinks();
}
@Override public Path getLinkTarget(Path f) throws IOException {
return convertToHoodiePath(fileSystem.getLinkTarget(convertToDefaultPath(f)));
}
@Override public FileChecksum getFileChecksum(Path f) throws IOException {
return fileSystem.getFileChecksum(convertToDefaultPath(f));
}
@Override public FileChecksum getFileChecksum(Path f, long length) throws IOException {
return fileSystem.getFileChecksum(convertToDefaultPath(f), length);
}
@Override public void setVerifyChecksum(boolean verifyChecksum) {
fileSystem.setVerifyChecksum(verifyChecksum);
}
@Override public void setWriteChecksum(boolean writeChecksum) {
fileSystem.setWriteChecksum(writeChecksum);
}
@Override public FsStatus getStatus() throws IOException {
return fileSystem.getStatus();
}
@Override public FsStatus getStatus(Path p) throws IOException {
return fileSystem.getStatus(convertToDefaultPath(p));
}
@Override public void setPermission(Path p, FsPermission permission) throws IOException {
fileSystem.setPermission(convertToDefaultPath(p), permission);
}
@Override public void setOwner(Path p, String username, String groupname) throws IOException {
fileSystem.setOwner(convertToDefaultPath(p), username, groupname);
}
@Override public void setTimes(Path p, long mtime, long atime) throws IOException {
fileSystem.setTimes(convertToDefaultPath(p), mtime, atime);
}
@Override public Path createSnapshot(Path path, String snapshotName) throws IOException {
return convertToHoodiePath(
fileSystem.createSnapshot(convertToDefaultPath(path), snapshotName));
}
@Override public void renameSnapshot(Path path, String snapshotOldName, String snapshotNewName)
throws IOException {
fileSystem.renameSnapshot(convertToDefaultPath(path), snapshotOldName, snapshotNewName);
}
@Override public void deleteSnapshot(Path path, String snapshotName) throws IOException {
fileSystem.deleteSnapshot(convertToDefaultPath(path), snapshotName);
}
@Override public void modifyAclEntries(Path path, List<AclEntry> aclSpec) throws IOException {
fileSystem.modifyAclEntries(convertToDefaultPath(path), aclSpec);
}
@Override public void removeAclEntries(Path path, List<AclEntry> aclSpec) throws IOException {
fileSystem.removeAclEntries(convertToDefaultPath(path), aclSpec);
}
@Override public void removeDefaultAcl(Path path) throws IOException {
fileSystem.removeDefaultAcl(convertToDefaultPath(path));
}
@Override public void removeAcl(Path path) throws IOException {
fileSystem.removeAcl(convertToDefaultPath(path));
}
@Override public void setAcl(Path path, List<AclEntry> aclSpec) throws IOException {
fileSystem.setAcl(convertToDefaultPath(path), aclSpec);
}
@Override public AclStatus getAclStatus(Path path) throws IOException {
return fileSystem.getAclStatus(convertToDefaultPath(path));
}
@Override public void setXAttr(Path path, String name, byte[] value) throws IOException {
fileSystem.setXAttr(convertToDefaultPath(path), name, value);
}
@Override public void setXAttr(Path path, String name, byte[] value, EnumSet<XAttrSetFlag> flag)
throws IOException {
fileSystem.setXAttr(convertToDefaultPath(path), name, value, flag);
}
@Override public byte[] getXAttr(Path path, String name) throws IOException {
return fileSystem.getXAttr(convertToDefaultPath(path), name);
}
@Override public Map<String, byte[]> getXAttrs(Path path) throws IOException {
return fileSystem.getXAttrs(convertToDefaultPath(path));
}
@Override public Map<String, byte[]> getXAttrs(Path path, List<String> names)
throws IOException {
return fileSystem.getXAttrs(convertToDefaultPath(path), names);
}
@Override public List<String> listXAttrs(Path path) throws IOException {
return fileSystem.listXAttrs(convertToDefaultPath(path));
}
@Override public void removeXAttr(Path path, String name) throws IOException {
fileSystem.removeXAttr(convertToDefaultPath(path), name);
}
@Override public void setConf(Configuration conf) {
// ignore this. we will set conf on init
}
@Override public Configuration getConf() {
return fileSystem.getConf();
}
@Override public int hashCode() {
return fileSystem.hashCode();
}
@Override public boolean equals(Object obj) {
return fileSystem.equals(obj);
}
@Override public String toString() {
return fileSystem.toString();
}
public Path convertToHoodiePath(Path oldPath) {
return convertPathWithScheme(oldPath, getHoodieScheme(fileSystem.getScheme()));
}
public static Path convertToHoodiePath(Path file, Configuration conf) {
String scheme = FileSystem.getDefaultUri(conf).getScheme();
return convertPathWithScheme(file, getHoodieScheme(scheme));
}
private Path convertToDefaultPath(Path oldPath) {
return convertPathWithScheme(oldPath, fileSystem.getScheme());
}
private Path[] convertDefaults(Path[] psrcs) {
Path[] psrcsNew = new Path[psrcs.length];
for (int i = 0; i < psrcs.length; i++) {
psrcsNew[i] = convertToDefaultPath(psrcs[i]);
}
return psrcsNew;
}
private static Path convertPathWithScheme(Path oldPath, String newScheme) {
URI oldURI = oldPath.toUri();
URI newURI;
try {
newURI = new URI(newScheme, oldURI.getUserInfo(), oldURI.getHost(), oldURI.getPort(),
oldURI.getPath(), oldURI.getQuery(), oldURI.getFragment());
return new Path(newURI);
} catch (URISyntaxException e) {
// TODO - Better Exception handling
throw new RuntimeException(e);
}
}
public static String getHoodieScheme(String scheme) {
String newScheme;
if (SUPPORT_SCHEMES.contains(scheme)) {
newScheme = HOODIE_SCHEME_PREFIX + scheme;
} else {
throw new IllegalArgumentException(
"HoodieWrapperFileSystem does not support scheme " + scheme);
}
return newScheme;
}
public long getBytesWritten(Path file) {
if (openStreams.containsKey(file.getName())) {
return openStreams.get(file.getName()).getBytesWritten();
}
// The file was not opened through this wrapper, so its size is not tracked
throw new IllegalArgumentException(file.toString()
+ " does not have an open stream. Cannot get the bytes written on the stream");
}
}
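
`convertPathWithScheme()` above rebuilds the path's URI with every component intact except the scheme, which gets the `hoodie-` prefix. A self-contained sketch of the same logic using plain `java.net.URI` instead of Hadoop's `Path` (illustration only; only the `file` and `hdfs` schemes are supported, as in the wrapper):

```java
import java.net.URI;
import java.net.URISyntaxException;

// Sketch of the scheme rewriting done by HoodieWrapperFileSystem, on plain
// java.net.URI. Mirrors getHoodieScheme() and convertPathWithScheme().
public class SchemeSketch {
    static final String HOODIE_SCHEME_PREFIX = "hoodie-";

    public static String toHoodieScheme(String scheme) {
        if (!scheme.equals("file") && !scheme.equals("hdfs")) {
            throw new IllegalArgumentException("Unsupported scheme: " + scheme);
        }
        return HOODIE_SCHEME_PREFIX + scheme;
    }

    // Rebuild the URI, keeping every component but the scheme.
    public static URI convert(URI old) {
        try {
            return new URI(toHoodieScheme(old.getScheme()), old.getUserInfo(),
                old.getHost(), old.getPort(), old.getPath(), old.getQuery(),
                old.getFragment());
        } catch (URISyntaxException e) {
            throw new RuntimeException(e);
        }
    }
}
```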


@@ -0,0 +1,59 @@
/*
* Copyright (c) 2016 Uber Technologies, Inc. (hoodie-dev-group@uber.com)
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package com.uber.hoodie.io.storage;
import org.apache.hadoop.fs.FSDataOutputStream;
import java.io.IOException;
import java.util.concurrent.atomic.AtomicLong;
/**
* Wrapper over <code>FSDataOutputStream</code> that keeps track of the number of bytes written.
* This gives a cheap way to check the underlying file size.
*/
public class SizeAwareFSDataOutputStream extends FSDataOutputStream {
// A callback to call when the output stream is closed.
private final Runnable closeCallback;
// Keep track of the bytes written
private final AtomicLong bytesWritten = new AtomicLong(0L);
public SizeAwareFSDataOutputStream(FSDataOutputStream out, Runnable closeCallback)
throws IOException {
super(out);
this.closeCallback = closeCallback;
}
@Override public synchronized void write(byte[] b, int off, int len) throws IOException {
bytesWritten.addAndGet(len);
super.write(b, off, len);
}
@Override public void write(byte[] b) throws IOException {
bytesWritten.addAndGet(b.length);
super.write(b);
}
@Override public void close() throws IOException {
super.close();
closeCallback.run();
}
public long getBytesWritten() {
return bytesWritten.get();
}
}
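The byte-counting technique above can be illustrated without Hadoop on the classpath. Below is a minimal, hypothetical sketch (the `CountingOutputStream` name and `java.io`-only types are mine, not Hoodie's): the same pattern of incrementing an `AtomicLong` in the `write` overrides and firing a callback on `close`.

```java
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical standalone sketch (not Hoodie code) of the counting technique
// used by SizeAwareFSDataOutputStream, built on java.io only.
class CountingOutputStream extends FilterOutputStream {
    private final AtomicLong bytesWritten = new AtomicLong(0L);
    private final Runnable closeCallback;

    CountingOutputStream(OutputStream out, Runnable closeCallback) {
        super(out);
        this.closeCallback = closeCallback;
    }

    @Override
    public void write(int b) throws IOException {
        bytesWritten.incrementAndGet();
        out.write(b);
    }

    @Override
    public void write(byte[] b, int off, int len) throws IOException {
        bytesWritten.addAndGet(len);
        // Delegate directly; FilterOutputStream would otherwise loop byte-by-byte.
        out.write(b, off, len);
    }

    @Override
    public void close() throws IOException {
        super.close();
        // Fire the callback only after the underlying stream has been closed.
        closeCallback.run();
    }

    long getBytesWritten() {
        return bytesWritten.get();
    }
}
```

Note that overriding `write(byte[], int, int)` also covers `write(byte[])`, since `FilterOutputStream` routes the latter through the former, so each byte is counted exactly once.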

View File

@@ -0,0 +1,148 @@
/*
* Copyright (c) 2016 Uber Technologies, Inc. (hoodie-dev-group@uber.com)
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package com.uber.hoodie.metrics;
import com.codahale.metrics.Gauge;
import com.codahale.metrics.MetricRegistry;
import com.codahale.metrics.Timer;
import com.google.common.annotations.VisibleForTesting;
import com.uber.hoodie.common.model.HoodieCommitMetadata;
import com.uber.hoodie.config.HoodieWriteConfig;
import org.apache.log4j.LogManager;
import org.apache.log4j.Logger;
/**
* Wrapper for metrics-related operations.
*/
public class HoodieMetrics {
private HoodieWriteConfig config = null;
private String tableName = null;
private static Logger logger = LogManager.getLogger(HoodieMetrics.class);
// Some timers
public String rollbackTimerName = null;
public String cleanTimerName = null;
public String commitTimerName = null;
private Timer rollbackTimer = null;
private Timer cleanTimer = null;
private Timer commitTimer = null;
public HoodieMetrics(HoodieWriteConfig config, String tableName) {
this.config = config;
this.tableName = tableName;
if (config.isMetricsOn()) {
Metrics.init(config);
this.rollbackTimerName = getMetricsName("timer", "rollback");
this.cleanTimerName = getMetricsName("timer", "clean");
this.commitTimerName = getMetricsName("timer", "commit");
}
}
private Timer createTimer(String name) {
return config.isMetricsOn() ? Metrics.getInstance().getRegistry().timer(name) : null;
}
public Timer.Context getRollbackCtx() {
if (config.isMetricsOn() && rollbackTimer == null) {
rollbackTimer = createTimer(rollbackTimerName);
}
return rollbackTimer == null ? null : rollbackTimer.time();
}
public Timer.Context getCleanCtx() {
if (config.isMetricsOn() && cleanTimer == null) {
cleanTimer = createTimer(cleanTimerName);
}
return cleanTimer == null ? null : cleanTimer.time();
}
public Timer.Context getCommitCtx() {
if (config.isMetricsOn() && commitTimer == null) {
commitTimer = createTimer(commitTimerName);
}
return commitTimer == null ? null : commitTimer.time();
}
public void updateCommitMetrics(long commitEpochTimeInMs, long durationInMs, HoodieCommitMetadata metadata) {
if (config.isMetricsOn()) {
long totalPartitionsWritten = metadata.fetchTotalPartitionsWritten();
long totalFilesInsert = metadata.fetchTotalFilesInsert();
long totalFilesUpdate = metadata.fetchTotalFilesUpdated();
long totalRecordsWritten = metadata.fetchTotalRecordsWritten();
long totalUpdateRecordsWritten = metadata.fetchTotalUpdateRecordsWritten();
long totalInsertRecordsWritten = metadata.fetchTotalInsertRecordsWritten();
long totalBytesWritten = metadata.fetchTotalBytesWritten();
registerGauge(getMetricsName("commit", "duration"), durationInMs);
registerGauge(getMetricsName("commit", "totalPartitionsWritten"), totalPartitionsWritten);
registerGauge(getMetricsName("commit", "totalFilesInsert"), totalFilesInsert);
registerGauge(getMetricsName("commit", "totalFilesUpdate"), totalFilesUpdate);
registerGauge(getMetricsName("commit", "totalRecordsWritten"), totalRecordsWritten);
registerGauge(getMetricsName("commit", "totalUpdateRecordsWritten"), totalUpdateRecordsWritten);
registerGauge(getMetricsName("commit", "totalInsertRecordsWritten"), totalInsertRecordsWritten);
registerGauge(getMetricsName("commit", "totalBytesWritten"), totalBytesWritten);
registerGauge(getMetricsName("commit", "commitTime"), commitEpochTimeInMs);
}
}
public void updateRollbackMetrics(long durationInMs, int numFilesDeleted) {
if (config.isMetricsOn()) {
logger.info(String.format("Sending rollback metrics (duration=%d, numFilesDeleted=%d)",
durationInMs, numFilesDeleted));
registerGauge(getMetricsName("rollback", "duration"), durationInMs);
registerGauge(getMetricsName("rollback", "numFilesDeleted"), numFilesDeleted);
}
}
public void updateCleanMetrics(long durationInMs, int numFilesDeleted) {
if (config.isMetricsOn()) {
logger.info(String.format("Sending clean metrics (duration=%d, numFilesDeleted=%d)",
durationInMs, numFilesDeleted));
registerGauge(getMetricsName("clean", "duration"), durationInMs);
registerGauge(getMetricsName("clean", "numFilesDeleted"), numFilesDeleted);
}
}
@VisibleForTesting
String getMetricsName(String action, String metric) {
return config == null ? null :
String.format("%s.%s.%s", tableName, action, metric);
}
void registerGauge(String metricName, final long value) {
try {
MetricRegistry registry = Metrics.getInstance().getRegistry();
registry.register(metricName, new Gauge<Long>() {
@Override
public Long getValue() {
return value;
}
});
} catch (Exception e) {
// Here we catch all exception, so the major upsert pipeline will not be affected if the metrics system
// has some issues.
logger.error("Failed to send metrics: ", e);
}
}
/**
* By default, the timer context returns the duration in nanoseconds;
* convert it to milliseconds.
*/
public long getDurationInMs(long ctxDuration) {
return ctxDuration / 1000000;
}
}
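The two conventions in `HoodieMetrics` above are easy to sketch in isolation: metric names are dot-joined as `<tableName>.<action>.<metric>`, and timer durations arrive in nanoseconds and are truncated down to milliseconds. This is a hypothetical standalone sketch (the `MetricNames` class is mine, not Hoodie's):

```java
import java.util.concurrent.TimeUnit;

// Hypothetical standalone sketch (not Hoodie code) mirroring
// HoodieMetrics.getMetricsName and HoodieMetrics.getDurationInMs.
class MetricNames {
    // Names are dot-joined: <tableName>.<action>.<metric>
    static String metricsName(String tableName, String action, String metric) {
        return String.format("%s.%s.%s", tableName, action, metric);
    }

    // Codahale timer contexts report nanoseconds; truncate to milliseconds.
    static long durationInMs(long ctxDurationNanos) {
        return TimeUnit.NANOSECONDS.toMillis(ctxDurationNanos);
    }
}
```

Using `TimeUnit.NANOSECONDS.toMillis` is equivalent to the `/ 1000000` division in the class above, but states the unit conversion explicitly.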

View File

@@ -0,0 +1,37 @@
/*
* Copyright (c) 2016 Uber Technologies, Inc. (hoodie-dev-group@uber.com)
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package com.uber.hoodie.metrics;
import java.io.Closeable;
/**
* Used for testing.
*/
public class InMemoryMetricsReporter extends MetricsReporter {
@Override
public void start() {
}
@Override
public void report() {
}
@Override
public Closeable getReporter() {
return null;
}
}

View File

@@ -0,0 +1,87 @@
/*
* Copyright (c) 2016 Uber Technologies, Inc. (hoodie-dev-group@uber.com)
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package com.uber.hoodie.metrics;
import com.codahale.metrics.MetricRegistry;
import com.google.common.io.Closeables;
import com.uber.hoodie.config.HoodieWriteConfig;
import com.uber.hoodie.config.HoodieMetricsConfig;
import com.uber.hoodie.exception.HoodieException;
import org.apache.commons.configuration.ConfigurationException;
import java.io.Closeable;
/**
* This is the main class of the metrics system. To use it,
* users need to call the {@link #init(HoodieWriteConfig) init} method to initialize the system.
* The input to {@link #init(HoodieWriteConfig) init} is a configuration object, where
* users can specify the reporter type and reporter-specific settings.
* See {@link MetricsConfiguration} for more configurable fields.
*/
public class Metrics {
private static volatile boolean initialized = false;
private static Metrics metrics = null;
private final MetricRegistry registry;
private MetricsReporter reporter = null;
private Metrics(HoodieWriteConfig metricConfig) throws ConfigurationException {
registry = new MetricRegistry();
reporter = MetricsReporterFactory.createReporter(metricConfig, registry);
if (reporter == null) {
throw new RuntimeException("Cannot initialize Reporter.");
}
// reporter.start();
Runtime.getRuntime().addShutdownHook(new Thread() {
@Override
public void run() {
try {
reporter.report();
Closeables.close(reporter.getReporter(), true);
} catch (Exception e) {
e.printStackTrace();
}
}
});
}
public static Metrics getInstance() {
assert initialized;
return metrics;
}
public static synchronized void init(HoodieWriteConfig metricConfig) {
if (initialized) {
return;
}
try {
metrics = new Metrics(metricConfig);
} catch (ConfigurationException e) {
throw new HoodieException(e);
}
initialized = true;
}
public MetricRegistry getRegistry() {
return registry;
}
public Closeable getReporter() {
return reporter.getReporter();
}
}
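The `Metrics` class above guards initialization with a `volatile` flag and a `synchronized` init, so the instance is constructed exactly once. A minimal, hypothetical sketch of that pattern (the `OnceInitialized` class and `String` config stand-in are mine, not Hoodie's):

```java
// Hypothetical standalone sketch (not Hoodie code) of the init-once singleton
// guard used by Metrics: a volatile flag read by getInstance, and a
// synchronized init that constructs the instance only on the first call.
class OnceInitialized {
    private static volatile boolean initialized = false;
    private static OnceInitialized instance = null;
    private final String config;

    private OnceInitialized(String config) {
        this.config = config;
    }

    static synchronized void init(String config) {
        if (initialized) {
            return; // later calls are no-ops; the first config wins
        }
        instance = new OnceInitialized(config);
        initialized = true;
    }

    static OnceInitialized getInstance() {
        if (!initialized) {
            throw new IllegalStateException("init() must be called first");
        }
        return instance;
    }

    String getConfig() {
        return config;
    }
}
```

One consequence of this design, visible in `Metrics` as well: a second `init` call with a different configuration is silently ignored.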

View File

@@ -0,0 +1,95 @@
/*
* Copyright (c) 2016 Uber Technologies, Inc. (hoodie-dev-group@uber.com)
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package com.uber.hoodie.metrics;
import com.codahale.metrics.MetricFilter;
import com.codahale.metrics.MetricRegistry;
import com.codahale.metrics.graphite.Graphite;
import com.codahale.metrics.graphite.GraphiteReporter;
import com.uber.hoodie.config.HoodieWriteConfig;
import org.apache.log4j.LogManager;
import org.apache.log4j.Logger;
import java.io.Closeable;
import java.net.InetSocketAddress;
import java.util.concurrent.TimeUnit;
/**
* Implementation of a Graphite reporter, which connects to the Graphite server
* and sends metrics to it.
*/
public class MetricsGraphiteReporter extends MetricsReporter {
private final MetricRegistry registry;
private final GraphiteReporter graphiteReporter;
private final HoodieWriteConfig config;
private String serverHost;
private int serverPort;
private static Logger logger = LogManager.getLogger(MetricsGraphiteReporter.class);
public MetricsGraphiteReporter(HoodieWriteConfig config, MetricRegistry registry) {
this.registry = registry;
this.config = config;
// Check the serverHost and serverPort here
this.serverHost = config.getGraphiteServerHost();
this.serverPort = config.getGraphiteServerPort();
if (serverHost == null || serverPort == 0) {
throw new RuntimeException(
String.format("Graphite cannot be initialized with serverHost[%s] and serverPort[%s].",
serverHost, serverPort));
}
this.graphiteReporter = createGraphiteReport();
}
@Override
public void start() {
if (graphiteReporter != null) {
graphiteReporter.start(30, TimeUnit.SECONDS);
} else {
logger.error("Cannot start as the graphiteReporter is null.");
}
}
@Override
public void report() {
if (graphiteReporter != null) {
graphiteReporter.report();
} else {
logger.error("Cannot report metrics as the graphiteReporter is null.");
}
}
@Override
public Closeable getReporter() {
return graphiteReporter;
}
private GraphiteReporter createGraphiteReport() {
Graphite graphite = new Graphite(
new InetSocketAddress(serverHost, serverPort));
String reporterPrefix = config.getGraphiteMetricPrefix();
return GraphiteReporter.forRegistry(registry)
.prefixedWith(reporterPrefix)
.convertRatesTo(TimeUnit.SECONDS)
.convertDurationsTo(TimeUnit.MILLISECONDS)
.filter(MetricFilter.ALL)
.build(graphite);
}
}

View File

@@ -0,0 +1,36 @@
/*
* Copyright (c) 2016 Uber Technologies, Inc. (hoodie-dev-group@uber.com)
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package com.uber.hoodie.metrics;
import java.io.Closeable;
/**
* Base class for implementing a metrics reporter.
*/
public abstract class MetricsReporter {
/**
* Push out metrics at scheduled intervals
*/
public abstract void start();
/**
* Deterministically push out metrics
*/
public abstract void report();
public abstract Closeable getReporter();
}

View File

@@ -0,0 +1,48 @@
/*
* Copyright (c) 2016 Uber Technologies, Inc. (hoodie-dev-group@uber.com)
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package com.uber.hoodie.metrics;
import com.codahale.metrics.MetricRegistry;
import com.uber.hoodie.config.HoodieWriteConfig;
import org.apache.log4j.LogManager;
import org.apache.log4j.Logger;
/**
* Factory class for creating MetricsReporter.
*/
public class MetricsReporterFactory {
private static Logger logger = LogManager.getLogger(MetricsReporterFactory.class);
public static MetricsReporter createReporter(HoodieWriteConfig config,
MetricRegistry registry) {
MetricsReporterType type = config.getMetricsReporterType();
MetricsReporter reporter = null;
switch (type) {
case GRAPHITE:
reporter = new MetricsGraphiteReporter(config, registry);
break;
case INMEMORY:
reporter = new InMemoryMetricsReporter();
break;
default:
logger.error("Reporter type[" + type + "] is not supported.");
break;
}
return reporter;
}
}

View File

@@ -0,0 +1,26 @@
/*
* Copyright (c) 2016 Uber Technologies, Inc. (hoodie-dev-group@uber.com)
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package com.uber.hoodie.metrics;
/**
* Types of supported reporters. GRAPHITE pushes metrics to a Graphite server;
* INMEMORY is used for testing. JMX and CSV reporters may be added in the future.
*/
public enum MetricsReporterType {
GRAPHITE,
INMEMORY
}

View File

@@ -0,0 +1,451 @@
/*
* Copyright (c) 2016 Uber Technologies, Inc. (hoodie-dev-group@uber.com)
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package com.uber.hoodie.table;
import com.uber.hoodie.config.HoodieWriteConfig;
import com.uber.hoodie.WriteStatus;
import com.uber.hoodie.common.model.HoodieCommitMetadata;
import com.uber.hoodie.common.model.HoodieKey;
import com.uber.hoodie.common.model.HoodieRecord;
import com.uber.hoodie.common.model.HoodieRecordLocation;
import com.uber.hoodie.common.model.HoodieRecordPayload;
import com.uber.hoodie.common.model.HoodieTableMetadata;
import com.uber.hoodie.common.util.FSUtils;
import com.uber.hoodie.exception.HoodieUpsertException;
import com.uber.hoodie.func.LazyInsertIterable;
import com.uber.hoodie.io.HoodieUpdateHandle;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.generic.IndexedRecord;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.log4j.LogManager;
import org.apache.log4j.Logger;
import org.apache.parquet.avro.AvroParquetReader;
import org.apache.parquet.avro.AvroReadSupport;
import org.apache.parquet.hadoop.ParquetReader;
import org.apache.spark.Partitioner;
import java.io.IOException;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.Random;
import java.util.Set;
import scala.Option;
import scala.Tuple2;
/**
* Implementation of a heavily read-optimized Hoodie Table where
*
* INSERTS - Produce new files, block aligned to desired size (or)
* merge with the smallest existing file, to expand it
*
* UPDATES - Produce a new version of the file, with the updated records merged in
*
*/
public class HoodieCopyOnWriteTable<T extends HoodieRecordPayload> extends HoodieTable {
// seed for random number generator. No particular significance, just makes testing deterministic
private static final long RANDOM_NUMBER_SEED = 356374L;
private static Logger logger = LogManager.getLogger(HoodieCopyOnWriteTable.class);
enum BucketType {
UPDATE,
INSERT
}
/**
* Helper class for a small file's location and its actual size on disk
*/
class SmallFile implements Serializable {
HoodieRecordLocation location;
long sizeBytes;
@Override
public String toString() {
final StringBuilder sb = new StringBuilder("SmallFile {");
sb.append("location=").append(location).append(", ");
sb.append("sizeBytes=").append(sizeBytes);
sb.append('}');
return sb.toString();
}
}
/**
* Helper class for an insert bucket along with its weight in [0.0, 1.0],
* which defines the fraction of incoming inserts that should be allocated to
* the bucket
*/
class InsertBucket implements Serializable {
int bucketNumber;
// fraction of total inserts, that should go into this bucket
double weight;
@Override
public String toString() {
final StringBuilder sb = new StringBuilder("InsertBucket {");
sb.append("bucketNumber=").append(bucketNumber).append(", ");
sb.append("weight=").append(weight);
sb.append('}');
return sb.toString();
}
}
/**
* Helper class for a bucket's type (INSERT or UPDATE) and its file location
*/
class BucketInfo implements Serializable {
BucketType bucketType;
String fileLoc;
@Override
public String toString() {
final StringBuilder sb = new StringBuilder("BucketInfo {");
sb.append("bucketType=").append(bucketType).append(", ");
sb.append("fileLoc=").append(fileLoc);
sb.append('}');
return sb.toString();
}
}
public HoodieCopyOnWriteTable(String commitTime, HoodieWriteConfig config, HoodieTableMetadata metadata) {
super(commitTime, config, metadata);
}
/**
* Packs incoming records to be upserted, into buckets (1 bucket = 1 RDD partition)
*/
class UpsertPartitioner extends Partitioner {
/**
* Total number of RDD partitions, determined by the total number of buckets we want to
* pack the incoming workload into
*/
private int totalBuckets = 0;
/**
* Helps decide which bucket an incoming update should go to.
*/
private HashMap<String, Integer> updateLocationToBucket;
/**
* Helps us pack inserts into 1 or more buckets depending on number of
* incoming records.
*/
private HashMap<String, List<InsertBucket>> partitionPathToInsertBuckets;
/**
* Remembers what type each bucket is for later.
*/
private HashMap<Integer, BucketInfo> bucketInfoMap;
/**
* Random number generator to use for splitting inserts into buckets by weight
*/
private Random rand = new Random(RANDOM_NUMBER_SEED);
UpsertPartitioner(WorkloadProfile profile) {
updateLocationToBucket = new HashMap<>();
partitionPathToInsertBuckets = new HashMap<>();
bucketInfoMap = new HashMap<>();
assignUpdates(profile);
assignInserts(profile);
logger.info("Total Buckets :" + totalBuckets + ", " +
"buckets info => " + bucketInfoMap + ", \n" +
"Partition to insert buckets => " + partitionPathToInsertBuckets + ", \n" +
"UpdateLocations mapped to buckets =>" + updateLocationToBucket);
}
private void assignUpdates(WorkloadProfile profile) {
// each update location gets a partition
WorkloadStat gStat = profile.getGlobalStat();
for (Map.Entry<String, Long> updateLocEntry : gStat.getUpdateLocationToCount().entrySet()) {
addUpdateBucket(updateLocEntry.getKey());
}
}
private int addUpdateBucket(String fileLoc) {
int bucket = totalBuckets;
updateLocationToBucket.put(fileLoc, bucket);
BucketInfo bucketInfo = new BucketInfo();
bucketInfo.bucketType = BucketType.UPDATE;
bucketInfo.fileLoc = fileLoc;
bucketInfoMap.put(totalBuckets, bucketInfo);
totalBuckets++;
return bucket;
}
private void assignInserts(WorkloadProfile profile) {
// for new inserts, compute buckets depending on how many records we have for each partition
Set<String> partitionPaths = profile.getPartitionPaths();
long averageRecordSize = averageBytesPerRecord();
logger.info("AvgRecordSize => " + averageRecordSize);
for (String partitionPath : partitionPaths) {
WorkloadStat pStat = profile.getWorkloadStat(partitionPath);
if (pStat.getNumInserts() > 0) {
List<SmallFile> smallFiles = getSmallFiles(partitionPath);
logger.info("For partitionPath : "+ partitionPath + " Small Files => " + smallFiles);
long totalUnassignedInserts = pStat.getNumInserts();
List<Integer> bucketNumbers = new ArrayList<>();
List<Long> recordsPerBucket = new ArrayList<>();
// first try packing this into one of the smallFiles
for (SmallFile smallFile: smallFiles) {
long recordsToAppend = Math.min((config.getParquetMaxFileSize() - smallFile.sizeBytes)/ averageRecordSize, totalUnassignedInserts);
if (recordsToAppend > 0 && totalUnassignedInserts > 0){
// create a new bucket or re-use an existing bucket
int bucket;
if (updateLocationToBucket.containsKey(smallFile.location.getFileId())) {
bucket = updateLocationToBucket.get(smallFile.location.getFileId());
logger.info("Assigning " + recordsToAppend + " inserts to existing update bucket "+ bucket);
} else {
bucket = addUpdateBucket(smallFile.location.getFileId());
logger.info("Assigning " + recordsToAppend + " inserts to new update bucket "+ bucket);
}
bucketNumbers.add(bucket);
recordsPerBucket.add(recordsToAppend);
totalUnassignedInserts -= recordsToAppend;
}
}
// if we have anything more, create new insert buckets, like normal
if (totalUnassignedInserts > 0) {
long insertRecordsPerBucket = config.getCopyOnWriteInsertSplitSize();
if (config.shouldAutoTuneInsertSplits()) {
insertRecordsPerBucket = config.getParquetMaxFileSize()/averageRecordSize;
}
int insertBuckets = (int) Math.max(totalUnassignedInserts / insertRecordsPerBucket, 1L);
logger.info("After small file assignment: unassignedInserts => " + totalUnassignedInserts
+ ", totalInsertBuckets => " + insertBuckets
+ ", recordsPerBucket => " + insertRecordsPerBucket);
for (int b = 0; b < insertBuckets; b++) {
bucketNumbers.add(totalBuckets);
recordsPerBucket.add(totalUnassignedInserts/insertBuckets);
BucketInfo bucketInfo = new BucketInfo();
bucketInfo.bucketType = BucketType.INSERT;
bucketInfoMap.put(totalBuckets, bucketInfo);
totalBuckets++;
}
}
// Go over all such buckets, and assign weights as per amount of incoming inserts.
List<InsertBucket> insertBuckets = new ArrayList<>();
for (int i = 0; i < bucketNumbers.size(); i++) {
InsertBucket bkt = new InsertBucket();
bkt.bucketNumber = bucketNumbers.get(i);
bkt.weight = (1.0 * recordsPerBucket.get(i))/pStat.getNumInserts();
insertBuckets.add(bkt);
}
logger.info("Total insert buckets for partition path "+ partitionPath + " => " + insertBuckets);
partitionPathToInsertBuckets.put(partitionPath, insertBuckets);
}
}
}
/**
* Returns a list of small files in the given partition path
*
* @param partitionPath
* @return
*/
private List<SmallFile> getSmallFiles(String partitionPath) {
FileSystem fs = FSUtils.getFs();
List<SmallFile> smallFileLocations = new ArrayList<>();
if (metadata.getAllCommits().getNumCommits() > 0) { // if we have some commits
String latestCommitTime = metadata.getAllCommits().lastCommit();
FileStatus[] allFiles = metadata.getLatestVersionInPartition(fs, partitionPath, latestCommitTime);
if (allFiles != null && allFiles.length > 0) {
for (FileStatus fileStatus : allFiles) {
if (fileStatus.getLen() < config.getParquetSmallFileLimit()) {
String filename = fileStatus.getPath().getName();
SmallFile sf = new SmallFile();
sf.location = new HoodieRecordLocation(
FSUtils.getCommitTime(filename),
FSUtils.getFileId(filename));
sf.sizeBytes = fileStatus.getLen();
smallFileLocations.add(sf);
}
}
}
}
return smallFileLocations;
}
/**
* Obtains the average record size based on records written during last commit.
* Used for estimating how many records pack into one file.
*
* @return
*/
private long averageBytesPerRecord() {
long avgSize = 0L;
try {
if (metadata.getAllCommits().getNumCommits() > 0) {
String latestCommitTime = metadata.getAllCommits().lastCommit();
HoodieCommitMetadata commitMetadata = metadata.getCommitMetadata(latestCommitTime);
avgSize =(long) Math.ceil((1.0 * commitMetadata.fetchTotalBytesWritten())/commitMetadata.fetchTotalRecordsWritten());
}
} catch (Throwable t) {
// make this fail safe.
logger.error("Error trying to compute average bytes/record ", t);
}
return avgSize <= 0L ? config.getCopyOnWriteRecordSizeEstimate() : avgSize;
}
public BucketInfo getBucketInfo(int bucketNumber) {
return bucketInfoMap.get(bucketNumber);
}
public List<InsertBucket> getInsertBuckets(String partitionPath) {
return partitionPathToInsertBuckets.get(partitionPath);
}
@Override
public int numPartitions() {
return totalBuckets;
}
@Override
public int getPartition(Object key) {
Tuple2<HoodieKey, Option<HoodieRecordLocation>> keyLocation = (Tuple2<HoodieKey, Option<HoodieRecordLocation>>) key;
if (keyLocation._2().isDefined()) {
HoodieRecordLocation location = keyLocation._2().get();
return updateLocationToBucket.get(location.getFileId());
} else {
List<InsertBucket> targetBuckets = partitionPathToInsertBuckets.get(keyLocation._1().getPartitionPath());
// pick the target bucket to use based on the weights.
double totalWeight = 0.0;
double r = rand.nextDouble();
for (InsertBucket insertBucket: targetBuckets) {
totalWeight += insertBucket.weight;
if (r <= totalWeight) {
return insertBucket.bucketNumber;
}
}
// return first one, by default
return targetBuckets.get(0).bucketNumber;
}
}
}
@Override
public Partitioner getUpsertPartitioner(WorkloadProfile profile) {
if (profile == null) {
throw new HoodieUpsertException("Need workload profile to construct the upsert partitioner.");
}
return new UpsertPartitioner(profile);
}
@Override
public Partitioner getInsertPartitioner(WorkloadProfile profile) {
return null;
}
@Override
public boolean isWorkloadProfileNeeded() {
return true;
}
public Iterator<List<WriteStatus>> handleUpdate(String fileLoc, Iterator<HoodieRecord<T>> recordItr) throws Exception {
// these are updates
HoodieUpdateHandle upsertHandle =
new HoodieUpdateHandle<>(config, commitTime, metadata, recordItr, fileLoc);
if (upsertHandle.getOldFilePath() == null) {
logger.error("Error in finding the old file path at commit " + commitTime);
} else {
Configuration conf = FSUtils.getFs().getConf();
AvroReadSupport.setAvroReadSchema(conf, upsertHandle.getSchema());
ParquetReader<IndexedRecord> reader =
AvroParquetReader.builder(upsertHandle.getOldFilePath()).withConf(conf).build();
try {
IndexedRecord record;
while ((record = reader.read()) != null) {
// Two types of writes here (new records and old records).
// Exceptions during writes of new records are already caught,
// but for old records we should fail if any exception happens.
upsertHandle.write((GenericRecord) record);
}
} catch (IOException e) {
throw new HoodieUpsertException(
"Failed to read record from " + upsertHandle.getOldFilePath()
+ " with new Schema " + upsertHandle.getSchema(), e);
} finally {
reader.close();
upsertHandle.close();
}
}
if (upsertHandle.getWriteStatus().getPartitionPath() == null) {
logger.info(
"Upsert Handle has partition path as null " + upsertHandle.getOldFilePath()
+ ", " + upsertHandle.getWriteStatus());
}
return Collections.singletonList(Collections.singletonList(upsertHandle.getWriteStatus())).iterator();
}
public Iterator<List<WriteStatus>> handleInsert(Iterator<HoodieRecord<T>> recordItr) throws Exception {
return new LazyInsertIterable<>(recordItr, config, commitTime, metadata);
}
@Override
public Iterator<List<WriteStatus>> handleUpsertPartition(Integer partition,
Iterator recordItr,
Partitioner partitioner) {
UpsertPartitioner upsertPartitioner = (UpsertPartitioner) partitioner;
BucketInfo binfo = upsertPartitioner.getBucketInfo(partition);
BucketType btype = binfo.bucketType;
try {
if (btype.equals(BucketType.INSERT)) {
return handleInsert(recordItr);
} else if (btype.equals(BucketType.UPDATE)) {
return handleUpdate(binfo.fileLoc, recordItr);
} else {
throw new HoodieUpsertException("Unknown bucketType " + btype + " for partition :" + partition);
}
} catch (Throwable t) {
String msg = "Error upserting bucketType " + btype + " for partition :" + partition;
logger.error(msg, t);
throw new HoodieUpsertException(msg, t);
}
}
}
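The insert-routing step in `UpsertPartitioner.getPartition` above is cumulative-weight sampling: a random draw in [0, 1) is compared against the running sum of bucket weights, and the first bucket whose cumulative weight covers the draw wins. A hypothetical standalone sketch of just that step (the `WeightedBucketPicker` class is mine; the draw is passed in explicitly instead of coming from the seeded `Random`):

```java
import java.util.List;

// Hypothetical standalone sketch (not Hoodie code) of the cumulative-weight
// sampling used by UpsertPartitioner.getPartition to spread inserts across buckets.
class WeightedBucketPicker {
    static class InsertBucket {
        final int bucketNumber;
        final double weight; // fraction of the partition's inserts for this bucket

        InsertBucket(int bucketNumber, double weight) {
            this.bucketNumber = bucketNumber;
            this.weight = weight;
        }
    }

    // Walk the buckets, accumulating weight, and stop at the first bucket whose
    // cumulative weight covers the random draw r in [0, 1).
    static int pick(List<InsertBucket> buckets, double r) {
        double totalWeight = 0.0;
        for (InsertBucket b : buckets) {
            totalWeight += b.weight;
            if (r <= totalWeight) {
                return b.bucketNumber;
            }
        }
        // Fall back to the first bucket, guarding against floating-point shortfall
        // when the weights sum to slightly less than 1.0.
        return buckets.get(0).bucketNumber;
    }
}
```

Over many draws, each bucket receives inserts roughly in proportion to its weight, which is exactly what the per-partition weight assignment in `assignInserts` is set up to achieve.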

View File

@@ -0,0 +1,98 @@
/*
* Copyright (c) 2016 Uber Technologies, Inc. (hoodie-dev-group@uber.com)
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package com.uber.hoodie.table;

import com.uber.hoodie.WriteStatus;
import com.uber.hoodie.common.model.HoodieRecord;
import com.uber.hoodie.common.model.HoodieRecordPayload;
import com.uber.hoodie.common.model.HoodieTableMetadata;
import com.uber.hoodie.common.model.HoodieTableType;
import com.uber.hoodie.config.HoodieWriteConfig;
import com.uber.hoodie.exception.HoodieException;

import org.apache.spark.Partitioner;

import java.io.Serializable;
import java.util.Iterator;
import java.util.List;

/**
 * Abstract implementation of a HoodieTable
 */
public abstract class HoodieTable<T extends HoodieRecordPayload> implements Serializable {
  protected final String commitTime;
  protected final HoodieWriteConfig config;
  protected final HoodieTableMetadata metadata;

  protected HoodieTable(String commitTime, HoodieWriteConfig config, HoodieTableMetadata metadata) {
    this.commitTime = commitTime;
    this.config = config;
    this.metadata = metadata;
  }

  /**
   * Provides a partitioner to perform the upsert operation, based on the
   * workload profile
   *
   * @return the partitioner to use for the upsert
   */
  public abstract Partitioner getUpsertPartitioner(WorkloadProfile profile);

  /**
   * Provides a partitioner to perform the insert operation, based on the workload profile
   *
   * @return the partitioner to use for the insert
   */
  public abstract Partitioner getInsertPartitioner(WorkloadProfile profile);

  /**
   * Returns whether this HoodieTable implementation can benefit from workload
   * profiling
   *
   * @return true if a workload profile should be built before writing
   */
  public abstract boolean isWorkloadProfileNeeded();

  /**
   * Perform the actual IO for a given upserted (RDD) partition
   *
   * @param partition      the (RDD) partition number being handled
   * @param recordIterator iterator over the records in this partition
   * @param partitioner    the partitioner used to assign buckets
   */
  public abstract Iterator<List<WriteStatus>> handleUpsertPartition(Integer partition,
      Iterator<HoodieRecord<T>> recordIterator, Partitioner partitioner);

  public static HoodieTable getHoodieTable(HoodieTableType type, String commitTime,
      HoodieWriteConfig config, HoodieTableMetadata metadata) {
    if (type == HoodieTableType.COPY_ON_WRITE) {
      return new HoodieCopyOnWriteTable(commitTime, config, metadata);
    } else {
      throw new HoodieException("Unsupported table type :" + type);
    }
  }
}
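The static `getHoodieTable` factory above dispatches on `HoodieTableType`, and at this point only `COPY_ON_WRITE` is supported. A minimal standalone sketch of that dispatch pattern, using simplified stand-in types (`TableType`, `Table`, `CopyOnWriteTable` here are illustrative, not the real Hoodie classes):

```java
// Standalone sketch of the HoodieTable.getHoodieTable dispatch; TableType,
// Table and CopyOnWriteTable are simplified stand-ins, not the real classes.
public class TableFactorySketch {

    enum TableType { COPY_ON_WRITE, MERGE_ON_READ }

    interface Table {}

    static class CopyOnWriteTable implements Table {}

    // Mirrors the factory: construct the only supported implementation,
    // reject everything else loudly.
    static Table getTable(TableType type) {
        if (type == TableType.COPY_ON_WRITE) {
            return new CopyOnWriteTable();
        }
        throw new IllegalArgumentException("Unsupported table type :" + type);
    }

    public static void main(String[] args) {
        System.out.println(getTable(TableType.COPY_ON_WRITE).getClass().getSimpleName());
        try {
            getTable(TableType.MERGE_ON_READ);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Failing fast on an unknown type keeps the factory honest as new table types are added later.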


@@ -0,0 +1,115 @@
/*
* Copyright (c) 2016 Uber Technologies, Inc. (hoodie-dev-group@uber.com)
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package com.uber.hoodie.table;

import com.uber.hoodie.common.model.HoodieRecord;
import com.uber.hoodie.common.model.HoodieRecordLocation;
import com.uber.hoodie.common.model.HoodieRecordPayload;

import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.function.PairFunction;

import java.io.Serializable;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

import scala.Option;
import scala.Tuple2;

/**
 * Information about incoming records for upsert/insert obtained either via sampling or
 * introspecting the data fully
 *
 * TODO(vc): Think about obtaining this directly from index.tagLocation
 */
public class WorkloadProfile<T extends HoodieRecordPayload> implements Serializable {
  /**
   * Input workload
   */
  private final JavaRDD<HoodieRecord<T>> taggedRecords;

  /**
   * Computed workload profile
   */
  private final HashMap<String, WorkloadStat> partitionPathStatMap;

  private final WorkloadStat globalStat;

  public WorkloadProfile(JavaRDD<HoodieRecord<T>> taggedRecords) {
    this.taggedRecords = taggedRecords;
    this.partitionPathStatMap = new HashMap<>();
    this.globalStat = new WorkloadStat();
    buildProfile();
  }

  private void buildProfile() {
    // Count records per (partitionPath, currentLocation) pair; a defined location
    // means an update to that file, an empty one means a fresh insert.
    Map<Tuple2<String, Option<HoodieRecordLocation>>, Object> partitionLocationCounts =
        taggedRecords
            .mapToPair(new PairFunction<HoodieRecord<T>, Tuple2<String, Option<HoodieRecordLocation>>, HoodieRecord<T>>() {
              @Override
              public Tuple2<Tuple2<String, Option<HoodieRecordLocation>>, HoodieRecord<T>> call(
                  HoodieRecord<T> record) throws Exception {
                return new Tuple2<>(
                    new Tuple2<>(record.getPartitionPath(), Option.apply(record.getCurrentLocation())),
                    record);
              }
            }).countByKey();

    for (Map.Entry<Tuple2<String, Option<HoodieRecordLocation>>, Object> e : partitionLocationCounts.entrySet()) {
      String partitionPath = e.getKey()._1();
      Long count = (Long) e.getValue();
      Option<HoodieRecordLocation> locOption = e.getKey()._2();

      if (!partitionPathStatMap.containsKey(partitionPath)) {
        partitionPathStatMap.put(partitionPath, new WorkloadStat());
      }

      if (locOption.isDefined()) {
        // update
        partitionPathStatMap.get(partitionPath).addUpdates(locOption.get(), count);
        globalStat.addUpdates(locOption.get(), count);
      } else {
        // insert
        partitionPathStatMap.get(partitionPath).addInserts(count);
        globalStat.addInserts(count);
      }
    }
  }

  public WorkloadStat getGlobalStat() {
    return globalStat;
  }

  public Set<String> getPartitionPaths() {
    return partitionPathStatMap.keySet();
  }

  public WorkloadStat getWorkloadStat(String partitionPath) {
    return partitionPathStatMap.get(partitionPath);
  }

  @Override
  public String toString() {
    final StringBuilder sb = new StringBuilder("WorkloadProfile {");
    sb.append("globalStat=").append(globalStat).append(", ");
    sb.append("partitionStat=").append(partitionPathStatMap);
    sb.append('}');
    return sb.toString();
  }
}
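The `countByKey` step in `buildProfile` is the heart of the profile: records are keyed by `(partitionPath, location)`, and the resulting counts are split into insert and update tallies depending on whether a location is present. A standalone sketch of the same grouping, assuming a plain `List` in place of the `JavaRDD` and `Optional<String>` in place of `Option<HoodieRecordLocation>`:

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.*;
import java.util.stream.*;

// Standalone sketch of WorkloadProfile.buildProfile's grouping, using a plain
// List instead of a JavaRDD and Optional<String> as the record location.
public class ProfileSketch {

    // Returns {inserts, updates} totals, mirroring the global stat.
    static long[] profile(List<SimpleEntry<String, Optional<String>>> tagged) {
        // Equivalent of mapToPair(...).countByKey(): count per (partition, location)
        Map<SimpleEntry<String, Optional<String>>, Long> counts = tagged.stream()
                .collect(Collectors.groupingBy(e -> e, Collectors.counting()));

        long inserts = 0, updates = 0;
        for (Map.Entry<SimpleEntry<String, Optional<String>>, Long> e : counts.entrySet()) {
            if (e.getKey().getValue().isPresent()) {
                updates += e.getValue();   // a known location means an update
            } else {
                inserts += e.getValue();   // no location yet means a fresh insert
            }
        }
        return new long[] {inserts, updates};
    }

    public static void main(String[] args) {
        List<SimpleEntry<String, Optional<String>>> tagged = Arrays.asList(
                new SimpleEntry<>("2016/01/01", Optional.<String>empty()),
                new SimpleEntry<>("2016/01/01", Optional.<String>empty()),
                new SimpleEntry<>("2016/01/01", Optional.of("file-1")),
                new SimpleEntry<>("2016/01/02", Optional.of("file-2")));
        System.out.println(Arrays.toString(profile(tagged)));
        // prints "[2, 2]"
    }
}
```

In the real class the per-key counts also feed per-partition `WorkloadStat` entries; the sketch only reproduces the global totals.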


@@ -0,0 +1,67 @@
/*
* Copyright (c) 2016 Uber Technologies, Inc. (hoodie-dev-group@uber.com)
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package com.uber.hoodie.table;

import com.uber.hoodie.common.model.HoodieRecordLocation;

import java.io.Serializable;
import java.util.HashMap;

/**
 * Wraps stats about a single partition path.
 */
public class WorkloadStat implements Serializable {
  private long numInserts = 0L;

  private long numUpdates = 0L;

  private HashMap<String, Long> updateLocationToCount;

  public WorkloadStat() {
    updateLocationToCount = new HashMap<>();
  }

  long addInserts(long numInserts) {
    return this.numInserts += numInserts;
  }

  long addUpdates(HoodieRecordLocation location, long numUpdates) {
    // accumulate per-file counts, instead of overwriting on repeated calls for the same file
    updateLocationToCount.merge(location.getFileId(), numUpdates, Long::sum);
    return this.numUpdates += numUpdates;
  }

  public long getNumUpdates() {
    return numUpdates;
  }

  public long getNumInserts() {
    return numInserts;
  }

  public HashMap<String, Long> getUpdateLocationToCount() {
    return updateLocationToCount;
  }

  @Override
  public String toString() {
    final StringBuilder sb = new StringBuilder("WorkloadStat {");
    sb.append("numInserts=").append(numInserts).append(", ");
    sb.append("numUpdates=").append(numUpdates);
    sb.append('}');
    return sb.toString();
  }
}
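To make the accumulation semantics concrete, here is a minimal standalone re-implementation of the two counters. It is a sketch, with a plain `String` fileId standing in for `HoodieRecordLocation`, and it uses `HashMap.merge` so that repeated updates against the same file accumulate rather than overwrite:

```java
import java.util.*;

// Standalone sketch of WorkloadStat's counters, with a plain String fileId
// standing in for HoodieRecordLocation.
public class StatSketch {

    static class Stat {
        long numInserts = 0L;
        long numUpdates = 0L;
        Map<String, Long> updateLocationToCount = new HashMap<>();

        long addInserts(long n) {
            return numInserts += n;
        }

        long addUpdates(String fileId, long n) {
            // merge so repeated updates to the same file accumulate
            updateLocationToCount.merge(fileId, n, Long::sum);
            return numUpdates += n;
        }
    }

    public static void main(String[] args) {
        Stat s = new Stat();
        s.addInserts(10);
        s.addUpdates("file-1", 3);
        s.addUpdates("file-1", 2);
        System.out.println(s.numInserts + " " + s.numUpdates + " "
                + s.updateLocationToCount.get("file-1"));
        // prints "10 5 5"
    }
}
```

The per-file map is what lets the upsert partitioner later size buckets by how many updates each file will receive.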