Commit Graph

  • f120427607 HUDI-105 : Fix up offsets not available on leader exception (#650) leiline 2019-05-24 10:32:31 +08:00
  • 2fe526d548 Allow users to set hoodie configs for Compactor, Cleaner and HDFSParquetImporter utility scripts Balaji Varadarajan 2019-05-23 16:35:30 -07:00
  • 145034c5fa Spark Stage retry handling Balaji Varadarajan 2019-03-08 15:05:33 -08:00
  • 3fd2fd6e9d Remove redundant string from file comp rdd David Muto (pseudomuto) 2019-05-10 10:10:51 -04:00
  • a7e6cf5197 Support nested types for recordKey, partitionPath and combineKey Balaji Varadarajan 2019-05-14 19:59:04 -07:00
  • e43efa042f Downgrading fasterxml jackson to 2.6.7 to be spark compatible Vinoth Chandar 2019-05-15 19:54:19 -07:00
  • 64fec64097 Timeline Service with Incremental View Syncing support Balaji Varadarajan 2019-02-12 21:29:14 -08:00
  • 446f99aa0f [maven-release-plugin] prepare for next development iteration vinothchandar 2019-05-14 07:29:22 -07:00
  • cc38abecc8 [maven-release-plugin] prepare release hoodie-0.4.6 vinothchandar 2019-05-14 07:29:11 -07:00
  • 7002ca6775 Update release notes for 0.4.6 release Vinoth Chandar 2019-05-14 05:15:29 -07:00
  • 6e1e626357 Minor CLI documentation change in delta-streamer Balaji Varadarajan 2019-05-13 22:18:20 -07:00
  • af46078a82 converting map task memory from mb to bytes Nishith Agarwal 2019-05-13 20:30:30 -07:00
  • 9cce9abf4d Fix various errors found by long running delta-streamer tests 1. Parquet Avro schema mismatch errors when ingesting are sometimes silently ignored due to race-condition in BoundedInMemoryExecutor. This was reproducible when running long-running delta-streamer with wrong schema and it caused data-loss 2. Fix behavior of Delta-Streamer to error out by default if there are any error records 3. Fix a bug in tracking write errors in WriteStats. Earlier the write errors were tracking sampled errors as opposed to total errors. 4. Delta Streamer does not commit the changes done as part of inline compaction as auto-commit is force disabled. Fix this behavior to always auto-commit inline compaction as it would not otherwise commit. Balaji Varadarajan 2019-05-12 11:57:04 -07:00
  • a0e62b7919 Bucketized Bloom Filter checking Vinoth Chandar 2019-05-08 20:20:58 -07:00
  • 4b27cc72bb Don't raise when spark-defaults.conf doesn't exist David Muto (pseudomuto) 2019-05-07 21:16:58 -04:00
  • e2dcef8606 HUDI-101: added exclusion filters for signature files. Abhishek Sharma 2019-05-07 15:17:36 -04:00
  • 738635306b migrating kryo's dependency from twitter chill to plain kryo library Omkar Joshi 2019-04-30 19:00:30 -07:00
  • a33a55fcb5 Caching Avro Binary encoder/decoder to avoid creating new one for every record Nishith Agarwal 2019-05-05 21:26:16 -07:00
  • ee1feb7c75 Revert "HUDI-101: added mevn-shade plugin with filters." Creates fat jars for all hoodie packages Balaji Varadarajan 2019-05-05 18:30:22 -07:00
  • f47f0eb6cb HUDI-101: added mevn-shade plugin with filters. Abhishek Sharma 2019-05-01 15:45:39 -04:00
  • 978470af33 Rollback inflights when using Spark [Streaming] write Balaji Varadarajan 2019-05-01 17:20:35 -07:00
  • 57a8b9cc8c Making DataSource/DeltaStreamer use defaults for combining vinothchandar 2019-05-01 05:06:34 -07:00
  • ea20d47248 Introduce config to control interval tree pruning Vinoth Chandar 2019-04-18 15:21:52 -07:00
  • 7129dc5bb7 Improving Tag location using interval trees for index files Sivabalan Narayanan 2018-11-28 11:00:02 -08:00
  • 461ce18bd1 Fix to enable hoodie.datasource.read.incr.filters Naoki Takezoe 2019-04-27 01:09:38 +09:00
  • 26f24b6728 Removing OLD MAGIC header since a) it's no longer used b) causes issues when the data actually has OLD MAGIC Nishith Agarwal 2019-04-19 11:21:45 -07:00
  • 2f1e3e15fb Revert "Read and apply schema for each log block from the metadata header instead of the latest schema" Balaji Varadarajan 2019-04-18 00:44:01 -07:00
  • 9ef51deb84 Add empty payload class to support deletes via apache spark lyogev 2019-04-11 15:14:53 +03:00
  • 243c58f77c Move to apachehudi dockerhub repository & use openjdk docker containers Balaji Varadarajan 2019-04-17 09:53:22 -07:00
  • 36ef94004e Fix Hive RT query failure in hoodie demo Balaji Varadarajan 2019-04-17 11:49:01 -07:00
  • e35d24f31d Revert "Replacing Apache commons-lang3 object serializer with Kryo serializer" Omkar Joshi 2019-04-16 18:35:25 -07:00
  • 9e7ce19b06 Read and apply schema for each log block from the metadata header instead of the latest schema Nishith Agarwal 2019-04-16 12:47:09 -07:00
  • 83b6aa5e91 Fix multiple issues when using build_local_docker_images for setting up the demo Bhavani Sudha Saktheeswaran 2019-04-11 17:39:13 -07:00
  • a8feee9293 Performing commit archiving in batches to avoid keeping a huge chunk in memory Nishith Agarwal 2019-04-07 11:12:22 -07:00
  • b07110b9fd Essential Hive packages missing in hoodie spark bundle Balaji Varadarajan 2019-04-09 10:06:22 -07:00
  • 2577014617 1. Minor changes to fix compaction 2. Adding 2 compaction policies Nishith Agarwal 2019-04-03 12:25:35 -07:00
  • d1d33f725e [HUDI-66] FSUtils.getRelativePartitionPath does not handle repeated folder names Jing Chen 2019-04-02 03:29:25 -07:00
  • b34a204a52 Fixing small file handling, inline compaction defaults Vinoth Chandar 2019-03-12 15:59:41 -07:00
  • 51f4908989 Follow up HUDI-27 : Call super.close() in HoodieWraperFileSystem::close() Vinoth Chandar 2019-03-29 22:05:03 -07:00
  • 5847f0c934 Fix HUDI-27 : Support num_cores > 1 for writing through spark Vinoth Chandar 2019-03-27 18:19:41 -07:00
  • f1410bfdcd Fixes HUDI-38: Reduce memory overhead of WriteStatus Vinoth Chandar 2019-03-26 14:31:19 -07:00
  • e56c1612e4 Fixed HUDI-87 : Remove schemastr from BaseAvroPayload Vinoth Chandar 2019-03-27 15:47:49 -07:00
  • 372fbc4733 Fixes HUDI-9 : Check precondition minInstantsToKeep > cleanerCommitsRetained Vinoth Chandar 2019-03-26 18:52:33 -07:00
  • 3d9041e216 Fixing source schema and writer schema distinction in payloads Nishith Agarwal 2019-03-22 16:27:51 -07:00
  • 395806fc68 [HUDI-63] Removed unused BucketedIndex code ambition119 2019-03-20 12:25:51 +08:00
  • 194d904c99 run_hive_sync tool must be able to handle case where there are multiple standalone jdbc jars in hive installation dir Balaji Varadarajan 2019-03-21 09:03:36 -07:00
  • a2a052abd9 add a script that shuts down demo cluster gracefully Jing Chen 2019-03-18 19:06:39 -07:00
  • 9e59da7fd9 Refactor HoodieTable Rollback to write one rollback instant for a batch of commits to rollback Nishith Agarwal 2019-02-27 23:43:06 -08:00
  • 0dd4a90b03 Enable multi/nested rollbacks for MOR table type Nishith Agarwal 2019-01-02 19:13:55 -08:00
  • a6c45feb2c Replacing Apache commons-lang3 object serializer with Kryo serializer Omkar Joshi 2019-03-18 11:29:45 -07:00
  • 48797b1ae1 Add compression codec configurations for HoodieParquetWriter. kaka11chen 2019-03-16 01:52:41 +08:00
  • 621f2b878d HUDI-75: Add KEYS smarthi 2019-03-14 14:11:01 -04:00
  • 57bbed21de Removing docs folder from master branch Vinoth Chandar 2019-03-14 17:23:36 -07:00
  • adc8cac743 Fix hive sync (libfb version mismatch) and deltastreamer issue (missing cmdline argument) in demo Balaji Varadarajan 2019-03-01 11:17:53 -08:00
  • 3c647a99cf Fix quickstart documentation for querying via Presto Bhavani Sudha Saktheeswaran 2019-03-08 10:16:22 -08:00
  • 4a8bec7ea5 Handling duplicate record update for single partition (duplicates in single or different parquet files) Omkar Joshi 2019-03-01 15:34:46 -08:00
  • b514e1ab18 Fix avro doesn't have short and byte type. kaka11chen 2019-03-06 12:29:03 +08:00
  • 3ae6cb4ed5 FileSystem View must treat same fileIds present in different partitions as different file-groups and handle pending compaction correctly Balaji Varadarajan 2019-02-12 21:29:14 -08:00
  • 363df2c12e Upgrade various jar, gem versions for maintenance Vinoth Chandar 2019-02-13 19:53:28 -08:00
  • 687395e40f [maven-release-plugin] prepare for next development iteration vinothchandar 2019-02-27 07:16:27 -08:00
  • bbf40ef987 [maven-release-plugin] prepare release hoodie-0.4.5 vinothchandar 2019-02-27 07:16:15 -08:00
  • 080b7d4d9b Update RELEASE_NOTES for 0.4.5 vinothchandar 2019-02-27 06:46:22 -08:00
  • 75c7a2622b Create hoodie-presto bundle jar Bhavani Sudha Saktheeswaran 2019-02-12 17:13:49 -08:00
  • 94eb6fd919 Merge pull request #570 from yaooqinn/hiveJarSuffix n3nash 2019-02-20 16:32:57 -08:00
  • 639c287cab Close FSDataInputStream for meta file open in HoodiePartitionMetadata Bhavani Sudha Saktheeswaran 2019-02-13 12:54:01 -08:00
  • 8dddecf00f handle no such element exception in HoodieSparkSqlWriter Kent Yao 2019-02-14 16:50:46 +08:00
  • a16aa2a78f Create CNAME vinoth chandar 2019-02-15 21:53:08 -08:00
  • ef0d6f2218 Update site url in README vinoth chandar 2019-02-15 21:28:39 -08:00
  • 09f203d324 typo: bundle jar with unrecongnized variables Kent Yao 2019-02-13 16:46:11 +08:00
  • 8adaca3454 Table rollback for inflight compactions MUST not delete instant files at any time to avoid race conditions Balaji Varadarajan 2019-02-05 16:54:31 -08:00
  • defcf6a0b9 Fix Hoodie Record Reader to work with non-partitioned dataset Balaji Varadarajan 2019-02-11 11:39:29 -08:00
  • 3a0044216c New Features in DeltaStreamer : (1) Apply transformation when using delta-streamer to ingest data. (2) Add Hudi Incremental Source for Delta Streamer (3) Allow delta-streamer config-property to be passed as command-line (4) Add Hive Integration to Delta-Streamer and address Review comments (5) Ensure MultiPartKeysValueExtractor handle hive style partition description (6) Reuse same spark session on both source and transformer (7) Support extracting partition fields from _hoodie_partition_path for HoodieIncrSource (8) Reuse Binary Avro coders (9) Add push down filter for Incremental source (10) Add Hoodie DeltaStreamer metrics to track total time taken Balaji Varadarajan 2018-10-10 10:31:34 -07:00
  • c70dbc13e9 Updating new slack signup link Vinoth Chandar 2019-02-06 13:50:16 -08:00
  • 2b55f0751f Using immutable map instead of mutables to generate parameters Kent Yao 2019-01-29 10:01:25 +08:00
  • 7985eb72b5 Fixing behavior of Merge/CreateHandle for invalid/wrong schema records Nishith Agarwal 2019-01-22 13:04:15 -08:00
  • 994d42d307 cleaner should now use commit timeline and not include deltacomits Nishith Agarwal 2018-12-26 13:37:22 -08:00
  • 68723764ed Adding compaction to HoodieClient example Nishith Agarwal 2019-01-15 13:51:47 -08:00
  • 169e3f66bb Filtering partition paths before performing a list status on all partitions Nishith Agarwal 2018-12-28 10:24:23 -08:00
  • d1bb804577 Passing a path filter to avoid including folders under .hoodie directory as partition paths Nishith Agarwal 2019-01-04 15:01:49 -08:00
  • 110df7190b Enabling hard deletes for MergeOnRead table type Nishith Agarwal 2018-12-21 14:51:44 -08:00
  • 345aaa31aa Add m2 directory to Travis cache Manu Sridharan 2018-12-16 09:27:27 -08:00
  • 6946dd7557 General enhancements arukavytsia 2018-12-12 03:19:43 +02:00
  • 30c5f8b7bd Ensure Hoodie works for non-partitioned Hive table Balaji Varadarajan 2018-12-05 11:42:38 -08:00
  • 466ff73ffb fix some spell errorin Hudi xubo245 2018-12-11 09:16:37 +08:00
  • bf65219b73 feat(SparkDataSource): add structured streaming jiale.tan 2018-10-11 17:40:32 -07:00
  • 7243ce40c9 Serializing the complete payload object instead of serializing just the GenericRecord Removing Converter hierarchy as we now depend purely on JavaSerialization and require the payload to be java serializable Nishith Agarwal 2018-11-04 16:03:56 -08:00
  • e83dde3b95 Returning empty Statues for an empty spark partition caused due to incorrect bin packing Nishith Agarwal 2018-11-26 23:17:32 -08:00
  • 0015c9b00e Update committership for balaji Vinoth Chandar 2018-11-30 16:21:20 -08:00
  • f999e4960c Avoid WriteStatus collect() call when committing batch Balaji Varadarajan 2018-11-27 23:21:34 -08:00
  • fa65db9c4c Explicitly handle lack of append() support during LogWriting Vinoth Chandar 2018-11-27 16:54:46 -08:00
  • d0fde47458 Fixing number of insert buckets to be generated by rounding off to the closest greater integer Nishith Agarwal 2018-11-13 15:42:36 -08:00
  • 1362942aa3 Enabling auto tuning of insert splits by default Vinoth Chandar 2018-11-07 17:14:53 -08:00
  • 25cd05b24e Useful Hudi CLI commands to debug/analyze production workloads Balaji Varadarajan 2018-10-02 10:12:20 -07:00
  • 07324e7a20 Compaction validate, unschedule and repair Balaji Varadarajan 2018-10-03 10:39:10 -07:00
  • d904fe69ca Fix addMetadataFields() to carry over 'props' Xinli shang 2018-10-11 13:48:04 -07:00
  • 48aa026dc4 Adding documentation for migration guide and COW vs MOR tradeoffs, moving some docs around for more clarity Nishith Agarwal 2018-09-25 16:04:50 -07:00
  • 1628d044ac feat(SparkDataSource): add additional feature to drop later arriving dups jiale.tan 2018-10-04 17:56:51 -07:00
  • 8485b9e263 Fix regression which broke HudiInputFormat handling of non-hoodie datasets Balaji Varadarajan 2018-10-11 11:35:32 -07:00
  • 1fca9b21cc Add --filter-dupes to DeltaStreamer Vinoth Chandar 2018-10-03 18:02:09 +01:00
  • 0a200c32e5 Reflect new committership, id changes for devs vinoth chandar 2018-10-02 10:30:39 +05:30