[ANNOUNCE] Apache CarbonData 1.5.1 release

ravipesala
Hi,

The Apache CarbonData community is pleased to announce the release of
version 1.5.1 in The Apache Software Foundation (ASF).

CarbonData is a high-performance data solution that supports various data
analytic scenarios, including BI analysis, ad-hoc SQL queries, fast filter
lookups on detail records, streaming analytics, and so on. CarbonData has
been deployed in many enterprise production environments; in one of the
largest scenarios, it supports queries on a single table with 3 PB of data
(more than 5 trillion records) with response times of less than 3 seconds!

We encourage you to use the release
https://dist.apache.org/repos/dist/release/carbondata/1.5.1/ and to send
feedback through the CarbonData user mailing list <[hidden email]>!

This release note provides information on the new features, improvements,
and bug fixes of this release.
What’s New in CarbonData Version 1.5.1?

The intention of CarbonData 1.5.1 was to move closer to unified analytics.
We want to enable CarbonData files to be read from more engines and
libraries to support various use cases. In this regard, we have added
support for writing CarbonData files from C++ libraries.

CarbonData added multiple optimizations to improve query and compaction
performance.

In this version of CarbonData, more than 78 JIRA tickets related to new
features, improvements, and bugs have been resolved. The following is a
summary.
CarbonData Core
Support Custom Column Compressor

CarbonData supports custom column compressors, so users can plug in their
own compressor implementation. To use a custom compressor, specify its fully
qualified class name when creating a table, or set it as a carbon property.
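
As a rough illustration of the idea (a plug-in contract plus a class
instantiated from its fully qualified name), here is a minimal Java sketch.
The `ColumnCompressor` interface, `GzipColumnCompressor` and
`CompressorLoader` are hypothetical names invented for this example;
CarbonData's actual compressor interface has its own method set and is not
reproduced here. The class name passed to `load` is unqualified only because
the sketch lives in the default package.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical plug-in contract (NOT CarbonData's real compressor interface).
interface ColumnCompressor {
  String name();
  byte[] compress(byte[] input);
  byte[] decompress(byte[] input);
}

// A user-supplied implementation backed by the JDK's GZIP streams.
class GzipColumnCompressor implements ColumnCompressor {
  @Override
  public String name() {
    return "gzip";
  }

  @Override
  public byte[] compress(byte[] input) {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (GZIPOutputStream gzip = new GZIPOutputStream(bytes)) {
      gzip.write(input);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    // The GZIP stream is closed (trailer written) before the bytes are read.
    return bytes.toByteArray();
  }

  @Override
  public byte[] decompress(byte[] input) {
    try (GZIPInputStream gzip = new GZIPInputStream(new ByteArrayInputStream(input))) {
      return gzip.readAllBytes();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}

class CompressorLoader {
  // Instantiate a compressor from its class name via reflection,
  // mirroring the "configure by class name" idea described above.
  static ColumnCompressor load(String className) {
    try {
      Class<?> clazz = Class.forName(className);
      return (ColumnCompressor) clazz.getDeclaredConstructor().newInstance();
    } catch (ReflectiveOperationException e) {
      throw new IllegalArgumentException("Cannot load compressor " + className, e);
    }
  }

  public static void main(String[] args) {
    ColumnCompressor compressor = load("GzipColumnCompressor");
    byte[] raw = "aaaaaaaaaaaaaaaaaaaabbbbbbbbbbbbbbbbbbbb".getBytes(StandardCharsets.UTF_8);
    byte[] packed = compressor.compress(raw);
    System.out.println(compressor.name() + ": " + raw.length + " -> " + packed.length + " bytes");
  }
}
```

The reflective lookup is what lets a new compressor be added without
changing the code that uses it: only the configured class name changes.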
Performance Improvements
Optimized CarbonData Scan Performance

CarbonData scan performance is improved by avoiding multiple data copies in
the vector flow. This is achieved by short-circuiting the read and the vector
filling: after the data is read from the file, it is filled directly into the
vector without any intermediate copies.
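
The difference between the two flows can be sketched in a few lines of
Java. This is a simplified, hypothetical model (an `int` page held in a
`ByteBuffer` and an `int[]` standing in for the engine's column vector), not
CarbonData's reader code: the first method decodes into a temporary array
and then copies it, the second decodes straight into the vector.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;

// Hypothetical illustration of filling a vector with and without an intermediate copy.
class VectorFillSketch {

  // Two-step flow: decode the page into a temporary array, then copy into the vector.
  static void fillWithCopy(ByteBuffer page, int[] vector, int rows) {
    int[] decoded = new int[rows]; // intermediate buffer
    for (int i = 0; i < rows; i++) {
      decoded[i] = page.getInt(i * Integer.BYTES);
    }
    System.arraycopy(decoded, 0, vector, 0, rows); // extra copy
  }

  // Short-circuited flow: decode each value directly into the vector.
  static void fillDirect(ByteBuffer page, int[] vector, int rows) {
    for (int i = 0; i < rows; i++) {
      vector[i] = page.getInt(i * Integer.BYTES);
    }
  }

  public static void main(String[] args) {
    int rows = 4;
    ByteBuffer page = ByteBuffer.allocate(rows * Integer.BYTES).order(ByteOrder.LITTLE_ENDIAN);
    for (int v : new int[] {7, -1, 42, 0}) {
      page.putInt(v);
    }
    int[] viaCopy = new int[rows];
    int[] direct = new int[rows];
    fillWithCopy(page, viaCopy, rows);
    fillDirect(page, direct, rows);
    // Both flows produce the same vector; the direct one skips the temporary array.
    System.out.println(Arrays.equals(viaCopy, direct));
  }
}
```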

Row-level filter processing is now handled in the execution engine; for the
vector flow, CarbonData handles only blocklet and page pruning. This is
controlled by the property *carbon.push.rowfilters.for.vector*, which is
false by default; setting it to true makes CarbonData apply the row-level
filters itself.
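
The split of responsibilities can be modelled as follows. In this
hypothetical sketch each page carries min/max statistics; pages that cannot
satisfy a range predicate are pruned, and the `pushRowFilters` flag
(standing in for *carbon.push.rowfilters.for.vector*) decides whether rows
inside surviving pages are also filtered here or handed to the engine
unfiltered. The `Page` class and `scan` method are illustrative only.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of min/max page pruning with an optional row-level filter.
class PagePruningSketch {

  // A page of int values together with its min/max statistics.
  static final class Page {
    final int min;
    final int max;
    final int[] values;

    Page(int... values) {
      int lo = Integer.MAX_VALUE;
      int hi = Integer.MIN_VALUE;
      for (int v : values) {
        lo = Math.min(lo, v);
        hi = Math.max(hi, v);
      }
      this.min = lo;
      this.max = hi;
      this.values = values;
    }
  }

  // Rows returned to the execution engine for the predicate lo <= x <= hi.
  static List<Integer> scan(List<Page> pages, int lo, int hi, boolean pushRowFilters) {
    List<Integer> out = new ArrayList<>();
    for (Page page : pages) {
      // Page pruning: skip pages whose [min, max] range cannot overlap [lo, hi].
      if (page.max < lo || page.min > hi) {
        continue;
      }
      for (int v : page.values) {
        // With pushRowFilters == false, every row of a surviving page is returned
        // and the engine is expected to apply the row-level filter itself.
        if (!pushRowFilters || (v >= lo && v <= hi)) {
          out.add(v);
        }
      }
    }
    return out;
  }

  public static void main(String[] args) {
    List<Page> pages = List.of(new Page(1, 2, 3), new Page(10, 11, 12), new Page(4, 20));
    System.out.println(scan(pages, 10, 15, false));
    System.out.println(scan(pages, 10, 15, true));
  }
}
```

With the predicate 10 <= x <= 15, the first page is pruned by its
statistics. With the flag off, the engine receives `[10, 11, 12, 4, 20]` and
must drop 4 and 20 itself; with the flag on, only `[10, 11, 12]` is returned.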
Optimized Compaction Performance

Compaction performance is optimized through pre-fetching the data while
reading carbon files.
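
Prefetching overlaps I/O with processing: while one batch is being merged,
the next one is already being read. The following Java sketch shows the
pattern with a single background reader thread; `readBatch` is a
hypothetical stand-in for reading a chunk of a carbon file, and summing the
values stands in for the actual compaction work.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.IntFunction;

// Hypothetical sketch: read batch i + 1 in the background while batch i is processed.
class PrefetchSketch {

  static List<Integer> processAll(int batchCount, IntFunction<int[]> readBatch) {
    ExecutorService reader = Executors.newSingleThreadExecutor();
    List<Integer> results = new ArrayList<>();
    try {
      Future<int[]> next = null;
      if (batchCount > 0) {
        next = reader.submit(() -> readBatch.apply(0));
      }
      for (int i = 0; i < batchCount; i++) {
        int[] current = await(next);
        int following = i + 1;
        // Start the next read before doing any work on the current batch.
        next = null;
        if (following < batchCount) {
          next = reader.submit(() -> readBatch.apply(following));
        }
        int sum = 0;
        for (int v : current) {
          sum += v; // stand-in for the real merge work done during compaction
        }
        results.add(sum);
      }
    } finally {
      reader.shutdown();
    }
    return results;
  }

  // Wait for a prefetched batch, converting checked exceptions to unchecked ones.
  private static int[] await(Future<int[]> future) {
    try {
      return future.get();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("interrupted while waiting for a batch", e);
    } catch (ExecutionException e) {
      throw new IllegalStateException("failed to read a batch", e.getCause());
    }
  }

  public static void main(String[] args) {
    System.out.println(processAll(3, i -> new int[] {i, i + 1}));
  }
}
```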
Improved Blocklet DataMap Pruning in Driver

Blocklet DataMap pruning is improved by using multi-threaded processing in
the driver.
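
A minimal sketch of multi-threaded pruning in the driver: the block list is
split into contiguous slices, each slice is tested on a pool thread, and the
surviving blocks are concatenated in their original order. The `threadCount`
helper clamps a configured value into the 1-4 range listed for
*carbon.max.driver.threads.for.block.pruning*; clamping (rather than
rejecting) out-of-range values is an assumption of this sketch, and
`mayMatch` is a hypothetical stand-in for the real min/max check.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Predicate;

// Hypothetical sketch of pruning blocks on several driver threads.
class ParallelPruneSketch {

  // Clamp a configured thread count into 1-4; fall back to the default 4 if unparsable.
  static int threadCount(String configured) {
    if (configured == null) {
      return 4;
    }
    try {
      return Math.max(1, Math.min(4, Integer.parseInt(configured.trim())));
    } catch (NumberFormatException e) {
      return 4;
    }
  }

  static <T> List<T> prune(List<T> blocks, Predicate<T> mayMatch, int threads) {
    int n = Math.max(1, Math.min(threads, blocks.size()));
    ExecutorService pool = Executors.newFixedThreadPool(n);
    try {
      int chunk = (blocks.size() + n - 1) / n;
      List<Future<List<T>>> parts = new ArrayList<>();
      for (int start = 0; start < blocks.size(); start += chunk) {
        List<T> slice = blocks.subList(start, Math.min(start + chunk, blocks.size()));
        // Each task tests one contiguous slice of blocks.
        parts.add(pool.submit(() -> {
          List<T> kept = new ArrayList<>();
          for (T block : slice) {
            if (mayMatch.test(block)) {
              kept.add(block);
            }
          }
          return kept;
        }));
      }
      List<T> result = new ArrayList<>();
      for (Future<List<T>> part : parts) {
        result.addAll(part.get()); // joining in submission order keeps block order
      }
      return result;
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("pruning interrupted", e);
    } catch (ExecutionException e) {
      throw new IllegalStateException("pruning failed", e.getCause());
    } finally {
      pool.shutdown();
    }
  }

  public static void main(String[] args) {
    List<Integer> blocks = List.of(1, 2, 3, 4, 5, 6, 7, 8, 9, 10);
    System.out.println(prune(blocks, b -> b % 2 == 0, threadCount("4")));
  }
}
```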
CarbonData SDK
SDK Supports C++ Interfaces for Writing CarbonData Files

To enable integration with non-Java execution engines, CarbonData provides a
C++ JNI wrapper for writing CarbonData files. It can be integrated with any
execution engine to write data to CarbonData files without depending on
Spark or Hadoop.
Multi-Thread Read API in SDK

To improve read performance when using the SDK, CarbonData supports
multi-threaded read APIs. These enable applications to read data from
multiple CarbonData files in parallel, which significantly improves SDK read
performance.
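
The parallel-read idea can be sketched as: split the list of files into at
most `n` groups, read each group on its own thread, then concatenate the
rows. `split`, `readAll` and the `readFile` callback are hypothetical names
for this illustration and are not the SDK's `CarbonReader` API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Function;

// Hypothetical sketch of reading groups of files concurrently.
class ParallelReadSketch {

  // Split the file list into at most n contiguous, non-empty groups.
  static <T> List<List<T>> split(List<T> files, int n) {
    List<List<T>> groups = new ArrayList<>();
    int size = files.size();
    for (int g = 0; g < n; g++) {
      int from = (int) ((long) size * g / n);
      int to = (int) ((long) size * (g + 1) / n);
      if (from < to) {
        groups.add(files.subList(from, to));
      }
    }
    return groups;
  }

  // Read every group on its own pool thread and concatenate rows in group order.
  static <T, R> List<R> readAll(List<T> files, int n, Function<T, List<R>> readFile) {
    ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, n));
    try {
      List<CompletableFuture<List<R>>> futures = new ArrayList<>();
      for (List<T> group : split(files, n)) {
        futures.add(CompletableFuture.supplyAsync(() -> {
          List<R> rows = new ArrayList<>();
          for (T file : group) {
            rows.addAll(readFile.apply(file));
          }
          return rows;
        }, pool));
      }
      List<R> all = new ArrayList<>();
      for (CompletableFuture<List<R>> future : futures) {
        all.addAll(future.join());
      }
      return all;
    } finally {
      pool.shutdown();
    }
  }

  public static void main(String[] args) {
    List<String> files = List.of("part-0", "part-1", "part-2", "part-3", "part-4");
    System.out.println(split(files, 2));
    List<String> rows = readAll(files, 2, f -> List.of(f + ":row0", f + ":row1"));
    System.out.println(rows.size());
  }
}
```

Keeping the groups contiguous and joining them in order means the combined
result preserves the original file order even though the reads overlap.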
Other Improvements

   - Enhanced the CLI with more options.
   - Added a fallback mechanism: when off-heap memory is not sufficient,
   CarbonData switches to on-heap memory instead of failing the job.
   - Added support for a separate audit log.
   - Added support for reading rows in batches in CSDK to improve performance.
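
The off-heap fallback can be pictured with the following Java sketch, which
tracks a fixed off-heap budget and allocates heap memory once that budget is
exhausted. It uses `ByteBuffer.allocateDirect` purely for illustration;
CarbonData's own unsafe memory manager is not modelled here, and the
`MemoryFallbackSketch` class and its budget are hypothetical.

```java
import java.nio.ByteBuffer;

// Hypothetical sketch: off-heap allocation within a budget, on-heap fallback when exhausted.
class MemoryFallbackSketch {
  private final long offHeapBudget;
  private long offHeapUsed;

  MemoryFallbackSketch(long offHeapBudget) {
    this.offHeapBudget = offHeapBudget;
  }

  synchronized ByteBuffer allocate(int size) {
    if (offHeapUsed + size <= offHeapBudget) {
      offHeapUsed += size;
      return ByteBuffer.allocateDirect(size); // off-heap memory
    }
    // Not enough off-heap memory left: use the JVM heap instead of failing the task.
    return ByteBuffer.allocate(size);
  }

  // Only updates the accounting; the direct buffer itself is reclaimed by the GC.
  synchronized void release(ByteBuffer buffer) {
    if (buffer.isDirect()) {
      offHeapUsed -= buffer.capacity();
    }
  }

  public static void main(String[] args) {
    MemoryFallbackSketch memory = new MemoryFallbackSketch(1024);
    ByteBuffer first = memory.allocate(800);
    ByteBuffer second = memory.allocate(800);
    System.out.println(first.isDirect() + " " + second.isDirect());
  }
}
```

With a 1024-byte budget, the first 800-byte request is served off-heap and
the second falls back to the heap, so the program prints `true false`.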

Behavior Change

   - Local dictionary is enabled by default.
   - Inverted index is disabled by default.
   - Sort temp files created during data loading are now compressed with
   Snappy by default to improve IO.

New Configuration Parameters
Configuration name                               Default Value   Range
*carbon.push.rowfilters.for.vector*              false           NA
*carbon.max.driver.threads.for.block.pruning*    4               1-4


Please find the detailed JIRA list:
https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12320220&version=12344320
Sub-task

   - [CARBONDATA-2930
   <https://issues.apache.org/jira/browse/CARBONDATA-2930>] - Support
   customize column compressor
   - [CARBONDATA-2981
   <https://issues.apache.org/jira/browse/CARBONDATA-2981>] - Support read
   primitive data type in CSDK
   - [CARBONDATA-2997
   <https://issues.apache.org/jira/browse/CARBONDATA-2997>] - Support read
   schema from index file and data file in CSDK
   - [CARBONDATA-3000
   <https://issues.apache.org/jira/browse/CARBONDATA-3000>] - Provide C++
   interface for writing carbon data
   - [CARBONDATA-3003
   <https://issues.apache.org/jira/browse/CARBONDATA-3003>] - Support read
   batch row in CSDK
   - [CARBONDATA-3004
   <https://issues.apache.org/jira/browse/CARBONDATA-3004>] - Fix bug in
   writing dataframe to carbon table while the field order is different
   - [CARBONDATA-3038
   <https://issues.apache.org/jira/browse/CARBONDATA-3038>] - Add
   annotation for carbon properties and mark whether is dynamic configuration
   - [CARBONDATA-3044
   <https://issues.apache.org/jira/browse/CARBONDATA-3044>] - Handle
   exception in CSDK
   - [CARBONDATA-3056
   <https://issues.apache.org/jira/browse/CARBONDATA-3056>] - Implement
   concurrent reading through CarbonReader
   - [CARBONDATA-3057
   <https://issues.apache.org/jira/browse/CARBONDATA-3057>] - Implement
   Vectorized CarbonReader for SDK
   - [CARBONDATA-3063
   <https://issues.apache.org/jira/browse/CARBONDATA-3063>] - Support set
   carbon property in CSDK
   - [CARBONDATA-3095
   <https://issues.apache.org/jira/browse/CARBONDATA-3095>] - Optimize the
   documentation of SDK/CSDK
   - [CARBONDATA-3131
   <https://issues.apache.org/jira/browse/CARBONDATA-3131>] - Update the
   requested columns to the Scan

Bug

   - [CARBONDATA-2996
   <https://issues.apache.org/jira/browse/CARBONDATA-2996>] -
   readSchemaInIndexFile can't read schema by folder path
   - [CARBONDATA-2998
   <https://issues.apache.org/jira/browse/CARBONDATA-2998>] - Refresh
   column schema for old store(before V3) for SORT_COLUMNS option
   - [CARBONDATA-3002
   <https://issues.apache.org/jira/browse/CARBONDATA-3002>] - Fix some
   spell error and remove the data after test case finished running
   - [CARBONDATA-3007
   <https://issues.apache.org/jira/browse/CARBONDATA-3007>] - Fix error in
   document
   - [CARBONDATA-3025
   <https://issues.apache.org/jira/browse/CARBONDATA-3025>] - Add SQL
   support for cli, and enhance CLI , add more metadata to carbon file
   - [CARBONDATA-3026
   <https://issues.apache.org/jira/browse/CARBONDATA-3026>] - clear expired
   property that may cause GC problem
   - [CARBONDATA-3029
   <https://issues.apache.org/jira/browse/CARBONDATA-3029>] - Failed to run
   spark data source test cases in windows env
   - [CARBONDATA-3036
   <https://issues.apache.org/jira/browse/CARBONDATA-3036>] - Carbon 1.5.0
   B010 - Select query fails when min/max exceeds and index tree cached
   - [CARBONDATA-3040
   <https://issues.apache.org/jira/browse/CARBONDATA-3040>] - Fix bug for
   merging bloom index
   - [CARBONDATA-3058
   <https://issues.apache.org/jira/browse/CARBONDATA-3058>] - Fix some
   exception coding in data loading
   - [CARBONDATA-3060
   <https://issues.apache.org/jira/browse/CARBONDATA-3060>] - Improve CLI
   and fix other bugs in CLI tool
   - [CARBONDATA-3062
   <https://issues.apache.org/jira/browse/CARBONDATA-3062>] - Fix
   Compatibility issue with cache_level as blocklet
   - [CARBONDATA-3065
   <https://issues.apache.org/jira/browse/CARBONDATA-3065>] - by default
   disable inverted index for all the dimension column
   - [CARBONDATA-3066
   <https://issues.apache.org/jira/browse/CARBONDATA-3066>] - ADD
   documentation for new APIs in SDK
   - [CARBONDATA-3069
   <https://issues.apache.org/jira/browse/CARBONDATA-3069>] - fix bugs in
   setting cores for compaction
   - [CARBONDATA-3077
   <https://issues.apache.org/jira/browse/CARBONDATA-3077>] - Fixed query
   failure in fileformat due stale cache issue
   - [CARBONDATA-3078
   <https://issues.apache.org/jira/browse/CARBONDATA-3078>] - Exception
   caused by explain command for count star query without filter
   - [CARBONDATA-3081
   <https://issues.apache.org/jira/browse/CARBONDATA-3081>] - NPE when
   boolean column has null values with Vectorized SDK reader
   - [CARBONDATA-3083
   <https://issues.apache.org/jira/browse/CARBONDATA-3083>] - Null values
   are getting replaced by 0 after update operation.
   - [CARBONDATA-3084
   <https://issues.apache.org/jira/browse/CARBONDATA-3084>] - data load
   with float datatype fails with internal error
   - [CARBONDATA-3098
   <https://issues.apache.org/jira/browse/CARBONDATA-3098>] - Negative
   value exponents giving wrong results
   - [CARBONDATA-3106
   <https://issues.apache.org/jira/browse/CARBONDATA-3106>] -
   Written_BY_APPNAME is not serialized in executor with GlobalSort
   - [CARBONDATA-3117
   <https://issues.apache.org/jira/browse/CARBONDATA-3117>] - Rearrange the
   projection list in the Scan
   - [CARBONDATA-3120
   <https://issues.apache.org/jira/browse/CARBONDATA-3120>] -
   apache-carbondata-1.5.1-rc1.tar.gz Datamap's core and plan project,
   pom.xml, is version 1.5.0, which results in an inability to compile properly
   - [CARBONDATA-3122
   <https://issues.apache.org/jira/browse/CARBONDATA-3122>] - CarbonReader
   memory leak
   - [CARBONDATA-3123
   <https://issues.apache.org/jira/browse/CARBONDATA-3123>] - JVM crash
   when reading through CarbonReader
   - [CARBONDATA-3124
   <https://issues.apache.org/jira/browse/CARBONDATA-3124>] - Updated log
   message in Unsafe Memory Manager and changed faq.md accordingly.
   - [CARBONDATA-3132
   <https://issues.apache.org/jira/browse/CARBONDATA-3132>] - Unequal
   distribution of tasks in case of compaction
   - [CARBONDATA-3134
   <https://issues.apache.org/jira/browse/CARBONDATA-3134>] - Wrong result
   when a column is dropped and added using alter with blocklet cache.

New Feature

   - [CARBONDATA-2977
   <https://issues.apache.org/jira/browse/CARBONDATA-2977>] - Write
   uncompress_size to ChunkCompressMeta in the file

Improvement

   - [CARBONDATA-3008
   <https://issues.apache.org/jira/browse/CARBONDATA-3008>] - make
   yarn-local and multiple dir for temp data enable by default
   - [CARBONDATA-3009
   <https://issues.apache.org/jira/browse/CARBONDATA-3009>] - Optimize the
   entry point of code for MergeIndex
   - [CARBONDATA-3019
   <https://issues.apache.org/jira/browse/CARBONDATA-3019>] - Add error log
   in catch block to avoid to abort the exception which is thrown from catch
   block when there is an exception thrown in finally block
   - [CARBONDATA-3022
   <https://issues.apache.org/jira/browse/CARBONDATA-3022>] - Refactor
   ColumnPageWrapper
   - [CARBONDATA-3024
   <https://issues.apache.org/jira/browse/CARBONDATA-3024>] - Use Log4j
   directly
   - [CARBONDATA-3030
   <https://issues.apache.org/jira/browse/CARBONDATA-3030>] - Remove no use
   parameter in test case
   - [CARBONDATA-3031
   <https://issues.apache.org/jira/browse/CARBONDATA-3031>] - Find wrong
   description in the document for 'carbon.number.of.cores.while.loading'
   - [CARBONDATA-3032
   <https://issues.apache.org/jira/browse/CARBONDATA-3032>] - Remove
   carbon.blocklet.size from properties template
   - [CARBONDATA-3034
   <https://issues.apache.org/jira/browse/CARBONDATA-3034>] - Combing
   CarbonCommonConstants
   - [CARBONDATA-3035
   <https://issues.apache.org/jira/browse/CARBONDATA-3035>] - Optimize
   parameters for unsafe working and sort memory
   - [CARBONDATA-3039
   <https://issues.apache.org/jira/browse/CARBONDATA-3039>] - Fix Custom
   Deterministic Expression for rand() UDF
   - [CARBONDATA-3041
   <https://issues.apache.org/jira/browse/CARBONDATA-3041>] - Optimize load
   minimum size strategy for data loading
   - [CARBONDATA-3042
   <https://issues.apache.org/jira/browse/CARBONDATA-3042>] - Column Schema
   objects are present in Driver and Executor even after dropping table
   - [CARBONDATA-3046
   <https://issues.apache.org/jira/browse/CARBONDATA-3046>] - remove
   outdated configurations in template properties
   - [CARBONDATA-3047
   <https://issues.apache.org/jira/browse/CARBONDATA-3047>] -
   UnsafeMemoryManager fallback mechanism in case of memory not available
   - [CARBONDATA-3048
   <https://issues.apache.org/jira/browse/CARBONDATA-3048>] - Added Lazy
   Loading For 2.2/2.1
   - [CARBONDATA-3050
   <https://issues.apache.org/jira/browse/CARBONDATA-3050>] - Remove unused
   parameter doc
   - [CARBONDATA-3051
   <https://issues.apache.org/jira/browse/CARBONDATA-3051>] - unclosed
   streams cause tests failure in windows env
   - [CARBONDATA-3052
   <https://issues.apache.org/jira/browse/CARBONDATA-3052>] - Improve drop
   table performance by reducing the namenode RPC calls during physical
   deletion of files
   - [CARBONDATA-3053
   <https://issues.apache.org/jira/browse/CARBONDATA-3053>] - Un-closed
   file stream found in cli
   - [CARBONDATA-3054
   <https://issues.apache.org/jira/browse/CARBONDATA-3054>] - Dictionary
   file cannot be read in S3a with CarbonDictionaryDecoder.doConsume() codeGen
   - [CARBONDATA-3061
   <https://issues.apache.org/jira/browse/CARBONDATA-3061>] - Add
   validation for supported format version and Encoding type to throw proper
   exception to the user while reading a file
   - [CARBONDATA-3064
   <https://issues.apache.org/jira/browse/CARBONDATA-3064>] - Support
   separate audit log
   - [CARBONDATA-3067
   <https://issues.apache.org/jira/browse/CARBONDATA-3067>] - Add check for
   debug to avoid string concat
   - [CARBONDATA-3071
   <https://issues.apache.org/jira/browse/CARBONDATA-3071>] - Add
   CarbonSession Java Example
   - [CARBONDATA-3074
   <https://issues.apache.org/jira/browse/CARBONDATA-3074>] - Change
   default sort temp compressor to SNAPPY
   - [CARBONDATA-3075
   <https://issues.apache.org/jira/browse/CARBONDATA-3075>] - Select Filter
   fails for Legacy store if DirectVectorFill is enabled
   - [CARBONDATA-3087
   <https://issues.apache.org/jira/browse/CARBONDATA-3087>] - Prettify DESC
   FORMATTED output
   - [CARBONDATA-3088
   <https://issues.apache.org/jira/browse/CARBONDATA-3088>] - enhance
   compaction performance by using prefetch
   - [CARBONDATA-3104
   <https://issues.apache.org/jira/browse/CARBONDATA-3104>] - Extra
   Unnecessary Hadoop Conf is getting stored in LRU (~100K) for each LRU entry
   - [CARBONDATA-3112
   <https://issues.apache.org/jira/browse/CARBONDATA-3112>] - Optimise
   decompressing while filling the vector during conversion of primitive types
   - [CARBONDATA-3113
   <https://issues.apache.org/jira/browse/CARBONDATA-3113>] - Fixed Local
   Dictionary Query Performance and Added reusable buffer for direct flow
   - [CARBONDATA-3118
   <https://issues.apache.org/jira/browse/CARBONDATA-3118>] - Parallelize
   block pruning of default datamap in driver for filter query processing
   - [CARBONDATA-3121
   <https://issues.apache.org/jira/browse/CARBONDATA-3121>] - CarbonReader
   build time is huge
   - [CARBONDATA-3136
   <https://issues.apache.org/jira/browse/CARBONDATA-3136>] - JVM crash
   with preaggregate datamap


--
Thanks & Regards,
Ravindra

Re: [ANNOUNCE] Apache CarbonData 1.5.1 release

xubo245

Re: [ANNOUNCE] Apache CarbonData 1.5.1 release

xubo245
In reply to this post by ravipesala