[ANNOUNCE] Apache CarbonData 1.6.1 release

sraghunandan
Hi All,

The Apache CarbonData community is pleased to announce the release of
version 1.6.1 under The Apache Software Foundation (ASF).

CarbonData is a high-performance data solution that supports various data
analytic scenarios, including BI analysis, ad-hoc SQL queries, fast filter
lookups on detail records, streaming analytics, and so on. CarbonData has
been deployed in many enterprise production environments. In one of the
largest scenarios, it supports queries on a single table with 3PB of data
(more than 5 trillion records) with a response time of less than 3 seconds!

We encourage you to try the release at
https://dist.apache.org/repos/dist/release/carbondata/1.6.1/ and to send
feedback through the CarbonData user mailing list <[hidden email]>!

This release note provides information on the new features, improvements,
and bug fixes of this release.

What’s New in CarbonData Version 1.6.1?

The intention of CarbonData 1.6.1 was to move closer to unified analytics
and improve stability. In this version of CarbonData, around 40 JIRA
tickets related to improvements and bugs have been resolved. The following
is a summary.


Index Server performance improvements for Full Scan and TPC-H Queries

Carbon currently prunes and caches all block/blocklet datamap index
information in the driver. If the cache grows large (70-80% of the
driver memory), excessive GC in the driver can slow down queries, and the
driver may even run out of memory. Moving the indexes out to a separate
JDBCServer (the Index Server) reduced the overhead on the primary
JDBCServer, but introduced a delay in fetching the bulk list of pruned
blocks from the Index Server. This release improves that fetch, and
performance is now the same as running without the Index Server.
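
To make the term "pruning" concrete, here is a minimal conceptual sketch of
min/max index pruning. It is not CarbonData code: the BlockletIndex class,
the prune function, and the file names are hypothetical, and it assumes a
single integer column with one min/max entry per blocklet. The cached list
stands in for the index information that occupies driver or Index Server
memory.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BlockletIndex:
    """Hypothetical min/max summary of one column for one blocklet."""
    block_path: str
    blocklet_id: int
    min_value: int
    max_value: int


def prune(index_entries, low, high):
    """Keep blocklets whose [min, max] range overlaps the filter range [low, high]."""
    return [e for e in index_entries if e.max_value >= low and e.min_value <= high]


# Illustrative cached index entries (assumed values, not real table data).
cache = [
    BlockletIndex("part-0.carbondata", 0, 1, 100),
    BlockletIndex("part-0.carbondata", 1, 101, 200),
    BlockletIndex("part-1.carbondata", 0, 201, 300),
]

# Filter equivalent to: WHERE col BETWEEN 150 AND 250
survivors = prune(cache, 150, 250)
print([(e.block_path, e.blocklet_id) for e in survivors])
```

Only the blocklets that survive pruning need to be scanned, which is why the
size of the cached index and the time taken to return the pruned list both
matter for query latency.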

Behaviour Change

None


Please find the detailed JIRA list:
https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12320220&version=12345993


Sub-task

   - [CARBONDATA-3454
   <https://issues.apache.org/jira/browse/CARBONDATA-3454>] - Optimize the
   performance of select count(*) for index server
   - [CARBONDATA-3462
   <https://issues.apache.org/jira/browse/CARBONDATA-3462>] - Add usage and
   deployment document for index server

Bug

   - [CARBONDATA-3452
   <https://issues.apache.org/jira/browse/CARBONDATA-3452>] - select query
   failure when substring on dictionary column with join
   - [CARBONDATA-3474
   <https://issues.apache.org/jira/browse/CARBONDATA-3474>] - Fix validate
   mvQuery having filter expression and correct error message
   - [CARBONDATA-3476
   <https://issues.apache.org/jira/browse/CARBONDATA-3476>] - Read time and
   scan time stats shown wrong in executor log for filter query
   - [CARBONDATA-3477
   <https://issues.apache.org/jira/browse/CARBONDATA-3477>] - Throw out
   exception when use sql: 'update table select\n...'
   - [CARBONDATA-3478
   <https://issues.apache.org/jira/browse/CARBONDATA-3478>] - Fix
   ArrayIndexOutOfBoundsException issue on compaction after alter rename
   operation
   - [CARBONDATA-3480
   <https://issues.apache.org/jira/browse/CARBONDATA-3480>] - Remove
   Modified MDT and make relation refresh only when schema file is modified.
   - [CARBONDATA-3481
   <https://issues.apache.org/jira/browse/CARBONDATA-3481>] - Multi-thread
   pruning fails when datamaps count is just near numOfThreadsForPruning
   - [CARBONDATA-3482
   <https://issues.apache.org/jira/browse/CARBONDATA-3482>] - Null pointer
   exception when concurrent select queries are executed from different
   beeline terminals.
   - [CARBONDATA-3483
   <https://issues.apache.org/jira/browse/CARBONDATA-3483>] - Cannot run
   horizontal compaction when executing update sql
   - [CARBONDATA-3485
   <https://issues.apache.org/jira/browse/CARBONDATA-3485>] - data loading
   is failed from S3 to hdfs table having ~2K carbonfiles
   - [CARBONDATA-3486
   <https://issues.apache.org/jira/browse/CARBONDATA-3486>] -
   Serialization/deserialization issue with Datatype
   - [CARBONDATA-3487
   <https://issues.apache.org/jira/browse/CARBONDATA-3487>] - wrong Input
   metrics (size/record) displayed in spark UI during insert into
   - [CARBONDATA-3490
   <https://issues.apache.org/jira/browse/CARBONDATA-3490>] - Concurrent
   data load failure with carbondata FileNotFound exception
   - [CARBONDATA-3493
   <https://issues.apache.org/jira/browse/CARBONDATA-3493>] - Carbon query
   fails when enable.query.statistics is true in specific scenario.
   - [CARBONDATA-3494
   <https://issues.apache.org/jira/browse/CARBONDATA-3494>] - Nullpointer
   exception in case of drop table
   - [CARBONDATA-3495
   <https://issues.apache.org/jira/browse/CARBONDATA-3495>] - Insert into
   Complex data type of Binary fails with Carbon & SparkFileFormat
   - [CARBONDATA-3499
   <https://issues.apache.org/jira/browse/CARBONDATA-3499>] - Fix insert
   failure with customFileProvider
   - [CARBONDATA-3502
   <https://issues.apache.org/jira/browse/CARBONDATA-3502>] - Select query
   fails with UDF having Match expression inside IN expression
   - [CARBONDATA-3505
   <https://issues.apache.org/jira/browse/CARBONDATA-3505>] - Fixed drop
   database cascade issue when 2 databases point to the same location.
   - [CARBONDATA-3506
   <https://issues.apache.org/jira/browse/CARBONDATA-3506>] - Alter table
   add, drop, rename and datatype change fails with hive compatible property
   - [CARBONDATA-3507
   <https://issues.apache.org/jira/browse/CARBONDATA-3507>] - Create Table
   As Select Fails in Spark-2.3
   - [CARBONDATA-3508
   <https://issues.apache.org/jira/browse/CARBONDATA-3508>] - Select query
   fails when the cg datamap is dropped concurrently while running the select
   query on filter column on which datamap is created
   - [CARBONDATA-3513
   <https://issues.apache.org/jira/browse/CARBONDATA-3513>] - can not run
   major compaction when using hive partition table
   - [CARBONDATA-3520
   <https://issues.apache.org/jira/browse/CARBONDATA-3520>] - CTAS should
   fail if select query contains duplicate columns
   - [CARBONDATA-3526
   <https://issues.apache.org/jira/browse/CARBONDATA-3526>] - Cache issue
   and select query failure with multiple updates
   - [CARBONDATA-3527
   <https://issues.apache.org/jira/browse/CARBONDATA-3527>] - Throw 'String
   length cannot exceed 32000 characters' exception when load data with
   'GLOBAL_SORT' from csv which include big complex type data

Improvement

   - [CARBONDATA-3488
   <https://issues.apache.org/jira/browse/CARBONDATA-3488>] - Check the
   file size after move local file to carbon path
   - [CARBONDATA-3489
   <https://issues.apache.org/jira/browse/CARBONDATA-3489>] - Optimizing
   the performance of sorting
   - [CARBONDATA-3491
   <https://issues.apache.org/jira/browse/CARBONDATA-3491>] - Return
   updated/deleted rows count when execute update/delete sql
   - [CARBONDATA-3501
   <https://issues.apache.org/jira/browse/CARBONDATA-3501>] - Support to
   execute update sql on table with long_string field (Not update long_string
   field)
   - [CARBONDATA-3511
   <https://issues.apache.org/jira/browse/CARBONDATA-3511>] - Query time
   improvement by reducing the number of NameNode calls while having
   carbonindex files in the store
   - [CARBONDATA-3515
   <https://issues.apache.org/jira/browse/CARBONDATA-3515>] - Limit local
   dictionary size to 10% of allowed blocklet size
   - [CARBONDATA-3523
   <https://issues.apache.org/jira/browse/CARBONDATA-3523>] - Should store
   file size into index file
   - [CARBONDATA-3524
   <https://issues.apache.org/jira/browse/CARBONDATA-3524>] - support
   compaction by GLOBAL_SORT
   - [CARBONDATA-3528
   <https://issues.apache.org/jira/browse/CARBONDATA-3528>] - refactor java
   checkstyle rules
   - [CARBONDATA-3540
   <https://issues.apache.org/jira/browse/CARBONDATA-3540>] - Delete all
   external segments when dropping table
   - [CARBONDATA-3544
   <https://issues.apache.org/jira/browse/CARBONDATA-3544>] - CLI should
   support an option to show statistics for all columns