[ANNOUNCE] Apache CarbonData 1.5.3 release

sraghunandan
Hi All,

The Apache CarbonData community is pleased to announce the release of
version 1.5.3 in The Apache Software Foundation (ASF).

CarbonData is a high-performance data solution that supports various data
analytic scenarios, including BI analysis, ad-hoc SQL queries, fast filter
lookups on detail records, streaming analytics, and so on. CarbonData has
been deployed in many enterprise production environments; in one of the
largest scenarios, it supports queries on a single table with 3 PB of data
(more than 5 trillion records) with a response time of less than 3 seconds!

We encourage you to use the release at
https://dist.apache.org/repos/dist/release/carbondata/1.5.3/ and to share
feedback through the CarbonData user mailing list <[hidden email]>!

This release note provides information on the new features, improvements,
and bug fixes of this release.
What’s New in CarbonData Version 1.5.3?

The intention of CarbonData 1.5.3 was to move closer to unified analytics.
This release adds DDL commands that operate on the LRU cache, so users can
manage the cache according to their own requirements. We have also upgraded
the Presto integration to support the latest Presto version. More
importantly, we have further improved CarbonData performance.

In this version of CarbonData, more than 20 JIRA tickets related to new
features, improvements, and bugs have been resolved. The following is a
summary.
CarbonData Core

DDL Support on CarbonData LRU Cache

Previously, although users could set the cache size, the functionality was
limited because they had no clear picture of how much cache their workload
actually required.

From this version, CarbonData supports DDL on the CarbonData LRU cache,
which allows you to perform the following operations:

   - Show the current cache used per table.
   - Show the current cache used for a specific table.
   - Clear the cache for a specific table.
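To make the three operations above concrete, here is a minimal Python model of an LRU cache whose entries are tagged with the table they belong to. It is only an illustrative sketch: in CarbonData these operations are exposed as SQL DDL (for example `SHOW METACACHE ON TABLE`, named in the bug list below) on the JVM-side CarbonLRUCache, and the class `TableAwareLRUCache` and its methods here are hypothetical names, not CarbonData APIs.

```python
from collections import OrderedDict


class TableAwareLRUCache:
    """Hypothetical model of an LRU cache whose entries are tagged by table.

    Not CarbonData's CarbonLRUCache; it only mirrors the three operations:
    show usage per table, show usage for one table, drop one table's entries.
    """

    def __init__(self, capacity_bytes):
        self.capacity_bytes = capacity_bytes
        self.used_bytes = 0
        # key -> (table_name, size_bytes); dict order tracks recency (LRU first)
        self._entries = OrderedDict()

    def put(self, key, table, size_bytes):
        if key in self._entries:
            self.used_bytes -= self._entries.pop(key)[1]
        self._entries[key] = (table, size_bytes)
        self.used_bytes += size_bytes
        # Evict least recently used entries until usage fits the capacity.
        while self.used_bytes > self.capacity_bytes and self._entries:
            _, (_, evicted_size) = self._entries.popitem(last=False)
            self.used_bytes -= evicted_size

    def get(self, key):
        if key not in self._entries:
            return None
        self._entries.move_to_end(key)  # mark as most recently used
        return self._entries[key]

    def show_all(self):
        """Cache usage in bytes, grouped per table."""
        usage = {}
        for table, size in self._entries.values():
            usage[table] = usage.get(table, 0) + size
        return usage

    def show_table(self, table):
        """Cache usage in bytes for a single table (0 if nothing cached)."""
        return self.show_all().get(table, 0)

    def drop_table(self, table):
        """Remove every cached entry that belongs to one table."""
        for key in [k for k, (t, _) in self._entries.items() if t == table]:
            self.used_bytes -= self._entries.pop(key)[1]
```

For example, after caching 40 and 30 bytes of index data for a `sales` table and 20 bytes for a `users` table in a 100-byte cache, `show_all()` reports `{'sales': 70, 'users': 20}`, and after `drop_table("sales")` the call `show_table("sales")` returns `0`. Tagging each entry with its table is what makes per-table reporting and per-table clearing possible without flushing the whole cache.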

SDK Support for Reading Files with Different Schemas

This version allows the SDK to read two or more CarbonData files with
different schemas from the same location.
Performance Improvements

Improved Single/Concurrent Query Performance

As the number of segments grows, query performance degrades because of a
higher memory footprint, single-threaded pruning, slow retrieval from the
unsafe DataMap, and so on.

In this version, we have improved query performance with the following
modifications:

   - Reduced the memory footprint during queries.
   - Added multi-threaded pruning for non-filter queries.
   - Updated the unsafe storage format of the driver cache for faster data
   retrieval.
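The idea behind multi-threaded pruning can be pictured as fanning the per-segment pruning work out across a thread pool and merging the results in segment order. The sketch below assumes a simplified, hypothetical segment layout (a dict with an `id` and a list of `blocklets`) and hypothetical function names; CarbonData's actual pruning runs in the JVM driver. In CPython the GIL limits speedups for CPU-bound work, so this shows the structure of the change, not its performance.

```python
from concurrent.futures import ThreadPoolExecutor


def prune_segment(segment):
    """Hypothetical per-segment pruning for a non-filter query.

    With no filter, no blocklet can be skipped, so every blocklet is kept.
    """
    return [(segment["id"], blocklet) for blocklet in segment["blocklets"]]


def prune_segments(segments, num_threads=4):
    """Prune all segments in parallel; results keep the input segment order."""
    pruned = []
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        # Executor.map yields results in the same order as the input segments.
        for blocklets in pool.map(prune_segment, segments):
            pruned.extend(blocklets)
    return pruned
```

Because `Executor.map` returns results in input order, the merged blocklet list is deterministic regardless of which thread finishes first.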

Improved Count(*) Query Performance

Previously, pruning for a count(*) query was the same as for a select *
query, which was very time-consuming because of the various steps involved.

In this version, we have optimized count(*) queries by reading the blocklet
row count directly from the DataMapRow, which reduces query time.
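The count(*) optimization amounts to answering the query from metadata: if each blocklet's row count is already stored in the index, the total is a sum over blocklets in valid segments, with no row data read. The following Python sketch models this under assumed, hypothetical `Segment` and `Blocklet` types (not CarbonData classes). Skipping invalid segments matters here, as the count(*) cache fix CARBONDATA-3313 in the list below suggests.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Blocklet:
    row_count: int  # assumed to be stored in index metadata, so no data scan


@dataclass
class Segment:
    segment_id: str
    valid: bool  # e.g. False once a segment is compacted or marked deleted
    blocklets: List[Blocklet] = field(default_factory=list)


def count_star(segments: List[Segment]) -> int:
    """Answer count(*) by summing stored blocklet row counts.

    Only valid segments contribute; no row data is read.
    """
    return sum(
        blocklet.row_count
        for segment in segments
        if segment.valid
        for blocklet in segment.blocklets
    )
```

For instance, two valid segments with blocklets of 100 + 50 rows and 7 rows, plus one invalid segment of 999 rows, give a count of 157.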
Other Improvements

Presto Version Upgrade

CarbonData now integrates with Presto version 0.217.
Behavior Change

None


Please find the detailed JIRA list:
https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12320220&version=12344322
Bug

   - [CARBONDATA-3202
   <https://issues.apache.org/jira/browse/CARBONDATA-3202>] - updated
   schema is not updated in session catalog after add, drop or rename column.
   - [CARBONDATA-3223
   <https://issues.apache.org/jira/browse/CARBONDATA-3223>] - Datasize and
   Indexsize showing 0B for 1.1 store when show segments is done
   - [CARBONDATA-3284
   <https://issues.apache.org/jira/browse/CARBONDATA-3284>] - Workaround
   for Create-PreAgg Datamap Fail
   - [CARBONDATA-3287
   <https://issues.apache.org/jira/browse/CARBONDATA-3287>] - Remove the
   validation of same schema data files in location for external table and file
   format
   - [CARBONDATA-3298
   <https://issues.apache.org/jira/browse/CARBONDATA-3298>] - Logs are
   getting printed when clean files is executed for old stores
   - [CARBONDATA-3301
   <https://issues.apache.org/jira/browse/CARBONDATA-3301>] - Array<date>
   column is giving null data in case of spark carbon file format
   - [CARBONDATA-3313
   <https://issues.apache.org/jira/browse/CARBONDATA-3313>] - count(*) is
   not invalidating the invalid segments cache
   - [CARBONDATA-3314
   <https://issues.apache.org/jira/browse/CARBONDATA-3314>] - Index Cache
   Size printed in SHOW METACACHE ON TABLE DDL is not accurate
   - [CARBONDATA-3315
   <https://issues.apache.org/jira/browse/CARBONDATA-3315>] - Range Filter
   query with two between clauses with an OR gives wrong results
   - [CARBONDATA-3320
   <https://issues.apache.org/jira/browse/CARBONDATA-3320>] - number of
   partitions are always zero in describe formatted for hive native partition
   - [CARBONDATA-3322
   <https://issues.apache.org/jira/browse/CARBONDATA-3322>] - After
   renaming table, "SHOW METACACHE ON TABLE" still works for old table
   - [CARBONDATA-3323
   <https://issues.apache.org/jira/browse/CARBONDATA-3323>] - Output is
   null when cache is empty
   - [CARBONDATA-3328
   <https://issues.apache.org/jira/browse/CARBONDATA-3328>] - Performance
   issue with merge small files distribution
   - [CARBONDATA-3330
   <https://issues.apache.org/jira/browse/CARBONDATA-3330>] - Fix Invalid
   exception when SDK reader is trying to clear the datamap
   - [CARBONDATA-3332
   <https://issues.apache.org/jira/browse/CARBONDATA-3332>] - Concurrent
   update and compaction failure
   - [CARBONDATA-3333
   <https://issues.apache.org/jira/browse/CARBONDATA-3333>] - Fixed No Sort
   Store Size issue and Compatibility issue after alter add column done in
   1.1 and load in 1.5

New Feature

   - [CARBONDATA-3300
   <https://issues.apache.org/jira/browse/CARBONDATA-3300>] -
   ClassNotFoundException when using UDF on spark-shell
   - [CARBONDATA-3305
   <https://issues.apache.org/jira/browse/CARBONDATA-3305>] - DDLs to
   Operate on CarbonLRUCache
   - [CARBONDATA-3329
   <https://issues.apache.org/jira/browse/CARBONDATA-3329>] - DeadLock is
   observed when a query fails.

Improvement

   - [CARBONDATA-3293
   <https://issues.apache.org/jira/browse/CARBONDATA-3293>] - Prune
   datamaps improvement for count(*)
   - [CARBONDATA-3318
   <https://issues.apache.org/jira/browse/CARBONDATA-3318>] - Decoupling of
   Cache Commands
   - [CARBONDATA-3321
   <https://issues.apache.org/jira/browse/CARBONDATA-3321>] - Improve
   Single/Concurrent query performance


Regards
Raghunandan