Apache CarbonData Dev Mailing List archive

[ANNOUNCE] Apache CarbonData 1.5.3 release

sraghunandan

Hi All,

The Apache CarbonData community is pleased to announce the release of
version 1.5.3 at The Apache Software Foundation (ASF).

CarbonData is a high-performance data solution that supports various data
analytic scenarios, including BI analysis, ad-hoc SQL queries, fast filter
lookups on detail records, streaming analytics, and so on. CarbonData has
been deployed in many enterprise production environments. In one of the
largest deployments, it supports queries on a single table with 3 PB of
data (more than 5 trillion records) with response times of less than
3 seconds.

We encourage you to use the release at
https://dist.apache.org/repos/dist/release/carbondata/1.5.3/ and to send
feedback through the CarbonData user mailing lists <[hidden email]>!

This release note provides information on the new features, improvements,
and bug fixes of this release.

What’s New in CarbonData Version 1.5.3?

The intention of CarbonData 1.5.3 was to move closer to unified analytics.
We now provide DDL commands that operate on the LRU cache, so that users
can manage the cache according to their requirements. We have also upgraded
the integration to support the latest Presto version. More importantly, we
have further improved CarbonData performance.

In this version of CarbonData, more than 20 JIRA tickets related to new
features, improvements, and bugs have been resolved. The following is a
summary.

CarbonData Core

DDL Support on CarbonData LRU Cache

Previously, although users could set the cache size, this was of limited
use because they had no clear picture of how much cache their workload
actually needed.

From this version, we support DDL on the CarbonData LRU cache, which allows
you to do the following operations:

- Show the current cache used per table.
- Show the current cache used for a specific table.
- Clear the cache for a specific table.
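
To make the three operations concrete, here is a minimal Python sketch of
an LRU cache that tracks usage per table. It is a toy model only, not the
CarbonData implementation: the class TableAwareLRUCache and its method
names are hypothetical and are not CarbonData APIs or DDL statements.

```python
from collections import OrderedDict


class TableAwareLRUCache:
    """Toy LRU cache (hypothetical, not CarbonData code) with per-table accounting."""

    def __init__(self, capacity_bytes):
        self.capacity_bytes = capacity_bytes
        # (table, key) -> size in bytes; least recently used entries come first.
        self._entries = OrderedDict()

    def put(self, table, key, size_bytes):
        # Re-inserting an entry moves it to the most-recently-used end.
        self._entries.pop((table, key), None)
        self._entries[(table, key)] = size_bytes
        # Evict least-recently-used entries until the cache fits its budget.
        while self._entries and sum(self._entries.values()) > self.capacity_bytes:
            self._entries.popitem(last=False)

    def show_cache(self):
        """Return the bytes currently cached, grouped by table."""
        usage = {}
        for (table, _key), size in self._entries.items():
            usage[table] = usage.get(table, 0) + size
        return usage

    def show_cache_for(self, table):
        """Return the bytes currently cached for one table."""
        return self.show_cache().get(table, 0)

    def drop_cache_for(self, table):
        """Remove every cached entry that belongs to one table."""
        for entry in [e for e in self._entries if e[0] == table]:
            del self._entries[entry]
```

With per-table figures like these, a user can see which tables dominate
the cache before deciding on a cache size or clearing a table's entries.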

SDK Read Support for Different Schemas

This version allows the user to read two or more CarbonData files with
different schemas from the same location.

Performance Improvements

Improved Single/Concurrent Query Performance

When the number of segments is large, query performance degrades due to
factors such as a higher memory footprint, pruning overhead, and retrieval
from the unsafe DataMap.

In this version, we have improved query performance with the following
modifications:

- Reduced the memory footprint during the query.
- Added multi-thread pruning for non-filter queries.
- Updated the driver cache unsafe storage format for faster retrieval of
data.
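
As a rough illustration of the multi-thread pruning idea, the Python sketch
below prunes each segment on a worker thread and merges the results. It is
a simplified model under stated assumptions, not CarbonData code: the
segment dictionaries and the functions prune_segment and prune_all are
hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def prune_segment(segment):
    """Return the blocklets to scan for one segment.

    With no filter there is nothing to skip, so every blocklet is kept.
    """
    return list(segment["blocklets"])


def prune_all(segments, max_workers=4):
    """Prune all segments in parallel and return one flat blocklet list."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() yields results in input order, so output follows segment order.
        per_segment = pool.map(prune_segment, segments)
        return [blocklet for blocklets in per_segment for blocklet in blocklets]
```

The per-segment work is independent, which is why it can be spread across
threads when a table has many segments.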

Improved Count(*) Query Performance

Previously, pruning for a count(*) query was the same as for a select *
query, which is very time-consuming because of the extra processing
involved.

In this version, we have optimized count(*) query performance by reading
the blocklet row count directly from DataMapRow, which reduces query time.
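
The idea can be sketched in Python as follows: if each index row already
stores its blocklet's row count, count(*) is a sum over index rows and no
data needs to be scanned. This is only an illustration. BlockletIndexRow
is a hypothetical stand-in, not the real DataMapRow layout, and count_star
is not a CarbonData function.

```python
from dataclasses import dataclass


@dataclass
class BlockletIndexRow:
    """Hypothetical index row describing one blocklet (not the real DataMapRow)."""
    segment_id: str
    blocklet_path: str
    row_count: int


def count_star(index_rows, valid_segments):
    """Answer count(*) from index metadata only, skipping invalid segments."""
    return sum(
        row.row_count for row in index_rows if row.segment_id in valid_segments
    )
```

Restricting the sum to valid segments matters, since counts taken from
stale or invalid segments would give a wrong total.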

Other Improvements

Presto Version Upgrade

CarbonData now integrates with Presto version 0.217.

Behavior Change

None

Please find the detailed JIRA list:
https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12320220&version=12344322

Bug

- [CARBONDATA-3202
<https://issues.apache.org/jira/browse/CARBONDATA-3202>] - updated
schema is not updated in session catalog after add, drop or rename column.
- [CARBONDATA-3223
<https://issues.apache.org/jira/browse/CARBONDATA-3223>] - Datasize and
Indexsize showing 0B for 1.1 store when show segments is done
- [CARBONDATA-3284
<https://issues.apache.org/jira/browse/CARBONDATA-3284>] - Workaround
for Create-PreAgg Datamap Fail
- [CARBONDATA-3287
<https://issues.apache.org/jira/browse/CARBONDATA-3287>] - Remove the
validation of same schema data files in location for external table and file
format
- [CARBONDATA-3298
<https://issues.apache.org/jira/browse/CARBONDATA-3298>] - Logs are
getting printed when clean files is executed for old stores
- [CARBONDATA-3301
<https://issues.apache.org/jira/browse/CARBONDATA-3301>] - Array<date>
column is giving null data in case of spark carbon file format
- [CARBONDATA-3313
<https://issues.apache.org/jira/browse/CARBONDATA-3313>] - count(*) is
not invalidating the invalid segments cache
- [CARBONDATA-3314
<https://issues.apache.org/jira/browse/CARBONDATA-3314>] - Index Cache
Size printed in SHOW METACACHE ON TABLE DDL is not accurate
- [CARBONDATA-3315
<https://issues.apache.org/jira/browse/CARBONDATA-3315>] - Range Filter
query with two between clauses with an OR gives wrong results
- [CARBONDATA-3320
<https://issues.apache.org/jira/browse/CARBONDATA-3320>] - number of
partitions are always zero in describe formatted for hive native partition
- [CARBONDATA-3322
<https://issues.apache.org/jira/browse/CARBONDATA-3322>] - After
renaming table, "SHOW METACACHE ON TABLE" still works for old table
- [CARBONDATA-3323
<https://issues.apache.org/jira/browse/CARBONDATA-3323>] - Output is
null when cache is empty
- [CARBONDATA-3328
<https://issues.apache.org/jira/browse/CARBONDATA-3328>] - Performance
issue with merge small files distribution
- [CARBONDATA-3330
<https://issues.apache.org/jira/browse/CARBONDATA-3330>] - Fix Invalid
exception when SDK reader is trying to clear the datamap
- [CARBONDATA-3332
<https://issues.apache.org/jira/browse/CARBONDATA-3332>] - Concurrent
update and compaction failure
- [CARBONDATA-3333
<https://issues.apache.org/jira/browse/CARBONDATA-3333>] - Fixed No Sort
Store Size issue and Compatibility issue after alter add column done in
1.1 and load in 1.5

New Feature

- [CARBONDATA-3300
<https://issues.apache.org/jira/browse/CARBONDATA-3300>] -
ClassNotFoundException when using UDF on spark-shell
- [CARBONDATA-3305
<https://issues.apache.org/jira/browse/CARBONDATA-3305>] - DDLs to
Operate on CarbonLRUCache
- [CARBONDATA-3329
<https://issues.apache.org/jira/browse/CARBONDATA-3329>] - DeadLock is
observed when a query fails.

Improvement

- [CARBONDATA-3293
<https://issues.apache.org/jira/browse/CARBONDATA-3293>] - Prune
datamaps improvement for count(*)
- [CARBONDATA-3318
<https://issues.apache.org/jira/browse/CARBONDATA-3318>] - Decoupling of
Cache Commands
- [CARBONDATA-3321
<https://issues.apache.org/jira/browse/CARBONDATA-3321>] - Improve
Single/Concurrent query performance

Regards
Raghunandan