
Re: Open Discussion:Apache CarbonData Roadmap

Posted by Jihong Ma on Aug 11, 2016; 4:38am
URL: http://apache-carbondata-dev-mailing-list-archive.168.s1.nabble.com/Open-Discussion-Apache-CarbonData-Roadmap-tp49p94.html

I would like to add a little more context to Carbon's future plan:

1. Improve usability to make Carbon easy to use by introducing simplified table properties for configuring a Carbon table, for instance a simple configuration to define the MDK index, leaving the complexity of performance tuning to the internals.

2. Add partitioning support. It is important for further performance enhancement; this is widely proven, no doubt about it.

3. Improve Carbon's extensibility: define clear API interfaces between Carbon modules to make it easy to extend in the future. This is required for integration with other processing frameworks as well as for Carbon's own extensions, for instance introducing a new file type to suit a different workload.

4. Integration with streaming frameworks: as a first step, enable Kafka to write out Carbon data as a reliable sink.

Jihong


From: Jacky Li
To: [hidden email];
Subject: Re: Open Discussion:Apache CarbonData Roadmap

Time: 2016-08-10 07:42:35
I think William’s point is valid; we should focus mainly on usability improvements in 0.2.0.

Besides what Liang has pointed out, I have a brief list in mind that can be planned across several releases, if the items make sense for the community users. They are mainly about more integration and further performance improvement.

1. Streaming ingest. It requires CarbonData to add new format support and integrate with a streaming engine.
2. Code refactoring to get CarbonData in good shape to integrate with processing frameworks other than Spark. It should be able to integrate with both batch engines and streaming engines, including Hive/Flink/Beam/SparkStreaming/Kafka, etc.
3. More dictionary support. For example, really high cardinality columns can use a file-level local dictionary for encoding.
4. More performance improvement for join operations, leveraging CarbonData's late materialization.
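
To make item 3 a bit more concrete, here is a minimal sketch of what a file-level local dictionary could look like. It is only an illustration under assumptions, not CarbonData code: the function names and the sample values are hypothetical, and a real implementation would also need a fallback for files where nearly every value is distinct.

```python
from typing import Dict, List, Tuple


def encode_with_local_dictionary(values: List[str]) -> Tuple[List[str], List[int]]:
    """Dictionary-encode one file's worth of a column (hypothetical helper).

    Returns (dictionary, codes): dictionary[i] is the distinct value that was
    assigned code i, and codes holds one integer per input row.
    """
    dictionary: List[str] = []
    code_of: Dict[str, int] = {}
    codes: List[int] = []
    for value in values:
        code = code_of.get(value)
        if code is None:
            # First time this value appears in this file: assign the next code.
            code = len(dictionary)
            code_of[value] = code
            dictionary.append(value)
        codes.append(code)
    return dictionary, codes


def decode_with_local_dictionary(dictionary: List[str], codes: List[int]) -> List[str]:
    """Map the integer codes back to the original values."""
    return [dictionary[code] for code in codes]


# Two "files" holding different slices of the same high cardinality column.
file_a = ["u1001", "u1002", "u1001", "u1003"]
file_b = ["u9001", "u9001", "u9002"]

# Each file builds its own small dictionary; codes are only meaningful
# together with the dictionary stored in the same file.
dict_a, codes_a = encode_with_local_dictionary(file_a)
dict_b, codes_b = encode_with_local_dictionary(file_b)
```

The point of the sketch is that each dictionary only covers the values seen in its own file, so it stays small even when the column as a whole has very many distinct values.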


Regards,
Jacky

> On Aug 9, 2016, at 10:07 PM, chenliang613 <[hidden email]> wrote:
>
> Hi William
>
> Thanks for your input.
> Most of your points would be considered in 0.2.0: remove Kettle, add create
> table properties for simplifying data load, especially for high cardinality
> column settings, support 2.0
>
> Regards
> Liang
>
>
> --
> View this message in context: http://apache-carbondata-mailing-list-archive.1130556.n5.nabble.com/Open-Discussion-Apache-CarbonData-Roadmap-tp49p65.html
> Sent from the Apache CarbonData Mailing List archive mailing list archive at Nabble.com.