Hi Community,
Three years have passed since the launch of the Apache CarbonData project, and CarbonData has become a popular data management solution for many scenarios. As new workloads like AI and new runtime environments like the cloud emerge quickly, I think we have reached a point where we need to discuss the future of CarbonData.

To bring CarbonData to a new level and satisfy these new requirements, Jacky and I have drafted a roadmap for CarbonData 2 on the cwiki website.
- English Version: https://cwiki.apache.org/confluence/display/CARBONDATA/Apache+CarbonData+2+Roadmap+Proposal
- Chinese Version: https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=120737492

Please feel free to discuss the roadmap in this thread; we welcome all feedback to make CarbonData better.

Thanks and Regards,
Ravindra.
Currently, a datamap in Carbon applies to all segments.

The roadmap mentions commands like add/drop segment, and possibly incremental loading for MV. For these scenarios, it would be better to make the datamap usable at the segment level instead of disabling the datamap whenever its data is not ready for some segment. This would also make datamaps fail-safe and improve Carbon's stability. Maybe we can consider this as well; a rough sketch of the idea follows below.
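To make the idea concrete, here is a minimal sketch in Scala (hypothetical types and names, not Carbon's actual classes) of deciding per segment whether the datamap can be used:

object SegmentLevelDatamapSketch {
  final case class Segment(id: String)

  // `builtSegments` = ids of segments for which the datamap data is ready.
  // Instead of disabling the datamap for the whole table, split the scan:
  // covered segments are pruned with the datamap, the rest fall back to a full scan.
  def planScan(allSegments: Seq[Segment],
               builtSegments: Set[String]): (Seq[Segment], Seq[Segment]) =
    allSegments.partition(s => builtSegments.contains(s.id))

  def main(args: Array[String]): Unit = {
    val (withDatamap, fullScan) =
      planScan(Seq(Segment("0"), Segment("1"), Segment("2")), Set("0", "1"))
    println(s"prune with datamap: $withDatamap; full scan: $fullScan")
  }
}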
Regards
Manhua
In reply to this post by ravipesala
Hi Community,
I have already read the CarbonData 2 roadmap. I think the Flink integration among the CarbonData 2 features should get more effort focused on its implementation. As we all know, Flink 1.9, which merges Alibaba's Blink, will be released at the end of this month. Building real-time data warehouses through CarbonData's Flink integration will attract many engineers to use CarbonData and open up more possibilities for real-time artificial intelligence platforms. It's just my opinion, and I have great interest in building the Flink integration.

Thanks,
Nicholas
In reply to this post by manhua
Hi Kevin,
Yes, we can improve it. The implementation is closely related to supporting pre-aggregate datamaps on streaming tables, which we implemented some time ago, and the same will soon be reimplemented for the MV datamap. That implementation uses the pre-aggregate datamap for non-streaming segments and the main table for streaming segments: we update the query plan to do a union over both tables and query only the streaming segments of the main table.

We can use the same approach here: run a union query over the MV table and the main table (restricted to the segments not yet loaded into the datamap) and execute that. We can definitely consider this after we support streaming tables for the MV datamap (see the PS below for a sketch).

Regards,
Ravindra.
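PS: For illustration only, a minimal Spark SQL sketch of the union rewrite. Table and column names (sales, sales_agg_mv, country, amount) and the segment ids are hypothetical, and a real implementation would do this inside the query planner rather than in hand-written SQL:

import org.apache.spark.sql.SparkSession

// Sketch only: assumes a session already configured with the CarbonData extensions
// and that the hypothetical tables exist.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("mv-union-sketch")
  .getOrCreate()

// Restrict the main-table scan to the segments the MV has not yet covered
// (CarbonData's segment-restriction property; the ids here are illustrative).
spark.sql("SET carbon.input.segments.default.sales = 3,4")

// Union the pre-aggregated MV data with the not-yet-aggregated main-table data
// and re-aggregate, which is what the rewritten plan would do.
spark.sql(
  """
    |SELECT country, SUM(amount) AS amount
    |FROM (
    |  SELECT country, amount FROM sales_agg_mv
    |  UNION ALL
    |  SELECT country, amount FROM sales
    |) t
    |GROUP BY country
  """.stripMargin).show()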
In reply to this post by Nicholas
Hi,
Yes, Flink and CarbonData integration will definitely attract more users. We welcome any contributions in that direction.

Regards,
Ravindra.
There are some problems when users handle AI data. For example, uploading or downloading a large number of images to/from S3 is very slow: it takes about 10 hours to upload 10 million images (40 GB) to S3 using a single thread. AI developers also want to manage structured and unstructured data together for their training algorithms, prediction, and other tasks.

We have already done some work with CarbonData in the AI domain, and the performance is great: CarbonData is many times faster than raw files when uploading/downloading data from S3. But there are still some problems that CarbonData should support or optimize. CarbonData should be ready to support data management for AI applications.
In reply to this post by ravipesala
Hi Ravi,
We can add the below requirements in 2.0:

1. Data loading performance improvement (need to analyze and improve).
2. Unified reading of the carbon data file: currently data is read in two parts, dimensions and measures, which increases the number of IO operations (see the PS below for a sketch).
3. Carbon store size optimization (a PR is already raised and needs to be revisited), and we can explore more optimizations (like RLE hybrid bit packing).
4. Presto enhancement (write support, Presto SQL adaptation, complex type read support).
5. Spark Data Source V2 integration.
6. Spatial index support.

-Regards
Kumar Vishal
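PS: To illustrate point 2 — a toy sketch only, where the offsets, lengths, and file name are made up and have no relation to the actual carbon file layout — showing the two-reads-versus-one-read difference:

import java.io.RandomAccessFile

object UnifiedReadSketch {
  // Read one byte range from the file with a single seek + read.
  def readRange(file: RandomAccessFile, offset: Long, length: Int): Array[Byte] = {
    val buf = new Array[Byte](length)
    file.seek(offset)
    file.readFully(buf)
    buf
  }

  def main(args: Array[String]): Unit = {
    val file = new RandomAccessFile("part-0001.carbondata", "r") // hypothetical file name

    // Current approach: two IO calls, one for the dimension chunk, one for the measure chunk.
    val dims     = readRange(file, offset = 0L,    length = 4096)
    val measures = readRange(file, offset = 4096L, length = 4096)

    // Unified approach: one contiguous IO call covering both chunks.
    val unified = readRange(file, offset = 0L, length = 8192)

    println(s"two reads: ${dims.length + measures.length} bytes; one read: ${unified.length} bytes")
    file.close()
  }
}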
Hi Team
I'm glad to see how CarbonData has grown and become popular over time. It was important to take a fresh look and come up with a roadmap for future needs. The CarbonData 2.0 proposal looks good, as we are trying to align it with the cloud, which will more or less be the predominant runtime environment in the near future. A lot of code refactoring will be required for this roadmap. I would like to add a couple of points.

1. Complex type support: although we already have complex type support, there is scope for improvement. Use cases for nested columns are growing extensively. We should improve the storage of nested columns and also support creating compound/multi-column indexes on nested columns.

2. Feature code segregation and pluggability: the current code is tightly coupled. The ideal case would be a base into which all features plug, but that will be hard to achieve. We can try segregation at the package level for the major features, but for any new feature developed we should think in terms of pluggability.

[Clarification] Carbon UI: I did not understand the usage of the Carbon segment management UI. For the cloud scenario we would have to expose REST endpoints, which would make Carbon more like a microservice, and that does not fit the CarbonData use case. A UI/tool makes more sense for internal testing, but I am not sure how it benefits the end user. Maybe a tool showing the data stored in each table would be more useful to the end user.

Regards
Manish Gupta
In reply to this post by ravipesala
Hi, I'm glad to see CarbonData entering the 2.x stage, and I have the following suggestions for your consideration:

1. Evolution of the CarbonData file format. I have always considered the CarbonData file format one of the key highlights of CarbonData; is there any evolution planned for it? As CarbonData moves into broader application scopes, will the current file format still suit them well?

2. Performance commitment of CarbonData. It seems CarbonData cares more about expanding its scope of application than about performance enhancement. What is the performance commitment of CarbonData 2 for data loading and querying? Many enterprises do have big data, but not big enough to need the cloud, data lakes, etc. For these scenarios, is CarbonData's performance clearly better than other fileFormat+executionEngine combinations? Do we have any plan for enhancement?

3. Smarter CarbonData. As we suggested earlier, is a CarbonData advisor on the roadmap? CarbonData has many features, but I notice that some of them are never used by users. While CarbonData will serve the AI scope, can it itself become smarter as well? The CarbonData advisor would act as a DBA for CarbonData, monitoring workload, usage, and current performance, and giving proper suggestions or even performing appropriate operations itself.
ANTLR4 could be used to parse SQL.