Apache CarbonData Dev Mailing List archive

Help, carbondata issues on spark

ilegend

Help, carbondata issues on spark

Hi guys,
We're testing CarbonData for our project. Under certain conditions CarbonData performs better than Parquet, but we have run into some problems. Do you have any solutions for these issues?
Environment: HDFS 2.6, Spark 2.1, CarbonData 1.3
1. No multi-level partitions: we need three partition levels, such as year, day, hour.
2. Spark needs to import the CarbonData jar; we do not want to modify our existing SQL logic.
3. Low stability: inserts fail frequently.

Look forward to your reply.

Sent from my iPhone

Liang Chen

Re: Help, carbondata issues on spark

Administrator

Hi

1. No multi-level partitions: we need three partition levels, such as
year, day, hour.

Reply: Do year, day, and hour belong to one column (field) or three columns?
Can you explain your exact scenario? We can help you design partition and
sort columns to address your specific query patterns.
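
To make the "partition + sort columns" idea concrete, here is a minimal sketch that only builds a CREATE TABLE statement as a string; it does not talk to Spark. Everything in it is an assumption for illustration: the table name `events`, the columns `user_id` and `amount`, the choice of three separate INT partition columns, and the premise that your CarbonData version accepts Hive-style PARTITIONED BY with several columns together with the SORT_COLUMNS table property.

```python
def build_create_table(table, columns, partition_columns, sort_columns):
    """Return a CREATE TABLE statement as a string (nothing is executed)."""
    cols = ", ".join(f"{name} {dtype}" for name, dtype in columns)
    parts = ", ".join(f"{name} {dtype}" for name, dtype in partition_columns)
    sort = ",".join(sort_columns)
    return (
        f"CREATE TABLE IF NOT EXISTS {table} ({cols}) "
        f"PARTITIONED BY ({parts}) "
        f"STORED BY 'carbondata' "
        f"TBLPROPERTIES ('SORT_COLUMNS'='{sort}')"
    )


# Hypothetical table: three partition levels plus one sort column.
ddl = build_create_table(
    table="events",
    columns=[("user_id", "STRING"), ("amount", "DOUBLE")],
    partition_columns=[("year", "INT"), ("day", "INT"), ("hour", "INT")],
    sort_columns=["user_id"],
)
print(ddl)
```

The resulting string would be passed to whatever SQL entry point you already use. Whether three partition levels or one partition column plus sort columns works better depends on the query patterns asked about above.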

2. Spark needs to import the CarbonData jar; we do not want to modify our
existing SQL logic.

Reply: There is no need to modify any SQL. You can use any SQL that is
supported by Spark SQL to query CarbonData.

3. Low stability: inserts fail frequently.
Reply: What is the exact error?

Regards
Liang

Jacky Li

Re: Help, carbondata issues on spark

In reply to this post by ilegend

> On Feb 2, 2018, at 11:30 AM, ilegend <[hidden email]> wrote:
>
> Hi guys,
> We're testing CarbonData for our project. Under certain conditions CarbonData performs better than Parquet, but we have run into some problems. Do you have any solutions for these issues?
> Environment: HDFS 2.6, Spark 2.1, CarbonData 1.3
> 1. No multi-level partitions: we need three partition levels, such as year, day, hour.

If you are looking for OLAP on time-series data, you can try the timeseries feature in 1.3. See the timeseries section in https://github.com/apache/carbondata/blob/master/docs/data-management-on-carbondata.md#pre-aggregate-tables

> 2. Spark needs to import the CarbonData jar; we do not want to modify our existing SQL logic.

I think if you are using CarbonSession, you get all of Carbon's built-in SQL optimizations. You do not need to modify your Spark jar.

> 3. Low stability: inserts fail frequently.

Is it a memory issue?
