Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2971 Build Success with Spark 2.3.2, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/10413/
In reply to this post by qiuchenjian-2
Github user QiangCai commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2971#discussion_r245260405
--- Diff: integration/spark-common/src/main/scala/org/apache/carbondata/spark/load/DataLoadProcessBuilderOnSpark.scala ---
@@ -156,4 +161,206 @@ object DataLoadProcessBuilderOnSpark {
       Array((uniqueLoadStatusId, (loadMetadataDetails, executionErrors)))
     }
   }
+
+  /**
+   * 1. range partition the whole input data
+   * 2. for each range, sort the data and writ it to CarbonData files
+   */
+  def loadDataUsingRangeSort(
--- End diff --
Yes, we could reuse both the conversion step and the final status update part, but I find that doing so makes the code flow hard to read. So I have only tried to reuse the final status update part.
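For readers following the thread, the two steps named in the doc comment above (range partition the input, then sort each range) can be sketched in plain Scala collections. This is only an illustration and not the code from the PR: `RangeSortSketch`, `boundaries`, `rangeOf`, and `rangeSort` are hypothetical names, the boundaries are taken from the sorted keys themselves rather than from a sample, and nothing is written to CarbonData files.

```scala
object RangeSortSketch {
  // Pick up to numRanges - 1 split points from the sorted keys (assumption:
  // a real loader would sample the input instead of sorting all keys).
  def boundaries(keys: Seq[Int], numRanges: Int): Vector[Int] = {
    require(numRanges >= 1, "numRanges must be positive")
    val sorted = keys.sorted.toVector
    if (sorted.isEmpty) Vector.empty
    else (1 until numRanges).map(i => sorted((i * sorted.length) / numRanges)).distinct.toVector
  }

  // Index of the range a key belongs to: the number of boundaries <= key.
  def rangeOf(key: Int, bounds: Vector[Int]): Int = bounds.count(_ <= key)

  // Step 1: assign every row to a range. Step 2: sort each range on its own.
  def rangeSort(rows: Seq[(Int, String)], numRanges: Int): Vector[Vector[(Int, String)]] = {
    val bounds = boundaries(rows.map(_._1), numRanges)
    val grouped = rows.groupBy(r => rangeOf(r._1, bounds))
    (0 to bounds.length).toVector.map { i =>
      grouped.getOrElse(i, Seq.empty).sortBy(_._1).toVector
    }
  }
}
```

Because the ranges do not overlap and each one is sorted, reading the ranges in order yields the rows in key order within the load.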
Github user QiangCai commented on the issue:
https://github.com/apache/carbondata/pull/2971 @ravipesala @kumarvishal09 please review again.
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2971 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/2164/
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2971 Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/2377/
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2971 Build Success with Spark 2.3.2, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/10418/
Github user ravipesala commented on the issue:
https://github.com/apache/carbondata/pull/2971 @QiangCai My question is how the user benefits from choosing a different range column for each load. I feel the range column should be set at the table level, not at the load level. Regarding compaction: yes, currently the data becomes local sort after compaction, but we can support range column compaction in a way similar to how we do compaction for partitions. This work can be done in the future. But if you allow the user to choose the range column for each load, this type of compaction cannot be done.
Github user QiangCai commented on the issue:
https://github.com/apache/carbondata/pull/2971 @ravipesala I agree with adding it to the table properties. But even as a table property, the user might still be able to change it, right? Range_column is different from a partition table. For range_column, the range boundaries differ from segment to segment (the same is true for Global_SORT). For a partition table, the range boundaries are the same for all segments.
Github user ravipesala commented on the issue:
https://github.com/apache/carbondata/pull/2971 @QiangCai We should restrict changing that property in the table properties. I was only explaining how we can do compaction on the range column; I mentioned partitioning because there are similarities. I feel the range boundaries can be recalculated during compaction using the min/max of the range column, and then we can go for the merge sort.
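The compaction idea in the comment above (recalculate the range boundaries from the min/max of the range column, then merge sort) might look roughly like the following. This is a sketch under stated assumptions, not CarbonData's compaction code: `RangeCompactionSketch` and its methods are hypothetical, each segment is simplified to one sorted run of `Long` keys, and the new boundaries are equal-width splits of the overall [min, max] interval.

```scala
object RangeCompactionSketch {
  // Equal-width split points over [min, max]; at most numRanges - 1 of them.
  def boundariesFromMinMax(min: Long, max: Long, numRanges: Int): Vector[Long] = {
    require(numRanges >= 1 && min <= max, "need numRanges >= 1 and min <= max")
    val span = max - min + 1
    (1 until numRanges).map(i => min + i * span / numRanges).distinct.toVector
  }

  // Standard two-way merge of two sorted runs.
  def merge(a: Vector[Long], b: Vector[Long]): Vector[Long] = {
    val out = Vector.newBuilder[Long]
    var i = 0
    var j = 0
    while (i < a.length && j < b.length) {
      if (a(i) <= b(j)) { out += a(i); i += 1 } else { out += b(j); j += 1 }
    }
    out ++= a.drop(i)
    out ++= b.drop(j)
    out.result()
  }

  // Assumption: every segment is already sorted on the range column.
  def compact(segments: Seq[Vector[Long]], numRanges: Int): Vector[Vector[Long]] = {
    val nonEmpty = segments.filter(_.nonEmpty)
    if (nonEmpty.isEmpty) Vector.empty
    else {
      // Sorted segments expose their min and max as first and last element.
      val bounds = boundariesFromMinMax(nonEmpty.map(_.head).min, nonEmpty.map(_.last).max, numRanges)
      (0 to bounds.length).toVector.map { r =>
        // Slice each segment to the new range; slices stay sorted, so merge them.
        val slices = nonEmpty.map(_.filter(k => bounds.count(_ <= k) == r))
        slices.foldLeft(Vector.empty[Long])(merge)
      }
    }
  }
}
```

The sketch only works because every segment is ordered on the same column, which is the point made in the thread: if each load could pick a different range column, the slices would not be sorted on a common key and could not simply be merged.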
Github user QiangCai commented on the issue:
https://github.com/apache/carbondata/pull/2971 @ravipesala In my opinion, it is unnecessary to restrict changes. Users will keep the range_column unchanged as far as possible. So I have only added this option to the loading command.
Github user ravipesala commented on the issue:
https://github.com/apache/carbondata/pull/2971 LGTM @QiangCai I feel it is better to keep it in the table properties, as it is not supposed to change for each load. We can discuss further and raise another PR if needed. I am merging this now. Thanks for working on it.