[GitHub] carbondata pull request #2971: [TEST] Test loading performance of range_sort

[GitHub] carbondata issue #2971: [CARBONDATA-3219] Support range partition the input ...

qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2971
 
    Build Success with Spark 2.3.2, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/10413/



---

[GitHub] carbondata pull request #2971: [CARBONDATA-3219] Support range partition the...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user QiangCai commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2971#discussion_r245260405
 
    --- Diff: integration/spark-common/src/main/scala/org/apache/carbondata/spark/load/DataLoadProcessBuilderOnSpark.scala ---
    @@ -156,4 +161,206 @@ object DataLoadProcessBuilderOnSpark {
           Array((uniqueLoadStatusId, (loadMetadataDetails, executionErrors)))
         }
       }
    +
    +  /**
    +   * 1. range partition the whole input data
    +   * 2. for each range, sort the data and write it to CarbonData files
    +   */
    +  def loadDataUsingRangeSort(
    --- End diff --
   
    Yes, we could reuse both the conversion step and the final status update part,
    but I found that doing so makes the code flow hard to read,
    so I only reuse the final status update part.
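The two steps in the diff's doc comment (range partition the whole input, then sort each range) can be illustrated with plain Scala collections. This is a minimal sketch, not the code from `DataLoadProcessBuilderOnSpark`: it assumes a single `Int` range column and picks boundaries from a sorted sample, which is roughly what Spark's `RangePartitioner` does on an RDD. The names `RangeSortSketch`, `boundaries`, `rangeOf` and `rangeSort` are hypothetical.

```scala
// Hypothetical sketch of "range partition, then sort each range"; not CarbonData's implementation.
object RangeSortSketch {
  // Pick numRanges - 1 boundaries at evenly spaced positions of the sorted sample.
  def boundaries(sample: Seq[Int], numRanges: Int): Vector[Int] = {
    val sorted = sample.sorted.toVector
    if (sorted.isEmpty) Vector.empty
    else (1 until numRanges).map(i => sorted(i * sorted.length / numRanges)).toVector
  }

  // Range i holds keys in (bounds(i - 1), bounds(i)]; the last range is unbounded above.
  def rangeOf(key: Int, bounds: Vector[Int]): Int = bounds.count(_ < key)

  // Step 1: range partition the input. Step 2: sort each range on its own.
  def rangeSort(input: Seq[Int], numRanges: Int): Vector[Vector[Int]] = {
    val bounds = boundaries(input, numRanges)
    val byRange = input.groupBy(rangeOf(_, bounds))
    (0 until numRanges).map(i => byRange.getOrElse(i, Seq.empty).sorted.toVector).toVector
  }
}
```

Because every key in range `i` is no larger than any key in range `i + 1`, concatenating the sorted ranges gives globally sorted output, which is why each range can be sorted and written to its own files independently.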


---

[GitHub] carbondata issue #2971: [CARBONDATA-3219] Support range partition the input ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user QiangCai commented on the issue:

    https://github.com/apache/carbondata/pull/2971
 
    @ravipesala @kumarvishal09
    Please review again.


---

[GitHub] carbondata issue #2971: [CARBONDATA-3219] Support range partition the input ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2971
 
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/2164/



---

[GitHub] carbondata issue #2971: [CARBONDATA-3219] Support range partition the input ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2971
 
    Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/2377/



---

[GitHub] carbondata issue #2971: [CARBONDATA-3219] Support range partition the input ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2971
 
    Build Success with Spark 2.3.2, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/10418/



---

[GitHub] carbondata issue #2971: [CARBONDATA-3219] Support range partition the input ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ravipesala commented on the issue:

    https://github.com/apache/carbondata/pull/2971
 
    @QiangCai My question is how the user benefits from choosing a different range column for each load. I feel the range column should be set at the table level, not at the load level.
    Regarding compaction: yes, currently the data becomes local sort after compaction, but we could support range column compaction the same way we compact partitions. This work can be done in the future. However, if the user is allowed to choose a different range column for each load, this type of compaction cannot be done.


---

[GitHub] carbondata issue #2971: [CARBONDATA-3219] Support range partition the input ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user QiangCai commented on the issue:

    https://github.com/apache/carbondata/pull/2971
 
    @ravipesala
    I agree with adding it to the table properties.
    But even if it becomes a table property, the user may still be able to change it, right?
    range_column is different from a partition table.
    For range_column, the range boundaries differ between segments (as with GLOBAL_SORT).
    For a partition table, the range boundaries are the same for all segments.



---

[GitHub] carbondata issue #2971: [CARBONDATA-3219] Support range partition the input ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ravipesala commented on the issue:

    https://github.com/apache/carbondata/pull/2971
 
    @QiangCai we should restrict changing that property in the table properties.
    I was just explaining how we could do compaction on the range column; I mentioned partitioning because the two are similar.
    I feel the range boundaries can be recalculated during compaction from the min/max of the range column, followed by a merge sort.
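The compaction idea described here (recompute range boundaries from the min/max of the range column, then merge sort) was proposed in this thread as future work, not implemented in this PR. A minimal sketch under stated assumptions: each segment is modeled as an already-sorted `Vector[Long]` of range-column values, the new boundaries split `[min, max]` into equal-width intervals, and `RangeCompactionSketch` and its methods are hypothetical names.

```scala
import scala.collection.mutable

// Hypothetical sketch of range-column compaction; not CarbonData code.
object RangeCompactionSketch {
  // Split [globalMin, globalMax] into numRanges equal-width intervals.
  def boundariesFromMinMax(segments: Seq[Vector[Long]], numRanges: Int): Vector[Long] = {
    val nonEmpty = segments.filter(_.nonEmpty)
    if (nonEmpty.isEmpty || numRanges <= 1) Vector.empty
    else {
      val min = nonEmpty.map(_.head).min // each segment is sorted, so head is its minimum
      val max = nonEmpty.map(_.last).max // and last is its maximum
      (1 until numRanges).map(i => min + (max - min) * i / numRanges).toVector
    }
  }

  // k-way merge of sorted runs using a min-heap of (value, runIndex, positionInRun).
  def mergeSorted(runs: Seq[Vector[Long]]): Vector[Long] = {
    val minByValue = Ordering.by[(Long, Int, Int), Long](_._1).reverse
    val heap = mutable.PriorityQueue.empty[(Long, Int, Int)](minByValue)
    runs.zipWithIndex.foreach { case (run, r) => if (run.nonEmpty) heap.enqueue((run(0), r, 0)) }
    val out = Vector.newBuilder[Long]
    while (heap.nonEmpty) {
      val (v, r, p) = heap.dequeue()
      out += v
      if (p + 1 < runs(r).length) heap.enqueue((runs(r)(p + 1), r, p + 1))
    }
    out.result()
  }

  // For each new range, slice every segment (filtering keeps it sorted) and merge the slices.
  def compact(segments: Seq[Vector[Long]], numRanges: Int): Vector[Vector[Long]] = {
    val bounds = boundariesFromMinMax(segments, numRanges)
    (0 to bounds.length).map { i =>
      val slices = segments.map(_.filter(k => bounds.count(_ < k) == i))
      mergeSorted(slices)
    }.toVector
  }
}
```

For example, segments `Vector(1, 5, 12, 25)` and `Vector(3, 14, 20, 30)` with three ranges get new boundaries 10 and 20, and compact into the ranges `(1, 3, 5)`, `(12, 14, 20)` and `(25, 30)`. Equal-width boundaries can produce skewed ranges when the values are not uniformly distributed; sampling, as during load, is one alternative.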


---

[GitHub] carbondata issue #2971: [CARBONDATA-3219] Support range partition the input ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user QiangCai commented on the issue:

    https://github.com/apache/carbondata/pull/2971
 
    @ravipesala
    In my opinion, it is unnecessary to restrict changing it.
    Users will keep range_column unchanged as much as possible.
    So I only added this option to the load command.


---

[GitHub] carbondata issue #2971: [CARBONDATA-3219] Support range partition the input ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ravipesala commented on the issue:

    https://github.com/apache/carbondata/pull/2971
 
    LGTM @QiangCai I feel it is better to keep it in the table properties, as it is not supposed to be changed for each load. We can discuss further and raise another PR if needed. I am merging this now. Thanks for working on it.


---

[GitHub] carbondata pull request #2971: [CARBONDATA-3219] Support range partition the...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user asfgit closed the pull request at:

    https://github.com/apache/carbondata/pull/2971


---