[GitHub] carbondata pull request #2971: [TEST] Test loading performance of range_sort

qiuchenjian-2

[GitHub] carbondata issue #2971: [CARBONDATA-3219] Support range partition the input ...

Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2971

Build Success with Spark 2.3.2, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/10413/

---

qiuchenjian-2

[GitHub] carbondata pull request #2971: [CARBONDATA-3219] Support range partition the...

In reply to this post by qiuchenjian-2

Github user QiangCai commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2971#discussion_r245260405

--- Diff: integration/spark-common/src/main/scala/org/apache/carbondata/spark/load/DataLoadProcessBuilderOnSpark.scala ---
@@ -156,4 +161,206 @@ object DataLoadProcessBuilderOnSpark {
Array((uniqueLoadStatusId, (loadMetadataDetails, executionErrors)))
}
}
+
+ /**
+ * 1. range partition the whole input data
+ * 2. for each range, sort the data and write it to CarbonData files
+ */
+ def loadDataUsingRangeSort(
--- End diff --

Yes, we can reuse the conversion step and the final status update part.
But I find that doing so makes the code flow hard to read,
so I will try to reuse just the final status update part.
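
The two steps named in the diff's doc comment (range partition the input, then sort each range) can be illustrated with a minimal, Spark-free sketch. It is not the code from the pull request: the function names, the in-memory row tuples, and the hand-picked boundaries are all assumptions made for illustration, and each sorted list merely stands in for one written file.

```python
from bisect import bisect_right


def range_partition(rows, boundaries, key):
    """Assign every row to a range; `boundaries` must be sorted ascending."""
    ranges = [[] for _ in range(len(boundaries) + 1)]
    for row in rows:
        # bisect_right returns the index of the range the key falls into.
        ranges[bisect_right(boundaries, key(row))].append(row)
    return ranges


def load_using_range_sort(rows, boundaries, key):
    """Hypothetical stand-in for the two steps in the doc comment."""
    # Step 1: range partition the whole input data.
    partitions = range_partition(rows, boundaries, key)
    # Step 2: sort each range on its own; no sort across ranges is needed,
    # because the ranges are already ordered relative to each other.
    return [sorted(part, key=key) for part in partitions]


rows = [(7, "g"), (2, "b"), (9, "i"), (4, "d"), (1, "a"), (6, "f")]
files = load_using_range_sort(rows, boundaries=[3, 6], key=lambda r: r[0])
for index, content in enumerate(files):
    print(index, content)
```

Because every key in one range is smaller than every key in the next, concatenating the sorted ranges yields fully ordered data while each range is sorted independently.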

---

qiuchenjian-2

[GitHub] carbondata issue #2971: [CARBONDATA-3219] Support range partition the input ...

In reply to this post by qiuchenjian-2

Github user QiangCai commented on the issue:

https://github.com/apache/carbondata/pull/2971

@ravipesala @kumarvishal09
please review again.

---

qiuchenjian-2

[GitHub] carbondata issue #2971: [CARBONDATA-3219] Support range partition the input ...

In reply to this post by qiuchenjian-2

Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2971

Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/2164/

---

qiuchenjian-2

[GitHub] carbondata issue #2971: [CARBONDATA-3219] Support range partition the input ...

In reply to this post by qiuchenjian-2

Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2971

Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/2377/

---

qiuchenjian-2

[GitHub] carbondata issue #2971: [CARBONDATA-3219] Support range partition the input ...

In reply to this post by qiuchenjian-2

Github user ravipesala commented on the issue:

https://github.com/apache/carbondata/pull/2971

@QiangCai My question is how the user benefits from choosing a different range column for each load. I feel the range column should be set at the table level, not at the load level.
Regarding compaction: yes, currently the data becomes local sort after compaction, but we can support range column compaction in the same way we do compaction for partitions. This work can be done in the future. But if you allow the user to choose the range column for each load, this type of compaction cannot be done.

---

qiuchenjian-2

[GitHub] carbondata issue #2971: [CARBONDATA-3219] Support range partition the input ...

In reply to this post by qiuchenjian-2

Github user QiangCai commented on the issue:

https://github.com/apache/carbondata/pull/2971

@ravipesala
I agree with you about adding it to the table properties.
Even if it becomes a table property, the user might still be able to change it, right?
Range_column is different from the partition table.
For range_column, the range boundaries differ from segment to segment (the same holds for Global_SORT).
For the partition table, the range boundaries are the same for all segments.
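
The contrast between per-segment and fixed boundaries can be shown with a small sketch. It assumes, purely for illustration, that each load derives its boundaries from its own input values at evenly spaced positions; the thread does not say how the boundaries are computed, and `boundaries_from_data` is a hypothetical helper, not a CarbonData function.

```python
def boundaries_from_data(values, num_ranges):
    """Pick num_ranges - 1 split points at evenly spaced positions of the sorted values."""
    ordered = sorted(values)
    return [ordered[len(ordered) * i // num_ranges] for i in range(1, num_ranges)]


# Two loads with different value distributions (made-up data).
load_1 = [1, 2, 3, 4, 5, 6, 7, 8, 9]
load_2 = [10, 20, 30, 40, 50, 60, 70, 80, 90]

# Range column (assumed behaviour): each segment gets its own boundaries.
segment_1_boundaries = boundaries_from_data(load_1, 3)
segment_2_boundaries = boundaries_from_data(load_2, 3)

# Partition table: one fixed set of boundaries shared by all segments.
partition_boundaries = [30, 60]

print(segment_1_boundaries, segment_2_boundaries, partition_boundaries)
```

Under this assumption the two segments end up with unrelated boundaries, which is why the ranges of different segments cannot simply be matched one to one the way partitions can.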

---

qiuchenjian-2

[GitHub] carbondata issue #2971: [CARBONDATA-3219] Support range partition the input ...

In reply to this post by qiuchenjian-2

Github user ravipesala commented on the issue:

https://github.com/apache/carbondata/pull/2971

@QiangCai We should restrict changing that property in the table properties.
I was just explaining how we can do compaction on the range column; I mentioned partitioning here because the two are similar.
I feel the range boundaries can be recalculated during compaction using the min/max of the range column, followed by a merge sort.
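
A rough sketch of that idea, under stated assumptions: each segment is modelled as one non-empty, already sorted list of range-column values, the new boundaries are equal-width splits between the overall min and max, and the merge is a standard k-way merge. The equal-width choice and both function names are hypothetical; the thread only says the boundaries would be recalculated from min/max.

```python
import heapq
from bisect import bisect_right


def recalc_boundaries(segments, num_ranges):
    """Equal-width boundaries between the overall min and max (an assumption)."""
    lo = min(seg[0] for seg in segments)    # each segment is sorted, so seg[0] is its min
    hi = max(seg[-1] for seg in segments)   # and seg[-1] is its max
    step = (hi - lo) / num_ranges
    return [lo + step * i for i in range(1, num_ranges)]


def compact(segments, num_ranges):
    """Merge sorted segments and redistribute the values into the new ranges."""
    boundaries = recalc_boundaries(segments, num_ranges)
    merged = [[] for _ in range(num_ranges)]
    # heapq.merge performs a k-way merge of already sorted inputs.
    for value in heapq.merge(*segments):
        merged[bisect_right(boundaries, value)].append(value)
    return boundaries, merged


segments = [[1, 4, 7, 10], [2, 3, 12, 13]]
boundaries, ranges = compact(segments, 3)
print(boundaries, ranges)
```

Since the inputs are already sorted, the merge never needs a full re-sort, and each output range stays sorted as values are appended in merge order.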

---

qiuchenjian-2

[GitHub] carbondata issue #2971: [CARBONDATA-3219] Support range partition the input ...

In reply to this post by qiuchenjian-2

Github user QiangCai commented on the issue:

https://github.com/apache/carbondata/pull/2971

@ravipesala
In my opinion, it is unnecessary to restrict changes.
Users will keep the range_column unchanged as far as possible.
So I have only added this option to the loading command.

---

qiuchenjian-2

[GitHub] carbondata issue #2971: [CARBONDATA-3219] Support range partition the input ...

In reply to this post by qiuchenjian-2

Github user ravipesala commented on the issue:

https://github.com/apache/carbondata/pull/2971

LGTM @QiangCai. I feel it is better to keep it in the table properties, as it is not supposed to be changed for each load. We can discuss further and raise another PR if needed. I am merging this now. Thanks for working on it.

---

qiuchenjian-2

[GitHub] carbondata pull request #2971: [CARBONDATA-3219] Support range partition the...

In reply to this post by qiuchenjian-2

Github user asfgit closed the pull request at:

https://github.com/apache/carbondata/pull/2971

---
