GitHub user xuchuanyin opened a pull request:
https://github.com/apache/carbondata/pull/1594

[CARBONDATA-1838] Refactor SortStepRowUtil to make it more readable

Be sure to do all of the following checklist to help us incorporate your contribution quickly and easily:
- [X] Any interfaces changed? `YES. REFACTORED THE convertRow INTERFACE IN SortStepRowUtil`
- [X] Any backward compatibility impacted? `NO`
- [X] Document update required? `NO`
- [X] Testing done. Please provide details on
  - Whether new unit test cases have been added or why no new tests are required? `NO. THIS ONLY REFACTORS THE CODE AND DOES NOT CHANGE THE FUNCTIONALITY`
  - How it is tested? Please attach test report. `TESTED IN A LOCAL CLUSTER WITH 3 NODES; DATA LOADING AND QUERYING WORK FINE`
  - Is it a performance related change? Please attach the performance test report. `NO`
  - Any additional information to help reviewers in testing this change. `NO`
- [X] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA. `NOT RELATED`

NOTE
===
Refactor and optimize `SortStepRowUtil` to make it more efficient and readable:
1. First, compute the indices for the 3 column groups: dictionary columns, non-dictionary dimension columns, and measures.
2. Then, for each group, iterate over the source row and copy the data into that group without any if-else branching.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/xuchuanyin/carbondata opt_sort_step_row_util

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/carbondata/pull/1594.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #1594

----
commit 9eb8f88da71c5c0e9196ace9cd1fd150987f14af
Author: xuchuanyin <[hidden email]>
Date: 2017-11-29T12:40:44Z

    refactor sort step row
----
---
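The approach described in the NOTE — compute the index groups once up front, then copy each group with a plain loop — can be sketched roughly as follows. This is only an illustration in Scala: the real `SortStepRowUtil` is a Java class, and the class name, field names, and output layout used here are hypothetical.

```scala
// Rough sketch of the idea only -- not the actual SortStepRowUtil implementation.
// The index arrays are hypothetical stand-ins for whatever the real class
// derives from SortParameters at construction time.
class SortStepRowSketch(
    dictDimIndexes: Array[Int],      // indices of dictionary dimension columns
    nonDictDimIndexes: Array[Int],   // indices of non-dictionary dimension columns
    measureIndexes: Array[Int]) {    // indices of measure columns

  // Step 1 of the NOTE happens once, at construction time: the three groups are known.
  // Step 2 is a plain copy loop per group -- no if-else branch per column.
  def convertRow(data: Array[AnyRef]): Array[AnyRef] = {
    val out = new Array[AnyRef](data.length)
    var pos = 0
    for (idx <- dictDimIndexes)    { out(pos) = data(idx); pos += 1 }
    for (idx <- nonDictDimIndexes) { out(pos) = data(idx); pos += 1 }
    for (idx <- measureIndexes)    { out(pos) = data(idx); pos += 1 }
    out
  }
}
```

In the actual change, the constructor takes `SortParameters` and derives the groups there, which is why the new `convertRow(data)` no longer needs a `sortParameters` argument (see the diff discussion below).
---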
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/1594

Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/1580/
---
Github user ravipesala commented on the issue:
https://github.com/apache/carbondata/pull/1594

SDV Build Success, Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/1981/
---
Github user xuchuanyin commented on the issue:
https://github.com/apache/carbondata/pull/1594

@jackylk can you review this?
---
Github user jackylk commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/1594#discussion_r154667973

--- Diff: integration/spark-common/src/main/scala/org/apache/carbondata/spark/load/DataLoadProcessorStepOnSpark.scala ---
@@ -138,7 +138,7 @@ object DataLoadProcessorStepOnSpark {
       override def next(): CarbonRow = {
         val row =
-          new CarbonRow(SortStepRowUtil.convertRow(rows.next().getData, sortParameters))
+          new CarbonRow(sortStepRowUtil.convertRow(rows.next().getData))
--- End diff --

Maybe `SortStepRowUtil` is not a good name anymore; it is now more of a converter.
---
Github user jackylk commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/1594#discussion_r154668281

--- Diff: integration/spark-common/src/main/scala/org/apache/carbondata/spark/load/DataLoadProcessorStepOnSpark.scala ---
@@ -128,7 +128,7 @@ object DataLoadProcessorStepOnSpark {
       val model: CarbonLoadModel = modelBroadcast.value.getCopyWithTaskNo(index.toString)
       val conf = DataLoadProcessBuilder.createConfiguration(model)
       val sortParameters = SortParameters.createSortParameters(conf)
-
+      val sortStepRowUtil = new SortStepRowUtil(sortParameters)
--- End diff --

Do we need to create this object every time?
---
Github user xuchuanyin commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/1594#discussion_r154824658

--- Diff: integration/spark-common/src/main/scala/org/apache/carbondata/spark/load/DataLoadProcessorStepOnSpark.scala ---
@@ -128,7 +128,7 @@ object DataLoadProcessorStepOnSpark {
       val model: CarbonLoadModel = modelBroadcast.value.getCopyWithTaskNo(index.toString)
       val conf = DataLoadProcessBuilder.createConfiguration(model)
       val sortParameters = SortParameters.createSortParameters(conf)
-
+      val sortStepRowUtil = new SortStepRowUtil(sortParameters)
--- End diff --

@jackylk This object will be created once for each partition, not for each record. Would you prefer to create it in each partition or to broadcast it?
---
Github user xuchuanyin commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/1594#discussion_r154825080

--- Diff: integration/spark-common/src/main/scala/org/apache/carbondata/spark/load/DataLoadProcessorStepOnSpark.scala ---
@@ -138,7 +138,7 @@ object DataLoadProcessorStepOnSpark {
       override def next(): CarbonRow = {
         val row =
-          new CarbonRow(SortStepRowUtil.convertRow(rows.next().getData, sortParameters))
+          new CarbonRow(sortStepRowUtil.convertRow(rows.next().getData))
--- End diff --

@jackylk Yeah, actually in follow-up PRs (after #1606) I plan to rename it to `SortStepRowConverter`. This converter can also be used to write records directly to the sort temp file.
---
Github user jackylk commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/1594#discussion_r155002317

--- Diff: integration/spark-common/src/main/scala/org/apache/carbondata/spark/load/DataLoadProcessorStepOnSpark.scala ---
@@ -128,7 +128,7 @@ object DataLoadProcessorStepOnSpark {
       val model: CarbonLoadModel = modelBroadcast.value.getCopyWithTaskNo(index.toString)
       val conf = DataLoadProcessBuilder.createConfiguration(model)
       val sortParameters = SortParameters.createSortParameters(conf)
-
+      val sortStepRowUtil = new SortStepRowUtil(sortParameters)
--- End diff --

I think it is OK to create it for each partition.
---
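For reference, the per-partition pattern agreed on above — construct the converter once when the partition is processed, then reuse it for every record — looks roughly like this. It is a simplified sketch, not the actual method in `DataLoadProcessorStepOnSpark`; the method name and surrounding setup are assumed, and imports of `CarbonRow`, `SortParameters`, and `SortStepRowUtil` from the corresponding CarbonData modules are omitted.

```scala
// Simplified sketch of the per-partition usage pattern discussed above.
// Assumes CarbonRow, SortParameters and SortStepRowUtil are already imported;
// the method name sortStepSketch is made up for illustration.
def sortStepSketch(
    rows: Iterator[CarbonRow],
    sortParameters: SortParameters): Iterator[CarbonRow] = {
  // Created once per partition (this function runs once per partition),
  // not once per record.
  val sortStepRowUtil = new SortStepRowUtil(sortParameters)
  // convertRow is then applied to every record in the partition.
  rows.map(row => new CarbonRow(sortStepRowUtil.convertRow(row.getData)))
}
```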