GitHub user NamanRastogi opened a pull request:
https://github.com/apache/carbondata/pull/2942 [CARBONDATA-3121] Improvement of CarbonReader build time CarbonReader builder is taking huge time. **Reason** Initialization of ChunkRowIterator is triggring actual I/O operation, and thus huge build time. **Solution** remove CarbonIterator.hasNext() from build. - [x] Any interfaces changed? No - [x] Any backward compatibility impacted? No - [x] Document update required? No - [x] Testing done Yes - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA. You can merge this pull request into a Git repository by running: $ git pull https://github.com/NamanRastogi/carbondata build_improv Alternatively you can review and apply these changes as the patch at: https://github.com/apache/carbondata/pull/2942.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2942 ---- commit 721b9bb6a4d408ba6658ba55ff1fe431f4e2523a Author: Naman Rastogi <naman.rastogi.52@...> Date: 2018-11-22T08:27:50Z Improvement of CarbonRecord build time ---- --- |
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2942 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1511/ --- |
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2942 Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1721/ --- |
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2942 Build Success with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9769/ --- |
In reply to this post by qiuchenjian-2
Github user xubo245 commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2942#discussion_r235835735 --- Diff: core/src/main/java/org/apache/carbondata/core/scan/result/iterator/ChunkRowIterator.java --- @@ -52,17 +49,11 @@ public ChunkRowIterator(CarbonIterator<RowBatch> iterator) { * @return {@code true} if the iteration has more elements */ @Override public boolean hasNext() { - if (null != currentChunk) { - if ((currentChunk.hasNext())) { - return true; - } else if (!currentChunk.hasNext()) { - while (iterator.hasNext()) { - currentChunk = iterator.next(); - if (currentChunk != null && currentChunk.hasNext()) { - return true; - } - } - } + if (currentChunk != null && currentChunk.hasNext()) { --- End diff -- Have you tested/compare the performance before/after this change? --- |
In reply to this post by qiuchenjian-2
Github user NamanRastogi commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2942#discussion_r235892844 --- Diff: core/src/main/java/org/apache/carbondata/core/scan/result/iterator/ChunkRowIterator.java --- @@ -52,17 +49,11 @@ public ChunkRowIterator(CarbonIterator<RowBatch> iterator) { * @return {@code true} if the iteration has more elements */ @Override public boolean hasNext() { - if (null != currentChunk) { - if ((currentChunk.hasNext())) { - return true; - } else if (!currentChunk.hasNext()) { - while (iterator.hasNext()) { - currentChunk = iterator.next(); - if (currentChunk != null && currentChunk.hasNext()) { - return true; - } - } - } + if (currentChunk != null && currentChunk.hasNext()) { --- End diff -- The following tables are the build times of CarbonReader. The exact code is: ```scala import org.apache.carbondata.sdk.file._ val startTime = System.currentTimeMillis val reader = CarbonReader .builder("hdfs://hacluster/carbonfiles", "t1") .withBatch(50000) .build println(s"Time to build: ${System.currentTimeMillis - startTime}") reader.close ``` **Before** | DETAIL_QUERY_BATCH_SIZE | 4000 | 50000 | | -- | -- | -- | | With Vector Reader | 1870 ms | 1867 ms | | Without Vector Reader | 12961 ms | 28539 ms | **After** | DETAIL_QUERY_BATCH_SIZE | 4000 | 50000 | | -- | -- | -- | | With Vector Reader | 1238 ms | 1279 ms | | Without Vector Reader | 1878 ms | 1913 ms | --- |
In reply to this post by qiuchenjian-2
Github user xubo245 commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2942#discussion_r235897415 --- Diff: core/src/main/java/org/apache/carbondata/core/scan/result/iterator/ChunkRowIterator.java --- @@ -52,17 +49,11 @@ public ChunkRowIterator(CarbonIterator<RowBatch> iterator) { * @return {@code true} if the iteration has more elements */ @Override public boolean hasNext() { - if (null != currentChunk) { - if ((currentChunk.hasNext())) { - return true; - } else if (!currentChunk.hasNext()) { - while (iterator.hasNext()) { - currentChunk = iterator.next(); - if (currentChunk != null && currentChunk.hasNext()) { - return true; - } - } - } + if (currentChunk != null && currentChunk.hasNext()) { --- End diff -- nice, can you put this in content of this PR? --- |
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2942 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1523/ --- |
In reply to this post by qiuchenjian-2
Github user NamanRastogi commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2942#discussion_r235901284 --- Diff: core/src/main/java/org/apache/carbondata/core/scan/result/iterator/ChunkRowIterator.java --- @@ -52,17 +49,11 @@ public ChunkRowIterator(CarbonIterator<RowBatch> iterator) { * @return {@code true} if the iteration has more elements */ @Override public boolean hasNext() { - if (null != currentChunk) { - if ((currentChunk.hasNext())) { - return true; - } else if (!currentChunk.hasNext()) { - while (iterator.hasNext()) { - currentChunk = iterator.next(); - if (currentChunk != null && currentChunk.hasNext()) { - return true; - } - } - } + if (currentChunk != null && currentChunk.hasNext()) { --- End diff -- Done. --- |
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2942 Build Success with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9781/ --- |
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2942 Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1733/ --- |
In reply to this post by qiuchenjian-2
Github user kumarvishal09 commented on the issue:
https://github.com/apache/carbondata/pull/2942 retest this please --- |
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2942 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1539/ --- |
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2942 Build Failed with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1750/ --- |
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2942 Build Success with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9798/ --- |
In reply to this post by qiuchenjian-2
Github user NamanRastogi commented on the issue:
https://github.com/apache/carbondata/pull/2942 retest this please --- |
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2942 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1540/ --- |
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2942 Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1751/ --- |
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2942 Build Success with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9799/ --- |
In reply to this post by qiuchenjian-2
Github user kunal642 commented on the issue:
https://github.com/apache/carbondata/pull/2942 LGTM --- |
Free forum by Nabble | Edit this page |