[GitHub] carbondata pull request #2942: [CARBONDATA-3121] Improvement of CarbonReader...

classic Classic list List threaded Threaded
21 messages Options
12
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #2942: [CARBONDATA-3121] Improvement of CarbonReader...

qiuchenjian-2
GitHub user NamanRastogi opened a pull request:

    https://github.com/apache/carbondata/pull/2942

    [CARBONDATA-3121] Improvement of CarbonReader build time

    CarbonReader builder is taking huge time.
   
    **Reason**
    Initialization of ChunkRowIterator is triggring actual I/O operation, and thus huge build time.
   
    **Solution**
    remove CarbonIterator.hasNext() from build.
   
   
     - [x] Any interfaces changed?
         No
     - [x] Any backward compatibility impacted?
           No
     - [x] Document update required?
            No
     - [x] Testing done
          Yes
     - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/NamanRastogi/carbondata build_improv

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/carbondata/pull/2942.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #2942
   
----
commit 721b9bb6a4d408ba6658ba55ff1fe431f4e2523a
Author: Naman Rastogi <naman.rastogi.52@...>
Date:   2018-11-22T08:27:50Z

    Improvement of CarbonRecord build time

----


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2942: [CARBONDATA-3121] Improvement of CarbonReader build ...

qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2942
 
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1511/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2942: [CARBONDATA-3121] Improvement of CarbonReader build ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2942
 
    Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1721/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2942: [CARBONDATA-3121] Improvement of CarbonReader build ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2942
 
    Build Success with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9769/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #2942: [CARBONDATA-3121] Improvement of CarbonReader...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user xubo245 commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2942#discussion_r235835735
 
    --- Diff: core/src/main/java/org/apache/carbondata/core/scan/result/iterator/ChunkRowIterator.java ---
    @@ -52,17 +49,11 @@ public ChunkRowIterator(CarbonIterator<RowBatch> iterator) {
        * @return {@code true} if the iteration has more elements
        */
       @Override public boolean hasNext() {
    -    if (null != currentChunk) {
    -      if ((currentChunk.hasNext())) {
    -        return true;
    -      } else if (!currentChunk.hasNext()) {
    -        while (iterator.hasNext()) {
    -          currentChunk = iterator.next();
    -          if (currentChunk != null && currentChunk.hasNext()) {
    -            return true;
    -          }
    -        }
    -      }
    +    if (currentChunk != null && currentChunk.hasNext()) {
    --- End diff --
   
    Have you tested/compare the performance before/after this change?


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #2942: [CARBONDATA-3121] Improvement of CarbonReader...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user NamanRastogi commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2942#discussion_r235892844
 
    --- Diff: core/src/main/java/org/apache/carbondata/core/scan/result/iterator/ChunkRowIterator.java ---
    @@ -52,17 +49,11 @@ public ChunkRowIterator(CarbonIterator<RowBatch> iterator) {
        * @return {@code true} if the iteration has more elements
        */
       @Override public boolean hasNext() {
    -    if (null != currentChunk) {
    -      if ((currentChunk.hasNext())) {
    -        return true;
    -      } else if (!currentChunk.hasNext()) {
    -        while (iterator.hasNext()) {
    -          currentChunk = iterator.next();
    -          if (currentChunk != null && currentChunk.hasNext()) {
    -            return true;
    -          }
    -        }
    -      }
    +    if (currentChunk != null && currentChunk.hasNext()) {
    --- End diff --
   
    The following tables are the build times of CarbonReader. The exact code is:
    ```scala
    import org.apache.carbondata.sdk.file._
   
    val startTime = System.currentTimeMillis
    val reader = CarbonReader
    .builder("hdfs://hacluster/carbonfiles", "t1")
    .withBatch(50000)
    .build
    println(s"Time to build: ${System.currentTimeMillis - startTime}")
    reader.close
   
    ```
   
    **Before**
   
    | DETAIL_QUERY_BATCH_SIZE | 4000 | 50000 |
    | -- | -- | -- |
    | With Vector Reader | 1870 ms | 1867 ms |
    | Without Vector Reader | 12961 ms | 28539 ms |
   
   
    **After**
   
    | DETAIL_QUERY_BATCH_SIZE | 4000 | 50000 |
    | -- | -- | -- |
    | With Vector Reader | 1238 ms | 1279 ms |
    | Without Vector Reader | 1878 ms | 1913 ms |


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #2942: [CARBONDATA-3121] Improvement of CarbonReader...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user xubo245 commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2942#discussion_r235897415
 
    --- Diff: core/src/main/java/org/apache/carbondata/core/scan/result/iterator/ChunkRowIterator.java ---
    @@ -52,17 +49,11 @@ public ChunkRowIterator(CarbonIterator<RowBatch> iterator) {
        * @return {@code true} if the iteration has more elements
        */
       @Override public boolean hasNext() {
    -    if (null != currentChunk) {
    -      if ((currentChunk.hasNext())) {
    -        return true;
    -      } else if (!currentChunk.hasNext()) {
    -        while (iterator.hasNext()) {
    -          currentChunk = iterator.next();
    -          if (currentChunk != null && currentChunk.hasNext()) {
    -            return true;
    -          }
    -        }
    -      }
    +    if (currentChunk != null && currentChunk.hasNext()) {
    --- End diff --
   
    nice, can you put this in content of this PR?


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2942: [CARBONDATA-3121] Improvement of CarbonReader build ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2942
 
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1523/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #2942: [CARBONDATA-3121] Improvement of CarbonReader...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user NamanRastogi commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2942#discussion_r235901284
 
    --- Diff: core/src/main/java/org/apache/carbondata/core/scan/result/iterator/ChunkRowIterator.java ---
    @@ -52,17 +49,11 @@ public ChunkRowIterator(CarbonIterator<RowBatch> iterator) {
        * @return {@code true} if the iteration has more elements
        */
       @Override public boolean hasNext() {
    -    if (null != currentChunk) {
    -      if ((currentChunk.hasNext())) {
    -        return true;
    -      } else if (!currentChunk.hasNext()) {
    -        while (iterator.hasNext()) {
    -          currentChunk = iterator.next();
    -          if (currentChunk != null && currentChunk.hasNext()) {
    -            return true;
    -          }
    -        }
    -      }
    +    if (currentChunk != null && currentChunk.hasNext()) {
    --- End diff --
   
    Done.


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2942: [CARBONDATA-3121] Improvement of CarbonReader build ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2942
 
    Build Success with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9781/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2942: [CARBONDATA-3121] Improvement of CarbonReader build ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2942
 
    Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1733/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2942: [CARBONDATA-3121] Improvement of CarbonReader build ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user kumarvishal09 commented on the issue:

    https://github.com/apache/carbondata/pull/2942
 
    retest this please


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2942: [CARBONDATA-3121] Improvement of CarbonReader build ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2942
 
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1539/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2942: [CARBONDATA-3121] Improvement of CarbonReader build ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2942
 
    Build Failed with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1750/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2942: [CARBONDATA-3121] Improvement of CarbonReader build ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2942
 
    Build Success with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9798/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2942: [CARBONDATA-3121] Improvement of CarbonReader build ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user NamanRastogi commented on the issue:

    https://github.com/apache/carbondata/pull/2942
 
    retest this please


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2942: [CARBONDATA-3121] Improvement of CarbonReader build ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2942
 
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1540/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2942: [CARBONDATA-3121] Improvement of CarbonReader build ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2942
 
    Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1751/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2942: [CARBONDATA-3121] Improvement of CarbonReader build ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2942
 
    Build Success with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9799/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2942: [CARBONDATA-3121] Improvement of CarbonReader build ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user kunal642 commented on the issue:

    https://github.com/apache/carbondata/pull/2942
 
    LGTM


---
12