[GitHub] carbondata pull request #2816: [CARBONDATA-300] Suppor read batch row in CSD...

qiuchenjian-2
GitHub user xubo245 opened a pull request:

    https://github.com/apache/carbondata/pull/2816

     [CARBONDATA-3003] Support read batch row in CSDK
        1. Support reading batch rows in the SDK
        2. Support reading batch rows in the CSDK
        3. Add SDKReaderBenchmark in the SDK and testNextBatchRowPerformance in the CSDK
        4. Improve CSDK read performance
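    To illustrate why batch reading can help, this sketch contrasts a one-row-per-call loop with a batch loop. RowSource, readNextRow, readNextBatch and the other names are hypothetical and chosen for illustration only; they are not the actual CarbonReader API added by this PR. The in-memory source counts read calls: for 1,000 rows, reading one by one makes 1,000 calls, while batches of 100 make 10. In CSDK, where each read presumably crosses the JNI boundary into the Java SDK, fewer calls is one plausible source of the speedup.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical interface: these names are illustrative, not the CarbonData SDK API.
interface RowSource {
    boolean hasNext();

    Object[] readNextRow();                // returns one row per call

    List<Object[]> readNextBatch(int max); // returns up to max rows per call
}

// In-memory source that counts read calls, to show how batching reduces them.
class InMemoryRowSource implements RowSource {
    private final List<Object[]> rows;
    private int pos = 0;
    int calls = 0; // number of readNextRow/readNextBatch calls made so far

    InMemoryRowSource(List<Object[]> rows) {
        this.rows = rows;
    }

    @Override
    public boolean hasNext() {
        return pos < rows.size();
    }

    @Override
    public Object[] readNextRow() {
        calls++;
        return rows.get(pos++);
    }

    @Override
    public List<Object[]> readNextBatch(int max) {
        calls++;
        int end = Math.min(pos + max, rows.size());
        List<Object[]> batch = new ArrayList<>(rows.subList(pos, end));
        pos = end;
        return batch;
    }
}

class BatchReadDemo {
    // Reads all rows one at a time and returns how many were read.
    static long readOneByOne(RowSource src) {
        long n = 0;
        while (src.hasNext()) {
            src.readNextRow();
            n++;
        }
        return n;
    }

    // Reads all rows in batches of batchSize and returns how many were read.
    static long readInBatches(RowSource src, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        long n = 0;
        while (src.hasNext()) {
            n += src.readNextBatch(batchSize).size();
        }
        return n;
    }

    // Builds count simple two-column rows for the demo.
    static List<Object[]> sampleRows(int count) {
        List<Object[]> rows = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            rows.add(new Object[] {i, "row-" + i});
        }
        return rows;
    }
}
```

    The batch size is a trade-off: larger batches mean fewer calls but more memory held per call.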
   
    This PR is based on https://github.com/apache/carbondata/pull/2792 and cherry-picks its commits. After PR 2792 is merged, this PR will drop those commits.
   
    For SDK batch read:

    readNextBatchRow:

    total lines is 200100000, build time is 10.434133262 s, total read time is 167.567157044 s, average speed is 1194148.0868321797 records/s.

    readNextCarbonRow:

    total lines is 200100000, build time is 15.775965656 s, total read time is 183.312544655 s, average speed is 1091578.322567037 records/s.

    Reading batch rows is about 9.4% faster than readNextCarbonRow (reading rows one by one).
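    The average speeds and the 9.4% figure follow directly from the numbers above: speed = total lines / total read time, and the relative gain is the ratio of the two speeds minus one. A minimal sketch of that arithmetic (the class and method names are illustrative):

```java
class ReadSpeedMath {
    // Average throughput: number of records divided by read time in seconds.
    static double recordsPerSecond(long records, double seconds) {
        if (seconds <= 0) {
            throw new IllegalArgumentException("seconds must be positive");
        }
        return records / seconds;
    }

    // Relative gain of speedA over speedB in percent (positive when A is faster).
    static double speedupPercent(double speedA, double speedB) {
        return (speedA / speedB - 1.0) * 100.0;
    }
}
```

    For example, recordsPerSecond(200_100_000L, 167.567157044) gives about 1,194,148 records/s, recordsPerSecond(200_100_000L, 183.312544655) gives about 1,091,578 records/s, and speedupPercent of the two is about 9.4. The build times (10.43 s and 15.78 s) are reported separately and do not enter this comparison.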
   
    For CSDK:

    Test next Row Performance:

    build time is: 2.749129 s

    100000: time is 0.147732 s, speed is 676901.416078 records/s
    200000: time is 0.320773 s, speed is 311746.936307 records/s
    300000: time is 0.138412 s, speed is 722480.709765 records/s
    400000: time is 0.381501 s, speed is 262122.510819 records/s
    500000: time is 0.124684 s, speed is 802027.525585 records/s
    600000: time is 1.260054 s, speed is 79361.678150 records/s
    700000: time is 0.120333 s, speed is 831027.232762 records/s
    800000: time is 0.424332 s, speed is 235664.526833 records/s
    900000: time is 0.127125 s, speed is 786627.335300 records/s
    1000000: time is 0.135605 s, speed is 737435.935253 records/s
    1100000: time is 0.653121 s, speed is 153110.985560 records/s
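    Each line above reports the time for one interval of 100,000 records, so the per-line speed is 100000 / time. Because the interval times vary widely (0.12 s to 1.26 s), averaging the per-line speeds overstates throughput; total records divided by total time is the fairer aggregate. A small sketch with illustrative names, assuming every interval holds exactly 100,000 records as the row labels suggest:

```java
class IntervalThroughput {
    // Speed of each interval: records in the interval divided by its duration in seconds.
    static double[] intervalSpeeds(long recordsPerInterval, double[] seconds) {
        double[] speeds = new double[seconds.length];
        for (int i = 0; i < seconds.length; i++) {
            speeds[i] = recordsPerInterval / seconds[i];
        }
        return speeds;
    }

    // Overall throughput: total records divided by total elapsed seconds.
    static double overallSpeed(long recordsPerInterval, double[] seconds) {
        if (seconds.length == 0) {
            throw new IllegalArgumentException("no intervals");
        }
        double totalSeconds = 0;
        for (double s : seconds) {
            totalSeconds += s;
        }
        return (double) recordsPerInterval * seconds.length / totalSeconds;
    }

    // Arithmetic mean of per-interval speeds; overstates throughput when durations vary.
    static double meanOfSpeeds(double[] speeds) {
        if (speeds.length == 0) {
            throw new IllegalArgumentException("no intervals");
        }
        double sum = 0;
        for (double v : speeds) {
            sum += v;
        }
        return sum / speeds.length;
    }
}
```

    With the eleven CSDK intervals above, overallSpeed gives roughly 287,000 records/s (1,100,000 records in about 3.83 s), while meanOfSpeeds gives roughly 509,000 records/s.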
   
    readNextBatchRow:

    build time is 10.434133262 s
    100000: time is 0.015234857 s, speed is 6563894.88920047 records/s, hasNext time is 1.872E-5 s
    200000: time is 1.381838498 s, speed is 72367.35707156423 records/s, hasNext time is 1.373877655 s
    300000: time is 0.071597049 s, speed is 1396705.6100314974 records/s, hasNext time is 0.068254875 s
    400000: time is 0.071777167 s, speed is 1393200.7096351406 records/s, hasNext time is 0.069637177 s
    500000: time is 0.227270961 s, speed is 440003.4195305752 records/s, hasNext time is 0.225358746 s
    600000: time is 0.069326305 s, speed is 1442453.9141383634 records/s, hasNext time is 0.06744768 s
    700000: time is 0.07079448 s, speed is 1412539.508730059 records/s, hasNext time is 0.068803357 s
    800000: time is 0.147471892 s, speed is 678095.3213782597 records/s, hasNext time is 0.145297739 s
    900000: time is 0.073139928 s, speed is 1367242.2537796318 records/s, hasNext time is 0.070579908 s
    1000000: time is 0.073197467 s, speed is 1366167.493200277 records/s, hasNext time is 0.071379687 s
    1100000: time is 0.141830179 s, speed is 705068.5594918412 records/s, hasNext time is 0.140102684 s
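    In this log most of each interval is spent inside hasNext (for example 1.373877655 s of 1.381838498 s in the 200000 interval), which suggests the cost is dominated by loading the next chunk of data rather than by converting rows. A sketch that computes that share over all intervals, with illustrative names and assuming, as the numbers imply, that each "time" value already includes its "hasNext time":

```java
class HasNextShare {
    // Fraction of the total elapsed time spent inside hasNext(), summed over all intervals.
    static double hasNextShare(double[] totalSeconds, double[] hasNextSeconds) {
        if (totalSeconds.length != hasNextSeconds.length) {
            throw new IllegalArgumentException("arrays must have the same length");
        }
        double total = 0;
        double inHasNext = 0;
        for (int i = 0; i < totalSeconds.length; i++) {
            total += totalSeconds[i];
            inHasNext += hasNextSeconds[i];
        }
        return inHasNext / total;
    }
}
```

    Summed over the eleven intervals above, hasNext accounts for roughly 98% of the elapsed time.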
   
    Be sure to complete all items in the following checklist to help us incorporate
    your contribution quickly and easily:
   
     - [ ] Any interfaces changed?
     
     - [ ] Any backward compatibility impacted?
     
     - [ ] Document update required?
   
     - [ ] Testing done
            Please provide details on
            - Whether new unit test cases have been added or why no new tests are required?
            - How it is tested? Please attach test report.
            - Is it a performance related change? Please attach the performance test report.
            - Any additional information to help reviewers in testing this change.
           
     - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.
   


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/xubo245/carbondata CARBONDATA-3003_supportBatchRow

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/carbondata/pull/2816.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #2816
   
----
commit 425e76333ddb0991799f2fc7c0c028a18aca58b5
Author: xubo245 <xubo29@...>
Date:   2018-10-09T09:58:48Z

     [CARBONDATA-2981] Support read primitive data type in CSDK
   
                1.support readNextCarbonRow
                2.support read different primitive data type in c code from java side: int double short long string
                3.support some data type and convert: date timestamp varchar decimal array<T>
                4.remove readNextStringRow
   
    remove the file after finished run
   
    change the file

commit 4fc5ce599ada4875337c88f5eb8d217a8ae73ddd
Author: xubo245 <xubo29@...>
Date:   2018-10-11T02:12:20Z

    remove timestamp check

commit d74ce01c499b5b031d8123ea4dfc0cd90e56a2e8
Author: xubo245 <xubo29@...>
Date:   2018-10-16T03:02:07Z

    [CARBONDATA-300] Suppor read batch row in CSDK
    1. support read batch row in SDK
    2. support read batch row in CSDK
    3. add SDKReaderBenchmark IN SDK and testNextBatchRowPerformance in CSDK
    4. improve CSDK read performance

----


---

[GitHub] carbondata issue #2816: [CARBONDATA-300] Suppor read batch row in CSDK

qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2816
 
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/794/



---

[GitHub] carbondata issue #2816: [CARBONDATA-300] Suppor read batch row in CSDK

qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2816
 
    Build Success with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9059/



---

[GitHub] carbondata issue #2816: [CARBONDATA-300] Suppor read batch row in CSDK

qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2816
 
    Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/991/



---

[GitHub] carbondata issue #2816: [CARBONDATA-300] Suppor read batch row in CSDK

qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2816
 
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/799/



---

[GitHub] carbondata issue #2816: [CARBONDATA-300] Suppor read batch row in CSDK

qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2816
 
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/801/



---

[GitHub] carbondata issue #2816: [CARBONDATA-300] Suppor read batch row in CSDK

qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2816
 
    Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/999/



---

[GitHub] carbondata issue #2816: [CARBONDATA-300] Suppor read batch row in CSDK

qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2816
 
    Build Success with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9067/



---

[GitHub] carbondata issue #2816: [CARBONDATA-300] Suppor read batch row in CSDK

qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2816
 
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/808/



---

[GitHub] carbondata issue #2816: [CARBONDATA-300] Suppor read batch row in CSDK

qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2816
 
    Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1005/



---

[GitHub] carbondata issue #2816: [CARBONDATA-300] Suppor read batch row in CSDK

qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2816
 
    Build Failed  with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9073/



---

[GitHub] carbondata issue #2816: [CARBONDATA-300] Suppor read batch row in CSDK

qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2816
 
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/824/



---

[GitHub] carbondata issue #2816: [CARBONDATA-3003] Suppor read batch row in CSDK

qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2816
 
    Build Success with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9089/



---

[GitHub] carbondata issue #2816: [CARBONDATA-3003] Suppor read batch row in CSDK

qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2816
 
    Build Failed with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1021/



---

[GitHub] carbondata issue #2816: [CARBONDATA-3003] Suppor read batch row in CSDK

qiuchenjian-2
Github user xubo245 commented on the issue:

    https://github.com/apache/carbondata/pull/2816
 
    retest this please


---

[GitHub] carbondata issue #2816: [CARBONDATA-3003] Suppor read batch row in CSDK

qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2816
 
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/830/



---

[GitHub] carbondata issue #2816: [CARBONDATA-3003] Suppor read batch row in CSDK

qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2816
 
    Build Success with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9095/



---

[GitHub] carbondata issue #2816: [CARBONDATA-3003] Suppor read batch row in CSDK

qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2816
 
    Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1027/



---

[GitHub] carbondata pull request #2816: [CARBONDATA-3003] Suppor read batch row in CS...

qiuchenjian-2
Github user xuchuanyin commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2816#discussion_r226558824
 
    --- Diff: examples/spark2/src/main/java/org/apache/carbondata/benchmark/SDKReaderBenchmark.java ---
    @@ -0,0 +1,261 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.carbondata.benchmark;
    +
    +import java.io.File;
    +import java.io.FilenameFilter;
    +import java.io.IOException;
    +import java.sql.Timestamp;
    +import java.util.HashMap;
    +import java.util.Map;
    +import java.util.Random;
    +
    +import org.apache.hadoop.conf.Configuration;
    +
    +import org.apache.carbondata.common.exceptions.sql.InvalidLoadOptionException;
    +import org.apache.carbondata.core.constants.CarbonCommonConstants;
    +import org.apache.carbondata.core.util.CarbonProperties;
    +import org.apache.carbondata.sdk.file.*;
    +
    +/**
    + * Test SDK read performance
    + */
    +public class SDKReaderBenchmark {
    --- End diff --
   
    1. It seems this class is used not only for reading but also for writing. If so, please give the class a more accurate name; if not, I think you can put them in one class.
    2. For a benchmark, I don't see how the data is generated.



---

[GitHub] carbondata pull request #2816: [CARBONDATA-3003] Suppor read batch row in CS...

qiuchenjian-2
Github user xubo245 commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2816#discussion_r226565398
 
   
    1. The write code only prepares data for the read test; this PR does not test write performance.
    I will write an SDKWriterBenchmark for writing data after CSDK supports writing CarbonData files.
   
    2. The data is from could stream, and this PR enlarges it.
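    The reply does not say how the data is enlarged. One simple way to scale up a small source dataset for a benchmark is to cycle through its rows until a target row count is reached; the helper below is a hypothetical sketch of that idea, not the code in this PR:

```java
import java.util.ArrayList;
import java.util.List;

class DataEnlarger {
    // Hypothetical helper: returns targetRows rows by cycling through source.
    // Rows are shared references, not deep copies.
    static <T> List<T> enlarge(List<T> source, int targetRows) {
        if (targetRows < 0) {
            throw new IllegalArgumentException("targetRows must be non-negative");
        }
        if (source.isEmpty() && targetRows > 0) {
            throw new IllegalArgumentException("source must not be empty");
        }
        List<T> out = new ArrayList<>(targetRows);
        for (int i = 0; i < targetRows; i++) {
            out.add(source.get(i % source.size()));
        }
        return out;
    }
}
```

    Replicated rows repeat values exactly, so a columnar format may encode them more compactly than real data of the same size; read results on enlarged data can therefore be somewhat optimistic.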


---