GitHub user xubo245 opened a pull request:
https://github.com/apache/carbondata/pull/2816

[CARBONDATA-300] Support read batch row in CSDK

1. support read batch row in SDK
2. support read batch row in CSDK
3. add SDKReaderBenchmark in SDK and testNextBatchRowPerformance in CSDK
4. improve CSDK read performance

This PR is based on https://github.com/apache/carbondata/pull/2792 and cherry-picks its commits. After PR 2792 is merged, this PR will remove those commits.

For SDK batch read:

readNextBatchRow: total lines is 200100000, build time is 10.434133262 s, total read time is 167.567157044 s, average speed is 1194148.0868321797records/s.
readNextCarbonRow: total lines is 200100000, build time is 15.775965656 s, total read time is 183.312544655 s, average speed is 1091578.322567037records/s.

Reading batch rows is 9.4% faster than readNextCarbonRow (one row at a time).

For CSDK:

Test next Row Performance:
build time is: 2.749129 s
100000: time is 0.147732 s, speed is 676901.416078 records/s
[hidden email] [hidden email] from_to <5164240.1075855667637.JavaMail.evans@thyme> 1538015558000000 971703720000000
200000: time is 0.320773 s, speed is 311746.936307 records/s
[hidden email] [hidden email] from_to <14154714.1075858633174.JavaMail.evans@thyme> 1538015558000000 1003768608000000
300000: time is 0.138412 s, speed is 722480.709765 records/s
[hidden email] [hidden email] from_to <5977904.1075858636257.JavaMail.evans@thyme> 1538015558000000 1004057196000000
400000: time is 0.381501 s, speed is 262122.510819 records/s
[hidden email] [hidden email] from_to <23732985.1075855665438.JavaMail.evans@thyme> 1538015558000000 976725540000000
500000: time is 0.124684 s, speed is 802027.525585 records/s
[hidden email] [hidden email] from_to <31706076.1075858632278.JavaMail.evans@thyme> 1538015558000000 1003441879000000
600000: time is 1.260054 s, speed is 79361.678150 records/s
[hidden email] [hidden email] from_to <14154714.1075858633174.JavaMail.evans@thyme> 1538015558000000 1003768608000000
700000: time is 0.120333 s, speed is 831027.232762 records/s
from_email11347ryan.o'[hidden email] [hidden email] from_to <2047280.1075858635378.JavaMail.evans@thyme> 1538015558000000 1003974318000000
800000: time is 0.424332 s, speed is 235664.526833 records/s
from_email11540ryan.o'[hidden email] [hidden email] from_to <2047280.1075858635378.JavaMail.evans@thyme> 1538015558000000 1003974318000000
900000: time is 0.127125 s, speed is 786627.335300 records/s
[hidden email] [hidden email] from_to <14154714.1075858633174.JavaMail.evans@thyme> 1538015558000000 1003768608000000
1000000: time is 0.135605 s, speed is 737435.935253 records/s
[hidden email] [hidden email] from_to <14154714.1075858633174.JavaMail.evans@thyme> 1538015558000000 1003768608000000
1100000: time is 0.653121 s, speed is 153110.985560 records/s
[hidden email] [hidden email] from_to <12338129.1075855667248.JavaMail.evans@thyme> 1538015558000000 972994320000000

readNextBatchRow
log4j:WARN No appenders could be found for logger (org.apache.carbondata.core.util.CarbonProperties).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
build time is 10.434133262
100000: time is 0.015234857 s, speed is 6563894.88920047records/s, hasNext time is 1.872E-5s
[hidden email] [hidden email] from_to <5164240.1075855667637.JavaMail.evans@thyme> 1538015558000000 971703720000000
200000: time is 1.381838498 s, speed is 72367.35707156423records/s, hasNext time is 1.373877655s
[hidden email] [hidden email] from_to <14154714.1075858633174.JavaMail.evans@thyme> 1538015558000000 1003768608000000
300000: time is 0.071597049 s, speed is 1396705.6100314974records/s, hasNext time is 0.068254875s
[hidden email] [hidden email] from_to <5977904.1075858636257.JavaMail.evans@thyme> 1538015558000000 1004057196000000
400000: time is 0.071777167 s, speed is 1393200.7096351406records/s, hasNext time is 0.069637177s
[hidden email] [hidden email] from_to <23732985.1075855665438.JavaMail.evans@thyme> 1538015558000000 976725540000000
500000: time is 0.227270961 s, speed is 440003.4195305752records/s, hasNext time is 0.225358746s
[hidden email] [hidden email] from_to <31706076.1075858632278.JavaMail.evans@thyme> 1538015558000000 1003441879000000
600000: time is 0.069326305 s, speed is 1442453.9141383634records/s, hasNext time is 0.06744768s
[hidden email] [hidden email] from_to <14154714.1075858633174.JavaMail.evans@thyme> 1538015558000000 1003768608000000
700000: time is 0.07079448 s, speed is 1412539.508730059records/s, hasNext time is 0.068803357s
from_email11347ryan.o'[hidden email] [hidden email] from_to <2047280.1075858635378.JavaMail.evans@thyme> 1538015558000000 1003974318000000
800000: time is 0.147471892 s, speed is 678095.3213782597records/s, hasNext time is 0.145297739s
from_email11540ryan.o'[hidden email] [hidden email] from_to <2047280.1075858635378.JavaMail.evans@thyme> 1538015558000000 1003974318000000
900000: time is 0.073139928 s, speed is 1367242.2537796318records/s, hasNext time is 0.070579908s
[hidden email] [hidden email] from_to <14154714.1075858633174.JavaMail.evans@thyme> 1538015558000000 1003768608000000
1000000: time is 0.073197467 s, speed is 1366167.493200277records/s, hasNext time is 0.071379687s
[hidden email] [hidden email] from_to <14154714.1075858633174.JavaMail.evans@thyme> 1538015558000000 1003768608000000
1100000: time is 0.141830179 s, speed is 705068.5594918412records/s, hasNext time is 0.140102684s
[hidden email] [hidden email] from_to <12338129.1075855667248.JavaMail.evans@thyme> 1538015558000000 972994320000000

Be sure to do all of the following checklist to help us incorporate your contribution quickly and easily:

- [ ] Any interfaces changed?
- [ ] Any backward compatibility impacted?
- [ ] Document update required?
- [ ] Testing done
      Please provide details on
      - Whether new unit test cases have been added or why no new tests are required?
      - How it is tested? Please attach test report.
      - Is it a performance related change? Please attach the performance test report.
      - Any additional information to help reviewers in testing this change.
- [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.
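As a quick check on the SDK figures in the description: each average speed is the total line count divided by the total read time, and the 9.4% figure is the ratio of the two speeds minus one. A minimal Java sketch of that arithmetic, using only the numbers reported in this PR (the class and method names are illustrative and not part of the PR):

```java
public class ThroughputCheck {

    /** Records per second for a run that read {@code records} rows in {@code seconds}. */
    static double recordsPerSecond(long records, double seconds) {
        return records / seconds;
    }

    /** Percentage by which {@code fast} exceeds {@code slow}. */
    static double percentFaster(double fast, double slow) {
        return (fast / slow - 1.0) * 100.0;
    }

    public static void main(String[] args) {
        long totalLines = 200_100_000L;          // "total lines" from the description
        double batchReadSeconds = 167.567157044; // readNextBatchRow total read time
        double rowReadSeconds = 183.312544655;   // readNextCarbonRow total read time

        double batchSpeed = recordsPerSecond(totalLines, batchReadSeconds);
        double rowSpeed = recordsPerSecond(totalLines, rowReadSeconds);

        System.out.printf("readNextBatchRow:  %.2f records/s%n", batchSpeed);
        System.out.printf("readNextCarbonRow: %.2f records/s%n", rowSpeed);
        System.out.printf("batch read is %.1f%% faster%n", percentFaster(batchSpeed, rowSpeed));
    }
}
```

Note that this compares read time only; the reported build times (10.43 s and 15.78 s) are not part of either average speed.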
You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/xubo245/carbondata CARBONDATA-3003_supportBatchRow

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/carbondata/pull/2816.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #2816

----
commit 425e76333ddb0991799f2fc7c0c028a18aca58b5
Author: xubo245 <xubo29@...>
Date:   2018-10-09T09:58:48Z

    [CARBONDATA-2981] Support read primitive data type in CSDK

    1.support readNextCarbonRow
    2.support read different primitive data type in c code from java side: int double short long string
    3.support some data type and convert: date timestamp varchar decimal array<T>
    4.remove readNextStringRow

    remove the file after finished run

    change the file

commit 4fc5ce599ada4875337c88f5eb8d217a8ae73ddd
Author: xubo245 <xubo29@...>
Date:   2018-10-11T02:12:20Z

    remove timestamp check

commit d74ce01c499b5b031d8123ea4dfc0cd90e56a2e8
Author: xubo245 <xubo29@...>
Date:   2018-10-16T03:02:07Z

    [CARBONDATA-300] Support read batch row in CSDK

    1. support read batch row in SDK
    2. support read batch row in CSDK
    3. add SDKReaderBenchmark in SDK and testNextBatchRowPerformance in CSDK
    4. improve CSDK read performance

----

---
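To illustrate what "read batch row" means in the last commit message, here is a minimal, self-contained sketch of the general pattern: instead of returning one row per call, the reader hands back up to a fixed number of rows at once, so any per-call overhead is paid once per batch. `BatchingReader` and `readNextBatch` are hypothetical names invented for this example; they are not the CarbonData SDK API, and the PR's actual `readNextBatchRow` may differ in signature and behavior.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class BatchReadSketch {

    /** Hypothetical wrapper that groups rows from an iterator into batches. */
    static final class BatchingReader<T> {
        private final Iterator<T> rows;
        private final int batchSize;

        BatchingReader(Iterator<T> rows, int batchSize) {
            if (batchSize <= 0) {
                throw new IllegalArgumentException("batchSize must be positive");
            }
            this.rows = rows;
            this.batchSize = batchSize;
        }

        boolean hasNext() {
            return rows.hasNext();
        }

        /** Returns up to batchSize rows; the final batch may be shorter. */
        List<T> readNextBatch() {
            List<T> batch = new ArrayList<>(batchSize);
            while (batch.size() < batchSize && rows.hasNext()) {
                batch.add(rows.next());
            }
            return batch;
        }
    }

    /** Drains the reader and returns {rowCount, batchCount}. */
    static long[] drain(BatchingReader<?> reader) {
        long rowCount = 0;
        long batchCount = 0;
        while (reader.hasNext()) {
            rowCount += reader.readNextBatch().size();
            batchCount++;
        }
        return new long[] {rowCount, batchCount};
    }

    public static void main(String[] args) {
        List<Integer> data = new ArrayList<>();
        for (int i = 0; i < 250; i++) {
            data.add(i);
        }
        long[] result = drain(new BatchingReader<>(data.iterator(), 100));
        System.out.println(result[0] + " rows in " + result[1] + " batches");
    }
}
```

With 250 rows and a batch size of 100, the caller makes three batch calls instead of 250 row calls. How much that helps in practice depends on the cost of each call, which is why the PR reports measured numbers.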
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2816 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/794/ --- |
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2816 Build Success with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9059/ --- |
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2816 Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/991/ --- |
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2816 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/799/ --- |
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2816 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/801/ --- |
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2816 Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/999/ --- |
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2816 Build Success with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9067/ --- |
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2816 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/808/ --- |
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2816 Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1005/ --- |
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2816 Build Failed with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9073/ --- |
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2816 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/824/ --- |
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2816 Build Success with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9089/ --- |
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2816 Build Failed with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1021/ --- |
In reply to this post by qiuchenjian-2
Github user xubo245 commented on the issue:
https://github.com/apache/carbondata/pull/2816 retest this please --- |
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2816 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/830/ --- |
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2816 Build Success with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9095/ --- |
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2816 Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1027/ --- |
In reply to this post by qiuchenjian-2
Github user xuchuanyin commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2816#discussion_r226558824

--- Diff: examples/spark2/src/main/java/org/apache/carbondata/benchmark/SDKReaderBenchmark.java ---
@@ -0,0 +1,261 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.benchmark;
+
+import java.io.File;
+import java.io.FilenameFilter;
+import java.io.IOException;
+import java.sql.Timestamp;
+import java.util.HashMap;
+import java.util.Map;
+import java.util.Random;
+
+import org.apache.hadoop.conf.Configuration;
+
+import org.apache.carbondata.common.exceptions.sql.InvalidLoadOptionException;
+import org.apache.carbondata.core.constants.CarbonCommonConstants;
+import org.apache.carbondata.core.util.CarbonProperties;
+import org.apache.carbondata.sdk.file.*;
+
+/**
+ * Test SDK read performance
+ */
+public class SDKReaderBenchmark {
--- End diff --

1. It seems this class is not only for reader but also for writer. If it is so, please optimize the class name; If it is not, I think you can include them in one class.
2. For a benchmark, I don't see how the data is generated.

---
In reply to this post by qiuchenjian-2
Github user xubo245 commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2816#discussion_r226565398

--- Diff: examples/spark2/src/main/java/org/apache/carbondata/benchmark/SDKReaderBenchmark.java ---
@@ -0,0 +1,261 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.benchmark;
+
+import java.io.File;
+import java.io.FilenameFilter;
+import java.io.IOException;
+import java.sql.Timestamp;
+import java.util.HashMap;
+import java.util.Map;
+import java.util.Random;
+
+import org.apache.hadoop.conf.Configuration;
+
+import org.apache.carbondata.common.exceptions.sql.InvalidLoadOptionException;
+import org.apache.carbondata.core.constants.CarbonCommonConstants;
+import org.apache.carbondata.core.util.CarbonProperties;
+import org.apache.carbondata.sdk.file.*;
+
+/**
+ * Test SDK read performance
+ */
+public class SDKReaderBenchmark {
--- End diff --

1. The write code is only there to support the read test; this PR does not test write performance. I will write an SDKWriterBenchmark for writing data after the CSDK supports writing carbondata.
2. The data is from a cloud stream, and this PR will enlarge the data.

---
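On the reviewer's second point, the thread does not show how the benchmark data is produced. One common way to make a read benchmark repeatable is to generate the input rows from a seeded `java.util.Random`, so every run reads the same data. The sketch below is only an illustration of that idea under assumed names: `BenchmarkDataSketch`, the three columns, and the value ranges are all hypothetical and are not taken from `SDKReaderBenchmark`.

```java
import java.util.Random;

public class BenchmarkDataSketch {

    /**
     * Builds {@code rows} synthetic records. The same seed always yields
     * the same data, which keeps benchmark runs comparable.
     */
    static String[][] generate(int rows, long seed) {
        Random random = new Random(seed);
        String[][] data = new String[rows][];
        for (int i = 0; i < rows; i++) {
            data[i] = new String[] {
                // hypothetical sender column
                "user" + random.nextInt(1000) + "@example.com",
                // hypothetical constant label column
                "from_to",
                // hypothetical non-negative numeric column, stored as text
                Long.toString(Math.abs(random.nextLong() % 1_000_000_000_000L))
            };
        }
        return data;
    }

    public static void main(String[] args) {
        String[][] data = generate(3, 42L);
        for (String[] row : data) {
            System.out.println(String.join(",", row));
        }
    }
}
```

Recording the seed and row count alongside the timing output would let a reviewer regenerate the same input when checking the reported speeds.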