GitHub user ravipesala opened a pull request:
https://github.com/apache/incubator-carbondata/pull/412

[WIP] Added vector reader in Carbon scan.

This PR enables Carbon to read data in vector (columnar) format. It adds `CarbonColumnarBatch` and `CarbonColumnVector` so that data can be read in vector format directly from the scanner. For the Spark 2.0 batch reader, we can pass a wrapper around `org.apache.spark.sql.execution.vectorized.ColumnarBatch` directly to Carbon and set the data on it.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/ravipesala/incubator-carbondata vectorreader

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-carbondata/pull/412.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #412

----
commit 1bdf725452092b23fc50b4c8cb513541b430fae2
Author: ravipesala <[hidden email]>
Date: 2016-12-06T17:54:05Z

    add initial check in for vector reader

commit 6bce6a80d09a7853e3fb883a320368471d5e739a
Author: ravipesala <[hidden email]>
Date: 2016-12-08T10:39:44Z

    Added vector reader in carbon

commit c52f8644160a1ac7ad2531e80f3154ca893a03ad
Author: ravipesala <[hidden email]>
Date: 2016-12-08T10:43:34Z

    Fixed check style

----

---
If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at [hidden email] or file a JIRA ticket with INFRA.
---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/incubator-carbondata/pull/412

Build Success with Spark 1.5.2, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/71/
Github user jackylk commented on a diff in the pull request:
https://github.com/apache/incubator-carbondata/pull/412#discussion_r91829957

    --- Diff: core/src/main/java/org/apache/carbondata/scan/executor/impl/DetailQueryExecutor.java ---
    @@ -36,8 +37,13 @@
       @Override public CarbonIterator<Object[]> execute(QueryModel queryModel)
           throws QueryExecutionException {
         List<BlockExecutionInfo> blockExecutionInfoList = getBlockExecutionInfos(queryModel);
    -    return new DetailQueryResultIterator(blockExecutionInfoList, queryModel,
    -        queryProperties.executorService);
    +    if (queryModel.isVectorReader()) {
    --- End diff --

Can we create a separate implementation of AbstractQueryExecutor and instantiate the appropriate one in CarbonRecordReader?
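One way to read this suggestion is to keep the shared setup in an abstract base class and give each read path its own executor, so that the caller chooses once instead of branching inside `execute`. The sketch below is only an illustration under simplifying assumptions: `QueryModelSketch`, `ExecutorSketch`, `RowExecutorSketch`, `VectorExecutorSketch`, and `ExecutorSelectionDemo` are hypothetical names, and plain strings stand in for the real block and result types.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for the query model; only the flag from the diff is kept.
class QueryModelSketch {
    private final boolean vectorReader;

    QueryModelSketch(boolean vectorReader) {
        this.vectorReader = vectorReader;
    }

    boolean isVectorReader() {
        return vectorReader;
    }
}

// Shared setup lives in the base class; each subclass decides how results are produced.
abstract class ExecutorSketch {
    // Placeholder for the block lookup that both executors would share.
    protected List<String> blocks() {
        return Arrays.asList("block-0", "block-1");
    }

    abstract Iterator<String> execute(QueryModelSketch model);
}

// Row-oriented path: one result entry per block, tagged "row".
class RowExecutorSketch extends ExecutorSketch {
    @Override
    Iterator<String> execute(QueryModelSketch model) {
        List<String> results = new ArrayList<>();
        for (String block : blocks()) {
            results.add("row:" + block);
        }
        return results.iterator();
    }
}

// Vector-oriented path: same blocks, tagged "vector".
class VectorExecutorSketch extends ExecutorSketch {
    @Override
    Iterator<String> execute(QueryModelSketch model) {
        List<String> results = new ArrayList<>();
        for (String block : blocks()) {
            results.add("vector:" + block);
        }
        return results.iterator();
    }
}

class ExecutorSelectionDemo {
    // The caller (the record reader in the comment above) picks the executor once.
    static ExecutorSketch executorFor(QueryModelSketch model) {
        return model.isVectorReader() ? new VectorExecutorSketch() : new RowExecutorSketch();
    }

    public static void main(String[] args) {
        QueryModelSketch model = new QueryModelSketch(true);
        Iterator<String> results = executorFor(model).execute(model);
        while (results.hasNext()) {
            System.out.println(results.next());
        }
    }
}
```

With this shape, neither executor needs to know that the other exists, and the `isVectorReader()` check appears in exactly one place.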
Github user jackylk commented on a diff in the pull request:
https://github.com/apache/incubator-carbondata/pull/412#discussion_r91830068

    --- Diff: hadoop/src/main/java/org/apache/carbondata/hadoop/CarbonRecordReader.java ---
    @@ -78,7 +80,13 @@ public void initialize(InputSplit inputSplit, TaskAttemptContext context)
         readSupport.initialize(queryModel.getProjectionColumns(),
             queryModel.getAbsoluteTableIdentifier());
         try {
    -      carbonIterator = new ChunkRowIterator(queryExecutor.execute(queryModel));
    +      if (queryModel.isVectorReader()) {
    +        carbonIterator = new VectorChunkRowIterator(
    --- End diff --

Use a factory to create it.
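A factory here would move the `if` out of `initialize` and behind a single creation method. The following is a minimal sketch of that idea, not the code from the PR: `RowIteratorSketch`, `BatchIteratorSketch`, and `IteratorFactorySketch` are hypothetical names, the batch size of 2 is arbitrary, and strings stand in for the real row and batch types.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Row-at-a-time iterator over an in-memory list (stand-in for a row iterator).
class RowIteratorSketch implements Iterator<String> {
    private final Iterator<String> source;

    RowIteratorSketch(List<String> rows) {
        this.source = rows.iterator();
    }

    @Override
    public boolean hasNext() {
        return source.hasNext();
    }

    @Override
    public String next() {
        return source.next();
    }
}

// Batch-at-a-time iterator: each call returns up to batchSize rows joined into one entry.
class BatchIteratorSketch implements Iterator<String> {
    private final List<String> rows;
    private final int batchSize;
    private int position = 0;

    BatchIteratorSketch(List<String> rows, int batchSize) {
        this.rows = rows;
        this.batchSize = batchSize;
    }

    @Override
    public boolean hasNext() {
        return position < rows.size();
    }

    @Override
    public String next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        int end = Math.min(position + batchSize, rows.size());
        String batch = String.join(",", rows.subList(position, end));
        position = end;
        return batch;
    }
}

// The only place that knows which concrete iterator belongs to which mode.
final class IteratorFactorySketch {
    private IteratorFactorySketch() {
    }

    static Iterator<String> create(boolean vectorReader, List<String> rows) {
        if (vectorReader) {
            return new BatchIteratorSketch(rows, 2);
        }
        return new RowIteratorSketch(rows);
    }
}

class IteratorFactoryDemo {
    public static void main(String[] args) {
        List<String> rows = Arrays.asList("a", "b", "c");
        Iterator<String> batches = IteratorFactorySketch.create(true, rows);
        while (batches.hasNext()) {
            System.out.println(batches.next());
        }
    }
}
```

The reader would then call something like `IteratorFactorySketch.create(queryModel.isVectorReader(), ...)` and work only with the returned `Iterator`.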
Github user jackylk commented on a diff in the pull request:
https://github.com/apache/incubator-carbondata/pull/412#discussion_r91830109

    --- Diff: core/src/main/java/org/apache/carbondata/scan/result/vector/CarbonColumnarBatch.java ---
    @@ -0,0 +1,45 @@
    +package org.apache.carbondata.scan.result.vector;
    --- End diff --

Missing file header.
Github user jackylk commented on a diff in the pull request:
https://github.com/apache/incubator-carbondata/pull/412#discussion_r91830114

    --- Diff: core/src/main/java/org/apache/carbondata/scan/result/vector/CarbonColumnarBatch.java ---
    @@ -0,0 +1,45 @@
    +package org.apache.carbondata.scan.result.vector;
    +
    +public class CarbonColumnarBatch {
    --- End diff --

I think it is better to use the name `ColumnarBatch` directly.
Github user jackylk commented on a diff in the pull request:
https://github.com/apache/incubator-carbondata/pull/412#discussion_r91830120

    --- Diff: core/src/main/java/org/apache/carbondata/scan/result/vector/CarbonColumnVector.java ---
    @@ -0,0 +1,29 @@
    +package org.apache.carbondata.scan.result.vector;
    --- End diff --

Missing file header.
Github user jackylk commented on a diff in the pull request:
https://github.com/apache/incubator-carbondata/pull/412#discussion_r91830121

    --- Diff: core/src/main/java/org/apache/carbondata/scan/result/vector/CarbonColumnVector.java ---
    @@ -0,0 +1,29 @@
    +package org.apache.carbondata.scan.result.vector;
    +
    +import org.apache.spark.sql.types.Decimal;
    +
    +public interface CarbonColumnVector {
    --- End diff --

Name it `ColumnVector`.
Github user jackylk commented on a diff in the pull request:
https://github.com/apache/incubator-carbondata/pull/412#discussion_r91830226

    --- Diff: core/src/main/java/org/apache/carbondata/scan/result/vector/CarbonColumnVector.java ---
    @@ -0,0 +1,29 @@
    +package org.apache.carbondata.scan.result.vector;
    +
    +import org.apache.spark.sql.types.Decimal;
    +
    +public interface CarbonColumnVector {
    --- End diff --

Why is it an interface? Are you considering making it off-heap in the future? If not, I think using a class directly is enough.
Github user jackylk commented on a diff in the pull request:
https://github.com/apache/incubator-carbondata/pull/412#discussion_r91830271

    --- Diff: core/src/main/java/org/apache/carbondata/scan/result/vector/CarbonColumnVector.java ---
    @@ -0,0 +1,29 @@
    +package org.apache.carbondata.scan.result.vector;
    +
    +import org.apache.spark.sql.types.Decimal;
    --- End diff --

Can we avoid introducing a Spark dependency in this interface and its implementation?
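One possible way to address this concern is to declare the core-side interface with JDK types only (for example `java.math.BigDecimal`) and convert to the engine's decimal type inside an adapter that lives in the engine integration module. The sketch below assumes that approach; it is not what the PR does. `DecimalVectorSketch`, `EngineDecimalSketch`, and `EngineDecimalAdapterSketch` are hypothetical names, and `EngineDecimalSketch` merely stands in for an engine-specific type such as Spark's `Decimal`.

```java
import java.math.BigDecimal;
import java.util.HashMap;
import java.util.Map;

// Core-side interface: JDK types only, so the core module needs no engine dependency.
interface DecimalVectorSketch {
    void putDecimal(int rowId, BigDecimal value, int precision);
}

// Hypothetical stand-in for an engine-specific decimal type.
final class EngineDecimalSketch {
    final BigDecimal value;
    final int precision;

    EngineDecimalSketch(BigDecimal value, int precision) {
        this.value = value;
        this.precision = precision;
    }
}

// Engine-side adapter: the type conversion happens here, outside the core module.
class EngineDecimalAdapterSketch implements DecimalVectorSketch {
    private final Map<Integer, EngineDecimalSketch> rows = new HashMap<>();

    @Override
    public void putDecimal(int rowId, BigDecimal value, int precision) {
        rows.put(rowId, new EngineDecimalSketch(value, precision));
    }

    EngineDecimalSketch get(int rowId) {
        return rows.get(rowId);
    }
}

class DecimalAdapterDemo {
    public static void main(String[] args) {
        EngineDecimalAdapterSketch adapter = new EngineDecimalAdapterSketch();
        // The scanner only sees the core-side interface.
        DecimalVectorSketch vector = adapter;
        vector.putDecimal(0, new BigDecimal("12.34"), 4);
        System.out.println(adapter.get(0).value + " precision=" + adapter.get(0).precision);
    }
}
```

The trade-off is an extra conversion per decimal value, in exchange for keeping engine classes out of the `core` module's imports.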
Github user jackylk commented on a diff in the pull request:
https://github.com/apache/incubator-carbondata/pull/412#discussion_r91830305

    --- Diff: core/src/main/java/org/apache/carbondata/scan/result/vector/CarbonColumnVector.java ---
    @@ -0,0 +1,29 @@
    +package org.apache.carbondata.scan.result.vector;
    +
    +import org.apache.spark.sql.types.Decimal;
    +
    +public interface CarbonColumnVector {
    +
    +  public void putShort(int rowId, short value);
    +
    +  public void putInt(int rowId, int value);
    +
    +  public void putLong(int rowId, long value);
    +
    +  public void putDecimal(int rowId, Decimal value, int precision);
    +
    +  public void putDouble(int rowId, double value);
    +
    +  public void putBytes(int rowId, byte[] value);
    +
    +  public void putBytes(int rowId, int offset, int length, byte[] value);
    --- End diff --

This interface is never used.
Github user ravipesala commented on a diff in the pull request:
https://github.com/apache/incubator-carbondata/pull/412#discussion_r91835389

    --- Diff: hadoop/src/main/java/org/apache/carbondata/hadoop/CarbonRecordReader.java ---
    @@ -78,7 +80,13 @@ public void initialize(InputSplit inputSplit, TaskAttemptContext context)
         readSupport.initialize(queryModel.getProjectionColumns(),
             queryModel.getAbsoluteTableIdentifier());
         try {
    -      carbonIterator = new ChunkRowIterator(queryExecutor.execute(queryModel));
    +      if (queryModel.isVectorReader()) {
    +        carbonIterator = new VectorChunkRowIterator(
    --- End diff --

I moved this logic out of the class and created a new `VectorizedCarbonRecordReader`.
Github user ravipesala commented on a diff in the pull request:
https://github.com/apache/incubator-carbondata/pull/412#discussion_r91835397

    --- Diff: core/src/main/java/org/apache/carbondata/scan/result/vector/CarbonColumnarBatch.java ---
    @@ -0,0 +1,45 @@
    +package org.apache.carbondata.scan.result.vector;
    --- End diff --

ok
Github user ravipesala commented on a diff in the pull request:
https://github.com/apache/incubator-carbondata/pull/412#discussion_r91835416

    --- Diff: core/src/main/java/org/apache/carbondata/scan/result/vector/CarbonColumnarBatch.java ---
    @@ -0,0 +1,45 @@
    +package org.apache.carbondata.scan.result.vector;
    +
    +public class CarbonColumnarBatch {
    --- End diff --

In Spark the name `ColumnarBatch` is already used, so I chose this name to avoid confusion.
Github user ravipesala commented on a diff in the pull request:
https://github.com/apache/incubator-carbondata/pull/412#discussion_r91835427

    --- Diff: core/src/main/java/org/apache/carbondata/scan/result/vector/CarbonColumnVector.java ---
    @@ -0,0 +1,29 @@
    +package org.apache.carbondata.scan.result.vector;
    --- End diff --

ok
Github user CarbonDataQA commented on the issue:
https://github.com/apache/incubator-carbondata/pull/412

Build Success with Spark 1.5.2, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/86/
Github user CarbonDataQA commented on the issue:
https://github.com/apache/incubator-carbondata/pull/412

Build Success with Spark 1.5.2, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/87/
Github user ravipesala commented on a diff in the pull request:
https://github.com/apache/incubator-carbondata/pull/412#discussion_r91839108

    --- Diff: core/src/main/java/org/apache/carbondata/scan/result/vector/CarbonColumnVector.java ---
    @@ -0,0 +1,29 @@
    +package org.apache.carbondata.scan.result.vector;
    +
    +import org.apache.spark.sql.types.Decimal;
    +
    +public interface CarbonColumnVector {
    +
    +  public void putShort(int rowId, short value);
    +
    +  public void putInt(int rowId, int value);
    +
    +  public void putLong(int rowId, long value);
    +
    +  public void putDecimal(int rowId, Decimal value, int precision);
    +
    +  public void putDouble(int rowId, double value);
    +
    +  public void putBytes(int rowId, byte[] value);
    +
    +  public void putBytes(int rowId, int offset, int length, byte[] value);
    --- End diff --

`ColumnarVectorWrapper` is the implementation class for this interface.
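To illustrate the wrapper relationship described here, the sketch below shows a scanner-facing interface whose implementation simply forwards each `put` call to an engine-side column. It is a guess at the general shape, not the PR's `ColumnarVectorWrapper`: only the three method signatures are taken from the diff above, while `ColumnVectorSketch`, `EngineColumnSketch` (with its `setInt`/`setBinary` methods), and `VectorWrapperSketch` are hypothetical. Copying the byte range in the offset/length overload is likewise an assumption about the intended semantics.

```java
import java.util.Arrays;

// Subset of the interface quoted in the diff (method signatures taken from the diff).
interface ColumnVectorSketch {
    void putInt(int rowId, int value);

    void putBytes(int rowId, byte[] value);

    void putBytes(int rowId, int offset, int length, byte[] value);
}

// Hypothetical engine-side column with its own API; stands in for an engine's vector class.
class EngineColumnSketch {
    private final Object[] slots;

    EngineColumnSketch(int capacity) {
        this.slots = new Object[capacity];
    }

    void setInt(int row, int value) {
        slots[row] = value;
    }

    void setBinary(int row, byte[] value) {
        slots[row] = value;
    }

    Object get(int row) {
        return slots[row];
    }
}

// Wrapper: the scanner writes through ColumnVectorSketch, the engine reads its own column.
class VectorWrapperSketch implements ColumnVectorSketch {
    private final EngineColumnSketch column;

    VectorWrapperSketch(EngineColumnSketch column) {
        this.column = column;
    }

    @Override
    public void putInt(int rowId, int value) {
        column.setInt(rowId, value);
    }

    @Override
    public void putBytes(int rowId, byte[] value) {
        column.setBinary(rowId, value);
    }

    @Override
    public void putBytes(int rowId, int offset, int length, byte[] value) {
        // Assumed semantics: store only the slice [offset, offset + length) of the input.
        column.setBinary(rowId, Arrays.copyOfRange(value, offset, offset + length));
    }
}

class VectorWrapperDemo {
    public static void main(String[] args) {
        EngineColumnSketch column = new EngineColumnSketch(2);
        ColumnVectorSketch vector = new VectorWrapperSketch(column);
        vector.putInt(0, 7);
        vector.putBytes(1, 1, 2, new byte[] {1, 2, 3, 4});
        System.out.println(column.get(0));
        System.out.println(Arrays.toString((byte[]) column.get(1)));
    }
}
```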
Github user jackylk commented on the issue:
https://github.com/apache/incubator-carbondata/pull/412

Please rebase.
Github user CarbonDataQA commented on the issue:
https://github.com/apache/incubator-carbondata/pull/412

Build Success with Spark 1.5.2, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/101/