[GitHub] incubator-carbondata pull request #412: [WIP]Added vector reader in Carbon s...

GitHub user ravipesala opened a pull request:

https://github.com/apache/incubator-carbondata/pull/412

[WIP]Added vector reader in Carbon scan.

This PR enables Carbon to read data in a vectorized columnar format.
It adds two new types, `CarbonColumnarBatch` and `CarbonColumnVector`, so that data can be read in vector format directly from the scanner.
For the Spark 2.0 batch reader, we can pass a wrapper around `org.apache.spark.sql.execution.vectorized.ColumnarBatch` directly to Carbon and set the data on it.
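
For readers following the thread, here is a minimal sketch of the idea described above: the scanner writes decoded values straight into column vectors instead of building one `Object[]` per row. It is not code from the PR. `SketchColumnVector`, `OnHeapLongVector`, `SketchColumnarBatch` and `VectorScanSketch` are hypothetical names, and only long columns are modelled.

```java
/** Demo entry point; every class in this sketch is hypothetical. */
final class VectorScanSketch {
    /** Stand-in for the scanner: copies decoded column data straight into the batch. */
    static void fill(SketchColumnarBatch batch, long[][] decodedColumns, int rows) {
        for (int c = 0; c < decodedColumns.length; c++) {
            for (int r = 0; r < rows; r++) {
                batch.columns[c].putLong(r, decodedColumns[c][r]);
            }
        }
        batch.setRowCount(rows);
    }

    public static void main(String[] args) {
        OnHeapLongVector id = new OnHeapLongVector(3);
        OnHeapLongVector amount = new OnHeapLongVector(3);
        SketchColumnarBatch batch =
            new SketchColumnarBatch(new SketchColumnVector[] {id, amount});
        fill(batch, new long[][] {{1, 2, 3}, {10, 20, 30}}, 3);
        System.out.println(batch.getRowCount() + " rows, amount[2] = " + amount.get(2));
    }
}

/** Write-side view of one column; only two put methods are sketched. */
interface SketchColumnVector {
    void putInt(int rowId, int value);

    void putLong(int rowId, long value);
}

/** Simple on-heap column backed by a long[] (an assumption made for this sketch). */
final class OnHeapLongVector implements SketchColumnVector {
    private final long[] data;

    OnHeapLongVector(int capacity) {
        this.data = new long[capacity];
    }

    @Override public void putInt(int rowId, int value) {
        data[rowId] = value;
    }

    @Override public void putLong(int rowId, long value) {
        data[rowId] = value;
    }

    long get(int rowId) {
        return data[rowId];
    }
}

/** A fixed set of column vectors plus the number of valid rows. */
final class SketchColumnarBatch {
    final SketchColumnVector[] columns;
    private int rowCount;

    SketchColumnarBatch(SketchColumnVector[] columns) {
        this.columns = columns;
    }

    void setRowCount(int rowCount) {
        this.rowCount = rowCount;
    }

    int getRowCount() {
        return rowCount;
    }
}
```

An engine-specific wrapper would implement the same interface and forward each `put` call to the engine's own vector, which is presumably the role the Spark 2.0 wrapper plays in the PR.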

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/ravipesala/incubator-carbondata vectorreader

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-carbondata/pull/412.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #412

----
commit 1bdf725452092b23fc50b4c8cb513541b430fae2
Author: ravipesala <[hidden email]>
Date: 2016-12-06T17:54:05Z

add initial check in for vector reader

commit 6bce6a80d09a7853e3fb883a320368471d5e739a
Author: ravipesala <[hidden email]>
Date: 2016-12-08T10:39:44Z

Added vector reader in carbon

commit c52f8644160a1ac7ad2531e80f3154ca893a03ad
Author: ravipesala <[hidden email]>
Date: 2016-12-08T10:43:34Z

Fixed check style

----

---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [hidden email] or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-carbondata issue #412: [WIP]Added vector reader in Carbon scan.

Github user CarbonDataQA commented on the issue:

https://github.com/apache/incubator-carbondata/pull/412

Build Success with Spark 1.5.2, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/71/

[GitHub] incubator-carbondata pull request #412: [WIP]Added vector reader in Carbon s...

In reply to this post by qiuchenjian-2

Github user jackylk commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/412#discussion_r91829957

--- Diff: core/src/main/java/org/apache/carbondata/scan/executor/impl/DetailQueryExecutor.java ---
@@ -36,8 +37,13 @@
@Override public CarbonIterator<Object[]> execute(QueryModel queryModel)
throws QueryExecutionException {
List<BlockExecutionInfo> blockExecutionInfoList = getBlockExecutionInfos(queryModel);
- return new DetailQueryResultIterator(blockExecutionInfoList, queryModel,
- queryProperties.executorService);
+ if (queryModel.isVectorReader()) {
--- End diff --

Can we create a separate implementation of AbstractQueryExecutor and instantiate that one in CarbonRecordReader instead?
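
A rough sketch of this suggestion, assuming a shared abstract base with one subclass per result shape. The class names and the `loadBlocks` helper are hypothetical stand-ins, not CarbonData code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

/** Demo entry point; every class in this sketch is hypothetical. */
final class ExecutorSplitSketch {
    public static void main(String[] args) {
        // The caller (the record reader) decides which variant to instantiate,
        // so neither executor needs an isVectorReader() branch.
        SketchExecutor<Object[]> rows = new RowExecutor();
        SketchExecutor<long[]> vectors = new VectorExecutor();
        System.out.println(rows.execute("t").next().length);
        System.out.println(vectors.execute("t").next().length);
    }
}

abstract class SketchExecutor<T> {
    /** Shared preparation step, a stand-in for getBlockExecutionInfos(queryModel). */
    protected List<long[]> loadBlocks(String table) {
        return Arrays.asList(new long[] {1, 2}, new long[] {3, 4});
    }

    abstract Iterator<T> execute(String table);
}

/** Row-oriented variant: one Object[] per row. */
final class RowExecutor extends SketchExecutor<Object[]> {
    @Override Iterator<Object[]> execute(String table) {
        List<Object[]> rows = new ArrayList<>();
        for (long[] block : loadBlocks(table)) {
            for (long value : block) {
                rows.add(new Object[] {value});
            }
        }
        return rows.iterator();
    }
}

/** Vector-oriented variant: one primitive array per block. */
final class VectorExecutor extends SketchExecutor<long[]> {
    @Override Iterator<long[]> execute(String table) {
        return loadBlocks(table).iterator();
    }
}
```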

[GitHub] incubator-carbondata pull request #412: [WIP]Added vector reader in Carbon s...

In reply to this post by qiuchenjian-2

Github user jackylk commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/412#discussion_r91830068

--- Diff: hadoop/src/main/java/org/apache/carbondata/hadoop/CarbonRecordReader.java ---
@@ -78,7 +80,13 @@ public void initialize(InputSplit inputSplit, TaskAttemptContext context)
readSupport.initialize(queryModel.getProjectionColumns(),
queryModel.getAbsoluteTableIdentifier());
try {
- carbonIterator = new ChunkRowIterator(queryExecutor.execute(queryModel));
+ if (queryModel.isVectorReader()) {
+ carbonIterator = new VectorChunkRowIterator(
--- End diff --

Use a factory to create it.
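
One way to read this suggestion, as a minimal sketch: a single static factory method owns the choice between the row iterator and the vector iterator, so the reader no longer branches on the flag. `SketchIteratorFactory`, `RowIterator` and `BatchIterator` are hypothetical names, and integers stand in for real query results.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

/** Demo entry point; every class in this sketch is hypothetical. */
final class IteratorFactorySketch {
    public static void main(String[] args) {
        Iterator<Object> it = SketchIteratorFactory.create(true, Arrays.asList(1, 2, 3), 2);
        while (it.hasNext()) {
            System.out.println(it.next());
        }
    }
}

/** The only place that knows which iterator class backs which mode. */
final class SketchIteratorFactory {
    private SketchIteratorFactory() { }

    static Iterator<Object> create(boolean vectorReader, List<Integer> values, int batchSize) {
        return vectorReader ? new BatchIterator(values, batchSize) : new RowIterator(values);
    }
}

/** Row mode: hands out one value at a time. */
final class RowIterator implements Iterator<Object> {
    private final Iterator<Integer> source;

    RowIterator(List<Integer> values) {
        this.source = values.iterator();
    }

    @Override public boolean hasNext() {
        return source.hasNext();
    }

    @Override public Object next() {
        return source.next();
    }
}

/** Vector mode: hands out up to batchSize values per call. */
final class BatchIterator implements Iterator<Object> {
    private final List<Integer> values;
    private final int batchSize;
    private int position;

    BatchIterator(List<Integer> values, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        this.values = values;
        this.batchSize = batchSize;
    }

    @Override public boolean hasNext() {
        return position < values.size();
    }

    @Override public Object next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        int end = Math.min(position + batchSize, values.size());
        List<Integer> batch = new ArrayList<>(values.subList(position, end));
        position = end;
        return batch;
    }
}
```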

[GitHub] incubator-carbondata pull request #412: [WIP]Added vector reader in Carbon s...

In reply to this post by qiuchenjian-2

Github user jackylk commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/412#discussion_r91830109

--- Diff: core/src/main/java/org/apache/carbondata/scan/result/vector/CarbonColumnarBatch.java ---
@@ -0,0 +1,45 @@
+package org.apache.carbondata.scan.result.vector;
--- End diff --

missing file header

[GitHub] incubator-carbondata pull request #412: [WIP]Added vector reader in Carbon s...

In reply to this post by qiuchenjian-2

Github user jackylk commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/412#discussion_r91830114

--- Diff: core/src/main/java/org/apache/carbondata/scan/result/vector/CarbonColumnarBatch.java ---
@@ -0,0 +1,45 @@
+package org.apache.carbondata.scan.result.vector;
+
+public class CarbonColumnarBatch {
--- End diff --

I think it is better to use the name `ColumnarBatch` directly.

[GitHub] incubator-carbondata pull request #412: [WIP]Added vector reader in Carbon s...

In reply to this post by qiuchenjian-2

Github user jackylk commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/412#discussion_r91830120

--- Diff: core/src/main/java/org/apache/carbondata/scan/result/vector/CarbonColumnVector.java ---
@@ -0,0 +1,29 @@
+package org.apache.carbondata.scan.result.vector;
--- End diff --

missing file header

[GitHub] incubator-carbondata pull request #412: [WIP]Added vector reader in Carbon s...

In reply to this post by qiuchenjian-2

Github user jackylk commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/412#discussion_r91830121

--- Diff: core/src/main/java/org/apache/carbondata/scan/result/vector/CarbonColumnVector.java ---
@@ -0,0 +1,29 @@
+package org.apache.carbondata.scan.result.vector;
+
+import org.apache.spark.sql.types.Decimal;
+
+public interface CarbonColumnVector {
--- End diff --

name it `ColumnVector`

[GitHub] incubator-carbondata pull request #412: [WIP]Added vector reader in Carbon s...

In reply to this post by qiuchenjian-2

Github user jackylk commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/412#discussion_r91830226

--- Diff: core/src/main/java/org/apache/carbondata/scan/result/vector/CarbonColumnVector.java ---
@@ -0,0 +1,29 @@
+package org.apache.carbondata.scan.result.vector;
+
+import org.apache.spark.sql.types.Decimal;
+
+public interface CarbonColumnVector {
--- End diff --

Why is it an interface? Are you considering making it off-heap in the future? If not, I think using a class directly is enough.
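
To illustrate the trade-off behind this question: an interface lets an on-heap and an off-heap implementation sit behind the same calls. The sketch below is an assumption about what that could look like, not code from the PR. `SketchIntVector`, `HeapIntVector` and `DirectIntVector` are hypothetical, and the off-heap storage uses the standard `java.nio.ByteBuffer.allocateDirect`.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Demo entry point; every class in this sketch is hypothetical. */
final class OffHeapSketch {
    public static void main(String[] args) {
        SketchIntVector[] vectors = {new HeapIntVector(4), new DirectIntVector(4)};
        for (SketchIntVector v : vectors) {
            v.putInt(1, 42);
            System.out.println(v.getClass().getSimpleName() + " -> " + v.getInt(1));
        }
    }
}

/** Callers depend only on this interface, never on the storage behind it. */
interface SketchIntVector {
    void putInt(int rowId, int value);

    int getInt(int rowId);
}

/** On-heap storage: a plain Java array managed by the garbage collector. */
final class HeapIntVector implements SketchIntVector {
    private final int[] data;

    HeapIntVector(int capacity) {
        this.data = new int[capacity];
    }

    @Override public void putInt(int rowId, int value) {
        data[rowId] = value;
    }

    @Override public int getInt(int rowId) {
        return data[rowId];
    }
}

/** Off-heap storage: a direct buffer addressed by byte offset. */
final class DirectIntVector implements SketchIntVector {
    private final ByteBuffer buffer;

    DirectIntVector(int capacity) {
        this.buffer = ByteBuffer.allocateDirect(capacity * Integer.BYTES)
            .order(ByteOrder.nativeOrder());
    }

    @Override public void putInt(int rowId, int value) {
        buffer.putInt(rowId * Integer.BYTES, value);
    }

    @Override public int getInt(int rowId) {
        return buffer.getInt(rowId * Integer.BYTES);
    }
}
```

If only one storage strategy is ever planned, a concrete class is simpler, which is the reviewer's point.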

[GitHub] incubator-carbondata pull request #412: [WIP]Added vector reader in Carbon s...

In reply to this post by qiuchenjian-2

Github user jackylk commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/412#discussion_r91830271

--- Diff: core/src/main/java/org/apache/carbondata/scan/result/vector/CarbonColumnVector.java ---
@@ -0,0 +1,29 @@
+package org.apache.carbondata.scan.result.vector;
+
+import org.apache.spark.sql.types.Decimal;
--- End diff --

Can we avoid introducing a Spark dependency in this interface and its implementation?
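
A possible shape for this, sketched under assumptions: the core interface accepts `java.math.BigDecimal`, and an adapter in the engine-specific module converts to the engine's own decimal type. `EngineDecimal` below is a hypothetical stand-in for a type such as Spark's `Decimal`; no Spark API is used, and none of this is taken from the PR.

```java
import java.math.BigDecimal;

/** Demo entry point; every class in this sketch is hypothetical. */
final class DecimalBoundarySketch {
    public static void main(String[] args) {
        EngineDecimalAdapter adapter = new EngineDecimalAdapter(2);
        adapter.putDecimal(0, new BigDecimal("12.34"), 4);
        EngineDecimal stored = adapter.get(0);
        System.out.println(stored.unscaled + " scale " + stored.scale);
    }
}

/** Core-module interface: depends only on JDK types. */
interface CoreDecimalVector {
    void putDecimal(int rowId, BigDecimal value, int precision);
}

/** Stand-in for an engine-specific decimal representation. */
final class EngineDecimal {
    final long unscaled;
    final int scale;

    EngineDecimal(long unscaled, int scale) {
        this.unscaled = unscaled;
        this.scale = scale;
    }
}

/** Lives in the engine-specific module and performs the conversion at the boundary. */
final class EngineDecimalAdapter implements CoreDecimalVector {
    private final EngineDecimal[] target;

    EngineDecimalAdapter(int capacity) {
        this.target = new EngineDecimal[capacity];
    }

    @Override public void putDecimal(int rowId, BigDecimal value, int precision) {
        // longValueExact() throws ArithmeticException if the unscaled value does not
        // fit in a long; precision is accepted but not enforced in this sketch.
        target[rowId] = new EngineDecimal(value.unscaledValue().longValueExact(), value.scale());
    }

    EngineDecimal get(int rowId) {
        return target[rowId];
    }
}
```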

[GitHub] incubator-carbondata pull request #412: [WIP]Added vector reader in Carbon s...

In reply to this post by qiuchenjian-2

Github user jackylk commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/412#discussion_r91830305

--- Diff: core/src/main/java/org/apache/carbondata/scan/result/vector/CarbonColumnVector.java ---
@@ -0,0 +1,29 @@
+package org.apache.carbondata.scan.result.vector;
+
+import org.apache.spark.sql.types.Decimal;
+
+public interface CarbonColumnVector {
+
+ public void putShort(int rowId, short value);
+
+ public void putInt(int rowId, int value);
+
+ public void putLong(int rowId, long value);
+
+ public void putDecimal(int rowId, Decimal value, int precision);
+
+ public void putDouble(int rowId, double value);
+
+ public void putBytes(int rowId, byte[] value);
+
+ public void putBytes(int rowId, int offset, int length, byte[] value);
--- End diff --

This interface is never used

[GitHub] incubator-carbondata pull request #412: [WIP]Added vector reader in Carbon s...

In reply to this post by qiuchenjian-2

Github user ravipesala commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/412#discussion_r91835389

--- Diff: hadoop/src/main/java/org/apache/carbondata/hadoop/CarbonRecordReader.java ---
@@ -78,7 +80,13 @@ public void initialize(InputSplit inputSplit, TaskAttemptContext context)
readSupport.initialize(queryModel.getProjectionColumns(),
queryModel.getAbsoluteTableIdentifier());
try {
- carbonIterator = new ChunkRowIterator(queryExecutor.execute(queryModel));
+ if (queryModel.isVectorReader()) {
+ carbonIterator = new VectorChunkRowIterator(
--- End diff --

I moved this logic out of the class and created a new `VectorizedCarbonRecordReader`.

[GitHub] incubator-carbondata pull request #412: [WIP]Added vector reader in Carbon s...

In reply to this post by qiuchenjian-2

Github user ravipesala commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/412#discussion_r91835397

--- Diff: core/src/main/java/org/apache/carbondata/scan/result/vector/CarbonColumnarBatch.java ---
@@ -0,0 +1,45 @@
+package org.apache.carbondata.scan.result.vector;
--- End diff --

ok

[GitHub] incubator-carbondata pull request #412: [WIP]Added vector reader in Carbon s...

In reply to this post by qiuchenjian-2

Github user ravipesala commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/412#discussion_r91835416

--- Diff: core/src/main/java/org/apache/carbondata/scan/result/vector/CarbonColumnarBatch.java ---
@@ -0,0 +1,45 @@
+package org.apache.carbondata.scan.result.vector;
+
+public class CarbonColumnarBatch {
--- End diff --

Spark already has a class named `ColumnarBatch`, so I used the `Carbon` prefix to avoid confusion.

[GitHub] incubator-carbondata pull request #412: [WIP]Added vector reader in Carbon s...

In reply to this post by qiuchenjian-2

Github user ravipesala commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/412#discussion_r91835427

--- Diff: core/src/main/java/org/apache/carbondata/scan/result/vector/CarbonColumnVector.java ---
@@ -0,0 +1,29 @@
+package org.apache.carbondata.scan.result.vector;
--- End diff --

ok

[GitHub] incubator-carbondata issue #412: [WIP]Added vector reader in Carbon scan.

In reply to this post by qiuchenjian-2

Github user CarbonDataQA commented on the issue:

https://github.com/apache/incubator-carbondata/pull/412

Build Success with Spark 1.5.2, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/86/

[GitHub] incubator-carbondata issue #412: [WIP]Added vector reader in Carbon scan.

In reply to this post by qiuchenjian-2

Github user CarbonDataQA commented on the issue:

https://github.com/apache/incubator-carbondata/pull/412

Build Success with Spark 1.5.2, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/87/

[GitHub] incubator-carbondata pull request #412: [CARBONDATA-519]Added vector reader ...

In reply to this post by qiuchenjian-2

Github user ravipesala commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/412#discussion_r91839108

--- Diff: core/src/main/java/org/apache/carbondata/scan/result/vector/CarbonColumnVector.java ---
@@ -0,0 +1,29 @@
+package org.apache.carbondata.scan.result.vector;
+
+import org.apache.spark.sql.types.Decimal;
+
+public interface CarbonColumnVector {
+
+ public void putShort(int rowId, short value);
+
+ public void putInt(int rowId, int value);
+
+ public void putLong(int rowId, long value);
+
+ public void putDecimal(int rowId, Decimal value, int precision);
+
+ public void putDouble(int rowId, double value);
+
+ public void putBytes(int rowId, byte[] value);
+
+ public void putBytes(int rowId, int offset, int length, byte[] value);
--- End diff --

`ColumnarVectorWrapper` is the implementation class for this interface

[GitHub] incubator-carbondata issue #412: [CARBONDATA-519]Added vector reader in Carb...

In reply to this post by qiuchenjian-2

Github user jackylk commented on the issue:

https://github.com/apache/incubator-carbondata/pull/412

please rebase

[GitHub] incubator-carbondata issue #412: [CARBONDATA-519]Added vector reader in Carb...

In reply to this post by qiuchenjian-2

Github user CarbonDataQA commented on the issue:

https://github.com/apache/incubator-carbondata/pull/412

Build Success with Spark 1.5.2, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/101/
