[GitHub] incubator-carbondata pull request #412: [WIP]Added vector reader in Carbon s...

[GitHub] incubator-carbondata issue #412: [CARBONDATA-519]Added vector reader in Carb...

Github user ravipesala commented on the issue:

https://github.com/apache/incubator-carbondata/pull/412

@jackylk, rebased; please review.

---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [hidden email] or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-carbondata pull request #412: [CARBONDATA-519]Added vector reader ...

In reply to this post by qiuchenjian-2

Github user jackylk commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/412#discussion_r91871166

--- Diff: core/src/main/java/org/apache/carbondata/scan/result/vector/CarbonColumnVector.java ---
@@ -0,0 +1,29 @@
+package org.apache.carbondata.scan.result.vector;
+
+import org.apache.spark.sql.types.Decimal;
+
+public interface CarbonColumnVector {
+
+ public void putShort(int rowId, short value);
+
+ public void putInt(int rowId, int value);
+
+ public void putLong(int rowId, long value);
+
+ public void putDecimal(int rowId, Decimal value, int precision);
+
+ public void putDouble(int rowId, double value);
+
+ public void putBytes(int rowId, byte[] value);
+
+ public void putBytes(int rowId, int offset, int length, byte[] value);
--- End diff --

Yes, but I mean this function `putBytes(int rowId, int offset, int length, byte[] value)` is never used

[GitHub] incubator-carbondata issue #412: [CARBONDATA-519]Added vector reader in Carb...

In reply to this post by qiuchenjian-2

Github user jackylk commented on the issue:

https://github.com/apache/incubator-carbondata/pull/412

Is there a test case for this feature? I could not find it

[GitHub] incubator-carbondata issue #412: [CARBONDATA-519]Added vector reader in Carb...

In reply to this post by qiuchenjian-2

Github user CarbonDataQA commented on the issue:

https://github.com/apache/incubator-carbondata/pull/412

Build Success with Spark 1.5.2, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/106/

[GitHub] incubator-carbondata issue #412: [CARBONDATA-519]Added vector reader in Carb...

In reply to this post by qiuchenjian-2

Github user CarbonDataQA commented on the issue:

https://github.com/apache/incubator-carbondata/pull/412

Build Success with Spark 1.5.2, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/107/

[GitHub] incubator-carbondata issue #412: [CARBONDATA-519]Added vector reader in Carb...

In reply to this post by qiuchenjian-2

Github user ravipesala commented on the issue:

https://github.com/apache/incubator-carbondata/pull/412

@jackylk Added a test case; please review.

[GitHub] incubator-carbondata pull request #412: [CARBONDATA-519]Added vector reader ...

In reply to this post by qiuchenjian-2

Github user ravipesala commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/412#discussion_r91885134

--- Diff: core/src/main/java/org/apache/carbondata/scan/result/vector/CarbonColumnVector.java ---
@@ -0,0 +1,29 @@
+package org.apache.carbondata.scan.result.vector;
+
+import org.apache.spark.sql.types.Decimal;
+
+public interface CarbonColumnVector {
+
+ public void putShort(int rowId, short value);
+
+ public void putInt(int rowId, int value);
+
+ public void putLong(int rowId, long value);
+
+ public void putDecimal(int rowId, Decimal value, int precision);
+
+ public void putDouble(int rowId, double value);
+
+ public void putBytes(int rowId, byte[] value);
+
+ public void putBytes(int rowId, int offset, int length, byte[] value);
--- End diff --

Yes, it is never used; I added it for future use.

[GitHub] incubator-carbondata pull request #412: [CARBONDATA-519]Added vector reader ...

In reply to this post by qiuchenjian-2

Github user jackylk commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/412#discussion_r91961576

--- Diff: integration/spark-common/src/main/scala/org/apache/carbondata/spark/rdd/CarbonScanRDD.scala ---
@@ -150,12 +153,25 @@ class CarbonScanRDD[V: ClassTag](
val attemptContext = new TaskAttemptContextImpl(new Configuration(), attemptId)
val format = prepareInputFormatForExecutor(attemptContext.getConfiguration)
val inputSplit = split.asInstanceOf[CarbonSparkPartition].split.value
- val reader = format.createRecordReader(inputSplit, attemptContext)
+ val model = format.getQueryModel(inputSplit, attemptContext)
+ val reader = {
+ if (vectorReader) {
+ val carbonRecordReader = createVectorizedCarbonRecordReader(model)
+ if (carbonRecordReader == null) {
+ new CarbonRecordReader(model, format.getReadSupportClass(attemptContext.getConfiguration))
+ } else {
+ carbonRecordReader
+ }
+ } else {
+ new CarbonRecordReader(model, format.getReadSupportClass(attemptContext.getConfiguration))
--- End diff --

We should not instantiate CarbonRecordReader directly here. Can we choose one of the following?
option 1: create two InputFormats, one for batch and another for non-batch
option 2: keep one InputFormat and create the RecordReader according to configuration.
I prefer option 2.
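
Option 2 could be sketched roughly as below. This is only an illustration: `SketchRecordReader`, `SketchInputFormat`, and the key `sketch.enable.vector.reader` are hypothetical stand-ins invented here, not CarbonData or Hadoop APIs, and a plain `Map` stands in for the job configuration.

```java
import java.util.Map;

// Hypothetical stand-in for a record reader; not the real CarbonData or Hadoop type.
interface SketchRecordReader {
  String kind();
}

final class SketchInputFormat {
  // Hypothetical configuration key, invented for this illustration.
  static final String VECTOR_READER_KEY = "sketch.enable.vector.reader";

  // Option 2: a single input format picks the reader from configuration,
  // so callers never construct a concrete reader themselves.
  static SketchRecordReader createRecordReader(Map<String, String> conf) {
    boolean vector = Boolean.parseBoolean(conf.getOrDefault(VECTOR_READER_KEY, "false"));
    if (vector) {
      return () -> "vector"; // batch (vectorized) reader
    }
    return () -> "row"; // row-by-row reader
  }
}
```

With this shape, the calling RDD would only ask the format for a reader, and the choice between the batch and non-batch paths would stay in one place.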

[GitHub] incubator-carbondata pull request #412: [CARBONDATA-519]Added vector reader ...

In reply to this post by qiuchenjian-2

Github user piaoyats commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/412#discussion_r92108917

--- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/execution/CarbonLateDecodeStrategy.scala ---
@@ -87,19 +90,17 @@ private[sql] class CarbonLateDecodeStrategy extends SparkStrategy {
private[this] def toCatalystRDD(
relation: LogicalRelation,
output: Seq[Attribute],
- rdd: RDD[Row],
+ rdd: RDD[InternalRow],
needDecode: ArrayBuffer[AttributeReference]):
RDD[InternalRow] = {
- val newRdd = if (needDecode.size > 0) {
+ if (needDecode.size > 0) {
+ rdd.asInstanceOf[CarbonScanRDD].setVectorReaderSupport(false)
getDecoderRDD(relation, needDecode, rdd, output)
--- End diff --

Hi, I want to know what happens if setVectorReaderSupport(true) is used when needDecode.size > 0.

[GitHub] incubator-carbondata pull request #412: [CARBONDATA-519]Added vector reader ...

In reply to this post by qiuchenjian-2

Github user piaoyats commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/412#discussion_r92531150

--- Diff: core/src/main/java/org/apache/carbondata/core/carbon/datastore/chunk/impl/FixedLengthDimensionDataChunk.java ---
@@ -77,13 +79,75 @@ public FixedLengthDimensionDataChunk(byte[] dataChunk, DimensionChunkAttributes
rowId = chunkAttributes.getInvertedIndexesReverse()[rowId];
}
int start = rowId * chunkAttributes.getColumnValueSize();
+ int dict = getInt(chunkAttributes.getColumnValueSize(), start);
+ row[columnIndex] = dict;
+ return columnIndex + 1;
+ }
+
+ @Override public int fillConvertedChunkData(ColumnVectorInfo[] vectorInfo, int column,
+ KeyStructureInfo restructuringInfo) {
--- End diff --

What is KeyStructureInfo used for? It seems this parameter is not used.

[GitHub] incubator-carbondata pull request #412: [CARBONDATA-519]Added vector reader ...

In reply to this post by qiuchenjian-2

Github user piaoyats commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/412#discussion_r92531597

--- Diff: core/src/main/java/org/apache/carbondata/core/carbon/datastore/chunk/impl/FixedLengthDimensionDataChunk.java ---
@@ -77,13 +79,75 @@ public FixedLengthDimensionDataChunk(byte[] dataChunk, DimensionChunkAttributes
rowId = chunkAttributes.getInvertedIndexesReverse()[rowId];
}
int start = rowId * chunkAttributes.getColumnValueSize();
+ int dict = getInt(chunkAttributes.getColumnValueSize(), start);
+ row[columnIndex] = dict;
+ return columnIndex + 1;
+ }
+
+ @Override public int fillConvertedChunkData(ColumnVectorInfo[] vectorInfo, int column,
+ KeyStructureInfo restructuringInfo) {
+ ColumnVectorInfo columnVectorInfo = vectorInfo[column];
+ int offset = columnVectorInfo.offset;
+ int vectorOffset = columnVectorInfo.vectorOffset;
+ int len = columnVectorInfo.size + offset;
+ int[] indexesReverse = chunkAttributes.getInvertedIndexesReverse();
+ int columnValueSize = chunkAttributes.getColumnValueSize();
+ CarbonColumnVector vector = columnVectorInfo.vector;
+ for (int j = offset; j < len; j++) {
+ int start =
+ indexesReverse == null ? j * columnValueSize : indexesReverse[j] * columnValueSize;
+ int dict = getInt(columnValueSize, start);
+ if (columnVectorInfo.directDictionaryGenerator == null) {
+ vector.putInt(vectorOffset++, dict);
+ } else {
+ Object valueFromSurrogate =
+ columnVectorInfo.directDictionaryGenerator.getValueFromSurrogate(dict);
+ if (valueFromSurrogate == null) {
+ vector.putNull(vectorOffset++);
+ } else {
+ vector.putLong(vectorOffset++, (long) valueFromSurrogate);
+ }
+ }
+ }
+ return column + 1;
+ }
+
+ @Override
+ public int fillConvertedChunkData(int[] rowMapping, ColumnVectorInfo[] vectorInfo, int column,
+ KeyStructureInfo restructuringInfo) {
+ ColumnVectorInfo columnVectorInfo = vectorInfo[column];
+ int offset = columnVectorInfo.offset;
+ int vectorOffset = columnVectorInfo.vectorOffset;
+ int len = columnVectorInfo.size + offset;
+ int[] indexesReverse = chunkAttributes.getInvertedIndexesReverse();
+ int columnValueSize = chunkAttributes.getColumnValueSize();
+ CarbonColumnVector vector = columnVectorInfo.vector;
+ for (int j = offset; j < len; j++) {
+ int start = indexesReverse == null ?
+ rowMapping[j] * columnValueSize :indexesReverse[rowMapping[j]] * columnValueSize;
--- End diff --

Add a space after `:`.
Can we add some notes or docs explaining the logic, so readers can understand it easily?
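
As one possible reading of the offset computation in the diff, the logic could be documented along these lines. The class and method names are hypothetical, and the interpretation of `rowMapping` and `indexesReverse` in the comments is an assumption drawn from the variable names, not confirmed by the PR.

```java
final class InvertedIndexOffsetSketch {
  // Returns the byte offset of the fixed-length value for the j-th row to read.
  // Assumption: rowMapping (may be null) maps the j-th selected row to its row id
  // in the chunk, and indexesReverse (may be null) maps a row id to its physical
  // position when the column is stored with an inverted index.
  static int startOffset(int j, int[] rowMapping, int[] indexesReverse, int columnValueSize) {
    int rowId = rowMapping == null ? j : rowMapping[j];
    int position = indexesReverse == null ? rowId : indexesReverse[rowId];
    return position * columnValueSize;
  }
}
```

Passing `null` for `rowMapping` corresponds to the first `fillConvertedChunkData` overload in the diff, and passing a mapping corresponds to the second.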

[GitHub] incubator-carbondata pull request #412: [CARBONDATA-519]Added vector reader ...

In reply to this post by qiuchenjian-2

Github user piaoyats commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/412#discussion_r92533279

--- Diff: core/src/main/java/org/apache/carbondata/scan/executor/QueryExecutorFactory.java ---
@@ -18,15 +18,69 @@
*/
package org.apache.carbondata.scan.executor;

+import java.util.List;
+
+import org.apache.carbondata.core.carbon.metadata.datatype.DataType;
+import org.apache.carbondata.core.carbon.metadata.encoder.Encoding;
import org.apache.carbondata.scan.executor.impl.DetailQueryExecutor;
+import org.apache.carbondata.scan.executor.impl.VectorDetailQueryExecutor;
+import org.apache.carbondata.scan.model.QueryDimension;
+import org.apache.carbondata.scan.model.QueryMeasure;
+import org.apache.carbondata.scan.model.QueryModel;
+import org.apache.carbondata.scan.result.vector.CarbonColumnVector;
+import org.apache.carbondata.scan.result.vector.CarbonColumnarBatch;
+import org.apache.carbondata.scan.result.vector.impl.CarbonColumnVectorImpl;

/**
* Factory class to get the query executor from RDD
* This will return the executor based on query type
*/
public class QueryExecutorFactory {

- public static QueryExecutor getQueryExecutor() {
- return new DetailQueryExecutor();
+ public static QueryExecutor getQueryExecutor(QueryModel queryModel) {
+ if (queryModel.isVectorReader()) {
+ return new VectorDetailQueryExecutor();
+ } else {
+ return new DetailQueryExecutor();
+ }
+ }
+
+ public static CarbonColumnarBatch createColuminarBatch(QueryModel queryModel) {
--- End diff --

Spelling error: Columinar -> Columnar

[GitHub] incubator-carbondata pull request #412: [CARBONDATA-519]Added vector reader ...

In reply to this post by qiuchenjian-2

Github user piaoyats commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/412#discussion_r92533460

--- Diff: core/src/main/java/org/apache/carbondata/scan/executor/QueryExecutorFactory.java ---
@@ -18,15 +18,69 @@
*/
package org.apache.carbondata.scan.executor;

+import java.util.List;
+
+import org.apache.carbondata.core.carbon.metadata.datatype.DataType;
+import org.apache.carbondata.core.carbon.metadata.encoder.Encoding;
import org.apache.carbondata.scan.executor.impl.DetailQueryExecutor;
+import org.apache.carbondata.scan.executor.impl.VectorDetailQueryExecutor;
+import org.apache.carbondata.scan.model.QueryDimension;
+import org.apache.carbondata.scan.model.QueryMeasure;
+import org.apache.carbondata.scan.model.QueryModel;
+import org.apache.carbondata.scan.result.vector.CarbonColumnVector;
+import org.apache.carbondata.scan.result.vector.CarbonColumnarBatch;
+import org.apache.carbondata.scan.result.vector.impl.CarbonColumnVectorImpl;

/**
* Factory class to get the query executor from RDD
* This will return the executor based on query type
*/
public class QueryExecutorFactory {

- public static QueryExecutor getQueryExecutor() {
- return new DetailQueryExecutor();
+ public static QueryExecutor getQueryExecutor(QueryModel queryModel) {
+ if (queryModel.isVectorReader()) {
+ return new VectorDetailQueryExecutor();
+ } else {
+ return new DetailQueryExecutor();
+ }
+ }
+
+ public static CarbonColumnarBatch createColuminarBatch(QueryModel queryModel) {
+ int batchSize = 10000;
--- End diff --

Can we set batchSize through configuration? That would be useful for tuning.
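
A minimal sketch of a configurable batch size, assuming a hypothetical property name `sketch.vector.reader.batch.size` (the PR does not define one) and plain `java.util.Properties` instead of CarbonData's own property handling. The default of 10000 is the hard-coded value from the diff.

```java
import java.util.Properties;

final class BatchSizeConfigSketch {
  // Hypothetical property name, invented for this illustration.
  static final String BATCH_SIZE_KEY = "sketch.vector.reader.batch.size";
  // Default taken from the hard-coded value in the diff.
  static final int DEFAULT_BATCH_SIZE = 10000;

  // Reads the batch size, falling back to the default when the property
  // is missing, not a number, or not positive.
  static int batchSize(Properties props) {
    String raw = props.getProperty(BATCH_SIZE_KEY);
    if (raw == null) {
      return DEFAULT_BATCH_SIZE;
    }
    try {
      int parsed = Integer.parseInt(raw.trim());
      return parsed > 0 ? parsed : DEFAULT_BATCH_SIZE;
    } catch (NumberFormatException e) {
      return DEFAULT_BATCH_SIZE;
    }
  }
}
```

Falling back to the default on bad input keeps a mistyped setting from failing the query, at the cost of hiding the mistake; logging a warning would be a reasonable addition.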

[GitHub] incubator-carbondata pull request #412: [CARBONDATA-519]Added vector reader ...

In reply to this post by qiuchenjian-2

Github user piaoyats commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/412#discussion_r92534171

--- Diff: core/src/main/java/org/apache/carbondata/scan/executor/QueryExecutorFactory.java ---
@@ -18,15 +18,69 @@
*/
package org.apache.carbondata.scan.executor;

+import java.util.List;
+
+import org.apache.carbondata.core.carbon.metadata.datatype.DataType;
+import org.apache.carbondata.core.carbon.metadata.encoder.Encoding;
import org.apache.carbondata.scan.executor.impl.DetailQueryExecutor;
+import org.apache.carbondata.scan.executor.impl.VectorDetailQueryExecutor;
+import org.apache.carbondata.scan.model.QueryDimension;
+import org.apache.carbondata.scan.model.QueryMeasure;
+import org.apache.carbondata.scan.model.QueryModel;
+import org.apache.carbondata.scan.result.vector.CarbonColumnVector;
+import org.apache.carbondata.scan.result.vector.CarbonColumnarBatch;
+import org.apache.carbondata.scan.result.vector.impl.CarbonColumnVectorImpl;

/**
* Factory class to get the query executor from RDD
* This will return the executor based on query type
*/
public class QueryExecutorFactory {

- public static QueryExecutor getQueryExecutor() {
- return new DetailQueryExecutor();
+ public static QueryExecutor getQueryExecutor(QueryModel queryModel) {
+ if (queryModel.isVectorReader()) {
+ return new VectorDetailQueryExecutor();
+ } else {
+ return new DetailQueryExecutor();
+ }
+ }
+
+ public static CarbonColumnarBatch createColuminarBatch(QueryModel queryModel) {
+ int batchSize = 10000;
+ List<QueryDimension> queryDimension = queryModel.getQueryDimension();
+ List<QueryMeasure> queryMeasures = queryModel.getQueryMeasures();
+ CarbonColumnVector[] vectors =
+ new CarbonColumnVector[queryDimension.size() + queryMeasures.size()];
+ for (int i = 0; i < queryDimension.size(); i++) {
+ QueryDimension dim = queryDimension.get(i);
+ if (dim.getDimension().hasEncoding(Encoding.DIRECT_DICTIONARY)) {
+ vectors[dim.getQueryOrder()] = new CarbonColumnVectorImpl(batchSize, DataType.LONG);
+ } else if (!dim.getDimension().hasEncoding(Encoding.DICTIONARY)) {
+ vectors[dim.getQueryOrder()] =
+ new CarbonColumnVectorImpl(batchSize, dim.getDimension().getDataType());
+ } else if (dim.getDimension().isComplex()) {
+ vectors[dim.getQueryOrder()] = new CarbonColumnVectorImpl(batchSize, DataType.STRUCT);
+ } else {
+ vectors[dim.getQueryOrder()] = new CarbonColumnVectorImpl(batchSize, DataType.INT);
+ }
+ }
+
+ for (int i = 0; i < queryMeasures.size(); i++) {
+ QueryMeasure msr = queryMeasures.get(i);
+ switch (msr.getMeasure().getDataType()) {
+ case SHORT:
+ case INT:
+ case LONG:
+ vectors[msr.getQueryOrder()] =
+ new CarbonColumnVectorImpl(batchSize, msr.getMeasure().getDataType());
+ break;
+ case DECIMAL:
+ vectors[msr.getQueryOrder()] = new CarbonColumnVectorImpl(batchSize, DataType.DECIMAL);
+ break;
+ default:
+ vectors[msr.getQueryOrder()] = new CarbonColumnVectorImpl(batchSize, DataType.DOUBLE);
+ }
--- End diff --

What is the difference between using msr.getMeasure().getDataType() and using DataType.DECIMAL or DataType.DOUBLE?
It seems the DECIMAL case can be merged with the LONG case.

Can we just use vectors[msr.getQueryOrder()] =
new CarbonColumnVectorImpl(batchSize, msr.getMeasure().getDataType()); for all measure data types?

[GitHub] incubator-carbondata pull request #412: [CARBONDATA-519]Added vector reader ...

In reply to this post by qiuchenjian-2

Github user ravipesala commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/412#discussion_r92741003

--- Diff: integration/spark-common/src/main/scala/org/apache/carbondata/spark/rdd/CarbonScanRDD.scala ---
@@ -150,12 +153,25 @@ class CarbonScanRDD[V: ClassTag](
val attemptContext = new TaskAttemptContextImpl(new Configuration(), attemptId)
val format = prepareInputFormatForExecutor(attemptContext.getConfiguration)
val inputSplit = split.asInstanceOf[CarbonSparkPartition].split.value
- val reader = format.createRecordReader(inputSplit, attemptContext)
+ val model = format.getQueryModel(inputSplit, attemptContext)
+ val reader = {
+ if (vectorReader) {
+ val carbonRecordReader = createVectorizedCarbonRecordReader(model)
+ if (carbonRecordReader == null) {
+ new CarbonRecordReader(model, format.getReadSupportClass(attemptContext.getConfiguration))
+ } else {
+ carbonRecordReader
+ }
+ } else {
+ new CarbonRecordReader(model, format.getReadSupportClass(attemptContext.getConfiguration))
--- End diff --

This vector carbon reader is specific to Spark and has Spark dependencies; that is why it is created in the spark module.

[GitHub] incubator-carbondata pull request #412: [CARBONDATA-519]Added vector reader ...

In reply to this post by qiuchenjian-2

Github user ravipesala commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/412#discussion_r92742973

--- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/execution/CarbonLateDecodeStrategy.scala ---
@@ -87,19 +90,17 @@ private[sql] class CarbonLateDecodeStrategy extends SparkStrategy {
private[this] def toCatalystRDD(
relation: LogicalRelation,
output: Seq[Attribute],
- rdd: RDD[Row],
+ rdd: RDD[InternalRow],
needDecode: ArrayBuffer[AttributeReference]):
RDD[InternalRow] = {
- val newRdd = if (needDecode.size > 0) {
+ if (needDecode.size > 0) {
+ rdd.asInstanceOf[CarbonScanRDD].setVectorReaderSupport(false)
getDecoderRDD(relation, needDecode, rdd, output)
--- End diff --

If the vector reader is enabled and `needDecode.size > 0`, then a dictionary decoder RDD is used as its parent. But the decoder RDD is not capable of handling columnar batches.
I will raise another PR to move the decoder RDD logic to the carbon layer.
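
The rule described here can be stated as a small predicate. This is a sketch only; the class, the method, and the use of column names as strings are hypothetical and not part of the PR.

```java
import java.util.List;

final class VectorReaderDecisionSketch {
  // Vector reading is usable only when it was requested and no column still
  // needs dictionary decoding by a parent decoder step, because that step
  // works on rows and cannot consume columnar batches.
  static boolean useVectorReader(boolean requested, List<String> needDecode) {
    return requested && needDecode.isEmpty();
  }
}
```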

[GitHub] incubator-carbondata pull request #412: [CARBONDATA-519]Added vector reader ...

In reply to this post by qiuchenjian-2

Github user ravipesala commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/412#discussion_r92743655

--- Diff: core/src/main/java/org/apache/carbondata/scan/executor/QueryExecutorFactory.java ---
@@ -18,15 +18,69 @@
*/
package org.apache.carbondata.scan.executor;

+import java.util.List;
+
+import org.apache.carbondata.core.carbon.metadata.datatype.DataType;
+import org.apache.carbondata.core.carbon.metadata.encoder.Encoding;
import org.apache.carbondata.scan.executor.impl.DetailQueryExecutor;
+import org.apache.carbondata.scan.executor.impl.VectorDetailQueryExecutor;
+import org.apache.carbondata.scan.model.QueryDimension;
+import org.apache.carbondata.scan.model.QueryMeasure;
+import org.apache.carbondata.scan.model.QueryModel;
+import org.apache.carbondata.scan.result.vector.CarbonColumnVector;
+import org.apache.carbondata.scan.result.vector.CarbonColumnarBatch;
+import org.apache.carbondata.scan.result.vector.impl.CarbonColumnVectorImpl;

/**
* Factory class to get the query executor from RDD
* This will return the executor based on query type
*/
public class QueryExecutorFactory {

- public static QueryExecutor getQueryExecutor() {
- return new DetailQueryExecutor();
+ public static QueryExecutor getQueryExecutor(QueryModel queryModel) {
+ if (queryModel.isVectorReader()) {
+ return new VectorDetailQueryExecutor();
+ } else {
+ return new DetailQueryExecutor();
+ }
+ }
+
+ public static CarbonColumnarBatch createColuminarBatch(QueryModel queryModel) {
+ int batchSize = 10000;
--- End diff --

This method is not used now, I am removing it

[GitHub] incubator-carbondata pull request #412: [CARBONDATA-519]Added vector reader ...

In reply to this post by qiuchenjian-2

Github user ravipesala commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/412#discussion_r92743691

--- Diff: core/src/main/java/org/apache/carbondata/scan/executor/QueryExecutorFactory.java ---
@@ -18,15 +18,69 @@
*/
package org.apache.carbondata.scan.executor;

+import java.util.List;
+
+import org.apache.carbondata.core.carbon.metadata.datatype.DataType;
+import org.apache.carbondata.core.carbon.metadata.encoder.Encoding;
import org.apache.carbondata.scan.executor.impl.DetailQueryExecutor;
+import org.apache.carbondata.scan.executor.impl.VectorDetailQueryExecutor;
+import org.apache.carbondata.scan.model.QueryDimension;
+import org.apache.carbondata.scan.model.QueryMeasure;
+import org.apache.carbondata.scan.model.QueryModel;
+import org.apache.carbondata.scan.result.vector.CarbonColumnVector;
+import org.apache.carbondata.scan.result.vector.CarbonColumnarBatch;
+import org.apache.carbondata.scan.result.vector.impl.CarbonColumnVectorImpl;

/**
* Factory class to get the query executor from RDD
* This will return the executor based on query type
*/
public class QueryExecutorFactory {

- public static QueryExecutor getQueryExecutor() {
- return new DetailQueryExecutor();
+ public static QueryExecutor getQueryExecutor(QueryModel queryModel) {
+ if (queryModel.isVectorReader()) {
+ return new VectorDetailQueryExecutor();
+ } else {
+ return new DetailQueryExecutor();
+ }
+ }
+
+ public static CarbonColumnarBatch createColuminarBatch(QueryModel queryModel) {
--- End diff --

This method is not used now, I am removing it

[GitHub] incubator-carbondata pull request #412: [CARBONDATA-519]Added vector reader ...

In reply to this post by qiuchenjian-2

Github user ravipesala commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/412#discussion_r92744134

--- Diff: core/src/main/java/org/apache/carbondata/scan/executor/QueryExecutorFactory.java ---
@@ -18,15 +18,69 @@
*/
package org.apache.carbondata.scan.executor;

+import java.util.List;
+
+import org.apache.carbondata.core.carbon.metadata.datatype.DataType;
+import org.apache.carbondata.core.carbon.metadata.encoder.Encoding;
import org.apache.carbondata.scan.executor.impl.DetailQueryExecutor;
+import org.apache.carbondata.scan.executor.impl.VectorDetailQueryExecutor;
+import org.apache.carbondata.scan.model.QueryDimension;
+import org.apache.carbondata.scan.model.QueryMeasure;
+import org.apache.carbondata.scan.model.QueryModel;
+import org.apache.carbondata.scan.result.vector.CarbonColumnVector;
+import org.apache.carbondata.scan.result.vector.CarbonColumnarBatch;
+import org.apache.carbondata.scan.result.vector.impl.CarbonColumnVectorImpl;

/**
* Factory class to get the query executor from RDD
* This will return the executor based on query type
*/
public class QueryExecutorFactory {

- public static QueryExecutor getQueryExecutor() {
- return new DetailQueryExecutor();
+ public static QueryExecutor getQueryExecutor(QueryModel queryModel) {
+ if (queryModel.isVectorReader()) {
+ return new VectorDetailQueryExecutor();
+ } else {
+ return new DetailQueryExecutor();
+ }
+ }
+
+ public static CarbonColumnarBatch createColuminarBatch(QueryModel queryModel) {
+ int batchSize = 10000;
+ List<QueryDimension> queryDimension = queryModel.getQueryDimension();
+ List<QueryMeasure> queryMeasures = queryModel.getQueryMeasures();
+ CarbonColumnVector[] vectors =
+ new CarbonColumnVector[queryDimension.size() + queryMeasures.size()];
+ for (int i = 0; i < queryDimension.size(); i++) {
+ QueryDimension dim = queryDimension.get(i);
+ if (dim.getDimension().hasEncoding(Encoding.DIRECT_DICTIONARY)) {
+ vectors[dim.getQueryOrder()] = new CarbonColumnVectorImpl(batchSize, DataType.LONG);
+ } else if (!dim.getDimension().hasEncoding(Encoding.DICTIONARY)) {
+ vectors[dim.getQueryOrder()] =
+ new CarbonColumnVectorImpl(batchSize, dim.getDimension().getDataType());
+ } else if (dim.getDimension().isComplex()) {
+ vectors[dim.getQueryOrder()] = new CarbonColumnVectorImpl(batchSize, DataType.STRUCT);
+ } else {
+ vectors[dim.getQueryOrder()] = new CarbonColumnVectorImpl(batchSize, DataType.INT);
+ }
+ }
+
+ for (int i = 0; i < queryMeasures.size(); i++) {
+ QueryMeasure msr = queryMeasures.get(i);
+ switch (msr.getMeasure().getDataType()) {
+ case SHORT:
+ case INT:
+ case LONG:
+ vectors[msr.getQueryOrder()] =
+ new CarbonColumnVectorImpl(batchSize, msr.getMeasure().getDataType());
+ break;
+ case DECIMAL:
+ vectors[msr.getQueryOrder()] = new CarbonColumnVectorImpl(batchSize, DataType.DECIMAL);
+ break;
+ default:
+ vectors[msr.getQueryOrder()] = new CarbonColumnVectorImpl(batchSize, DataType.DOUBLE);
+ }
--- End diff --

No, we can't, because we support only a few data types when storing and reading measure data.
Anyway, this method is not used now; I am removing it.
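
For reference, the mapping in the quoted `switch` can be condensed as follows. `SketchType` is a hypothetical enum standing in for CarbonData's `DataType`, and the extra members (`FLOAT`, `STRING`) are included only to exercise the default branch.

```java
final class MeasureVectorTypeSketch {
  // Hypothetical enum; a stand-in for CarbonData's DataType.
  enum SketchType { SHORT, INT, LONG, DECIMAL, DOUBLE, FLOAT, STRING }

  // SHORT, INT, LONG and DECIMAL keep their own type; every other
  // measure type falls back to a DOUBLE vector, as in the quoted diff.
  static SketchType vectorTypeFor(SketchType measureType) {
    switch (measureType) {
      case SHORT:
      case INT:
      case LONG:
      case DECIMAL:
        return measureType;
      default:
        return SketchType.DOUBLE;
    }
  }
}
```

Written this way, the DECIMAL case does merge with the integer cases, while the default branch is what prevents passing the measure's own type through for every data type.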

[GitHub] incubator-carbondata issue #412: [CARBONDATA-519]Added vector reader in Carb...

In reply to this post by qiuchenjian-2

Github user CarbonDataQA commented on the issue:

https://github.com/apache/incubator-carbondata/pull/412

Build Failed with Spark 1.5.2, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/207/
