[GitHub] [carbondata] ajantha-bhat opened a new pull request #3887: [WIP] Refactor #3773 and support struct type

classic Classic list List threaded Threaded
128 messages Options
1234567
Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #3887: [CARBONDATA-3830] Support Array and Struct of all primitive type reading from presto

GitBox

ajantha-bhat commented on a change in pull request #3887:
URL: https://github.com/apache/carbondata/pull/3887#discussion_r476624952



##########
File path: core/src/main/java/org/apache/carbondata/core/datastore/page/encoding/adaptive/AdaptiveCodec.java
##########
@@ -260,4 +265,52 @@ protected String debugInfo() {
     return this.toString();
   }
 
+  // Utility class to update current vector to child vector in case of complex type handling
+  public static class VectorUtil {
+    private ColumnVectorInfo vectorInfo;
+    private int pageSize;
+    private CarbonColumnVector vector;
+    private DataType vectorDataType;
+
+    public VectorUtil(ColumnVectorInfo vectorInfo, int pageSize, CarbonColumnVector vector,
+        DataType vectorDataType) {
+      this.vectorInfo = vectorInfo;
+      this.pageSize = pageSize;
+      this.vector = vector;
+      this.vectorDataType = vectorDataType;
+    }
+
+    public int getPageSize() {
+      return pageSize;
+    }
+
+    public CarbonColumnVector getVector() {
+      return vector;
+    }
+
+    public DataType getVectorDataType() {
+      return vectorDataType;
+    }
+
+    public VectorUtil checkAndUpdateToChildVector() {
+      Stack<CarbonColumnVector> vectorStack = vectorInfo.getVectorStack();
+      if (vectorStack != null && vectorStack.peek() != null && vectorDataType.isComplexType()) {
+        if (vectorDataType.getName().equals("ARRAY")) {

Review comment:
       yes. I can modify

##########
File path: core/src/main/java/org/apache/carbondata/core/datastore/page/encoding/adaptive/AdaptiveCodec.java
##########
@@ -260,4 +265,52 @@ protected String debugInfo() {
     return this.toString();
   }
 
+  // Utility class to update current vector to child vector in case of complex type handling
+  public static class VectorUtil {
+    private ColumnVectorInfo vectorInfo;
+    private int pageSize;
+    private CarbonColumnVector vector;
+    private DataType vectorDataType;
+
+    public VectorUtil(ColumnVectorInfo vectorInfo, int pageSize, CarbonColumnVector vector,
+        DataType vectorDataType) {
+      this.vectorInfo = vectorInfo;
+      this.pageSize = pageSize;
+      this.vector = vector;
+      this.vectorDataType = vectorDataType;
+    }
+
+    public int getPageSize() {
+      return pageSize;
+    }
+
+    public CarbonColumnVector getVector() {
+      return vector;
+    }
+
+    public DataType getVectorDataType() {
+      return vectorDataType;
+    }
+
+    public VectorUtil checkAndUpdateToChildVector() {

Review comment:
       done




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #3887: [CARBONDATA-3830] Support Array and Struct of all primitive type reading from presto

GitBox
In reply to this post by GitBox

ajantha-bhat commented on a change in pull request #3887:
URL: https://github.com/apache/carbondata/pull/3887#discussion_r476625159



##########
File path: core/src/main/java/org/apache/carbondata/core/scan/result/vector/CarbonColumnVector.java
##########
@@ -114,4 +115,20 @@
 
   void setLazyPage(LazyPageLoader lazyPage);
 
+  // Add default implementation for interface,
+  // to avoid implementing presto required functions for spark or core module.
+  default List<CarbonColumnVector> getChildrenVector() {
+    return null;

Review comment:
       done

##########
File path: core/src/main/java/org/apache/carbondata/core/scan/result/vector/impl/CarbonColumnVectorImpl.java
##########
@@ -70,10 +73,14 @@
 
   private CarbonColumnVector dictionaryVector;
 
+  private List<CarbonColumnVector> childrenVector;
+
   private LazyPageLoader lazyPage;
 
   private boolean loaded;
 
+  private ArrayList<Integer> childElementsForEachRow = null;

Review comment:
       yeah. done




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #3887: [CARBONDATA-3830] Support Array and Struct of all primitive type reading from presto

GitBox
In reply to this post by GitBox

ajantha-bhat commented on a change in pull request #3887:
URL: https://github.com/apache/carbondata/pull/3887#discussion_r476625494



##########
File path: integration/presto/src/main/prestosql/org/apache/carbondata/presto/readers/ComplexTypeStreamReader.java
##########
@@ -0,0 +1,199 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.presto.readers;
+
+import java.util.ArrayList;
+import java.util.List;
+import java.util.Optional;
+
+import io.prestosql.spi.block.ArrayBlock;
+import io.prestosql.spi.block.RowBlock;
+import io.prestosql.spi.type.*;
+
+import org.apache.carbondata.core.metadata.datatype.DataType;
+import org.apache.carbondata.core.metadata.datatype.DataTypes;
+import org.apache.carbondata.core.metadata.datatype.StructField;
+import org.apache.carbondata.core.scan.result.vector.CarbonColumnVector;
+import org.apache.carbondata.core.scan.result.vector.impl.CarbonColumnVectorImpl;
+
+import io.prestosql.spi.block.Block;
+import io.prestosql.spi.block.BlockBuilder;
+
+import org.apache.carbondata.presto.CarbonVectorBatch;
+import org.apache.carbondata.presto.ColumnarVectorWrapperDirect;
+
+/**
+ * Class to read the complex type Stream [array/struct/map]
+ */
+
+public class ComplexTypeStreamReader extends CarbonColumnVectorImpl
+    implements PrestoVectorBlockBuilder {
+
+  protected int batchSize;
+
+  protected Type type;
+  protected BlockBuilder builder;
+
+  public ComplexTypeStreamReader(int batchSize, StructField field) {
+    super(batchSize, field.getDataType());
+    this.batchSize = batchSize;
+    this.type = getType(field);
+    ArrayList<CarbonColumnVector> childrenList = new ArrayList<>();
+    for (StructField child : field.getChildren()) {
+      childrenList.add(new ColumnarVectorWrapperDirect(
+          CarbonVectorBatch.createDirectStreamReader(this.batchSize, child.getDataType(), child)));
+    }
+    setChildrenVector(childrenList);
+    this.builder = type.createBlockBuilder(null, batchSize);
+  }
+
+  Type getType(StructField field) {
+    DataType dataType = field.getDataType();
+    if (dataType == DataTypes.STRING || dataType == DataTypes.VARCHAR) {
+      return VarcharType.VARCHAR;
+    } else if (dataType == DataTypes.SHORT) {
+      return SmallintType.SMALLINT;
+    } else if (dataType == DataTypes.INT) {
+      return IntegerType.INTEGER;
+    } else if (dataType == DataTypes.LONG) {
+      return BigintType.BIGINT;
+    } else if (dataType == DataTypes.DOUBLE) {
+      return DoubleType.DOUBLE;
+    } else if (dataType == DataTypes.FLOAT) {
+      return RealType.REAL;
+    } else if (dataType == DataTypes.BOOLEAN) {
+      return BooleanType.BOOLEAN;
+    } else if (dataType == DataTypes.BINARY) {
+      return VarbinaryType.VARBINARY;
+    } else if (dataType == DataTypes.DATE) {
+      return DateType.DATE;
+    } else if (dataType == DataTypes.TIMESTAMP) {
+      return TimestampType.TIMESTAMP;
+    } else if (dataType == DataTypes.BYTE) {
+      return TinyintType.TINYINT;
+    } else if (DataTypes.isDecimal(dataType)) {
+      org.apache.carbondata.core.metadata.datatype.DecimalType decimal =
+          (org.apache.carbondata.core.metadata.datatype.DecimalType) dataType;
+      return DecimalType.createDecimalType(decimal.getPrecision(), decimal.getScale());
+    } else if (DataTypes.isArrayType(dataType)) {
+      return new ArrayType(getType(field.getChildren().get(0)));
+    } else if (DataTypes.isStructType(dataType)) {
+      List<RowType.Field> children = new ArrayList<>();
+      for (StructField child : field.getChildren()) {
+        children.add(new RowType.Field(Optional.of(child.getFieldName()), getType(child)));
+      }
+      return RowType.from(children);
+    } else {
+      throw new UnsupportedOperationException("Unsupported type: " + dataType);
+    }
+  }
+
+  @Override public Block buildBlock() {
+    return builder.build();
+  }
+
+  @Override public void setBatchSize(int batchSize) {
+    this.batchSize = batchSize;
+  }
+
+  @Override public void putObject(int rowId, Object value) {
+    if (value == null) {
+      putNull(rowId);
+    } else {
+      getChildrenVector().get(0).putObject(rowId, value);
+    }
+  }
+
+  public void putComplexObject(List<Integer> offsetVector) {
+    if (type instanceof ArrayType) {
+      Block childBlock = buildChildBlock(getChildrenVector().get(0));
+      // prepare an offset vector with 0 as initial offset
+      int[] offsetVectorArray = new int[offsetVector.size() + 1];
+      for (int i = 1; i <= offsetVector.size(); i++) {
+        offsetVectorArray[i] = offsetVectorArray[i -1] + offsetVector.get(i - 1);

Review comment:
       wonder why checkstyle is not catching




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #3887: [CARBONDATA-3830] Support Array and Struct of all primitive type reading from presto

GitBox
In reply to this post by GitBox

ajantha-bhat commented on a change in pull request #3887:
URL: https://github.com/apache/carbondata/pull/3887#discussion_r476625603



##########
File path: integration/presto/src/main/prestosql/org/apache/carbondata/presto/readers/ComplexTypeStreamReader.java
##########
@@ -0,0 +1,199 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.presto.readers;
+
+import java.util.ArrayList;
+import java.util.List;
+import java.util.Optional;
+
+import io.prestosql.spi.block.ArrayBlock;
+import io.prestosql.spi.block.RowBlock;
+import io.prestosql.spi.type.*;
+
+import org.apache.carbondata.core.metadata.datatype.DataType;
+import org.apache.carbondata.core.metadata.datatype.DataTypes;
+import org.apache.carbondata.core.metadata.datatype.StructField;
+import org.apache.carbondata.core.scan.result.vector.CarbonColumnVector;
+import org.apache.carbondata.core.scan.result.vector.impl.CarbonColumnVectorImpl;
+
+import io.prestosql.spi.block.Block;
+import io.prestosql.spi.block.BlockBuilder;
+
+import org.apache.carbondata.presto.CarbonVectorBatch;
+import org.apache.carbondata.presto.ColumnarVectorWrapperDirect;
+
+/**
+ * Class to read the complex type Stream [array/struct/map]
+ */
+
+public class ComplexTypeStreamReader extends CarbonColumnVectorImpl
+    implements PrestoVectorBlockBuilder {
+
+  protected int batchSize;
+
+  protected Type type;
+  protected BlockBuilder builder;
+
+  public ComplexTypeStreamReader(int batchSize, StructField field) {
+    super(batchSize, field.getDataType());
+    this.batchSize = batchSize;
+    this.type = getType(field);
+    ArrayList<CarbonColumnVector> childrenList = new ArrayList<>();
+    for (StructField child : field.getChildren()) {
+      childrenList.add(new ColumnarVectorWrapperDirect(
+          CarbonVectorBatch.createDirectStreamReader(this.batchSize, child.getDataType(), child)));
+    }
+    setChildrenVector(childrenList);
+    this.builder = type.createBlockBuilder(null, batchSize);
+  }
+
+  Type getType(StructField field) {
+    DataType dataType = field.getDataType();
+    if (dataType == DataTypes.STRING || dataType == DataTypes.VARCHAR) {
+      return VarcharType.VARCHAR;
+    } else if (dataType == DataTypes.SHORT) {
+      return SmallintType.SMALLINT;
+    } else if (dataType == DataTypes.INT) {
+      return IntegerType.INTEGER;
+    } else if (dataType == DataTypes.LONG) {
+      return BigintType.BIGINT;
+    } else if (dataType == DataTypes.DOUBLE) {
+      return DoubleType.DOUBLE;
+    } else if (dataType == DataTypes.FLOAT) {
+      return RealType.REAL;
+    } else if (dataType == DataTypes.BOOLEAN) {
+      return BooleanType.BOOLEAN;
+    } else if (dataType == DataTypes.BINARY) {
+      return VarbinaryType.VARBINARY;
+    } else if (dataType == DataTypes.DATE) {
+      return DateType.DATE;
+    } else if (dataType == DataTypes.TIMESTAMP) {
+      return TimestampType.TIMESTAMP;
+    } else if (dataType == DataTypes.BYTE) {
+      return TinyintType.TINYINT;
+    } else if (DataTypes.isDecimal(dataType)) {
+      org.apache.carbondata.core.metadata.datatype.DecimalType decimal =
+          (org.apache.carbondata.core.metadata.datatype.DecimalType) dataType;
+      return DecimalType.createDecimalType(decimal.getPrecision(), decimal.getScale());
+    } else if (DataTypes.isArrayType(dataType)) {
+      return new ArrayType(getType(field.getChildren().get(0)));
+    } else if (DataTypes.isStructType(dataType)) {
+      List<RowType.Field> children = new ArrayList<>();
+      for (StructField child : field.getChildren()) {
+        children.add(new RowType.Field(Optional.of(child.getFieldName()), getType(child)));
+      }
+      return RowType.from(children);
+    } else {
+      throw new UnsupportedOperationException("Unsupported type: " + dataType);
+    }
+  }
+
+  @Override public Block buildBlock() {
+    return builder.build();
+  }
+
+  @Override public void setBatchSize(int batchSize) {
+    this.batchSize = batchSize;
+  }
+
+  @Override public void putObject(int rowId, Object value) {
+    if (value == null) {
+      putNull(rowId);
+    } else {
+      getChildrenVector().get(0).putObject(rowId, value);

Review comment:
       done




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #3887: [CARBONDATA-3830] Support Array and Struct of all primitive type reading from presto

GitBox
In reply to this post by GitBox

ajantha-bhat commented on a change in pull request #3887:
URL: https://github.com/apache/carbondata/pull/3887#discussion_r476628989



##########
File path: core/src/main/java/org/apache/carbondata/core/scan/result/vector/ColumnVectorInfo.java
##########
@@ -39,6 +40,15 @@
   public int[] invertedIndex;
   public BitSet deletedRows;
   public DecimalConverterFactory.DecimalConverter decimalConverter;
+  public Stack<CarbonColumnVector> vectorStack = null;

Review comment:
       done




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #3887: [CARBONDATA-3830] Support Array and Struct of all primitive type reading from presto

GitBox
In reply to this post by GitBox

ajantha-bhat commented on a change in pull request #3887:
URL: https://github.com/apache/carbondata/pull/3887#discussion_r476636246



##########
File path: core/src/main/java/org/apache/carbondata/core/datastore/page/encoding/adaptive/AdaptiveDeltaFloatingCodec.java
##########
@@ -282,6 +288,12 @@ public void decodeAndFillVector(byte[] pageData, ColumnVectorInfo vectorInfo, Bi
           for (int i = 0; i < size; i += DataTypes.INT.getSizeInBytes()) {
             vector.putFloat(rowId++, (max - ByteUtil.toIntLittleEndian(pageData, i)) / floatFactor);
           }
+        } else if (pageDataType == DataTypes.LONG) {

Review comment:
       I have tested, With current UT itself it is hitting. for [Null, 5.512] it is using long as storage for complex primitive adaptive. Base behavior needs to check. I guess it can be analyzed separately




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3887: [CARBONDATA-3830] Support Array and Struct of all primitive type reading from presto

GitBox
In reply to this post by GitBox

CarbonDataQA1 commented on pull request #3887:
URL: https://github.com/apache/carbondata/pull/3887#issuecomment-680225124


   Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3868/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3887: [CARBONDATA-3830] Support Array and Struct of all primitive type reading from presto

GitBox
In reply to this post by GitBox

CarbonDataQA1 commented on pull request #3887:
URL: https://github.com/apache/carbondata/pull/3887#issuecomment-680226920


   Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2127/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #3887: [CARBONDATA-3830] Support Array and Struct of all primitive type reading from presto

GitBox
In reply to this post by GitBox

ajantha-bhat commented on a change in pull request #3887:
URL: https://github.com/apache/carbondata/pull/3887#discussion_r477040770



##########
File path: core/src/main/java/org/apache/carbondata/core/datastore/page/encoding/adaptive/AdaptiveDeltaFloatingCodec.java
##########
@@ -255,6 +255,12 @@ public void decodeAndFillVector(byte[] pageData, ColumnVectorInfo vectorInfo, Bi
       CarbonColumnVector vector = vectorInfo.vector;
       BitSet deletedRows = vectorInfo.deletedRows;
       DataType vectorDataType = vector.getType();
+      VectorUtil vectorUtil = new VectorUtil(vectorInfo, pageSize, vector, vectorDataType)

Review comment:
       Also, decodeAndFillVector takes vectorInfo, not the vectors. So, all interface need to change if I do this.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #3887: [CARBONDATA-3830] Support Array and Struct of all primitive type reading from presto

GitBox
In reply to this post by GitBox

ajantha-bhat commented on a change in pull request #3887:
URL: https://github.com/apache/carbondata/pull/3887#discussion_r477058306



##########
File path: core/src/main/java/org/apache/carbondata/core/datastore/page/encoding/adaptive/AdaptiveDeltaFloatingCodec.java
##########
@@ -282,6 +288,12 @@ public void decodeAndFillVector(byte[] pageData, ColumnVectorInfo vectorInfo, Bi
           for (int i = 0; i < size; i += DataTypes.INT.getSizeInBytes()) {
             vector.putFloat(rowId++, (max - ByteUtil.toIntLittleEndian(pageData, i)) / floatFactor);
           }
+        } else if (pageDataType == DataTypes.LONG) {

Review comment:
       For this, I have checked
   If No complex type, (if it is just primitive type) same values goes to DirectCompress, not adaptive. But for complex primitive it goes to adaptive because of below code. And as min max is stored as double precision. Long is chosen for this.
   
   
   `DefaultEncodingFactory#selectCodecByAlgorithmForFloating()`
   
   ```
   } else if (decimalCount < 0 && !isComplexPrimitive) {
         return new DirectCompressCodec(DataTypes.DOUBLE);
       } else {
         return getColumnPageCodec(stats, isComplexPrimitive, columnSpec, srcDataType, maxValue,
             minValue, decimalCount, absMaxValue);
       }
   ```
   
   I don't know (remember) why complex primitive should not enter direct compress.  why that check is explicitly added.
   




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] kumarvishal09 commented on a change in pull request #3887: [CARBONDATA-3830] Support Array and Struct of all primitive type reading from presto

GitBox
In reply to this post by GitBox

kumarvishal09 commented on a change in pull request #3887:
URL: https://github.com/apache/carbondata/pull/3887#discussion_r477289221



##########
File path: core/src/main/java/org/apache/carbondata/core/datastore/page/encoding/adaptive/AdaptiveCodec.java
##########
@@ -260,4 +265,65 @@ protected String debugInfo() {
     return this.toString();
   }
 
+  public static VectorUtil checkAndUpdateToChildVector(ColumnVectorInfo vectorInfo, int pageSize,
+      CarbonColumnVector vector, DataType vectorDataType) {
+    VectorUtil vectorUtil = new VectorUtil(pageSize, vector, vectorDataType);
+    Stack<CarbonColumnVector> vectorStack = vectorInfo.getVectorStack();
+    // check and update to child vector info
+    if (vectorStack != null && vectorStack.peek() != null && vectorDataType.isComplexType()) {

Review comment:
       use Objects.nonNull for not null case and for null use Objects.isnull




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] kumarvishal09 commented on a change in pull request #3887: [CARBONDATA-3830] Support Array and Struct of all primitive type reading from presto

GitBox
In reply to this post by GitBox

kumarvishal09 commented on a change in pull request #3887:
URL: https://github.com/apache/carbondata/pull/3887#discussion_r477289221



##########
File path: core/src/main/java/org/apache/carbondata/core/datastore/page/encoding/adaptive/AdaptiveCodec.java
##########
@@ -260,4 +265,65 @@ protected String debugInfo() {
     return this.toString();
   }
 
+  public static VectorUtil checkAndUpdateToChildVector(ColumnVectorInfo vectorInfo, int pageSize,
+      CarbonColumnVector vector, DataType vectorDataType) {
+    VectorUtil vectorUtil = new VectorUtil(pageSize, vector, vectorDataType);
+    Stack<CarbonColumnVector> vectorStack = vectorInfo.getVectorStack();
+    // check and update to child vector info
+    if (vectorStack != null && vectorStack.peek() != null && vectorDataType.isComplexType()) {

Review comment:
       use Objects.nonNull for not null case and for null use Objects.isnull, pls handle in all applicable places




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] kumarvishal09 commented on a change in pull request #3887: [CARBONDATA-3830] Support Array and Struct of all primitive type reading from presto

GitBox
In reply to this post by GitBox

kumarvishal09 commented on a change in pull request #3887:
URL: https://github.com/apache/carbondata/pull/3887#discussion_r477292131



##########
File path: core/src/main/java/org/apache/carbondata/core/datastore/page/encoding/adaptive/AdaptiveCodec.java
##########
@@ -260,4 +265,65 @@ protected String debugInfo() {
     return this.toString();
   }
 
+  public static VectorUtil checkAndUpdateToChildVector(ColumnVectorInfo vectorInfo, int pageSize,
+      CarbonColumnVector vector, DataType vectorDataType) {
+    VectorUtil vectorUtil = new VectorUtil(pageSize, vector, vectorDataType);
+    Stack<CarbonColumnVector> vectorStack = vectorInfo.getVectorStack();
+    // check and update to child vector info
+    if (vectorStack != null && vectorStack.peek() != null && vectorDataType.isComplexType()) {
+      if (DataTypes.isArrayType(vectorDataType)) {
+        List<Integer> childElementsCountForEachRow =
+            ((CarbonColumnVectorImpl) vector.getColumnVector())
+                .getNumberOfChildrenElementsInEachRow();
+        int newPageSize = 0;
+        for (int childElementsCount : childElementsCountForEachRow) {
+          newPageSize += childElementsCount;
+        }
+        vectorUtil.setPageSize(newPageSize);
+      }
+      // child vector flow, so fill the child vector
+      CarbonColumnVector childVector = vectorStack.pop();
+      vectorUtil.setVector(childVector);
+      vectorUtil.setVectorDataType(childVector.getType());
+    }
+    return vectorUtil;
+  }
+
+  // Utility class to update current vector to child vector in case of complex type handling
+  public static class VectorUtil {
+    private int pageSize;
+    private CarbonColumnVector vector;
+    private DataType vectorDataType;
+
+    private VectorUtil(int pageSize, CarbonColumnVector vector, DataType vectorDataType) {
+      this.pageSize = pageSize;
+      this.vector = vector;
+      this.vectorDataType = vectorDataType;
+    }
+
+    public int getPageSize() {

Review comment:
       we can use org.apache.commons.lang3.tuple class for returning 3 args from one method , pls check if you can use here




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] kumarvishal09 commented on a change in pull request #3887: [CARBONDATA-3830] Support Array and Struct of all primitive type reading from presto

GitBox
In reply to this post by GitBox

kumarvishal09 commented on a change in pull request #3887:
URL: https://github.com/apache/carbondata/pull/3887#discussion_r477308991



##########
File path: core/src/main/java/org/apache/carbondata/core/datastore/page/encoding/compress/DirectCompressCodec.java
##########
@@ -376,6 +456,56 @@ private void fillVector(byte[] pageData, CarbonColumnVector vector, DataType vec
           DecimalConverterFactory.DecimalConverter decimalConverter = vectorInfo.decimalConverter;
           decimalConverter.fillVector(pageData, pageSize, vectorInfo, nullBits, pageDataType);
         }
+      } else if (pageDataType == DataTypes.BYTE_ARRAY) {
+        if (vectorDataType == DataTypes.STRING || vectorDataType == DataTypes.BINARY
+            || vectorDataType == DataTypes.VARCHAR) {
+          // for complex primitive string, binary, varchar type
+          int offset = 0;
+          for (int i = 0; i < pageSize; i++) {
+            byte[] stringLen = new byte[DataTypes.INT.getSizeInBytes()];

Review comment:
       this byte array is not required, wrap page data inside byte buffer and updates  the position to get the length of the data
   By this we can avoid creating multiple bytebuffer and byte array




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] kumarvishal09 commented on a change in pull request #3887: [CARBONDATA-3830] Support Array and Struct of all primitive type reading from presto

GitBox
In reply to this post by GitBox

kumarvishal09 commented on a change in pull request #3887:
URL: https://github.com/apache/carbondata/pull/3887#discussion_r477308991



##########
File path: core/src/main/java/org/apache/carbondata/core/datastore/page/encoding/compress/DirectCompressCodec.java
##########
@@ -376,6 +456,56 @@ private void fillVector(byte[] pageData, CarbonColumnVector vector, DataType vec
           DecimalConverterFactory.DecimalConverter decimalConverter = vectorInfo.decimalConverter;
           decimalConverter.fillVector(pageData, pageSize, vectorInfo, nullBits, pageDataType);
         }
+      } else if (pageDataType == DataTypes.BYTE_ARRAY) {
+        if (vectorDataType == DataTypes.STRING || vectorDataType == DataTypes.BINARY
+            || vectorDataType == DataTypes.VARCHAR) {
+          // for complex primitive string, binary, varchar type
+          int offset = 0;
+          for (int i = 0; i < pageSize; i++) {
+            byte[] stringLen = new byte[DataTypes.INT.getSizeInBytes()];

Review comment:
       this byte array is not required, wrap page data inside byte buffer and updates  the position to get the length of the data
   By this we can avoid creating multiple bytebuffer and byte array . Pls handle all the places




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] kumarvishal09 commented on a change in pull request #3887: [CARBONDATA-3830] Support Array and Struct of all primitive type reading from presto

GitBox
In reply to this post by GitBox

kumarvishal09 commented on a change in pull request #3887:
URL: https://github.com/apache/carbondata/pull/3887#discussion_r477308991



##########
File path: core/src/main/java/org/apache/carbondata/core/datastore/page/encoding/compress/DirectCompressCodec.java
##########
@@ -376,6 +456,56 @@ private void fillVector(byte[] pageData, CarbonColumnVector vector, DataType vec
           DecimalConverterFactory.DecimalConverter decimalConverter = vectorInfo.decimalConverter;
           decimalConverter.fillVector(pageData, pageSize, vectorInfo, nullBits, pageDataType);
         }
+      } else if (pageDataType == DataTypes.BYTE_ARRAY) {
+        if (vectorDataType == DataTypes.STRING || vectorDataType == DataTypes.BINARY
+            || vectorDataType == DataTypes.VARCHAR) {
+          // for complex primitive string, binary, varchar type
+          int offset = 0;
+          for (int i = 0; i < pageSize; i++) {
+            byte[] stringLen = new byte[DataTypes.INT.getSizeInBytes()];

Review comment:
       this byte array is not required, wrap page data inside byte buffer and updates  the position to get the length of the data
   By this we can avoid creating multiple bytebuffer and byte array . Pls handle all the places
   
       ```
   ByteBuffer buffer = ByteBuffer.allocate(12);
       buffer.putInt(1);
       buffer.putInt(2);
       buffer.putInt(3);
       buffer.rewind();
       buffer.position(4);
       int anInt = buffer.getInt();
   ```




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] kumarvishal09 commented on a change in pull request #3887: [CARBONDATA-3830] Support Array and Struct of all primitive type reading from presto

GitBox
In reply to this post by GitBox

kumarvishal09 commented on a change in pull request #3887:
URL: https://github.com/apache/carbondata/pull/3887#discussion_r477313220



##########
File path: core/src/main/java/org/apache/carbondata/core/scan/executor/impl/AbstractQueryExecutor.java
##########
@@ -98,6 +98,9 @@
    */
   protected CarbonIterator queryIterator;
 
+  // Size of the ReusableDataBuffer based on the number of dimension projection columns
+  protected int reusableDimensionBufferSize = 0;

Review comment:
       by default Int value is 0, remove it




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] kumarvishal09 commented on a change in pull request #3887: [CARBONDATA-3830] Support Array and Struct of all primitive type reading from presto

GitBox
In reply to this post by GitBox

kumarvishal09 commented on a change in pull request #3887:
URL: https://github.com/apache/carbondata/pull/3887#discussion_r477313518



##########
File path: core/src/main/java/org/apache/carbondata/core/scan/result/BlockletScannedResult.java
##########
@@ -153,6 +154,9 @@
 
   private ReusableDataBuffer[] measureReusableBuffer;
 
+  // index used by dimensionReusableBuffer
+  int dimensionReusableBufferIndex = 0;

Review comment:
       private int dimensionReusableBufferIndex = 0;




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] kumarvishal09 commented on a change in pull request #3887: [CARBONDATA-3830] Support Array and Struct of all primitive type reading from presto

GitBox
In reply to this post by GitBox

kumarvishal09 commented on a change in pull request #3887:
URL: https://github.com/apache/carbondata/pull/3887#discussion_r477313518



##########
File path: core/src/main/java/org/apache/carbondata/core/scan/result/BlockletScannedResult.java
##########
@@ -153,6 +154,9 @@
 
   private ReusableDataBuffer[] measureReusableBuffer;
 
+  // index used by dimensionReusableBuffer
+  int dimensionReusableBufferIndex = 0;

Review comment:
       change private int dimensionReusableBufferIndex;




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] kumarvishal09 commented on pull request #3887: [CARBONDATA-3830] Support Array and Struct of all primitive type reading from presto

GitBox
In reply to this post by GitBox

kumarvishal09 commented on pull request #3887:
URL: https://github.com/apache/carbondata/pull/3887#issuecomment-680904506


   @ajantha-bhat By default Row filter push down is false, In case of complex type for eg: array of string number of records can be much more than 32k, so filling in one shot can be a problem because creating a big array will be an overhead.
   
   + In case of direct fill we have ResuableDataBuffer to create one big byte array/column for one blocklet processing  and reuse for all the page for a column inside, this wont be useful because number of element in case of complex type can vary, and byte array size will change every page and even creating a big byte array for one page when number of element in child is more can degrade query performance.
   
   So for complex type user must configure table page size to get the better query performance while creating the table.
   
   If number of row is page is higher we can  avoid ResuableDataBuffer it may be become overhead.
   
   Some comments:
   1. Pls refactor the code, avoid duplicate code if you can
   2. Pls add comments.
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


1234567