[GitHub] carbondata pull request #2396: [WIP] Complex data type enhancements

classic Classic list List threaded Threaded
103 messages Options
123456
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2396: [CARBONDATA-2606] [Complex DataType Enhancements] Pr...

qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2396
 
    Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/5311/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #2396: [CARBONDATA-2606] [Complex DataType Enhanceme...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ravipesala commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2396#discussion_r197470593
 
    --- Diff: core/src/main/java/org/apache/carbondata/core/metadata/schema/table/CarbonTable.java ---
    @@ -665,6 +665,34 @@ public CarbonDimension getDimensionByName(String tableName, String columnName) {
           if (dim.getColName().equalsIgnoreCase(columnName)) {
             carbonDimension = dim;
             break;
    +      } else if (dim.getListOfChildDimensions() != null) {
    --- End diff --
   
    Try to simplify as follow
    ```
    /**
       * to get particular dimension from a table
       *
       * @param tableName
       * @param columnName [col1.col12.col123]
       * @return
       */
      public CarbonDimension getDimensionByName(String tableName, String columnName) {
        CarbonDimension carbonDimension = null;
        List<CarbonDimension> dimList = tableDimensionsMap.get(tableName);
        String[] colsplits = columnName.split("\\.");
        String tempColName = colsplits[0];
        for (String colsplit : colsplits) {
          carbonDimension = getCarbonDimension(tempColName, dimList);
          if (carbonDimension.getListOfChildDimensions() != null) {
            tempColName = tempColName + "." + colsplit;
            dimList = carbonDimension.getListOfChildDimensions();
          }
        }
        List<CarbonDimension> implicitDimList = tableImplicitDimensionsMap.get(tableName);
        if (carbonDimension == null) {
          carbonDimension = getCarbonDimension(columnName, implicitDimList);
        }
        return carbonDimension;
      }
   
   
      private CarbonDimension getCarbonDimension(String columnName, List<CarbonDimension> dimensions) {
        CarbonDimension carbonDimension = null;
        for (CarbonDimension dim : dimensions) {
          if (dim.getColName().equalsIgnoreCase(columnName)) {
            carbonDimension = dim;
            break;
          }
        }
        return carbonDimension;
      }
    ```


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #2396: [CARBONDATA-2606] [Complex DataType Enhanceme...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ravipesala commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2396#discussion_r197472622
 
    --- Diff: core/src/main/java/org/apache/carbondata/core/metadata/schema/table/column/CarbonDimension.java ---
    @@ -51,6 +51,16 @@
        */
       private int complexTypeOrdinal;
     
    +  /**
    +   * to store the Ordinal of the Complex Parent Column.
    +   */
    +  private int parentOrdinal = -1;
    --- End diff --
   
    Please remove it as we can get the information from `complexParentDimension`


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2396: [CARBONDATA-2606] [Complex DataType Enhancements] Pr...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ravipesala commented on the issue:

    https://github.com/apache/carbondata/pull/2396
 
    SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/5395/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #2396: [CARBONDATA-2606] [Complex DataType Enhanceme...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ravipesala commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2396#discussion_r197473335
 
    --- Diff: core/src/main/java/org/apache/carbondata/core/scan/collector/impl/DictionaryBasedResultCollector.java ---
    @@ -19,6 +19,7 @@
     import java.nio.ByteBuffer;
     import java.util.ArrayList;
     import java.util.Arrays;
    +import java.util.HashMap;
    --- End diff --
   
    Please check and handle vectorCollector also if required.


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2396: [CARBONDATA-2606] [Complex DataType Enhancements] Pr...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2396
 
    Build Failed  with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/6480/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #2396: [CARBONDATA-2606] [Complex DataType Enhanceme...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ravipesala commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2396#discussion_r197482635
 
    --- Diff: core/src/main/java/org/apache/carbondata/core/scan/collector/impl/DictionaryBasedResultCollector.java ---
    @@ -27,6 +28,7 @@
     import org.apache.carbondata.core.keygenerator.directdictionary.DirectDictionaryKeyGeneratorFactory;
     import org.apache.carbondata.core.metadata.datatype.DataTypes;
     import org.apache.carbondata.core.metadata.encoder.Encoding;
    +import org.apache.carbondata.core.metadata.schema.table.column.CarbonDimension;
    --- End diff --
   
    To handle dictionary complex type, try to place this code in ColumnPageWrapper
    ```
    @Override public int fillSurrogateKey(int rowId, int chunkIndex, int[] outputSurrogateKey,
          KeyStructureInfo restructuringInfo) {
        byte[] rowBytes = columnPage.getBytes(rowId);
   
        outputSurrogateKey[chunkIndex] = CarbonUtil.getSurrogateInternal(rowBytes, 0, rowBytes.length);
        return chunkIndex + 1;
      }
    ```


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #2396: [CARBONDATA-2606] [Complex DataType Enhanceme...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ravipesala commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2396#discussion_r197483516
 
    --- Diff: core/src/main/java/org/apache/carbondata/core/scan/collector/impl/DictionaryBasedResultCollector.java ---
    @@ -67,17 +69,38 @@
       int noDictionaryColumnIndex;
       int complexTypeColumnIndex;
     
    +  int noDictionaryComplexColumnIndex = 0;
    --- End diff --
   
    All these class variables not needed, move to method level


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #2396: [CARBONDATA-2606] [Complex DataType Enhanceme...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ravipesala commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2396#discussion_r197486428
 
    --- Diff: core/src/main/java/org/apache/carbondata/core/scan/collector/impl/DictionaryBasedResultCollector.java ---
    @@ -119,9 +143,38 @@ public DictionaryBasedResultCollector(BlockExecutionInfo blockExecutionInfos) {
         return listBasedResult;
       }
     
    +  private void fillComplexColumnDataBuffer() {
    +
    +    //    dictionaryComplexColumnIndex = 0;
    +    noDictionaryComplexColumnIndex = 0;
    +    complexTypeComplexColumnIndex = 0;
    +    for (int i = 0; i < queryDimensions.length; i++) {
    +      if (queryDimensions[i].getDimension().getParentOrdinal() != -1) {
    --- End diff --
   
    Take an array and fill the ordinals in a constructor


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #2396: [CARBONDATA-2606] [Complex DataType Enhanceme...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ravipesala commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2396#discussion_r197488339
 
    --- Diff: core/src/main/java/org/apache/carbondata/core/scan/collector/impl/DictionaryBasedResultCollector.java ---
    @@ -67,17 +69,38 @@
       int noDictionaryColumnIndex;
       int complexTypeColumnIndex;
     
    +  int noDictionaryComplexColumnIndex = 0;
    +  int complexTypeComplexColumnIndex = 0;
    +
       boolean isDimensionExists;
     
    +  private int[] surrogateResult;
    +  private byte[][] noDictionaryKeys;
    +  private byte[][] complexTypeKeyArray;
    +
       protected Map<Integer, GenericQueryType> comlexDimensionInfoMap;
     
    +  /**
    +   * Field of this Map is the parent Column and associated child columns.
    +   * Final Projection shuld be a merged list consist of only parents.
    +   */
    +  public Map<Integer, List<Integer>> mergedComplexDimensionColumns;
    +
    +  /**
    +   * Fields of this Map of Parent Ordinal with the List is the Child Column Dimension and
    +   * the corresponding data buffer of that column.
    +   */
    +
    +  public Map<Integer, Map<CarbonDimension, ByteBuffer>> mergedComplexDimensionDataMap;
    --- End diff --
   
    Move this also to method and create only once and reuse the same. If it is possible please use multidimensional array


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #2396: [CARBONDATA-2606] [Complex DataType Enhanceme...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ravipesala commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2396#discussion_r197488819
 
    --- Diff: core/src/main/java/org/apache/carbondata/core/scan/collector/impl/DictionaryBasedResultCollector.java ---
    @@ -136,28 +189,81 @@ void fillDimensionData(BlockletScannedResult scannedResult, int[] surrogateResul
             }
           } else if (complexDataTypeArray[i]) {
             // Complex Type With No Dictionary Encoding.
    -        row[order[i]] = comlexDimensionInfoMap.get(queryDimensions[i].getDimension().getOrdinal())
    -            .getDataBasedOnDataType(ByteBuffer.wrap(complexTypeKeyArray[complexTypeColumnIndex++]));
    +        if (queryDimensions[i].getDimension().getParentOrdinal() != -1) {
    +          if (mergedComplexDimensionColumns
    +              .get(queryDimensions[i].getDimension().getParentOrdinal()).size() > 1) {
    +            fillRowForComplexColumn(complexDimensionInfoMap, row, i);
    +          } else {
    +            row[order[i]] =
    +                complexDimensionInfoMap.get(queryDimensions[i].getDimension().getParentOrdinal())
    +                    .getDataBasedOnColumn(
    +                        ByteBuffer.wrap(complexTypeKeyArray[complexTypeColumnIndex++]),
    +                        queryDimensions[i].getDimension().getComplexParentDimension(),
    +                        queryDimensions[i].getDimension());
    +          }
    +        } else {
    +          row[order[i]] =
    +              complexDimensionInfoMap.get(queryDimensions[i].getDimension().getOrdinal())
    +                  .getDataBasedOnDataType(
    +                      ByteBuffer.wrap(complexTypeKeyArray[complexTypeColumnIndex++]));
    +        }
           } else {
    -        row[order[i]] = DataTypeUtil.getDataBasedOnDataTypeForNoDictionaryColumn(
    -            noDictionaryKeys[noDictionaryColumnIndex++],
    -            queryDimensions[i].getDimension().getDataType());
    +        if (queryDimensions[i].getDimension().getParentOrdinal() != -1) {
    --- End diff --
   
    Move this to new method and add a comment


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #2396: [CARBONDATA-2606] [Complex DataType Enhanceme...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ravipesala commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2396#discussion_r197488859
 
    --- Diff: core/src/main/java/org/apache/carbondata/core/scan/collector/impl/DictionaryBasedResultCollector.java ---
    @@ -136,28 +189,81 @@ void fillDimensionData(BlockletScannedResult scannedResult, int[] surrogateResul
             }
           } else if (complexDataTypeArray[i]) {
             // Complex Type With No Dictionary Encoding.
    -        row[order[i]] = comlexDimensionInfoMap.get(queryDimensions[i].getDimension().getOrdinal())
    -            .getDataBasedOnDataType(ByteBuffer.wrap(complexTypeKeyArray[complexTypeColumnIndex++]));
    +        if (queryDimensions[i].getDimension().getParentOrdinal() != -1) {
    --- End diff --
   
    Move this to new method and add a comment


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #2396: [CARBONDATA-2606] [Complex DataType Enhanceme...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ravipesala commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2396#discussion_r197491159
 
    --- Diff: core/src/main/java/org/apache/carbondata/core/scan/complextypes/PrimitiveQueryType.java ---
    @@ -146,4 +148,58 @@ public PrimitiveQueryType(String name, String parentname, int blockIndex,
         return actualData;
       }
     
    +  @Override public Object getDataBasedOnColumn(ByteBuffer dataBuffer, CarbonDimension parent,
    +      CarbonDimension child) {
    +    Object actualData = null;
    +
    +    if (parent.getOrdinal() != child.getOrdinal() || null == dataBuffer) {
    +      return null;
    +    }
    +
    +    if (isDirectDictionary) {
    --- End diff --
   
    Remove the duplicate code, code is almost similar to `getDataBasedOnDataType`


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #2396: [CARBONDATA-2606] [Complex DataType Enhanceme...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ravipesala commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2396#discussion_r197492221
 
    --- Diff: core/src/main/java/org/apache/carbondata/core/scan/complextypes/StructQueryType.java ---
    @@ -109,4 +111,53 @@ public StructQueryType(String name, String parentname, int blockIndex) {
         }
         return DataTypeUtil.getDataTypeConverter().wrapWithGenericRow(fields);
       }
    +
    +  @Override public Object getDataBasedOnColumn(ByteBuffer dataBuffer, CarbonDimension parent,
    +      CarbonDimension child) {
    +    int childLength;
    +    if (parent.getOrdinal() < child.getOrdinal()) {
    +      childLength = parent.getNumberOfChild();
    +      Object[] fields = new Object[childLength];
    +      for (int i = 0; i < childLength; i++) {
    +        fields[i] = children.get(i)
    +            .getDataBasedOnColumn(dataBuffer, parent.getListOfChildDimensions().get(i), child);
    +      }
    +      return DataTypeUtil.getDataTypeConverter().wrapWithGenericRow(fields);
    +    } else if (parent.getOrdinal() > child.getOrdinal()) {
    +      return null;
    +    }
    +    else {
    --- End diff --
   
    correct the style


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #2396: [CARBONDATA-2606] [Complex DataType Enhanceme...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ravipesala commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2396#discussion_r197496409
 
    --- Diff: core/src/main/java/org/apache/carbondata/core/scan/executor/util/RestructureUtil.java ---
    @@ -156,8 +162,70 @@ private static boolean isColumnMatches(boolean isTransactionalTable,
         // If it is non transactional table just check the column names, no need to validate
         // column id as multiple sdk's output placed in a single folder doesn't have same
         // column ID but can have same column name
    -    return (tableColumn.getColumnId().equals(queryColumn.getColumnId()) ||
    -        (!isTransactionalTable && tableColumn.getColName().equals(queryColumn.getColName())));
    +    if (tableColumn.getDataType().isComplexType() && !(tableColumn.getDataType().getId()
    +        == DataTypes.ARRAY_TYPE_ID)) {
    +      if (tableColumn.getColumnId().equals(queryColumn.getColumnId())) {
    +        return true;
    +      } else {
    +        return isColumnMatchesStruct(tableColumn, queryColumn);
    +      }
    +    } else {
    +      return (tableColumn.getColumnId().equals(queryColumn.getColumnId()) || (!isTransactionalTable
    +          && tableColumn.getColName().equals(queryColumn.getColName())));
    +    }
    +  }
    +
    +  /**
    +   * In case of Multilevel Complex column - STRUCT/STRUCTofSTRUCT, traverse all the child dimension
    +   * to check column Id
    +   *
    +   * @param tableColumn
    +   * @param queryColumn
    +   * @return
    +   */
    +  private static boolean isColumnMatchesStruct(CarbonColumn tableColumn, CarbonColumn queryColumn) {
    +    if (tableColumn instanceof CarbonDimension) {
    +      CarbonDimension dimension = (CarbonDimension) tableColumn;
    +      if (dimension.getColName().equalsIgnoreCase(queryColumn.getColName().split("\\.")[0])) {
    +        if (dimension.getListOfChildDimensions() != null) {
    +          CarbonColumn columnMatchesStructlevel =
    +              isColumnMatchesStructlevel(tableColumn, dimension.getListOfChildDimensions(),
    +                  queryColumn);
    +          if (null != columnMatchesStructlevel) {
    +            return true;
    +          } else {
    +            return false;
    +          }
    +        }
    +      } else {
    +        return (tableColumn.getColumnId().equals(queryColumn.getColumnId()));
    +      }
    +    }
    +    return false;
    +  }
    +
    +  private static CarbonColumn isColumnMatchesStructlevel(CarbonColumn tableColumn,
    --- End diff --
   
    CHange as per CarbonTable suggested code


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #2396: [CARBONDATA-2606] [Complex DataType Enhanceme...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ravipesala commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2396#discussion_r197499107
 
    --- Diff: core/src/main/java/org/apache/carbondata/core/scan/model/QueryModelBuilder.java ---
    @@ -54,18 +58,134 @@ public QueryModelBuilder projectColumns(String[] projectionColumns) {
           } else {
             CarbonMeasure measure = table.getMeasureByName(factTableName, projectionColumnName);
             if (measure == null) {
    -          throw new RuntimeException(projectionColumnName +
    -              " column not found in the table " + factTableName);
    +          throw new RuntimeException(
    +              projectionColumnName + " column not found in the table " + factTableName);
             }
             projection.addMeasure(measure, i);
             i++;
           }
         }
    -
    +    optimizeProjectionForComplexColumns(projection);
         this.projection = projection;
         return this;
       }
     
    +  private void optimizeProjectionForComplexColumns(QueryProjection projection) {
    +    // Get the List of Complex Column Projection.
    +    // The optimization techniques which can be applied are
    +    // A. Merging in Driver Side
    +    // B. Merging in the result Collector side.
    +    // Merging is driver side cases are
    +    // Driver merging will eliminate one of the CarbonDimension.
    +    // Executor merging will merge the column output in Result Collector.
    +    // In this routine we are going to do driver merging and leave executor merging.
    +    Map<Integer, List<Integer>> complexColumnMap = new HashMap<>();
    +    List<ProjectionDimension> carbonDimensions = projection.getDimensions();
    +    for (ProjectionDimension cols : carbonDimensions) {
    +      // get all the Projections with Parent Ordinal Set.
    +      if (cols.getDimension().getParentOrdinal() != -1) {
    +        if (complexColumnMap.get(cols.getDimension().getParentOrdinal()) != null) {
    +          List<Integer> childColumns = complexColumnMap.get(cols.getDimension().getParentOrdinal());
    +          childColumns.add(cols.getDimension().getOrdinal());
    +          complexColumnMap.put(cols.getDimension().getParentOrdinal(), childColumns);
    +        } else {
    +          List<Integer> childColumns = new ArrayList<>();
    +          childColumns.add(cols.getDimension().getOrdinal());
    +          complexColumnMap.put(cols.getDimension().getParentOrdinal(), childColumns);
    +        }
    +      }
    +    }
    +
    +    // Traverse the Map to Find any columns are parent.
    +    for (Map.Entry<Integer, List<Integer>> entry : complexColumnMap.entrySet()) {
    +      List<Integer> childOrdinals = entry.getValue();
    +      if (childOrdinals.size() > 1) {
    +        // In case of more that one child, have to check if the child columns are in the same path
    +        // and have a common parent.
    +        Collections.sort(childOrdinals);
    +        List<Integer> mergedOrdinals = mergeChildColumns(childOrdinals, entry.getKey());
    +        if (mergedOrdinals.size() > 0) {
    +          projection = removeDimension(projection, mergedOrdinals);
    +        }
    +      }
    +    }
    +  }
    +
    +  private QueryProjection removeDimension(QueryProjection projection,
    +      List<Integer> mergedOrdinals) {
    +    List<ProjectionDimension> carbonDimensions = projection.getDimensions();
    +    QueryProjection outputProjection = new QueryProjection();
    +    int i = 0;
    +    for (ProjectionDimension cols : carbonDimensions) {
    +      if (!mergedOrdinals.contains(cols.getDimension().getOrdinal())) {
    +        outputProjection.addDimension(cols.getDimension(), i++);
    +      }
    +    }
    +    List<ProjectionMeasure> carbonMeasures = projection.getMeasures();
    +    for (ProjectionMeasure cols : carbonMeasures) {
    +      outputProjection.addMeasure(cols.getMeasure(), i++);
    +    }
    +    return outputProjection;
    +  }
    +
    +  private List<Integer> mergeChildColumns(List<Integer> childOrdinals, Integer key) {
    --- End diff --
   
    remove unnecessary key


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #2396: [CARBONDATA-2606] [Complex DataType Enhanceme...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ravipesala commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2396#discussion_r197501384
 
    --- Diff: core/src/main/java/org/apache/carbondata/core/scan/model/QueryModelBuilder.java ---
    @@ -54,18 +58,134 @@ public QueryModelBuilder projectColumns(String[] projectionColumns) {
           } else {
             CarbonMeasure measure = table.getMeasureByName(factTableName, projectionColumnName);
             if (measure == null) {
    -          throw new RuntimeException(projectionColumnName +
    -              " column not found in the table " + factTableName);
    +          throw new RuntimeException(
    +              projectionColumnName + " column not found in the table " + factTableName);
             }
             projection.addMeasure(measure, i);
             i++;
           }
         }
    -
    +    optimizeProjectionForComplexColumns(projection);
         this.projection = projection;
         return this;
       }
     
    +  private void optimizeProjectionForComplexColumns(QueryProjection projection) {
    +    // Get the List of Complex Column Projection.
    +    // The optimization techniques which can be applied are
    +    // A. Merging in Driver Side
    +    // B. Merging in the result Collector side.
    +    // Merging is driver side cases are
    +    // Driver merging will eliminate one of the CarbonDimension.
    +    // Executor merging will merge the column output in Result Collector.
    +    // In this routine we are going to do driver merging and leave executor merging.
    +    Map<Integer, List<Integer>> complexColumnMap = new HashMap<>();
    +    List<ProjectionDimension> carbonDimensions = projection.getDimensions();
    +    for (ProjectionDimension cols : carbonDimensions) {
    +      // get all the Projections with Parent Ordinal Set.
    +      if (cols.getDimension().getParentOrdinal() != -1) {
    +        if (complexColumnMap.get(cols.getDimension().getParentOrdinal()) != null) {
    +          List<Integer> childColumns = complexColumnMap.get(cols.getDimension().getParentOrdinal());
    +          childColumns.add(cols.getDimension().getOrdinal());
    +          complexColumnMap.put(cols.getDimension().getParentOrdinal(), childColumns);
    +        } else {
    +          List<Integer> childColumns = new ArrayList<>();
    +          childColumns.add(cols.getDimension().getOrdinal());
    +          complexColumnMap.put(cols.getDimension().getParentOrdinal(), childColumns);
    +        }
    +      }
    +    }
    +
    +    // Traverse the Map to Find any columns are parent.
    +    for (Map.Entry<Integer, List<Integer>> entry : complexColumnMap.entrySet()) {
    +      List<Integer> childOrdinals = entry.getValue();
    +      if (childOrdinals.size() > 1) {
    +        // In case of more that one child, have to check if the child columns are in the same path
    +        // and have a common parent.
    +        Collections.sort(childOrdinals);
    +        List<Integer> mergedOrdinals = mergeChildColumns(childOrdinals, entry.getKey());
    +        if (mergedOrdinals.size() > 0) {
    +          projection = removeDimension(projection, mergedOrdinals);
    +        }
    +      }
    +    }
    +  }
    +
    +  private QueryProjection removeDimension(QueryProjection projection,
    +      List<Integer> mergedOrdinals) {
    +    List<ProjectionDimension> carbonDimensions = projection.getDimensions();
    +    QueryProjection outputProjection = new QueryProjection();
    +    int i = 0;
    +    for (ProjectionDimension cols : carbonDimensions) {
    +      if (!mergedOrdinals.contains(cols.getDimension().getOrdinal())) {
    +        outputProjection.addDimension(cols.getDimension(), i++);
    +      }
    +    }
    +    List<ProjectionMeasure> carbonMeasures = projection.getMeasures();
    +    for (ProjectionMeasure cols : carbonMeasures) {
    +      outputProjection.addMeasure(cols.getMeasure(), i++);
    +    }
    +    return outputProjection;
    +  }
    +
    +  private List<Integer> mergeChildColumns(List<Integer> childOrdinals, Integer key) {
    +    // Check If children if they are in the path of not.
    +    List<Integer> mergedChild = new ArrayList<>();
    +
    +    for (int i = 0; i < childOrdinals.size(); i++) {
    +      for (int j = i; j < childOrdinals.size(); j++) {
    +        if (!mergedChild.contains(childOrdinals.get(j)) && checkChildsInSamePath(
    +            childOrdinals.get(i), childOrdinals.get(j))) {
    +          mergedChild.add(j);
    +        }
    +      }
    +    }
    +    return mergedChild;
    +  }
    +
    +  private boolean checkChildsInSamePath(Integer parentOrdinal, Integer childOrdinal) {
    +    List<CarbonDimension> dimList = table.getDimensions();
    +    CarbonDimension parentDimension = getDimensionBasedOnOrdinal(dimList, parentOrdinal);
    +    CarbonDimension childDimension = getDimensionBasedOnOrdinal(dimList, childOrdinal);
    +    if (checkForChildColumns(parentDimension, childDimension)) {
    +      return true;
    +    } else {
    +      return false;
    +    }
    +  }
    +
    +  private boolean checkForChildColumns(CarbonDimension parentDimension,
    --- End diff --
   
    Better try to use dimension oridinal range to decide the childs to merge.


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #2396: [CARBONDATA-2606] [Complex DataType Enhanceme...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ravipesala commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2396#discussion_r197501751
 
    --- Diff: core/src/main/java/org/apache/carbondata/core/util/DataTypeUtil.java ---
    @@ -417,6 +417,44 @@ public static Object getDataDataTypeForNoDictionaryColumn(String dimensionValue,
         }
       }
     
    +  /**
    +   * Returns true for fixed length DataTypes.
    +   * @param dataType
    +   * @return
    +   */
    +  public static boolean isFixedSizeDataType(DataType dataType) {
    +    if (dataType == DataTypes.STRING || DataTypes.isDecimal(dataType)) {
    +      return false;
    +    } else {
    +      return true;
    +    }
    +  }
    +  /**
    +   * get the size of fixed size DataType
    +   * @param actualDataType
    +   * @return
    +   */
    +  public static int getSizeOfDataType(DataType actualDataType) {
    --- End diff --
   
    Get the size in bytes directly from DataType


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #2396: [CARBONDATA-2606] [Complex DataType Enhanceme...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ravipesala commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2396#discussion_r197502251
 
    --- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/CarbonDatasourceHadoopRelation.scala ---
    @@ -67,14 +68,68 @@ case class CarbonDatasourceHadoopRelation(
       override def schema: StructType = tableSchema.getOrElse(carbonRelation.schema)
     
       def buildScan(requiredColumns: Array[String],
    +      projects: Seq[NamedExpression],
           filters: Array[Filter],
           partitions: Seq[PartitionSpec]): RDD[InternalRow] = {
         val filterExpression: Option[Expression] = filters.flatMap { filter =>
           CarbonFilters.createCarbonFilter(schema, filter)
         }.reduceOption(new AndExpression(_, _))
     
    +    var parentColumns = new ListBuffer[String]
    +    // In case of Struct or StructofStruct Complex type, get the project column for given
    +    // parent/child field and pushdown the corresponding project column. In case of Array,
    +    // ArrayofStruct or StructofArray, pushdown parent column
    +    var reqColumns = projects.map {
    +      case a@Alias(s: GetStructField, name) =>
    +        val arrayTypeExists = s.childSchema.map(x => x.dataType)
    +          .filter(dataType => dataType.isInstanceOf[ArrayType])
    +        if (0 == arrayTypeExists.length) {
    +          val columnName = s.toString().replaceAll("#[0-9]*", "")
    +          parentColumns += columnName.split("\\.")(0)
    +          columnName
    +        }
    +        else {
    +          None
    +        }
    +      case a@Alias(s: GetArrayItem, name) =>
    +        None
    +      case other => other.name.replaceAll("#[0-9]*", "")
    +    }
    +
    +    var reqCols = reqColumns.filterNot(none => none.equals(None)).map(col => col.toString)
    +    parentColumns = parentColumns.distinct
    +    reqCols = reqCols.distinct
    +
    +    // if the parent column is there in the projection list then we can filter out all the children
    +    // in that projection list
    +    val parentColumnOnProjectionList = reqCols.filter(col => parentColumns.contains(col))
    --- End diff --
   
    Remove merge logic here


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #2396: [CARBONDATA-2606] [Complex DataType Enhanceme...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ravipesala commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2396#discussion_r197502370
 
    --- Diff: core/src/main/java/org/apache/carbondata/core/scan/complextypes/ArrayQueryType.java ---
    @@ -97,4 +99,56 @@ public void parseBlocksAndReturnComplexColumnByteArray(DimensionRawColumnChunk[]
         return DataTypeUtil.getDataTypeConverter().wrapWithGenericArrayData(data);
       }
     
    +  @Override public Object getDataBasedOnColumn(ByteBuffer dataBuffer, CarbonDimension parent,
    --- End diff --
   
    Don't implement it, throw unsupported exception


---
123456