[GitHub] incubator-carbondata pull request #635: [WIP]support SORT_COLUMNS

classic Classic list List threaded Threaded
56 messages Options
123
Reply | Threaded
Open this post in threaded view
|

[GitHub] incubator-carbondata pull request #635: [CARBONDATA-782]support SORT_COLUMNS

qiuchenjian-2
Github user ravipesala commented on a diff in the pull request:

    https://github.com/apache/incubator-carbondata/pull/635#discussion_r106803325
 
    --- Diff: core/src/main/java/org/apache/carbondata/core/util/DataTypeUtil.java ---
    @@ -320,6 +320,80 @@ public static Object getDataBasedOnDataType(String data, DataType actualDataType
     
       }
     
    +  public static byte[] getBytesBasedOnDataTypeForNoDictionaryColumn(String dimensionValue,
    +      DataType actualDataType) throws Throwable {
    +    if (DataType.STRING.equals(actualDataType)) {
    --- End diff --
   
    Can you move this to switch case. Is there any reason it can't be moved to switch case?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [hidden email] or file a JIRA ticket
with INFRA.
---
Reply | Threaded
Open this post in threaded view
|

[GitHub] incubator-carbondata pull request #635: [CARBONDATA-782]support SORT_COLUMNS

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ravipesala commented on a diff in the pull request:

    https://github.com/apache/incubator-carbondata/pull/635#discussion_r106803356
 
    --- Diff: processing/src/main/java/org/apache/carbondata/processing/newflow/converter/impl/NonDictionaryFieldConverterImpl.java ---
    @@ -48,23 +46,24 @@ public NonDictionaryFieldConverterImpl(DataField dataField, String nullformat, i
         this.isEmptyBadRecord = isEmptyBadRecord;
       }
     
    -  @Override
    -  public void convert(CarbonRow row, BadRecordLogHolder logHolder) {
    +  @Override public void convert(CarbonRow row, BadRecordLogHolder logHolder) {
         String dimensionValue = row.getString(index);
         if (dimensionValue == null || dimensionValue.equals(nullformat)) {
    -      dimensionValue = CarbonCommonConstants.MEMBER_DEFAULT_VAL;
    -    }
    -    if (dataType != DataType.STRING) {
    -      if (null == DataTypeUtil.normalizeIntAndLongValues(dimensionValue, dataType)) {
    -        if ((dimensionValue.length() > 0) || (dimensionValue.length() == 0 && isEmptyBadRecord)) {
    +      row.update(CarbonCommonConstants.MEMBER_DEFAULT_VAL_ARRAY, index);
    +    } else {
    +      try {
    +        row.update(DataTypeUtil
    +            .getBytesBasedOnDataTypeForNoDictionaryColumn(dimensionValue, dataType), index);
    +      } catch (Throwable ex) {
    +        if (dimensionValue.length() != 0 || isEmptyBadRecord) {
    --- End diff --
   
    I think you are missing the check `dimensionValue.length() == 0` in case of `isEmptyBadRecord`


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [hidden email] or file a JIRA ticket
with INFRA.
---
Reply | Threaded
Open this post in threaded view
|

[GitHub] incubator-carbondata pull request #635: [CARBONDATA-782]support SORT_COLUMNS

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ravipesala commented on a diff in the pull request:

    https://github.com/apache/incubator-carbondata/pull/635#discussion_r106803680
 
    --- Diff: processing/src/main/java/org/apache/carbondata/processing/sortandgroupby/sortdata/SortDataRows.java ---
    @@ -193,20 +193,42 @@ public void startSorting() throws CarbonSortKeyAndGroupByException {
           toSort = new Object[entryCount][];
           System.arraycopy(recordHolderList, 0, toSort, 0, entryCount);
     
    -      if (parameters.isUseKettle()) {
    -        if (parameters.getNoDictionaryCount() > 0) {
    -          Arrays.sort(toSort, new RowComparator(parameters.getNoDictionaryDimnesionColumn(),
    -              parameters.getNoDictionaryCount()));
    +      if (parameters.getNumberOfSortColumns() == 0) {
    --- End diff --
   
    If sort columns are none, then you are not supposed to sort it right? And also there are lot of checks it supposed to be simplified.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [hidden email] or file a JIRA ticket
with INFRA.
---
Reply | Threaded
Open this post in threaded view
|

[GitHub] incubator-carbondata pull request #635: [CARBONDATA-782]support SORT_COLUMNS

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ravipesala commented on a diff in the pull request:

    https://github.com/apache/incubator-carbondata/pull/635#discussion_r106803684
 
    --- Diff: processing/src/main/java/org/apache/carbondata/processing/sortandgroupby/sortdata/SortDataRows.java ---
    @@ -193,20 +193,42 @@ public void startSorting() throws CarbonSortKeyAndGroupByException {
           toSort = new Object[entryCount][];
           System.arraycopy(recordHolderList, 0, toSort, 0, entryCount);
     
    -      if (parameters.isUseKettle()) {
    -        if (parameters.getNoDictionaryCount() > 0) {
    -          Arrays.sort(toSort, new RowComparator(parameters.getNoDictionaryDimnesionColumn(),
    -              parameters.getNoDictionaryCount()));
    +      if (parameters.getNumberOfSortColumns() == 0) {
    --- End diff --
   
    Add some comments about the logic.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [hidden email] or file a JIRA ticket
with INFRA.
---
Reply | Threaded
Open this post in threaded view
|

[GitHub] incubator-carbondata pull request #635: [CARBONDATA-782]support SORT_COLUMNS

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ravipesala commented on a diff in the pull request:

    https://github.com/apache/incubator-carbondata/pull/635#discussion_r106803693
 
    --- Diff: processing/src/main/java/org/apache/carbondata/processing/sortandgroupby/sortdata/SortDataRows.java ---
    @@ -460,22 +482,46 @@ public DataSorterAndWriter(Object[][] recordHolderArray) {
         @Override public Void call() throws Exception {
           try {
             long startTime = System.currentTimeMillis();
    -        if (parameters.isUseKettle()) {
    -          if (parameters.getNoDictionaryCount() > 0) {
    -            Arrays.sort(recordHolderArray,
    -                new RowComparator(parameters.getNoDictionaryDimnesionColumn(),
    -                    parameters.getNoDictionaryCount()));
    +        if (parameters.getNumberOfSortColumns() == 0) {
    --- End diff --
   
    You can move this to one common method.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [hidden email] or file a JIRA ticket
with INFRA.
---
Reply | Threaded
Open this post in threaded view
|

[GitHub] incubator-carbondata pull request #635: [CARBONDATA-782]support SORT_COLUMNS

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user QiangCai commented on a diff in the pull request:

    https://github.com/apache/incubator-carbondata/pull/635#discussion_r106844171
 
    --- Diff: core/src/main/java/org/apache/carbondata/core/util/DataTypeUtil.java ---
    @@ -320,6 +320,80 @@ public static Object getDataBasedOnDataType(String data, DataType actualDataType
     
       }
     
    +  public static byte[] getBytesBasedOnDataTypeForNoDictionaryColumn(String dimensionValue,
    +      DataType actualDataType) throws Throwable {
    +    if (DataType.STRING.equals(actualDataType)) {
    --- End diff --
   
    ok


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [hidden email] or file a JIRA ticket
with INFRA.
---
Reply | Threaded
Open this post in threaded view
|

[GitHub] incubator-carbondata pull request #635: [CARBONDATA-782]support SORT_COLUMNS

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user QiangCai commented on a diff in the pull request:

    https://github.com/apache/incubator-carbondata/pull/635#discussion_r106844168
 
    --- Diff: core/src/main/java/org/apache/carbondata/core/metadata/schema/table/CarbonTable.java ---
    @@ -153,6 +164,21 @@ public void loadCarbonTable(TableInfo tableInfo) {
             tableInfo.getFactTable().getBucketingInfo());
       }
     
    +  private void parseSortColumns(TableSchema tableSchema) {
    +    Map<String, String> tableProperties = tableSchema.getTableProperties();
    +    if (tableProperties != null) {
    +      String sortColumnsString = tableProperties.get(CarbonCommonConstants.SORT_COLUMNS);
    +      if (sortColumnsString != null) {
    +        numberOfSortColumns = sortColumnsString.split(",").length;
    +        for (int i = 0; i < numberOfSortColumns; i++) {
    +          if (!tableSchema.getListOfColumns().get(i).hasEncoding(Encoding.DICTIONARY)) {
    +            numberOfNoDictSortColumns++;
    --- End diff --
   
    during table creation,  moved sort columns to the begin of table schema by the order of sort columns.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [hidden email] or file a JIRA ticket
with INFRA.
---
Reply | Threaded
Open this post in threaded view
|

[GitHub] incubator-carbondata pull request #635: [CARBONDATA-782]support SORT_COLUMNS

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user QiangCai commented on a diff in the pull request:

    https://github.com/apache/incubator-carbondata/pull/635#discussion_r106844177
 
    --- Diff: processing/src/main/java/org/apache/carbondata/processing/newflow/converter/impl/NonDictionaryFieldConverterImpl.java ---
    @@ -48,23 +46,24 @@ public NonDictionaryFieldConverterImpl(DataField dataField, String nullformat, i
         this.isEmptyBadRecord = isEmptyBadRecord;
       }
     
    -  @Override
    -  public void convert(CarbonRow row, BadRecordLogHolder logHolder) {
    +  @Override public void convert(CarbonRow row, BadRecordLogHolder logHolder) {
         String dimensionValue = row.getString(index);
         if (dimensionValue == null || dimensionValue.equals(nullformat)) {
    -      dimensionValue = CarbonCommonConstants.MEMBER_DEFAULT_VAL;
    -    }
    -    if (dataType != DataType.STRING) {
    -      if (null == DataTypeUtil.normalizeIntAndLongValues(dimensionValue, dataType)) {
    -        if ((dimensionValue.length() > 0) || (dimensionValue.length() == 0 && isEmptyBadRecord)) {
    +      row.update(CarbonCommonConstants.MEMBER_DEFAULT_VAL_ARRAY, index);
    +    } else {
    +      try {
    +        row.update(DataTypeUtil
    +            .getBytesBasedOnDataTypeForNoDictionaryColumn(dimensionValue, dataType), index);
    +      } catch (Throwable ex) {
    +        if (dimensionValue.length() != 0 || isEmptyBadRecord) {
    --- End diff --
   
    the same with
    dimensionValue.length() != 0 || (dimensionValue.length() == 0 && isEmptyBadRecord)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [hidden email] or file a JIRA ticket
with INFRA.
---
Reply | Threaded
Open this post in threaded view
|

[GitHub] incubator-carbondata pull request #635: [CARBONDATA-782]support SORT_COLUMNS

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user QiangCai commented on a diff in the pull request:

    https://github.com/apache/incubator-carbondata/pull/635#discussion_r106844183
 
    --- Diff: processing/src/main/java/org/apache/carbondata/processing/sortandgroupby/sortdata/SortDataRows.java ---
    @@ -193,20 +193,42 @@ public void startSorting() throws CarbonSortKeyAndGroupByException {
           toSort = new Object[entryCount][];
           System.arraycopy(recordHolderList, 0, toSort, 0, entryCount);
     
    -      if (parameters.isUseKettle()) {
    -        if (parameters.getNoDictionaryCount() > 0) {
    -          Arrays.sort(toSort, new RowComparator(parameters.getNoDictionaryDimnesionColumn(),
    -              parameters.getNoDictionaryCount()));
    +      if (parameters.getNumberOfSortColumns() == 0) {
    --- End diff --
   
    ok


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [hidden email] or file a JIRA ticket
with INFRA.
---
Reply | Threaded
Open this post in threaded view
|

[GitHub] incubator-carbondata pull request #635: [CARBONDATA-782]support SORT_COLUMNS

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user QiangCai commented on a diff in the pull request:

    https://github.com/apache/incubator-carbondata/pull/635#discussion_r106844180
 
    --- Diff: processing/src/main/java/org/apache/carbondata/processing/sortandgroupby/sortdata/SortDataRows.java ---
    @@ -193,20 +193,42 @@ public void startSorting() throws CarbonSortKeyAndGroupByException {
           toSort = new Object[entryCount][];
           System.arraycopy(recordHolderList, 0, toSort, 0, entryCount);
     
    -      if (parameters.isUseKettle()) {
    -        if (parameters.getNoDictionaryCount() > 0) {
    -          Arrays.sort(toSort, new RowComparator(parameters.getNoDictionaryDimnesionColumn(),
    -              parameters.getNoDictionaryCount()));
    +      if (parameters.getNumberOfSortColumns() == 0) {
    --- End diff --
   
    If sort columns are none, will sort by all dimension, just keep old code.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [hidden email] or file a JIRA ticket
with INFRA.
---
Reply | Threaded
Open this post in threaded view
|

[GitHub] incubator-carbondata pull request #635: [CARBONDATA-782]support SORT_COLUMNS

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user QiangCai commented on a diff in the pull request:

    https://github.com/apache/incubator-carbondata/pull/635#discussion_r106844188
 
    --- Diff: processing/src/main/java/org/apache/carbondata/processing/sortandgroupby/sortdata/SortDataRows.java ---
    @@ -460,22 +482,46 @@ public DataSorterAndWriter(Object[][] recordHolderArray) {
         @Override public Void call() throws Exception {
           try {
             long startTime = System.currentTimeMillis();
    -        if (parameters.isUseKettle()) {
    -          if (parameters.getNoDictionaryCount() > 0) {
    -            Arrays.sort(recordHolderArray,
    -                new RowComparator(parameters.getNoDictionaryDimnesionColumn(),
    -                    parameters.getNoDictionaryCount()));
    +        if (parameters.getNumberOfSortColumns() == 0) {
    --- End diff --
   
    ok


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [hidden email] or file a JIRA ticket
with INFRA.
---
Reply | Threaded
Open this post in threaded view
|

[GitHub] incubator-carbondata issue #635: [CARBONDATA-782]support SORT_COLUMNS

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/incubator-carbondata/pull/635
 
    Build Success with Spark 1.6.2, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/1232/



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [hidden email] or file a JIRA ticket
with INFRA.
---
Reply | Threaded
Open this post in threaded view
|

[GitHub] incubator-carbondata issue #635: [CARBONDATA-782]support SORT_COLUMNS

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/incubator-carbondata/pull/635
 
    Build Failed  with Spark 1.6.2, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/1235/



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [hidden email] or file a JIRA ticket
with INFRA.
---
Reply | Threaded
Open this post in threaded view
|

[GitHub] incubator-carbondata pull request #635: [CARBONDATA-782]support SORT_COLUMNS

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ravipesala commented on a diff in the pull request:

    https://github.com/apache/incubator-carbondata/pull/635#discussion_r106908141
 
    --- Diff: processing/src/main/java/org/apache/carbondata/processing/sortandgroupby/sortdata/SortDataRows.java ---
    @@ -193,20 +193,42 @@ public void startSorting() throws CarbonSortKeyAndGroupByException {
           toSort = new Object[entryCount][];
           System.arraycopy(recordHolderList, 0, toSort, 0, entryCount);
     
    -      if (parameters.isUseKettle()) {
    -        if (parameters.getNoDictionaryCount() > 0) {
    -          Arrays.sort(toSort, new RowComparator(parameters.getNoDictionaryDimnesionColumn(),
    -              parameters.getNoDictionaryCount()));
    +      if (parameters.getNumberOfSortColumns() == 0) {
    --- End diff --
   
    I feel, we should not use the old flow.  If user don't specify sort_columns then we should add all dimensions to sort_columns. In this way in our core/processing always depends upon sort_columns, so that code would be clear.  We just always depends on the sort_columns in our core code, please refactor the code in that way.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [hidden email] or file a JIRA ticket
with INFRA.
---
Reply | Threaded
Open this post in threaded view
|

[GitHub] incubator-carbondata issue #635: [CARBONDATA-782]support SORT_COLUMNS

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/incubator-carbondata/pull/635
 
    Build Success with Spark 1.6.2, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/1243/



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [hidden email] or file a JIRA ticket
with INFRA.
---
Reply | Threaded
Open this post in threaded view
|

[GitHub] incubator-carbondata issue #635: [CARBONDATA-782]support SORT_COLUMNS

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ravipesala commented on the issue:

    https://github.com/apache/incubator-carbondata/pull/635
 
    I feel, we should not use the old flow. If user don't specify sort_columns then we should add all dimensions to sort_columns. In this way in our core/processing always depends upon sort_columns, so that code would be clear. We just always depends on the sort_columns in our core code, please refactor the code in that way.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [hidden email] or file a JIRA ticket
with INFRA.
---
Reply | Threaded
Open this post in threaded view
|

[GitHub] incubator-carbondata issue #635: [CARBONDATA-782]support SORT_COLUMNS

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ravipesala commented on the issue:

    https://github.com/apache/incubator-carbondata/pull/635
 
    retest this please


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [hidden email] or file a JIRA ticket
with INFRA.
---
Reply | Threaded
Open this post in threaded view
|

[GitHub] incubator-carbondata issue #635: [CARBONDATA-782]support SORT_COLUMNS

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/incubator-carbondata/pull/635
 
    Build Success with Spark 1.6.2, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/1276/



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [hidden email] or file a JIRA ticket
with INFRA.
---
Reply | Threaded
Open this post in threaded view
|

[GitHub] incubator-carbondata issue #635: [CARBONDATA-782]support SORT_COLUMNS

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/incubator-carbondata/pull/635
 
    Build Failed  with Spark 1.6.2, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/1346/



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [hidden email] or file a JIRA ticket
with INFRA.
---
Reply | Threaded
Open this post in threaded view
|

[GitHub] incubator-carbondata issue #635: [CARBONDATA-782]support SORT_COLUMNS

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/incubator-carbondata/pull/635
 
    Build Success with Spark 1.6.2, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/1347/



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [hidden email] or file a JIRA ticket
with INFRA.
---
123