[GitHub] carbondata pull request #1632: [CARBONDATA-1839] [DataLoad]Fix bugs in compr...


[GitHub] carbondata issue #1632: [CARBONDATA-1839] [DataLoad]Fix bugs in compressing ...

qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1632
 
    Build Success with Spark 2.2.0, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/686/



---

[GitHub] carbondata issue #1632: [CARBONDATA-1839] [DataLoad]Fix bugs in compressing ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ravipesala commented on the issue:

    https://github.com/apache/carbondata/pull/1632
 
    SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/2248/



---

[GitHub] carbondata issue #1632: [CARBONDATA-1839] [DataLoad]Fix bugs in compressing ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user xuchuanyin commented on the issue:

    https://github.com/apache/carbondata/pull/1632
 
    retest this please


---

[GitHub] carbondata issue #1632: [CARBONDATA-1839] [DataLoad]Fix bugs in compressing ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1632
 
    Build Failed with Spark 2.2.0, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/694/



---

[GitHub] carbondata issue #1632: [CARBONDATA-1839] [DataLoad]Fix bugs in compressing ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1632
 
    Build Failed  with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/1923/



---

[GitHub] carbondata issue #1632: [CARBONDATA-1839] [DataLoad]Fix bugs in compressing ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user xuchuanyin commented on the issue:

    https://github.com/apache/carbondata/pull/1632
 
    retest this please


---

[GitHub] carbondata issue #1632: [CARBONDATA-1839] [DataLoad]Fix bugs in compressing ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1632
 
    Build Success with Spark 2.2.0, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/704/



---

[GitHub] carbondata issue #1632: [CARBONDATA-1839] [DataLoad]Fix bugs in compressing ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1632
 
    Build Failed  with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/1933/



---

[GitHub] carbondata issue #1632: [CARBONDATA-1839] [DataLoad]Fix bugs in compressing ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user xuchuanyin commented on the issue:

    https://github.com/apache/carbondata/pull/1632
 
    retest this please


---

[GitHub] carbondata issue #1632: [CARBONDATA-1839] [DataLoad]Fix bugs in compressing ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1632
 
    Build Success with Spark 2.2.0, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/721/



---

[GitHub] carbondata issue #1632: [CARBONDATA-1839] [DataLoad]Fix bugs in compressing ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1632
 
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/1949/



---

[GitHub] carbondata pull request #1632: [CARBONDATA-1839] [DataLoad]Fix bugs in compr...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user manishgupta88 commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1632#discussion_r156919460
 
    --- Diff: core/src/main/java/org/apache/carbondata/core/util/NonDictionaryUtil.java ---
    @@ -108,60 +105,21 @@ public static Object getMeasure(int index, Object[] row) {
         return measures[index];
       }
     
    -  public static byte[] getByteArrayForNoDictionaryCols(Object[] row) {
    -
    -    return (byte[]) row[WriteStepRowUtil.NO_DICTIONARY_AND_COMPLEX];
    +  /**
    +   * Method to get the required non-dictionary & complex from 3-parted row
    +   * @param index
    +   * @param row
    +   * @return
    +   */
    +  public static byte[] getNonDictOrComplex(int index, Object[] row) {
    --- End diff --
   
    Rename the method to getNoDictOrComplex


---

[GitHub] carbondata pull request #1632: [CARBONDATA-1839] [DataLoad]Fix bugs in compr...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user manishgupta88 commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1632#discussion_r156954293
 
    --- Diff: integration/spark-common/src/main/scala/org/apache/carbondata/spark/load/DataLoadProcessBuilderOnSpark.scala ---
    @@ -121,17 +121,18 @@ object DataLoadProcessBuilderOnSpark {
             CarbonProperties.getInstance().getGlobalSortRddStorageLevel()))
         }
     
    +    val sortStepRowConverter: SortStepRowHandler = new SortStepRowHandler(sortParameters)
         import scala.reflect.classTag
    +
    +    // 3. sort
         val sortRDD = convertRDD
    -      .sortBy(_.getData, numPartitions = numPartitions)(RowOrdering, classTag[Array[AnyRef]])
    -      .mapPartitionsWithIndex { case (index, rows) =>
    -        DataLoadProcessorStepOnSpark.convertTo3Parts(rows, index, modelBroadcast,
    -          sortStepRowCounter)
    -      }
    +      .map(r => DataLoadProcessorStepOnSpark.convertTo3Parts(r, TaskContext.getPartitionId(),
    +        modelBroadcast, sortStepRowConverter, sortStepRowCounter))
    +      .sortBy(r => r.getData, numPartitions = numPartitions)(RowOrdering, classTag[Array[AnyRef]])
    --- End diff --
   
    @xuchuanyin ...
    This PR is about compressing sort temp files, but this change is in the global-sort data load flow, which does not create sort temp files. Can you please clarify?


---

[GitHub] carbondata pull request #1632: [CARBONDATA-1839] [DataLoad]Fix bugs in compr...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user xuchuanyin commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1632#discussion_r157109850
 
    --- Diff: core/src/main/java/org/apache/carbondata/core/util/NonDictionaryUtil.java ---
    @@ -108,60 +105,21 @@ public static Object getMeasure(int index, Object[] row) {
         return measures[index];
       }
     
    -  public static byte[] getByteArrayForNoDictionaryCols(Object[] row) {
    -
    -    return (byte[]) row[WriteStepRowUtil.NO_DICTIONARY_AND_COMPLEX];
    +  /**
    +   * Method to get the required non-dictionary & complex from 3-parted row
    +   * @param index
    +   * @param row
    +   * @return
    +   */
    +  public static byte[] getNonDictOrComplex(int index, Object[] row) {
    --- End diff --
   
    OK~


---

[GitHub] carbondata pull request #1632: [CARBONDATA-1839] [DataLoad]Fix bugs in compr...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user xuchuanyin commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1632#discussion_r157112148
 
    --- Diff: integration/spark-common/src/main/scala/org/apache/carbondata/spark/load/DataLoadProcessBuilderOnSpark.scala ---
    @@ -121,17 +121,18 @@ object DataLoadProcessBuilderOnSpark {
             CarbonProperties.getInstance().getGlobalSortRddStorageLevel()))
         }
     
    +    val sortStepRowConverter: SortStepRowHandler = new SortStepRowHandler(sortParameters)
         import scala.reflect.classTag
    +
    +    // 3. sort
         val sortRDD = convertRDD
    -      .sortBy(_.getData, numPartitions = numPartitions)(RowOrdering, classTag[Array[AnyRef]])
    -      .mapPartitionsWithIndex { case (index, rows) =>
    -        DataLoadProcessorStepOnSpark.convertTo3Parts(rows, index, modelBroadcast,
    -          sortStepRowCounter)
    -      }
    +      .map(r => DataLoadProcessorStepOnSpark.convertTo3Parts(r, TaskContext.getPartitionId(),
    +        modelBroadcast, sortStepRowConverter, sortStepRowCounter))
    +      .sortBy(r => r.getData, numPartitions = numPartitions)(RowOrdering, classTag[Array[AnyRef]])
    --- End diff --
   
    This change is not related to sort temp files. I changed it because the interface and the internal load procedure have changed.
   
    After `convertRDD`, each row is still a raw row. In the sort phase, rows are converted to 3-parts rows; in the write phase, rows are encoded and written.
   
    In the previous implementation, CarbonData sorted the raw rows and then converted them to 3-parts rows in batches.
   
    In the current implementation, CarbonData first converts each row to a 3-parts row and then sorts the converted rows.
   
    The interface for converting a raw row to a 3-parts row (DataLoadProcessorStepOnSpark.convertTo3Parts) has changed accordingly: previously it handled a batch of rows, now it handles one row at a time.
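
The two orderings described above can be illustrated with a minimal Java sketch. Everything in it is a hypothetical stand-in and not CarbonData code: the `ThreePartsRow` class, the toy raw-row layout `{Integer, String, Double}`, and this simplified `convertTo3Parts` are assumptions made only to show the difference between "sort, then convert in batch" and "convert each row, then sort".

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class LoadOrderSketch {

    /** Hypothetical 3-parts row: dictionary dims, no-dictionary/complex bytes, measures. */
    static final class ThreePartsRow {
        final int[] dictDims;
        final byte[][] noDictAndComplex;
        final Object[] measures;

        ThreePartsRow(int[] dictDims, byte[][] noDictAndComplex, Object[] measures) {
            this.dictDims = dictDims;
            this.noDictAndComplex = noDictAndComplex;
            this.measures = measures;
        }
    }

    /** Toy per-row conversion; a raw row here is assumed to be {Integer key, String value, Double measure}. */
    static ThreePartsRow convertTo3Parts(Object[] raw) {
        return new ThreePartsRow(
            new int[] {(Integer) raw[0]},
            new byte[][] {((String) raw[1]).getBytes(StandardCharsets.UTF_8)},
            new Object[] {raw[2]});
    }

    /** Previous flow: sort the raw rows, then convert the sorted rows as a batch. */
    static List<ThreePartsRow> sortThenConvert(List<Object[]> rawRows) {
        List<Object[]> sorted = new ArrayList<>(rawRows);
        sorted.sort(Comparator.comparingInt((Object[] r) -> (Integer) r[0]));
        List<ThreePartsRow> out = new ArrayList<>();
        for (Object[] r : sorted) {
            out.add(convertTo3Parts(r));
        }
        return out;
    }

    /** Current flow: convert one row at a time, then sort the 3-parts rows. */
    static List<ThreePartsRow> convertThenSort(List<Object[]> rawRows) {
        List<ThreePartsRow> converted = new ArrayList<>();
        for (Object[] r : rawRows) {
            converted.add(convertTo3Parts(r));
        }
        converted.sort(Comparator.comparingInt((ThreePartsRow r) -> r.dictDims[0]));
        return converted;
    }

    public static void main(String[] args) {
        List<Object[]> raw = new ArrayList<>();
        raw.add(new Object[] {3, "c", 3.0});
        raw.add(new Object[] {1, "a", 1.0});
        raw.add(new Object[] {2, "b", 2.0});
        for (ThreePartsRow r : convertThenSort(raw)) {
            System.out.println(r.dictDims[0] + " "
                + new String(r.noDictAndComplex[0], StandardCharsets.UTF_8));
        }
    }
}
```

In this toy both flows yield the same row order; what differs is that the sort in the second flow operates on rows that are already in 3-parts form, which is why the conversion interface has to accept a single row.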


---

[GitHub] carbondata issue #1632: [CARBONDATA-1839] [DataLoad]Fix bugs in compressing ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user xuchuanyin commented on the issue:

    https://github.com/apache/carbondata/pull/1632
 
    @manishgupta88 review comments are resolved


---

[GitHub] carbondata issue #1632: [CARBONDATA-1839] [DataLoad]Fix bugs in compressing ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1632
 
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/1984/



---

[GitHub] carbondata issue #1632: [CARBONDATA-1839] [DataLoad]Fix bugs in compressing ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1632
 
    Build Success with Spark 2.2.0, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/754/



---

[GitHub] carbondata issue #1632: [CARBONDATA-1839] [DataLoad]Fix bugs in compressing ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ravipesala commented on the issue:

    https://github.com/apache/carbondata/pull/1632
 
    SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/2304/



---

[GitHub] carbondata issue #1632: [CARBONDATA-1839] [DataLoad]Fix bugs in compressing ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user xuchuanyin commented on the issue:

    https://github.com/apache/carbondata/pull/1632
 
    @manishgupta88 @jackylk Hi, what do you think about this PR? I raised a discussion about it and would prefer a different approach.
   
    Please refer to this: http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/Discussion-Compression-for-sort-temp-files-in-Carbomdata-td31747.html
   
    Or refer to this: https://issues.apache.org/jira/browse/CARBONDATA-1839


---