[GitHub] carbondata pull request #1716: [CARBONDATA-1933] Support Spark 2.2.1 in carb...

GitHub user ravipesala opened a pull request:

    https://github.com/apache/carbondata/pull/1716

    [CARBONDATA-1933] Support Spark 2.2.1 in carbon partition tables

   
    Be sure to complete all of the following checklist items to help us incorporate
    your contribution quickly and easily:
   
     - [ ] Any interfaces changed?
     
     - [ ] Any backward compatibility impacted?
     
     - [ ] Document update required?
   
     - [ ] Testing done
            Please provide details on
            - Whether new unit test cases have been added or why no new tests are required?
            - How it is tested? Please attach test report.
            - Is it a performance related change? Please attach the performance test report.
            - Any additional information to help reviewers in testing this change.
           
     - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.
   


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/ravipesala/incubator-carbondata partition-load-issue

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/carbondata/pull/1716.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1716
   
----
commit af98a1d0fd9cb4db3aa633fce7a9e76f45779276
Author: ravipesala <ravi.pesala@...>
Date:   2017-12-22T09:17:53Z

    Support 2.2.1 spark

----


---
Github user gvramana commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1716#discussion_r158468813
 
    --- Diff: integration/spark-common/src/main/scala/org/apache/carbondata/spark/util/CarbonScalaUtil.scala ---
    @@ -171,11 +173,23 @@ object CarbonScalaUtil {
               UTF8String.fromString(
                 DateTimeUtils.dateToString(
                   (dateFormat.parse(value).getTime / DateTimeUtils.MILLIS_PER_DAY).toInt))
    +        case ShortType => UTF8String.fromString(value.toShort.toString)
    --- End diff --
   
    Add ignore case here


---
Github user ravipesala commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1716#discussion_r158479090
 
    --- Diff: integration/spark-common/src/main/scala/org/apache/carbondata/spark/util/CarbonScalaUtil.scala ---
    @@ -171,11 +173,23 @@ object CarbonScalaUtil {
               UTF8String.fromString(
                 DateTimeUtils.dateToString(
                   (dateFormat.parse(value).getTime / DateTimeUtils.MILLIS_PER_DAY).toInt))
    +        case ShortType => UTF8String.fromString(value.toShort.toString)
    --- End diff --
   
    ok


---
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1716
 
    Build Success with Spark 2.2.0, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/1042/



---
Github user KanakaKumar commented on the issue:

    https://github.com/apache/carbondata/pull/1716
 
    retest this please


---
Github user ravipesala commented on the issue:

    https://github.com/apache/carbondata/pull/1716
 
    retest this please


---
Github user ravipesala commented on the issue:

    https://github.com/apache/carbondata/pull/1716
 
    retest sdv please


---
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1716
 
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/2277/



---
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1716
 
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/2279/



---
Github user ravipesala commented on the issue:

    https://github.com/apache/carbondata/pull/1716
 
    SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/2525/



---
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1716
 
    Build Success with Spark 2.2.0, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/1056/



---
Github user jackylk commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1716#discussion_r158574044
 
    --- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/management/CarbonLoadDataCommand.scala ---
    @@ -564,55 +572,49 @@ case class CarbonLoadDataCommand(
                 val data = new Array[Any](len)
                 var i = 0
                 val input = value.get()
    -            while (i < input.length) {
    -              // TODO find a way to avoid double conversion of date and time.
    -              data(i) = CarbonScalaUtil.convertToUTF8String(
    -                input(i),
    -                rowDataTypes(i),
    -                timeStampFormat,
    -                dateFormat,
    -                serializationNullFormat)
    -              i = i + 1
    +            val inputLen = Math.min(input.length, len)
    +            try {
    +              while (i < inputLen) {
    +                // TODO find a way to avoid double conversion of date and time.
    +                data(i) = CarbonScalaUtil.convertToUTF8String(
    +                  input(i),
    +                  rowDataTypes(i),
    +                  timeStampFormat,
    +                  dateFormat,
    +                  serializationNullFormat,
    +                  failAction,
    +                  ignoreAction)
    +                i = i + 1
    +              }
    +              InternalRow.fromSeq(data)
    +            } catch {
    +              case e: BadRecordFoundException => throw e
    +              case e: Exception => InternalRow.empty // It is bad record ignore case
                 }
    -            InternalRow.fromSeq(data)
    -        }
    +
    +        }.filter(f => f.numFields != 0) // In bad record ignore case filter the empty values
    --- End diff --
   
     I think it is better to filter it inside the map function at line 571; that can avoid creating a big InternalRow with all columns
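
    For illustration only, here is a minimal, self-contained sketch of the map-then-filter pattern being discussed and a flatMap alternative that avoids materializing empty rows; the `Row` type and `convert` function are hypothetical stand-ins, not the actual CarbonData code:

    ```scala
    // Hypothetical stand-ins for InternalRow and the per-column conversion.
    final case class Row(fields: Seq[Int]) { def numFields: Int = fields.length }

    def convert(value: String): Int = value.toInt // throws on a bad record

    // map + filter: every input allocates a Row; empty ones are dropped later.
    def mapThenFilter(input: Seq[Seq[String]]): Seq[Row] =
      input.map { vals =>
        try Row(vals.map(convert))
        catch { case _: Exception => Row(Seq.empty) } // bad record, ignore case
      }.filter(_.numFields != 0)

    // flatMap: a bad record never produces a Row at all.
    def viaFlatMap(input: Seq[Seq[String]]): Seq[Row] =
      input.flatMap { vals =>
        try Some(Row(vals.map(convert)))
        catch { case _: Exception => None } // bad record, ignore case
      }
    ```

    Both forms return the same rows; the flatMap form simply skips allocating anything for a bad record, which seems to be the spirit of the suggestion above.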


---
Github user jackylk commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1716#discussion_r158574052
 
    --- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/management/CarbonLoadDataCommand.scala ---
    @@ -564,55 +572,49 @@ case class CarbonLoadDataCommand(
                 val data = new Array[Any](len)
                 var i = 0
                 val input = value.get()
    -            while (i < input.length) {
    -              // TODO find a way to avoid double conversion of date and time.
    -              data(i) = CarbonScalaUtil.convertToUTF8String(
    -                input(i),
    -                rowDataTypes(i),
    -                timeStampFormat,
    -                dateFormat,
    -                serializationNullFormat)
    -              i = i + 1
    +            val inputLen = Math.min(input.length, len)
    +            try {
    +              while (i < inputLen) {
    +                // TODO find a way to avoid double conversion of date and time.
    +                data(i) = CarbonScalaUtil.convertToUTF8String(
    +                  input(i),
    +                  rowDataTypes(i),
    +                  timeStampFormat,
    +                  dateFormat,
    +                  serializationNullFormat,
    +                  failAction,
    +                  ignoreAction)
    +                i = i + 1
    +              }
    +              InternalRow.fromSeq(data)
    +            } catch {
    +              case e: BadRecordFoundException => throw e
    +              case e: Exception => InternalRow.empty // It is bad record ignore case
                 }
    -            InternalRow.fromSeq(data)
    -        }
    +
    +        }.filter(f => f.numFields != 0) // In bad record ignore case filter the empty values
    --- End diff --
   
     It is better to create a private function for that map function


---
Github user jackylk commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1716#discussion_r158574266
 
    --- Diff: integration/spark-common/src/main/scala/org/apache/carbondata/spark/util/CarbonScalaUtil.scala ---
    @@ -158,7 +159,9 @@ object CarbonScalaUtil {
           dataType: DataType,
           timeStampFormat: SimpleDateFormat,
           dateFormat: SimpleDateFormat,
    -      serializationNullFormat: String): UTF8String = {
    +      serializationNullFormat: String,
    +      failAction: Boolean,
    +      ignoreAction: Boolean): UTF8String = {
    --- End diff --
   
     It is not easy to understand why there are bad-record-related parameters in this conversion function; can you catch the exception in the caller and handle it there? There is only one place that calls this function, so it is better to restrict its scope by moving it there


---
Github user ravipesala commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1716#discussion_r158575701
 
    --- Diff: integration/spark-common/src/main/scala/org/apache/carbondata/spark/util/CarbonScalaUtil.scala ---
    @@ -158,7 +159,9 @@ object CarbonScalaUtil {
           dataType: DataType,
           timeStampFormat: SimpleDateFormat,
           dateFormat: SimpleDateFormat,
    -      serializationNullFormat: String): UTF8String = {
    +      serializationNullFormat: String,
    +      failAction: Boolean,
    +      ignoreAction: Boolean): UTF8String = {
    --- End diff --
   
     We pass these parameters to either throw an exception or ignore the value when a bad record is found. It is not good to handle this in the caller: the handling is per column, not per row, so the caller code would become messy if we tried to handle it there.
     I will add comments to the parameters for better understanding.
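
    As an illustrative sketch only: how a per-column conversion might use the two flags from the diff. The names `failAction`/`ignoreAction` come from the patch, but the body here is an assumption, not the actual CarbonScalaUtil code:

    ```scala
    class BadRecordFoundException(msg: String) extends RuntimeException(msg)

    def convertField(value: String,
        failAction: Boolean,
        ignoreAction: Boolean): String =
      try {
        value.toInt.toString // stand-in for the real type-specific conversion
      } catch {
        case e: Exception =>
          if (failAction) {
            // FAIL action: abort the whole load immediately
            throw new BadRecordFoundException(s"Bad record: $value")
          } else if (ignoreAction) {
            // IGNORE action: rethrow so the caller can drop the whole row
            throw e
          } else {
            null // stand-in for substituting the serialization null format
          }
      }
    ```

    This shows why the flags sit at the column level: the decision to fail, drop, or null-substitute has to be made per value, before the row is assembled.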


---
Github user ravipesala commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1716#discussion_r158575732
 
    --- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/management/CarbonLoadDataCommand.scala ---
    @@ -564,55 +572,49 @@ case class CarbonLoadDataCommand(
                 val data = new Array[Any](len)
                 var i = 0
                 val input = value.get()
    -            while (i < input.length) {
    -              // TODO find a way to avoid double conversion of date and time.
    -              data(i) = CarbonScalaUtil.convertToUTF8String(
    -                input(i),
    -                rowDataTypes(i),
    -                timeStampFormat,
    -                dateFormat,
    -                serializationNullFormat)
    -              i = i + 1
    +            val inputLen = Math.min(input.length, len)
    +            try {
    +              while (i < inputLen) {
    +                // TODO find a way to avoid double conversion of date and time.
    +                data(i) = CarbonScalaUtil.convertToUTF8String(
    +                  input(i),
    +                  rowDataTypes(i),
    +                  timeStampFormat,
    +                  dateFormat,
    +                  serializationNullFormat,
    +                  failAction,
    +                  ignoreAction)
    +                i = i + 1
    +              }
    +              InternalRow.fromSeq(data)
    +            } catch {
    +              case e: BadRecordFoundException => throw e
    +              case e: Exception => InternalRow.empty // It is bad record ignore case
                 }
    -            InternalRow.fromSeq(data)
    -        }
    +
    +        }.filter(f => f.numFields != 0) // In bad record ignore case filter the empty values
    --- End diff --
   
     I don't see how to filter inside the map function. We cannot drop rows inside a map function; a separate filter function has to be applied.


---
Github user ravipesala commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1716#discussion_r158575738
 
    --- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/management/CarbonLoadDataCommand.scala ---
    @@ -564,55 +572,49 @@ case class CarbonLoadDataCommand(
                 val data = new Array[Any](len)
                 var i = 0
                 val input = value.get()
    -            while (i < input.length) {
    -              // TODO find a way to avoid double conversion of date and time.
    -              data(i) = CarbonScalaUtil.convertToUTF8String(
    -                input(i),
    -                rowDataTypes(i),
    -                timeStampFormat,
    -                dateFormat,
    -                serializationNullFormat)
    -              i = i + 1
    +            val inputLen = Math.min(input.length, len)
    +            try {
    +              while (i < inputLen) {
    +                // TODO find a way to avoid double conversion of date and time.
    +                data(i) = CarbonScalaUtil.convertToUTF8String(
    +                  input(i),
    +                  rowDataTypes(i),
    +                  timeStampFormat,
    +                  dateFormat,
    +                  serializationNullFormat,
    +                  failAction,
    +                  ignoreAction)
    +                i = i + 1
    +              }
    +              InternalRow.fromSeq(data)
    +            } catch {
    +              case e: BadRecordFoundException => throw e
    +              case e: Exception => InternalRow.empty // It is bad record ignore case
                 }
    -            InternalRow.fromSeq(data)
    -        }
    +
    +        }.filter(f => f.numFields != 0) // In bad record ignore case filter the empty values
    --- End diff --
   
     I will move the RDD creation to a separate private function.


---
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1716
 
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/2293/



---
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1716
 
    Build Success with Spark 2.2.0, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/1076/



---
Github user ravipesala commented on the issue:

    https://github.com/apache/carbondata/pull/1716
 
    SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/2537/



---