[GitHub] carbondata pull request #1716: [CARBONDATA-1933] Support Spark 2.2.1 in carb...

GitHub user ravipesala opened a pull request:

    https://github.com/apache/carbondata/pull/1716

    [CARBONDATA-1933] Support Spark 2.2.1 in carbon partition tables

   
    Be sure to complete all of the following checklist items to help us incorporate
    your contribution quickly and easily:
   
     - [ ] Any interfaces changed?
     
     - [ ] Any backward compatibility impacted?
     
     - [ ] Document update required?
   
     - [ ] Testing done
            Please provide details on
            - Whether new unit test cases have been added or why no new tests are required?
            - How it is tested? Please attach test report.
            - Is it a performance related change? Please attach the performance test report.
            - Any additional information to help reviewers in testing this change.
           
     - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.
   


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/ravipesala/incubator-carbondata partition-load-issue

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/carbondata/pull/1716.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1716
   
----
commit af98a1d0fd9cb4db3aa633fce7a9e76f45779276
Author: ravipesala <ravi.pesala@...>
Date:   2017-12-22T09:17:53Z

    Support 2.2.1 spark

----


---
Github user gvramana commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1716#discussion_r158468813
 
    --- Diff: integration/spark-common/src/main/scala/org/apache/carbondata/spark/util/CarbonScalaUtil.scala ---
    @@ -171,11 +173,23 @@ object CarbonScalaUtil {
               UTF8String.fromString(
                 DateTimeUtils.dateToString(
                   (dateFormat.parse(value).getTime / DateTimeUtils.MILLIS_PER_DAY).toInt))
    +        case ShortType => UTF8String.fromString(value.toShort.toString)
    --- End diff --
   
    Add ignore case here


---
Github user ravipesala commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1716#discussion_r158479090
 
    --- Diff: integration/spark-common/src/main/scala/org/apache/carbondata/spark/util/CarbonScalaUtil.scala ---
    @@ -171,11 +173,23 @@ object CarbonScalaUtil {
               UTF8String.fromString(
                 DateTimeUtils.dateToString(
                   (dateFormat.parse(value).getTime / DateTimeUtils.MILLIS_PER_DAY).toInt))
    +        case ShortType => UTF8String.fromString(value.toShort.toString)
    --- End diff --
   
    ok


---
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1716
 
    Build Success with Spark 2.2.0, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/1042/



---
Github user KanakaKumar commented on the issue:

    https://github.com/apache/carbondata/pull/1716
 
    retest this please


---
Github user ravipesala commented on the issue:

    https://github.com/apache/carbondata/pull/1716
 
    retest this please


---
Github user ravipesala commented on the issue:

    https://github.com/apache/carbondata/pull/1716
 
    retest sdv please


---
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1716
 
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/2277/



---
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1716
 
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/2279/



---
Github user ravipesala commented on the issue:

    https://github.com/apache/carbondata/pull/1716
 
    SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/2525/



---
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1716
 
    Build Success with Spark 2.2.0, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/1056/



---
Github user jackylk commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1716#discussion_r158574044
 
    --- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/management/CarbonLoadDataCommand.scala ---
    @@ -564,55 +572,49 @@ case class CarbonLoadDataCommand(
                 val data = new Array[Any](len)
                 var i = 0
                 val input = value.get()
    -            while (i < input.length) {
    -              // TODO find a way to avoid double conversion of date and time.
    -              data(i) = CarbonScalaUtil.convertToUTF8String(
    -                input(i),
    -                rowDataTypes(i),
    -                timeStampFormat,
    -                dateFormat,
    -                serializationNullFormat)
    -              i = i + 1
    +            val inputLen = Math.min(input.length, len)
    +            try {
    +              while (i < inputLen) {
    +                // TODO find a way to avoid double conversion of date and time.
    +                data(i) = CarbonScalaUtil.convertToUTF8String(
    +                  input(i),
    +                  rowDataTypes(i),
    +                  timeStampFormat,
    +                  dateFormat,
    +                  serializationNullFormat,
    +                  failAction,
    +                  ignoreAction)
    +                i = i + 1
    +              }
    +              InternalRow.fromSeq(data)
    +            } catch {
    +              case e: BadRecordFoundException => throw e
    +              case e: Exception => InternalRow.empty // It is bad record ignore case
                 }
    -            InternalRow.fromSeq(data)
    -        }
    +
    +        }.filter(f => f.numFields != 0) // In bad record ignore case filter the empty values
    --- End diff --
   
     I think it is better to filter it inside the map function at line 571; that can avoid creating a big InternalRow with all columns
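
    For illustration only, here is a minimal, self-contained sketch of the map-then-filter pattern being discussed and a flatMap alternative that avoids materializing empty rows; the `Row` type and `convert` function are hypothetical stand-ins, not the actual CarbonData code:

    ```scala
    // Hypothetical stand-ins for InternalRow and the per-column conversion.
    final case class Row(fields: Seq[Int]) { def numFields: Int = fields.length }

    def convert(value: String): Int = value.toInt // throws on a bad record

    // map + filter: every input allocates a Row; empty ones are dropped later.
    def mapThenFilter(input: Seq[Seq[String]]): Seq[Row] =
      input.map { vals =>
        try Row(vals.map(convert))
        catch { case _: Exception => Row(Seq.empty) } // bad record, ignore case
      }.filter(_.numFields != 0)

    // flatMap: a bad record never produces a Row at all.
    def viaFlatMap(input: Seq[Seq[String]]): Seq[Row] =
      input.flatMap { vals =>
        try Some(Row(vals.map(convert)))
        catch { case _: Exception => None } // bad record, ignore case
      }
    ```

    Both forms return the same rows; the flatMap form simply skips allocating anything for a bad record, which seems to be the spirit of the suggestion above.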


---
Github user jackylk commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1716#discussion_r158574052
 
    --- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/management/CarbonLoadDataCommand.scala ---
    @@ -564,55 +572,49 @@ case class CarbonLoadDataCommand(
                 val data = new Array[Any](len)
                 var i = 0
                 val input = value.get()
    -            while (i < input.length) {
    -              // TODO find a way to avoid double conversion of date and time.
    -              data(i) = CarbonScalaUtil.convertToUTF8String(
    -                input(i),
    -                rowDataTypes(i),
    -                timeStampFormat,
    -                dateFormat,
    -                serializationNullFormat)
    -              i = i + 1
    +            val inputLen = Math.min(input.length, len)
    +            try {
    +              while (i < inputLen) {
    +                // TODO find a way to avoid double conversion of date and time.
    +                data(i) = CarbonScalaUtil.convertToUTF8String(
    +                  input(i),
    +                  rowDataTypes(i),
    +                  timeStampFormat,
    +                  dateFormat,
    +                  serializationNullFormat,
    +                  failAction,
    +                  ignoreAction)
    +                i = i + 1
    +              }
    +              InternalRow.fromSeq(data)
    +            } catch {
    +              case e: BadRecordFoundException => throw e
    +              case e: Exception => InternalRow.empty // It is bad record ignore case
                 }
    -            InternalRow.fromSeq(data)
    -        }
    +
    +        }.filter(f => f.numFields != 0) // In bad record ignore case filter the empty values
    --- End diff --
   
     It is better to create a private function for that map function


---
Github user jackylk commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1716#discussion_r158574266
 
    --- Diff: integration/spark-common/src/main/scala/org/apache/carbondata/spark/util/CarbonScalaUtil.scala ---
    @@ -158,7 +159,9 @@ object CarbonScalaUtil {
           dataType: DataType,
           timeStampFormat: SimpleDateFormat,
           dateFormat: SimpleDateFormat,
    -      serializationNullFormat: String): UTF8String = {
    +      serializationNullFormat: String,
    +      failAction: Boolean,
    +      ignoreAction: Boolean): UTF8String = {
    --- End diff --
   
     It is not easy to understand why there are bad-record-related parameters in this conversion function; can you catch the exception in the caller and handle it there? There is only one place that calls this function, so it is better to restrict its scope by moving it there


---
Github user ravipesala commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1716#discussion_r158575701
 
    --- Diff: integration/spark-common/src/main/scala/org/apache/carbondata/spark/util/CarbonScalaUtil.scala ---
    @@ -158,7 +159,9 @@ object CarbonScalaUtil {
           dataType: DataType,
           timeStampFormat: SimpleDateFormat,
           dateFormat: SimpleDateFormat,
    -      serializationNullFormat: String): UTF8String = {
    +      serializationNullFormat: String,
    +      failAction: Boolean,
    +      ignoreAction: Boolean): UTF8String = {
    --- End diff --
   
     We pass these parameters to either throw an exception or ignore the value when a bad record is found. It is not good to handle this in the caller: the handling is per column, not per row, so the caller code would become messy if we tried to handle it there.
     I will add comments to the parameters for better understanding.
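
    As an illustrative sketch only: how a per-column conversion might use the two flags from the diff. The names `failAction`/`ignoreAction` come from the patch, but the body here is an assumption, not the actual CarbonScalaUtil code:

    ```scala
    class BadRecordFoundException(msg: String) extends RuntimeException(msg)

    def convertField(value: String,
        failAction: Boolean,
        ignoreAction: Boolean): String =
      try {
        value.toInt.toString // stand-in for the real type-specific conversion
      } catch {
        case e: Exception =>
          if (failAction) {
            // FAIL action: abort the whole load immediately
            throw new BadRecordFoundException(s"Bad record: $value")
          } else if (ignoreAction) {
            // IGNORE action: rethrow so the caller can drop the whole row
            throw e
          } else {
            null // stand-in for substituting the serialization null format
          }
      }
    ```

    This shows why the flags sit at the column level: the decision to fail, drop, or null-substitute has to be made per value, before the row is assembled.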


---
Github user ravipesala commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1716#discussion_r158575732
 
    --- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/management/CarbonLoadDataCommand.scala ---
    @@ -564,55 +572,49 @@ case class CarbonLoadDataCommand(
                 val data = new Array[Any](len)
                 var i = 0
                 val input = value.get()
    -            while (i < input.length) {
    -              // TODO find a way to avoid double conversion of date and time.
    -              data(i) = CarbonScalaUtil.convertToUTF8String(
    -                input(i),
    -                rowDataTypes(i),
    -                timeStampFormat,
    -                dateFormat,
    -                serializationNullFormat)
    -              i = i + 1
    +            val inputLen = Math.min(input.length, len)
    +            try {
    +              while (i < inputLen) {
    +                // TODO find a way to avoid double conversion of date and time.
    +                data(i) = CarbonScalaUtil.convertToUTF8String(
    +                  input(i),
    +                  rowDataTypes(i),
    +                  timeStampFormat,
    +                  dateFormat,
    +                  serializationNullFormat,
    +                  failAction,
    +                  ignoreAction)
    +                i = i + 1
    +              }
    +              InternalRow.fromSeq(data)
    +            } catch {
    +              case e: BadRecordFoundException => throw e
    +              case e: Exception => InternalRow.empty // It is bad record ignore case
                 }
    -            InternalRow.fromSeq(data)
    -        }
    +
    +        }.filter(f => f.numFields != 0) // In bad record ignore case filter the empty values
    --- End diff --
   
     I don't see how to filter inside the map function. We cannot drop rows inside a map function; a separate filter function has to be applied.


---
Github user ravipesala commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1716#discussion_r158575738
 
    --- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/management/CarbonLoadDataCommand.scala ---
    @@ -564,55 +572,49 @@ case class CarbonLoadDataCommand(
                 val data = new Array[Any](len)
                 var i = 0
                 val input = value.get()
    -            while (i < input.length) {
    -              // TODO find a way to avoid double conversion of date and time.
    -              data(i) = CarbonScalaUtil.convertToUTF8String(
    -                input(i),
    -                rowDataTypes(i),
    -                timeStampFormat,
    -                dateFormat,
    -                serializationNullFormat)
    -              i = i + 1
    +            val inputLen = Math.min(input.length, len)
    +            try {
    +              while (i < inputLen) {
    +                // TODO find a way to avoid double conversion of date and time.
    +                data(i) = CarbonScalaUtil.convertToUTF8String(
    +                  input(i),
    +                  rowDataTypes(i),
    +                  timeStampFormat,
    +                  dateFormat,
    +                  serializationNullFormat,
    +                  failAction,
    +                  ignoreAction)
    +                i = i + 1
    +              }
    +              InternalRow.fromSeq(data)
    +            } catch {
    +              case e: BadRecordFoundException => throw e
    +              case e: Exception => InternalRow.empty // It is bad record ignore case
                 }
    -            InternalRow.fromSeq(data)
    -        }
    +
    +        }.filter(f => f.numFields != 0) // In bad record ignore case filter the empty values
    --- End diff --
   
     I will move the RDD creation to a separate private function.


---
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1716
 
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/2293/



---
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1716
 
    Build Success with Spark 2.2.0, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/1076/



---
Github user ravipesala commented on the issue:

    https://github.com/apache/carbondata/pull/1716
 
    SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/2537/



---