Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/1032 Build Success with Spark 1.6, Please check CI http://144.76.159.231:8080/job/ApacheCarbonPRBuilder/131/

---
If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at [hidden email] or file a JIRA ticket with INFRA.
In reply to this post by qiuchenjian-2
Github user asfgit commented on the issue:
https://github.com/apache/carbondata/pull/1032 Refer to this link for build results (access rights to CI server needed): https://builds.apache.org/job/carbondata-pr-spark-1.6/630/
Github user QiangCai commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/1032#discussion_r124024397

--- Diff: integration/spark-common/src/main/scala/org/apache/carbondata/spark/util/CommonUtil.scala ---
@@ -288,6 +297,69 @@ object CommonUtil {
     result
   }

+  def validateForOverLappingRangeValues(desType: Option[String],
+      rangeInfoArray: Array[String]): Boolean = {
--- End diff --

Better to use the same compare class as the range partitioner.
Github user manishgupta88 commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/1032#discussion_r124182621

--- Diff: integration/spark-common/src/main/scala/org/apache/carbondata/spark/util/CommonUtil.scala ---
@@ -288,6 +297,69 @@ object CommonUtil {
     result
   }

+  def validateForOverLappingRangeValues(desType: Option[String],
+      rangeInfoArray: Array[String]): Boolean = {
--- End diff --

@QiangCai ....Please correct me if I am wrong. Scala already has a predefined method for comparing array elements. I think writing a Comparator class would require our own comparison logic, which would be extra overhead to maintain.
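The predefined method referred to above can be sketched as a minimal ordering check; the helper name and sample values below are illustrative, not taken from the PR:

```scala
// Minimal sketch of the check discussed above: range bounds are in
// valid order only if the array already equals its sorted form.
// Helper name and inputs are illustrative, not from the PR.
def isAscending(rangeInfo: Array[Int]): Boolean =
  rangeInfo.sameElements(rangeInfo.sorted)

// Array(10, 20, 30) is ascending; Array(10, 30, 20) is not.
```

Note that `sameElements` against the sorted copy accepts duplicate bounds, which is the gap a later review comment raises.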
Github user QiangCai commented on the issue:
https://github.com/apache/carbondata/pull/1032 LGTM
Github user gvramana commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/1032#discussion_r124548518

--- Diff: integration/spark-common/src/main/scala/org/apache/carbondata/spark/util/CommonUtil.scala ---
@@ -288,6 +297,69 @@ object CommonUtil {
     result
   }

+  def validateForOverLappingRangeValues(desType: Option[String],
+      rangeInfoArray: Array[String]): Boolean = {
+    val rangeInfoValuesValid = desType match {
+      case Some("IntegerType") | Some("int") =>
+        val intRangeInfoArray = rangeInfoArray.map(_.toInt)
+        val sortedRangeInfoArray = intRangeInfoArray.sorted
+        intRangeInfoArray.sameElements(sortedRangeInfoArray)
+      case Some("StringType") | Some("string") =>
+        val sortedRangeInfoArray = rangeInfoArray.sorted
+        rangeInfoArray.sameElements(sortedRangeInfoArray)
+      case a if (desType.get.startsWith("varchar") || desType.get.startsWith("char")) =>
+        val sortedRangeInfoArray = rangeInfoArray.sorted
+        rangeInfoArray.sameElements(sortedRangeInfoArray)
+      case Some("LongType") | Some("long") | Some("bigint") =>
+        val longRangeInfoArray = rangeInfoArray.map(_.toLong)
+        val sortedRangeInfoArray = longRangeInfoArray.sorted
+        longRangeInfoArray.sameElements(sortedRangeInfoArray)
+      case Some("FloatType") | Some("float") =>
+        val floatRangeInfoArray = rangeInfoArray.map(_.toFloat)
+        val sortedRangeInfoArray = floatRangeInfoArray.sorted
+        floatRangeInfoArray.sameElements(sortedRangeInfoArray)
+      case Some("DoubleType") | Some("double") =>
+        val doubleRangeInfoArray = rangeInfoArray.map(_.toDouble)
+        val sortedRangeInfoArray = doubleRangeInfoArray.sorted
+        doubleRangeInfoArray.sameElements(sortedRangeInfoArray)
+      case Some("ByteType") | Some("tinyint") =>
+        val byteRangeInfoArray = rangeInfoArray.map(_.toByte)
+        val sortedRangeInfoArray = byteRangeInfoArray.sorted
+        byteRangeInfoArray.sameElements(sortedRangeInfoArray)
+      case Some("ShortType") | Some("smallint") =>
+        val shortRangeInfoArray = rangeInfoArray.map(_.toShort)
+        val sortedRangeInfoArray = shortRangeInfoArray.sorted
+        shortRangeInfoArray.sameElements(sortedRangeInfoArray)
+      case Some("BooleanType") | Some("boolean") =>
+        true
+      case a if (desType.get.startsWith("DecimalType") || desType.get.startsWith("decimal")) =>
+        val decimalRangeInfoArray = rangeInfoArray.map(value => BigDecimal(value))
+        val sortedRangeInfoArray = decimalRangeInfoArray.sorted
+        decimalRangeInfoArray.sameElements(sortedRangeInfoArray)
+      case Some("DateType") | Some("date") =>
+        val dateRangeInfoArray = rangeInfoArray.map { value =>
--- End diff --

Dictionary generation can bring duplicate values, so a duplicate-value check is required. The same applies to the timestamp case.
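The duplicate check requested above could be folded into the ordering check by requiring a strictly increasing sequence; this is a sketch under that assumption, with illustrative names:

```scala
// Sketch: strictly-increasing check over adjacent pairs. Unlike
// comparing against the sorted array, this also rejects duplicate
// bounds such as the ones dictionary generation can introduce.
def isStrictlyIncreasing[T](values: Seq[T])(implicit ord: Ordering[T]): Boolean =
  values.sliding(2).forall {
    case Seq(a, b) => ord.lt(a, b)
    case _         => true // zero or one element: trivially valid
  }
```

Because it takes an implicit `Ordering[T]`, the same helper covers the int, long, string, and decimal branches of the match above.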
Github user gvramana commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/1032#discussion_r124549140

--- Diff: integration/spark-common/src/main/scala/org/apache/carbondata/spark/util/CommonUtil.scala ---
@@ -288,6 +297,69 @@ object CommonUtil {
     result
   }

+  def validateForOverLappingRangeValues(desType: Option[String],
+      rangeInfoArray: Array[String]): Boolean = {
+    val rangeInfoValuesValid = desType match {
+      case Some("IntegerType") | Some("int") =>
+        val intRangeInfoArray = rangeInfoArray.map(_.toInt)
+        val sortedRangeInfoArray = intRangeInfoArray.sorted
+        intRangeInfoArray.sameElements(sortedRangeInfoArray)
+      case Some("StringType") | Some("string") =>
+        val sortedRangeInfoArray = rangeInfoArray.sorted
+        rangeInfoArray.sameElements(sortedRangeInfoArray)
+      case a if (desType.get.startsWith("varchar") || desType.get.startsWith("char")) =>
+        val sortedRangeInfoArray = rangeInfoArray.sorted
+        rangeInfoArray.sameElements(sortedRangeInfoArray)
+      case Some("LongType") | Some("long") | Some("bigint") =>
+        val longRangeInfoArray = rangeInfoArray.map(_.toLong)
+        val sortedRangeInfoArray = longRangeInfoArray.sorted
+        longRangeInfoArray.sameElements(sortedRangeInfoArray)
+      case Some("FloatType") | Some("float") =>
+        val floatRangeInfoArray = rangeInfoArray.map(_.toFloat)
+        val sortedRangeInfoArray = floatRangeInfoArray.sorted
+        floatRangeInfoArray.sameElements(sortedRangeInfoArray)
+      case Some("DoubleType") | Some("double") =>
+        val doubleRangeInfoArray = rangeInfoArray.map(_.toDouble)
+        val sortedRangeInfoArray = doubleRangeInfoArray.sorted
+        doubleRangeInfoArray.sameElements(sortedRangeInfoArray)
+      case Some("ByteType") | Some("tinyint") =>
+        val byteRangeInfoArray = rangeInfoArray.map(_.toByte)
+        val sortedRangeInfoArray = byteRangeInfoArray.sorted
+        byteRangeInfoArray.sameElements(sortedRangeInfoArray)
+      case Some("ShortType") | Some("smallint") =>
+        val shortRangeInfoArray = rangeInfoArray.map(_.toShort)
+        val sortedRangeInfoArray = shortRangeInfoArray.sorted
+        shortRangeInfoArray.sameElements(sortedRangeInfoArray)
+      case Some("BooleanType") | Some("boolean") =>
+        true
+      case a if (desType.get.startsWith("DecimalType") || desType.get.startsWith("decimal")) =>
+        val decimalRangeInfoArray = rangeInfoArray.map(value => BigDecimal(value))
--- End diff --

BigDecimal precision and scale need to be considered; otherwise two ranges can overlap after the value is converted during data load.
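The precision/scale concern above could be addressed by normalizing each bound to the column's scale before the overlap check. This is only a sketch: the scale, rounding mode, and helper name are assumptions for illustration, not CarbonData behavior.

```scala
import scala.math.BigDecimal.RoundingMode

// Sketch: round each raw bound to the target scale first, then look
// for bounds that collapse onto the same stored value. Scale and
// rounding mode are illustrative assumptions.
def overlapsAfterScaling(raw: Seq[String], scale: Int): Boolean = {
  val scaled = raw.map(v => BigDecimal(v).setScale(scale, RoundingMode.HALF_UP))
  scaled.distinct.length != scaled.length
}

// e.g. "1.001" and "1.002" both round to 1.00 at scale 2, so those
// two ranges would overlap after conversion during data load.
```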
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/1032 Can one of the admins verify this patch?
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/1032 SDV Build Failed with Spark 2.1, Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/45/
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/1032 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/60/
Github user ravipesala commented on the issue:
https://github.com/apache/carbondata/pull/1032 SDV Build Failed, Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/1234/
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/1032 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/603/
Github user manishgupta88 commented on the issue:
https://github.com/apache/carbondata/pull/1032 Not required, as the partition feature has been re-implemented.
Github user manishgupta88 closed the pull request at:
https://github.com/apache/carbondata/pull/1032