Apache CarbonData Dev Mailing List archive › Apache CarbonData JIRA issues

[GitHub] carbondata pull request #1672: [CARBONDATA-1858][PARTITION] Support querying...

Classic

List

Threaded

84 messages Options

12345

qiuchenjian-2

[GitHub] carbondata issue #1672: [CARBONDATA-1858][PARTITION] Support querying data f...

Github user ravipesala commented on the issue:

https://github.com/apache/carbondata/pull/1672

SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/2400/

---

qiuchenjian-2

[GitHub] carbondata issue #1672: [CARBONDATA-1858][PARTITION] Support querying data f...

In reply to this post by qiuchenjian-2

Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/1672

Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/2119/

---

qiuchenjian-2

[GitHub] carbondata issue #1672: [CARBONDATA-1858][PARTITION] Support querying data f...

In reply to this post by qiuchenjian-2

Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/1672

Build Success with Spark 2.2.0, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/894/

---

qiuchenjian-2

[GitHub] carbondata issue #1672: [CARBONDATA-1858][PARTITION] Support querying data f...

In reply to this post by qiuchenjian-2

Github user ravipesala commented on the issue:

https://github.com/apache/carbondata/pull/1672

SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/2404/

---

qiuchenjian-2

[GitHub] carbondata issue #1672: [CARBONDATA-1858][PARTITION] Support querying data f...

In reply to this post by qiuchenjian-2

Github user jackylk commented on the issue:

https://github.com/apache/carbondata/pull/1672

LGTM

---

qiuchenjian-2

[GitHub] carbondata pull request #1672: [CARBONDATA-1858][PARTITION] Support querying...

In reply to this post by qiuchenjian-2

Github user jackylk commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/1672#discussion_r157702204

--- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/execution/strategy/CarbonLateDecodeStrategy.scala ---
@@ -146,27 +184,13 @@ private[sql] class CarbonLateDecodeStrategy extends SparkStrategy {
}
}

- protected def pruneFilterProject(
- relation: LogicalRelation,
- projects: Seq[NamedExpression],
- filterPredicates: Seq[Expression],
- scanBuilder: (Seq[Attribute], Array[Filter],
- ArrayBuffer[AttributeReference]) => RDD[InternalRow]) = {
- pruneFilterProjectRaw(
- relation,
- projects,
- filterPredicates,
- (requestedColumns, _, pushedFilters, a) => {
- scanBuilder(requestedColumns, pushedFilters.toArray, a)
- })
- }
-
protected def pruneFilterProjectRaw(
relation: LogicalRelation,
rawProjects: Seq[NamedExpression],
filterPredicates: Seq[Expression],
+ partitions: Seq[String],
scanBuilder: (Seq[Attribute], Seq[Expression], Seq[Filter],
- ArrayBuffer[AttributeReference]) => RDD[InternalRow]) = {
+ ArrayBuffer[AttributeReference], Seq[String]) => RDD[InternalRow]) = {
--- End diff --

identation is not correct

---

qiuchenjian-2

[GitHub] carbondata pull request #1672: [CARBONDATA-1858][PARTITION] Support querying...

In reply to this post by qiuchenjian-2

Github user jackylk commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/1672#discussion_r157702403

--- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/execution/strategy/CarbonLateDecodeStrategy.scala ---
@@ -130,6 +130,44 @@ private[sql] class CarbonLateDecodeStrategy extends SparkStrategy {
table.carbonTable.getTableInfo.serialize())
}

+ /**
+ * Converts to physical RDD of carbon after pushing down applicable filters.
+ * @param relation
+ * @param projects
+ * @param filterPredicates
+ * @param scanBuilder
+ * @return
+ */
+ private def pruneFilterProject(
+ relation: LogicalRelation,
+ projects: Seq[NamedExpression],
+ filterPredicates: Seq[Expression],
+ scanBuilder: (Seq[Attribute], Array[Filter],
+ ArrayBuffer[AttributeReference], Seq[String]) => RDD[InternalRow]) = {
--- End diff --

incorrect identation

---

qiuchenjian-2

[GitHub] carbondata pull request #1672: [CARBONDATA-1858][PARTITION] Support querying...

In reply to this post by qiuchenjian-2

Github user jackylk commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/1672#discussion_r157703591

--- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/execution/strategy/CarbonLateDecodeStrategy.scala ---
@@ -130,6 +130,44 @@ private[sql] class CarbonLateDecodeStrategy extends SparkStrategy {
table.carbonTable.getTableInfo.serialize())
}

+ /**
+ * Converts to physical RDD of carbon after pushing down applicable filters.
+ * @param relation
+ * @param projects
+ * @param filterPredicates
+ * @param scanBuilder
--- End diff --

can you add comment for these parameter

---

qiuchenjian-2

[GitHub] carbondata pull request #1672: [CARBONDATA-1858][PARTITION] Support querying...

In reply to this post by qiuchenjian-2

Github user jackylk commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/1672#discussion_r157704454

--- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/execution/strategy/CarbonLateDecodeStrategy.scala ---
@@ -130,6 +130,44 @@ private[sql] class CarbonLateDecodeStrategy extends SparkStrategy {
table.carbonTable.getTableInfo.serialize())
}

+ /**
+ * Converts to physical RDD of carbon after pushing down applicable filters.
+ * @param relation
+ * @param projects
+ * @param filterPredicates
+ * @param scanBuilder
+ * @return
+ */
+ private def pruneFilterProject(
+ relation: LogicalRelation,
+ projects: Seq[NamedExpression],
+ filterPredicates: Seq[Expression],
+ scanBuilder: (Seq[Attribute], Array[Filter],
+ ArrayBuffer[AttributeReference], Seq[String]) => RDD[InternalRow]) = {
+ val names = relation.catalogTable.get.partitionColumnNames
+ // Get the current partitions from table.
+ var partitions: Seq[String] = null
+ if (names.nonEmpty) {
+ val partitionSet = AttributeSet(names
+ .map(p => relation.output.find(_.name.equalsIgnoreCase(p)).get))
--- End diff --

here the code style is not good

---

qiuchenjian-2

[GitHub] carbondata pull request #1672: [CARBONDATA-1858][PARTITION] Support querying...

In reply to this post by qiuchenjian-2

Github user jackylk commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/1672#discussion_r157704631

--- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/execution/strategy/CarbonLateDecodeStrategy.scala ---
@@ -130,6 +130,44 @@ private[sql] class CarbonLateDecodeStrategy extends SparkStrategy {
table.carbonTable.getTableInfo.serialize())
}

+ /**
+ * Converts to physical RDD of carbon after pushing down applicable filters.
+ * @param relation
+ * @param projects
+ * @param filterPredicates
+ * @param scanBuilder
+ * @return
+ */
+ private def pruneFilterProject(
+ relation: LogicalRelation,
+ projects: Seq[NamedExpression],
+ filterPredicates: Seq[Expression],
+ scanBuilder: (Seq[Attribute], Array[Filter],
+ ArrayBuffer[AttributeReference], Seq[String]) => RDD[InternalRow]) = {
+ val names = relation.catalogTable.get.partitionColumnNames
+ // Get the current partitions from table.
+ var partitions: Seq[String] = null
+ if (names.nonEmpty) {
+ val partitionSet = AttributeSet(names
+ .map(p => relation.output.find(_.name.equalsIgnoreCase(p)).get))
+ val partitionKeyFilters =
+ ExpressionSet(ExpressionSet(filterPredicates).filter(_.references.subsetOf(partitionSet)))
+ partitions =
+ CarbonFilters.getPartitions(
--- End diff --

can you add comment for logic line 151 to 156, explain what it does

---

qiuchenjian-2

[GitHub] carbondata pull request #1672: [CARBONDATA-1858][PARTITION] Support querying...

In reply to this post by qiuchenjian-2

Github user jackylk commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/1672#discussion_r157704888

--- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/optimizer/CarbonFilters.scala ---
@@ -395,4 +391,14 @@ object CarbonFilters {
case _ => expressions
}
}
+
+ def getPartitions(partitionFilters: Seq[Expression],
--- End diff --

Can you add comment for this function

---

qiuchenjian-2

[GitHub] carbondata pull request #1672: [CARBONDATA-1858][PARTITION] Support querying...

In reply to this post by qiuchenjian-2

Github user jackylk commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/1672#discussion_r157704988

--- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/optimizer/CarbonFilters.scala ---
@@ -395,4 +391,14 @@ object CarbonFilters {
case _ => expressions
}
}
+
+ def getPartitions(partitionFilters: Seq[Expression],
--- End diff --

move `partitionFilters` to next line

---

qiuchenjian-2

[GitHub] carbondata issue #1672: [CARBONDATA-1858][PARTITION] Support querying data f...

In reply to this post by qiuchenjian-2

qiuchenjian-2

[GitHub] carbondata issue #1672: [CARBONDATA-1858][PARTITION] Support querying data f...

In reply to this post by qiuchenjian-2

Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/1672

Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/2123/

---

qiuchenjian-2

[GitHub] carbondata issue #1672: [CARBONDATA-1858][PARTITION] Support querying data f...

In reply to this post by qiuchenjian-2

Github user ravipesala commented on the issue:

https://github.com/apache/carbondata/pull/1672

SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/2407/

---

qiuchenjian-2

[GitHub] carbondata pull request #1672: [CARBONDATA-1858][PARTITION] Support querying...

In reply to this post by qiuchenjian-2

Github user ravipesala commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/1672#discussion_r157738771

--- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/execution/strategy/CarbonLateDecodeStrategy.scala ---
@@ -130,6 +130,44 @@ private[sql] class CarbonLateDecodeStrategy extends SparkStrategy {
table.carbonTable.getTableInfo.serialize())
}

+ /**
+ * Converts to physical RDD of carbon after pushing down applicable filters.
+ * @param relation
+ * @param projects
+ * @param filterPredicates
+ * @param scanBuilder
+ * @return
+ */
+ private def pruneFilterProject(
+ relation: LogicalRelation,
+ projects: Seq[NamedExpression],
+ filterPredicates: Seq[Expression],
+ scanBuilder: (Seq[Attribute], Array[Filter],
+ ArrayBuffer[AttributeReference], Seq[String]) => RDD[InternalRow]) = {
--- End diff --

Yes, you are right. It is difficult to track the parameters meaning. Even I also find it difficult pass a new parameter partitions but this is old code added as part of spark2.0 support and may we can refactor later.

---

qiuchenjian-2

[GitHub] carbondata pull request #1672: [CARBONDATA-1858][PARTITION] Support querying...

In reply to this post by qiuchenjian-2

Github user ravipesala commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/1672#discussion_r157739117

--- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/execution/strategy/CarbonLateDecodeStrategy.scala ---
@@ -130,6 +130,44 @@ private[sql] class CarbonLateDecodeStrategy extends SparkStrategy {
table.carbonTable.getTableInfo.serialize())
}

+ /**
+ * Converts to physical RDD of carbon after pushing down applicable filters.
+ * @param relation
+ * @param projects
+ * @param filterPredicates
+ * @param scanBuilder
+ * @return
+ */
+ private def pruneFilterProject(
+ relation: LogicalRelation,
+ projects: Seq[NamedExpression],
+ filterPredicates: Seq[Expression],
+ scanBuilder: (Seq[Attribute], Array[Filter],
+ ArrayBuffer[AttributeReference], Seq[String]) => RDD[InternalRow]) = {
+ val names = relation.catalogTable.get.partitionColumnNames
+ // Get the current partitions from table.
+ var partitions: Seq[String] = null
+ if (names.nonEmpty) {
+ val partitionSet = AttributeSet(names
+ .map(p => relation.output.find(_.name.equalsIgnoreCase(p)).get))
--- End diff --

OK

---

qiuchenjian-2

[GitHub] carbondata pull request #1672: [CARBONDATA-1858][PARTITION] Support querying...

In reply to this post by qiuchenjian-2

Github user ravipesala commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/1672#discussion_r157739562

--- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/execution/strategy/CarbonLateDecodeStrategy.scala ---
@@ -130,6 +130,44 @@ private[sql] class CarbonLateDecodeStrategy extends SparkStrategy {
table.carbonTable.getTableInfo.serialize())
}

+ /**
+ * Converts to physical RDD of carbon after pushing down applicable filters.
+ * @param relation
+ * @param projects
+ * @param filterPredicates
+ * @param scanBuilder
+ * @return
+ */
+ private def pruneFilterProject(
+ relation: LogicalRelation,
+ projects: Seq[NamedExpression],
+ filterPredicates: Seq[Expression],
+ scanBuilder: (Seq[Attribute], Array[Filter],
+ ArrayBuffer[AttributeReference], Seq[String]) => RDD[InternalRow]) = {
+ val names = relation.catalogTable.get.partitionColumnNames
+ // Get the current partitions from table.
+ var partitions: Seq[String] = null
+ if (names.nonEmpty) {
+ val partitionSet = AttributeSet(names
+ .map(p => relation.output.find(_.name.equalsIgnoreCase(p)).get))
+ val partitionKeyFilters =
+ ExpressionSet(ExpressionSet(filterPredicates).filter(_.references.subsetOf(partitionSet)))
+ partitions =
+ CarbonFilters.getPartitions(
--- End diff --

ok

---

qiuchenjian-2

[GitHub] carbondata pull request #1672: [CARBONDATA-1858][PARTITION] Support querying...

In reply to this post by qiuchenjian-2

Github user ravipesala commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/1672#discussion_r157740002

--- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/execution/strategy/CarbonLateDecodeStrategy.scala ---
@@ -130,6 +130,44 @@ private[sql] class CarbonLateDecodeStrategy extends SparkStrategy {
table.carbonTable.getTableInfo.serialize())
}

+ /**
+ * Converts to physical RDD of carbon after pushing down applicable filters.
+ * @param relation
+ * @param projects
+ * @param filterPredicates
+ * @param scanBuilder
--- End diff --

ok

---

qiuchenjian-2

[GitHub] carbondata issue #1672: [CARBONDATA-1858][PARTITION] Support querying data f...

In reply to this post by qiuchenjian-2

12345