[GitHub] [carbondata] ShreelekhyaG opened a new pull request #4107: [CARBONDATA-4149] Query with SI after add partition based on location on partition table gives incorrect results

[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4107: [CARBONDATA-4149] Query with SI after add partition based on location on partition table gives incorrect results

GitBox

CarbonDataQA2 commented on pull request #4107:
URL: https://github.com/apache/carbondata/pull/4107#issuecomment-802179085


   Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12444/job/ApacheCarbonPRBuilder2.3/5593/
   


--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4107: [CARBONDATA-4149] Query with SI after add partition based on location on partition table gives incorrect results

GitBox

CarbonDataQA2 commented on pull request #4107:
URL: https://github.com/apache/carbondata/pull/4107#issuecomment-802180741


   Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12444/job/ApacheCarbon_PR_Builder_2.4.5/3827/
   


[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4107: [CARBONDATA-4149] Query with SI after add partition based on location on partition table gives incorrect results

GitBox

CarbonDataQA2 commented on pull request #4107:
URL: https://github.com/apache/carbondata/pull/4107#issuecomment-802352264


   Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12444/job/ApacheCarbonPRBuilder2.3/5594/
   


[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4107: [CARBONDATA-4149] Query with SI after add partition based on location on partition table gives incorrect results

GitBox

CarbonDataQA2 commented on pull request #4107:
URL: https://github.com/apache/carbondata/pull/4107#issuecomment-802356898


   Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12444/job/ApacheCarbon_PR_Builder_2.4.5/3828/
   


[GitHub] [carbondata] ShreelekhyaG commented on a change in pull request #4107: [CARBONDATA-4149] Query with SI after add partition based on location on partition table gives incorrect results

GitBox

ShreelekhyaG commented on a change in pull request #4107:
URL: https://github.com/apache/carbondata/pull/4107#discussion_r597396875



##########
File path: integration/spark/src/main/scala/org/apache/carbondata/spark/rdd/CarbonMergerRDD.scala
##########
@@ -98,6 +98,24 @@ class CarbonMergerRDD[K, V](
     broadCastSplits = sparkContext.broadcast(new CarbonInputSplitWrapper(splits))
   }
 
+  // checks for added partition specs with external path.
+  // after compaction, location path to be updated with table path.
+  def checkAndUpdatePartitionLocation(partitionSpec: PartitionSpec) : PartitionSpec = {
+    if (partitionSpec != null) {
+      carbonLoadModel.getLoadMetadataDetails.asScala.foreach(loadMetaDetail => {
+        if (loadMetaDetail.getPath != null &&
+            loadMetaDetail.getPath.split(",").contains(partitionSpec.getLocation.toString)) {
+          val updatedPartitionLocation = CarbonDataProcessorUtil

Review comment:
       Done

##########
File path: integration/spark/src/main/scala/org/apache/carbondata/spark/rdd/CarbonTableCompactor.scala
##########
@@ -263,6 +267,16 @@ class CarbonTableCompactor(
       if (partitionSpecs != null && partitionSpecs.nonEmpty) {
         compactionCallableModel.compactedPartitions = Some(partitionSpecs)
       }
+      partitionSpecs.foreach(partitionSpec => {
+        carbonLoadModel.getLoadMetadataDetails.asScala.foreach(loadMetaDetail => {
+          if (loadMetaDetail.getPath != null &&
+              loadMetaDetail.getPath.split(",").contains(partitionSpec.getLocation.toString)) {

Review comment:
       Done
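
The path check that appears in both hunks above can be sketched in isolation. This is a minimal illustration, not the code from the PR: `LoadDetail` and `Spec` are hypothetical stand-ins for CarbonData's `LoadMetadataDetails` and `PartitionSpec`, and plain strings replace the real path types.

```scala
// Hypothetical stand-ins for LoadMetadataDetails and PartitionSpec (assumption, not the real classes).
final case class LoadDetail(path: String)
final case class Spec(partitions: Seq[String], location: String)

object ExternalPartitionCheck {
  // Returns true when some load entry lists the spec's location among its
  // comma-separated paths, mirroring getPath.split(",").contains(...) in the diff.
  def isExternal(spec: Spec, loads: Seq[LoadDetail]): Boolean =
    spec != null && loads.exists { detail =>
      detail.path != null && detail.path.split(",").contains(spec.location)
    }
}
```

The match is exact per path element, so a location that is only a prefix of a listed path does not count as a match.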




[GitHub] [carbondata] ShreelekhyaG commented on a change in pull request #4107: [CARBONDATA-4149] Query with SI after add partition based on location on partition table gives incorrect results

GitBox

ShreelekhyaG commented on a change in pull request #4107:
URL: https://github.com/apache/carbondata/pull/4107#discussion_r597396979



##########
File path: index/secondary-index/src/test/scala/org/apache/carbondata/spark/testsuite/secondaryindex/TestSIWithPartition.scala
##########
@@ -380,6 +384,37 @@ class TestSIWithPartition extends QueryTest with BeforeAndAfterAll {
     sql("drop table if exists partition_table")
   }
 
+  test("test si with add partition based on location on partition table") {
+    sql("drop table if exists partition_table")
+    sql("create table partition_table (id int,name String) " +
+        "partitioned by(email string) stored as carbondata")
+    sql("insert into partition_table select 1,'blue','abc'")
+    sql("CREATE INDEX partitionTable_si  on table partition_table (name) as 'carbondata'")

Review comment:
       Ok done




[GitHub] [carbondata] ShreelekhyaG commented on a change in pull request #4107: [CARBONDATA-4149] Query with SI after add partition based on location on partition table gives incorrect results

GitBox

ShreelekhyaG commented on a change in pull request #4107:
URL: https://github.com/apache/carbondata/pull/4107#discussion_r597398064



##########
File path: integration/spark/src/main/scala/org/apache/carbondata/spark/rdd/CarbonTableCompactor.scala
##########
@@ -276,7 +290,25 @@ class CarbonTableCompactor(
           segmentMetaDataAccumulator)
       } else {
         if (mergeRDD != null) {
-          mergeRDD.collect
+          val result = mergeRDD.collect

Review comment:
       Made changes to keep old and new partition lists so that the partitionSpec in `compactedPartitions` can be replaced.




[GitHub] [carbondata] Indhumathi27 commented on a change in pull request #4107: [CARBONDATA-4149] Query with SI after add partition based on location on partition table gives incorrect results

GitBox

Indhumathi27 commented on a change in pull request #4107:
URL: https://github.com/apache/carbondata/pull/4107#discussion_r597409507



##########
File path: integration/spark/src/main/scala/org/apache/carbondata/spark/rdd/CarbonTableCompactor.scala
##########
@@ -276,7 +294,40 @@ class CarbonTableCompactor(
           segmentMetaDataAccumulator)
       } else {
         if (mergeRDD != null) {
-          mergeRDD.collect
+          val result = mergeRDD.collect
+          if (!updatePartitionSpecs.isEmpty) {
+            val tableIdentifier = new TableIdentifier(carbonTable.getTableName,
+              Some(carbonTable.getDatabaseName))
+            // To update partitionSpec in hive metastore, drop and add with latest path.
+            val oldPartitions: util.List[TablePartitionSpec] =
+              new util.ArrayList[TablePartitionSpec]()
+            val newPartitions: util.List[TablePartitionSpec] =
+              new util.ArrayList[TablePartitionSpec]()
+            updatePartitionSpecs.asScala.foreach {
+              partitionSpec =>
+                var spec = PartitioningUtils.parsePathFragment(
+                  String.join(CarbonCommonConstants.FILE_SEPARATOR, partitionSpec.getPartitions))
+                oldPartitions.add(spec)
+                val addPartition = mergeRDD.checkAndUpdatePartitionLocation(partitionSpec)
+                spec = PartitioningUtils.parsePathFragment(

Review comment:
        partitionSpec.getPartitions and addPartition.getPartitions will be the same, so please remove the oldPartitions and newPartitions lists and keep only one.
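
The reasoning behind this comment can be shown with a small sketch: moving a partition to a new location changes only its path, not its column values, so the parsed spec is identical before and after. Everything here is hypothetical. `PartSpec` stands in for `PartitionSpec`, `toKey` is a simplified stand-in for Spark's `PartitioningUtils.parsePathFragment` (no unescaping), and the `tablePath/col=value` layout in `relocate` is an assumed Hive-style directory convention.

```scala
// Hypothetical stand-in for PartitionSpec: "col=value" fragments plus a location.
final case class PartSpec(partitions: Seq[String], location: String)

object PartitionKey {
  // Parse "col=value" fragments into a column -> value map.
  // Simplified: throws a MatchError if a fragment has no '='.
  def toKey(spec: PartSpec): Map[String, String] =
    spec.partitions.map { fragment =>
      val Array(column, value) = fragment.split("=", 2)
      column -> value
    }.toMap

  // Point the spec at a directory under the table path (assumed layout);
  // the partition fragments themselves are left untouched.
  def relocate(spec: PartSpec, tablePath: String): PartSpec =
    spec.copy(location = tablePath + "/" + spec.partitions.mkString("/"))
}
```

Because `relocate` leaves `partitions` unchanged, `toKey` gives the same map for the original and the relocated spec, which is why a single list of parsed specs is enough for both the drop and the add.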




[GitHub] [carbondata] ShreelekhyaG commented on a change in pull request #4107: [CARBONDATA-4149] Query with SI after add partition based on location on partition table gives incorrect results

GitBox

ShreelekhyaG commented on a change in pull request #4107:
URL: https://github.com/apache/carbondata/pull/4107#discussion_r597429391



##########
File path: integration/spark/src/main/scala/org/apache/carbondata/spark/rdd/CarbonTableCompactor.scala
##########
@@ -276,7 +294,40 @@ class CarbonTableCompactor(
           segmentMetaDataAccumulator)
       } else {
         if (mergeRDD != null) {
-          mergeRDD.collect
+          val result = mergeRDD.collect
+          if (!updatePartitionSpecs.isEmpty) {
+            val tableIdentifier = new TableIdentifier(carbonTable.getTableName,
+              Some(carbonTable.getDatabaseName))
+            // To update partitionSpec in hive metastore, drop and add with latest path.
+            val oldPartitions: util.List[TablePartitionSpec] =
+              new util.ArrayList[TablePartitionSpec]()
+            val newPartitions: util.List[TablePartitionSpec] =
+              new util.ArrayList[TablePartitionSpec]()
+            updatePartitionSpecs.asScala.foreach {
+              partitionSpec =>
+                var spec = PartitioningUtils.parsePathFragment(
+                  String.join(CarbonCommonConstants.FILE_SEPARATOR, partitionSpec.getPartitions))
+                oldPartitions.add(spec)
+                val addPartition = mergeRDD.checkAndUpdatePartitionLocation(partitionSpec)
+                spec = PartitioningUtils.parsePathFragment(

Review comment:
       Done




[GitHub] [carbondata] Indhumathi27 commented on pull request #4107: [CARBONDATA-4149] Query with SI after add partition based on location on partition table gives incorrect results

GitBox

Indhumathi27 commented on pull request #4107:
URL: https://github.com/apache/carbondata/pull/4107#issuecomment-802599633


   @ShreelekhyaG please update the PR description


[GitHub] [carbondata] Indhumathi27 commented on pull request #4107: [CARBONDATA-4149] Fix query issues after alter add partition.

GitBox

Indhumathi27 commented on pull request #4107:
URL: https://github.com/apache/carbondata/pull/4107#issuecomment-802666444


   LGTM


[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4107: [CARBONDATA-4149] Fix query issues after alter add partition.

GitBox

CarbonDataQA2 commented on pull request #4107:
URL: https://github.com/apache/carbondata/pull/4107#issuecomment-802717935


   Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12444/job/ApacheCarbonPRBuilder2.3/5602/
   


[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4107: [CARBONDATA-4149] Fix query issues after alter add partition.

GitBox

CarbonDataQA2 commented on pull request #4107:
URL: https://github.com/apache/carbondata/pull/4107#issuecomment-802720450


   Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12444/job/ApacheCarbon_PR_Builder_2.4.5/3836/
   


[GitHub] [carbondata] asfgit closed pull request #4107: [CARBONDATA-4149] Fix query issues after alter add partition.

GitBox

asfgit closed pull request #4107:
URL: https://github.com/apache/carbondata/pull/4107


   

