[GitHub] [carbondata] Indhumathi27 opened a new pull request #3748: [WIP]Query on partition table with SI having multiple partitons gives empty results

classic Classic list List threaded Threaded
17 messages Options
Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] Indhumathi27 opened a new pull request #3748: [WIP]Query on partition table with SI having multiple partitons gives empty results

GitBox

Indhumathi27 opened a new pull request #3748:
URL: https://github.com/apache/carbondata/pull/3748


   
   
    ### Why is this PR needed?
   
   
    ### What changes were proposed in this PR?
   
       
    ### Does this PR introduce any user interface change?
    - No
    - Yes. (please explain the change and update document)
   
    ### Is any new testcase added?
    - No
    - Yes
   
       
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3748: [WIP]Query on partition table with SI having multiple partiton columns gives empty results

GitBox

CarbonDataQA1 commented on pull request #3748:
URL: https://github.com/apache/carbondata/pull/3748#issuecomment-624628356


   Build Failed  with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1234/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3748: [WIP]Query on partition table with SI having multiple partiton columns gives empty results

GitBox
In reply to this post by GitBox

CarbonDataQA1 commented on pull request #3748:
URL: https://github.com/apache/carbondata/pull/3748#issuecomment-624634789


   Build Failed  with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/2952/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3748: [CARBONDATA-3801] Query on partition table with SI having multiple partition columns gives empty results

GitBox
In reply to this post by GitBox

CarbonDataQA1 commented on pull request #3748:
URL: https://github.com/apache/carbondata/pull/3748#issuecomment-624720597


   Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1241/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3748: [CARBONDATA-3801] Query on partition table with SI having multiple partition columns gives empty results

GitBox
In reply to this post by GitBox

CarbonDataQA1 commented on pull request #3748:
URL: https://github.com/apache/carbondata/pull/3748#issuecomment-624740141


   Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/2959/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3748: [CARBONDATA-3801] Query on partition table with SI having multiple partition columns gives empty results

GitBox
In reply to this post by GitBox

CarbonDataQA1 commented on pull request #3748:
URL: https://github.com/apache/carbondata/pull/3748#issuecomment-625300724


   Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/2970/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3748: [CARBONDATA-3801] Query on partition table with SI having multiple partition columns gives empty results

GitBox
In reply to this post by GitBox

CarbonDataQA1 commented on pull request #3748:
URL: https://github.com/apache/carbondata/pull/3748#issuecomment-625301427


   Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1252/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3748: [CARBONDATA-3801][CARBONDATA-3809] Query on partition table with SI having multiple partition columns gives empty results

GitBox
In reply to this post by GitBox

CarbonDataQA1 commented on pull request #3748:
URL: https://github.com/apache/carbondata/pull/3748#issuecomment-625724593


   Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1259/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3748: [CARBONDATA-3801][CARBONDATA-3809] Query on partition table with SI having multiple partition columns gives empty results

GitBox
In reply to this post by GitBox

CarbonDataQA1 commented on pull request #3748:
URL: https://github.com/apache/carbondata/pull/3748#issuecomment-625726171


   Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/2977/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] Indhumathi27 commented on pull request #3748: [CARBONDATA-3801][CARBONDATA-3805][CARBONDATA-3809] Query on partition table with SI having multiple partition columns gives empty results

GitBox
In reply to this post by GitBox

Indhumathi27 commented on pull request #3748:
URL: https://github.com/apache/carbondata/pull/3748#issuecomment-626147614


   @akashrn5 @kunal642 Please help to review and merge


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] akashrn5 commented on a change in pull request #3748: [CARBONDATA-3801][CARBONDATA-3805][CARBONDATA-3809] Query on partition table with SI having multiple partition columns gives empty results

GitBox
In reply to this post by GitBox

akashrn5 commented on a change in pull request #3748:
URL: https://github.com/apache/carbondata/pull/3748#discussion_r422862534



##########
File path: core/src/main/java/org/apache/carbondata/core/datastore/block/SegmentPropertiesAndSchemaHolder.java
##########
@@ -297,9 +298,11 @@ public void addSegmentId(int segmentPropertiesIndex, String segmentId) {
     private CarbonRowSchema[] fileFooterEntrySchemaForBlock;
     private CarbonRowSchema[] fileFooterEntrySchemaForBlocklet;
 
-    public SegmentPropertiesWrapper(CarbonTable carbonTable, List<ColumnSchema> columnsInTable) {
+    public SegmentPropertiesWrapper(CarbonTable carbonTable, List<ColumnSchema> columnsInTable,
+        String segmentId) {

Review comment:
       I think this change is not required, because, for the partition table which is transactional table, segment id does not matter. This change will lead to unnecessary  changes. you are passing this just to pass the segment Id to getBlockId method. But if you see the method, for transactional partition table flow segment id not required to get the blockID, so please remove this change and can just pass empty string to getBlockId method.

##########
File path: core/src/main/java/org/apache/carbondata/core/indexstore/blockletindex/BlockIndex.java
##########
@@ -793,9 +794,14 @@ private boolean addBlockBasedOnMinMaxValue(FilterExecuter filterExecuter, byte[]
     BitSet bitSet = null;
     if (filterExecuter instanceof ImplicitColumnFilterExecutor) {
       String uniqueBlockPath;
-      if (segmentPropertiesWrapper.getCarbonTable().isHivePartitionTable()) {
-        uniqueBlockPath = filePath
-            .substring(segmentPropertiesWrapper.getCarbonTable().getTablePath().length() + 1);
+      String blockId = null;
+      CarbonTable carbonTable = segmentPropertiesWrapper.getCarbonTable();
+      if (carbonTable.isHivePartitionTable()) {
+        uniqueBlockPath = filePath.substring(carbonTable.getTablePath().length() + 1);
+        boolean isStandardTable = CarbonUtil.isStandardCarbonTable(carbonTable);
+        blockId = CarbonUtil.getBlockId(carbonTable.getAbsoluteTableIdentifier(), filePath,
+            segmentPropertiesWrapper.getSegmentId(), carbonTable.isTransactionalTable(),
+            isStandardTable, carbonTable.isHivePartitionTable());

Review comment:
       here better to get the blockid and assign to uniqueBlockPath, which will take care later in query flow to find shortblockid, no need to calculate here and pass ahead which will make us to change all the interfaces.

##########
File path: core/src/main/java/org/apache/carbondata/core/scan/filter/executer/ImplicitColumnFilterExecutor.java
##########
@@ -35,7 +35,7 @@
    * @return
    */
   BitSet isFilterValuesPresentInBlockOrBlocklet(byte[][] maxValue, byte[][] minValue,
-      String uniqueBlockPath, boolean[] isMinMaxSet);
+      String uniqueBlockPath, boolean[] isMinMaxSet, String shortBlockId);

Review comment:
       So once after the above comment fix, all the interface changes can be removed.

##########
File path: core/src/test/java/org/apache/carbondata/core/indexstore/blockletindex/TestBlockletIndex.java
##########
@@ -54,7 +54,7 @@
 
     new MockUp<ImplicitIncludeFilterExecutorImpl>() {
       @Mock BitSet isFilterValuesPresentInBlockOrBlocklet(byte[][] maxValue, byte[][] minValue,
-          String uniqueBlockPath, boolean[] isMinMaxSet) {
+          String uniqueBlockPath, boolean[] isMinMaxSet, String shortBlockId) {

Review comment:
       revert these




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] Indhumathi27 commented on pull request #3748: [CARBONDATA-3801][CARBONDATA-3805][CARBONDATA-3809] Query on partition table with SI having multiple partition columns gives empty results

GitBox
In reply to this post by GitBox

Indhumathi27 commented on pull request #3748:
URL: https://github.com/apache/carbondata/pull/3748#issuecomment-626579556


   @akashrn5 Handled al comments. Please check


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] akashrn5 commented on a change in pull request #3748: [CARBONDATA-3801][CARBONDATA-3805][CARBONDATA-3809] Query on partition table with SI having multiple partition columns gives empty results

GitBox
In reply to this post by GitBox

akashrn5 commented on a change in pull request #3748:
URL: https://github.com/apache/carbondata/pull/3748#discussion_r422918566



##########
File path: index/secondary-index/src/test/scala/org/apache/carbondata/spark/testsuite/secondaryindex/TestSIWithPartition.scala
##########
@@ -356,8 +356,32 @@ class TestSIWithPartition extends QueryTest with BeforeAndAfterAll {
     }
   }
 
+  test("test secondary index with partition table having mutiple partition columns") {
+    sql("drop table if exists partition_table")
+    sql(s"""
+         | CREATE TABLE partition_table (stringField string, intField int, shortField short, stringField1 string)
+         | STORED AS carbondata
+         | PARTITIONED BY (hour_ string, date_ string, sec_ string)
+         | TBLPROPERTIES ('SORT_COLUMNS'='hour_,date_,stringField', 'SORT_SCOPE'='GLOBAL_SORT')
+      """.stripMargin
+    ).collect()
+    sql(s"drop index if exists si_1 on partition_table")
+    sql(s"create index si_1 on partition_table(stringField1) as 'carbondata'")

Review comment:
       please give some meaningful name to SI based on the test scenario.

##########
File path: index/secondary-index/src/test/scala/org/apache/carbondata/spark/testsuite/secondaryindex/TestSIWithPartition.scala
##########
@@ -356,8 +356,32 @@ class TestSIWithPartition extends QueryTest with BeforeAndAfterAll {
     }
   }
 
+  test("test secondary index with partition table having mutiple partition columns") {
+    sql("drop table if exists partition_table")
+    sql(s"""
+         | CREATE TABLE partition_table (stringField string, intField int, shortField short, stringField1 string)
+         | STORED AS carbondata
+         | PARTITIONED BY (hour_ string, date_ string, sec_ string)
+         | TBLPROPERTIES ('SORT_COLUMNS'='hour_,date_,stringField', 'SORT_SCOPE'='GLOBAL_SORT')
+      """.stripMargin
+    ).collect()

Review comment:
       remove the collect

##########
File path: core/src/main/java/org/apache/carbondata/core/indexstore/blockletindex/BlockIndex.java
##########
@@ -793,9 +794,12 @@ private boolean addBlockBasedOnMinMaxValue(FilterExecuter filterExecuter, byte[]
     BitSet bitSet = null;
     if (filterExecuter instanceof ImplicitColumnFilterExecutor) {
       String uniqueBlockPath;
-      if (segmentPropertiesWrapper.getCarbonTable().isHivePartitionTable()) {
-        uniqueBlockPath = filePath
-            .substring(segmentPropertiesWrapper.getCarbonTable().getTablePath().length() + 1);
+      CarbonTable carbonTable = segmentPropertiesWrapper.getCarbonTable();
+      if (carbonTable.isHivePartitionTable()) {
+        boolean isStandardTable = CarbonUtil.isStandardCarbonTable(carbonTable);

Review comment:
       please add a comment explaining the scenario here with example.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] Indhumathi27 commented on a change in pull request #3748: [CARBONDATA-3801][CARBONDATA-3805][CARBONDATA-3809] Query on partition table with SI having multiple partition columns gives empty results

GitBox
In reply to this post by GitBox

Indhumathi27 commented on a change in pull request #3748:
URL: https://github.com/apache/carbondata/pull/3748#discussion_r422927143



##########
File path: index/secondary-index/src/test/scala/org/apache/carbondata/spark/testsuite/secondaryindex/TestSIWithPartition.scala
##########
@@ -356,8 +356,32 @@ class TestSIWithPartition extends QueryTest with BeforeAndAfterAll {
     }
   }
 
+  test("test secondary index with partition table having mutiple partition columns") {
+    sql("drop table if exists partition_table")
+    sql(s"""
+         | CREATE TABLE partition_table (stringField string, intField int, shortField short, stringField1 string)
+         | STORED AS carbondata
+         | PARTITIONED BY (hour_ string, date_ string, sec_ string)
+         | TBLPROPERTIES ('SORT_COLUMNS'='hour_,date_,stringField', 'SORT_SCOPE'='GLOBAL_SORT')
+      """.stripMargin
+    ).collect()
+    sql(s"drop index if exists si_1 on partition_table")
+    sql(s"create index si_1 on partition_table(stringField1) as 'carbondata'")

Review comment:
       hmm




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] Indhumathi27 commented on a change in pull request #3748: [CARBONDATA-3801][CARBONDATA-3805][CARBONDATA-3809] Query on partition table with SI having multiple partition columns gives empty results

GitBox
In reply to this post by GitBox

Indhumathi27 commented on a change in pull request #3748:
URL: https://github.com/apache/carbondata/pull/3748#discussion_r422934984



##########
File path: core/src/main/java/org/apache/carbondata/core/indexstore/blockletindex/BlockIndex.java
##########
@@ -793,9 +794,12 @@ private boolean addBlockBasedOnMinMaxValue(FilterExecuter filterExecuter, byte[]
     BitSet bitSet = null;
     if (filterExecuter instanceof ImplicitColumnFilterExecutor) {
       String uniqueBlockPath;
-      if (segmentPropertiesWrapper.getCarbonTable().isHivePartitionTable()) {
-        uniqueBlockPath = filePath
-            .substring(segmentPropertiesWrapper.getCarbonTable().getTablePath().length() + 1);
+      CarbonTable carbonTable = segmentPropertiesWrapper.getCarbonTable();
+      if (carbonTable.isHivePartitionTable()) {
+        boolean isStandardTable = CarbonUtil.isStandardCarbonTable(carbonTable);

Review comment:
       added

##########
File path: index/secondary-index/src/test/scala/org/apache/carbondata/spark/testsuite/secondaryindex/TestSIWithPartition.scala
##########
@@ -356,8 +356,32 @@ class TestSIWithPartition extends QueryTest with BeforeAndAfterAll {
     }
   }
 
+  test("test secondary index with partition table having mutiple partition columns") {
+    sql("drop table if exists partition_table")
+    sql(s"""
+         | CREATE TABLE partition_table (stringField string, intField int, shortField short, stringField1 string)
+         | STORED AS carbondata
+         | PARTITIONED BY (hour_ string, date_ string, sec_ string)
+         | TBLPROPERTIES ('SORT_COLUMNS'='hour_,date_,stringField', 'SORT_SCOPE'='GLOBAL_SORT')
+      """.stripMargin
+    ).collect()

Review comment:
       removed




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3748: [CARBONDATA-3801][CARBONDATA-3805][CARBONDATA-3809] Query on partition table with SI having multiple partition columns gives empty results

GitBox
In reply to this post by GitBox

CarbonDataQA1 commented on pull request #3748:
URL: https://github.com/apache/carbondata/pull/3748#issuecomment-626672341


   Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/2998/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3748: [CARBONDATA-3801][CARBONDATA-3805][CARBONDATA-3809] Query on partition table with SI having multiple partition columns gives empty results

GitBox
In reply to this post by GitBox

CarbonDataQA1 commented on pull request #3748:
URL: https://github.com/apache/carbondata/pull/3748#issuecomment-626674438


   Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1279/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]