[GitHub] [carbondata] Indhumathi27 opened a new pull request #3707: [WIP] Refactor code to optimize partition pruning

[GitHub] [carbondata] Indhumathi27 opened a new pull request #3707: [WIP] Refactor code to optimize partition pruning

GitBox
Indhumathi27 opened a new pull request #3707: [WIP] Refactor code to optimize partition pruning
URL: https://github.com/apache/carbondata/pull/3707
 
 
    ### Why is this PR needed?
   
   
    ### What changes were proposed in this PR?
   
       
    ### Does this PR introduce any user interface change?
    - No
    - Yes. (please explain the change and update document)
   
    ### Is any new testcase added?
    - No
    - Yes
   
       
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[hidden email]


With regards,
Apache Git Services

[GitHub] [carbondata] CarbonDataQA1 commented on issue #3707: [WIP] Refactor code to optimize partition pruning

GitBox
CarbonDataQA1 commented on issue #3707: [WIP] Refactor code to optimize partition pruning
URL: https://github.com/apache/carbondata/pull/3707#issuecomment-613380026
 
 
   Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1022/
   


[GitHub] [carbondata] CarbonDataQA1 commented on issue #3707: [WIP] Refactor code to optimize partition pruning

GitBox
In reply to this post by GitBox
CarbonDataQA1 commented on issue #3707: [WIP] Refactor code to optimize partition pruning
URL: https://github.com/apache/carbondata/pull/3707#issuecomment-613380450
 
 
   Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/2734/
   


[GitHub] [carbondata] CarbonDataQA1 commented on issue #3707: [WIP] Refactor code to optimize partition pruning

GitBox
In reply to this post by GitBox
CarbonDataQA1 commented on issue #3707: [WIP] Refactor code to optimize partition pruning
URL: https://github.com/apache/carbondata/pull/3707#issuecomment-613388758
 
 
   Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1023/
   


[GitHub] [carbondata] CarbonDataQA1 commented on issue #3707: [WIP] Refactor code to optimize partition pruning

GitBox
In reply to this post by GitBox
CarbonDataQA1 commented on issue #3707: [WIP] Refactor code to optimize partition pruning
URL: https://github.com/apache/carbondata/pull/3707#issuecomment-613389272
 
 
   Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/2735/
   


[GitHub] [carbondata] CarbonDataQA1 commented on issue #3707: [WIP] Refactor code to optimize partition pruning

GitBox
In reply to this post by GitBox
CarbonDataQA1 commented on issue #3707: [WIP] Refactor code to optimize partition pruning
URL: https://github.com/apache/carbondata/pull/3707#issuecomment-613451301
 
 
   Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/2737/
   


[GitHub] [carbondata] CarbonDataQA1 commented on issue #3707: [WIP] Refactor code to optimize partition pruning

GitBox
In reply to this post by GitBox
CarbonDataQA1 commented on issue #3707: [WIP] Refactor code to optimize partition pruning
URL: https://github.com/apache/carbondata/pull/3707#issuecomment-613466843
 
 
   Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1025/
   


[GitHub] [carbondata] marchpure commented on a change in pull request #3707: [HOTFIX] Refactor code to optimize partition pruning

GitBox
In reply to this post by GitBox
marchpure commented on a change in pull request #3707: [HOTFIX] Refactor code to optimize partition pruning
URL: https://github.com/apache/carbondata/pull/3707#discussion_r409700166
 
 

 ##########
 File path: core/src/main/java/org/apache/carbondata/core/indexstore/blockletindex/BlockletIndexFactory.java
 ##########
 @@ -193,21 +192,18 @@ public IndexBuilder createBuilder(Segment segment, String shardName,
    * get tableBlockUniqueIdentifierWrappers from segment info. If partitionsToPrune is defined,
    * then get tableBlockUniqueIdentifierWrappers for the matched partitions.
    */
-  private void getTableBlockUniqueIdentifierWrappers(List<PartitionSpec> partitionsToPrune,
+  private void getTableBlockUniqueIdentifierWrappers(Set<String> partitionsToPrune,
       List<TableBlockIndexUniqueIdentifierWrapper> tableBlockIndexUniqueIdentifierWrappers,
       Set<TableBlockIndexUniqueIdentifier> identifiers) {
     for (TableBlockIndexUniqueIdentifier tableBlockIndexUniqueIdentifier : identifiers) {
-      if (null != partitionsToPrune) {
+      if (!partitionsToPrune.isEmpty()) {
         // add only tableBlockUniqueIdentifier that matches the partition
         // get the indexFile Parent path and compare with the PartitionPath, if matches, then add
         // the corresponding tableBlockIndexUniqueIdentifier for pruning
-        for (PartitionSpec partitionSpec : partitionsToPrune) {
-          if (partitionSpec.getLocation().toString()
-              .equalsIgnoreCase(tableBlockIndexUniqueIdentifier.getIndexFilePath())) {
-            tableBlockIndexUniqueIdentifierWrappers.add(
-                new TableBlockIndexUniqueIdentifierWrapper(tableBlockIndexUniqueIdentifier,
-                    this.getCarbonTable()));
-          }
+        if (partitionsToPrune.contains(tableBlockIndexUniqueIdentifier.getIndexFilePath())) {
 
 Review comment:
   My test of this change failed.
   In this DLI Spark environment, the index file path is "obs://bucktable//tablename//partitionname",
   while the partitionSpec's location is "obs://bucktable/tablename/partitionname".
   Because the two strings differ, the contains() lookup never matches, so pruning returns no blocklets when a partition table is queried.
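
   The mismatch described above can be reproduced outside CarbonData. The sketch below is only an illustration under stated assumptions, not the fix adopted in this PR: `PartitionPathMatch`, `normalize`, and `matches` are hypothetical names, and the only rule assumed is that repeated slashes after the "scheme://" prefix (and a trailing slash) carry no meaning. It normalizes both sides before the Set lookup so that the two paths quoted in the review compare equal. Note also that the removed code compared with equalsIgnoreCase, whereas Set.contains is case-sensitive; the sketch does not change case.

```java
import java.util.HashSet;
import java.util.Set;

public class PartitionPathMatch {

  // Hypothetical helper (not a CarbonData API): collapse runs of '/' in the
  // part after "scheme://" and drop a trailing '/', leaving the scheme intact.
  static String normalize(String path) {
    int schemeEnd = path.indexOf("://");
    String prefix = schemeEnd >= 0 ? path.substring(0, schemeEnd + 3) : "";
    String rest = schemeEnd >= 0 ? path.substring(schemeEnd + 3) : path;
    String collapsed = rest.replaceAll("/{2,}", "/");
    if (collapsed.length() > 1 && collapsed.endsWith("/")) {
      collapsed = collapsed.substring(0, collapsed.length() - 1);
    }
    return prefix + collapsed;
  }

  // Assumes every entry of normalizedPartitions was already passed through normalize().
  static boolean matches(Set<String> normalizedPartitions, String indexFilePath) {
    return normalizedPartitions.contains(normalize(indexFilePath));
  }

  public static void main(String[] args) {
    Set<String> partitionsToPrune = new HashSet<>();
    partitionsToPrune.add(normalize("obs://bucktable/tablename/partitionname"));

    String indexFilePath = "obs://bucktable//tablename//partitionname";

    // Raw lookup, as in the diff above: the doubled slashes defeat the match.
    System.out.println(partitionsToPrune.contains(indexFilePath)); // prints false

    // Lookup after normalizing the index file path.
    System.out.println(matches(partitionsToPrune, indexFilePath)); // prints true
  }
}
```

   Normalizing once when the set is built keeps the per-identifier cost at a single hash lookup, which is the point of replacing the nested loop with a Set.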



[GitHub] [carbondata] CarbonDataQA1 commented on issue #3707: [HOTFIX] Refactor code to optimize partition pruning

GitBox
In reply to this post by GitBox
CarbonDataQA1 commented on issue #3707: [HOTFIX] Refactor code to optimize partition pruning
URL: https://github.com/apache/carbondata/pull/3707#issuecomment-615124684
 
 
   Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/2763/
   


[GitHub] [carbondata] Indhumathi27 commented on issue #3707: [HOTFIX] Refactor code to optimize partition pruning

GitBox
In reply to this post by GitBox
Indhumathi27 commented on issue #3707: [HOTFIX] Refactor code to optimize partition pruning
URL: https://github.com/apache/carbondata/pull/3707#issuecomment-615130101
 
 
   retest this please


[GitHub] [carbondata] CarbonDataQA1 commented on issue #3707: [HOTFIX] Refactor code to optimize partition pruning

GitBox
In reply to this post by GitBox
CarbonDataQA1 commented on issue #3707: [HOTFIX] Refactor code to optimize partition pruning
URL: https://github.com/apache/carbondata/pull/3707#issuecomment-615183039
 
 
   Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/2766/
   


[GitHub] [carbondata] CarbonDataQA1 commented on issue #3707: [HOTFIX] Refactor code to optimize partition pruning

GitBox
In reply to this post by GitBox
CarbonDataQA1 commented on issue #3707: [HOTFIX] Refactor code to optimize partition pruning
URL: https://github.com/apache/carbondata/pull/3707#issuecomment-615189338
 
 
   Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1053/
   


[GitHub] [carbondata] CarbonDataQA1 commented on issue #3707: [HOTFIX] Refactor code to optimize partition pruning

GitBox
In reply to this post by GitBox
CarbonDataQA1 commented on issue #3707: [HOTFIX] Refactor code to optimize partition pruning
URL: https://github.com/apache/carbondata/pull/3707#issuecomment-615310491
 
 
   Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1055/
   


[GitHub] [carbondata] CarbonDataQA1 commented on issue #3707: [HOTFIX] Refactor code to optimize partition pruning

GitBox
In reply to this post by GitBox
CarbonDataQA1 commented on issue #3707: [HOTFIX] Refactor code to optimize partition pruning
URL: https://github.com/apache/carbondata/pull/3707#issuecomment-615311079
 
 
   Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/2768/
   


[GitHub] [carbondata] CarbonDataQA1 commented on issue #3707: [HOTFIX] Refactor code to optimize partition pruning

GitBox
In reply to this post by GitBox
CarbonDataQA1 commented on issue #3707: [HOTFIX] Refactor code to optimize partition pruning
URL: https://github.com/apache/carbondata/pull/3707#issuecomment-615313202
 
 
   Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1056/
   


[GitHub] [carbondata] CarbonDataQA1 commented on issue #3707: [HOTFIX] Refactor code to optimize partition pruning

GitBox
In reply to this post by GitBox
CarbonDataQA1 commented on issue #3707: [HOTFIX] Refactor code to optimize partition pruning
URL: https://github.com/apache/carbondata/pull/3707#issuecomment-615313796
 
 
   Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/2769/
   


[GitHub] [carbondata] CarbonDataQA1 commented on issue #3707: [HOTFIX] Refactor code to optimize partition pruning

GitBox
In reply to this post by GitBox
CarbonDataQA1 commented on issue #3707: [HOTFIX] Refactor code to optimize partition pruning
URL: https://github.com/apache/carbondata/pull/3707#issuecomment-615385176
 
 
   Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/2771/
   


[GitHub] [carbondata] CarbonDataQA1 commented on issue #3707: [HOTFIX] Refactor code to optimize partition pruning

GitBox
In reply to this post by GitBox
CarbonDataQA1 commented on issue #3707: [HOTFIX] Refactor code to optimize partition pruning
URL: https://github.com/apache/carbondata/pull/3707#issuecomment-615389497
 
 
   Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1058/
   
