[GitHub] [carbondata] ajantha-bhat opened a new pull request #3913: [WIP] Improve partition pruning performance in presto carbon integration


[GitHub] [carbondata] ajantha-bhat opened a new pull request #3913: [WIP] Improve partition pruning performance in presto carbon integration

GitBox

ajantha-bhat opened a new pull request #3913:
URL: https://github.com/apache/carbondata/pull/3913


    ### Why is this PR needed?
   a) For a table with 200K segments stored in the cloud, a Presto partition query took more than 5 hours, because it read all the segment files for partition pruning. With this change, the same query takes less than a minute.
   
    ### What changes were proposed in this PR?
   a) HiveTableHandle already has the partition specs that match the filters (Presto has already queried the metastore for all partitions and pruned them). So, create the partitionSpec from that instead of reading the segment files. This is handled for both prestodb and prestosql.
   b) #3885 broke prestodb compilation (only prestosql is compiled).
   c) #3887 also did not handle prestodb.
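The idea in (a) can be sketched as follows. This is a minimal illustration, not the code from this PR: `PartitionSpecSketch`, `PartitionPruningSketch`, and `toPartitionSpecs` are hypothetical names, and the sketch assumes the engine hands over already-pruned partition names in Hive's `key=value/key=value` form, with each partition directory sitting directly under the table location. Escaping of special characters in partition values is ignored.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Hypothetical stand-in for a partition spec: partition key=value pairs plus the directory. */
final class PartitionSpecSketch {
  final List<String> partitions; // e.g. ["country=US", "year=2020"]
  final String location;         // e.g. "/warehouse/sales/country=US/year=2020"

  PartitionSpecSketch(List<String> partitions, String location) {
    this.partitions = Collections.unmodifiableList(new ArrayList<>(partitions));
    this.location = location;
  }
}

final class PartitionPruningSketch {
  /**
   * Builds partition specs from partition names that were already pruned
   * against the metastore, so no segment file has to be opened to decide
   * which partitions match the filter.
   */
  static List<PartitionSpecSketch> toPartitionSpecs(
      String tableLocation, List<String> prunedPartitionNames) {
    List<PartitionSpecSketch> specs = new ArrayList<>();
    for (String name : prunedPartitionNames) {
      // "country=US/year=2020" -> ["country=US", "year=2020"]
      List<String> keyValues = Arrays.asList(name.split("/"));
      specs.add(new PartitionSpecSketch(keyValues, tableLocation + "/" + name));
    }
    return specs;
  }

  public static void main(String[] args) {
    List<PartitionSpecSketch> specs = toPartitionSpecs(
        "/warehouse/sales",
        Arrays.asList("country=US/year=2020", "country=IN/year=2020"));
    for (PartitionSpecSketch spec : specs) {
      System.out.println(spec.partitions + " -> " + spec.location);
    }
  }
}
```

The cost of this step depends only on the number of matching partitions, not on the number of segments in the table, which is where the saving described above would come from.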
   
       
    ### Does this PR introduce any user interface change?
    - No
   
   
    ### Is any new testcase added?
    - No. [TODO: need to add Spark support and create better UTs for Presto.]
   Verified manually.
   
   
       
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]



[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3913: [WIP] Improve partition pruning performance in presto carbon integration

GitBox

CarbonDataQA1 commented on pull request #3913:
URL: https://github.com/apache/carbondata/pull/3913#issuecomment-687331617


   Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3979/
   



[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3913: [WIP] Improve partition pruning performance in presto carbon integration

GitBox
In reply to this post by GitBox

CarbonDataQA1 commented on pull request #3913:
URL: https://github.com/apache/carbondata/pull/3913#issuecomment-687335279


   Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2239/
   



[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3913: [CARBONDATA-3974] Improve partition pruning performance in presto carbon integration

GitBox
In reply to this post by GitBox

CarbonDataQA1 commented on pull request #3913:
URL: https://github.com/apache/carbondata/pull/3913#issuecomment-688316413


   Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3984/
   



[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3913: [CARBONDATA-3974] Improve partition pruning performance in presto carbon integration

GitBox
In reply to this post by GitBox

CarbonDataQA1 commented on pull request #3913:
URL: https://github.com/apache/carbondata/pull/3913#issuecomment-688323439


   Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2244/
   



[GitHub] [carbondata] ajantha-bhat commented on pull request #3913: [CARBONDATA-3974] Improve partition pruning performance in presto carbon integration

GitBox
In reply to this post by GitBox

ajantha-bhat commented on pull request #3913:
URL: https://github.com/apache/carbondata/pull/3913#issuecomment-688403371


   retest this please



[GitHub] [carbondata] ajantha-bhat commented on pull request #3913: [CARBONDATA-3974] Improve partition pruning performance in presto carbon integration

GitBox
In reply to this post by GitBox

ajantha-bhat commented on pull request #3913:
URL: https://github.com/apache/carbondata/pull/3913#issuecomment-688403784


   PR is ready. Please review @QiangCai , @kunal642



[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3913: [CARBONDATA-3974] Improve partition pruning performance in presto carbon integration

GitBox
In reply to this post by GitBox

CarbonDataQA1 commented on pull request #3913:
URL: https://github.com/apache/carbondata/pull/3913#issuecomment-688455154


   Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3990/
   



[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3913: [CARBONDATA-3974] Improve partition pruning performance in presto carbon integration

GitBox
In reply to this post by GitBox

CarbonDataQA1 commented on pull request #3913:
URL: https://github.com/apache/carbondata/pull/3913#issuecomment-688455954


   Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2251/
   



[GitHub] [carbondata] marchpure commented on pull request #3913: [CARBONDATA-3974] Improve partition pruning performance in presto carbon integration

GitBox
In reply to this post by GitBox

marchpure commented on pull request #3913:
URL: https://github.com/apache/carbondata/pull/3913#issuecomment-688802921


   retest this please



[GitHub] [carbondata] Indhumathi27 commented on a change in pull request #3913: [CARBONDATA-3974] Improve partition pruning performance in presto carbon integration

GitBox
In reply to this post by GitBox

Indhumathi27 commented on a change in pull request #3913:
URL: https://github.com/apache/carbondata/pull/3913#discussion_r484845416



##########
File path: integration/presto/src/main/prestodb/org/apache/carbondata/presto/CarbondataSplitManager.java
##########
@@ -117,6 +122,16 @@ public ConnectorSplitSource getSplits(ConnectorTransactionHandle transactionHand
       // file metastore case tablePath can be null, so get from location
       location = table.getStorage().getLocation();
     }
+    List<PartitionSpec> filteredPartitions = new ArrayList<>();

Review comment:
       Can you add a test case with a partition filter?





[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #3913: [CARBONDATA-3974] Improve partition pruning performance in presto carbon integration

GitBox
In reply to this post by GitBox

ajantha-bhat commented on a change in pull request #3913:
URL: https://github.com/apache/carbondata/pull/3913#discussion_r484845973



##########
File path: integration/presto/src/main/prestodb/org/apache/carbondata/presto/CarbondataSplitManager.java
##########
@@ -117,6 +122,16 @@ public ConnectorSplitSource getSplits(ConnectorTransactionHandle transactionHand
       // file metastore case tablePath can be null, so get from location
       location = table.getStorage().getLocation();
     }
+    List<PartitionSpec> filteredPartitions = new ArrayList<>();

Review comment:
       Please read the description; I have mentioned why a UT cannot be added now.





[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #3913: [CARBONDATA-3974] Improve partition pruning performance in presto carbon integration

GitBox
In reply to this post by GitBox

ajantha-bhat commented on a change in pull request #3913:
URL: https://github.com/apache/carbondata/pull/3913#discussion_r484845973



##########
File path: integration/presto/src/main/prestodb/org/apache/carbondata/presto/CarbondataSplitManager.java
##########
@@ -117,6 +122,16 @@ public ConnectorSplitSource getSplits(ConnectorTransactionHandle transactionHand
       // file metastore case tablePath can be null, so get from location
       location = table.getStorage().getLocation();
     }
+    List<PartitionSpec> filteredPartitions = new ArrayList<>();

Review comment:
       Please read the description; I have already mentioned why a UT cannot be added now.





[GitHub] [carbondata] Indhumathi27 commented on a change in pull request #3913: [CARBONDATA-3974] Improve partition pruning performance in presto carbon integration

GitBox
In reply to this post by GitBox

Indhumathi27 commented on a change in pull request #3913:
URL: https://github.com/apache/carbondata/pull/3913#discussion_r484849645



##########
File path: integration/presto/src/main/prestosql/org/apache/carbondata/presto/impl/CarbonTableReader.java
##########
@@ -245,16 +242,14 @@ private CarbonTableCacheModel getValidCacheBySchemaTableName(SchemaTableName sch
    *
    * @param tableCacheModel cached table
    * @param filters carbonData filters
-   * @param constraints presto filters
+   * @param filteredPartitions matched partitionSpec for the filter
    * @param config hadoop conf
    * @return list of multiblock split
    * @throws IOException
    */
-  public List<CarbonLocalMultiBlockSplit> getInputSplits(
-      CarbonTableCacheModel tableCacheModel,
-      Expression filters,
-      TupleDomain<HiveColumnHandle> constraints,
-      Configuration config) throws IOException {
+  public List<CarbonLocalMultiBlockSplit> getInputSplits(CarbonTableCacheModel tableCacheModel,
+      Expression filters, List<PartitionSpec> filteredPartitions, Configuration config)

Review comment:
       Can you revert this to the old style?





[GitHub] [carbondata] Indhumathi27 commented on a change in pull request #3913: [CARBONDATA-3974] Improve partition pruning performance in presto carbon integration

GitBox
In reply to this post by GitBox

Indhumathi27 commented on a change in pull request #3913:
URL: https://github.com/apache/carbondata/pull/3913#discussion_r484851607



##########
File path: integration/presto/src/test/prestodb/org/apache/carbondata/presto/server/PrestoTestUtil.scala
##########
@@ -0,0 +1,118 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.carbondata.presto.server
+
+import com.facebook.presto.jdbc.PrestoArray
+

Review comment:
       Remove extra lines





[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3913: [CARBONDATA-3974] Improve partition pruning performance in presto carbon integration

GitBox
In reply to this post by GitBox

CarbonDataQA1 commented on pull request #3913:
URL: https://github.com/apache/carbondata/pull/3913#issuecomment-688858320


   Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4003/
   



[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3913: [CARBONDATA-3974] Improve partition pruning performance in presto carbon integration

GitBox
In reply to this post by GitBox

CarbonDataQA1 commented on pull request #3913:
URL: https://github.com/apache/carbondata/pull/3913#issuecomment-688859734


   Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2263/
   



[GitHub] [carbondata] marchpure commented on pull request #3913: [CARBONDATA-3974] Improve partition pruning performance in presto carbon integration

GitBox
In reply to this post by GitBox

marchpure commented on pull request #3913:
URL: https://github.com/apache/carbondata/pull/3913#issuecomment-689003409


   I just tested with this PR. Querying a non-partitioned table returns an EMPTY RESULT; querying a partitioned table works well.



[GitHub] [carbondata] marchpure removed a comment on pull request #3913: [CARBONDATA-3974] Improve partition pruning performance in presto carbon integration

GitBox
In reply to this post by GitBox

marchpure removed a comment on pull request #3913:
URL: https://github.com/apache/carbondata/pull/3913#issuecomment-689003409


   I just tested with this PR. Querying a non-partitioned table returns an EMPTY RESULT; querying a partitioned table works well.



[GitHub] [carbondata] kunal642 commented on a change in pull request #3913: [CARBONDATA-3974] Improve partition pruning performance in presto carbon integration

GitBox
In reply to this post by GitBox

kunal642 commented on a change in pull request #3913:
URL: https://github.com/apache/carbondata/pull/3913#discussion_r485626615



##########
File path: integration/presto/src/main/prestodb/org/apache/carbondata/presto/CarbondataModule.java
##########
@@ -21,6 +21,8 @@
 
 import static java.util.Objects.requireNonNull;
 
+import org.apache.carbondata.core.constants.CarbonCommonConstants;
+import org.apache.carbondata.core.util.CarbonProperties;

Review comment:
       Why is this change required?



