[GitHub] [carbondata] kunal642 opened a new pull request #3601: Fixed performance issue for drop table

classic Classic list List threaded Threaded
26 messages Options
12
Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] kunal642 opened a new pull request #3601: Fixed performance issue for drop table

GitBox
kunal642 opened a new pull request #3601: Fixed performance issue for drop table
URL: https://github.com/apache/carbondata/pull/3601
 
 
    ### Why is this PR needed?
   
   
    ### What changes were proposed in this PR?
   
       
    ### Does this PR introduce any user interface change?
    - No
    - Yes. (please explain the change and update document)
   
    ### Is any new testcase added?
    - No
    - Yes
   
       
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[hidden email]


With regards,
Apache Git Services
Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] CarbonDataQA1 commented on issue #3601: [WIP] Fixed performance issue for drop table

GitBox
CarbonDataQA1 commented on issue #3601: [WIP] Fixed performance issue for drop table
URL: https://github.com/apache/carbondata/pull/3601#issuecomment-581471527
 
 
   Build Failed  with Spark 2.4.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.4/136/
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[hidden email]


With regards,
Apache Git Services
Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] CarbonDataQA1 commented on issue #3601: [WIP] Fixed performance issue for drop table

GitBox
In reply to this post by GitBox
CarbonDataQA1 commented on issue #3601: [WIP] Fixed performance issue for drop table
URL: https://github.com/apache/carbondata/pull/3601#issuecomment-581472138
 
 
   Build Failed  with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/1840/
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[hidden email]


With regards,
Apache Git Services
Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] CarbonDataQA1 commented on issue #3601: [CARBONDATA-3677] Fixed performance issue for drop table

GitBox
In reply to this post by GitBox
CarbonDataQA1 commented on issue #3601: [CARBONDATA-3677] Fixed performance issue for drop table
URL: https://github.com/apache/carbondata/pull/3601#issuecomment-581757193
 
 
   Build Success with Spark 2.4.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.4/142/
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[hidden email]


With regards,
Apache Git Services
Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] CarbonDataQA1 commented on issue #3601: [CARBONDATA-3677] Fixed performance issue for drop table

GitBox
In reply to this post by GitBox
CarbonDataQA1 commented on issue #3601: [CARBONDATA-3677] Fixed performance issue for drop table
URL: https://github.com/apache/carbondata/pull/3601#issuecomment-581771136
 
 
   Build Failed  with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/1846/
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[hidden email]


With regards,
Apache Git Services
Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] jackylk commented on a change in pull request #3601: [CARBONDATA-3677] Fixed performance issue for drop table

GitBox
In reply to this post by GitBox
jackylk commented on a change in pull request #3601: [CARBONDATA-3677] Fixed performance issue for drop table
URL: https://github.com/apache/carbondata/pull/3601#discussion_r374774775
 
 

 ##########
 File path: core/src/main/java/org/apache/carbondata/core/datamap/DataMapChooser.java
 ##########
 @@ -121,6 +121,10 @@ public DataMapExprWrapper chooseCGDataMap(FilterResolverIntf resolverIntf) {
     return chooseDataMap(DataMapLevel.CG, resolverIntf);
   }
 
+  public boolean hasCGDataMap() {
 
 Review comment:
   It is strange we add this func in a class called `Chooser`, can you put in a more suitable class?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[hidden email]


With regards,
Apache Git Services
Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] jackylk commented on a change in pull request #3601: [CARBONDATA-3677] Fixed performance issue for drop table

GitBox
In reply to this post by GitBox
jackylk commented on a change in pull request #3601: [CARBONDATA-3677] Fixed performance issue for drop table
URL: https://github.com/apache/carbondata/pull/3601#discussion_r374775357
 
 

 ##########
 File path: core/src/main/java/org/apache/carbondata/core/datamap/DataMapChooser.java
 ##########
 @@ -121,6 +121,10 @@ public DataMapExprWrapper chooseCGDataMap(FilterResolverIntf resolverIntf) {
     return chooseDataMap(DataMapLevel.CG, resolverIntf);
   }
 
+  public boolean hasCGDataMap() {
 
 Review comment:
   It is strange we add this func in a class called `Chooser`, can you put it in CarbonTable?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[hidden email]


With regards,
Apache Git Services
Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] jackylk commented on a change in pull request #3601: [CARBONDATA-3677] Fixed performance issue for drop table

GitBox
In reply to this post by GitBox
jackylk commented on a change in pull request #3601: [CARBONDATA-3677] Fixed performance issue for drop table
URL: https://github.com/apache/carbondata/pull/3601#discussion_r374777441
 
 

 ##########
 File path: integration/spark2/src/main/scala/org/apache/carbondata/indexserver/DistributedPruneRDD.scala
 ##########
 @@ -142,7 +142,6 @@ private[indexserver] class DistributedPruneRDD(@transient private val ss: SparkS
         while (reader.nextKeyValue()) {
           blocklets.add(reader.getCurrentValue)
         }
-        reader.close()
 
 Review comment:
   not close reader?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[hidden email]


With regards,
Apache Git Services
Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] jackylk commented on a change in pull request #3601: [CARBONDATA-3677] Fixed performance issue for drop table

GitBox
In reply to this post by GitBox
jackylk commented on a change in pull request #3601: [CARBONDATA-3677] Fixed performance issue for drop table
URL: https://github.com/apache/carbondata/pull/3601#discussion_r374781431
 
 

 ##########
 File path: core/src/main/java/org/apache/carbondata/core/datamap/DataMapStoreManager.java
 ##########
 @@ -513,7 +513,17 @@ public void refreshSegmentCacheIfRequired(CarbonTable carbonTable,
    * @param identifier Table identifier
    */
   public void clearDataMaps(AbsoluteTableIdentifier identifier) {
-    clearDataMaps(identifier, true);
+    CarbonTable carbonTable = getCarbonTable(identifier);
+    boolean launchJob = false;
+    try {
+      // launchJob will be true if either the table has a CGDatamap or index server is enabled for
+      // the specified table.
+      launchJob = new DataMapChooser(carbonTable).hasCGDataMap() || CarbonProperties.getInstance()
+          .isDistributedPruningEnabled(identifier.getDatabaseName(), identifier.getTableName());
+    } catch (IOException e) {
+      LOGGER.warn("Unable to launch job to clear datamaps.", e);
+    }
+    clearDataMaps(identifier, launchJob);
 
 Review comment:
   `launchJob` parameter name is not easy to understand, can you change it to
   `cleareDataMapCache(identifier, clearInAllWorkers)` to indicate that it will clear the datamap cache in all workers

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[hidden email]


With regards,
Apache Git Services
Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] jackylk commented on a change in pull request #3601: [CARBONDATA-3677] Fixed performance issue for drop table

GitBox
In reply to this post by GitBox
jackylk commented on a change in pull request #3601: [CARBONDATA-3677] Fixed performance issue for drop table
URL: https://github.com/apache/carbondata/pull/3601#discussion_r374781431
 
 

 ##########
 File path: core/src/main/java/org/apache/carbondata/core/datamap/DataMapStoreManager.java
 ##########
 @@ -513,7 +513,17 @@ public void refreshSegmentCacheIfRequired(CarbonTable carbonTable,
    * @param identifier Table identifier
    */
   public void clearDataMaps(AbsoluteTableIdentifier identifier) {
-    clearDataMaps(identifier, true);
+    CarbonTable carbonTable = getCarbonTable(identifier);
+    boolean launchJob = false;
+    try {
+      // launchJob will be true if either the table has a CGDatamap or index server is enabled for
+      // the specified table.
+      launchJob = new DataMapChooser(carbonTable).hasCGDataMap() || CarbonProperties.getInstance()
+          .isDistributedPruningEnabled(identifier.getDatabaseName(), identifier.getTableName());
+    } catch (IOException e) {
+      LOGGER.warn("Unable to launch job to clear datamaps.", e);
+    }
+    clearDataMaps(identifier, launchJob);
 
 Review comment:
   `launchJob` parameter name is not easy to understand, can you change it to
   `clearDataMapCache(identifier, clearInAllWorkers)` to indicate that it will clear the datamap cache in all workers

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[hidden email]


With regards,
Apache Git Services
Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] jackylk commented on a change in pull request #3601: [CARBONDATA-3677] Fixed performance issue for drop table

GitBox
In reply to this post by GitBox
jackylk commented on a change in pull request #3601: [CARBONDATA-3677] Fixed performance issue for drop table
URL: https://github.com/apache/carbondata/pull/3601#discussion_r374783957
 
 

 ##########
 File path: core/src/main/java/org/apache/carbondata/core/datamap/DistributableDataMapFormat.java
 ##########
 @@ -196,6 +196,11 @@ public float getProgress() {
 
       @Override
       public void close() {
+        // Clear the datamaps from executor
+        if (isFallbackJob) {
+          DataMapStoreManager.getInstance()
+              .clearDataMaps(table.getAbsoluteTableIdentifier(), false);
 
 Review comment:
   ```suggestion
                 .clearDataMapCache(table.getAbsoluteTableIdentifier(), false);
   ```

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[hidden email]


With regards,
Apache Git Services
Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] kunal642 commented on a change in pull request #3601: [CARBONDATA-3677] Fixed performance issue for drop table

GitBox
In reply to this post by GitBox
kunal642 commented on a change in pull request #3601: [CARBONDATA-3677] Fixed performance issue for drop table
URL: https://github.com/apache/carbondata/pull/3601#discussion_r375648297
 
 

 ##########
 File path: core/src/main/java/org/apache/carbondata/core/datamap/DataMapChooser.java
 ##########
 @@ -121,6 +121,10 @@ public DataMapExprWrapper chooseCGDataMap(FilterResolverIntf resolverIntf) {
     return chooseDataMap(DataMapLevel.CG, resolverIntf);
   }
 
+  public boolean hasCGDataMap() {
 
 Review comment:
   Moved to DataMapStoreManager because this method is used only by DataMapStoreManager.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[hidden email]


With regards,
Apache Git Services
Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] kunal642 commented on a change in pull request #3601: [CARBONDATA-3677] Fixed performance issue for drop table

GitBox
In reply to this post by GitBox
kunal642 commented on a change in pull request #3601: [CARBONDATA-3677] Fixed performance issue for drop table
URL: https://github.com/apache/carbondata/pull/3601#discussion_r375648396
 
 

 ##########
 File path: integration/spark2/src/main/scala/org/apache/carbondata/indexserver/DistributedPruneRDD.scala
 ##########
 @@ -142,7 +142,6 @@ private[indexserver] class DistributedPruneRDD(@transient private val ss: SparkS
         while (reader.nextKeyValue()) {
           blocklets.add(reader.getCurrentValue)
         }
-        reader.close()
 
 Review comment:
   nextKeyValue is already closing when next() returns false

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[hidden email]


With regards,
Apache Git Services
Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] kunal642 commented on a change in pull request #3601: [CARBONDATA-3677] Fixed performance issue for drop table

GitBox
In reply to this post by GitBox
kunal642 commented on a change in pull request #3601: [CARBONDATA-3677] Fixed performance issue for drop table
URL: https://github.com/apache/carbondata/pull/3601#discussion_r375648653
 
 

 ##########
 File path: core/src/main/java/org/apache/carbondata/core/datamap/DistributableDataMapFormat.java
 ##########
 @@ -196,6 +196,11 @@ public float getProgress() {
 
       @Override
       public void close() {
+        // Clear the datamaps from executor
+        if (isFallbackJob) {
+          DataMapStoreManager.getInstance()
+              .clearDataMaps(table.getAbsoluteTableIdentifier(), false);
 
 Review comment:
   done

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[hidden email]


With regards,
Apache Git Services
Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] kunal642 commented on a change in pull request #3601: [CARBONDATA-3677] Fixed performance issue for drop table

GitBox
In reply to this post by GitBox
kunal642 commented on a change in pull request #3601: [CARBONDATA-3677] Fixed performance issue for drop table
URL: https://github.com/apache/carbondata/pull/3601#discussion_r375648707
 
 

 ##########
 File path: core/src/main/java/org/apache/carbondata/core/datamap/DataMapStoreManager.java
 ##########
 @@ -513,7 +513,17 @@ public void refreshSegmentCacheIfRequired(CarbonTable carbonTable,
    * @param identifier Table identifier
    */
   public void clearDataMaps(AbsoluteTableIdentifier identifier) {
-    clearDataMaps(identifier, true);
+    CarbonTable carbonTable = getCarbonTable(identifier);
+    boolean launchJob = false;
+    try {
+      // launchJob will be true if either the table has a CGDatamap or index server is enabled for
+      // the specified table.
+      launchJob = new DataMapChooser(carbonTable).hasCGDataMap() || CarbonProperties.getInstance()
+          .isDistributedPruningEnabled(identifier.getDatabaseName(), identifier.getTableName());
+    } catch (IOException e) {
+      LOGGER.warn("Unable to launch job to clear datamaps.", e);
+    }
+    clearDataMaps(identifier, launchJob);
 
 Review comment:
   done

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[hidden email]


With regards,
Apache Git Services
Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] CarbonDataQA1 commented on issue #3601: [CARBONDATA-3677] Fixed performance issue for drop table

GitBox
In reply to this post by GitBox
CarbonDataQA1 commented on issue #3601: [CARBONDATA-3677] Fixed performance issue for drop table
URL: https://github.com/apache/carbondata/pull/3601#issuecomment-582748130
 
 
   Build Failed  with Spark 2.4.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.4/164/
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[hidden email]


With regards,
Apache Git Services
Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] CarbonDataQA1 commented on issue #3601: [CARBONDATA-3677] Fixed performance issue for drop table

GitBox
In reply to this post by GitBox
CarbonDataQA1 commented on issue #3601: [CARBONDATA-3677] Fixed performance issue for drop table
URL: https://github.com/apache/carbondata/pull/3601#issuecomment-582748510
 
 
   Build Failed  with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/1867/
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[hidden email]


With regards,
Apache Git Services
Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] Indhumathi27 commented on a change in pull request #3601: [CARBONDATA-3677] Fixed performance issue for drop table

GitBox
In reply to this post by GitBox
Indhumathi27 commented on a change in pull request #3601: [CARBONDATA-3677] Fixed performance issue for drop table
URL: https://github.com/apache/carbondata/pull/3601#discussion_r375702400
 
 

 ##########
 File path: core/src/main/java/org/apache/carbondata/core/metadata/schema/table/CarbonTable.java
 ##########
 @@ -35,6 +35,7 @@
 import org.apache.carbondata.core.constants.CarbonCommonConstants;
 import org.apache.carbondata.core.constants.CarbonLoadOptionConstants;
 import org.apache.carbondata.core.constants.SortScopeOptions;
+import org.apache.carbondata.core.datamap.DataMapLevel;
 
 Review comment:
   please remove if not used

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[hidden email]


With regards,
Apache Git Services
Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] CarbonDataQA1 commented on issue #3601: [CARBONDATA-3677] Fixed performance issue for drop table

GitBox
In reply to this post by GitBox
CarbonDataQA1 commented on issue #3601: [CARBONDATA-3677] Fixed performance issue for drop table
URL: https://github.com/apache/carbondata/pull/3601#issuecomment-583829617
 
 
   Build Success with Spark 2.4.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.4/192/
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[hidden email]


With regards,
Apache Git Services
Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] CarbonDataQA1 commented on issue #3601: [CARBONDATA-3677] Fixed performance issue for drop table

GitBox
In reply to this post by GitBox
CarbonDataQA1 commented on issue #3601: [CARBONDATA-3677] Fixed performance issue for drop table
URL: https://github.com/apache/carbondata/pull/3601#issuecomment-583834562
 
 
   Build Failed  with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/1894/
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[hidden email]


With regards,
Apache Git Services
12