[GitHub] [carbondata] Indhumathi27 opened a new pull request #4060: [CARBONDATA-4093] Added logs for MV and method to verify if mv is in Sync during query

classic Classic list List threaded Threaded
20 messages Options
Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] Indhumathi27 opened a new pull request #4060: [CARBONDATA-4093] Added logs for MV and method to verify if mv is in Sync during query

GitBox

Indhumathi27 opened a new pull request #4060:
URL: https://github.com/apache/carbondata/pull/4060


    ### Why is this PR needed?
    Added logs for MV and method to verify if mv is in Sync during query
   
    ### What changes were proposed in this PR?
   1. Move MV Enable Check to beginning to avoid transform logical plan
   2. Add Logger if exception is occurred during fetching mv schema
   3. Check if MV is in Sync and allow Query rewrite
   
       
    ### Does this PR introduce any user interface change?
    - No
   
    ### Is any new testcase added?
    - Yes
   
       
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4060: [CARBONDATA-4093] Added logs for MV and method to verify if mv is in Sync during query

GitBox

CarbonDataQA2 commented on pull request #4060:
URL: https://github.com/apache/carbondata/pull/4060#issuecomment-748467895






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4060: [CARBONDATA-4093] Added logs for MV and method to verify if mv is in Sync during query

GitBox
In reply to this post by GitBox

CarbonDataQA2 commented on pull request #4060:
URL: https://github.com/apache/carbondata/pull/4060#issuecomment-748490991


   Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12444/job/ApacheCarbon_PR_Builder_2.4.5/3455/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4060: [CARBONDATA-4093] Added logs for MV and method to verify if mv is in Sync during query

GitBox
In reply to this post by GitBox

CarbonDataQA2 commented on pull request #4060:
URL: https://github.com/apache/carbondata/pull/4060#issuecomment-748491161


   Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12444/job/ApacheCarbonPRBuilder2.3/5215/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] akashrn5 commented on a change in pull request #4060: [CARBONDATA-4093] Added logs for MV and method to verify if mv is in Sync during query

GitBox
In reply to this post by GitBox

akashrn5 commented on a change in pull request #4060:
URL: https://github.com/apache/carbondata/pull/4060#discussion_r546516920



##########
File path: core/src/main/java/org/apache/carbondata/core/statusmanager/SegmentStatusManager.java
##########
@@ -204,6 +205,17 @@ public ValidAndInvalidSegmentsInfo getValidAndInvalidSegments(Boolean isChildTab
             || SegmentStatus.COMPACTED == segment.getSegmentStatus()
             || SegmentStatus.MARKED_FOR_DELETE == segment.getSegmentStatus())) {
           listOfInvalidSegments.add(new Segment(segment.getLoadName(), segment.getSegmentFile()));
+          if (SegmentStatus.COMPACTED == segment.getSegmentStatus()) {

Review comment:
       please add a details comment with the scenario as example




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] akashrn5 commented on a change in pull request #4060: [CARBONDATA-4093] Added logs for MV and method to verify if mv is in Sync during query

GitBox
In reply to this post by GitBox

akashrn5 commented on a change in pull request #4060:
URL: https://github.com/apache/carbondata/pull/4060#discussion_r546518847



##########
File path: core/src/main/java/org/apache/carbondata/core/view/MVProvider.java
##########
@@ -364,18 +366,37 @@ private static boolean isViewCanBeEnabled(MVSchema schema)
     }
     List<RelationIdentifier> relatedTables = schema.getRelatedTables();
     for (RelationIdentifier relatedTable : relatedTables) {
+      SegmentStatusManager.ValidAndInvalidSegmentsInfo validAndInvalidSegmentsInfo =
+          SegmentStatusManager.getValidAndInvalidSegmentsInfo(relatedTable);
       List<String> relatedTableSegmentList =
-          SegmentStatusManager.getValidSegmentList(relatedTable);
+          SegmentStatusManager.getValidSegmentList(validAndInvalidSegmentsInfo);
       if (!relatedTableSegmentList.isEmpty()) {
         if (viewSegmentMap.isEmpty()) {
           isViewCanBeEnabled = false;
         } else {
-          isViewCanBeEnabled = viewSegmentMap.get(
-              relatedTable.getDatabaseName() + CarbonCommonConstants.POINT +
-                  relatedTable.getTableName()).containsAll(relatedTableSegmentList);
+          String tableUniqueName =
+              relatedTable.getDatabaseName() + CarbonCommonConstants.POINT + relatedTable
+                  .getTableName();
+          isViewCanBeEnabled =
+              viewSegmentMap.get(tableUniqueName).containsAll(relatedTableSegmentList);
+          if (!isViewCanBeEnabled) {
+            // in case if main table is compacted and mv table mapping is not updated,
+            // check from merged Load Mapping
+            Map<String, List<String>> mergedLoadMapping =
+                validAndInvalidSegmentsInfo.getMergedLoadMapping();
+            List<String> mergedLoadsList = new ArrayList<>();

Review comment:
       dont create this list, just use stream and map function and do .tolist




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] akashrn5 commented on a change in pull request #4060: [CARBONDATA-4093] Added logs for MV and method to verify if mv is in Sync during query

GitBox
In reply to this post by GitBox

akashrn5 commented on a change in pull request #4060:
URL: https://github.com/apache/carbondata/pull/4060#discussion_r546519558



##########
File path: core/src/main/java/org/apache/carbondata/core/view/MVManager.java
##########
@@ -114,7 +118,16 @@ public boolean hasSchemaOnTable(CarbonTable table) throws IOException {
   public List<MVSchema> getSchemas() throws IOException {
     List<MVSchema> schemas = new ArrayList<>();
     for (String database : this.getDatabases()) {
-      schemas.addAll(this.getSchemas(database));
+      try {
+        schemas.addAll(this.getSchemas(database));
+      } catch (IOException ex) {
+        throw ex;
+      } catch (Exception ex) {
+        LOGGER.error("Exception Occurred: Skipping MV schemas from database `" + database + "`.");

Review comment:
       ```suggestion
           LOGGER.error("Exception Occurred: Skipping MV schemas from database " + database + ".");
   ```




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] akashrn5 commented on a change in pull request #4060: [CARBONDATA-4093] Added logs for MV and method to verify if mv is in Sync during query

GitBox
In reply to this post by GitBox

akashrn5 commented on a change in pull request #4060:
URL: https://github.com/apache/carbondata/pull/4060#discussion_r546519558



##########
File path: core/src/main/java/org/apache/carbondata/core/view/MVManager.java
##########
@@ -114,7 +118,16 @@ public boolean hasSchemaOnTable(CarbonTable table) throws IOException {
   public List<MVSchema> getSchemas() throws IOException {
     List<MVSchema> schemas = new ArrayList<>();
     for (String database : this.getDatabases()) {
-      schemas.addAll(this.getSchemas(database));
+      try {
+        schemas.addAll(this.getSchemas(database));
+      } catch (IOException ex) {
+        throw ex;
+      } catch (Exception ex) {
+        LOGGER.error("Exception Occurred: Skipping MV schemas from database `" + database + "`.");

Review comment:
       ```suggestion
           LOGGER.error("Exception Occurred: Skipping MV schemas from database: " + database + ".");
   ```




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] akashrn5 commented on a change in pull request #4060: [CARBONDATA-4093] Added logs for MV and method to verify if mv is in Sync during query

GitBox
In reply to this post by GitBox

akashrn5 commented on a change in pull request #4060:
URL: https://github.com/apache/carbondata/pull/4060#discussion_r546520134



##########
File path: core/src/main/java/org/apache/carbondata/core/view/MVManager.java
##########
@@ -234,7 +247,16 @@ public static String getUpdatedSegmentMap(String mergedLoadName,
   public List<MVStatusDetail> getEnabledStatusDetails() throws IOException {
     List<MVStatusDetail> statusDetails = new ArrayList<>();
     for (String database : this.getDatabases()) {
-      statusDetails.addAll(this.getEnabledStatusDetails(database));
+      try {
+        statusDetails.addAll(this.getEnabledStatusDetails(database));

Review comment:
       since already it failed to access in getScehamas, please see if this is required or not, if not please revert




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] akashrn5 commented on a change in pull request #4060: [CARBONDATA-4093] Added logs for MV and method to verify if mv is in Sync during query

GitBox
In reply to this post by GitBox

akashrn5 commented on a change in pull request #4060:
URL: https://github.com/apache/carbondata/pull/4060#discussion_r546520134



##########
File path: core/src/main/java/org/apache/carbondata/core/view/MVManager.java
##########
@@ -234,7 +247,16 @@ public static String getUpdatedSegmentMap(String mergedLoadName,
   public List<MVStatusDetail> getEnabledStatusDetails() throws IOException {
     List<MVStatusDetail> statusDetails = new ArrayList<>();
     for (String database : this.getDatabases()) {
-      statusDetails.addAll(this.getEnabledStatusDetails(database));
+      try {
+        statusDetails.addAll(this.getEnabledStatusDetails(database));

Review comment:
       since already it failed to access in `getSchemas(database)`, please see if this is required or not, if not please revert




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] akashrn5 commented on a change in pull request #4060: [CARBONDATA-4093] Added logs for MV and method to verify if mv is in Sync during query

GitBox
In reply to this post by GitBox

akashrn5 commented on a change in pull request #4060:
URL: https://github.com/apache/carbondata/pull/4060#discussion_r546520998



##########
File path: integration/spark/src/main/scala/org/apache/carbondata/view/MVManagerInSpark.scala
##########
@@ -31,10 +31,10 @@ class MVManagerInSpark(session: SparkSession) extends MVManager {
   override def getDatabases: util.List[String] = {
     CarbonThreadUtil.threadSet(CarbonCommonConstants.CARBON_ENABLE_MV, "true")
     try {
-      val databaseList = session.catalog.listDatabases()
+      val databaseList = session.sessionState.catalog.listDatabases()

Review comment:
       remove the beow for loop and just do `map ` here

##########
File path: integration/spark/src/main/scala/org/apache/carbondata/view/MVManagerInSpark.scala
##########
@@ -31,10 +31,10 @@ class MVManagerInSpark(session: SparkSession) extends MVManager {
   override def getDatabases: util.List[String] = {
     CarbonThreadUtil.threadSet(CarbonCommonConstants.CARBON_ENABLE_MV, "true")
     try {
-      val databaseList = session.catalog.listDatabases()
+      val databaseList = session.sessionState.catalog.listDatabases()

Review comment:
       remove the below for loop and just do `map ` here




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] akashrn5 commented on a change in pull request #4060: [CARBONDATA-4093] Added logs for MV and method to verify if mv is in Sync during query

GitBox
In reply to this post by GitBox

akashrn5 commented on a change in pull request #4060:
URL: https://github.com/apache/carbondata/pull/4060#discussion_r546522529



##########
File path: integration/spark/src/main/scala/org/apache/spark/sql/optimizer/MVRewriteRule.scala
##########
@@ -183,7 +184,7 @@ class MVRewriteRule(session: SparkSession) extends Rule[LogicalPlan] {
       catalogs.nonEmpty &&
       !isRewritten(validSchemas, catalogs) &&
       !isRelatedTableSegmentsSetAsInput(catalogs) &&
-      isRelated(validSchemas, catalogs)
+      isRelatedAndSyncWithParentTables(catalog, validSchemas, catalogs)

Review comment:
       ```suggestion
         isRelatedAndSyncWithParentTables(mvCatalog, validSchemas, catalogTables)
   ```




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] Indhumathi27 commented on a change in pull request #4060: [CARBONDATA-4093] Added logs for MV and method to verify if mv is in Sync during query

GitBox
In reply to this post by GitBox

Indhumathi27 commented on a change in pull request #4060:
URL: https://github.com/apache/carbondata/pull/4060#discussion_r546525745



##########
File path: core/src/main/java/org/apache/carbondata/core/view/MVManager.java
##########
@@ -234,7 +247,16 @@ public static String getUpdatedSegmentMap(String mergedLoadName,
   public List<MVStatusDetail> getEnabledStatusDetails() throws IOException {
     List<MVStatusDetail> statusDetails = new ArrayList<>();
     for (String database : this.getDatabases()) {
-      statusDetails.addAll(this.getEnabledStatusDetails(database));
+      try {
+        statusDetails.addAll(this.getEnabledStatusDetails(database));

Review comment:
       it is needed. as it try to get MV_STATUS file for all databases.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] Indhumathi27 commented on a change in pull request #4060: [CARBONDATA-4093] Added logs for MV and method to verify if mv is in Sync during query

GitBox
In reply to this post by GitBox

Indhumathi27 commented on a change in pull request #4060:
URL: https://github.com/apache/carbondata/pull/4060#discussion_r546528743



##########
File path: core/src/main/java/org/apache/carbondata/core/statusmanager/SegmentStatusManager.java
##########
@@ -204,6 +205,17 @@ public ValidAndInvalidSegmentsInfo getValidAndInvalidSegments(Boolean isChildTab
             || SegmentStatus.COMPACTED == segment.getSegmentStatus()
             || SegmentStatus.MARKED_FOR_DELETE == segment.getSegmentStatus())) {
           listOfInvalidSegments.add(new Segment(segment.getLoadName(), segment.getSegmentFile()));
+          if (SegmentStatus.COMPACTED == segment.getSegmentStatus()) {

Review comment:
       done




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4060: [CARBONDATA-4093] Added logs for MV and method to verify if mv is in Sync during query

GitBox
In reply to this post by GitBox

CarbonDataQA2 commented on pull request #4060:
URL: https://github.com/apache/carbondata/pull/4060#issuecomment-748824686


   Build Failed  with Spark 2.3.4, Please check CI http://121.244.95.60:12444/job/ApacheCarbonPRBuilder2.3/5218/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4060: [CARBONDATA-4093] Added logs for MV and method to verify if mv is in Sync during query

GitBox
In reply to this post by GitBox

CarbonDataQA2 commented on pull request #4060:
URL: https://github.com/apache/carbondata/pull/4060#issuecomment-748824968


   Build Failed  with Spark 2.4.5, Please check CI http://121.244.95.60:12444/job/ApacheCarbon_PR_Builder_2.4.5/3457/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4060: [CARBONDATA-4093] Added logs for MV and method to verify if mv is in Sync during query

GitBox
In reply to this post by GitBox

CarbonDataQA2 commented on pull request #4060:
URL: https://github.com/apache/carbondata/pull/4060#issuecomment-748909001


   Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12444/job/ApacheCarbonPRBuilder2.3/5223/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4060: [CARBONDATA-4093] Added logs for MV and method to verify if mv is in Sync during query

GitBox
In reply to this post by GitBox

CarbonDataQA2 commented on pull request #4060:
URL: https://github.com/apache/carbondata/pull/4060#issuecomment-748914569


   Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12444/job/ApacheCarbon_PR_Builder_2.4.5/3462/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] akashrn5 commented on pull request #4060: [CARBONDATA-4093] Added logs for MV and method to verify if mv is in Sync during query

GitBox
In reply to this post by GitBox

akashrn5 commented on pull request #4060:
URL: https://github.com/apache/carbondata/pull/4060#issuecomment-748919308


   LGTM


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] asfgit closed pull request #4060: [CARBONDATA-4093] Added logs for MV and method to verify if mv is in Sync during query

GitBox
In reply to this post by GitBox

asfgit closed pull request #4060:
URL: https://github.com/apache/carbondata/pull/4060


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]