[GitHub] [carbondata] kunal642 opened a new pull request #3581: WIP: Reduced an HDFS call and listing of tables in refresh command

GitBox
kunal642 opened a new pull request #3581: WIP: Reduced an HDFS call and listing of tables in refresh command
URL: https://github.com/apache/carbondata/pull/3581
 
 
    ### Why is this PR needed?
   
   
    ### What changes were proposed in this PR?
   
       
    ### Does this PR introduce any user interface change?
    - No
    - Yes. (please explain the change and update document)
   
    ### Is any new testcase added?
    - No
    - Yes
   
       
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[hidden email]


With regards,
Apache Git Services

[GitHub] [carbondata] CarbonDataQA1 commented on issue #3581: WIP: Reduced an HDFS call and listing of tables in refresh command

GitBox
CarbonDataQA1 commented on issue #3581: WIP: Reduced an HDFS call and listing of tables in refresh command
URL: https://github.com/apache/carbondata/pull/3581#issuecomment-575053854
 
 
   Build failed with Spark 2.3.4. Please check CI: http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/1660/
   


[GitHub] [carbondata] kumarvishal09 commented on issue #3581: WIP: Reduced an HDFS call and listing of tables in refresh command

GitBox
kumarvishal09 commented on issue #3581: WIP: Reduced an HDFS call and listing of tables in refresh command
URL: https://github.com/apache/carbondata/pull/3581#issuecomment-575460318
 
 
   @kunal642 Please fix the CI failure.


[GitHub] [carbondata] kumarvishal09 commented on issue #3581: WIP: Reduced an HDFS call and listing of tables in refresh command

GitBox
kumarvishal09 commented on issue #3581: WIP: Reduced an HDFS call and listing of tables in refresh command
URL: https://github.com/apache/carbondata/pull/3581#issuecomment-575462633
 
 
   @kunal642 Please add a detailed PR description.


[GitHub] [carbondata] kunal642 commented on issue #3581: WIP: Reduced an HDFS call and listing of tables in refresh command

GitBox
kunal642 commented on issue #3581: WIP: Reduced an HDFS call and listing of tables in refresh command
URL: https://github.com/apache/carbondata/pull/3581#issuecomment-575467554
 
 
   retest this please


[GitHub] [carbondata] CarbonDataQA1 commented on issue #3581: WIP: Reduced an HDFS call and listing of tables in refresh command

GitBox
CarbonDataQA1 commented on issue #3581: WIP: Reduced an HDFS call and listing of tables in refresh command
URL: https://github.com/apache/carbondata/pull/3581#issuecomment-575472992
 
 
   Build failed with Spark 2.3.4. Please check CI: http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/1674/
   


[GitHub] [carbondata] CarbonDataQA1 commented on issue #3581: [CARBONDATA-3666] Reduced an HDFS call and listing of tables in refresh command

GitBox
CarbonDataQA1 commented on issue #3581: [CARBONDATA-3666] Reduced an HDFS call and listing of tables in refresh command
URL: https://github.com/apache/carbondata/pull/3581#issuecomment-575510796
 
 
   Build succeeded with Spark 2.3.4. Please check CI: http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/1675/
   


[GitHub] [carbondata] kunal642 commented on issue #3581: [CARBONDATA-3666] Avoided listing of table dir in refresh command

GitBox
kunal642 commented on issue #3581: [CARBONDATA-3666] Avoided listing of table dir in refresh command
URL: https://github.com/apache/carbondata/pull/3581#issuecomment-575530349
 
 
   @kumarvishal09 CI Passed


[GitHub] [carbondata] jackylk commented on a change in pull request #3581: [CARBONDATA-3666] Avoided listing of table dir in refresh command

GitBox
jackylk commented on a change in pull request #3581: [CARBONDATA-3666] Avoided listing of table dir in refresh command
URL: https://github.com/apache/carbondata/pull/3581#discussion_r368263868
 
 

 ##########
 File path: integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/management/RefreshCarbonTableCommand.scala
 ##########
 @@ -63,12 +63,20 @@ case class RefreshCarbonTableCommand(
     // then do the below steps
     // 2.2.1 validate that all the aggregate tables are copied at the store location.
     // 2.2.2 Register the aggregate tables
-    val tablePath = CarbonEnv.getTablePath(databaseNameOp, tableName.toLowerCase)(sparkSession)
-    val identifier = AbsoluteTableIdentifier.from(tablePath, databaseName, tableName.toLowerCase)
     // 2.1 check if the table already register with hive then ignore and continue with the next
     // schema
-    if (!sparkSession.sessionState.catalog.listTables(databaseName)
-      .exists(_.table.equalsIgnoreCase(tableName))) {
+    val provider = try {
+      sparkSession.sessionState.catalog
+        .getTableMetadata(TableIdentifier(tableName, databaseNameOp)).provider
+    } catch {
+      case _: NoSuchTableException =>
+        None
+    }
+    if (provider.isEmpty ||
+        provider.get.equalsIgnoreCase("org.apache.spark.sql.CarbonSource") ||
 
 Review comment:
   This check is repeated in many places, which is not clean. Can you move it into a utility function and use that function everywhere?
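
A minimal sketch of what such a utility could look like, assuming the check only needs the `Option[String]` provider obtained from the catalog lookup in the diff. `ProviderCheck` and `isMissingOrAccepted` are hypothetical names, not code from this PR. The condition in the quoted hunk is cut off after the `CarbonSource` comparison, so the sketch takes the accepted provider names as a parameter instead of guessing the rest.

```scala
object ProviderCheck {

  // The only provider name visible in the quoted diff. The condition there is
  // cut off after this comparison, so any further names are left to the caller.
  val CarbonSourceProvider = "org.apache.spark.sql.CarbonSource"

  // Hypothetical helper: true when no provider was found, or when the provider
  // matches one of `accepted`, ignoring case.
  def isMissingOrAccepted(
      provider: Option[String],
      accepted: Seq[String] = Seq(CarbonSourceProvider)): Boolean = {
    // Option.forall returns true for None, which mirrors `provider.isEmpty || ...`.
    provider.forall(p => accepted.exists(_.equalsIgnoreCase(p)))
  }
}
```

With a helper along these lines, each call site could pass the looked-up provider to `ProviderCheck.isMissingOrAccepted(provider)` rather than repeating the `isEmpty` and `equalsIgnoreCase` chain inline.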


[GitHub] [carbondata] kunal642 commented on a change in pull request #3581: [CARBONDATA-3666] Avoided listing of table dir in refresh command

GitBox
kunal642 commented on a change in pull request #3581: [CARBONDATA-3666] Avoided listing of table dir in refresh command
URL: https://github.com/apache/carbondata/pull/3581#discussion_r368274021
 
 

 ##########
 File path: integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/management/RefreshCarbonTableCommand.scala
 ##########
 @@ -63,12 +63,20 @@ case class RefreshCarbonTableCommand(
     // then do the below steps
     // 2.2.1 validate that all the aggregate tables are copied at the store location.
     // 2.2.2 Register the aggregate tables
-    val tablePath = CarbonEnv.getTablePath(databaseNameOp, tableName.toLowerCase)(sparkSession)
-    val identifier = AbsoluteTableIdentifier.from(tablePath, databaseName, tableName.toLowerCase)
     // 2.1 check if the table already register with hive then ignore and continue with the next
     // schema
-    if (!sparkSession.sessionState.catalog.listTables(databaseName)
-      .exists(_.table.equalsIgnoreCase(tableName))) {
+    val provider = try {
+      sparkSession.sessionState.catalog
+        .getTableMetadata(TableIdentifier(tableName, databaseNameOp)).provider
+    } catch {
+      case _: NoSuchTableException =>
+        None
+    }
+    if (provider.isEmpty ||
+        provider.get.equalsIgnoreCase("org.apache.spark.sql.CarbonSource") ||
 
 Review comment:
   ok


[GitHub] [carbondata] xuchuanyin commented on a change in pull request #3581: [CARBONDATA-3666] Avoided listing of table dir in refresh command

GitBox
xuchuanyin commented on a change in pull request #3581: [CARBONDATA-3666] Avoided listing of table dir in refresh command
URL: https://github.com/apache/carbondata/pull/3581#discussion_r368288557
 
 

 ##########
 File path: integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/management/RefreshCarbonTableCommand.scala
 ##########
 @@ -63,12 +63,20 @@ case class RefreshCarbonTableCommand(
     // then do the below steps
     // 2.2.1 validate that all the aggregate tables are copied at the store location.
     // 2.2.2 Register the aggregate tables
-    val tablePath = CarbonEnv.getTablePath(databaseNameOp, tableName.toLowerCase)(sparkSession)
 
 Review comment:
   The comments above are outdated and should be updated to match your modification.


[GitHub] [carbondata] kunal642 commented on a change in pull request #3581: [CARBONDATA-3666] Avoided listing of table dir in refresh command

GitBox
kunal642 commented on a change in pull request #3581: [CARBONDATA-3666] Avoided listing of table dir in refresh command
URL: https://github.com/apache/carbondata/pull/3581#discussion_r368375832
 
 

 ##########
 File path: integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/management/RefreshCarbonTableCommand.scala
 ##########
 @@ -63,12 +63,20 @@ case class RefreshCarbonTableCommand(
     // then do the below steps
     // 2.2.1 validate that all the aggregate tables are copied at the store location.
     // 2.2.2 Register the aggregate tables
-    val tablePath = CarbonEnv.getTablePath(databaseNameOp, tableName.toLowerCase)(sparkSession)
 
 Review comment:
   done


[GitHub] [carbondata] kunal642 commented on a change in pull request #3581: [CARBONDATA-3666] Avoided listing of table dir in refresh command

GitBox
kunal642 commented on a change in pull request #3581: [CARBONDATA-3666] Avoided listing of table dir in refresh command
URL: https://github.com/apache/carbondata/pull/3581#discussion_r368375838
 
 

 ##########
 File path: integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/management/RefreshCarbonTableCommand.scala
 ##########
 @@ -63,12 +63,20 @@ case class RefreshCarbonTableCommand(
     // then do the below steps
     // 2.2.1 validate that all the aggregate tables are copied at the store location.
     // 2.2.2 Register the aggregate tables
-    val tablePath = CarbonEnv.getTablePath(databaseNameOp, tableName.toLowerCase)(sparkSession)
-    val identifier = AbsoluteTableIdentifier.from(tablePath, databaseName, tableName.toLowerCase)
     // 2.1 check if the table already register with hive then ignore and continue with the next
     // schema
-    if (!sparkSession.sessionState.catalog.listTables(databaseName)
-      .exists(_.table.equalsIgnoreCase(tableName))) {
+    val provider = try {
+      sparkSession.sessionState.catalog
+        .getTableMetadata(TableIdentifier(tableName, databaseNameOp)).provider
+    } catch {
+      case _: NoSuchTableException =>
+        None
+    }
+    if (provider.isEmpty ||
+        provider.get.equalsIgnoreCase("org.apache.spark.sql.CarbonSource") ||
 
 Review comment:
   done


[GitHub] [carbondata] CarbonDataQA1 commented on issue #3581: [CARBONDATA-3666] Avoided listing of table dir in refresh command

GitBox
CarbonDataQA1 commented on issue #3581: [CARBONDATA-3666] Avoided listing of table dir in refresh command
URL: https://github.com/apache/carbondata/pull/3581#issuecomment-576114924
 
 
   Build failed with Spark 2.3.4. Please check CI: http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/1695/
   


[GitHub] [carbondata] CarbonDataQA1 commented on issue #3581: [CARBONDATA-3666] Avoided listing of table dir in refresh command

GitBox
CarbonDataQA1 commented on issue #3581: [CARBONDATA-3666] Avoided listing of table dir in refresh command
URL: https://github.com/apache/carbondata/pull/3581#issuecomment-576118520
 
 
   Build failed with Spark 2.3.4. Please check CI: http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/1696/
   


[GitHub] [carbondata] CarbonDataQA1 commented on issue #3581: [CARBONDATA-3666] Avoided listing of table dir in refresh command

GitBox
CarbonDataQA1 commented on issue #3581: [CARBONDATA-3666] Avoided listing of table dir in refresh command
URL: https://github.com/apache/carbondata/pull/3581#issuecomment-576233640
 
 
   Build succeeded with Spark 2.3.4. Please check CI: http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/1703/
   


[GitHub] [carbondata] jackylk commented on issue #3581: [CARBONDATA-3666] Avoided listing of table dir in refresh command

GitBox
jackylk commented on issue #3581: [CARBONDATA-3666] Avoided listing of table dir in refresh command
URL: https://github.com/apache/carbondata/pull/3581#issuecomment-576991766
 
 
   LGTM


[GitHub] [carbondata] asfgit closed pull request #3581: [CARBONDATA-3666] Avoided listing of table dir in refresh command

GitBox
asfgit closed pull request #3581: [CARBONDATA-3666] Avoided listing of table dir in refresh command
URL: https://github.com/apache/carbondata/pull/3581
 
 
   
