[GitHub] [carbondata] MarvinLitt opened a new pull request #3855: [CARBONDATA-3863], after using index service clean the temp data

classic Classic list List threaded Threaded
36 messages Options
12
Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] MarvinLitt opened a new pull request #3855: [CARBONDATA-3863], after using index service clean the temp data

GitBox

MarvinLitt opened a new pull request #3855:
URL: https://github.com/apache/carbondata/pull/3855


    ### Why is this PR needed?
   
   
    ### What changes were proposed in this PR?
   
       
    ### Does this PR introduce any user interface change?
    - No
    - Yes. (please explain the change and update document)
   
    ### Is any new testcase added?
    - No
    - Yes
   
       
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3855: [CARBONDATA-3863], after using index service clean the temp data

GitBox

CarbonDataQA1 commented on pull request #3855:
URL: https://github.com/apache/carbondata/pull/3855#issuecomment-661143216


   Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1699/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3855: [CARBONDATA-3863], after using index service clean the temp data

GitBox
In reply to this post by GitBox

CarbonDataQA1 commented on pull request #3855:
URL: https://github.com/apache/carbondata/pull/3855#issuecomment-661143881


   Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3441/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] vikramahuja1001 commented on pull request #3855: [CARBONDATA-3863], after using index service clean the temp data

GitBox
In reply to this post by GitBox

vikramahuja1001 commented on pull request #3855:
URL: https://github.com/apache/carbondata/pull/3855#issuecomment-662251240


   @MarvinLitt , please add proper description as to why this PR is needed and what changes are proposed


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] MarvinLitt commented on pull request #3855: [CARBONDATA-3863], after using index service clean the temp data

GitBox
In reply to this post by GitBox

MarvinLitt commented on pull request #3855:
URL: https://github.com/apache/carbondata/pull/3855#issuecomment-662774014


   > @MarvinLitt , please add proper description as to why this PR is needed and what changes are proposed
   
   done


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] kunal642 commented on a change in pull request #3855: [CARBONDATA-3863], after using index service clean the temp data

GitBox
In reply to this post by GitBox

kunal642 commented on a change in pull request #3855:
URL: https://github.com/apache/carbondata/pull/3855#discussion_r459579617



##########
File path: integration/spark/src/main/scala/org/apache/carbondata/indexserver/IndexServer.scala
##########
@@ -316,4 +324,17 @@ object IndexServer extends ServerInterface {
       Array(new Service("security.indexserver.protocol.acl", classOf[ServerInterface]))
     }
   }
+
+  def startAgingFolders(): Unit = {
+    val runnable = new Runnable() {
+      def run() {
+        val age = System.currentTimeMillis() - agePeriod.toLong
+        CarbonUtil.agingTempFolderForIndexServer(age)
+        LOGGER.info(s"Complete age temp folder ${CarbonUtil.getIndexServerTempPath}")
+      }
+    }
+    val ags: ScheduledExecutorService = Executors.newSingleThreadScheduledExecutor
+    ags.scheduleAtFixedRate(runnable, 1000, 3600000, TimeUnit.MICROSECONDS)
+    LOGGER.info("index server temp folders aging thread start")

Review comment:
       Please add this log inside the thread so that we can know each time the deletion thread was executed




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] kunal642 commented on a change in pull request #3855: [CARBONDATA-3863], after using index service clean the temp data

GitBox
In reply to this post by GitBox

kunal642 commented on a change in pull request #3855:
URL: https://github.com/apache/carbondata/pull/3855#discussion_r459584046



##########
File path: integration/spark/src/main/scala/org/apache/carbondata/indexserver/IndexServer.scala
##########
@@ -316,4 +324,17 @@ object IndexServer extends ServerInterface {
       Array(new Service("security.indexserver.protocol.acl", classOf[ServerInterface]))
     }
   }
+
+  def startAgingFolders(): Unit = {
+    val runnable = new Runnable() {
+      def run() {
+        val age = System.currentTimeMillis() - agePeriod.toLong
+        CarbonUtil.agingTempFolderForIndexServer(age)
+        LOGGER.info(s"Complete age temp folder ${CarbonUtil.getIndexServerTempPath}")
+      }
+    }
+    val ags: ScheduledExecutorService = Executors.newSingleThreadScheduledExecutor
+    ags.scheduleAtFixedRate(runnable, 1000, 3600000, TimeUnit.MICROSECONDS)

Review comment:
       Why are you scheduling this after 3s.. I think this scheduling rate is too fast.
   
   Better to execute this thread at a fixed delay of 1 hour.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] kunal642 commented on a change in pull request #3855: [CARBONDATA-3863], after using index service clean the temp data

GitBox
In reply to this post by GitBox

kunal642 commented on a change in pull request #3855:
URL: https://github.com/apache/carbondata/pull/3855#discussion_r459584942



##########
File path: integration/spark/src/main/scala/org/apache/carbondata/indexserver/IndexServer.scala
##########
@@ -316,4 +324,17 @@ object IndexServer extends ServerInterface {
       Array(new Service("security.indexserver.protocol.acl", classOf[ServerInterface]))
     }
   }
+
+  def startAgingFolders(): Unit = {

Review comment:
       Please rename this method to something like "scheduleCleanUpThread"




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] kunal642 commented on a change in pull request #3855: [CARBONDATA-3863], after using index service clean the temp data

GitBox
In reply to this post by GitBox

kunal642 commented on a change in pull request #3855:
URL: https://github.com/apache/carbondata/pull/3855#discussion_r459588452



##########
File path: integration/spark/src/main/scala/org/apache/carbondata/indexserver/IndexJobs.scala
##########
@@ -69,6 +69,12 @@ class DistributedIndexJob extends AbstractIndexJob {
         if (null != splitFolderPath && !splitFolderPath.deleteFile()) {
           LOGGER.error("Problem while deleting the temp directory:"
             + splitFolderPath.getAbsolutePath)
+        } else {
+          // if the path build with getQueryId already exists,
+          // the splitFolderPath should be null, need delete
+          if (null == splitFolderPath) {
+            CarbonUtil.deleteTempFolderForIndexServer(indexFormat.getQueryId)

Review comment:
       We are already deleting the query id folders in this PR, why is deleting the "/tmp/indexservertmp" required then??




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] kunal642 commented on a change in pull request #3855: [CARBONDATA-3863], after using index service clean the temp data

GitBox
In reply to this post by GitBox

kunal642 commented on a change in pull request #3855:
URL: https://github.com/apache/carbondata/pull/3855#discussion_r459589936



##########
File path: core/src/main/java/org/apache/carbondata/core/datastore/filesystem/LocalCarbonFile.java
##########
@@ -485,4 +486,30 @@ public boolean equals(Object o) {
   public int hashCode() {
     return Objects.hash(file.getAbsolutePath());
   }
+
+  @Override
+  public List<CarbonFile> listDirs() throws IOException {
+    if (!file.isDirectory()) {
+      return new ArrayList<CarbonFile>();
+    }
+    Collection<File> fileCollection = FileUtils.listFilesAndDirs(file,
+            DirectoryFileFilter.DIRECTORY, null);
+    if (fileCollection.isEmpty()) {
+      return new ArrayList<CarbonFile>();
+    }
+    List<CarbonFile> carbonFiles = new ArrayList<CarbonFile>();
+    for (File file : fileCollection) {
+      if (file.isDirectory()) {
+        File[] files = file.listFiles();

Review comment:
       Why list files is needed again? Are you trying to list recursive?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] MarvinLitt commented on a change in pull request #3855: [CARBONDATA-3863], after using index service clean the temp data

GitBox
In reply to this post by GitBox

MarvinLitt commented on a change in pull request #3855:
URL: https://github.com/apache/carbondata/pull/3855#discussion_r460346175



##########
File path: integration/spark/src/main/scala/org/apache/carbondata/indexserver/IndexJobs.scala
##########
@@ -69,6 +69,12 @@ class DistributedIndexJob extends AbstractIndexJob {
         if (null != splitFolderPath && !splitFolderPath.deleteFile()) {
           LOGGER.error("Problem while deleting the temp directory:"
             + splitFolderPath.getAbsolutePath)
+        } else {
+          // if the path build with getQueryId already exists,
+          // the splitFolderPath should be null, need delete
+          if (null == splitFolderPath) {
+            CarbonUtil.deleteTempFolderForIndexServer(indexFormat.getQueryId)

Review comment:
       if the /tmp/indexservertmp/gropid  has exists then the splitFolderPath will return null.
   before our code do not deal with that scene。




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] MarvinLitt commented on a change in pull request #3855: [CARBONDATA-3863], after using index service clean the temp data

GitBox
In reply to this post by GitBox

MarvinLitt commented on a change in pull request #3855:
URL: https://github.com/apache/carbondata/pull/3855#discussion_r460346175



##########
File path: integration/spark/src/main/scala/org/apache/carbondata/indexserver/IndexJobs.scala
##########
@@ -69,6 +69,12 @@ class DistributedIndexJob extends AbstractIndexJob {
         if (null != splitFolderPath && !splitFolderPath.deleteFile()) {
           LOGGER.error("Problem while deleting the temp directory:"
             + splitFolderPath.getAbsolutePath)
+        } else {
+          // if the path build with getQueryId already exists,
+          // the splitFolderPath should be null, need delete
+          if (null == splitFolderPath) {
+            CarbonUtil.deleteTempFolderForIndexServer(indexFormat.getQueryId)

Review comment:
       if the /tmp/indexservertmp/gropid  has exists then the splitFolderPath will return null.
   before our code do not deal with that scene。
   found this issue by test case.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] MarvinLitt commented on a change in pull request #3855: [CARBONDATA-3863], after using index service clean the temp data

GitBox
In reply to this post by GitBox

MarvinLitt commented on a change in pull request #3855:
URL: https://github.com/apache/carbondata/pull/3855#discussion_r460346573



##########
File path: integration/spark/src/main/scala/org/apache/carbondata/indexserver/IndexServer.scala
##########
@@ -316,4 +324,17 @@ object IndexServer extends ServerInterface {
       Array(new Service("security.indexserver.protocol.acl", classOf[ServerInterface]))
     }
   }
+
+  def startAgingFolders(): Unit = {
+    val runnable = new Runnable() {
+      def run() {
+        val age = System.currentTimeMillis() - agePeriod.toLong
+        CarbonUtil.agingTempFolderForIndexServer(age)
+        LOGGER.info(s"Complete age temp folder ${CarbonUtil.getIndexServerTempPath}")
+      }
+    }
+    val ags: ScheduledExecutorService = Executors.newSingleThreadScheduledExecutor
+    ags.scheduleAtFixedRate(runnable, 1000, 3600000, TimeUnit.MICROSECONDS)

Review comment:
       delay 1 hour may be to late, if something wrong happens, 1hour is enough to cause a relatively large impcact。
   may be we can delay for 5 min, what do you think?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] MarvinLitt commented on a change in pull request #3855: [CARBONDATA-3863], after using index service clean the temp data

GitBox
In reply to this post by GitBox

MarvinLitt commented on a change in pull request #3855:
URL: https://github.com/apache/carbondata/pull/3855#discussion_r460346871



##########
File path: core/src/main/java/org/apache/carbondata/core/datastore/filesystem/LocalCarbonFile.java
##########
@@ -485,4 +486,30 @@ public boolean equals(Object o) {
   public int hashCode() {
     return Objects.hash(file.getAbsolutePath());
   }
+
+  @Override
+  public List<CarbonFile> listDirs() throws IOException {
+    if (!file.isDirectory()) {
+      return new ArrayList<CarbonFile>();
+    }
+    Collection<File> fileCollection = FileUtils.listFilesAndDirs(file,
+            DirectoryFileFilter.DIRECTORY, null);
+    if (fileCollection.isEmpty()) {
+      return new ArrayList<CarbonFile>();
+    }
+    List<CarbonFile> carbonFiles = new ArrayList<CarbonFile>();
+    for (File file : fileCollection) {
+      if (file.isDirectory()) {
+        File[] files = file.listFiles();

Review comment:
       try to find all folders from  /tmp/indexservertmp.
   no  list recursive, just find the direct directory under /tmp/indexservertmp.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] MarvinLitt commented on a change in pull request #3855: [CARBONDATA-3863], after using index service clean the temp data

GitBox
In reply to this post by GitBox

MarvinLitt commented on a change in pull request #3855:
URL: https://github.com/apache/carbondata/pull/3855#discussion_r460666952



##########
File path: integration/spark/src/main/scala/org/apache/carbondata/indexserver/IndexServer.scala
##########
@@ -316,4 +324,17 @@ object IndexServer extends ServerInterface {
       Array(new Service("security.indexserver.protocol.acl", classOf[ServerInterface]))
     }
   }
+
+  def startAgingFolders(): Unit = {
+    val runnable = new Runnable() {
+      def run() {
+        val age = System.currentTimeMillis() - agePeriod.toLong
+        CarbonUtil.agingTempFolderForIndexServer(age)
+        LOGGER.info(s"Complete age temp folder ${CarbonUtil.getIndexServerTempPath}")
+      }
+    }
+    val ags: ScheduledExecutorService = Executors.newSingleThreadScheduledExecutor
+    ags.scheduleAtFixedRate(runnable, 1000, 3600000, TimeUnit.MICROSECONDS)
+    LOGGER.info("index server temp folders aging thread start")

Review comment:
       under run func there already has logs.
         def run() {
           val age = System.currentTimeMillis() - agePeriod.toLong
           CarbonUtil.agingTempFolderForIndexServer(age)
           LOGGER.info(s"Complete age temp folder ${CarbonUtil.getIndexServerTempPath}")
         }




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] MarvinLitt commented on a change in pull request #3855: [CARBONDATA-3863], after using index service clean the temp data

GitBox
In reply to this post by GitBox

MarvinLitt commented on a change in pull request #3855:
URL: https://github.com/apache/carbondata/pull/3855#discussion_r460667653



##########
File path: integration/spark/src/main/scala/org/apache/carbondata/indexserver/IndexServer.scala
##########
@@ -316,4 +324,17 @@ object IndexServer extends ServerInterface {
       Array(new Service("security.indexserver.protocol.acl", classOf[ServerInterface]))
     }
   }
+
+  def startAgingFolders(): Unit = {

Review comment:
       done




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] MarvinLitt commented on a change in pull request #3855: [CARBONDATA-3863], after using index service clean the temp data

GitBox
In reply to this post by GitBox

MarvinLitt commented on a change in pull request #3855:
URL: https://github.com/apache/carbondata/pull/3855#discussion_r460675486



##########
File path: integration/spark/src/main/scala/org/apache/carbondata/indexserver/IndexServer.scala
##########
@@ -316,4 +324,17 @@ object IndexServer extends ServerInterface {
       Array(new Service("security.indexserver.protocol.acl", classOf[ServerInterface]))
     }
   }
+
+  def startAgingFolders(): Unit = {
+    val runnable = new Runnable() {
+      def run() {
+        val age = System.currentTimeMillis() - agePeriod.toLong
+        CarbonUtil.agingTempFolderForIndexServer(age)
+        LOGGER.info(s"Complete age temp folder ${CarbonUtil.getIndexServerTempPath}")
+      }
+    }
+    val ags: ScheduledExecutorService = Executors.newSingleThreadScheduledExecutor
+    ags.scheduleAtFixedRate(runnable, 1000, 3600000, TimeUnit.MICROSECONDS)

Review comment:
       the rate is 3 hours, about the delay time, i thinks it is ok, delay 1s or delay 5min or delay 1hour the effect is almost the same.
   The test cases are covered here. If there is too much delay, the execution of test cases will be affected.
   so kunal is there no need to modify the delay here?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3855: [CARBONDATA-3863], after using index service clean the temp data

GitBox
In reply to this post by GitBox

CarbonDataQA1 commented on pull request #3855:
URL: https://github.com/apache/carbondata/pull/3855#issuecomment-664237649


   Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1757/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3855: [CARBONDATA-3863], after using index service clean the temp data

GitBox
In reply to this post by GitBox

CarbonDataQA1 commented on pull request #3855:
URL: https://github.com/apache/carbondata/pull/3855#issuecomment-664239278


   Build Failed  with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3499/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] MarvinLitt commented on pull request #3855: [CARBONDATA-3863], after using index service clean the temp data

GitBox
In reply to this post by GitBox

MarvinLitt commented on pull request #3855:
URL: https://github.com/apache/carbondata/pull/3855#issuecomment-666257057


   @kunal642 @vikramahuja1001  please check this pr


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


12