[GitHub] [carbondata] vikramahuja1001 opened a new pull request #3678: [WIP]: index server concurrency fix

32 messages

[GitHub] [carbondata] CarbonDataQA1 commented on issue #3678: [WIP]: index server concurrency fix

GitBox
CarbonDataQA1 commented on issue #3678: [WIP]: index server concurrency fix
URL: https://github.com/apache/carbondata/pull/3678#issuecomment-604444324
 
 
   Build Failed  with Spark 2.4.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/860/
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[hidden email]


With regards,
Apache Git Services

[GitHub] [carbondata] vikramahuja1001 commented on a change in pull request #3678: [WIP]: index server concurrency fix

vikramahuja1001 commented on a change in pull request #3678: [WIP]: index server concurrency fix
URL: https://github.com/apache/carbondata/pull/3678#discussion_r407479607
 
 

 ##########
 File path: integration/spark/src/main/scala/org/apache/carbondata/indexserver/DistributedCountRDD.scala
 ##########
 @@ -69,15 +70,17 @@ class DistributedCountRDD(@transient ss: SparkSession, dataMapFormat: Distributa
       DataMapStoreManager.getInstance().clearInvalidSegments(dataMapFormat.getCarbonTable,
         dataMapFormat.getInvalidSegments)
     }
+    val globalQueue = SegmentProcessor.getInstance()
     val futures = if (inputSplits.length <= numOfThreads) {
       inputSplits.map {
-        split => generateFuture(Seq(split))
+        split => generateFuture(Seq(split), globalQueue)
       }
     } else {
       DistributedRDDUtils.groupSplits(inputSplits, numOfThreads).map {
-        splits => generateFuture(splits)
+        splits => generateFuture(splits, globalQueue)
       }
     }
+    globalQueue.emptyQueue()
 
 Review comment:
   done
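
The hunk above routes every split through a shared `SegmentProcessor` (obtained via `SegmentProcessor.getInstance()`), whose implementation is not shown in this thread. Judging only from the calls used here and in the next hunk (`ifProcessSegment`, `processSegment`, `updateWaitingStatus`, `emptyQueue`), it appears to track which segments of which table are currently being processed, so that concurrent count requests do not load the same segment's index at the same time. A minimal sketch of that idea follows, assuming claims are keyed by `(tableId, segmentNo)`. `SegmentClaimRegistry` and its methods are hypothetical names, not CarbonData APIs.

```scala
import java.util.concurrent.ConcurrentHashMap

// Hypothetical stand-in for the SegmentProcessor singleton in the diff.
// It records which (tableId, segmentNo) pairs are currently being pruned.
object SegmentClaimRegistry {
  // Thread-safe set; add() is atomic, so only one caller can win a claim.
  private val inProgress = ConcurrentHashMap.newKeySet[(String, String)]()

  // Returns true if the caller now owns the segment, false if another
  // task is already processing it (the caller should wait and retry).
  def tryClaim(tableId: String, segmentNo: String): Boolean =
    inProgress.add((tableId, segmentNo))

  // Releases a claim once the segment's row count has been computed.
  def release(tableId: String, segmentNo: String): Unit =
    inProgress.remove((tableId, segmentNo))

  // True while some task holds the claim for this segment.
  def isClaimed(tableId: String, segmentNo: String): Boolean =
    inProgress.contains((tableId, segmentNo))
}
```

In this sketch each claim is released individually when its segment is done, rather than clearing the whole structure at once; whether `emptyQueue()` in the PR has equivalent semantics is not visible from the diff.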


[GitHub] [carbondata] vikramahuja1001 commented on a change in pull request #3678: [WIP]: index server concurrency fix

vikramahuja1001 commented on a change in pull request #3678: [WIP]: index server concurrency fix
URL: https://github.com/apache/carbondata/pull/3678#discussion_r407479647
 
 

 ##########
 File path: integration/spark/src/main/scala/org/apache/carbondata/indexserver/DistributedCountRDD.scala
 ##########
 @@ -96,20 +99,78 @@ class DistributedCountRDD(@transient ss: SparkSession, dataMapFormat: Distributa
     new DistributedPruneRDD(ss, dataMapFormat).partitions
   }
 
-  private def generateFuture(split: Seq[InputSplit])
+  private def generateFuture(split: Seq[InputSplit], globalQueue: SegmentProcessor)
     (implicit executionContext: ExecutionContext) = {
     Future {
-      val segments = split.map { inputSplit =>
+
+      var segmentsWorkStatus = split.map { inputSplit =>
         val distributable = inputSplit.asInstanceOf[DataMapDistributableWrapper]
         distributable.getDistributable.getSegment
           .setReadCommittedScope(dataMapFormat.getReadCommittedScope)
-        distributable.getDistributable.getSegment
+
+        val processedSegments = globalQueue.ifProcessSegment(distributable.getDistributable
+          .getSegment.getSegmentNo, dataMapFormat.getCarbonTable.getTableId)
+
+        val segmentWorkStatusList = new SegmentWorkStatus(distributable.getDistributable
+          .getSegment, !processedSegments)
+
+        // if ifprocesssegment = true, iswaiting = false
+        val processedSegmentsList = globalQueue.processSegment(segmentWorkStatusList,
+          dataMapFormat.getCarbonTable.getTableId)
+        segmentWorkStatusList
+      }
+
+      val queueSize = globalQueue.queueSize()
+      val getGlobalworkQueue = globalQueue.getGlobalWorkQueue
+
+      var segmentsPositive: mutable.HashSet[SegmentWorkStatus] = mutable.HashSet.empty
+      var segmentsNegative: mutable.HashSet[SegmentWorkStatus] = mutable.HashSet.empty
+
+      segmentsWorkStatus.map { iter =>
+        if (iter.getWaiting == false) {
+          segmentsPositive.add(iter)
+        } else {
+          segmentsNegative.add(iter)
+        }
       }
+
       val defaultDataMap = DataMapStoreManager.getInstance
         .getDataMap(dataMapFormat.getCarbonTable, split.head
           .asInstanceOf[DataMapDistributableWrapper].getDistributable.getDataMapSchema)
-      defaultDataMap.getBlockRowCount(segments.toList.asJava, dataMapFormat
-        .getPartitions, defaultDataMap).asScala
+      var result = defaultDataMap.getBlockRowCount(segmentsPositive.map { iter =>
+        iter.getSegment }.toList.asJava, dataMapFormat.getPartitions,
+        defaultDataMap).asScala
+
+      //  delete from local
+      var segment = segmentsPositive.map { iter =>
+        globalQueue.updateWaitingStatus(iter, defaultDataMap.getTable.getTableId)
+      }
+
+      while (segmentsNegative != null && segmentsNegative.size != 0) {
+        segmentsWorkStatus = segmentsNegative.map { iter =>
+          val processedSegments = globalQueue.ifProcessSegment(iter.getSegment
+            .getSegmentNo, defaultDataMap.getTable.getTableId)
+          val processedSegmentsList = globalQueue.processSegment(iter,
+            dataMapFormat.getCarbonTable.getTableId)
+          iter
+        }.toSeq
+        segmentsPositive = mutable.HashSet.empty
+        segmentsNegative = mutable.HashSet.empty
+        segmentsWorkStatus.map { iter =>
+          if (iter.getWaiting == false) {
+            segmentsPositive.add(iter)
+          } else {
+            segmentsNegative.add(iter)
+          }
+        }
+        result = result.++(defaultDataMap.getBlockRowCount(segmentsPositive
 
 Review comment:
   done
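
The second hunk splits the segments into those this task may process now (`segmentsPositive`) and those another task is still handling (`segmentsNegative`), counts the first group, and loops on the second until it is empty. The same control flow can be sketched as below, assuming a claim set keyed by `(tableId, segmentNo)` and a caller-supplied `rowCount` function standing in for `getBlockRowCount`. `RetryingCounter`, `countAll`, and the `backoffMs` pause are hypothetical additions, not part of the PR.

```scala
import java.util.concurrent.ConcurrentHashMap
import scala.annotation.tailrec

// Hypothetical sketch of the wait-and-retry loop in generateFuture:
// count the segments this task can claim, then retry the ones another
// task is still processing. Names and the backoff are assumptions.
object RetryingCounter {
  private val inProgress = ConcurrentHashMap.newKeySet[(String, String)]()

  def countAll(tableId: String, segments: Seq[String],
      rowCount: String => Long, backoffMs: Long = 10L): Long = {
    @tailrec
    def loop(pending: Seq[String], acc: Long): Long =
      if (pending.isEmpty) acc
      else {
        // "ready" = claimed by this task now; "waiting" = busy elsewhere.
        val (ready, waiting) = pending.partition(s => inProgress.add((tableId, s)))
        // Count each claimed segment, always releasing its claim afterwards.
        val counted = ready.map { s =>
          try rowCount(s) finally inProgress.remove((tableId, s))
        }.sum
        // Pause only when this pass made no progress, to avoid busy-spinning.
        if (waiting.nonEmpty && ready.isEmpty) Thread.sleep(backoffMs)
        loop(waiting, acc + counted)
      }
    // distinct: a duplicated segment would otherwise be counted twice.
    loop(segments.distinct, 0L)
  }
}
```

The pause happens only when a pass makes no progress, so the loop does not spin while another task holds every remaining segment; the visible part of the original loop shows no such delay, and the `try`/`finally` release is likewise an assumption about how claims should be returned on failure.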


[GitHub] [carbondata] CarbonDataQA1 commented on issue #3678: [WIP]: index server concurrency fix

CarbonDataQA1 commented on issue #3678: [WIP]: index server concurrency fix
URL: https://github.com/apache/carbondata/pull/3678#issuecomment-612900566
 
 
   Build Failed  with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1015/
   


[GitHub] [carbondata] CarbonDataQA1 commented on issue #3678: [WIP]: index server concurrency fix

CarbonDataQA1 commented on issue #3678: [WIP]: index server concurrency fix
URL: https://github.com/apache/carbondata/pull/3678#issuecomment-612901063
 
 
   Build Failed  with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/2727/
   


[GitHub] [carbondata] CarbonDataQA1 commented on issue #3678: [WIP]: index server concurrency fix

CarbonDataQA1 commented on issue #3678: [WIP]: index server concurrency fix
URL: https://github.com/apache/carbondata/pull/3678#issuecomment-612902649
 
 
   Build Failed  with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1016/
   


[GitHub] [carbondata] CarbonDataQA1 commented on issue #3678: [WIP]: index server concurrency fix

CarbonDataQA1 commented on issue #3678: [WIP]: index server concurrency fix
URL: https://github.com/apache/carbondata/pull/3678#issuecomment-612903081
 
 
   Build Failed  with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/2728/
   


[GitHub] [carbondata] CarbonDataQA1 commented on issue #3678: [WIP]: index server concurrency fix

CarbonDataQA1 commented on issue #3678: [WIP]: index server concurrency fix
URL: https://github.com/apache/carbondata/pull/3678#issuecomment-612932705
 
 
   Build Failed  with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1017/
   


[GitHub] [carbondata] CarbonDataQA1 commented on issue #3678: [WIP]: index server concurrency fix

CarbonDataQA1 commented on issue #3678: [WIP]: index server concurrency fix
URL: https://github.com/apache/carbondata/pull/3678#issuecomment-612932883
 
 
   Build Failed  with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/2729/
   


[GitHub] [carbondata] CarbonDataQA1 commented on issue #3678: [WIP]: index server concurrency fix

CarbonDataQA1 commented on issue #3678: [WIP]: index server concurrency fix
URL: https://github.com/apache/carbondata/pull/3678#issuecomment-613006776
 
 
   Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/2730/
   


[GitHub] [carbondata] CarbonDataQA1 commented on issue #3678: [WIP]: index server concurrency fix

CarbonDataQA1 commented on issue #3678: [WIP]: index server concurrency fix
URL: https://github.com/apache/carbondata/pull/3678#issuecomment-613012464
 
 
   Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1018/
   


[GitHub] [carbondata] vikramahuja1001 commented on issue #3678: [WIP]: index server concurrency fix

vikramahuja1001 commented on issue #3678: [WIP]: index server concurrency fix
URL: https://github.com/apache/carbondata/pull/3678#issuecomment-613253747
 
 
   @kunal642, please check.
