[GitHub] [carbondata] Indhumathi27 opened a new pull request #3584: [WIP] Support SegmentLevel MinMax for better Pruning and less driver memory usage for cache

[GitHub] [carbondata] CarbonDataQA1 commented on issue #3584: [WIP][CARBONDATA-3718] Support SegmentLevel MinMax for better Pruning and less driver memory usage for cache

GitBox
CarbonDataQA1 commented on issue #3584: [WIP][CARBONDATA-3718] Support SegmentLevel MinMax for better Pruning and less driver memory usage for cache
URL: https://github.com/apache/carbondata/pull/3584#issuecomment-597153843
 
 
   Build Failed with Spark 2.3.4. Please check CI: http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/2413/
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[hidden email]


With regards,
Apache Git Services

[GitHub] [carbondata] CarbonDataQA1 commented on issue #3584: [WIP][CARBONDATA-3718] Support SegmentLevel MinMax for better Pruning and less driver memory usage for cache

GitBox
CarbonDataQA1 commented on issue #3584: [WIP][CARBONDATA-3718] Support SegmentLevel MinMax for better Pruning and less driver memory usage for cache
URL: https://github.com/apache/carbondata/pull/3584#issuecomment-597235500
 
 
   Build Failed with Spark 2.3.4. Please check CI: http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/2414/
   

[GitHub] [carbondata] CarbonDataQA1 commented on issue #3584: [WIP][CARBONDATA-3718] Support SegmentLevel MinMax for better Pruning and less driver memory usage for cache

GitBox
CarbonDataQA1 commented on issue #3584: [WIP][CARBONDATA-3718] Support SegmentLevel MinMax for better Pruning and less driver memory usage for cache
URL: https://github.com/apache/carbondata/pull/3584#issuecomment-597243014
 
 
   Build Failed with Spark 2.4.4. Please check CI: http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.4/707/
   

[GitHub] [carbondata] CarbonDataQA1 commented on issue #3584: [WIP][CARBONDATA-3718] Support SegmentLevel MinMax for better Pruning and less driver memory usage for cache

GitBox
CarbonDataQA1 commented on issue #3584: [WIP][CARBONDATA-3718] Support SegmentLevel MinMax for better Pruning and less driver memory usage for cache
URL: https://github.com/apache/carbondata/pull/3584#issuecomment-597481876
 
 
   Build Failed with Spark 2.4.4. Please check CI: http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.4/710/
   

[GitHub] [carbondata] CarbonDataQA1 commented on issue #3584: [WIP][CARBONDATA-3718] Support SegmentLevel MinMax for better Pruning and less driver memory usage for cache

GitBox
CarbonDataQA1 commented on issue #3584: [WIP][CARBONDATA-3718] Support SegmentLevel MinMax for better Pruning and less driver memory usage for cache
URL: https://github.com/apache/carbondata/pull/3584#issuecomment-597481963
 
 
   Build Failed with Spark 2.3.4. Please check CI: http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/2417/
   

[GitHub] [carbondata] Indhumathi27 commented on a change in pull request #3584: [WIP][CARBONDATA-3718] Support SegmentLevel MinMax for better Pruning and less driver memory usage for cache

GitBox
Indhumathi27 commented on a change in pull request #3584: [WIP][CARBONDATA-3718] Support SegmentLevel MinMax for better Pruning and less driver memory usage for cache
URL: https://github.com/apache/carbondata/pull/3584#discussion_r390808808
 
 

 ##########
 File path: integration/spark/src/main/scala/org/apache/carbondata/spark/rdd/CarbonDataRDDFactory.scala
 ##########
 @@ -390,9 +399,9 @@ object CarbonDataRDDFactory {
                 carbonLoadModel,
                 hadoopConf)
             } else if (dataFrame.isDefined) {
-              loadDataFrame(sqlContext, dataFrame, None, carbonLoadModel)
+              loadDataFrame(sqlContext, dataFrame, None, carbonLoadModel, segmentMinMaxAccumulator)
 
 Review comment:
   For compaction, we create a new load, in which min and max are recalculated while writing the new index file. I have handled getting the segment metadata for the merged load in the compaction flow.
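A minimal sketch of the recalculation described in this comment, assuming per-column min/max values are kept as byte arrays and compared as unsigned bytes (CarbonData's real comparison depends on each column's data type). `SegmentMinMaxMerge` and its `ColumnMinMax` holder are hypothetical names, not code from this PR: for every column the sketch keeps the smallest min and the largest max across all blocks of the merged load.

```java
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical sketch (not the PR's code): fold block-level min/max values
 * into one segment-level min/max, as compaction of a merged load would need.
 */
final class SegmentMinMaxMerge {

  /** Per-column min and max values, stored as bytes (an assumption). */
  static final class ColumnMinMax {
    final byte[][] min;
    final byte[][] max;

    ColumnMinMax(byte[][] min, byte[][] max) {
      this.min = min;
      this.max = max;
    }
  }

  /** Compares two byte arrays lexicographically, treating bytes as unsigned. */
  static int compareUnsigned(byte[] a, byte[] b) {
    int len = Math.min(a.length, b.length);
    for (int i = 0; i < len; i++) {
      int x = a[i] & 0xFF;
      int y = b[i] & 0xFF;
      if (x != y) {
        return x - y;
      }
    }
    return a.length - b.length;
  }

  /** For every column keeps the smallest min and the largest max across all blocks. */
  static ColumnMinMax merge(List<ColumnMinMax> blocks) {
    if (blocks.isEmpty()) {
      throw new IllegalArgumentException("no blocks to merge");
    }
    int columns = blocks.get(0).min.length;
    byte[][] min = new byte[columns][];
    byte[][] max = new byte[columns][];
    for (int c = 0; c < columns; c++) {
      min[c] = blocks.get(0).min[c];
      max[c] = blocks.get(0).max[c];
      for (ColumnMinMax block : blocks) {
        if (compareUnsigned(block.min[c], min[c]) < 0) {
          min[c] = block.min[c];
        }
        if (compareUnsigned(block.max[c], max[c]) > 0) {
          max[c] = block.max[c];
        }
      }
    }
    return new ColumnMinMax(min, max);
  }

  public static void main(String[] args) {
    ColumnMinMax block1 = new ColumnMinMax(new byte[][] {{5}}, new byte[][] {{9}});
    ColumnMinMax block2 = new ColumnMinMax(new byte[][] {{2}}, new byte[][] {{7}});
    ColumnMinMax segment = merge(Arrays.asList(block1, block2));
    System.out.println(segment.min[0][0] + " " + segment.max[0][0]);
  }
}
```

Running `main` prints `2 9`: the block minimums 5 and 2 fold to 2, and the maximums 9 and 7 fold to 9.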

[GitHub] [carbondata] Indhumathi27 commented on a change in pull request #3584: [WIP][CARBONDATA-3718] Support SegmentLevel MinMax for better Pruning and less driver memory usage for cache

GitBox
Indhumathi27 commented on a change in pull request #3584: [WIP][CARBONDATA-3718] Support SegmentLevel MinMax for better Pruning and less driver memory usage for cache
URL: https://github.com/apache/carbondata/pull/3584#discussion_r390809032
 
 

 ##########
 File path: integration/spark/src/main/scala/org/apache/carbondata/spark/rdd/CarbonDataRDDFactory.scala
 ##########
 @@ -390,9 +399,9 @@ object CarbonDataRDDFactory {
                 carbonLoadModel,
                 hadoopConf)
             } else if (dataFrame.isDefined) {
-              loadDataFrame(sqlContext, dataFrame, None, carbonLoadModel)
+              loadDataFrame(sqlContext, dataFrame, None, carbonLoadModel, segmentMinMaxAccumulator)
 
 Review comment:
   For compaction, we create a new load, in which min and max are recalculated while writing the new index file. I have handled getting the segment metadata for the merged load in the compaction flow.

[GitHub] [carbondata] Indhumathi27 commented on a change in pull request #3584: [WIP][CARBONDATA-3718] Support SegmentLevel MinMax for better Pruning and less driver memory usage for cache

GitBox
Indhumathi27 commented on a change in pull request #3584: [WIP][CARBONDATA-3718] Support SegmentLevel MinMax for better Pruning and less driver memory usage for cache
URL: https://github.com/apache/carbondata/pull/3584#discussion_r390809121
 
 

 ##########
 File path: core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java
 ##########
 @@ -2406,4 +2406,14 @@ private CarbonCommonConstants() {
   public static final String BUCKET_COLUMNS = "bucket_columns";
   public static final String BUCKET_NUMBER = "bucket_number";
 
+  /**
+   * Load all indexes to carbon LRU cache
+   */
+  public static final String CARBON_LOAD_ALL_INDEX_TO_CACHE = "carbon.load.all.indexes.to.cache";
 
 Review comment:
   done

[GitHub] [carbondata] Indhumathi27 commented on a change in pull request #3584: [WIP][CARBONDATA-3718] Support SegmentLevel MinMax for better Pruning and less driver memory usage for cache

GitBox
Indhumathi27 commented on a change in pull request #3584: [WIP][CARBONDATA-3718] Support SegmentLevel MinMax for better Pruning and less driver memory usage for cache
URL: https://github.com/apache/carbondata/pull/3584#discussion_r390809178
 
 

 ##########
 File path: core/src/main/java/org/apache/carbondata/core/datamap/TableDataMap.java
 ##########
 @@ -135,7 +135,7 @@ public CarbonTable getTable() {
     int datamapsCount = 0;
     // In case if filter has matched partitions, then update the segments with datamap's
     // segment list, as getDataMaps will return segments that matches the partition.
-    if (null != partitions && !partitions.isEmpty()) {
+    if (null != partitions && !partitions.isEmpty() || (null != filter && !filter.isEmpty())) {
 
 Review comment:
   changed
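The changed condition in the diff also narrows the segment list when a filter is present, not only when partitions match. As a rough illustration of why segment-level min/max makes that worthwhile, the hypothetical `SegmentPruneSketch` below assumes a single numeric filter column and a range filter `lo <= col <= hi`; a segment is skipped when its `[min, max]` cannot overlap the filter range. It is a sketch under those assumptions, not the PR's pruning code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical sketch: segment-level min/max pruning for a range filter
 * lo <= col <= hi on one numeric column.
 */
final class SegmentPruneSketch {

  /** Segment id plus the min/max of the filter column within that segment. */
  static final class SegmentRange {
    final String segmentId;
    final long min;
    final long max;

    SegmentRange(String segmentId, long min, long max) {
      this.segmentId = segmentId;
      this.min = min;
      this.max = max;
    }
  }

  /** Keeps a segment only if [min, max] overlaps [lo, hi]; the others cannot match. */
  static List<String> prune(List<SegmentRange> segments, long lo, long hi) {
    List<String> kept = new ArrayList<>();
    for (SegmentRange s : segments) {
      if (s.max >= lo && s.min <= hi) {
        kept.add(s.segmentId);
      }
    }
    return kept;
  }

  public static void main(String[] args) {
    List<SegmentRange> segments = Arrays.asList(
        new SegmentRange("0", 1, 100),
        new SegmentRange("1", 101, 200),
        new SegmentRange("2", 150, 300));
    // Filter: col BETWEEN 120 AND 160
    System.out.println(prune(segments, 120, 160));
  }
}
```

`main` prints `[1, 2]`, because segment 0 ends at 100. Since `&&` binds tighter than `||` in Java, the condition in the diff evaluates as `(partitions match) || (filter present)`; wrapping the first pair in parentheses would make that reading explicit.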

[GitHub] [carbondata] Indhumathi27 commented on a change in pull request #3584: [WIP][CARBONDATA-3718] Support SegmentLevel MinMax for better Pruning and less driver memory usage for cache

GitBox
Indhumathi27 commented on a change in pull request #3584: [WIP][CARBONDATA-3718] Support SegmentLevel MinMax for better Pruning and less driver memory usage for cache
URL: https://github.com/apache/carbondata/pull/3584#discussion_r390809253
 
 

 ##########
 File path: integration/spark/src/main/scala/org/apache/carbondata/spark/rdd/CarbonDataRDDFactory.scala
 ##########
 @@ -479,7 +488,17 @@ object CarbonDataRDDFactory {
             segmentDetails.add(new Segment(resultOfBlock._2._1.getLoadName))
           }
         }
-        val segmentFiles = updateSegmentFiles(carbonTable, segmentDetails, updateModel.get)
+        var segmentMinMaxMap: Map[String, List[SegmentMinMax]] = Map()
+        if (!segmentMinMaxAccumulator.isZero) {
+          segmentMinMaxAccumulator.value.asScala.foreach(map => if (map.nonEmpty) {
+            segmentMinMaxMap = segmentMinMaxMap ++ map
 
 Review comment:
   done

[GitHub] [carbondata] Indhumathi27 commented on a change in pull request #3584: [WIP][CARBONDATA-3718] Support SegmentLevel MinMax for better Pruning and less driver memory usage for cache

GitBox
Indhumathi27 commented on a change in pull request #3584: [WIP][CARBONDATA-3718] Support SegmentLevel MinMax for better Pruning and less driver memory usage for cache
URL: https://github.com/apache/carbondata/pull/3584#discussion_r390809324
 
 

 ##########
 File path: integration/spark/src/main/scala/org/apache/carbondata/spark/rdd/CarbonDataRDDFactory.scala
 ##########
 @@ -375,7 +380,11 @@ object CarbonDataRDDFactory {
                 carbonLoadModel,
                 hadoopConf)
             } else {
-              loadDataFrame(sqlContext, None, Some(convertedRdd), carbonLoadModel)
+              loadDataFrame(sqlContext,
 
 Review comment:
   handled

[GitHub] [carbondata] Indhumathi27 commented on a change in pull request #3584: [WIP][CARBONDATA-3718] Support SegmentLevel MinMax for better Pruning and less driver memory usage for cache

GitBox
Indhumathi27 commented on a change in pull request #3584: [WIP][CARBONDATA-3718] Support SegmentLevel MinMax for better Pruning and less driver memory usage for cache
URL: https://github.com/apache/carbondata/pull/3584#discussion_r390809442
 
 

 ##########
 File path: core/src/main/java/org/apache/carbondata/core/util/SegmentBlockMinMaxInfo.java
 ##########
 @@ -0,0 +1,71 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.core.util;
+
+import java.io.Serializable;
+
+/**
+ * Represent min, max and alter sort column properties for each column in a block
+ */
+public class SegmentBlockMinMaxInfo implements Serializable {
 
 Review comment:
   changed

[GitHub] [carbondata] Indhumathi27 commented on a change in pull request #3584: [WIP][CARBONDATA-3718] Support SegmentLevel MinMax for better Pruning and less driver memory usage for cache

GitBox
Indhumathi27 commented on a change in pull request #3584: [WIP][CARBONDATA-3718] Support SegmentLevel MinMax for better Pruning and less driver memory usage for cache
URL: https://github.com/apache/carbondata/pull/3584#discussion_r390809550
 
 

 ##########
 File path: integration/spark/src/main/scala/org/apache/carbondata/spark/rdd/CarbonDataRDDFactory.scala
 ##########
 @@ -479,7 +488,17 @@ object CarbonDataRDDFactory {
             segmentDetails.add(new Segment(resultOfBlock._2._1.getLoadName))
           }
         }
-        val segmentFiles = updateSegmentFiles(carbonTable, segmentDetails, updateModel.get)
+        var segmentMinMaxMap: Map[String, List[SegmentMinMax]] = Map()
+        if (!segmentMinMaxAccumulator.isZero) {
+          segmentMinMaxAccumulator.value.asScala.foreach(map => if (map.nonEmpty) {
+            segmentMinMaxMap = segmentMinMaxMap ++ map
+          })
+        }
+        val segmentFiles = updateSegmentFiles(carbonTable,
 
 Review comment:
   Handled. Please check.
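The diff folds the accumulator's per-task maps together with `segmentMinMaxMap ++ map`. For Scala maps, `++` keeps the right-hand binding when both sides share a key, so if several tasks could report the same segment id (this excerpt does not show whether they can), earlier lists would be replaced rather than combined. A minimal Java sketch of an append-style merge follows; `SegmentMapMergeSketch` is a hypothetical name, and the generic element type `T` stands in for `SegmentMinMax`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch: combine per-task maps of segment id -> entries so that
 * entries reported by different tasks for the same segment are all kept.
 */
final class SegmentMapMergeSketch {

  static <T> Map<String, List<T>> mergeAll(List<Map<String, List<T>>> perTask) {
    Map<String, List<T>> result = new HashMap<>();
    for (Map<String, List<T>> taskMap : perTask) {
      if (taskMap.isEmpty()) {
        continue; // mirrors the `if (map.nonEmpty)` check in the diff
      }
      for (Map.Entry<String, List<T>> entry : taskMap.entrySet()) {
        // Append instead of overwrite when the segment id is already present.
        result.computeIfAbsent(entry.getKey(), k -> new ArrayList<>()).addAll(entry.getValue());
      }
    }
    return result;
  }

  public static void main(String[] args) {
    Map<String, List<String>> task1 = Collections.singletonMap("0", Arrays.asList("block-a"));
    Map<String, List<String>> task2 = Collections.singletonMap("0", Arrays.asList("block-b"));
    System.out.println(mergeAll(Arrays.asList(task1, task2)));
  }
}
```

`main` prints `{0=[block-a, block-b]}`, whereas a key-replacing merge would keep only `[block-b]`.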

[GitHub] [carbondata] CarbonDataQA1 commented on issue #3584: [WIP][CARBONDATA-3718] Support SegmentLevel MinMax for better Pruning and less driver memory usage for cache

GitBox
CarbonDataQA1 commented on issue #3584: [WIP][CARBONDATA-3718] Support SegmentLevel MinMax for better Pruning and less driver memory usage for cache
URL: https://github.com/apache/carbondata/pull/3584#issuecomment-597552112
 
 
   Build Failed with Spark 2.3.4. Please check CI: http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/2423/
   

[GitHub] [carbondata] CarbonDataQA1 commented on issue #3584: [WIP][CARBONDATA-3718] Support SegmentLevel MinMax for better Pruning and less driver memory usage for cache

GitBox
CarbonDataQA1 commented on issue #3584: [WIP][CARBONDATA-3718] Support SegmentLevel MinMax for better Pruning and less driver memory usage for cache
URL: https://github.com/apache/carbondata/pull/3584#issuecomment-597552925
 
 
   Build Failed with Spark 2.4.4. Please check CI: http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.4/716/
   

[GitHub] [carbondata] CarbonDataQA1 commented on issue #3584: [WIP][CARBONDATA-3718] Support SegmentLevel MinMax for better Pruning and less driver memory usage for cache

GitBox
CarbonDataQA1 commented on issue #3584: [WIP][CARBONDATA-3718] Support SegmentLevel MinMax for better Pruning and less driver memory usage for cache
URL: https://github.com/apache/carbondata/pull/3584#issuecomment-597642010
 
 
   Build Failed with Spark 2.3.4. Please check CI: http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/2428/
   

[GitHub] [carbondata] CarbonDataQA1 commented on issue #3584: [WIP][CARBONDATA-3718] Support SegmentLevel MinMax for better Pruning and less driver memory usage for cache

GitBox
CarbonDataQA1 commented on issue #3584: [WIP][CARBONDATA-3718] Support SegmentLevel MinMax for better Pruning and less driver memory usage for cache
URL: https://github.com/apache/carbondata/pull/3584#issuecomment-597644958
 
 
   Build Failed with Spark 2.4.4. Please check CI: http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.4/721/
   

[GitHub] [carbondata] Indhumathi27 commented on issue #3584: [WIP][CARBONDATA-3718] Support SegmentLevel MinMax for better Pruning and less driver memory usage for cache

GitBox
Indhumathi27 commented on issue #3584: [WIP][CARBONDATA-3718] Support SegmentLevel MinMax for better Pruning and less driver memory usage for cache
URL: https://github.com/apache/carbondata/pull/3584#issuecomment-598566251
 
 
   retest this please

[GitHub] [carbondata] CarbonDataQA1 commented on issue #3584: [WIP][CARBONDATA-3718] Support SegmentLevel MinMax for better Pruning and less driver memory usage for cache

GitBox
CarbonDataQA1 commented on issue #3584: [WIP][CARBONDATA-3718] Support SegmentLevel MinMax for better Pruning and less driver memory usage for cache
URL: https://github.com/apache/carbondata/pull/3584#issuecomment-598591445
 
 
   Build Failed with Spark 2.4.4. Please check CI: http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.4/743/
   

[GitHub] [carbondata] CarbonDataQA1 commented on issue #3584: [WIP][CARBONDATA-3718] Support SegmentLevel MinMax for better Pruning and less driver memory usage for cache

GitBox
CarbonDataQA1 commented on issue #3584: [WIP][CARBONDATA-3718] Support SegmentLevel MinMax for better Pruning and less driver memory usage for cache
URL: https://github.com/apache/carbondata/pull/3584#issuecomment-598592445
 
 
   Build Failed with Spark 2.3.4. Please check CI: http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/2451/
   
