[GitHub] [carbondata] VenuReddy2103 opened a new pull request #4110: [WIP]Secondary Index as a coarse grain datamap


[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4110: [CARBONDATA-4158]Secondary Index as a coarse-grain datamap and use them for Presto queries

GitBox

CarbonDataQA2 commented on pull request #4110:
URL: https://github.com/apache/carbondata/pull/4110#issuecomment-812213423


   Build Failed  with Spark 2.3.4, Please check CI http://121.244.95.60:12602/job/ApacheCarbonPRBuilder2.3/5114/
   



[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4110: [CARBONDATA-4158]Secondary Index as a coarse-grain datamap and use them for Presto queries

GitBox
In reply to this post by GitBox

CarbonDataQA2 commented on pull request #4110:
URL: https://github.com/apache/carbondata/pull/4110#issuecomment-812216965


   Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12602/job/ApacheCarbon_PR_Builder_2.4.5/3363/
   



[GitHub] [carbondata] VenuReddy2103 commented on pull request #4110: [CARBONDATA-4158]Secondary Index as a coarse-grain datamap and use them for Presto queries

GitBox
In reply to this post by GitBox

VenuReddy2103 commented on pull request #4110:
URL: https://github.com/apache/carbondata/pull/4110#issuecomment-812327923


   retest this please



[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4110: [CARBONDATA-4158]Secondary Index as a coarse-grain datamap and use them for Presto queries

GitBox
In reply to this post by GitBox

CarbonDataQA2 commented on pull request #4110:
URL: https://github.com/apache/carbondata/pull/4110#issuecomment-812363048


   Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12602/job/ApacheCarbon_PR_Builder_2.4.5/3364/
   



[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4110: [CARBONDATA-4158]Secondary Index as a coarse-grain datamap and use them for Presto queries

GitBox
In reply to this post by GitBox

CarbonDataQA2 commented on pull request #4110:
URL: https://github.com/apache/carbondata/pull/4110#issuecomment-812364945


   Build Failed  with Spark 2.3.4, Please check CI http://121.244.95.60:12602/job/ApacheCarbonPRBuilder2.3/5115/
   



[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4110: [CARBONDATA-4158]Secondary Index as a coarse-grain datamap and use them for Presto queries

GitBox
In reply to this post by GitBox

CarbonDataQA2 commented on pull request #4110:
URL: https://github.com/apache/carbondata/pull/4110#issuecomment-813362446


   Build Failed  with Spark 2.4.5, Please check CI http://121.244.95.60:12602/job/ApacheCarbon_PR_Builder_2.4.5/3373/
   



[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4110: [CARBONDATA-4158]Secondary Index as a coarse-grain datamap and use them for Presto queries

GitBox
In reply to this post by GitBox

CarbonDataQA2 commented on pull request #4110:
URL: https://github.com/apache/carbondata/pull/4110#issuecomment-813362851


   Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12602/job/ApacheCarbonPRBuilder2.3/5124/
   



[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4110: [CARBONDATA-4158]Secondary Index as a coarse-grain datamap and use them for Presto queries

GitBox
In reply to this post by GitBox

CarbonDataQA2 commented on pull request #4110:
URL: https://github.com/apache/carbondata/pull/4110#issuecomment-813821344


   Build Failed  with Spark 2.4.5, Please check CI http://121.244.95.60:12602/job/ApacheCarbon_PR_Builder_2.4.5/3376/
   



[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4110: [CARBONDATA-4158]Secondary Index as a coarse-grain datamap and use them for Presto queries

GitBox
In reply to this post by GitBox

CarbonDataQA2 commented on pull request #4110:
URL: https://github.com/apache/carbondata/pull/4110#issuecomment-813822392


   Build Failed  with Spark 2.3.4, Please check CI http://121.244.95.60:12602/job/ApacheCarbonPRBuilder2.3/5127/
   



[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #4110: [CARBONDATA-4158]Secondary Index as a coarse-grain datamap and use them for Presto queries

GitBox
In reply to this post by GitBox

ajantha-bhat commented on a change in pull request #4110:
URL: https://github.com/apache/carbondata/pull/4110#discussion_r607445465



##########
File path: core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java
##########
@@ -2316,6 +2316,12 @@ private CarbonCommonConstants() {
   @CarbonProperty(dynamicConfigurable = true)
   public static final String CARBON_ENABLE_INDEX_SERVER = "carbon.enable.index.server";
 
+  @CarbonProperty(dynamicConfigurable = true)
+  public static final String CARBON_REWRITE_PLAN_WITH_SECONDARY_INDEX =

Review comment:
      Please update the documentation (.md file) as well, and add a comment to explain the property.

##########
File path: core/src/main/java/org/apache/carbondata/core/index/secondaryindex/CarbonCostBasedOptimizer.java
##########
@@ -15,15 +15,49 @@
  * limitations under the License.
  */
 
-package org.apache.spark.sql.secondaryindex.optimizer;
+package org.apache.carbondata.core.index.secondaryindex;
 
+import java.io.IOException;
 import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.HashMap;
 import java.util.HashSet;
 import java.util.List;
 import java.util.Map;
 import java.util.Set;
 
+import org.apache.carbondata.core.constants.CarbonCommonConstants;
+import org.apache.carbondata.core.metadata.index.IndexType;
+import org.apache.carbondata.core.metadata.schema.indextable.IndexMetadata;
+import org.apache.carbondata.core.metadata.schema.table.TableInfo;
+
 public class CarbonCostBasedOptimizer {
+  public static Map<String, List<String>> getSecondaryIndexes(TableInfo tableInfo) {
+    Map<String, List<String>> indexes = new HashMap();
+    String indexMeta =
+        tableInfo.getFactTable().getTableProperties().get(tableInfo.getFactTable().getTableId());
+    IndexMetadata indexMetadata = null;
+    if (null != indexMeta) {
+      try {
+        indexMetadata = IndexMetadata.deserialize(indexMeta);
+      } catch (IOException e) {
+        e.printStackTrace();

Review comment:
      Add an error log instead of printing the stack trace.
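      For illustration, a minimal sketch of the suggested change, reusing the logging pattern already visible in this PR (the wrapper class and method names here are made up for the example; this is not the PR's code):

      import java.io.IOException;

      import org.apache.carbondata.common.logging.LogServiceFactory;
      import org.apache.carbondata.core.metadata.schema.indextable.IndexMetadata;

      import org.apache.log4j.Logger;

      // Hypothetical wrapper, for illustration only.
      final class IndexMetadataReader {
        private static final Logger LOGGER =
            LogServiceFactory.getLogService(IndexMetadataReader.class.getName());

        static IndexMetadata readIndexMetadata(String indexMeta) {
          try {
            return IndexMetadata.deserialize(indexMeta);
          } catch (IOException e) {
            // Log the failure with context instead of printing the trace to stdout.
            LOGGER.error("Failed to deserialize index metadata: " + indexMeta, e);
            return null;
          }
        }
      }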

##########
File path: core/src/main/java/org/apache/carbondata/core/util/SessionParams.java
##########
@@ -217,7 +218,9 @@ private boolean validateKeyValue(String key, String value) throws InvalidConfigu
           }
         } else if (key.startsWith(CarbonCommonConstants.CARBON_INDEX_VISIBLE)) {
           isValid = true;
-        } else if (key.startsWith(CarbonCommonConstants.CARBON_LOAD_INDEXES_PARALLEL)) {
+        } else if (key.startsWith(CarbonCommonConstants.CARBON_LOAD_INDEXES_PARALLEL) || (
+            key.startsWith(CARBON_REWRITE_PLAN_WITH_SECONDARY_INDEX)
+                && key.split("\\.").length == 8)) {

Review comment:
      Why is this compared with a hardcoded value? Please explain the logic in a comment.
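      For reference, a sketch of where the 8 presumably comes from, based on the per-table key built in CarbonProperties.isRewritePlanWithSecondaryIndex later in this PR (the example database and table names are made up); a named constant or an explanatory comment would still make the check clearer:

      public class RewritePlanKeyShape {
        public static void main(String[] args) {
          // Global form:            carbon.rewrite.plan.with.secondary.index              -> 6 tokens
          // Per-table session form: carbon.rewrite.plan.with.secondary.index.<db>.<table> -> 8 tokens
          String perTableKey = "carbon.rewrite.plan.with.secondary.index.default.my_table";
          System.out.println(perTableKey.split("\\.").length); // prints 8
        }
      }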

##########
File path: integration/presto/src/main/prestosql/org/apache/carbondata/presto/impl/CarbonTableReader.java
##########
@@ -265,6 +265,7 @@ private CarbonTableCacheModel getValidCacheBySchemaTableName(SchemaTableName sch
     config.set("query.id", queryId);
     CarbonInputFormat.setTransactionalTable(config, carbonTable.isTransactionalTable());
     CarbonInputFormat.setTableInfo(config, carbonTable.getTableInfo());
+    CarbonInputFormat.checkAndSetSecondaryIndexPruning(carbonTable.getTableInfo(), filters, config);

Review comment:
      Same as above applies to prestosql as well.

##########
File path: integration/spark/src/main/scala/org/apache/carbondata/index/secondary/SecondaryIndex.java
##########
@@ -0,0 +1,128 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.index.secondary;
+
+import java.util.ArrayList;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Map;
+import java.util.Set;
+
+import org.apache.carbondata.common.logging.LogServiceFactory;
+import org.apache.carbondata.core.datastore.block.SegmentProperties;
+import org.apache.carbondata.core.index.IndexUtil;
+import org.apache.carbondata.core.index.dev.IndexModel;
+import org.apache.carbondata.core.index.dev.cgindex.CoarseGrainIndex;
+import org.apache.carbondata.core.indexstore.Blocklet;
+import org.apache.carbondata.core.metadata.schema.table.CarbonTable;
+import org.apache.carbondata.core.scan.expression.Expression;
+import org.apache.carbondata.core.scan.filter.executer.FilterExecutor;
+import org.apache.carbondata.core.scan.filter.resolver.FilterResolverIntf;
+import org.apache.carbondata.index.secondary.SecondaryIndexModel.PositionReferenceInfo;
+
+import org.apache.log4j.Logger;
+
+import static org.apache.carbondata.core.util.path.CarbonTablePath.BATCH_PREFIX;
+
+/**
+ * Secondary Index to prune at blocklet level.
+ */
+public class SecondaryIndex extends CoarseGrainIndex {
+  private static final Logger LOGGER =
+      LogServiceFactory.getLogService(SecondaryIndex.class.getName());
+  String indexName;
+  String currentSegmentId;
+  List<String> validSegmentIds;
+  PositionReferenceInfo positionReferenceInfo;
+
+  @Override
+  public void init(IndexModel indexModel) {
+    assert (indexModel instanceof SecondaryIndexModel);
+    SecondaryIndexModel model = (SecondaryIndexModel) indexModel;
+    indexName = model.indexName;
+    currentSegmentId = model.currentSegmentId;
+    validSegmentIds = model.validSegmentIds;
+    positionReferenceInfo = model.positionReferenceInfo;
+  }
+
+  private Set<String> getPositionReferences(String databaseName, String indexName,
+      Expression expression) {
+    synchronized (positionReferenceInfo) {

Review comment:
      Please make the object used for synchronization final.

##########
File path: integration/spark/src/main/scala/org/apache/carbondata/index/secondary/SecondaryIndexFactory.java
##########
@@ -0,0 +1,172 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.index.secondary;
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+import java.util.Set;
+import java.util.stream.Collectors;
+
+import org.apache.carbondata.common.annotations.InterfaceAudience;
+import org.apache.carbondata.common.exceptions.sql.MalformedIndexCommandException;
+import org.apache.carbondata.common.logging.LogServiceFactory;
+import org.apache.carbondata.core.datastore.block.SegmentProperties;
+import org.apache.carbondata.core.features.TableOperation;
+import org.apache.carbondata.core.index.IndexFilter;
+import org.apache.carbondata.core.index.IndexInputSplit;
+import org.apache.carbondata.core.index.IndexMeta;
+import org.apache.carbondata.core.index.Segment;
+import org.apache.carbondata.core.index.dev.IndexBuilder;
+import org.apache.carbondata.core.index.dev.IndexWriter;
+import org.apache.carbondata.core.index.dev.cgindex.CoarseGrainIndex;
+import org.apache.carbondata.core.index.dev.cgindex.CoarseGrainIndexFactory;
+import org.apache.carbondata.core.metadata.schema.table.CarbonTable;
+import org.apache.carbondata.core.metadata.schema.table.IndexSchema;
+import org.apache.carbondata.core.scan.filter.intf.ExpressionType;
+import org.apache.carbondata.events.Event;
+import org.apache.carbondata.index.secondary.SecondaryIndexModel.PositionReferenceInfo;
+
+import org.apache.hadoop.fs.Path;
+import org.apache.log4j.Logger;
+
+/**
+ * Index Factory for Secondary Index.
+ */
+@InterfaceAudience.Internal
+public class SecondaryIndexFactory extends CoarseGrainIndexFactory {
+  private static final Logger LOGGER =
+      LogServiceFactory.getLogService(SecondaryIndexFactory.class.getName());
+  private IndexMeta indexMeta;
+
+  public SecondaryIndexFactory(CarbonTable carbonTable, IndexSchema indexSchema)
+      throws MalformedIndexCommandException {
+    super(carbonTable, indexSchema);
+    List<ExpressionType> operations = new ArrayList(Arrays.asList(ExpressionType.values()));
+    indexMeta = new IndexMeta(indexSchema.getIndexName(),
+        carbonTable.getIndexedColumns(indexSchema.getIndexColumns()), operations);
+    LOGGER.info("Created Secondary Index Factory instance for " + indexSchema.getIndexName());
+  }
+
+  @Override
+  public IndexWriter createWriter(Segment segment, String shardName,
+      SegmentProperties segmentProperties) throws IOException {
+    throw new UnsupportedOperationException("Not supported for Secondary Index");
+  }
+
+  @Override
+  public IndexBuilder createBuilder(Segment segment, String shardName,
+      SegmentProperties segmentProperties) throws IOException {
+    throw new UnsupportedOperationException("Not supported for Secondary Index");
+  }
+
+  @Override
+  public IndexMeta getMeta() {
+    return indexMeta;
+  }
+
+  private Map<Segment, List<CoarseGrainIndex>> getIndexes(List<Segment> segments,
+      PositionReferenceInfo positionReferenceInfo) throws IOException {
+    Map<Segment, List<CoarseGrainIndex>> indexes = new HashMap<>();
+    List<String> allSegmentIds =
+        segments.stream().map(Segment::getSegmentNo).collect(Collectors.toList());
+    for (Segment segment : segments) {
+      indexes.put(segment, this.getIndexes(segment, allSegmentIds, positionReferenceInfo));
+    }
+    return indexes;
+  }
+
+  private List<CoarseGrainIndex> getIndexes(Segment segment, List<String> allSegmentIds,
+      PositionReferenceInfo positionReferenceInfo) throws IOException {
+    List<CoarseGrainIndex> indexes = new ArrayList<>();
+    SecondaryIndex secondaryIndex = new SecondaryIndex();
+    secondaryIndex.init(
+        new SecondaryIndexModel(getIndexSchema().getIndexName(), segment.getSegmentNo(),
+            allSegmentIds, positionReferenceInfo, segment.getConfiguration()));
+    indexes.add(secondaryIndex);
+    return indexes;
+  }
+
+  @Override
+  public Map<Segment, List<CoarseGrainIndex>> getIndexes(List<Segment> segments, IndexFilter filter)
+      throws IOException {
+    return getIndexes(segments, new PositionReferenceInfo());
+  }
+
+  @Override
+  public Map<Segment, List<CoarseGrainIndex>> getIndexes(List<Segment> segments,
+      Set<Path> partitionLocations, IndexFilter filter) throws IOException {
+    return getIndexes(segments, new PositionReferenceInfo());
+  }
+
+  @Override
+  public List<CoarseGrainIndex> getIndexes(Segment segment) throws IOException {
+    List<String> allSegmentIds = new ArrayList();

Review comment:
      In all the newly added code, please check and change ArrayList() -> ArrayList<>() and HashMap() -> HashMap<>() to avoid IDE warnings.
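      For illustration, a small stand-alone example of the cleanup being asked for (plain Java, nothing carbondata-specific):

      import java.util.ArrayList;
      import java.util.HashMap;
      import java.util.List;
      import java.util.Map;

      class DiamondOperatorExample {
        void before() {
          // Raw constructors compile but trigger unchecked-assignment warnings in the IDE.
          List<String> segmentIds = new ArrayList();
          Map<String, List<String>> indexes = new HashMap();
        }

        void after() {
          // The diamond operator keeps the same behaviour without the warnings.
          List<String> segmentIds = new ArrayList<>();
          Map<String, List<String>> indexes = new HashMap<>();
        }
      }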

##########
File path: integration/spark/src/main/scala/org/apache/carbondata/index/secondary/SecondaryIndex.java
##########
@@ -0,0 +1,128 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.index.secondary;
+
+import java.util.ArrayList;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Map;
+import java.util.Set;
+
+import org.apache.carbondata.common.logging.LogServiceFactory;
+import org.apache.carbondata.core.datastore.block.SegmentProperties;
+import org.apache.carbondata.core.index.IndexUtil;
+import org.apache.carbondata.core.index.dev.IndexModel;
+import org.apache.carbondata.core.index.dev.cgindex.CoarseGrainIndex;
+import org.apache.carbondata.core.indexstore.Blocklet;
+import org.apache.carbondata.core.metadata.schema.table.CarbonTable;
+import org.apache.carbondata.core.scan.expression.Expression;
+import org.apache.carbondata.core.scan.filter.executer.FilterExecutor;
+import org.apache.carbondata.core.scan.filter.resolver.FilterResolverIntf;
+import org.apache.carbondata.index.secondary.SecondaryIndexModel.PositionReferenceInfo;
+
+import org.apache.log4j.Logger;
+
+import static org.apache.carbondata.core.util.path.CarbonTablePath.BATCH_PREFIX;
+
+/**
+ * Secondary Index to prune at blocklet level.
+ */
+public class SecondaryIndex extends CoarseGrainIndex {
+  private static final Logger LOGGER =
+      LogServiceFactory.getLogService(SecondaryIndex.class.getName());
+  String indexName;
+  String currentSegmentId;
+  List<String> validSegmentIds;
+  PositionReferenceInfo positionReferenceInfo;
+
+  @Override
+  public void init(IndexModel indexModel) {
+    assert (indexModel instanceof SecondaryIndexModel);
+    SecondaryIndexModel model = (SecondaryIndexModel) indexModel;
+    indexName = model.indexName;
+    currentSegmentId = model.currentSegmentId;
+    validSegmentIds = model.validSegmentIds;
+    positionReferenceInfo = model.positionReferenceInfo;
+  }
+
+  private Set<String> getPositionReferences(String databaseName, String indexName,
+      Expression expression) {
+    synchronized (positionReferenceInfo) {
+      /* If the position references are not obtained yet(i.e., prune happening for the first valid
+      segment), then get them from the given index table with the given filter from all the valid
+      segments at once and store them as map of segmentId to set of position references in that
+      particular segment. Upon the subsequent prune for other segments, return the position
+      references for the respective segment from the map directly */
+      if (!positionReferenceInfo.fetched) {

Review comment:
      If two SI queries are in progress concurrently with different filter expressions, I think this will give a wrong result, since the segment id is the same. Do we also need to check whether the expression is the same?
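      For illustration, a rough sketch of the kind of keying this comment is about: caching position references per filter statement as well as per segment, so that concurrent queries with different filters do not see each other's results. All names here are made up; this is not the PR's code.

      import java.util.List;
      import java.util.Map;
      import java.util.Set;
      import java.util.concurrent.ConcurrentHashMap;
      import java.util.function.Function;

      // Hypothetical cache: filter statement -> (segment id -> position references in that segment).
      final class PositionReferenceCache {
        private final Map<String, Map<String, Set<String>>> byFilter = new ConcurrentHashMap<>();

        Map<String, Set<String>> getOrFetch(String filterStatement, List<String> validSegmentIds,
            Function<List<String>, Map<String, Set<String>>> fetcher) {
          // Each distinct filter statement is fetched at most once; other filters get their own entry.
          return byFilter.computeIfAbsent(filterStatement, key -> fetcher.apply(validSegmentIds));
        }
      }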

##########
File path: integration/spark/src/main/scala/org/apache/carbondata/index/secondary/SecondaryIndexModel.java
##########
@@ -0,0 +1,57 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.index.secondary;
+
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+import java.util.Set;
+
+import org.apache.carbondata.core.index.dev.IndexModel;
+
+import org.apache.hadoop.conf.Configuration;
+
+/**
+ * Secondary Index Model
+ */
+public class SecondaryIndexModel extends IndexModel {
+  String indexName; // Secondary Index name

Review comment:
      Please keep the variables private and reformat the comments.
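      For illustration, a minimal sketch of the requested style for the fields quoted above (private members with javadoc instead of trailing comments; the class name is changed here only to mark it as an example):

      import java.util.List;

      /** Example reformatting only; mirrors the fields shown in the diff above. */
      public class SecondaryIndexModelFieldsExample {
        /** Secondary Index name. */
        private final String indexName;
        /** Segment id to prune. */
        private final String currentSegmentId;
        /** All the valid segment ids for Secondary Index pruning. */
        private final List<String> validSegmentIds;

        public SecondaryIndexModelFieldsExample(String indexName, String currentSegmentId,
            List<String> validSegmentIds) {
          this.indexName = indexName;
          this.currentSegmentId = currentSegmentId;
          this.validSegmentIds = validSegmentIds;
        }

        public String getIndexName() {
          return indexName;
        }
      }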

##########
File path: core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java
##########
@@ -2316,6 +2316,12 @@ private CarbonCommonConstants() {
   @CarbonProperty(dynamicConfigurable = true)
   public static final String CARBON_ENABLE_INDEX_SERVER = "carbon.enable.index.server";
 
+  @CarbonProperty(dynamicConfigurable = true)
+  public static final String CARBON_REWRITE_PLAN_WITH_SECONDARY_INDEX =
+      "carbon.rewrite.plan.with.secondary.index";
+
+  public static final String CARBON_REWRITE_PLAN_WITH_SECONDARY_INDEX_DEFAULT = "true";

Review comment:
      a) Has cluster testing with the index server and Presto with SI been completed?
      b) Has testing with Spark and the index server been done with this `property = false`?

##########
File path: core/src/main/java/org/apache/carbondata/core/scan/expression/LiteralExpression.java
##########
@@ -58,7 +64,28 @@ public String getString() {
 
   @Override
   public String getStatement() {
-    return value == null ? null : value.toString();
+    boolean quoteString = false;
+    Object val = value;
+    if (val != null) {
+      if (dataType == DataTypes.STRING || val instanceof String) {
+        quoteString = true;
+      } else if (dataType == DataTypes.TIMESTAMP) {
+        SimpleDateFormat parser =
+            new SimpleDateFormat(CarbonCommonConstants.CARBON_TIMESTAMP_DEFAULT_FORMAT);
+        val = parser.format(new Date((Long) val / 1000L));
+        quoteString = true;

Review comment:
      I can see duplicate code in ExpressionResult#getStatement(); can we extract the common code and use it in both places?
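      For illustration, one possible shape of a shared helper, reconstructed only from the lines visible in this diff (the class name, the single-quote wrapping, and the boolean parameters are assumptions; the actual extraction would have to match what ExpressionResult#getStatement() really does):

      import java.text.SimpleDateFormat;
      import java.util.Date;

      import org.apache.carbondata.core.constants.CarbonCommonConstants;

      // Hypothetical shared helper for formatting a literal value into a statement fragment.
      final class LiteralStatementFormatter {
        private LiteralStatementFormatter() {
        }

        static String toStatement(Object value, boolean isString, boolean isTimestamp) {
          if (value == null) {
            return null;
          }
          Object val = value;
          boolean quote = isString;
          if (isTimestamp) {
            // The diff divides the stored long by 1000 before building a Date, i.e. micros -> millis.
            SimpleDateFormat parser =
                new SimpleDateFormat(CarbonCommonConstants.CARBON_TIMESTAMP_DEFAULT_FORMAT);
            val = parser.format(new Date((Long) val / 1000L));
            quote = true;
          }
          return quote ? "'" + val + "'" : val.toString();
        }
      }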

##########
File path: integration/spark/src/main/scala/org/apache/carbondata/index/secondary/SecondaryIndex.java
##########
@@ -0,0 +1,128 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.index.secondary;
+
+import java.util.ArrayList;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Map;
+import java.util.Set;
+
+import org.apache.carbondata.common.logging.LogServiceFactory;
+import org.apache.carbondata.core.datastore.block.SegmentProperties;
+import org.apache.carbondata.core.index.IndexUtil;
+import org.apache.carbondata.core.index.dev.IndexModel;
+import org.apache.carbondata.core.index.dev.cgindex.CoarseGrainIndex;
+import org.apache.carbondata.core.indexstore.Blocklet;
+import org.apache.carbondata.core.metadata.schema.table.CarbonTable;
+import org.apache.carbondata.core.scan.expression.Expression;
+import org.apache.carbondata.core.scan.filter.executer.FilterExecutor;
+import org.apache.carbondata.core.scan.filter.resolver.FilterResolverIntf;
+import org.apache.carbondata.index.secondary.SecondaryIndexModel.PositionReferenceInfo;
+
+import org.apache.log4j.Logger;
+
+import static org.apache.carbondata.core.util.path.CarbonTablePath.BATCH_PREFIX;
+
+/**
+ * Secondary Index to prune at blocklet level.
+ */
+public class SecondaryIndex extends CoarseGrainIndex {
+  private static final Logger LOGGER =
+      LogServiceFactory.getLogService(SecondaryIndex.class.getName());
+  String indexName;

Review comment:
       These class members can be private.

##########
File path: core/src/main/java/org/apache/carbondata/core/metadata/schema/indextable/IndexTableInfo.java
##########
@@ -157,15 +157,19 @@ public static String updateIndexColumns(String oldGsonData, String columnToBeUpd
     return toGson(indexTableInfos);
   }
 
-  public static String enableIndex(String oldIndexIno, String indexName) {
-    IndexTableInfo[] indexTableInfos = fromGson(oldIndexIno);
+  public static void setIndexStatus(IndexTableInfo[] indexTableInfos, String indexName,
+      IndexStatus status) {
     for (IndexTableInfo indexTableInfo : indexTableInfos) {
       if (indexTableInfo.tableName.equalsIgnoreCase(indexName)) {
-        Map<String, String> oldIndexProperties = indexTableInfo.indexProperties;
-        oldIndexProperties.put(CarbonCommonConstants.INDEX_STATUS, IndexStatus.ENABLED.name());
-        indexTableInfo.setIndexProperties(oldIndexProperties);
+        Map<String, String> indexProperties = indexTableInfo.indexProperties;
+        indexProperties.put(CarbonCommonConstants.INDEX_STATUS, status.name());

Review comment:
      Maybe there is no need to declare the variable; just call:
   
   indexTableInfo.indexProperties.put(CarbonCommonConstants.INDEX_STATUS, status.name());

##########
File path: integration/presto/src/main/prestodb/org/apache/carbondata/presto/impl/CarbonTableReader.java
##########
@@ -270,6 +270,7 @@ private CarbonTableCacheModel getValidCacheBySchemaTableName(SchemaTableName sch
     config.set("query.id", queryId);
     CarbonInputFormat.setTransactionalTable(config, carbonTable.isTransactionalTable());
     CarbonInputFormat.setTableInfo(config, carbonTable.getTableInfo());
+    CarbonInputFormat.checkAndSetSecondaryIndexPruning(carbonTable.getTableInfo(), filters, config);

Review comment:
      Sometimes SI may not give an advantage, depending on the data distribution.
      If users don't want to use SI from Presto, is the NI UDF supported?
      Otherwise, use that carbon property here as well to support disabling it from Presto.

##########
File path: core/src/main/java/org/apache/carbondata/core/index/IndexStoreManager.java
##########
@@ -95,13 +94,10 @@ private IndexStoreManager() {
         .getIndexesMap().entrySet()) {
       for (Map.Entry<String, Map<String, String>> indexEntry : providerEntry.getValue()
           .entrySet()) {
-        if (!indexEntry.getValue().get(CarbonCommonConstants.INDEX_PROVIDER)
-            .equalsIgnoreCase(IndexType.SI.getIndexProviderName())) {
-          IndexSchema indexSchema = new IndexSchema(indexEntry.getKey(),
-              indexEntry.getValue().get(CarbonCommonConstants.INDEX_PROVIDER));
-          indexSchema.setProperties(indexEntry.getValue());
-          indexes.add(getIndex(carbonTable, indexSchema));
-        }
+        IndexSchema indexSchema = new IndexSchema(indexEntry.getKey(),

Review comment:
      Here we need to send SI as a CG index only when SI plan rewrite is false, right? Where is that check?
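      For illustration, the kind of guard the question is about, combining the lines removed in this diff with the property accessor added elsewhere in the PR (where exactly such a check belongs is the open question; this is not the PR's code):

      // Illustrative guard only: register SI as a coarse-grain index only when plan rewrite
      // with SI is disabled for this table.
      boolean rewriteWithSi = CarbonProperties.getInstance().isRewritePlanWithSecondaryIndex(
          carbonTable.getDatabaseName(), carbonTable.getTableName());
      if (!rewriteWithSi) {
        IndexSchema indexSchema = new IndexSchema(indexEntry.getKey(),
            indexEntry.getValue().get(CarbonCommonConstants.INDEX_PROVIDER));
        indexSchema.setProperties(indexEntry.getValue());
        indexes.add(getIndex(carbonTable, indexSchema));
      }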

##########
File path: core/src/main/java/org/apache/carbondata/core/util/CarbonProperties.java
##########
@@ -2187,6 +2187,24 @@ public static boolean isFilterReorderingEnabled() {
     );
   }
 
+  /**
+   * Check whether plan rewrite with secondary index is enabled or not
+   */
+  public boolean isRewritePlanWithSecondaryIndex(String dbName, String tableName) {
+    String configuredValue = getSessionPropertyValue(
+        CarbonCommonConstants.CARBON_REWRITE_PLAN_WITH_SECONDARY_INDEX + "." + dbName + "."
+            + tableName);
+    if (configuredValue == null) {
+      configuredValue = getProperty(CarbonCommonConstants.CARBON_REWRITE_PLAN_WITH_SECONDARY_INDEX,
+          CarbonCommonConstants.CARBON_REWRITE_PLAN_WITH_SECONDARY_INDEX_DEFAULT);

Review comment:
      Do we need to use the db name and table name here as well while looking up?
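      For illustration, a sketch of the lookup order the question seems to point at, meant to sit inside CarbonProperties and reusing only the constants and accessors visible in this diff (whether carbon.properties should honour a per-table key at all is for the authors to decide):

      public boolean isRewritePlanWithSecondaryIndex(String dbName, String tableName) {
        String perTableKey = CarbonCommonConstants.CARBON_REWRITE_PLAN_WITH_SECONDARY_INDEX
            + "." + dbName + "." + tableName;
        // 1. Session-level per-table value.
        String configuredValue = getSessionPropertyValue(perTableKey);
        if (configuredValue == null) {
          // 2. Per-table entry in carbon.properties, then 3. the global default.
          configuredValue = getProperty(perTableKey,
              getProperty(CarbonCommonConstants.CARBON_REWRITE_PLAN_WITH_SECONDARY_INDEX,
                  CarbonCommonConstants.CARBON_REWRITE_PLAN_WITH_SECONDARY_INDEX_DEFAULT));
        }
        return Boolean.parseBoolean(configuredValue);
      }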

##########
File path: integration/spark/src/main/scala/org/apache/carbondata/index/secondary/SecondaryIndex.java
##########
@@ -0,0 +1,128 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.index.secondary;
+
+import java.util.ArrayList;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Map;
+import java.util.Set;
+
+import org.apache.carbondata.common.logging.LogServiceFactory;
+import org.apache.carbondata.core.datastore.block.SegmentProperties;
+import org.apache.carbondata.core.index.IndexUtil;
+import org.apache.carbondata.core.index.dev.IndexModel;
+import org.apache.carbondata.core.index.dev.cgindex.CoarseGrainIndex;
+import org.apache.carbondata.core.indexstore.Blocklet;
+import org.apache.carbondata.core.metadata.schema.table.CarbonTable;
+import org.apache.carbondata.core.scan.expression.Expression;
+import org.apache.carbondata.core.scan.filter.executer.FilterExecutor;
+import org.apache.carbondata.core.scan.filter.resolver.FilterResolverIntf;
+import org.apache.carbondata.index.secondary.SecondaryIndexModel.PositionReferenceInfo;
+
+import org.apache.log4j.Logger;
+
+import static org.apache.carbondata.core.util.path.CarbonTablePath.BATCH_PREFIX;
+
+/**
+ * Secondary Index to prune at blocklet level.
+ */
+public class SecondaryIndex extends CoarseGrainIndex {
+  private static final Logger LOGGER =
+      LogServiceFactory.getLogService(SecondaryIndex.class.getName());
+  String indexName;
+  String currentSegmentId;
+  List<String> validSegmentIds;
+  PositionReferenceInfo positionReferenceInfo;
+
+  @Override
+  public void init(IndexModel indexModel) {
+    assert (indexModel instanceof SecondaryIndexModel);
+    SecondaryIndexModel model = (SecondaryIndexModel) indexModel;
+    indexName = model.indexName;
+    currentSegmentId = model.currentSegmentId;
+    validSegmentIds = model.validSegmentIds;
+    positionReferenceInfo = model.positionReferenceInfo;
+  }
+
+  private Set<String> getPositionReferences(String databaseName, String indexName,
+      Expression expression) {
+    synchronized (positionReferenceInfo) {
+      /* If the position references are not obtained yet(i.e., prune happening for the first valid
+      segment), then get them from the given index table with the given filter from all the valid
+      segments at once and store them as map of segmentId to set of position references in that
+      particular segment. Upon the subsequent prune for other segments, return the position
+      references for the respective segment from the map directly */
+      if (!positionReferenceInfo.fetched) {
+        Object[] rows = IndexUtil.getPositionReferences(String
+            .format("select distinct positionReference from %s.%s where insegment('%s') and %s",
+                databaseName, indexName, String.join(",", validSegmentIds),
+                expression.getStatement()));
+        for (Object row : rows) {
+          String positionReference = (String) row;
+          int blockPathIndex = positionReference.indexOf("/");
+          String segmentId = positionReference.substring(0, blockPathIndex);
+          String blockPath = positionReference.substring(blockPathIndex + 1);
+          Set<String> blockPaths = positionReferenceInfo.segmentToPosReferences
+              .computeIfAbsent(segmentId, k -> new HashSet<String>());
+          blockPaths.add(blockPath);
+        }
+        positionReferenceInfo.fetched = true;
+      }
+    }
+    Set<String> blockPaths = positionReferenceInfo.segmentToPosReferences.get(currentSegmentId);
+    return blockPaths != null ? blockPaths : new HashSet<String>();
+  }
+
+  @Override
+  public List<Blocklet> prune(FilterResolverIntf filterExp, SegmentProperties segmentProperties,
+      FilterExecutor filterExecutor, CarbonTable carbonTable) {
+    Set<String> positionReferences = getPositionReferences(carbonTable.getDatabaseName(), indexName,
+        filterExp.getFilterExpression());
+    List<Blocklet> blocklets = new ArrayList();
+    for (String blockPath : positionReferences) {

Review comment:
      Can we extract the common code and reuse it here?

##########
File path: integration/spark/src/main/scala/org/apache/carbondata/index/secondary/SecondaryIndexModel.java
##########
@@ -0,0 +1,57 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.index.secondary;
+
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+import java.util.Set;
+
+import org.apache.carbondata.core.index.dev.IndexModel;
+
+import org.apache.hadoop.conf.Configuration;
+
+/**
+ * Secondary Index Model
+ */
+public class SecondaryIndexModel extends IndexModel {
+  String indexName; // Secondary Index name
+  String currentSegmentId; // Segment Id to prune
+  List<String> validSegmentIds; // All the valid segment Ids for Secondary Index pruning
+  PositionReferenceInfo positionReferenceInfo; // Position reference information
+
+  public SecondaryIndexModel(String indexName, String currentSegmentId,
+      List<String> validSegmentIds, PositionReferenceInfo positionReferenceInfo,
+      Configuration configuration) {
+    super(null, configuration);
+    this.indexName = indexName;
+    this.currentSegmentId = currentSegmentId;
+    this.validSegmentIds = validSegmentIds;
+    this.positionReferenceInfo = positionReferenceInfo;
+  }
+
+  /**
+   * Position Reference information
+   */
+  public static class PositionReferenceInfo {

Review comment:
      As mentioned above, maybe this needs to be kept per expression and per segment instead of just per segment.





[GitHub] [carbondata] kunal642 commented on a change in pull request #4110: [CARBONDATA-4158]Secondary Index as a coarse-grain datamap and use them for Presto queries

GitBox
In reply to this post by GitBox

kunal642 commented on a change in pull request #4110:
URL: https://github.com/apache/carbondata/pull/4110#discussion_r607040648



##########
File path: core/src/main/java/org/apache/carbondata/core/index/secondaryindex/CarbonCostBasedOptimizer.java
##########
@@ -15,15 +15,49 @@
  * limitations under the License.
  */
 
-package org.apache.spark.sql.secondaryindex.optimizer;
+package org.apache.carbondata.core.index.secondaryindex;
 
+import java.io.IOException;
 import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.HashMap;
 import java.util.HashSet;
 import java.util.List;
 import java.util.Map;
 import java.util.Set;
 
+import org.apache.carbondata.core.constants.CarbonCommonConstants;
+import org.apache.carbondata.core.metadata.index.IndexType;
+import org.apache.carbondata.core.metadata.schema.indextable.IndexMetadata;
+import org.apache.carbondata.core.metadata.schema.table.TableInfo;
+
 public class CarbonCostBasedOptimizer {
+  public static Map<String, List<String>> getSecondaryIndexes(TableInfo tableInfo) {

Review comment:
      This method should be called identifyRequiredTables(Set<String> filterAttributes, TableInfo tableInfo). Internally you can extract the table info string and pass it to the other method to get the valid SI list.
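      For illustration, the rough shape being suggested, as far as it can be inferred from this comment (the overload it delegates to is assumed to be the existing cost-based selection method; this is not the merged code):

      public static List<String> identifyRequiredTables(Set<String> filterAttributes,
          TableInfo tableInfo) {
        // Extract the index-to-columns map from the table info first...
        Map<String, List<String>> secondaryIndexes = getSecondaryIndexes(tableInfo);
        // ...then reuse the existing selection logic to get the valid SI list.
        return identifyRequiredTables(filterAttributes, secondaryIndexes);
      }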

##########
File path: core/src/main/java/org/apache/carbondata/core/util/CarbonProperties.java
##########
@@ -2187,6 +2187,24 @@ public static boolean isFilterReorderingEnabled() {
     );
   }
 
+  /**
+   * Check whether plan rewrite with secondary index is enabled or not
+   */
+  public boolean isRewritePlanWithSecondaryIndex(String dbName, String tableName) {
+    String configuredValue = getSessionPropertyValue(

Review comment:
      I feel this should be based on whether the query is from a non-Spark engine.

##########
File path: core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java
##########
@@ -2316,6 +2316,12 @@ private CarbonCommonConstants() {
   @CarbonProperty(dynamicConfigurable = true)
   public static final String CARBON_ENABLE_INDEX_SERVER = "carbon.enable.index.server";
 
+  @CarbonProperty(dynamicConfigurable = true)
+  public static final String CARBON_REWRITE_PLAN_WITH_SECONDARY_INDEX =
+      "carbon.rewrite.plan.with.secondary.index";

Review comment:
      Add this new property to the docs.

##########
File path: integration/spark/src/main/scala/org/apache/carbondata/index/secondary/SecondaryIndex.java
##########
@@ -0,0 +1,128 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.index.secondary;
+
+import java.util.ArrayList;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Map;
+import java.util.Set;
+
+import org.apache.carbondata.common.logging.LogServiceFactory;
+import org.apache.carbondata.core.datastore.block.SegmentProperties;
+import org.apache.carbondata.core.index.IndexUtil;
+import org.apache.carbondata.core.index.dev.IndexModel;
+import org.apache.carbondata.core.index.dev.cgindex.CoarseGrainIndex;
+import org.apache.carbondata.core.indexstore.Blocklet;
+import org.apache.carbondata.core.metadata.schema.table.CarbonTable;
+import org.apache.carbondata.core.scan.expression.Expression;
+import org.apache.carbondata.core.scan.filter.executer.FilterExecutor;
+import org.apache.carbondata.core.scan.filter.resolver.FilterResolverIntf;
+import org.apache.carbondata.index.secondary.SecondaryIndexModel.PositionReferenceInfo;
+
+import org.apache.log4j.Logger;
+
+import static org.apache.carbondata.core.util.path.CarbonTablePath.BATCH_PREFIX;
+
+/**
+ * Secondary Index to prune at blocklet level.
+ */
+public class SecondaryIndex extends CoarseGrainIndex {
+  private static final Logger LOGGER =
+      LogServiceFactory.getLogService(SecondaryIndex.class.getName());
+  String indexName;
+  String currentSegmentId;
+  List<String> validSegmentIds;
+  PositionReferenceInfo positionReferenceInfo;
+
+  @Override
+  public void init(IndexModel indexModel) {
+    assert (indexModel instanceof SecondaryIndexModel);
+    SecondaryIndexModel model = (SecondaryIndexModel) indexModel;
+    indexName = model.indexName;
+    currentSegmentId = model.currentSegmentId;
+    validSegmentIds = model.validSegmentIds;
+    positionReferenceInfo = model.positionReferenceInfo;
+  }
+
+  private Set<String> getPositionReferences(String databaseName, String indexName,
+      Expression expression) {
+    synchronized (positionReferenceInfo) {

Review comment:
      There is no need for this, as this will be a sequential call.

##########
File path: integration/spark/src/main/scala/org/apache/carbondata/indexserver/DistributedRDDUtils.scala
##########
@@ -394,4 +397,35 @@ object DistributedRDDUtils {
       }
     }
   }
+
+  def pruneOnDriver(request: IndexInputFormat): ExtendedBlockletWrapper = {
+    val job = CarbonSparkUtil.createHadoopJob()
+    val conf = job.getConfiguration
+    val tableInfo = request.getCarbonTable.getTableInfo
+    val identifier = request.getCarbonTable.getAbsoluteTableIdentifier
+    val indexFilter = new IndexFilter(request.getCarbonTable,
+      request.getFilterResolverIntf.getFilterExpression)
+    CarbonInputFormat.setTableInfo(conf, tableInfo)
+    CarbonInputFormat.setFilterPredicates(conf, indexFilter)
+    CarbonInputFormat.setDatabaseName(conf, tableInfo.getDatabaseName)
+    CarbonInputFormat.setTableName(conf, tableInfo.getFactTable.getTableName)
+    CarbonInputFormat.setPartitionsToPrune(conf, request.getPartitions)
+    CarbonInputFormat.setTransactionalTable(conf, tableInfo.isTransactionalTable)
+    CarbonInputFormat.setTablePath(conf,
+      identifier.appendWithLocalPrefix(identifier.getTablePath))
+    CarbonInputFormat.setQuerySegment(conf, identifier)
+    CarbonInputFormat.setColumnProjection(conf, Array("positionId"))
+    CarbonInputFormat.setReadCommittedScope(conf, request.getReadCommittedScope)
+    CarbonInputFormat.setSegmentsToAccess(conf, request.getValidSegments)
+    CarbonInputFormat.setValidateSegmentsToAccess(conf, false)
+    CarbonInputFormat.setSecondaryIndexPruning(conf, request.isSIPruningEnabled)
+    val blocklets = new CarbonTableInputFormat[Object].getPrunedBlocklets(job,

Review comment:
      Set useIndexServer to false in the job, and getPrunedBlocklets should check "useIndexServer" to decide whether to go to the index server or not.

      Otherwise, if the user sets indexServerEnable to true in carbon.properties for the index server application, then this flow can go into an infinite loop.

##########
File path: integration/spark/src/main/scala/org/apache/carbondata/index/secondary/SecondaryIndex.java
##########
@@ -0,0 +1,128 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.index.secondary;
+
+import java.util.ArrayList;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Map;
+import java.util.Set;
+
+import org.apache.carbondata.common.logging.LogServiceFactory;
+import org.apache.carbondata.core.datastore.block.SegmentProperties;
+import org.apache.carbondata.core.index.IndexUtil;
+import org.apache.carbondata.core.index.dev.IndexModel;
+import org.apache.carbondata.core.index.dev.cgindex.CoarseGrainIndex;
+import org.apache.carbondata.core.indexstore.Blocklet;
+import org.apache.carbondata.core.metadata.schema.table.CarbonTable;
+import org.apache.carbondata.core.scan.expression.Expression;
+import org.apache.carbondata.core.scan.filter.executer.FilterExecutor;
+import org.apache.carbondata.core.scan.filter.resolver.FilterResolverIntf;
+import org.apache.carbondata.index.secondary.SecondaryIndexModel.PositionReferenceInfo;
+
+import org.apache.log4j.Logger;
+
+import static org.apache.carbondata.core.util.path.CarbonTablePath.BATCH_PREFIX;
+
+/**
+ * Secondary Index to prune at blocklet level.
+ */
+public class SecondaryIndex extends CoarseGrainIndex {
+  private static final Logger LOGGER =
+      LogServiceFactory.getLogService(SecondaryIndex.class.getName());
+  String indexName;
+  String currentSegmentId;
+  List<String> validSegmentIds;
+  PositionReferenceInfo positionReferenceInfo;

Review comment:
      Can you add details on why positionReferenceInfo is the same across the SecondaryIndex instances for each segment?

##########
File path: integration/spark/src/main/scala/org/apache/carbondata/indexserver/IndexServer.scala
##########
@@ -265,8 +272,6 @@ object IndexServer extends ServerInterface {
           server.stop()
         }
       })
-      CarbonProperties.getInstance().addProperty(CarbonCommonConstants

Review comment:
       please revert this

##########
File path: integration/presto/src/main/prestodb/org/apache/carbondata/presto/impl/CarbonTableReader.java
##########
@@ -270,6 +270,7 @@ private CarbonTableCacheModel getValidCacheBySchemaTableName(SchemaTableName sch
     config.set("query.id", queryId);
     CarbonInputFormat.setTransactionalTable(config, carbonTable.isTransactionalTable());
     CarbonInputFormat.setTableInfo(config, carbonTable.getTableInfo());
+    CarbonInputFormat.checkAndSetSecondaryIndexPruning(carbonTable.getTableInfo(), filters, config);

Review comment:
      Later we can optimize CarbonInputSplit to return only the required objects in case query performance is bad. Please add a TODO for the same.

##########
File path: integration/spark/src/main/scala/org/apache/spark/sql/index/CarbonIndexUtil.scala
##########
@@ -396,6 +398,88 @@ object CarbonIndexUtil {
     }
   }
 
+  def updateIndexStatusInBatch(carbonTable: CarbonTable,
+      indexTables: List[CarbonTable],
+      indexType: IndexType,
+      status: IndexStatus,
+      sparkSession: SparkSession): Unit = {
+    val dbName = carbonTable.getDatabaseName
+    val tableName = carbonTable.getTableName
+    val metadataLock = CarbonLockFactory.getCarbonLockObj(carbonTable.getAbsoluteTableIdentifier,
+      LockUsage.METADATA_LOCK)
+    try {
+      if (metadataLock.lockWithRetries()) {
+        CarbonMetadata.getInstance.removeTable(dbName, tableName)
+        val table = CarbonEnv.getCarbonTable(Some(dbName), tableName)(sparkSession)
+        val indexTableInfos = IndexTableInfo.fromGson(table.getIndexInfo)
+        val indexMetadata = table.getIndexMetadata
+        indexTables.foreach { indexTable =>
+          IndexTableInfo.setIndexStatus(indexTableInfos, indexTable.getTableName, status)
+          indexMetadata.updateIndexStatus(indexType.getIndexProviderName,
+            indexTable.getTableName,
+            status.name())
+        }
+        table.getTableInfo
+          .getFactTable
+          .getTableProperties
+          .put(table.getCarbonTableIdentifier.getTableId, indexMetadata.serialize)
+        sparkSession.sql(s"""ALTER TABLE $dbName.$tableName SET SERDEPROPERTIES ('indexInfo' = '${
+          IndexTableInfo.toGson(indexTableInfos)
+        }')""".stripMargin).collect()
+        CarbonHiveIndexMetadataUtil.refreshTable(dbName, tableName, sparkSession)
+        CarbonMetadata.getInstance.removeTable(dbName, tableName)
+        CarbonMetadata.getInstance.loadTableMetadata(table.getTableInfo)
+      }
+    } finally {
+      metadataLock.unlock()
+    }
+  }
+
+  def updateIndexStatus(carbonTable: CarbonTable,

Review comment:
      If SI is already created on a table and it is in sync, then IndexStatus would not be there, as this is a new DataMap implementation for us.

      We need to find some way to update this property for old SI tables.





[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #4110: [CARBONDATA-4158]Secondary Index as a coarse-grain datamap and use them for Presto queries

GitBox
In reply to this post by GitBox

ajantha-bhat commented on a change in pull request #4110:
URL: https://github.com/apache/carbondata/pull/4110#discussion_r607816175



##########
File path: integration/spark/src/main/scala/org/apache/carbondata/spark/rdd/CarbonScanRDD.scala
##########
@@ -128,6 +129,16 @@ class CarbonScanRDD[T: ClassTag](
       } else {
         prepareInputFormatForDriver(job.getConfiguration)
       }
+      if (segmentIds.isDefined) {
+        CarbonInputFormat.setQuerySegment(job.getConfiguration, segmentIds.get)

Review comment:
      We already have `segmentsToAccess`; do we need one more variable?

##########
File path: integration/spark/src/main/scala/org/apache/spark/sql/execution/command/index/CarbonCreateIndexCommand.scala
##########
@@ -225,7 +225,9 @@ case class CarbonCreateIndexCommand(
               new IndexTableInfo(parentTable.getDatabaseName, indexModel.indexName,
                 indexSchema.getProperties),
               false)
-            val enabledIndexInfo = IndexTableInfo.enableIndex(indexInfo, indexModel.indexName)
+            val enabledIndexInfo = IndexTableInfo.setIndexStatus(indexInfo,

Review comment:
      For an existing SI table, what will be the behavior? Can it be queried from Presto?
      Better to update the documentation on this.

##########
File path: integration/spark/src/main/scala/org/apache/spark/sql/index/CarbonIndexUtil.scala
##########
@@ -396,6 +398,88 @@ object CarbonIndexUtil {
     }
   }
 
+  def updateIndexStatusInBatch(carbonTable: CarbonTable,
+      indexTables: List[CarbonTable],
+      indexType: IndexType,
+      status: IndexStatus,
+      sparkSession: SparkSession): Unit = {
+    val dbName = carbonTable.getDatabaseName
+    val tableName = carbonTable.getTableName
+    val metadataLock = CarbonLockFactory.getCarbonLockObj(carbonTable.getAbsoluteTableIdentifier,
+      LockUsage.METADATA_LOCK)
+    try {
+      if (metadataLock.lockWithRetries()) {
+        CarbonMetadata.getInstance.removeTable(dbName, tableName)
+        val table = CarbonEnv.getCarbonTable(Some(dbName), tableName)(sparkSession)
+        val indexTableInfos = IndexTableInfo.fromGson(table.getIndexInfo)
+        val indexMetadata = table.getIndexMetadata
+        indexTables.foreach { indexTable =>
+          IndexTableInfo.setIndexStatus(indexTableInfos, indexTable.getTableName, status)
+          indexMetadata.updateIndexStatus(indexType.getIndexProviderName,
+            indexTable.getTableName,
+            status.name())
+        }
+        table.getTableInfo
+          .getFactTable
+          .getTableProperties
+          .put(table.getCarbonTableIdentifier.getTableId, indexMetadata.serialize)
+        sparkSession.sql(s"""ALTER TABLE $dbName.$tableName SET SERDEPROPERTIES ('indexInfo' = '${
+          IndexTableInfo.toGson(indexTableInfos)
+        }')""".stripMargin).collect()
+        CarbonHiveIndexMetadataUtil.refreshTable(dbName, tableName, sparkSession)
+        CarbonMetadata.getInstance.removeTable(dbName, tableName)
+        CarbonMetadata.getInstance.loadTableMetadata(table.getTableInfo)
+      }
+    } finally {
+      metadataLock.unlock()
+    }
+  }
+
+  def updateIndexStatus(carbonTable: CarbonTable,
+      indexName: String,
+      indexType: IndexType,
+      status: IndexStatus,
+      needLock: Boolean = true,
+      sparkSession: SparkSession): Unit = {
+    val dbName = carbonTable.getDatabaseName

Review comment:
      Can we not pass a single table to the above batch API and update only that? Why do we need similar functionality again?
      If both are needed, please extract the common code and reuse it.

##########
File path: processing/src/main/java/org/apache/carbondata/processing/index/IndexWriterListener.java
##########
@@ -75,7 +76,8 @@ public void registerAllWriter(CarbonTable carbonTable, String segmentId,
     for (TableIndex tableIndex : tableIndices) {
       // register it only if it is not lazy index, for lazy index, user
       // will rebuild the index manually

Review comment:
       Please update the comments accordingly.

##########
File path: integration/spark/src/main/scala/org/apache/spark/sql/index/CarbonIndexUtil.scala
##########
@@ -396,6 +398,88 @@ object CarbonIndexUtil {
     }
   }
 
+  def updateIndexStatusInBatch(carbonTable: CarbonTable,
+      indexTables: List[CarbonTable],
+      indexType: IndexType,
+      status: IndexStatus,
+      sparkSession: SparkSession): Unit = {
+    val dbName = carbonTable.getDatabaseName
+    val tableName = carbonTable.getTableName
+    val metadataLock = CarbonLockFactory.getCarbonLockObj(carbonTable.getAbsoluteTableIdentifier,
+      LockUsage.METADATA_LOCK)
+    try {
+      if (metadataLock.lockWithRetries()) {
+        CarbonMetadata.getInstance.removeTable(dbName, tableName)
+        val table = CarbonEnv.getCarbonTable(Some(dbName), tableName)(sparkSession)
+        val indexTableInfos = IndexTableInfo.fromGson(table.getIndexInfo)
+        val indexMetadata = table.getIndexMetadata
+        indexTables.foreach { indexTable =>
+          IndexTableInfo.setIndexStatus(indexTableInfos, indexTable.getTableName, status)
+          indexMetadata.updateIndexStatus(indexType.getIndexProviderName,
+            indexTable.getTableName,
+            status.name())
+        }
+        table.getTableInfo
+          .getFactTable
+          .getTableProperties
+          .put(table.getCarbonTableIdentifier.getTableId, indexMetadata.serialize)
+        sparkSession.sql(s"""ALTER TABLE $dbName.$tableName SET SERDEPROPERTIES ('indexInfo' = '${
+          IndexTableInfo.toGson(indexTableInfos)
+        }')""".stripMargin).collect()
+        CarbonHiveIndexMetadataUtil.refreshTable(dbName, tableName, sparkSession)
+        CarbonMetadata.getInstance.removeTable(dbName, tableName)
+        CarbonMetadata.getInstance.loadTableMetadata(table.getTableInfo)
+      }
+    } finally {
+      metadataLock.unlock()
+    }
+  }
+
+  def updateIndexStatus(carbonTable: CarbonTable,

Review comment:
       Agreed. Maybe we should support this in the refresh SI command, with an option to set this status for existing tables.
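
   Purely as an illustration of that idea (index and table names are placeholders; the clause in the second call is hypothetical and not part of this PR, only the plain `REFRESH INDEX ... ON TABLE ...` form exists today as far as I know):

       // existing SI refresh syntax
       spark.sql("REFRESH INDEX idx_city ON TABLE default.maintable")
       // hypothetical extension, sketched only to convey the idea above:
       // spark.sql("REFRESH INDEX idx_city ON TABLE default.maintable PROPERTIES ('status' = 'enabled')")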




[GitHub] [carbondata] ajantha-bhat commented on pull request #4110: [CARBONDATA-4158]Secondary Index as a coarse-grain datamap and use them for Presto queries

GitBox
In reply to this post by GitBox

ajantha-bhat commented on pull request #4110:
URL: https://github.com/apache/carbondata/pull/4110#issuecomment-814288498


   @VenuReddy2103: Do we have a performance report for querying an SI table with Spark vs. with Presto? We could first run the query without SI from both Spark and Presto, and then benchmark the same query with SI.
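
   One way to gather those numbers on the Spark side is a simple spark-shell timing sketch like the one below (table, column, and filter values are placeholders; the Presto side would be timed with the equivalent query from the Presto CLI, and the run is repeated with the SI plan rewrite switched off to get the "without SI" figure):

       import org.apache.spark.sql.SparkSession

       val spark = SparkSession.builder().appName("si-benchmark").getOrCreate()

       def timeQuery(label: String, sql: String): Unit = {
         val start = System.nanoTime()
         spark.sql(sql).collect()                          // force full execution
         val elapsedMs = (System.nanoTime() - start) / 1e6
         println(f"$label%-24s ${elapsedMs}%.1f ms")
       }

       // placeholder query on an SI-indexed filter column
       val query = "SELECT name FROM maintable WHERE city = 'shenzhen'"
       timeQuery("filter query", query)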


[GitHub] [carbondata] VenuReddy2103 commented on a change in pull request #4110: [CARBONDATA-4158]Secondary Index as a coarse-grain datamap and use them for Presto queries

GitBox
In reply to this post by GitBox

VenuReddy2103 commented on a change in pull request #4110:
URL: https://github.com/apache/carbondata/pull/4110#discussion_r608636650



##########
File path: core/src/main/java/org/apache/carbondata/core/index/IndexStoreManager.java
##########
@@ -95,13 +94,10 @@ private IndexStoreManager() {
         .getIndexesMap().entrySet()) {
       for (Map.Entry<String, Map<String, String>> indexEntry : providerEntry.getValue()
           .entrySet()) {
-        if (!indexEntry.getValue().get(CarbonCommonConstants.INDEX_PROVIDER)
-            .equalsIgnoreCase(IndexType.SI.getIndexProviderName())) {
-          IndexSchema indexSchema = new IndexSchema(indexEntry.getKey(),
-              indexEntry.getValue().get(CarbonCommonConstants.INDEX_PROVIDER));
-          indexSchema.setProperties(indexEntry.getValue());
-          indexes.add(getIndex(carbonTable, indexSchema));
-        }
+        IndexSchema indexSchema = new IndexSchema(indexEntry.getKey(),

Review comment:
       Not here, only in the query process; we pass it to the IndexChooser constructor.




[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4110: [CARBONDATA-4158]Secondary Index as a coarse-grain datamap and use them for Presto queries

GitBox
In reply to this post by GitBox

CarbonDataQA2 commented on pull request #4110:
URL: https://github.com/apache/carbondata/pull/4110#issuecomment-815293905


   Build Failed  with Spark 2.4.5, Please check CI http://121.244.95.60:12602/job/ApacheCarbon_PR_Builder_2.4.5/3390/
   


[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4110: [CARBONDATA-4158]Secondary Index as a coarse-grain datamap and use them for Presto queries

GitBox
In reply to this post by GitBox

CarbonDataQA2 commented on pull request #4110:
URL: https://github.com/apache/carbondata/pull/4110#issuecomment-815295082


   Build Failed  with Spark 2.3.4, Please check CI http://121.244.95.60:12602/job/ApacheCarbonPRBuilder2.3/5141/
   


[GitHub] [carbondata] VenuReddy2103 commented on a change in pull request #4110: [CARBONDATA-4158]Secondary Index as a coarse-grain datamap and use them for Presto queries

GitBox
In reply to this post by GitBox

VenuReddy2103 commented on a change in pull request #4110:
URL: https://github.com/apache/carbondata/pull/4110#discussion_r609292286



##########
File path: core/src/main/java/org/apache/carbondata/core/util/SessionParams.java
##########
@@ -217,7 +218,9 @@ private boolean validateKeyValue(String key, String value) throws InvalidConfigu
           }
         } else if (key.startsWith(CarbonCommonConstants.CARBON_INDEX_VISIBLE)) {
           isValid = true;
-        } else if (key.startsWith(CarbonCommonConstants.CARBON_LOAD_INDEXES_PARALLEL)) {
+        } else if (key.startsWith(CarbonCommonConstants.CARBON_LOAD_INDEXES_PARALLEL) || (
+            key.startsWith(CARBON_REWRITE_PLAN_WITH_SECONDARY_INDEX)
+                && key.split("\\.").length == 8)) {

Review comment:
       done
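
   For context, the `length == 8` guard accepts only the table-scoped form of this dynamic property, i.e. the base key followed by `.<dbName>.<tableName>`. A small illustration with a hypothetical key string (the real value is the `CARBON_REWRITE_PLAN_WITH_SECONDARY_INDEX` constant in `CarbonCommonConstants` and is not reproduced here):

       // hypothetical base key with 6 dot-separated tokens
       val basePrefix = "carbon.query.rewrite.with.secondary.index"
       // appending dbName and tableName gives 8 tokens in total
       val perTableKey = s"$basePrefix.default.maintable"

       def isValidPerTableKey(key: String): Boolean =
         key.startsWith(basePrefix) && key.split("\\.").length == 8

       assert(isValidPerTableKey(perTableKey))    // table-scoped key is accepted
       assert(!isValidPerTableKey(basePrefix))    // bare key without db/table suffix fails this check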





[GitHub] [carbondata] VenuReddy2103 commented on a change in pull request #4110: [CARBONDATA-4158]Secondary Index as a coarse-grain datamap and use them for Presto queries

GitBox
In reply to this post by GitBox

VenuReddy2103 commented on a change in pull request #4110:
URL: https://github.com/apache/carbondata/pull/4110#discussion_r609292432



##########
File path: core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java
##########
@@ -2316,6 +2316,12 @@ private CarbonCommonConstants() {
   @CarbonProperty(dynamicConfigurable = true)
   public static final String CARBON_ENABLE_INDEX_SERVER = "carbon.enable.index.server";
 
+  @CarbonProperty(dynamicConfigurable = true)
+  public static final String CARBON_REWRITE_PLAN_WITH_SECONDARY_INDEX =

Review comment:
       added




[GitHub] [carbondata] VenuReddy2103 commented on a change in pull request #4110: [CARBONDATA-4158]Secondary Index as a coarse-grain datamap and use them for Presto queries

GitBox
In reply to this post by GitBox

VenuReddy2103 commented on a change in pull request #4110:
URL: https://github.com/apache/carbondata/pull/4110#discussion_r609292526



##########
File path: core/src/main/java/org/apache/carbondata/core/index/secondaryindex/CarbonCostBasedOptimizer.java
##########
@@ -15,15 +15,49 @@
  * limitations under the License.
  */
 
-package org.apache.spark.sql.secondaryindex.optimizer;
+package org.apache.carbondata.core.index.secondaryindex;
 
+import java.io.IOException;
 import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.HashMap;
 import java.util.HashSet;
 import java.util.List;
 import java.util.Map;
 import java.util.Set;
 
+import org.apache.carbondata.core.constants.CarbonCommonConstants;
+import org.apache.carbondata.core.metadata.index.IndexType;
+import org.apache.carbondata.core.metadata.schema.indextable.IndexMetadata;
+import org.apache.carbondata.core.metadata.schema.table.TableInfo;
+
 public class CarbonCostBasedOptimizer {
+  public static Map<String, List<String>> getSecondaryIndexes(TableInfo tableInfo) {
+    Map<String, List<String>> indexes = new HashMap();
+    String indexMeta =
+        tableInfo.getFactTable().getTableProperties().get(tableInfo.getFactTable().getTableId());
+    IndexMetadata indexMetadata = null;
+    if (null != indexMeta) {
+      try {
+        indexMetadata = IndexMetadata.deserialize(indexMeta);
+      } catch (IOException e) {
+        e.printStackTrace();

Review comment:
       added
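
   For reference, the pattern the reviewer asked for here, sketched in Scala for brevity even though `CarbonCostBasedOptimizer` itself is Java (the object name is made up for the sketch; the logger factory is the `LogServiceFactory` used across CarbonData):

       import java.io.IOException

       import org.apache.carbondata.common.logging.LogServiceFactory
       import org.apache.carbondata.core.metadata.schema.indextable.IndexMetadata

       object IndexMetadataReader {                 // hypothetical holder for the sketch
         private val LOGGER = LogServiceFactory.getLogService(this.getClass.getName)

         // log the failure with context instead of printing the stack trace to stdout
         def deserialize(indexMeta: String): Option[IndexMetadata] =
           try {
             Option(IndexMetadata.deserialize(indexMeta))
           } catch {
             case e: IOException =>
               LOGGER.error("Failed to deserialize index metadata from table properties", e)
               None
           }
       }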




[GitHub] [carbondata] VenuReddy2103 commented on a change in pull request #4110: [CARBONDATA-4158]Secondary Index as a coarse-grain datamap and use them for Presto queries

GitBox
In reply to this post by GitBox

VenuReddy2103 commented on a change in pull request #4110:
URL: https://github.com/apache/carbondata/pull/4110#discussion_r609292856



##########
File path: core/src/main/java/org/apache/carbondata/core/util/CarbonProperties.java
##########
@@ -2187,6 +2187,24 @@ public static boolean isFilterReorderingEnabled() {
     );
   }
 
+  /**
+   * Check whether plan rewrite with secondary index is enabled or not
+   */
+  public boolean isRewritePlanWithSecondaryIndex(String dbName, String tableName) {
+    String configuredValue = getSessionPropertyValue(
+        CarbonCommonConstants.CARBON_REWRITE_PLAN_WITH_SECONDARY_INDEX + "." + dbName + "."
+            + tableName);
+    if (configuredValue == null) {
+      configuredValue = getProperty(CarbonCommonConstants.CARBON_REWRITE_PLAN_WITH_SECONDARY_INDEX,
+          CarbonCommonConstants.CARBON_REWRITE_PLAN_WITH_SECONDARY_INDEX_DEFAULT);

Review comment:
       Done
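
   The lookup order in this hunk (table-scoped session value first, then the global carbon property, then the compiled-in default) is the usual pattern for dynamically configurable properties. A generic sketch of that fallback chain, with placeholder maps standing in for the session and global property stores:

       // generic illustration only; keys and stores are placeholders, not the real
       // CarbonProperties/SessionParams internals
       def resolveBooleanProperty(sessionProps: Map[String, String],
           globalProps: Map[String, String],
           baseKey: String,
           dbName: String,
           tableName: String,
           default: Boolean): Boolean = {
         val perTableKey = s"$baseKey.$dbName.$tableName"
         sessionProps.get(perTableKey)            // 1. table-scoped session override
           .orElse(globalProps.get(baseKey))      // 2. global property
           .map(_.toBoolean)
           .getOrElse(default)                    // 3. compiled-in default
       }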



