Apache CarbonData Dev Mailing List archive › Apache CarbonData JIRA issues

[GitHub] [carbondata] VenuReddy2103 opened a new pull request #4110: [WIP]Secondary Index as a coarse grain datamap

Classic

List

97 messages Options

Options

12345

GitBox

[GitHub] [carbondata] VenuReddy2103 commented on a change in pull request #4110: [CARBONDATA-4158]Secondary Index as a coarse-grain datamap and use them for Presto queries

VenuReddy2103 commented on a change in pull request #4110:
URL: https://github.com/apache/carbondata/pull/4110#discussion_r609294215

##########
File path: core/src/main/java/org/apache/carbondata/core/metadata/schema/indextable/IndexTableInfo.java
##########
@@ -157,15 +157,19 @@ public static String updateIndexColumns(String oldGsonData, String columnToBeUpd
return toGson(indexTableInfos);
}

- public static String enableIndex(String oldIndexIno, String indexName) {
- IndexTableInfo[] indexTableInfos = fromGson(oldIndexIno);
+ public static void setIndexStatus(IndexTableInfo[] indexTableInfos, String indexName,
+ IndexStatus status) {
for (IndexTableInfo indexTableInfo : indexTableInfos) {
if (indexTableInfo.tableName.equalsIgnoreCase(indexName)) {
- Map<String, String> oldIndexProperties = indexTableInfo.indexProperties;
- oldIndexProperties.put(CarbonCommonConstants.INDEX_STATUS, IndexStatus.ENABLED.name());
- indexTableInfo.setIndexProperties(oldIndexProperties);
+ Map<String, String> indexProperties = indexTableInfo.indexProperties;
+ indexProperties.put(CarbonCommonConstants.INDEX_STATUS, status.name());

Review comment:
done

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]

GitBox

[GitHub] [carbondata] VenuReddy2103 commented on a change in pull request #4110: [CARBONDATA-4158]Secondary Index as a coarse-grain datamap and use them for Presto queries

In reply to this post by GitBox

VenuReddy2103 commented on a change in pull request #4110:
URL: https://github.com/apache/carbondata/pull/4110#discussion_r609294436

##########
File path: integration/spark/src/main/scala/org/apache/carbondata/index/secondary/SecondaryIndex.java
##########
@@ -0,0 +1,128 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.index.secondary;
+
+import java.util.ArrayList;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Map;
+import java.util.Set;
+
+import org.apache.carbondata.common.logging.LogServiceFactory;
+import org.apache.carbondata.core.datastore.block.SegmentProperties;
+import org.apache.carbondata.core.index.IndexUtil;
+import org.apache.carbondata.core.index.dev.IndexModel;
+import org.apache.carbondata.core.index.dev.cgindex.CoarseGrainIndex;
+import org.apache.carbondata.core.indexstore.Blocklet;
+import org.apache.carbondata.core.metadata.schema.table.CarbonTable;
+import org.apache.carbondata.core.scan.expression.Expression;
+import org.apache.carbondata.core.scan.filter.executer.FilterExecutor;
+import org.apache.carbondata.core.scan.filter.resolver.FilterResolverIntf;
+import org.apache.carbondata.index.secondary.SecondaryIndexModel.PositionReferenceInfo;
+
+import org.apache.log4j.Logger;
+
+import static org.apache.carbondata.core.util.path.CarbonTablePath.BATCH_PREFIX;
+
+/**
+ * Secondary Index to prune at blocklet level.
+ */
+public class SecondaryIndex extends CoarseGrainIndex {
+ private static final Logger LOGGER =
+ LogServiceFactory.getLogService(SecondaryIndex.class.getName());
+ String indexName;

Review comment:
done

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]

GitBox

[GitHub] [carbondata] VenuReddy2103 commented on a change in pull request #4110: [CARBONDATA-4158]Secondary Index as a coarse-grain datamap and use them for Presto queries

In reply to this post by GitBox

VenuReddy2103 commented on a change in pull request #4110:
URL: https://github.com/apache/carbondata/pull/4110#discussion_r609295437

##########
File path: integration/spark/src/main/scala/org/apache/carbondata/index/secondary/SecondaryIndex.java
##########
@@ -0,0 +1,128 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.index.secondary;
+
+import java.util.ArrayList;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Map;
+import java.util.Set;
+
+import org.apache.carbondata.common.logging.LogServiceFactory;
+import org.apache.carbondata.core.datastore.block.SegmentProperties;
+import org.apache.carbondata.core.index.IndexUtil;
+import org.apache.carbondata.core.index.dev.IndexModel;
+import org.apache.carbondata.core.index.dev.cgindex.CoarseGrainIndex;
+import org.apache.carbondata.core.indexstore.Blocklet;
+import org.apache.carbondata.core.metadata.schema.table.CarbonTable;
+import org.apache.carbondata.core.scan.expression.Expression;
+import org.apache.carbondata.core.scan.filter.executer.FilterExecutor;
+import org.apache.carbondata.core.scan.filter.resolver.FilterResolverIntf;
+import org.apache.carbondata.index.secondary.SecondaryIndexModel.PositionReferenceInfo;
+
+import org.apache.log4j.Logger;
+
+import static org.apache.carbondata.core.util.path.CarbonTablePath.BATCH_PREFIX;
+
+/**
+ * Secondary Index to prune at blocklet level.
+ */
+public class SecondaryIndex extends CoarseGrainIndex {
+ private static final Logger LOGGER =
+ LogServiceFactory.getLogService(SecondaryIndex.class.getName());
+ String indexName;
+ String currentSegmentId;
+ List<String> validSegmentIds;
+ PositionReferenceInfo positionReferenceInfo;
+
+ @Override
+ public void init(IndexModel indexModel) {
+ assert (indexModel instanceof SecondaryIndexModel);
+ SecondaryIndexModel model = (SecondaryIndexModel) indexModel;
+ indexName = model.indexName;
+ currentSegmentId = model.currentSegmentId;
+ validSegmentIds = model.validSegmentIds;
+ positionReferenceInfo = model.positionReferenceInfo;
+ }
+
+ private Set<String> getPositionReferences(String databaseName, String indexName,
+ Expression expression) {
+ synchronized (positionReferenceInfo) {

Review comment:
have removed the synchronization block now with another review comment.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]

GitBox

[GitHub] [carbondata] VenuReddy2103 commented on a change in pull request #4110: [CARBONDATA-4158]Secondary Index as a coarse-grain datamap and use them for Presto queries

In reply to this post by GitBox

VenuReddy2103 commented on a change in pull request #4110:
URL: https://github.com/apache/carbondata/pull/4110#discussion_r609298135

##########
File path: integration/spark/src/main/scala/org/apache/carbondata/index/secondary/SecondaryIndex.java
##########
@@ -0,0 +1,128 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.index.secondary;
+
+import java.util.ArrayList;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Map;
+import java.util.Set;
+
+import org.apache.carbondata.common.logging.LogServiceFactory;
+import org.apache.carbondata.core.datastore.block.SegmentProperties;
+import org.apache.carbondata.core.index.IndexUtil;
+import org.apache.carbondata.core.index.dev.IndexModel;
+import org.apache.carbondata.core.index.dev.cgindex.CoarseGrainIndex;
+import org.apache.carbondata.core.indexstore.Blocklet;
+import org.apache.carbondata.core.metadata.schema.table.CarbonTable;
+import org.apache.carbondata.core.scan.expression.Expression;
+import org.apache.carbondata.core.scan.filter.executer.FilterExecutor;
+import org.apache.carbondata.core.scan.filter.resolver.FilterResolverIntf;
+import org.apache.carbondata.index.secondary.SecondaryIndexModel.PositionReferenceInfo;
+
+import org.apache.log4j.Logger;
+
+import static org.apache.carbondata.core.util.path.CarbonTablePath.BATCH_PREFIX;
+
+/**
+ * Secondary Index to prune at blocklet level.
+ */
+public class SecondaryIndex extends CoarseGrainIndex {
+ private static final Logger LOGGER =
+ LogServiceFactory.getLogService(SecondaryIndex.class.getName());
+ String indexName;
+ String currentSegmentId;
+ List<String> validSegmentIds;
+ PositionReferenceInfo positionReferenceInfo;
+
+ @Override
+ public void init(IndexModel indexModel) {
+ assert (indexModel instanceof SecondaryIndexModel);
+ SecondaryIndexModel model = (SecondaryIndexModel) indexModel;
+ indexName = model.indexName;
+ currentSegmentId = model.currentSegmentId;
+ validSegmentIds = model.validSegmentIds;
+ positionReferenceInfo = model.positionReferenceInfo;
+ }
+
+ private Set<String> getPositionReferences(String databaseName, String indexName,
+ Expression expression) {
+ synchronized (positionReferenceInfo) {
+ /* If the position references are not obtained yet(i.e., prune happening for the first valid
+ segment), then get them from the given index table with the given filter from all the valid
+ segments at once and store them as map of segmentId to set of position references in that
+ particular segment. Upon the subsequent prune for other segments, return the position
+ references for the respective segment from the map directly */
+ if (!positionReferenceInfo.fetched) {

Review comment:
They do not share the same position reference across queries. Have added more comments to `PositionReferenceInfo` class now.
One instance of `PositionReferenceInfo` is shared across all the `SecondaryIndex` instances for the particular query pruning with the given index filter. This ensures to run a single sql query to get position references from the valid segments of the given secondary index table with given index filter and populate them in map. First secondary index segment prune in the query will run the sql query for position references and store them in map. And subsequent segments prune in the same query can avoid the individual sql for position references within the respective segment and return position references from the map directly.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]

GitBox

[GitHub] [carbondata] VenuReddy2103 commented on a change in pull request #4110: [CARBONDATA-4158]Secondary Index as a coarse-grain datamap and use them for Presto queries

In reply to this post by GitBox

VenuReddy2103 commented on a change in pull request #4110:
URL: https://github.com/apache/carbondata/pull/4110#discussion_r609298600

##########
File path: integration/spark/src/main/scala/org/apache/carbondata/index/secondary/SecondaryIndexFactory.java
##########
@@ -0,0 +1,172 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.index.secondary;
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+import java.util.Set;
+import java.util.stream.Collectors;
+
+import org.apache.carbondata.common.annotations.InterfaceAudience;
+import org.apache.carbondata.common.exceptions.sql.MalformedIndexCommandException;
+import org.apache.carbondata.common.logging.LogServiceFactory;
+import org.apache.carbondata.core.datastore.block.SegmentProperties;
+import org.apache.carbondata.core.features.TableOperation;
+import org.apache.carbondata.core.index.IndexFilter;
+import org.apache.carbondata.core.index.IndexInputSplit;
+import org.apache.carbondata.core.index.IndexMeta;
+import org.apache.carbondata.core.index.Segment;
+import org.apache.carbondata.core.index.dev.IndexBuilder;
+import org.apache.carbondata.core.index.dev.IndexWriter;
+import org.apache.carbondata.core.index.dev.cgindex.CoarseGrainIndex;
+import org.apache.carbondata.core.index.dev.cgindex.CoarseGrainIndexFactory;
+import org.apache.carbondata.core.metadata.schema.table.CarbonTable;
+import org.apache.carbondata.core.metadata.schema.table.IndexSchema;
+import org.apache.carbondata.core.scan.filter.intf.ExpressionType;
+import org.apache.carbondata.events.Event;
+import org.apache.carbondata.index.secondary.SecondaryIndexModel.PositionReferenceInfo;
+
+import org.apache.hadoop.fs.Path;
+import org.apache.log4j.Logger;
+
+/**
+ * Index Factory for Secondary Index.
+ */
+@InterfaceAudience.Internal
+public class SecondaryIndexFactory extends CoarseGrainIndexFactory {
+ private static final Logger LOGGER =
+ LogServiceFactory.getLogService(SecondaryIndexFactory.class.getName());
+ private IndexMeta indexMeta;
+
+ public SecondaryIndexFactory(CarbonTable carbonTable, IndexSchema indexSchema)
+ throws MalformedIndexCommandException {
+ super(carbonTable, indexSchema);
+ List<ExpressionType> operations = new ArrayList(Arrays.asList(ExpressionType.values()));
+ indexMeta = new IndexMeta(indexSchema.getIndexName(),
+ carbonTable.getIndexedColumns(indexSchema.getIndexColumns()), operations);
+ LOGGER.info("Created Secondary Index Factory instance for " + indexSchema.getIndexName());
+ }
+
+ @Override
+ public IndexWriter createWriter(Segment segment, String shardName,
+ SegmentProperties segmentProperties) throws IOException {
+ throw new UnsupportedOperationException("Not supported for Secondary Index");
+ }
+
+ @Override
+ public IndexBuilder createBuilder(Segment segment, String shardName,
+ SegmentProperties segmentProperties) throws IOException {
+ throw new UnsupportedOperationException("Not supported for Secondary Index");
+ }
+
+ @Override
+ public IndexMeta getMeta() {
+ return indexMeta;
+ }
+
+ private Map<Segment, List<CoarseGrainIndex>> getIndexes(List<Segment> segments,
+ PositionReferenceInfo positionReferenceInfo) throws IOException {
+ Map<Segment, List<CoarseGrainIndex>> indexes = new HashMap<>();
+ List<String> allSegmentIds =
+ segments.stream().map(Segment::getSegmentNo).collect(Collectors.toList());
+ for (Segment segment : segments) {
+ indexes.put(segment, this.getIndexes(segment, allSegmentIds, positionReferenceInfo));
+ }
+ return indexes;
+ }
+
+ private List<CoarseGrainIndex> getIndexes(Segment segment, List<String> allSegmentIds,
+ PositionReferenceInfo positionReferenceInfo) throws IOException {
+ List<CoarseGrainIndex> indexes = new ArrayList<>();
+ SecondaryIndex secondaryIndex = new SecondaryIndex();
+ secondaryIndex.init(
+ new SecondaryIndexModel(getIndexSchema().getIndexName(), segment.getSegmentNo(),
+ allSegmentIds, positionReferenceInfo, segment.getConfiguration()));
+ indexes.add(secondaryIndex);
+ return indexes;
+ }
+
+ @Override
+ public Map<Segment, List<CoarseGrainIndex>> getIndexes(List<Segment> segments, IndexFilter filter)
+ throws IOException {
+ return getIndexes(segments, new PositionReferenceInfo());
+ }
+
+ @Override
+ public Map<Segment, List<CoarseGrainIndex>> getIndexes(List<Segment> segments,
+ Set<Path> partitionLocations, IndexFilter filter) throws IOException {
+ return getIndexes(segments, new PositionReferenceInfo());
+ }
+
+ @Override
+ public List<CoarseGrainIndex> getIndexes(Segment segment) throws IOException {
+ List<String> allSegmentIds = new ArrayList();

Review comment:
done

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]

GitBox

[GitHub] [carbondata] VenuReddy2103 commented on a change in pull request #4110: [CARBONDATA-4158]Secondary Index as a coarse-grain datamap and use them for Presto queries

In reply to this post by GitBox

VenuReddy2103 commented on a change in pull request #4110:
URL: https://github.com/apache/carbondata/pull/4110#discussion_r609298969

##########
File path: integration/spark/src/main/scala/org/apache/carbondata/index/secondary/SecondaryIndexModel.java
##########
@@ -0,0 +1,57 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.index.secondary;
+
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+import java.util.Set;
+
+import org.apache.carbondata.core.index.dev.IndexModel;
+
+import org.apache.hadoop.conf.Configuration;
+
+/**
+ * Secondary Index Model
+ */
+public class SecondaryIndexModel extends IndexModel {
+ String indexName; // Secondary Index name

Review comment:
done

##########
File path: integration/spark/src/main/scala/org/apache/carbondata/index/secondary/SecondaryIndexModel.java
##########
@@ -0,0 +1,57 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.index.secondary;
+
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+import java.util.Set;
+
+import org.apache.carbondata.core.index.dev.IndexModel;
+
+import org.apache.hadoop.conf.Configuration;
+
+/**
+ * Secondary Index Model
+ */
+public class SecondaryIndexModel extends IndexModel {
+ String indexName; // Secondary Index name

Review comment:
done

##########
File path: integration/spark/src/main/scala/org/apache/carbondata/index/secondary/SecondaryIndexModel.java
##########
@@ -0,0 +1,57 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.index.secondary;
+
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+import java.util.Set;
+
+import org.apache.carbondata.core.index.dev.IndexModel;
+
+import org.apache.hadoop.conf.Configuration;
+
+/**
+ * Secondary Index Model
+ */
+public class SecondaryIndexModel extends IndexModel {
+ String indexName; // Secondary Index name
+ String currentSegmentId; // Segment Id to prune
+ List<String> validSegmentIds; // All the valid segment Ids for Secondary Index pruning
+ PositionReferenceInfo positionReferenceInfo; // Position reference information
+
+ public SecondaryIndexModel(String indexName, String currentSegmentId,
+ List<String> validSegmentIds, PositionReferenceInfo positionReferenceInfo,
+ Configuration configuration) {
+ super(null, configuration);
+ this.indexName = indexName;
+ this.currentSegmentId = currentSegmentId;
+ this.validSegmentIds = validSegmentIds;
+ this.positionReferenceInfo = positionReferenceInfo;
+ }
+
+ /**
+ * Position Reference information
+ */
+ public static class PositionReferenceInfo {

Review comment:
Have replied above.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]

GitBox

[GitHub] [carbondata] VenuReddy2103 commented on a change in pull request #4110: [CARBONDATA-4158]Secondary Index as a coarse-grain datamap and use them for Presto queries

In reply to this post by GitBox

VenuReddy2103 commented on a change in pull request #4110:
URL: https://github.com/apache/carbondata/pull/4110#discussion_r609299248

##########
File path: core/src/main/java/org/apache/carbondata/core/index/secondaryindex/CarbonCostBasedOptimizer.java
##########
@@ -15,15 +15,49 @@
* limitations under the License.
*/

-package org.apache.spark.sql.secondaryindex.optimizer;
+package org.apache.carbondata.core.index.secondaryindex;

+import java.io.IOException;
import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

+import org.apache.carbondata.core.constants.CarbonCommonConstants;
+import org.apache.carbondata.core.metadata.index.IndexType;
+import org.apache.carbondata.core.metadata.schema.indextable.IndexMetadata;
+import org.apache.carbondata.core.metadata.schema.table.TableInfo;
+
public class CarbonCostBasedOptimizer {
+ public static Map<String, List<String>> getSecondaryIndexes(TableInfo tableInfo) {

Review comment:
done

##########
File path: core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java
##########
@@ -2316,6 +2316,12 @@ private CarbonCommonConstants() {
@CarbonProperty(dynamicConfigurable = true)
public static final String CARBON_ENABLE_INDEX_SERVER = "carbon.enable.index.server";

+ @CarbonProperty(dynamicConfigurable = true)
+ public static final String CARBON_REWRITE_PLAN_WITH_SECONDARY_INDEX =
+ "carbon.rewrite.plan.with.secondary.index";

Review comment:
done

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]

GitBox

[GitHub] [carbondata] VenuReddy2103 commented on a change in pull request #4110: [CARBONDATA-4158]Secondary Index as a coarse-grain datamap and use them for Presto queries

In reply to this post by GitBox

VenuReddy2103 commented on a change in pull request #4110:
URL: https://github.com/apache/carbondata/pull/4110#discussion_r609299417

##########
File path: integration/spark/src/main/scala/org/apache/carbondata/indexserver/DistributedRDDUtils.scala
##########
@@ -394,4 +397,35 @@ object DistributedRDDUtils {
}
}
}
+
+ def pruneOnDriver(request: IndexInputFormat): ExtendedBlockletWrapper = {
+ val job = CarbonSparkUtil.createHadoopJob()
+ val conf = job.getConfiguration
+ val tableInfo = request.getCarbonTable.getTableInfo
+ val identifier = request.getCarbonTable.getAbsoluteTableIdentifier
+ val indexFilter = new IndexFilter(request.getCarbonTable,
+ request.getFilterResolverIntf.getFilterExpression)
+ CarbonInputFormat.setTableInfo(conf, tableInfo)
+ CarbonInputFormat.setFilterPredicates(conf, indexFilter)
+ CarbonInputFormat.setDatabaseName(conf, tableInfo.getDatabaseName)
+ CarbonInputFormat.setTableName(conf, tableInfo.getFactTable.getTableName)
+ CarbonInputFormat.setPartitionsToPrune(conf, request.getPartitions)
+ CarbonInputFormat.setTransactionalTable(conf, tableInfo.isTransactionalTable)
+ CarbonInputFormat.setTablePath(conf,
+ identifier.appendWithLocalPrefix(identifier.getTablePath))
+ CarbonInputFormat.setQuerySegment(conf, identifier)
+ CarbonInputFormat.setColumnProjection(conf, Array("positionId"))
+ CarbonInputFormat.setReadCommittedScope(conf, request.getReadCommittedScope)
+ CarbonInputFormat.setSegmentsToAccess(conf, request.getValidSegments)
+ CarbonInputFormat.setValidateSegmentsToAccess(conf, false)
+ CarbonInputFormat.setSecondaryIndexPruning(conf, request.isSIPruningEnabled)
+ val blocklets = new CarbonTableInputFormat[Object].getPrunedBlocklets(job,

Review comment:
done

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]

GitBox

[GitHub] [carbondata] VenuReddy2103 commented on a change in pull request #4110: [CARBONDATA-4158]Secondary Index as a coarse-grain datamap and use them for Presto queries

In reply to this post by GitBox

VenuReddy2103 commented on a change in pull request #4110:
URL: https://github.com/apache/carbondata/pull/4110#discussion_r609300785

##########
File path: integration/spark/src/main/scala/org/apache/carbondata/index/secondary/SecondaryIndex.java
##########
@@ -0,0 +1,128 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.index.secondary;
+
+import java.util.ArrayList;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Map;
+import java.util.Set;
+
+import org.apache.carbondata.common.logging.LogServiceFactory;
+import org.apache.carbondata.core.datastore.block.SegmentProperties;
+import org.apache.carbondata.core.index.IndexUtil;
+import org.apache.carbondata.core.index.dev.IndexModel;
+import org.apache.carbondata.core.index.dev.cgindex.CoarseGrainIndex;
+import org.apache.carbondata.core.indexstore.Blocklet;
+import org.apache.carbondata.core.metadata.schema.table.CarbonTable;
+import org.apache.carbondata.core.scan.expression.Expression;
+import org.apache.carbondata.core.scan.filter.executer.FilterExecutor;
+import org.apache.carbondata.core.scan.filter.resolver.FilterResolverIntf;
+import org.apache.carbondata.index.secondary.SecondaryIndexModel.PositionReferenceInfo;
+
+import org.apache.log4j.Logger;
+
+import static org.apache.carbondata.core.util.path.CarbonTablePath.BATCH_PREFIX;
+
+/**
+ * Secondary Index to prune at blocklet level.
+ */
+public class SecondaryIndex extends CoarseGrainIndex {
+ private static final Logger LOGGER =
+ LogServiceFactory.getLogService(SecondaryIndex.class.getName());
+ String indexName;
+ String currentSegmentId;
+ List<String> validSegmentIds;
+ PositionReferenceInfo positionReferenceInfo;
+
+ @Override
+ public void init(IndexModel indexModel) {
+ assert (indexModel instanceof SecondaryIndexModel);
+ SecondaryIndexModel model = (SecondaryIndexModel) indexModel;
+ indexName = model.indexName;
+ currentSegmentId = model.currentSegmentId;
+ validSegmentIds = model.validSegmentIds;
+ positionReferenceInfo = model.positionReferenceInfo;
+ }
+
+ private Set<String> getPositionReferences(String databaseName, String indexName,
+ Expression expression) {
+ synchronized (positionReferenceInfo) {

Review comment:
Done

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]

GitBox

[GitHub] [carbondata] VenuReddy2103 commented on a change in pull request #4110: [CARBONDATA-4158]Secondary Index as a coarse-grain datamap and use them for Presto queries

In reply to this post by GitBox

VenuReddy2103 commented on a change in pull request #4110:
URL: https://github.com/apache/carbondata/pull/4110#discussion_r609300934

##########
File path: integration/spark/src/main/scala/org/apache/carbondata/index/secondary/SecondaryIndex.java
##########
@@ -0,0 +1,128 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.index.secondary;
+
+import java.util.ArrayList;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Map;
+import java.util.Set;
+
+import org.apache.carbondata.common.logging.LogServiceFactory;
+import org.apache.carbondata.core.datastore.block.SegmentProperties;
+import org.apache.carbondata.core.index.IndexUtil;
+import org.apache.carbondata.core.index.dev.IndexModel;
+import org.apache.carbondata.core.index.dev.cgindex.CoarseGrainIndex;
+import org.apache.carbondata.core.indexstore.Blocklet;
+import org.apache.carbondata.core.metadata.schema.table.CarbonTable;
+import org.apache.carbondata.core.scan.expression.Expression;
+import org.apache.carbondata.core.scan.filter.executer.FilterExecutor;
+import org.apache.carbondata.core.scan.filter.resolver.FilterResolverIntf;
+import org.apache.carbondata.index.secondary.SecondaryIndexModel.PositionReferenceInfo;
+
+import org.apache.log4j.Logger;
+
+import static org.apache.carbondata.core.util.path.CarbonTablePath.BATCH_PREFIX;
+
+/**
+ * Secondary Index to prune at blocklet level.
+ */
+public class SecondaryIndex extends CoarseGrainIndex {
+ private static final Logger LOGGER =
+ LogServiceFactory.getLogService(SecondaryIndex.class.getName());
+ String indexName;
+ String currentSegmentId;
+ List<String> validSegmentIds;
+ PositionReferenceInfo positionReferenceInfo;

Review comment:
Added detailed comments

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]

GitBox

[GitHub] [carbondata] VenuReddy2103 commented on a change in pull request #4110: [CARBONDATA-4158]Secondary Index as a coarse-grain datamap and use them for Presto queries

In reply to this post by GitBox

VenuReddy2103 commented on a change in pull request #4110:
URL: https://github.com/apache/carbondata/pull/4110#discussion_r609301165

##########
File path: processing/src/main/java/org/apache/carbondata/processing/index/IndexWriterListener.java
##########
@@ -75,7 +76,8 @@ public void registerAllWriter(CarbonTable carbonTable, String segmentId,
for (TableIndex tableIndex : tableIndices) {
// register it only if it is not lazy index, for lazy index, user
// will rebuild the index manually

Review comment:
done

##########
File path: processing/src/main/java/org/apache/carbondata/processing/index/IndexWriterListener.java
##########
@@ -75,7 +76,8 @@ public void registerAllWriter(CarbonTable carbonTable, String segmentId,
for (TableIndex tableIndex : tableIndices) {
// register it only if it is not lazy index, for lazy index, user
// will rebuild the index manually

Review comment:
done

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]

GitBox

[GitHub] [carbondata] VenuReddy2103 commented on a change in pull request #4110: [CARBONDATA-4158]Secondary Index as a coarse-grain datamap and use them for Presto queries

In reply to this post by GitBox

VenuReddy2103 commented on a change in pull request #4110:
URL: https://github.com/apache/carbondata/pull/4110#discussion_r609303441

##########
File path: integration/spark/src/main/scala/org/apache/carbondata/spark/rdd/CarbonScanRDD.scala
##########
@@ -128,6 +129,16 @@ class CarbonScanRDD[T: ClassTag](
} else {
prepareInputFormatForDriver(job.getConfiguration)
}
+ if (segmentIds.isDefined) {
+ CarbonInputFormat.setQuerySegment(job.getConfiguration, segmentIds.get)

Review comment:
We have segment Ids from insegment() udf. So had passed them.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]

GitBox

[GitHub] [carbondata] VenuReddy2103 commented on a change in pull request #4110: [CARBONDATA-4158]Secondary Index as a coarse-grain datamap and use them for Presto queries

In reply to this post by GitBox

VenuReddy2103 commented on a change in pull request #4110:
URL: https://github.com/apache/carbondata/pull/4110#discussion_r609304145

##########
File path: integration/spark/src/main/scala/org/apache/carbondata/indexserver/IndexServer.scala
##########
@@ -265,8 +272,6 @@ object IndexServer extends ServerInterface {
server.stop()
}
})
- CarbonProperties.getInstance().addProperty(CarbonCommonConstants

Review comment:
done

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]

GitBox

[GitHub] [carbondata] VenuReddy2103 commented on a change in pull request #4110: [CARBONDATA-4158]Secondary Index as a coarse-grain datamap and use them for Presto queries

In reply to this post by GitBox

VenuReddy2103 commented on a change in pull request #4110:
URL: https://github.com/apache/carbondata/pull/4110#discussion_r609354461

##########
File path: core/src/main/java/org/apache/carbondata/core/util/CarbonProperties.java
##########
@@ -2187,6 +2187,24 @@ public static boolean isFilterReorderingEnabled() {
);
}

+ /**
+ * Check whether plan rewrite with secondary index is enabled or not
+ */
+ public boolean isRewritePlanWithSecondaryIndex(String dbName, String tableName) {
+ String configuredValue = getSessionPropertyValue(

Review comment:
Modified. Actually, was setting the property only for spark session. So the property name(rewrite plan) was meaningful only for spark.
Now changed the property to `carbon.coarse.grain.secondary.index` so that it can be used for both presto and spark. Default value in Presto and Spark session are `true` and `false` respectively.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]

GitBox

[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4110: [CARBONDATA-4158]Secondary Index as a coarse-grain datamap and use them for Presto queries

In reply to this post by GitBox

CarbonDataQA2 commented on pull request #4110:
URL: https://github.com/apache/carbondata/pull/4110#issuecomment-815528351

Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12602/job/ApacheCarbonPRBuilder2.3/5143/

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]

GitBox

[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4110: [CARBONDATA-4158]Secondary Index as a coarse-grain datamap and use them for Presto queries

In reply to this post by GitBox

CarbonDataQA2 commented on pull request #4110:
URL: https://github.com/apache/carbondata/pull/4110#issuecomment-815533035

Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12602/job/ApacheCarbon_PR_Builder_2.4.5/3392/

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]

GitBox

[GitHub] [carbondata] VenuReddy2103 commented on pull request #4110: [CARBONDATA-4158]Secondary Index as a coarse-grain datamap and use them for Presto queries

In reply to this post by GitBox

VenuReddy2103 commented on pull request #4110:
URL: https://github.com/apache/carbondata/pull/4110#issuecomment-815539289

retest this please

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]

GitBox

[GitHub] [carbondata] VenuReddy2103 commented on a change in pull request #4110: [CARBONDATA-4158]Secondary Index as a coarse-grain datamap and use them for Presto queries

In reply to this post by GitBox

VenuReddy2103 commented on a change in pull request #4110:
URL: https://github.com/apache/carbondata/pull/4110#discussion_r609299017

##########
File path: integration/spark/src/main/scala/org/apache/carbondata/index/secondary/SecondaryIndexModel.java
##########
@@ -0,0 +1,57 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.index.secondary;
+
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+import java.util.Set;
+
+import org.apache.carbondata.core.index.dev.IndexModel;
+
+import org.apache.hadoop.conf.Configuration;
+
+/**
+ * Secondary Index Model
+ */
+public class SecondaryIndexModel extends IndexModel {
+ String indexName; // Secondary Index name

Review comment:
done

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]

GitBox

[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4110: [CARBONDATA-4158]Secondary Index as a coarse-grain datamap and use them for Presto queries

In reply to this post by GitBox

CarbonDataQA2 commented on pull request #4110:
URL: https://github.com/apache/carbondata/pull/4110#issuecomment-815604435

Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12602/job/ApacheCarbonPRBuilder2.3/5144/

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]

GitBox

[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4110: [CARBONDATA-4158]Secondary Index as a coarse-grain datamap and use them for Presto queries

In reply to this post by GitBox

CarbonDataQA2 commented on pull request #4110:
URL: https://github.com/apache/carbondata/pull/4110#issuecomment-815613829

Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12602/job/ApacheCarbon_PR_Builder_2.4.5/3393/

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]

12345