Apache CarbonData Dev Mailing List archive › Apache CarbonData JIRA issues

[GitHub] [carbondata] VenuReddy2103 opened a new pull request #4110: [WIP]Secondary Index as a coarse grain datamap

Classic

List

97 messages Options

Options

12345

GitBox

[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4110: [CARBONDATA-4158]Secondary Index as a coarse-grain datamap and use them for Presto queries

CarbonDataQA2 commented on pull request #4110:
URL: https://github.com/apache/carbondata/pull/4110#issuecomment-821145251

Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12602/job/ApacheCarbonPRBuilder2.3/5201/

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]

GitBox

[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4110: [CARBONDATA-4158]Secondary Index as a coarse-grain datamap and use them for Presto queries

In reply to this post by GitBox

CarbonDataQA2 commented on pull request #4110:
URL: https://github.com/apache/carbondata/pull/4110#issuecomment-821273953

Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12602/job/ApacheCarbonPRBuilder2.3/5202/

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]

GitBox

[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4110: [CARBONDATA-4158]Secondary Index as a coarse-grain datamap and use them for Presto queries

In reply to this post by GitBox

CarbonDataQA2 commented on pull request #4110:
URL: https://github.com/apache/carbondata/pull/4110#issuecomment-821283487

Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12602/job/ApacheCarbon_PR_Builder_2.4.5/3450/

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]

GitBox

[GitHub] [carbondata] brijoobopanna commented on pull request #4110: [CARBONDATA-4158]Secondary Index as a coarse-grain datamap and use them for Presto queries

In reply to this post by GitBox

brijoobopanna commented on pull request #4110:
URL: https://github.com/apache/carbondata/pull/4110#issuecomment-823794397

retest this please

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]

GitBox

[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4110: [CARBONDATA-4158]Secondary Index as a coarse-grain datamap and use them for Presto queries

In reply to this post by GitBox

CarbonDataQA2 commented on pull request #4110:
URL: https://github.com/apache/carbondata/pull/4110#issuecomment-823839154

Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12602/job/ApacheCarbonPRBuilder2.3/5220/

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]

GitBox

[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4110: [CARBONDATA-4158]Secondary Index as a coarse-grain datamap and use them for Presto queries

In reply to this post by GitBox

CarbonDataQA2 commented on pull request #4110:
URL: https://github.com/apache/carbondata/pull/4110#issuecomment-823839382

Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12602/job/ApacheCarbon_PR_Builder_2.4.5/3472/

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]

GitBox

[GitHub] [carbondata] VenuReddy2103 commented on a change in pull request #4110: [CARBONDATA-4158]Secondary Index as a coarse-grain datamap and use them for Presto queries

In reply to this post by GitBox

VenuReddy2103 commented on a change in pull request #4110:
URL: https://github.com/apache/carbondata/pull/4110#discussion_r617292539

##########
File path: core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java
##########
@@ -2316,6 +2316,12 @@ private CarbonCommonConstants() {
@CarbonProperty(dynamicConfigurable = true)
public static final String CARBON_ENABLE_INDEX_SERVER = "carbon.enable.index.server";

+ @CarbonProperty(dynamicConfigurable = true)
+ public static final String CARBON_REWRITE_PLAN_WITH_SECONDARY_INDEX =
+ "carbon.rewrite.plan.with.secondary.index";
+
+ public static final String CARBON_REWRITE_PLAN_WITH_SECONDARY_INDEX_DEFAULT = "true";

Review comment:
Yes cluster testing is done with both presto, spark with index server combination. Also, there is no performance impact observed with spark + index server when property is false

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]

GitBox

[GitHub] [carbondata] VenuReddy2103 commented on a change in pull request #4110: [CARBONDATA-4158]Secondary Index as a coarse-grain datamap and use them for Presto queries

In reply to this post by GitBox

VenuReddy2103 commented on a change in pull request #4110:
URL: https://github.com/apache/carbondata/pull/4110#discussion_r617293398

##########
File path: integration/presto/src/main/prestodb/org/apache/carbondata/presto/impl/CarbonTableReader.java
##########
@@ -270,6 +270,7 @@ private CarbonTableCacheModel getValidCacheBySchemaTableName(SchemaTableName sch
config.set("query.id", queryId);
CarbonInputFormat.setTransactionalTable(config, carbonTable.isTransactionalTable());
CarbonInputFormat.setTableInfo(config, carbonTable.getTableInfo());
+ CarbonInputFormat.checkAndSetSecondaryIndexPruning(carbonTable.getTableInfo(), filters, config);

Review comment:
done

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]

GitBox

[GitHub] [carbondata] VenuReddy2103 commented on a change in pull request #4110: [CARBONDATA-4158]Secondary Index as a coarse-grain datamap and use them for Presto queries

In reply to this post by GitBox

VenuReddy2103 commented on a change in pull request #4110:
URL: https://github.com/apache/carbondata/pull/4110#discussion_r617318018

##########
File path: integration/spark/src/main/scala/org/apache/spark/sql/index/CarbonIndexUtil.scala
##########
@@ -396,6 +398,88 @@ object CarbonIndexUtil {
}
}

+ def updateIndexStatusInBatch(carbonTable: CarbonTable,
+ indexTables: List[CarbonTable],
+ indexType: IndexType,
+ status: IndexStatus,
+ sparkSession: SparkSession): Unit = {
+ val dbName = carbonTable.getDatabaseName
+ val tableName = carbonTable.getTableName
+ val metadataLock = CarbonLockFactory.getCarbonLockObj(carbonTable.getAbsoluteTableIdentifier,
+ LockUsage.METADATA_LOCK)
+ try {
+ if (metadataLock.lockWithRetries()) {
+ CarbonMetadata.getInstance.removeTable(dbName, tableName)
+ val table = CarbonEnv.getCarbonTable(Some(dbName), tableName)(sparkSession)
+ val indexTableInfos = IndexTableInfo.fromGson(table.getIndexInfo)
+ val indexMetadata = table.getIndexMetadata
+ indexTables.foreach { indexTable =>
+ IndexTableInfo.setIndexStatus(indexTableInfos, indexTable.getTableName, status)
+ indexMetadata.updateIndexStatus(indexType.getIndexProviderName,
+ indexTable.getTableName,
+ status.name())
+ }
+ table.getTableInfo
+ .getFactTable
+ .getTableProperties
+ .put(table.getCarbonTableIdentifier.getTableId, indexMetadata.serialize)
+ sparkSession.sql(s"""ALTER TABLE $dbName.$tableName SET SERDEPROPERTIES ('indexInfo' = '${
+ IndexTableInfo.toGson(indexTableInfos)
+ }')""".stripMargin).collect()
+ CarbonHiveIndexMetadataUtil.refreshTable(dbName, tableName, sparkSession)
+ CarbonMetadata.getInstance.removeTable(dbName, tableName)
+ CarbonMetadata.getInstance.loadTableMetadata(table.getTableInfo)
+ }
+ } finally {
+ metadataLock.unlock()
+ }
+ }
+
+ def updateIndexStatus(carbonTable: CarbonTable,

Review comment:
Yes. As discussed offline, have created a subtask under same jira.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]

GitBox

[GitHub] [carbondata] VenuReddy2103 commented on a change in pull request #4110: [CARBONDATA-4158]Secondary Index as a coarse-grain datamap and use them for Presto queries

In reply to this post by GitBox

VenuReddy2103 commented on a change in pull request #4110:
URL: https://github.com/apache/carbondata/pull/4110#discussion_r617319168

##########
File path: integration/spark/src/main/scala/org/apache/spark/sql/index/CarbonIndexUtil.scala
##########
@@ -396,6 +398,88 @@ object CarbonIndexUtil {
}
}

+ def updateIndexStatusInBatch(carbonTable: CarbonTable,
+ indexTables: List[CarbonTable],
+ indexType: IndexType,
+ status: IndexStatus,
+ sparkSession: SparkSession): Unit = {
+ val dbName = carbonTable.getDatabaseName
+ val tableName = carbonTable.getTableName
+ val metadataLock = CarbonLockFactory.getCarbonLockObj(carbonTable.getAbsoluteTableIdentifier,
+ LockUsage.METADATA_LOCK)
+ try {
+ if (metadataLock.lockWithRetries()) {
+ CarbonMetadata.getInstance.removeTable(dbName, tableName)
+ val table = CarbonEnv.getCarbonTable(Some(dbName), tableName)(sparkSession)
+ val indexTableInfos = IndexTableInfo.fromGson(table.getIndexInfo)
+ val indexMetadata = table.getIndexMetadata
+ indexTables.foreach { indexTable =>
+ IndexTableInfo.setIndexStatus(indexTableInfos, indexTable.getTableName, status)
+ indexMetadata.updateIndexStatus(indexType.getIndexProviderName,
+ indexTable.getTableName,
+ status.name())
+ }
+ table.getTableInfo
+ .getFactTable
+ .getTableProperties
+ .put(table.getCarbonTableIdentifier.getTableId, indexMetadata.serialize)
+ sparkSession.sql(s"""ALTER TABLE $dbName.$tableName SET SERDEPROPERTIES ('indexInfo' = '${
+ IndexTableInfo.toGson(indexTableInfos)
+ }')""".stripMargin).collect()
+ CarbonHiveIndexMetadataUtil.refreshTable(dbName, tableName, sparkSession)
+ CarbonMetadata.getInstance.removeTable(dbName, tableName)
+ CarbonMetadata.getInstance.loadTableMetadata(table.getTableInfo)
+ }
+ } finally {
+ metadataLock.unlock()
+ }
+ }
+
+ def updateIndexStatus(carbonTable: CarbonTable,
+ indexName: String,
+ indexType: IndexType,
+ status: IndexStatus,
+ needLock: Boolean = true,
+ sparkSession: SparkSession): Unit = {
+ val dbName = carbonTable.getDatabaseName

Review comment:
replied above

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]

GitBox

[GitHub] [carbondata] maheshrajus commented on pull request #4110: [CARBONDATA-4158]Add Secondary Index as a coarse-grain index and use secondary indexes for Presto queries

In reply to this post by GitBox

maheshrajus commented on pull request #4110:
URL: https://github.com/apache/carbondata/pull/4110#issuecomment-824087364

I executed secondary index flows in the presto cluster with this code. No issue found and flows are working fine as per design and discussion.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]

GitBox

[GitHub] [carbondata] ajantha-bhat commented on pull request #4110: [CARBONDATA-4158]Add Secondary Index as a coarse-grain index and use secondary indexes for Presto queries

In reply to this post by GitBox

ajantha-bhat commented on pull request #4110:
URL: https://github.com/apache/carbondata/pull/4110#issuecomment-824096942

@VenuReddy2103 , @maheshrajus : when SI is not a datamap flow, we support matching to more than one SI tables also for filter queries. Is it supported in SI as datamap flow ?

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]

GitBox

[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #4110: [CARBONDATA-4158]Add Secondary Index as a coarse-grain index and use secondary indexes for Presto queries

In reply to this post by GitBox

ajantha-bhat commented on a change in pull request #4110:
URL: https://github.com/apache/carbondata/pull/4110#discussion_r617590710

##########
File path: integration/spark/src/main/scala/org/apache/spark/sql/index/CarbonIndexUtil.scala
##########
@@ -396,6 +398,88 @@ object CarbonIndexUtil {
}
}

+ def updateIndexStatusInBatch(carbonTable: CarbonTable,
+ indexTables: List[CarbonTable],
+ indexType: IndexType,
+ status: IndexStatus,
+ sparkSession: SparkSession): Unit = {
+ val dbName = carbonTable.getDatabaseName
+ val tableName = carbonTable.getTableName
+ val metadataLock = CarbonLockFactory.getCarbonLockObj(carbonTable.getAbsoluteTableIdentifier,
+ LockUsage.METADATA_LOCK)
+ try {
+ if (metadataLock.lockWithRetries()) {
+ CarbonMetadata.getInstance.removeTable(dbName, tableName)
+ val table = CarbonEnv.getCarbonTable(Some(dbName), tableName)(sparkSession)
+ val indexTableInfos = IndexTableInfo.fromGson(table.getIndexInfo)
+ val indexMetadata = table.getIndexMetadata
+ indexTables.foreach { indexTable =>
+ IndexTableInfo.setIndexStatus(indexTableInfos, indexTable.getTableName, status)
+ indexMetadata.updateIndexStatus(indexType.getIndexProviderName,
+ indexTable.getTableName,
+ status.name())
+ }
+ table.getTableInfo
+ .getFactTable
+ .getTableProperties
+ .put(table.getCarbonTableIdentifier.getTableId, indexMetadata.serialize)
+ sparkSession.sql(s"""ALTER TABLE $dbName.$tableName SET SERDEPROPERTIES ('indexInfo' = '${
+ IndexTableInfo.toGson(indexTableInfos)
+ }')""".stripMargin).collect()
+ CarbonHiveIndexMetadataUtil.refreshTable(dbName, tableName, sparkSession)
+ CarbonMetadata.getInstance.removeTable(dbName, tableName)
+ CarbonMetadata.getInstance.loadTableMetadata(table.getTableInfo)
+ }
+ } finally {
+ metadataLock.unlock()
+ }
+ }
+
+ def updateIndexStatus(carbonTable: CarbonTable,

Review comment:
ok. we can track this
https://issues.apache.org/jira/browse/CARBONDATA-4169

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]

GitBox

[GitHub] [carbondata] ajantha-bhat commented on pull request #4110: [CARBONDATA-4158]Add Secondary Index as a coarse-grain index and use secondary indexes for Presto queries

In reply to this post by GitBox

ajantha-bhat commented on pull request #4110:
URL: https://github.com/apache/carbondata/pull/4110#issuecomment-824115250

LGTM,
we should work on CARBONDATA-4169 to support querying already existing tables also.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]

GitBox

[GitHub] [carbondata] maheshrajus commented on pull request #4110: [CARBONDATA-4158]Add Secondary Index as a coarse-grain index and use secondary indexes for Presto queries

In reply to this post by GitBox

maheshrajus commented on pull request #4110:
URL: https://github.com/apache/carbondata/pull/4110#issuecomment-824230350

> @VenuReddy2103 , @maheshrajus : when SI is not a datamap flow, we support matching to more than one SI tables also for filter queries. Is it supported in SI as datamap flow ?

@ajantha-bhat the behaviour is the same in both cases[SI as datamap and SI as non-datamap(plan-rewrite)]
I checked the below cases.
1) SI created in one column and query with a combination of Si index and non-si index., Then we are getting output based on filters[AND, OR etc] correctly.
2) SI created in two different columns and query with the combination of both SI index columns then we are getting output based on filters[AND, OR etc] correctly.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]

GitBox

[GitHub] [carbondata] kunal642 commented on pull request #4110: [CARBONDATA-4158]Add Secondary Index as a coarse-grain index and use secondary indexes for Presto queries

In reply to this post by GitBox

kunal642 commented on pull request #4110:
URL: https://github.com/apache/carbondata/pull/4110#issuecomment-824560735

LGTM,
@maheshrajus can you please start working on CARBONDATA-4169

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]

GitBox

[GitHub] [carbondata] asfgit closed pull request #4110: [CARBONDATA-4158]Add Secondary Index as a coarse-grain index and use secondary indexes for Presto queries

In reply to this post by GitBox

asfgit closed pull request #4110:
URL: https://github.com/apache/carbondata/pull/4110

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]

12345