[GitHub] [carbondata] VenuReddy2103 opened a new pull request #4110: [WIP]Secondary Index as a coarse grain datamap

classic Classic list List threaded Threaded
97 messages Options
12345
Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4110: [CARBONDATA-4158]Secondary Index as a coarse-grain datamap and use them for Presto queries

GitBox

CarbonDataQA2 commented on pull request #4110:
URL: https://github.com/apache/carbondata/pull/4110#issuecomment-821145251


   Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12602/job/ApacheCarbonPRBuilder2.3/5201/
   


--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4110: [CARBONDATA-4158]Secondary Index as a coarse-grain datamap and use them for Presto queries

GitBox
In reply to this post by GitBox

CarbonDataQA2 commented on pull request #4110:
URL: https://github.com/apache/carbondata/pull/4110#issuecomment-821273953


   Build Failed  with Spark 2.3.4, Please check CI http://121.244.95.60:12602/job/ApacheCarbonPRBuilder2.3/5202/
   


--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4110: [CARBONDATA-4158]Secondary Index as a coarse-grain datamap and use them for Presto queries

GitBox
In reply to this post by GitBox

CarbonDataQA2 commented on pull request #4110:
URL: https://github.com/apache/carbondata/pull/4110#issuecomment-821283487


   Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12602/job/ApacheCarbon_PR_Builder_2.4.5/3450/
   


--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] brijoobopanna commented on pull request #4110: [CARBONDATA-4158]Secondary Index as a coarse-grain datamap and use them for Presto queries

GitBox
In reply to this post by GitBox

brijoobopanna commented on pull request #4110:
URL: https://github.com/apache/carbondata/pull/4110#issuecomment-823794397


   retest this please


--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4110: [CARBONDATA-4158]Secondary Index as a coarse-grain datamap and use them for Presto queries

GitBox
In reply to this post by GitBox

CarbonDataQA2 commented on pull request #4110:
URL: https://github.com/apache/carbondata/pull/4110#issuecomment-823839154


   Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12602/job/ApacheCarbonPRBuilder2.3/5220/
   


--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4110: [CARBONDATA-4158]Secondary Index as a coarse-grain datamap and use them for Presto queries

GitBox
In reply to this post by GitBox

CarbonDataQA2 commented on pull request #4110:
URL: https://github.com/apache/carbondata/pull/4110#issuecomment-823839382


   Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12602/job/ApacheCarbon_PR_Builder_2.4.5/3472/
   


--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] VenuReddy2103 commented on a change in pull request #4110: [CARBONDATA-4158]Secondary Index as a coarse-grain datamap and use them for Presto queries

GitBox
In reply to this post by GitBox

VenuReddy2103 commented on a change in pull request #4110:
URL: https://github.com/apache/carbondata/pull/4110#discussion_r617292539



##########
File path: core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java
##########
@@ -2316,6 +2316,12 @@ private CarbonCommonConstants() {
   @CarbonProperty(dynamicConfigurable = true)
   public static final String CARBON_ENABLE_INDEX_SERVER = "carbon.enable.index.server";
 
+  @CarbonProperty(dynamicConfigurable = true)
+  public static final String CARBON_REWRITE_PLAN_WITH_SECONDARY_INDEX =
+      "carbon.rewrite.plan.with.secondary.index";
+
+  public static final String CARBON_REWRITE_PLAN_WITH_SECONDARY_INDEX_DEFAULT = "true";

Review comment:
       Yes cluster testing is done with both presto, spark with index server combination. Also, there is no performance impact observed with spark + index server when property is false




--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] VenuReddy2103 commented on a change in pull request #4110: [CARBONDATA-4158]Secondary Index as a coarse-grain datamap and use them for Presto queries

GitBox
In reply to this post by GitBox

VenuReddy2103 commented on a change in pull request #4110:
URL: https://github.com/apache/carbondata/pull/4110#discussion_r617293398



##########
File path: integration/presto/src/main/prestodb/org/apache/carbondata/presto/impl/CarbonTableReader.java
##########
@@ -270,6 +270,7 @@ private CarbonTableCacheModel getValidCacheBySchemaTableName(SchemaTableName sch
     config.set("query.id", queryId);
     CarbonInputFormat.setTransactionalTable(config, carbonTable.isTransactionalTable());
     CarbonInputFormat.setTableInfo(config, carbonTable.getTableInfo());
+    CarbonInputFormat.checkAndSetSecondaryIndexPruning(carbonTable.getTableInfo(), filters, config);

Review comment:
       done




--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] VenuReddy2103 commented on a change in pull request #4110: [CARBONDATA-4158]Secondary Index as a coarse-grain datamap and use them for Presto queries

GitBox
In reply to this post by GitBox

VenuReddy2103 commented on a change in pull request #4110:
URL: https://github.com/apache/carbondata/pull/4110#discussion_r617318018



##########
File path: integration/spark/src/main/scala/org/apache/spark/sql/index/CarbonIndexUtil.scala
##########
@@ -396,6 +398,88 @@ object CarbonIndexUtil {
     }
   }
 
+  def updateIndexStatusInBatch(carbonTable: CarbonTable,
+      indexTables: List[CarbonTable],
+      indexType: IndexType,
+      status: IndexStatus,
+      sparkSession: SparkSession): Unit = {
+    val dbName = carbonTable.getDatabaseName
+    val tableName = carbonTable.getTableName
+    val metadataLock = CarbonLockFactory.getCarbonLockObj(carbonTable.getAbsoluteTableIdentifier,
+      LockUsage.METADATA_LOCK)
+    try {
+      if (metadataLock.lockWithRetries()) {
+        CarbonMetadata.getInstance.removeTable(dbName, tableName)
+        val table = CarbonEnv.getCarbonTable(Some(dbName), tableName)(sparkSession)
+        val indexTableInfos = IndexTableInfo.fromGson(table.getIndexInfo)
+        val indexMetadata = table.getIndexMetadata
+        indexTables.foreach { indexTable =>
+          IndexTableInfo.setIndexStatus(indexTableInfos, indexTable.getTableName, status)
+          indexMetadata.updateIndexStatus(indexType.getIndexProviderName,
+            indexTable.getTableName,
+            status.name())
+        }
+        table.getTableInfo
+          .getFactTable
+          .getTableProperties
+          .put(table.getCarbonTableIdentifier.getTableId, indexMetadata.serialize)
+        sparkSession.sql(s"""ALTER TABLE $dbName.$tableName SET SERDEPROPERTIES ('indexInfo' = '${
+          IndexTableInfo.toGson(indexTableInfos)
+        }')""".stripMargin).collect()
+        CarbonHiveIndexMetadataUtil.refreshTable(dbName, tableName, sparkSession)
+        CarbonMetadata.getInstance.removeTable(dbName, tableName)
+        CarbonMetadata.getInstance.loadTableMetadata(table.getTableInfo)
+      }
+    } finally {
+      metadataLock.unlock()
+    }
+  }
+
+  def updateIndexStatus(carbonTable: CarbonTable,

Review comment:
       Yes. As discussed offline, have created a subtask under same jira.




--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] VenuReddy2103 commented on a change in pull request #4110: [CARBONDATA-4158]Secondary Index as a coarse-grain datamap and use them for Presto queries

GitBox
In reply to this post by GitBox

VenuReddy2103 commented on a change in pull request #4110:
URL: https://github.com/apache/carbondata/pull/4110#discussion_r617319168



##########
File path: integration/spark/src/main/scala/org/apache/spark/sql/index/CarbonIndexUtil.scala
##########
@@ -396,6 +398,88 @@ object CarbonIndexUtil {
     }
   }
 
+  def updateIndexStatusInBatch(carbonTable: CarbonTable,
+      indexTables: List[CarbonTable],
+      indexType: IndexType,
+      status: IndexStatus,
+      sparkSession: SparkSession): Unit = {
+    val dbName = carbonTable.getDatabaseName
+    val tableName = carbonTable.getTableName
+    val metadataLock = CarbonLockFactory.getCarbonLockObj(carbonTable.getAbsoluteTableIdentifier,
+      LockUsage.METADATA_LOCK)
+    try {
+      if (metadataLock.lockWithRetries()) {
+        CarbonMetadata.getInstance.removeTable(dbName, tableName)
+        val table = CarbonEnv.getCarbonTable(Some(dbName), tableName)(sparkSession)
+        val indexTableInfos = IndexTableInfo.fromGson(table.getIndexInfo)
+        val indexMetadata = table.getIndexMetadata
+        indexTables.foreach { indexTable =>
+          IndexTableInfo.setIndexStatus(indexTableInfos, indexTable.getTableName, status)
+          indexMetadata.updateIndexStatus(indexType.getIndexProviderName,
+            indexTable.getTableName,
+            status.name())
+        }
+        table.getTableInfo
+          .getFactTable
+          .getTableProperties
+          .put(table.getCarbonTableIdentifier.getTableId, indexMetadata.serialize)
+        sparkSession.sql(s"""ALTER TABLE $dbName.$tableName SET SERDEPROPERTIES ('indexInfo' = '${
+          IndexTableInfo.toGson(indexTableInfos)
+        }')""".stripMargin).collect()
+        CarbonHiveIndexMetadataUtil.refreshTable(dbName, tableName, sparkSession)
+        CarbonMetadata.getInstance.removeTable(dbName, tableName)
+        CarbonMetadata.getInstance.loadTableMetadata(table.getTableInfo)
+      }
+    } finally {
+      metadataLock.unlock()
+    }
+  }
+
+  def updateIndexStatus(carbonTable: CarbonTable,
+      indexName: String,
+      indexType: IndexType,
+      status: IndexStatus,
+      needLock: Boolean = true,
+      sparkSession: SparkSession): Unit = {
+    val dbName = carbonTable.getDatabaseName

Review comment:
       replied above




--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] maheshrajus commented on pull request #4110: [CARBONDATA-4158]Add Secondary Index as a coarse-grain index and use secondary indexes for Presto queries

GitBox
In reply to this post by GitBox

maheshrajus commented on pull request #4110:
URL: https://github.com/apache/carbondata/pull/4110#issuecomment-824087364


   I executed secondary index flows in the presto cluster with this code. No issue found and flows are working fine as per design and discussion.


--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] ajantha-bhat commented on pull request #4110: [CARBONDATA-4158]Add Secondary Index as a coarse-grain index and use secondary indexes for Presto queries

GitBox
In reply to this post by GitBox

ajantha-bhat commented on pull request #4110:
URL: https://github.com/apache/carbondata/pull/4110#issuecomment-824096942


   @VenuReddy2103 , @maheshrajus : when SI is not a datamap flow, we support matching to more than one  SI tables also for filter queries. Is it supported in SI as datamap flow ?


--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #4110: [CARBONDATA-4158]Add Secondary Index as a coarse-grain index and use secondary indexes for Presto queries

GitBox
In reply to this post by GitBox

ajantha-bhat commented on a change in pull request #4110:
URL: https://github.com/apache/carbondata/pull/4110#discussion_r617590710



##########
File path: integration/spark/src/main/scala/org/apache/spark/sql/index/CarbonIndexUtil.scala
##########
@@ -396,6 +398,88 @@ object CarbonIndexUtil {
     }
   }
 
+  def updateIndexStatusInBatch(carbonTable: CarbonTable,
+      indexTables: List[CarbonTable],
+      indexType: IndexType,
+      status: IndexStatus,
+      sparkSession: SparkSession): Unit = {
+    val dbName = carbonTable.getDatabaseName
+    val tableName = carbonTable.getTableName
+    val metadataLock = CarbonLockFactory.getCarbonLockObj(carbonTable.getAbsoluteTableIdentifier,
+      LockUsage.METADATA_LOCK)
+    try {
+      if (metadataLock.lockWithRetries()) {
+        CarbonMetadata.getInstance.removeTable(dbName, tableName)
+        val table = CarbonEnv.getCarbonTable(Some(dbName), tableName)(sparkSession)
+        val indexTableInfos = IndexTableInfo.fromGson(table.getIndexInfo)
+        val indexMetadata = table.getIndexMetadata
+        indexTables.foreach { indexTable =>
+          IndexTableInfo.setIndexStatus(indexTableInfos, indexTable.getTableName, status)
+          indexMetadata.updateIndexStatus(indexType.getIndexProviderName,
+            indexTable.getTableName,
+            status.name())
+        }
+        table.getTableInfo
+          .getFactTable
+          .getTableProperties
+          .put(table.getCarbonTableIdentifier.getTableId, indexMetadata.serialize)
+        sparkSession.sql(s"""ALTER TABLE $dbName.$tableName SET SERDEPROPERTIES ('indexInfo' = '${
+          IndexTableInfo.toGson(indexTableInfos)
+        }')""".stripMargin).collect()
+        CarbonHiveIndexMetadataUtil.refreshTable(dbName, tableName, sparkSession)
+        CarbonMetadata.getInstance.removeTable(dbName, tableName)
+        CarbonMetadata.getInstance.loadTableMetadata(table.getTableInfo)
+      }
+    } finally {
+      metadataLock.unlock()
+    }
+  }
+
+  def updateIndexStatus(carbonTable: CarbonTable,

Review comment:
       ok. we can track this
   https://issues.apache.org/jira/browse/CARBONDATA-4169




--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] ajantha-bhat commented on pull request #4110: [CARBONDATA-4158]Add Secondary Index as a coarse-grain index and use secondary indexes for Presto queries

GitBox
In reply to this post by GitBox

ajantha-bhat commented on pull request #4110:
URL: https://github.com/apache/carbondata/pull/4110#issuecomment-824115250


   LGTM,
   we should work on CARBONDATA-4169 to support querying already existing tables also.
   


--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] maheshrajus commented on pull request #4110: [CARBONDATA-4158]Add Secondary Index as a coarse-grain index and use secondary indexes for Presto queries

GitBox
In reply to this post by GitBox

maheshrajus commented on pull request #4110:
URL: https://github.com/apache/carbondata/pull/4110#issuecomment-824230350


   > @VenuReddy2103 , @maheshrajus : when SI is not a datamap flow, we support matching to more than one SI tables also for filter queries. Is it supported in SI as datamap flow ?
   
   @ajantha-bhat the behaviour is the same in both cases[SI as datamap and SI as non-datamap(plan-rewrite)]
   I checked the below cases.
   1) SI created in one column and query with a combination of Si index and non-si index., Then we are getting output based on filters[AND, OR etc] correctly.
   2) SI created in two different columns and query with the combination of both SI index columns then we are getting output based on filters[AND, OR etc] correctly.
   
   


--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] kunal642 commented on pull request #4110: [CARBONDATA-4158]Add Secondary Index as a coarse-grain index and use secondary indexes for Presto queries

GitBox
In reply to this post by GitBox

kunal642 commented on pull request #4110:
URL: https://github.com/apache/carbondata/pull/4110#issuecomment-824560735


   LGTM,
   @maheshrajus can you please start working on CARBONDATA-4169


--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] asfgit closed pull request #4110: [CARBONDATA-4158]Add Secondary Index as a coarse-grain index and use secondary indexes for Presto queries

GitBox
In reply to this post by GitBox

asfgit closed pull request #4110:
URL: https://github.com/apache/carbondata/pull/4110


   


--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


12345