Apache CarbonData Dev Mailing List archive › Apache CarbonData JIRA issues

[GitHub] [carbondata] nihal0107 opened a new pull request #4071: [CARBONDATA-4102] Added UT and FT to improve coverage of SI module.

Classic

List

103 messages Options

Options

123456

GitBox

[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4071: [CARBONDATA-4102] Added UT and FT to improve coverage of SI module.

CarbonDataQA2 commented on pull request #4071:
URL: https://github.com/apache/carbondata/pull/4071#issuecomment-766720456

Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12444/job/ApacheCarbonPRBuilder2.3/5351/

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]

GitBox

[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4071: [CARBONDATA-4102] Added UT and FT to improve coverage of SI module.

In reply to this post by GitBox

CarbonDataQA2 commented on pull request #4071:
URL: https://github.com/apache/carbondata/pull/4071#issuecomment-766721723

Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12444/job/ApacheCarbon_PR_Builder_2.4.5/3591/

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]

GitBox

[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4071: [CARBONDATA-4102] Added UT and FT to improve coverage of SI module.

In reply to this post by GitBox

CarbonDataQA2 commented on pull request #4071:
URL: https://github.com/apache/carbondata/pull/4071#issuecomment-766553712

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]

GitBox

[GitHub] [carbondata] nihal0107 commented on a change in pull request #4071: [CARBONDATA-4102] Added UT and FT to improve coverage of SI module.

In reply to this post by GitBox

nihal0107 commented on a change in pull request #4071:
URL: https://github.com/apache/carbondata/pull/4071#discussion_r563482082

##########
File path: index/secondary-index/src/test/scala/org/apache/carbondata/spark/testsuite/mergedata/CarbonDataFileMergeTestCaseOnSI.scala
##########
@@ -288,9 +268,7 @@ class CarbonDataFileMergeTestCaseOnSI
CarbonCommonConstants.CARBON_SI_SEGMENT_MERGE_DEFAULT)
}

- test("test verify data file merge when exception occurred in rebuild segment") {
- CarbonProperties.getInstance()
- .addProperty(CarbonCommonConstants.CARBON_SI_SEGMENT_MERGE, "false")
+ def createTableAndLoadData(): Unit = {

Review comment:
done

##########
File path: index/secondary-index/src/test/scala/org/apache/carbondata/spark/testsuite/secondaryindex/DropTableTest.scala
##########
@@ -88,4 +90,17 @@ class DropTableTest extends QueryTest with BeforeAndAfterAll {
assert(sql("show indexes on testDrop").collect().isEmpty)
sql("drop table if exists testDrop")
}
+
+ test("test index drop when SI table is not deleted while main table is deleted") {
+ sql("drop table if exists testDrop")
+ sql("create table testDrop (a string, b string, c string) STORED AS carbondata")
+ sql("create index index11 on table testDrop (c) AS 'carbondata'")
+ sql("insert into testDrop values('ab', 'cd', 'ef')")
+ val mock = TestSecondaryIndexUtils.mockSIDrop()
+ sql("drop table if exists testDrop")
+ mock.tearDown()
+ assert(Files.exists(Paths.get(warehouse + "/" + "index11")))
+ sql("drop table if exists testDrop")
+ sql("drop database if exists test cascade")

Review comment:
removed

##########
File path: index/secondary-index/src/test/scala/org/apache/carbondata/spark/testsuite/secondaryindex/TestCarbonInternalMetastore.scala
##########
@@ -0,0 +1,137 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.spark.testsuite.secondaryindex
+
+import org.apache.spark.sql.CarbonEnv
+import org.apache.spark.sql.catalyst.TableIdentifier
+import org.apache.spark.sql.index.CarbonIndexUtil
+import org.apache.spark.sql.secondaryindex.hive.CarbonInternalMetastore
+import org.apache.spark.sql.test.util.QueryTest
+import org.scalatest.{BeforeAndAfterAll, BeforeAndAfterEach}
+
+import org.apache.carbondata.core.metadata.schema.table.CarbonTable
+
+class TestCarbonInternalMetastore extends QueryTest with BeforeAndAfterAll with BeforeAndAfterEach {
+
+ val dbName: String = "test"
+ val tableName: String = "table1"
+ val indexName: String = "index1"
+ var tableIdentifier: TableIdentifier = _
+ var parentCarbonTable: CarbonTable = _
+ var indexTable: CarbonTable = _
+
+ override def beforeAll(): Unit = {
+ sql("drop database if exists test cascade");
+ }
+
+ override def beforeEach(): Unit = {
+ sql("drop database if exists test cascade");
+ sql("create database test")
+ sql("use test")
+ sql("drop table if exists table1")
+ sql("create table table1(a string, b string, c string) stored as carbondata")
+ sql("create index index1 on table1(b) as 'carbondata'")
+ sql("insert into table1 values('ab','bc','cd')")
+ }
+
+ def setVariables(indexName: String): Unit = {
+ tableIdentifier = new TableIdentifier(indexName, Some("test"))
+ parentCarbonTable = CarbonEnv.getCarbonTable(Some("test"), "table1")(sqlContext.sparkSession)
+ indexTable = CarbonEnv.getCarbonTable(Some("test"), "index1")(sqlContext.sparkSession)
+ }
+
+ test("test delete index silent") {
+ setVariables("index1")
+ CarbonInternalMetastore.deleteIndexSilent(tableIdentifier, "",
+ parentCarbonTable)(sqlContext.sparkSession)
+ checkExistence(sql("show indexes on table1"), false, "index1")
+ }
+
+ test("test delete index table silently when exception occur") {
+ setVariables("unknown")
+ CarbonInternalMetastore.deleteIndexSilent(tableIdentifier, "",
+ parentCarbonTable)(sqlContext.sparkSession)
+ checkExistence(sql("show indexes on table1"), true, "index1")
+ setVariables("index1")
+ CarbonInternalMetastore.deleteIndexSilent(tableIdentifier, "",
+ indexTable)(sqlContext.sparkSession)

Review comment:
this is just to check when we are passing index table as parent table then delete will fail. I have added comments and also included true condition.

##########
File path: index/secondary-index/src/test/scala/org/apache/carbondata/spark/testsuite/secondaryindex/TestCarbonInternalMetastore.scala
##########
@@ -0,0 +1,137 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.spark.testsuite.secondaryindex
+
+import org.apache.spark.sql.CarbonEnv
+import org.apache.spark.sql.catalyst.TableIdentifier
+import org.apache.spark.sql.index.CarbonIndexUtil
+import org.apache.spark.sql.secondaryindex.hive.CarbonInternalMetastore
+import org.apache.spark.sql.test.util.QueryTest
+import org.scalatest.{BeforeAndAfterAll, BeforeAndAfterEach}
+
+import org.apache.carbondata.core.metadata.schema.table.CarbonTable
+
+class TestCarbonInternalMetastore extends QueryTest with BeforeAndAfterAll with BeforeAndAfterEach {
+
+ val dbName: String = "test"
+ val tableName: String = "table1"
+ val indexName: String = "index1"
+ var tableIdentifier: TableIdentifier = _
+ var parentCarbonTable: CarbonTable = _
+ var indexTable: CarbonTable = _
+
+ override def beforeAll(): Unit = {
+ sql("drop database if exists test cascade");
+ }
+
+ override def beforeEach(): Unit = {
+ sql("drop database if exists test cascade");
+ sql("create database test")
+ sql("use test")
+ sql("drop table if exists table1")
+ sql("create table table1(a string, b string, c string) stored as carbondata")
+ sql("create index index1 on table1(b) as 'carbondata'")
+ sql("insert into table1 values('ab','bc','cd')")
+ }
+
+ def setVariables(indexName: String): Unit = {
+ tableIdentifier = new TableIdentifier(indexName, Some("test"))
+ parentCarbonTable = CarbonEnv.getCarbonTable(Some("test"), "table1")(sqlContext.sparkSession)
+ indexTable = CarbonEnv.getCarbonTable(Some("test"), "index1")(sqlContext.sparkSession)
+ }
+
+ test("test delete index silent") {
+ setVariables("index1")
+ CarbonInternalMetastore.deleteIndexSilent(tableIdentifier, "",
+ parentCarbonTable)(sqlContext.sparkSession)
+ checkExistence(sql("show indexes on table1"), false, "index1")
+ }
+
+ test("test delete index table silently when exception occur") {
+ setVariables("unknown")
+ CarbonInternalMetastore.deleteIndexSilent(tableIdentifier, "",
+ parentCarbonTable)(sqlContext.sparkSession)
+ checkExistence(sql("show indexes on table1"), true, "index1")
+ setVariables("index1")
+ CarbonInternalMetastore.deleteIndexSilent(tableIdentifier, "",
+ indexTable)(sqlContext.sparkSession)
+ checkExistence(sql("show indexes on table1"), true, "index1")
+ }
+
+ test("test show index when SI were created before the change CARBONDATA-3765") {
+ val mock = TestSecondaryIndexUtils.mockIndexInfo()
+ checkExistence(sql("show indexes on table1"), true, "index1")
+ mock.tearDown()
+ }
+
+ test("test refresh index with different value of isIndexTableExists") {
+ setVariables("index1")
+ sql("create index index2 on table1(b) as 'bloomfilter'")
+ parentCarbonTable = CarbonEnv.getCarbonTable(Some("test"), "table1")(sqlContext.sparkSession)
+ assert(CarbonIndexUtil.isIndexExists(parentCarbonTable).equalsIgnoreCase("true"))
+ CarbonIndexUtil.addOrModifyTableProperty(parentCarbonTable, Map("indexExists" -> "false"))(

Review comment:
yeah, made the changes.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]

GitBox

[GitHub] [carbondata] Indhumathi27 commented on pull request #4071: [CARBONDATA-4102] Added UT and FT to improve coverage of SI module.

In reply to this post by GitBox

Indhumathi27 commented on pull request #4071:
URL: https://github.com/apache/carbondata/pull/4071#issuecomment-768178940

retest this please

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]

GitBox

[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4071: [CARBONDATA-4102] Added UT and FT to improve coverage of SI module.

In reply to this post by GitBox

CarbonDataQA2 commented on pull request #4071:
URL: https://github.com/apache/carbondata/pull/4071#issuecomment-768231136

Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12444/job/ApacheCarbon_PR_Builder_2.4.5/3602/

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]

GitBox

[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4071: [CARBONDATA-4102] Added UT and FT to improve coverage of SI module.

In reply to this post by GitBox

CarbonDataQA2 commented on pull request #4071:
URL: https://github.com/apache/carbondata/pull/4071#issuecomment-768231235

Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12444/job/ApacheCarbonPRBuilder2.3/5362/

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]

GitBox

[GitHub] [carbondata] Indhumathi27 commented on pull request #4071: [CARBONDATA-4102] Added UT and FT to improve coverage of SI module.

In reply to this post by GitBox

Indhumathi27 commented on pull request #4071:
URL: https://github.com/apache/carbondata/pull/4071#issuecomment-768810377

LGTM

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]

GitBox

[GitHub] [carbondata] VenuReddy2103 commented on a change in pull request #4071: [CARBONDATA-4102] Added UT and FT to improve coverage of SI module.

In reply to this post by GitBox

VenuReddy2103 commented on a change in pull request #4071:
URL: https://github.com/apache/carbondata/pull/4071#discussion_r566061138

##########
File path: integration/spark/src/test/scala/org/apache/carbondata/spark/testsuite/allqueries/TestQueryWithColumnMetCacheAndCacheLevelProperty.scala
##########
@@ -363,5 +365,20 @@ class TestQueryWithColumnMetCacheAndCacheLevelProperty
sql("CREATE INDEX parallel_index on parallel_index_load(b) AS 'carbondata'")
checkAnswer(sql("select b from parallel_index"), Seq(Row("bb"), Row("dd"), Row("ff")))
sql("drop index if exists parallel_index on parallel_index_load")
+ val mock = mockIsSchemaModified()
+ sql("CREATE INDEX parallel_index on parallel_index_load(b) AS 'carbondata'")
+ checkAnswer(sql("select b from parallel_index"), Seq(Row("bb"), Row("dd"), Row("ff")))
+ sql("drop index if exists parallel_index on parallel_index_load")
+ mock.tearDown()

Review comment:
Don't have call tearDown() explicitly on mockup object in this case. Probably can check all calls to mockup object's tearDown() added in this PR and remove the ones which are not required.
jmockit teardown() description -
> Discards the mock methods originally set up by instantiating this mock-up object, restoring mocked methods to their original behaviors.
JMockit will automatically restore classes mocked by a test at the end of its execution, as well as classes mocked for the whole test class before the first test in the next test class is executed. Therefore, this method should rarely be used, if ever.

http://javadox.com/org.jmockit/jmockit/1.9/mockit/MockUp.html#tearDown()

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]

GitBox

[GitHub] [carbondata] VenuReddy2103 commented on a change in pull request #4071: [CARBONDATA-4102] Added UT and FT to improve coverage of SI module.

In reply to this post by GitBox

VenuReddy2103 commented on a change in pull request #4071:
URL: https://github.com/apache/carbondata/pull/4071#discussion_r566094958

##########
File path: integration/spark/src/test/scala/org/apache/carbondata/spark/testsuite/allqueries/TestQueryWithColumnMetCacheAndCacheLevelProperty.scala
##########
@@ -363,5 +365,20 @@ class TestQueryWithColumnMetCacheAndCacheLevelProperty
sql("CREATE INDEX parallel_index on parallel_index_load(b) AS 'carbondata'")
checkAnswer(sql("select b from parallel_index"), Seq(Row("bb"), Row("dd"), Row("ff")))
sql("drop index if exists parallel_index on parallel_index_load")
+ val mock = mockIsSchemaModified()
+ sql("CREATE INDEX parallel_index on parallel_index_load(b) AS 'carbondata'")
+ checkAnswer(sql("select b from parallel_index"), Seq(Row("bb"), Row("dd"), Row("ff")))
+ sql("drop index if exists parallel_index on parallel_index_load")
+ mock.tearDown()
+ }
+
+ def mockIsSchemaModified(): MockUp[TableInfo] = {

Review comment:
This tableInfo mockup instance for mocking isSchemaModified method to return true is specific to a particular testcase. No need to define a separate method. How about directly setup in testcase(line 368 can have like below) ?
new MockUp[TableInfo] {
@Mock
def isSchemaModified(): Boolean = {
true
}
}

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]

GitBox

[GitHub] [carbondata] VenuReddy2103 commented on a change in pull request #4071: [CARBONDATA-4102] Added UT and FT to improve coverage of SI module.

In reply to this post by GitBox

VenuReddy2103 commented on a change in pull request #4071:
URL: https://github.com/apache/carbondata/pull/4071#discussion_r566094958

##########
File path: integration/spark/src/test/scala/org/apache/carbondata/spark/testsuite/allqueries/TestQueryWithColumnMetCacheAndCacheLevelProperty.scala
##########
@@ -363,5 +365,20 @@ class TestQueryWithColumnMetCacheAndCacheLevelProperty
sql("CREATE INDEX parallel_index on parallel_index_load(b) AS 'carbondata'")
checkAnswer(sql("select b from parallel_index"), Seq(Row("bb"), Row("dd"), Row("ff")))
sql("drop index if exists parallel_index on parallel_index_load")
+ val mock = mockIsSchemaModified()
+ sql("CREATE INDEX parallel_index on parallel_index_load(b) AS 'carbondata'")
+ checkAnswer(sql("select b from parallel_index"), Seq(Row("bb"), Row("dd"), Row("ff")))
+ sql("drop index if exists parallel_index on parallel_index_load")
+ mock.tearDown()
+ }
+
+ def mockIsSchemaModified(): MockUp[TableInfo] = {

Review comment:
This tableInfo mockup instance for mocking isSchemaModified method to return true is specific to a particular testcase. No need to define a separate method. How about directly setup in testcase(line 368 can have like below) ? I think, we can do same at other places also.
new MockUp[TableInfo] {
@Mock
def isSchemaModified(): Boolean = {
true
}
}

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]

GitBox

[GitHub] [carbondata] VenuReddy2103 commented on a change in pull request #4071: [CARBONDATA-4102] Added UT and FT to improve coverage of SI module.

In reply to this post by GitBox

VenuReddy2103 commented on a change in pull request #4071:
URL: https://github.com/apache/carbondata/pull/4071#discussion_r566590213

##########
File path: index/secondary-index/src/test/scala/org/apache/carbondata/spark/testsuite/mergedata/CarbonDataFileMergeTestCaseOnSI.scala
##########
@@ -288,20 +216,24 @@ class CarbonDataFileMergeTestCaseOnSI
CarbonCommonConstants.CARBON_SI_SEGMENT_MERGE_DEFAULT)
}

- test("test verify data file merge when exception occurred in rebuild segment") {
- CarbonProperties.getInstance()
- .addProperty(CarbonCommonConstants.CARBON_SI_SEGMENT_MERGE, "false")
+ def createTableAndLoadData(globalSortPartition: String, loadTimes: Int): Unit = {
sql("DROP TABLE IF EXISTS nonindexmerge")
sql(
"""
| CREATE TABLE nonindexmerge(id INT, name STRING, city STRING, age INT)
| STORED AS carbondata
| TBLPROPERTIES('SORT_COLUMNS'='city,name', 'SORT_SCOPE'='GLOBAL_SORT')
""".stripMargin)
- sql(s"LOAD DATA LOCAL INPATH '$file2' INTO TABLE nonindexmerge OPTIONS('header'='false', " +
- s"'GLOBAL_SORT_PARTITIONS'='100')")
- sql(s"LOAD DATA LOCAL INPATH '$file2' INTO TABLE nonindexmerge OPTIONS('header'='false', " +
- s"'GLOBAL_SORT_PARTITIONS'='100')")
+ for (_ <- 0 to loadTimes) {

Review comment:
loop runs for 0 to loadTimes(included) i.e., loadTimes + 1 times. You would want to use `until loadTimes` instead of `to loadTimes`.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]

GitBox

[GitHub] [carbondata] VenuReddy2103 commented on a change in pull request #4071: [CARBONDATA-4102] Added UT and FT to improve coverage of SI module.

In reply to this post by GitBox

VenuReddy2103 commented on a change in pull request #4071:
URL: https://github.com/apache/carbondata/pull/4071#discussion_r566602925

##########
File path: index/secondary-index/src/test/scala/org/apache/carbondata/spark/testsuite/mergedata/CarbonDataFileMergeTestCaseOnSI.scala
##########
@@ -327,6 +259,35 @@ class CarbonDataFileMergeTestCaseOnSI
.queryExecution.sparkPlan
assert(getDataFileCount("nonindexmerge_index1", "0") == 100)
assert(getDataFileCount("nonindexmerge_index1", "1") == 100)
+
+ // exception is thrown by compaction executor
+ val mock3 = TestSecondaryIndexUtils.mockCompactionExecutor()
+ val exception2 = intercept[Exception] {
+ sql("REFRESH INDEX nonindexmerge_index1 ON TABLE nonindexmerge").collect()
+ }
+ mock3.tearDown()
+ assert(exception2.getMessage.contains("Merge data files Failure in Merger Rdd."))
+ df1 = sql("""Select * from nonindexmerge where name='n16000'""")
+ .queryExecution.sparkPlan
+ assert(getDataFileCount("nonindexmerge_index1", "0") == 100)
+ assert(getDataFileCount("nonindexmerge_index1", "1") == 100)
+ CarbonProperties.getInstance().addProperty(CarbonCommonConstants.CARBON_SI_SEGMENT_MERGE,
+ CarbonCommonConstants.CARBON_SI_SEGMENT_MERGE_DEFAULT)
+ }
+
+ test("test refresh index command when block need to be sorted") {
+ CarbonProperties.getInstance()
+ .addProperty(CarbonCommonConstants.CARBON_SI_SEGMENT_MERGE, "false")
+ createTableAndLoadData("100", 2)
+ sql("CREATE INDEX nonindexmerge_index1 on table nonindexmerge (name) AS 'carbondata'")
+ val mock = TestSecondaryIndexUtils.mockIsSortRequired();
+ sql("REFRESH INDEX nonindexmerge_index1 ON TABLE nonindexmerge").collect()
+ mock.tearDown()
+ val df1 = sql("""Select * from nonindexmerge where name='n16000'""")
+ .queryExecution.sparkPlan
+ assert(isFilterPushedDownToSI(df1))
+ assert(getDataFileCount("nonindexmerge_index1", "0") < 15)
+ assert(getDataFileCount("nonindexmerge_index1", "1") < 15)

Review comment:
CarbonCommonConstants.CARBON_SI_SEGMENT_MERGE property is set to "false" at beginning and end of testcase. Note that CarbonCommonConstants.CARBON_SI_SEGMENT_MERGE_DEFAULT value is also "false". Why set to false second time ? You might want to set to true at the end of testcase.

Same issue observed in following testcases. Please recheck for all of these cases.
test("Verify data file merge in SI segments with sort scope as gloabl sort and" +
"CARBON_SI_SEGMENT_MERGE property is enabled")
test("Verify REFRESH INDEX command with sort scope as global sort")
test("test verify data file merge when exception occurred in rebuild segment")
test("test refresh index command when block need to be sorted")

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]

GitBox

[GitHub] [carbondata] VenuReddy2103 commented on a change in pull request #4071: [CARBONDATA-4102] Added UT and FT to improve coverage of SI module.

In reply to this post by GitBox

VenuReddy2103 commented on a change in pull request #4071:
URL: https://github.com/apache/carbondata/pull/4071#discussion_r566602925

##########
File path: index/secondary-index/src/test/scala/org/apache/carbondata/spark/testsuite/mergedata/CarbonDataFileMergeTestCaseOnSI.scala
##########
@@ -327,6 +259,35 @@ class CarbonDataFileMergeTestCaseOnSI
.queryExecution.sparkPlan
assert(getDataFileCount("nonindexmerge_index1", "0") == 100)
assert(getDataFileCount("nonindexmerge_index1", "1") == 100)
+
+ // exception is thrown by compaction executor
+ val mock3 = TestSecondaryIndexUtils.mockCompactionExecutor()
+ val exception2 = intercept[Exception] {
+ sql("REFRESH INDEX nonindexmerge_index1 ON TABLE nonindexmerge").collect()
+ }
+ mock3.tearDown()
+ assert(exception2.getMessage.contains("Merge data files Failure in Merger Rdd."))
+ df1 = sql("""Select * from nonindexmerge where name='n16000'""")
+ .queryExecution.sparkPlan
+ assert(getDataFileCount("nonindexmerge_index1", "0") == 100)
+ assert(getDataFileCount("nonindexmerge_index1", "1") == 100)
+ CarbonProperties.getInstance().addProperty(CarbonCommonConstants.CARBON_SI_SEGMENT_MERGE,
+ CarbonCommonConstants.CARBON_SI_SEGMENT_MERGE_DEFAULT)
+ }
+
+ test("test refresh index command when block need to be sorted") {
+ CarbonProperties.getInstance()
+ .addProperty(CarbonCommonConstants.CARBON_SI_SEGMENT_MERGE, "false")
+ createTableAndLoadData("100", 2)
+ sql("CREATE INDEX nonindexmerge_index1 on table nonindexmerge (name) AS 'carbondata'")
+ val mock = TestSecondaryIndexUtils.mockIsSortRequired();
+ sql("REFRESH INDEX nonindexmerge_index1 ON TABLE nonindexmerge").collect()
+ mock.tearDown()
+ val df1 = sql("""Select * from nonindexmerge where name='n16000'""")
+ .queryExecution.sparkPlan
+ assert(isFilterPushedDownToSI(df1))
+ assert(getDataFileCount("nonindexmerge_index1", "0") < 15)
+ assert(getDataFileCount("nonindexmerge_index1", "1") < 15)

Review comment:
CarbonCommonConstants.CARBON_SI_SEGMENT_MERGE property is set to "false" at beginning and end of testcase. Note that CarbonCommonConstants.CARBON_SI_SEGMENT_MERGE_DEFAULT value is also "false". Why set to false second time ? You might want to set to true at the end of testcase.

Same issue observed in following testcases. Please recheck for all of these cases.
`test("Verify data file merge in SI segments with sort scope as gloabl sort and" +
"CARBON_SI_SEGMENT_MERGE property is enabled")`
`test("Verify REFRESH INDEX command with sort scope as global sort")`
`test("test verify data file merge when exception occurred in rebuild segment")`
`test("test refresh index command when block need to be sorted")`

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]

GitBox

[GitHub] [carbondata] VenuReddy2103 commented on a change in pull request #4071: [CARBONDATA-4102] Added UT and FT to improve coverage of SI module.

In reply to this post by GitBox

VenuReddy2103 commented on a change in pull request #4071:
URL: https://github.com/apache/carbondata/pull/4071#discussion_r566602925

##########
File path: index/secondary-index/src/test/scala/org/apache/carbondata/spark/testsuite/mergedata/CarbonDataFileMergeTestCaseOnSI.scala
##########
@@ -327,6 +259,35 @@ class CarbonDataFileMergeTestCaseOnSI
.queryExecution.sparkPlan
assert(getDataFileCount("nonindexmerge_index1", "0") == 100)
assert(getDataFileCount("nonindexmerge_index1", "1") == 100)
+
+ // exception is thrown by compaction executor
+ val mock3 = TestSecondaryIndexUtils.mockCompactionExecutor()
+ val exception2 = intercept[Exception] {
+ sql("REFRESH INDEX nonindexmerge_index1 ON TABLE nonindexmerge").collect()
+ }
+ mock3.tearDown()
+ assert(exception2.getMessage.contains("Merge data files Failure in Merger Rdd."))
+ df1 = sql("""Select * from nonindexmerge where name='n16000'""")
+ .queryExecution.sparkPlan
+ assert(getDataFileCount("nonindexmerge_index1", "0") == 100)
+ assert(getDataFileCount("nonindexmerge_index1", "1") == 100)
+ CarbonProperties.getInstance().addProperty(CarbonCommonConstants.CARBON_SI_SEGMENT_MERGE,
+ CarbonCommonConstants.CARBON_SI_SEGMENT_MERGE_DEFAULT)
+ }
+
+ test("test refresh index command when block need to be sorted") {
+ CarbonProperties.getInstance()
+ .addProperty(CarbonCommonConstants.CARBON_SI_SEGMENT_MERGE, "false")
+ createTableAndLoadData("100", 2)
+ sql("CREATE INDEX nonindexmerge_index1 on table nonindexmerge (name) AS 'carbondata'")
+ val mock = TestSecondaryIndexUtils.mockIsSortRequired();
+ sql("REFRESH INDEX nonindexmerge_index1 ON TABLE nonindexmerge").collect()
+ mock.tearDown()
+ val df1 = sql("""Select * from nonindexmerge where name='n16000'""")
+ .queryExecution.sparkPlan
+ assert(isFilterPushedDownToSI(df1))
+ assert(getDataFileCount("nonindexmerge_index1", "0") < 15)
+ assert(getDataFileCount("nonindexmerge_index1", "1") < 15)

Review comment:
`CarbonCommonConstants.CARBON_SI_SEGMENT_MERGE` property is set to `false` at beginning and end of testcase. Note that `CarbonCommonConstants.CARBON_SI_SEGMENT_MERGE_DEFAULT` value is also `false`. Why set to false second time ? You might want to set to true at the end of testcase.

Same issue observed in following testcases. Please recheck for all of these cases.
`test("Verify data file merge in SI segments with sort scope as gloabl sort and" +
"CARBON_SI_SEGMENT_MERGE property is enabled")`
`test("Verify REFRESH INDEX command with sort scope as global sort")`
`test("test verify data file merge when exception occurred in rebuild segment")`
`test("test refresh index command when block need to be sorted")`

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]

GitBox

[GitHub] [carbondata] VenuReddy2103 commented on a change in pull request #4071: [CARBONDATA-4102] Added UT and FT to improve coverage of SI module.

In reply to this post by GitBox

VenuReddy2103 commented on a change in pull request #4071:
URL: https://github.com/apache/carbondata/pull/4071#discussion_r566602925

##########
File path: index/secondary-index/src/test/scala/org/apache/carbondata/spark/testsuite/mergedata/CarbonDataFileMergeTestCaseOnSI.scala
##########
@@ -327,6 +259,35 @@ class CarbonDataFileMergeTestCaseOnSI
.queryExecution.sparkPlan
assert(getDataFileCount("nonindexmerge_index1", "0") == 100)
assert(getDataFileCount("nonindexmerge_index1", "1") == 100)
+
+ // exception is thrown by compaction executor
+ val mock3 = TestSecondaryIndexUtils.mockCompactionExecutor()
+ val exception2 = intercept[Exception] {
+ sql("REFRESH INDEX nonindexmerge_index1 ON TABLE nonindexmerge").collect()
+ }
+ mock3.tearDown()
+ assert(exception2.getMessage.contains("Merge data files Failure in Merger Rdd."))
+ df1 = sql("""Select * from nonindexmerge where name='n16000'""")
+ .queryExecution.sparkPlan
+ assert(getDataFileCount("nonindexmerge_index1", "0") == 100)
+ assert(getDataFileCount("nonindexmerge_index1", "1") == 100)
+ CarbonProperties.getInstance().addProperty(CarbonCommonConstants.CARBON_SI_SEGMENT_MERGE,
+ CarbonCommonConstants.CARBON_SI_SEGMENT_MERGE_DEFAULT)
+ }
+
+ test("test refresh index command when block need to be sorted") {
+ CarbonProperties.getInstance()
+ .addProperty(CarbonCommonConstants.CARBON_SI_SEGMENT_MERGE, "false")
+ createTableAndLoadData("100", 2)
+ sql("CREATE INDEX nonindexmerge_index1 on table nonindexmerge (name) AS 'carbondata'")
+ val mock = TestSecondaryIndexUtils.mockIsSortRequired();
+ sql("REFRESH INDEX nonindexmerge_index1 ON TABLE nonindexmerge").collect()
+ mock.tearDown()
+ val df1 = sql("""Select * from nonindexmerge where name='n16000'""")
+ .queryExecution.sparkPlan
+ assert(isFilterPushedDownToSI(df1))
+ assert(getDataFileCount("nonindexmerge_index1", "0") < 15)
+ assert(getDataFileCount("nonindexmerge_index1", "1") < 15)

Review comment:
`CarbonCommonConstants.CARBON_SI_SEGMENT_MERGE` property is set to `false` at beginning and end of testcase. Note that `CarbonCommonConstants.CARBON_SI_SEGMENT_MERGE_DEFAULT` value is also `false`. Why set to `false` second time ? You might want to set to `true` at the end of testcase.

Same issue observed in following testcases. Please recheck for all of these cases.
`test("Verify data file merge in SI segments with sort scope as gloabl sort and" +
"CARBON_SI_SEGMENT_MERGE property is enabled")`
`test("Verify REFRESH INDEX command with sort scope as global sort")`
`test("test verify data file merge when exception occurred in rebuild segment")`
`test("test refresh index command when block need to be sorted")`

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]

GitBox

[GitHub] [carbondata] VenuReddy2103 commented on a change in pull request #4071: [CARBONDATA-4102] Added UT and FT to improve coverage of SI module.

In reply to this post by GitBox

VenuReddy2103 commented on a change in pull request #4071:
URL: https://github.com/apache/carbondata/pull/4071#discussion_r566616826

##########
File path: index/secondary-index/src/test/scala/org/apache/carbondata/spark/testsuite/secondaryindex/TestCarbonInternalMetastore.scala
##########
@@ -0,0 +1,148 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.spark.testsuite.secondaryindex
+
+import org.apache.spark.sql.CarbonEnv
+import org.apache.spark.sql.catalyst.TableIdentifier
+import org.apache.spark.sql.index.CarbonIndexUtil
+import org.apache.spark.sql.secondaryindex.hive.CarbonInternalMetastore
+import org.apache.spark.sql.test.util.QueryTest
+import org.scalatest.{BeforeAndAfterAll, BeforeAndAfterEach}
+
+import org.apache.carbondata.core.metadata.schema.table.CarbonTable
+
+class TestCarbonInternalMetastore extends QueryTest with BeforeAndAfterAll with BeforeAndAfterEach {
+
+ val dbName: String = "test"
+ val tableName: String = "table1"
+ val indexName: String = "index1"
+ var tableIdentifier: TableIdentifier = _
+ var parentCarbonTable: CarbonTable = _
+ var indexTable: CarbonTable = _
+
+ override def beforeAll(): Unit = {
+ sql("drop database if exists test cascade");
+ }
+
+ override def beforeEach(): Unit = {
+ sql("drop database if exists test cascade");

Review comment:
IMO, drop, create and use database should go in `beforeAll()`. And drop, create table, index and insert into can go in `beforeEach()`. No need to drop and create database for each testcase.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]

GitBox

[GitHub] [carbondata] VenuReddy2103 commented on a change in pull request #4071: [CARBONDATA-4102] Added UT and FT to improve coverage of SI module.

In reply to this post by GitBox

VenuReddy2103 commented on a change in pull request #4071:
URL: https://github.com/apache/carbondata/pull/4071#discussion_r566641195

##########
File path: index/secondary-index/src/test/scala/org/apache/carbondata/spark/testsuite/secondaryindex/TestCreateIndexWithLoadAndCompaction.scala
##########
@@ -283,19 +285,32 @@ class TestCreateIndexWithLoadAndCompaction extends QueryTest with BeforeAndAfter
}
sql("ALTER TABLE table1 COMPACT 'CUSTOM' WHERE SEGMENT.ID IN (1,2,3)")

- val segments = sql("SHOW SEGMENTS FOR TABLE idx1")
- val segInfos = segments.collect().map { each =>
- ((each.toSeq) (0).toString, (each.toSeq) (1).toString)
- }
- assert(segInfos.length == 6)
+ val segInfos = checkSegmentList(6)
assert(segInfos.contains(("0", "Success")))
assert(segInfos.contains(("1", "Compacted")))
assert(segInfos.contains(("2", "Compacted")))
assert(segInfos.contains(("3", "Compacted")))
assert(segInfos.contains(("1.1", "Success")))
assert(segInfos.contains(("4", "Success")))
checkAnswer(sql("select * from table1 where c3='b2'"), Seq(Row(3, "a2", "b2")))
- sql("drop table if exists table1")
+
+ // after clean files
+ val mock = TestSecondaryIndexUtils.mockreadSegmentList()

Review comment:
why do we need to mock here? It should have cleaned compacted segments without mocking. Please check and remove mockreadSegmentList mocking.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]

GitBox

[GitHub] [carbondata] VenuReddy2103 commented on a change in pull request #4071: [CARBONDATA-4102] Added UT and FT to improve coverage of SI module.

In reply to this post by GitBox

VenuReddy2103 commented on a change in pull request #4071:
URL: https://github.com/apache/carbondata/pull/4071#discussion_r566680396

##########
File path: index/secondary-index/src/test/scala/org/apache/carbondata/spark/testsuite/secondaryindex/TestSIWithSecondaryIndex.scala
##########
@@ -577,6 +589,59 @@ class TestSIWithSecondaryIndex extends QueryTest with BeforeAndAfterAll {
assert(ex.getMessage.contains("Problem loading data while creating secondary index:"))
}

+ test("test SI with carbon.use.local.dir as true") {
+ CarbonProperties.getInstance()

Review comment:
Note default for `CARBON_LOADING_USE_YARN_LOCAL_DIR` is `true`. Why do we set to `true` again both at the begin and end of testcase?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]

GitBox

[GitHub] [carbondata] Indhumathi27 commented on a change in pull request #4071: [CARBONDATA-4102] Added UT and FT to improve coverage of SI module.

In reply to this post by GitBox

Indhumathi27 commented on a change in pull request #4071:
URL: https://github.com/apache/carbondata/pull/4071#discussion_r566685541

##########
File path: index/secondary-index/src/test/scala/org/apache/carbondata/spark/testsuite/secondaryindex/TestSIWithSecondaryIndex.scala
##########
@@ -577,6 +589,59 @@ class TestSIWithSecondaryIndex extends QueryTest with BeforeAndAfterAll {
assert(ex.getMessage.contains("Problem loading data while creating secondary index:"))
}

+ test("test SI with carbon.use.local.dir as true") {
+ CarbonProperties.getInstance()

Review comment:
When some testcase has set this value and not reverted back to default, we need to add this property again.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]

123456