GitHub user ravipesala opened a pull request:
    https://github.com/apache/carbondata/pull/2478

    [CARBONDATA-2540][CARBONDATA-2560][CARBONDATA-2568][MV] Add validations for unsupported MV queries

    This PR depends on https://github.com/apache/carbondata/pull/2453

    Problem:
    Validations are missing on the unsupported MV queries while creating MV datamap.

    Solution:
    Added validation for the unsupported MV queries while creating datamap.

    Be sure to do all of the following checklist to help us incorporate
    your contribution quickly and easily:

     - [ ] Any interfaces changed?
     - [ ] Any backward compatibility impacted?
     - [ ] Document update required?
     - [ ] Testing done
            Please provide details on
            - Whether new unit test cases have been added or why no new tests are required?
            - How it is tested? Please attach test report.
            - Is it a performance related change? Please attach the performance test report.
            - Any additional information to help reviewers in testing this change.
     - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/ravipesala/incubator-carbondata pr-2540

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/carbondata/pull/2478.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #2478

----
commit 844cff764e94f18ff8d115ccd636a06a0931bffb
Author: ravipesala <ravi.pesala@...>
Date:   2018-06-14T06:10:07Z

    Fixed order by in mv and aggregation functions inside projection expressions are fixed

commit 642d05b9dab3a001c2338a5dca920d952f0ccd5c
Author: ravipesala <ravi.pesala@...>
Date:   2018-07-09T10:19:57Z

    Added validation for unsupported MV queries

----

---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2478 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/6987/ --- |
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2478 Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/5765/ --- |
Github user ravipesala commented on the issue:
https://github.com/apache/carbondata/pull/2478 SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/5748/ --- |
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2478 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/7192/ --- |
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2478 Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/5967/ --- |
Github user ravipesala commented on the issue:
https://github.com/apache/carbondata/pull/2478 SDV Build Fail , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/5854/ --- |
Github user ravipesala commented on the issue:
https://github.com/apache/carbondata/pull/2478 SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/5858/ --- |
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2478 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/7245/ --- |
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2478 Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/6020/ --- |
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2478 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/7289/ --- |
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2478 Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/6056/ --- |
Github user ravipesala commented on the issue:
https://github.com/apache/carbondata/pull/2478 SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/5917/ --- |
Github user jackylk commented on a diff in the pull request:
    https://github.com/apache/carbondata/pull/2478#discussion_r204432239

    --- Diff: datamap/mv/core/src/main/scala/org/apache/carbondata/mv/datamap/MVHelper.scala ---
    @@ -118,6 +122,43 @@ object MVHelper {
         DataMapStoreManager.getInstance().saveDataMapSchema(dataMapSchema)
       }
     
    +  private def validateMVQuery(sparkSession: SparkSession,
    +      logicalPlan: LogicalPlan) {
    +    val dataMapProvider = DataMapManager.get().getDataMapProvider(null,
    +      new DataMapSchema("", DataMapClassProvider.MV.getShortName), sparkSession)
    +    dataMapProvider
    +    var catalog = DataMapStoreManager.getInstance().getDataMapCatalog(dataMapProvider,
    +      DataMapClassProvider.MV.getShortName).asInstanceOf[SummaryDatasetCatalog]
    +    if (catalog == null) {
    +      catalog = new SummaryDatasetCatalog(sparkSession)
    +    }
    +    val modularPlan =
    +      catalog.mvSession.sessionState.modularizer.modularize(
    +        catalog.mvSession.sessionState.optimizer.execute(logicalPlan)).next().semiHarmonized
    +
    +    val isValid = modularPlan match {
    +      case g: GroupBy =>
    +        // Make sure all predicates are present in projections.
    +        g.predicateList.forall{p =>
    +          g.outputList.exists{
    +            case a: Alias =>
    +              a.semanticEquals(p) || a.child.semanticEquals(p)
    +            case other => other.semanticEquals(p)
    +          }
    +        }
    +      case _ => true
    +    }
    +    if (!isValid) {
    +      throw new UnsupportedOperationException("Group by columns must be present in project columns")
    +    }
    +    if(catalog.isMVWithSameQueryPresent(logicalPlan)) {
    --- End diff --

    add space after `if`

---
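For readers following the `isValid` check in the diff above, here is a minimal sketch of the same idea using plain case classes. `Col`, `Aliased`, and `GroupByCheck` are hypothetical stand-ins, not CarbonData or Spark classes, and structural equality (`==`) is used here in place of Catalyst's `semanticEquals`, which is an assumption made only to keep the example self-contained.

```scala
// Hypothetical stand-ins for Catalyst expressions; not CarbonData or Spark classes.
sealed trait Expr
final case class Col(name: String) extends Expr
final case class Aliased(child: Expr, name: String) extends Expr

object GroupByCheck {
  // True when every group-by expression appears in the output list,
  // either directly or as the child of an alias.
  // Case-class equality stands in for semanticEquals in this sketch.
  def groupByColumnsProjected(groupBy: Seq[Expr], outputs: Seq[Expr]): Boolean =
    groupBy.forall { p =>
      outputs.exists {
        case a: Aliased => a == p || a.child == p
        case other => other == p
      }
    }
}
```

With this model, a query that groups by `a` and `b` but projects only `a` would be rejected, while one that projects `a` under an alias would pass.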
Github user jackylk commented on a diff in the pull request:
    https://github.com/apache/carbondata/pull/2478#discussion_r204434793

    --- Diff: datamap/mv/core/src/main/scala/org/apache/carbondata/mv/datamap/MVHelper.scala ---
    @@ -118,6 +122,43 @@ object MVHelper {
         DataMapStoreManager.getInstance().saveDataMapSchema(dataMapSchema)
       }
     
    +  private def validateMVQuery(sparkSession: SparkSession,
    +      logicalPlan: LogicalPlan) {
    --- End diff --

    add `: Unit`

---
Github user jackylk commented on a diff in the pull request:
    https://github.com/apache/carbondata/pull/2478#discussion_r204435022

    --- Diff: datamap/mv/core/src/main/scala/org/apache/carbondata/mv/datamap/MVHelper.scala ---
    @@ -118,6 +122,43 @@ object MVHelper {
         DataMapStoreManager.getInstance().saveDataMapSchema(dataMapSchema)
       }
     
    +  private def validateMVQuery(sparkSession: SparkSession,
    +      logicalPlan: LogicalPlan) {
    +    val dataMapProvider = DataMapManager.get().getDataMapProvider(null,
    +      new DataMapSchema("", DataMapClassProvider.MV.getShortName), sparkSession)
    +    dataMapProvider
    --- End diff --

    useless statement

---
Github user jackylk commented on a diff in the pull request:
    https://github.com/apache/carbondata/pull/2478#discussion_r204435669

    --- Diff: datamap/mv/core/src/main/scala/org/apache/carbondata/mv/datamap/MVHelper.scala ---
    @@ -118,6 +122,43 @@ object MVHelper {
         DataMapStoreManager.getInstance().saveDataMapSchema(dataMapSchema)
       }
     
    +  private def validateMVQuery(sparkSession: SparkSession,
    +      logicalPlan: LogicalPlan) {
    +    val dataMapProvider = DataMapManager.get().getDataMapProvider(null,
    +      new DataMapSchema("", DataMapClassProvider.MV.getShortName), sparkSession)
    +    dataMapProvider
    +    var catalog = DataMapStoreManager.getInstance().getDataMapCatalog(dataMapProvider,
    +      DataMapClassProvider.MV.getShortName).asInstanceOf[SummaryDatasetCatalog]
    --- End diff --

    Is casting `null` to `SummaryDatasetCatalog` ok?

---
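On the question above: in Scala, `asInstanceOf` applied to `null` with a reference-type target does not throw and simply yields `null`, so the later `catalog == null` check in the diff would still be reached. The sketch below illustrates this with a hypothetical `Catalog` class and `lookup` method (neither is CarbonData API), and shows one possible way to fold the fallback into a single expression with `Option`.

```scala
object NullCastDemo {
  // Hypothetical stand-in for SummaryDatasetCatalog.
  final class Catalog(val name: String)

  // Stand-in for a Java-style lookup that returns null when nothing is registered.
  def lookup(registered: Boolean): AnyRef =
    if (registered) new Catalog("mv") else null

  // Casting null to a reference type does not throw; the result is still null.
  def castOrNull(registered: Boolean): Catalog =
    lookup(registered).asInstanceOf[Catalog]

  // Option(...) maps null to None, so the fallback becomes one expression
  // and the mutable `var` from the diff is not needed.
  def catalogOrDefault(registered: Boolean): Catalog =
    Option(castOrNull(registered)).getOrElse(new Catalog("fresh"))
}
```

Whether the `Option` form is preferable in `MVHelper` is a style choice for the PR author; the sketch only demonstrates that the cast itself is safe for `null`.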
Github user jackylk commented on a diff in the pull request:
    https://github.com/apache/carbondata/pull/2478#discussion_r204435991

    --- Diff: datamap/mv/core/src/main/scala/org/apache/carbondata/mv/datamap/MVHelper.scala ---
    @@ -118,6 +122,43 @@ object MVHelper {
         DataMapStoreManager.getInstance().saveDataMapSchema(dataMapSchema)
       }
     
    +  private def validateMVQuery(sparkSession: SparkSession,
    +      logicalPlan: LogicalPlan) {
    +    val dataMapProvider = DataMapManager.get().getDataMapProvider(null,
    +      new DataMapSchema("", DataMapClassProvider.MV.getShortName), sparkSession)
    +    dataMapProvider
    +    var catalog = DataMapStoreManager.getInstance().getDataMapCatalog(dataMapProvider,
    +      DataMapClassProvider.MV.getShortName).asInstanceOf[SummaryDatasetCatalog]
    +    if (catalog == null) {
    +      catalog = new SummaryDatasetCatalog(sparkSession)
    +    }
    +    val modularPlan =
    +      catalog.mvSession.sessionState.modularizer.modularize(
    +        catalog.mvSession.sessionState.optimizer.execute(logicalPlan)).next().semiHarmonized
    +
    +    val isValid = modularPlan match {
    +      case g: GroupBy =>
    +        // Make sure all predicates are present in projections.
    +        g.predicateList.forall{p =>
    +          g.outputList.exists{
    +            case a: Alias =>
    +              a.semanticEquals(p) || a.child.semanticEquals(p)
    +            case other => other.semanticEquals(p)
    +          }
    +        }
    +      case _ => true
    +    }
    +    if (!isValid) {
    +      throw new UnsupportedOperationException("Group by columns must be present in project columns")
    +    }
    +    if(catalog.isMVWithSameQueryPresent(logicalPlan)) {
    +      throw new UnsupportedOperationException("MV with same query present")
    +    }
    +    if (!modularPlan.isSPJGH) {
    --- End diff --

    I think it is better to move this before line 139

---
Github user jackylk commented on a diff in the pull request:
    https://github.com/apache/carbondata/pull/2478#discussion_r204436654

    --- Diff: datamap/mv/core/src/main/scala/org/apache/carbondata/mv/datamap/MVHelper.scala ---
    @@ -118,6 +122,43 @@ object MVHelper {
         DataMapStoreManager.getInstance().saveDataMapSchema(dataMapSchema)
       }
     
    +  private def validateMVQuery(sparkSession: SparkSession,
    +      logicalPlan: LogicalPlan) {
    +    val dataMapProvider = DataMapManager.get().getDataMapProvider(null,
    +      new DataMapSchema("", DataMapClassProvider.MV.getShortName), sparkSession)
    +    dataMapProvider
    +    var catalog = DataMapStoreManager.getInstance().getDataMapCatalog(dataMapProvider,
    +      DataMapClassProvider.MV.getShortName).asInstanceOf[SummaryDatasetCatalog]
    +    if (catalog == null) {
    +      catalog = new SummaryDatasetCatalog(sparkSession)
    +    }
    +    val modularPlan =
    +      catalog.mvSession.sessionState.modularizer.modularize(
    +        catalog.mvSession.sessionState.optimizer.execute(logicalPlan)).next().semiHarmonized
    +
    +    val isValid = modularPlan match {
    +      case g: GroupBy =>
    +        // Make sure all predicates are present in projections.
    +        g.predicateList.forall{p =>
    +          g.outputList.exists{
    +            case a: Alias =>
    +              a.semanticEquals(p) || a.child.semanticEquals(p)
    +            case other => other.semanticEquals(p)
    +          }
    +        }
    +      case _ => true
    +    }
    +    if (!isValid) {
    +      throw new UnsupportedOperationException("Group by columns must be present in project columns")
    +    }
    +    if(catalog.isMVWithSameQueryPresent(logicalPlan)) {
    +      throw new UnsupportedOperationException("MV with same query present")
    +    }
    +    if (!modularPlan.isSPJGH) {
    +      throw new UnsupportedOperationException("MV is not supported for this query")
    --- End diff --

    Can you explain more specifically, like `Only Select-Predicate-Join-Groupby-Having query is supported for MV`?

---
Github user ravipesala commented on a diff in the pull request:
    https://github.com/apache/carbondata/pull/2478#discussion_r204479220

    --- Diff: datamap/mv/core/src/main/scala/org/apache/carbondata/mv/datamap/MVHelper.scala ---
    @@ -118,6 +122,43 @@ object MVHelper {
         DataMapStoreManager.getInstance().saveDataMapSchema(dataMapSchema)
       }
     
    +  private def validateMVQuery(sparkSession: SparkSession,
    +      logicalPlan: LogicalPlan) {
    +    val dataMapProvider = DataMapManager.get().getDataMapProvider(null,
    +      new DataMapSchema("", DataMapClassProvider.MV.getShortName), sparkSession)
    +    dataMapProvider
    +    var catalog = DataMapStoreManager.getInstance().getDataMapCatalog(dataMapProvider,
    +      DataMapClassProvider.MV.getShortName).asInstanceOf[SummaryDatasetCatalog]
    +    if (catalog == null) {
    +      catalog = new SummaryDatasetCatalog(sparkSession)
    +    }
    +    val modularPlan =
    +      catalog.mvSession.sessionState.modularizer.modularize(
    +        catalog.mvSession.sessionState.optimizer.execute(logicalPlan)).next().semiHarmonized
    +
    +    val isValid = modularPlan match {
    +      case g: GroupBy =>
    +        // Make sure all predicates are present in projections.
    +        g.predicateList.forall{p =>
    +          g.outputList.exists{
    +            case a: Alias =>
    +              a.semanticEquals(p) || a.child.semanticEquals(p)
    +            case other => other.semanticEquals(p)
    +          }
    +        }
    +      case _ => true
    +    }
    +    if (!isValid) {
    +      throw new UnsupportedOperationException("Group by columns must be present in project columns")
    +    }
    +    if(catalog.isMVWithSameQueryPresent(logicalPlan)) {
    +      throw new UnsupportedOperationException("MV with same query present")
    +    }
    +    if (!modularPlan.isSPJGH) {
    +      throw new UnsupportedOperationException("MV is not supported for this query")
    --- End diff --

    ok

---
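Taken together, the review comments in this thread suggest running the query-shape check before the group-by check and reporting a more specific message. The sketch below shows one possible ordering under those assumptions. `PlanInfo`, `firstProblem`, and `validate` are hypothetical names that summarize a plan as three booleans; the real `ModularPlan` is far richer, and this is not the code that was merged.

```scala
object MvValidationOrder {
  // Hypothetical summary of a modular plan, reduced to the three facts
  // that the validation in the diff depends on.
  final case class PlanInfo(isSPJGH: Boolean, groupByProjected: Boolean, duplicate: Boolean)

  // Returns the first failure message, or None when the query is acceptable.
  // The shape check runs first, so an unsupported plan is rejected before
  // its group-by list is inspected.
  def firstProblem(p: PlanInfo): Option[String] =
    if (!p.isSPJGH) {
      Some("Only Select-Predicate-Join-Groupby-Having query is supported for MV")
    } else if (!p.groupByProjected) {
      Some("Group by columns must be present in project columns")
    } else if (p.duplicate) {
      Some("MV with same query present")
    } else {
      None
    }

  // Mirrors the throwing style used in the diff.
  def validate(p: PlanInfo): Unit =
    firstProblem(p).foreach(msg => throw new UnsupportedOperationException(msg))
}
```

Separating `firstProblem` from `validate` keeps the ordering testable without catching exceptions; whether that split is worthwhile in `MVHelper` itself is left to the author.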