GitHub user kunal642 opened a pull request:
https://github.com/apache/carbondata/pull/1613 [CARBONDATA-1737] [CARBONDATA-1760] Fixed partial load issue if user has set segments to access on parent table Analysis: Partial load was happening on pre-aggregate table when the user has set segments to access for the parent table. Solution: Set segments to access to * before firing load for child table. Be sure to do all of the following checklist to help us incorporate your contribution quickly and easily: - [ ] Any interfaces changed? - [ ] Any backward compatibility impacted? - [ ] Document update required? - [ ] Testing done Please provide details on - Whether new unit test cases have been added or why no new tests are required? - How it is tested? Please attach test report. - Is it a performance related change? Please attach the performance test report. - Any additional information to help reviewers in testing this change. - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA. You can merge this pull request into a Git repository by running: $ git pull https://github.com/kunal642/carbondata preagg_loading_fix Alternatively you can review and apply these changes as the patch at: https://github.com/apache/carbondata/pull/1613.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #1613 ---- commit ac5d4fa6bf843810422f04c9cda90dc3ac3224c2 Author: kunal642 <[hidden email]> Date: 2017-12-05T11:31:46Z fixed partial load issue if user has set segments to access on parent table ---- --- |
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/1613 Build Success with Spark 2.2.0, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/456/ --- |
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/1613 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/1724/ --- |
In reply to this post by qiuchenjian-2
Github user jackylk commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/1613#discussion_r154977110 --- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/preaaggregate/CreatePreAggregateTableCommand.scala --- @@ -124,19 +122,15 @@ case class CreatePreAggregateTableCommand( val loadAvailable = SegmentStatusManager.readLoadMetadata(parentCarbonTable.getMetaDataFilepath) .nonEmpty if (loadAvailable) { - val headers = parentCarbonTable.getTableInfo.getFactTable.getListOfColumns. - asScala.map(_.getColumnName).mkString(",") - val childDataFrame = sparkSession.sql( - new CarbonSpark2SqlParser().addPreAggLoadFunction(queryString)) - CarbonLoadDataCommand(tableIdentifier.database, - tableIdentifier.table, - null, - Nil, - Map("fileheader" -> headers), - isOverwriteTable = false, - dataFrame = Some(childDataFrame), - internalOptions = Map(CarbonCommonConstants.IS_INTERNAL_LOAD_CALL -> "true")). - run(sparkSession) + // Passing segmentToLoad as * because we want to load all the segments into the + // pre-aggregate table even if the user has set some segments on the parent table. + PreAggregateUtil + .startDataLoadForDataMap(parentCarbonTable, --- End diff -- follow coding style: ``` a.foo( paramA, paramB) ``` --- |
In reply to this post by qiuchenjian-2
Github user jackylk commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/1613#discussion_r154977309 --- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/preaaggregate/PreAggregateUtil.scala --- @@ -493,4 +495,49 @@ object PreAggregateUtil { updatedPlan } + /** + * This method will start load process on the data map + */ + def startDataLoadForDataMap(parentCarbonTable: CarbonTable, --- End diff -- move first parameter to next line --- |
In reply to this post by qiuchenjian-2
Github user jackylk commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/1613#discussion_r154977627 --- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/preaaggregate/PreAggregateUtil.scala --- @@ -493,4 +495,49 @@ object PreAggregateUtil { updatedPlan } + /** + * This method will start load process on the data map + */ + def startDataLoadForDataMap(parentCarbonTable: CarbonTable, + dataMapIdentifier: TableIdentifier, + queryString: String, + segmentToLoad: String, + validateSegments: Boolean, + sparkSession: SparkSession): Unit = { + CarbonSession.threadSet( --- End diff -- Do not use thread local to pass parameters, pass them explicitly to the command --- |
In reply to this post by qiuchenjian-2
Github user ravipesala commented on the issue:
https://github.com/apache/carbondata/pull/1613 SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/2106/ --- |
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/1613 Build Success with Spark 2.2.0, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/487/ --- |
In reply to this post by qiuchenjian-2
Github user kunal642 commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/1613#discussion_r155155362 --- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/preaaggregate/PreAggregateUtil.scala --- @@ -493,4 +495,49 @@ object PreAggregateUtil { updatedPlan } + /** + * This method will start load process on the data map + */ + def startDataLoadForDataMap(parentCarbonTable: CarbonTable, + dataMapIdentifier: TableIdentifier, + queryString: String, + segmentToLoad: String, + validateSegments: Boolean, + sparkSession: SparkSession): Unit = { + CarbonSession.threadSet( --- End diff -- This parameter is used to specify the segments to scan in CarbonScanRDD. Therefore i cannot pass it explicitly. As discussed with @gvramana and @ravipesala this was only option that was agreed upon. --- |
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/1613 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/1751/ --- |
In reply to this post by qiuchenjian-2
Github user ravipesala commented on the issue:
https://github.com/apache/carbondata/pull/1613 SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/2134/ --- |
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/1613 Build Success with Spark 2.2.0, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/497/ --- |
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/1613 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/1760/ --- |
In reply to this post by qiuchenjian-2
Github user kunal642 commented on the issue:
https://github.com/apache/carbondata/pull/1613 retest this please --- |
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/1613 Build Success with Spark 2.2.0, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/509/ --- |
In reply to this post by qiuchenjian-2
Github user ravipesala commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/1613#discussion_r155249189 --- Diff: integration/spark-common/src/main/scala/org/apache/carbondata/spark/rdd/CarbonScanRDD.scala --- @@ -359,14 +359,24 @@ class CarbonScanRDD( } val carbonSessionInfo = ThreadLocalSessionInfo.getCarbonSessionInfo if (carbonSessionInfo != null) { - CarbonTableInputFormat.setAggeragateTableSegments(conf, carbonSessionInfo.getSessionParams - .getProperty(CarbonCommonConstants.CARBON_INPUT_SEGMENTS + - identifier.getCarbonTableIdentifier.getDatabaseName + "." + - identifier.getCarbonTableIdentifier.getTableName, "")) - CarbonTableInputFormat.setValidateSegmentsToAccess(conf, carbonSessionInfo.getSessionParams + if (carbonSessionInfo.getSessionParams + .getProperty(CarbonCommonConstants.CARBON_INPUT_SEGMENTS + + identifier.getCarbonTableIdentifier.getDatabaseName + "." + + identifier.getCarbonTableIdentifier.getTableName) != null) { + CarbonTableInputFormat.setAggeragateTableSegments(conf, carbonSessionInfo.getSessionParams + .getProperty(CarbonCommonConstants.CARBON_INPUT_SEGMENTS + + identifier.getCarbonTableIdentifier.getDatabaseName + "." + + identifier.getCarbonTableIdentifier.getTableName)) + } + if (carbonSessionInfo.getSessionParams + .getProperty(CarbonCommonConstants.VALIDATE_CARBON_INPUT_SEGMENTS + + identifier.getCarbonTableIdentifier.getDatabaseName + "." + + identifier.getCarbonTableIdentifier.getTableName) != null) { --- End diff -- Please take this to variable and reuse it --- |
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/1613 Build Success with Spark 2.2.0, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/512/ --- |
In reply to this post by qiuchenjian-2
Github user ravipesala commented on the issue:
https://github.com/apache/carbondata/pull/1613 SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/2143/ --- |
In reply to this post by qiuchenjian-2
Github user ravipesala commented on the issue:
https://github.com/apache/carbondata/pull/1613 LGTM --- |
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/1613 Build Success with Spark 2.2.0, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/516/ --- |
Free forum by Nabble | Edit this page |