GitHub user jackylk opened a pull request:
https://github.com/apache/incubator-carbondata/pull/441 [CARBONDATA-539] Fiix column projection bug Currently when using CarbonInputFormat in map reduce app, if projection columns are not set, then Carbon will return empty row. This PR fix this bug. `CarbonInputFormat` are refactored to make cleaner API to application. Only 4 settings are now public to application. 1. TablePath 2. Column Projection 3. Filter Expression 4. ReadSupport class Function descriptions are added in `CarbonInputFormat` You can merge this pull request into a Git repository by running: $ git pull https://github.com/jackylk/incubator-carbondata format2 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/incubator-carbondata/pull/441.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #441 ---- commit 5d93fb03ad9df61fbdb11b4753bebb326adfe281 Author: jackylk <[hidden email]> Date: 2016-12-18T16:33:42Z fix column projection bug ---- --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at [hidden email] or file a JIRA ticket with INFRA. --- |
Github user jackylk commented on a diff in the pull request:
https://github.com/apache/incubator-carbondata/pull/441#discussion_r92945104 --- Diff: hadoop/src/main/java/org/apache/carbondata/hadoop/util/CarbonInputFormatUtil.java --- @@ -70,6 +69,18 @@ public static CarbonQueryPlan createQueryPlan(CarbonTable carbonTable, String co i++; } } + } else { + // query all columns + List<CarbonMeasure> tableMsrs = carbonTable.getMeasureByTableName(factTableName); --- End diff -- This is the bug, when input projection is null, earlier it does not add all columns to read --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at [hidden email] or file a JIRA ticket with INFRA. --- |
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:
https://github.com/apache/incubator-carbondata/pull/441 Build Success with Spark 1.5.2, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/218/ --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at [hidden email] or file a JIRA ticket with INFRA. --- |
In reply to this post by qiuchenjian-2
Github user ravipesala commented on a diff in the pull request:
https://github.com/apache/incubator-carbondata/pull/441#discussion_r93031221 --- Diff: hadoop/src/main/java/org/apache/carbondata/hadoop/util/CarbonInputFormatUtil.java --- @@ -70,6 +69,18 @@ public static CarbonQueryPlan createQueryPlan(CarbonTable carbonTable, String co i++; } } + } else { + // query all columns + List<CarbonMeasure> tableMsrs = carbonTable.getMeasureByTableName(factTableName); --- End diff -- It was an performance issue if all columns are added when projection is null. That was the reason it was removed. Any reason it is added back? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at [hidden email] or file a JIRA ticket with INFRA. --- |
In reply to this post by qiuchenjian-2
Github user jackylk commented on a diff in the pull request:
https://github.com/apache/incubator-carbondata/pull/441#discussion_r93032803 --- Diff: hadoop/src/main/java/org/apache/carbondata/hadoop/util/CarbonInputFormatUtil.java --- @@ -70,6 +69,18 @@ public static CarbonQueryPlan createQueryPlan(CarbonTable carbonTable, String co i++; } } + } else { + // query all columns + List<CarbonMeasure> tableMsrs = carbonTable.getMeasureByTableName(factTableName); --- End diff -- Because when I run HadoopFileExample, it is returning empty row. Then I found out that it is because projection columns are not set in `CarbonInputFormat`. So I have added it back. If it is removed, then it is mandatory that user set projection columns in `CarbonInputFormat` --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at [hidden email] or file a JIRA ticket with INFRA. --- |
In reply to this post by qiuchenjian-2
Github user ravipesala commented on a diff in the pull request:
https://github.com/apache/incubator-carbondata/pull/441#discussion_r93034892 --- Diff: hadoop/src/main/java/org/apache/carbondata/hadoop/util/CarbonInputFormatUtil.java --- @@ -70,6 +69,18 @@ public static CarbonQueryPlan createQueryPlan(CarbonTable carbonTable, String co i++; } } + } else { + // query all columns + List<CarbonMeasure> tableMsrs = carbonTable.getMeasureByTableName(factTableName); --- End diff -- yes, But we cannot make it mandatory because in count(*) case it sets null projection so that just returns empty rows but can calculate the count using that. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at [hidden email] or file a JIRA ticket with INFRA. --- |
In reply to this post by qiuchenjian-2
Github user jackylk commented on a diff in the pull request:
https://github.com/apache/incubator-carbondata/pull/441#discussion_r93038276 --- Diff: hadoop/src/main/java/org/apache/carbondata/hadoop/util/CarbonInputFormatUtil.java --- @@ -70,6 +69,18 @@ public static CarbonQueryPlan createQueryPlan(CarbonTable carbonTable, String co i++; } } + } else { + // query all columns + List<CarbonMeasure> tableMsrs = carbonTable.getMeasureByTableName(factTableName); --- End diff -- ok, I got the issue here. Can we do like as following 1. if it is come from count(*), we set the projection column to a special string (like empty string "") in configuration, then we can process it differently in `createQueryPlan`. 2. If user want to project all column, projection column should be set to null or list all columns. 3. If user want to project specific columns, then set it to specific columns --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at [hidden email] or file a JIRA ticket with INFRA. --- |
In reply to this post by qiuchenjian-2
Github user ravipesala commented on a diff in the pull request:
https://github.com/apache/incubator-carbondata/pull/441#discussion_r93051766 --- Diff: hadoop/src/main/java/org/apache/carbondata/hadoop/util/CarbonInputFormatUtil.java --- @@ -70,6 +69,18 @@ public static CarbonQueryPlan createQueryPlan(CarbonTable carbonTable, String co i++; } } + } else { + // query all columns + List<CarbonMeasure> tableMsrs = carbonTable.getMeasureByTableName(factTableName); --- End diff -- we cannot know whether it is count(*) unless we override aggregation strategies in carbon but I guess that is not recommendable. So from spark if it asks empty projection then better we should give empty rows. It can be achieved by special column name set to the projection. And from hadoop if projection is null we can add all columns. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at [hidden email] or file a JIRA ticket with INFRA. --- |
In reply to this post by qiuchenjian-2
Github user jackylk commented on a diff in the pull request:
https://github.com/apache/incubator-carbondata/pull/441#discussion_r94212785 --- Diff: hadoop/src/main/java/org/apache/carbondata/hadoop/util/CarbonInputFormatUtil.java --- @@ -70,6 +69,18 @@ public static CarbonQueryPlan createQueryPlan(CarbonTable carbonTable, String co i++; } } + } else { + // query all columns + List<CarbonMeasure> tableMsrs = carbonTable.getMeasureByTableName(factTableName); --- End diff -- ok, I will add a special column name set to projection --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at [hidden email] or file a JIRA ticket with INFRA. --- |
In reply to this post by qiuchenjian-2
Github user jackylk commented on the issue:
https://github.com/apache/incubator-carbondata/pull/441 I think it is better let user set the column projection explicitly. So closing this PR --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at [hidden email] or file a JIRA ticket with INFRA. --- |
In reply to this post by qiuchenjian-2
Github user jackylk closed the pull request at:
https://github.com/apache/incubator-carbondata/pull/441 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at [hidden email] or file a JIRA ticket with INFRA. --- |
Free forum by Nabble | Edit this page |