[GitHub] incubator-carbondata pull request #802: [CARBONDATA-926] Changed max columns...

classic Classic list List threaded Threaded
17 messages Options
Reply | Threaded
Open this post in threaded view
|

[GitHub] incubator-carbondata pull request #802: [CARBONDATA-926] Changed max columns...

qiuchenjian-2
GitHub user kunal642 opened a pull request:

    https://github.com/apache/incubator-carbondata/pull/802

    [CARBONDATA-926] Changed max columns from static value to configurable value in Load DDL

    Changes done:-
   
    1. Earlier the value for max columns that can be read by the csv parser was hard coded.
    Now the value would be read from the query options and if no option is provided then the default value of 0 would be taken.
   
    2. Added query id in compaction flow for driver statistics.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/kunal642/incubator-carbondata CARBONDATA-926

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-carbondata/pull/802.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #802
   
----
commit cf854888f2914bec2239b594deb9edefd72c123e
Author: kunal642 <[hidden email]>
Date:   2017-04-15T08:18:17Z

    changed max columns from static value to configurable

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [hidden email] or file a JIRA ticket
with INFRA.
---
Reply | Threaded
Open this post in threaded view
|

[GitHub] incubator-carbondata issue #802: [CARBONDATA-926] Changed max columns from s...

qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/incubator-carbondata/pull/802
 
    Build Failed  with Spark 1.6.2, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/1635/



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [hidden email] or file a JIRA ticket
with INFRA.
---
Reply | Threaded
Open this post in threaded view
|

[GitHub] incubator-carbondata issue #802: [CARBONDATA-926] Changed max columns from s...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user kunal642 commented on the issue:

    https://github.com/apache/incubator-carbondata/pull/802
 
    retest this please



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [hidden email] or file a JIRA ticket
with INFRA.
---
Reply | Threaded
Open this post in threaded view
|

[GitHub] incubator-carbondata issue #802: [CARBONDATA-926] Changed max columns from s...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/incubator-carbondata/pull/802
 
    Build Failed  with Spark 1.6.2, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/1639/



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [hidden email] or file a JIRA ticket
with INFRA.
---
Reply | Threaded
Open this post in threaded view
|

[GitHub] incubator-carbondata issue #802: [CARBONDATA-926] Changed max columns from s...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/incubator-carbondata/pull/802
 
    Build Failed  with Spark 1.6.2, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/1640/



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [hidden email] or file a JIRA ticket
with INFRA.
---
Reply | Threaded
Open this post in threaded view
|

[GitHub] incubator-carbondata issue #802: [CARBONDATA-926] Changed max columns from s...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/incubator-carbondata/pull/802
 
    Build Failed  with Spark 1.6.2, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/1641/



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [hidden email] or file a JIRA ticket
with INFRA.
---
Reply | Threaded
Open this post in threaded view
|

[GitHub] incubator-carbondata issue #802: [CARBONDATA-926] Changed max columns from s...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/incubator-carbondata/pull/802
 
    Build Success with Spark 1.6.2, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/1643/



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [hidden email] or file a JIRA ticket
with INFRA.
---
Reply | Threaded
Open this post in threaded view
|

[GitHub] incubator-carbondata issue #802: [CARBONDATA-926] Changed max columns from s...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/incubator-carbondata/pull/802
 
    Build Failed  with Spark 1.6.2, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/1647/



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [hidden email] or file a JIRA ticket
with INFRA.
---
Reply | Threaded
Open this post in threaded view
|

[GitHub] incubator-carbondata issue #802: [CARBONDATA-926] Changed max columns from s...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/incubator-carbondata/pull/802
 
    Build Failed  with Spark 1.6.2, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/1648/



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [hidden email] or file a JIRA ticket
with INFRA.
---
Reply | Threaded
Open this post in threaded view
|

[GitHub] incubator-carbondata pull request #802: [CARBONDATA-926] Changed max columns...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user gvramana commented on a diff in the pull request:

    https://github.com/apache/incubator-carbondata/pull/802#discussion_r111665089
 
    --- Diff: processing/src/main/java/org/apache/carbondata/processing/csvload/CSVInputFormat.java ---
    @@ -220,8 +241,16 @@ private CsvParserSettings extractCsvParserSettings(Configuration job) {
           parserSettings.setIgnoreTrailingWhitespaces(false);
           parserSettings.setSkipEmptyLines(false);
           parserSettings.setMaxCharsPerColumn(100000);
    -      // TODO get from csv file.
    -      parserSettings.setMaxColumns(1000);
    +      String maxColumns = job.get(MAX_COLUMNS);
    +      int maxColumnsInt = 0;
    --- End diff --
   
    default value should be taken and same should be logged.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [hidden email] or file a JIRA ticket
with INFRA.
---
Reply | Threaded
Open this post in threaded view
|

[GitHub] incubator-carbondata pull request #802: [CARBONDATA-926] Changed max columns...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user gvramana commented on a diff in the pull request:

    https://github.com/apache/incubator-carbondata/pull/802#discussion_r111665083
 
    --- Diff: processing/src/main/java/org/apache/carbondata/processing/csvload/CSVInputFormat.java ---
    @@ -231,6 +260,31 @@ private CsvParserSettings extractCsvParserSettings(Configuration job) {
           return parserSettings;
         }
     
    +    /**
    +     * This method will decide the number of columns to be parsed for a row by Csv parser
    +     *
    +     * @param columnCountInSchema total number of columns in schema
    +     * @return
    +     */
    +    private int getMaxColumnsForParsing(int columnCountInSchema, int maxColumns) {
    +      int maxNumberOfColumnsForParsing = DEFAULT_MAX_NUMBER_OF_COLUMNS_FOR_PARSING;
    +      if (maxColumns > 0) {
    +        if (columnCountInSchema >= maxColumns) {
    +          maxNumberOfColumnsForParsing = columnCountInSchema + 1;
    --- End diff --
   
    taking more columns than the configured max value, should not happen.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [hidden email] or file a JIRA ticket
with INFRA.
---
Reply | Threaded
Open this post in threaded view
|

[GitHub] incubator-carbondata pull request #802: [CARBONDATA-926] Changed max columns...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user gvramana commented on a diff in the pull request:

    https://github.com/apache/incubator-carbondata/pull/802#discussion_r111667378
 
    --- Diff: integration/spark-common/src/main/scala/org/apache/carbondata/spark/util/CommonUtil.scala ---
    @@ -342,7 +345,23 @@ object CommonUtil {
               + "the same. Input file : " + csvFile)
           }
         }
    -
         csvColumns
       }
    +
    +  def validateMaxColumns(csvHeaders: Array[String], maxColumns: String): Unit = {
    +    if (maxColumns != null) {
    +      try {
    +        val maxColumnsInt = maxColumns.toInt
    +        if( csvHeaders.length > maxColumnsInt) {
    +          sys.error("csv headers should be less than the max columns")
    --- End diff --
   
    also need to print max Value which is considered.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [hidden email] or file a JIRA ticket
with INFRA.
---
Reply | Threaded
Open this post in threaded view
|

[GitHub] incubator-carbondata issue #802: [CARBONDATA-926] Changed max columns from s...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user gvramana commented on the issue:

    https://github.com/apache/incubator-carbondata/pull/802
 
    Handle all the validation in driver and remove from executor.
    parameters: csvheadercolumns, maxcolumns(default 2000), Threashold(20000).
    1) User configures both  csvheadercolumns, maxcolumns,
    - if csvheadercolumns >= maxcolumns, give error " csvheadercolumns should be less than max columns <value>"
    - if maxcolumns > threashold, give error "max columns cannot be more than threashold(20,000)"
    2) User configures csvheadercolumns
    - if csvheadercolumns >= maxcolumns(default)
        maxcolumns = csvheadercolumns+1
    - if csvheadercolumns >= threashold, give error  " csvheadercolumns should be less than max threashold(20,000)"
    3) User configures nothing
    - if csvheadercolumns >= maxcolumns(default)
        maxcolumns = csvheadercolumns+1
    - if csvheadercolumns >= threashold, give error  " csvheadercolumns should be less than max threashold(20,000)"



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [hidden email] or file a JIRA ticket
with INFRA.
---
Reply | Threaded
Open this post in threaded view
|

[GitHub] incubator-carbondata issue #802: [CARBONDATA-926] Changed max columns from s...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/incubator-carbondata/pull/802
 
    Build Failed  with Spark 1.6.2, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/1657/



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [hidden email] or file a JIRA ticket
with INFRA.
---
Reply | Threaded
Open this post in threaded view
|

[GitHub] incubator-carbondata issue #802: [CARBONDATA-926] Changed max columns from s...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/incubator-carbondata/pull/802
 
    Build Success with Spark 1.6.2, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/1663/



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [hidden email] or file a JIRA ticket
with INFRA.
---
Reply | Threaded
Open this post in threaded view
|

[GitHub] incubator-carbondata issue #802: [CARBONDATA-926] Changed max columns from s...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user gvramana commented on the issue:

    https://github.com/apache/incubator-carbondata/pull/802
 
    LGTM


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [hidden email] or file a JIRA ticket
with INFRA.
---
Reply | Threaded
Open this post in threaded view
|

[GitHub] incubator-carbondata pull request #802: [CARBONDATA-926] Changed max columns...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user asfgit closed the pull request at:

    https://github.com/apache/incubator-carbondata/pull/802


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [hidden email] or file a JIRA ticket
with INFRA.
---