[GitHub] carbondata pull request #2923: [WIP] added partition columns to the last whe...

classic Classic list List threaded Threaded
21 messages Options
12
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #2923: [WIP] added partition columns to the last whe...

qiuchenjian-2
GitHub user kunal642 opened a pull request:

    https://github.com/apache/carbondata/pull/2923

    [WIP] added partition columns to the last when collecting columns

    Be sure to do all of the following checklist to help us incorporate
    your contribution quickly and easily:
   
     - [ ] Any interfaces changed?
     
     - [ ] Any backward compatibility impacted?
     
     - [ ] Document update required?
   
     - [ ] Testing done
            Please provide details on
            - Whether new unit test cases have been added or why no new tests are required?
            - How it is tested? Please attach test report.
            - Is it a performance related change? Please attach the performance test report.
            - Any additional information to help reviewers in testing this change.
           
     - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.
   


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/kunal642/carbondata partition_alter_bugfix

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/carbondata/pull/2923.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #2923
   
----
commit c399910c682f2573827214b1f76aea0225d7ecf8
Author: kunal642 <kunalkapoor642@...>
Date:   2018-11-15T04:40:00Z

    added partition columns to the last when collecting columns

----


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2923: [WIP] Fixed dataload failure when a column is droppe...

qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2923
 
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1413/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2923: [CARBONDATA-3101] Fixed dataload failure when a colu...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2923
 
    Build Failed  with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9671/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2923: [CARBONDATA-3101] Fixed dataload failure when a colu...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user kunal642 commented on the issue:

    https://github.com/apache/carbondata/pull/2923
 
    retest this please


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2923: [CARBONDATA-3101] Fixed dataload failure when a colu...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2923
 
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1419/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2923: [CARBONDATA-3101] Fixed dataload failure when a colu...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2923
 
    Build Failed with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1628/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2923: [CARBONDATA-3101] Fixed dataload failure when a colu...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2923
 
    Build Failed  with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9676/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2923: [CARBONDATA-3101] Fixed dataload failure when a colu...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2923
 
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1427/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2923: [CARBONDATA-3101] Fixed dataload failure when a colu...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2923
 
    Build Failed with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1636/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2923: [CARBONDATA-3101] Fixed dataload failure when a colu...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2923
 
    Build Failed  with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9684/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2923: [CARBONDATA-3101] Fixed dataload failure when a colu...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2923
 
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1428/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2923: [CARBONDATA-3101] Fixed dataload failure when a colu...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2923
 
    Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1637/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2923: [CARBONDATA-3101] Fixed dataload failure when a colu...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2923
 
    Build Success with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9685/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2923: [CARBONDATA-3101] Fixed dataload failure when a colu...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user kunal642 commented on the issue:

    https://github.com/apache/carbondata/pull/2923
 
    @ravipesala @manishgupta88 Please review


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #2923: [CARBONDATA-3101] Fixed dataload failure when...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user manishgupta88 commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2923#discussion_r234499385
 
    --- Diff: integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/standardpartition/StandardPartitionTableQueryTestCase.scala ---
    @@ -437,6 +437,20 @@ test("Creation of partition table should fail if the colname in table schema and
         sql("drop datamap if exists preaggTable on table partitionTable")
       }
     
    +  test("validate data in partition table after dropping and adding a column") {
    +    sql("drop table if exists par")
    +    sql("create table par(name string) partitioned by (age double) stored by " +
    +              "'carbondata'")
    +    sql(s"load data local inpath '$resourcesPath/uniqwithoutheader.csv' into table par options" +
    +        s"('header'='false')")
    +    sql("alter table par drop columns(name)")
    +    sql("alter table par add columns(name string)")
    +    sql(s"load data local inpath '$resourcesPath/uniqwithoutheader.csv' into table par options" +
    +        s"('header'='false')")
    --- End diff --
   
    keeping partition column at the end is carbondata behavior which may or may not be known to user. For a normal table whenever a column is dropped and added, the added column data should either be added as the last column in csv file or it should be mapped through fileheader which is the correct behavior.
    As you are using the same csv file in your test case without changing the order of data and providing header the above explained behavior might not hold true. Please revisit the changes and take opinion from other PMC's/Committers on this behavioral change


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #2923: [CARBONDATA-3101] Fixed dataload failure when...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user kunal642 commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2923#discussion_r234966352
 
    --- Diff: integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/standardpartition/StandardPartitionTableQueryTestCase.scala ---
    @@ -437,6 +437,20 @@ test("Creation of partition table should fail if the colname in table schema and
         sql("drop datamap if exists preaggTable on table partitionTable")
       }
     
    +  test("validate data in partition table after dropping and adding a column") {
    +    sql("drop table if exists par")
    +    sql("create table par(name string) partitioned by (age double) stored by " +
    +              "'carbondata'")
    +    sql(s"load data local inpath '$resourcesPath/uniqwithoutheader.csv' into table par options" +
    +        s"('header'='false')")
    +    sql("alter table par drop columns(name)")
    +    sql("alter table par add columns(name string)")
    +    sql(s"load data local inpath '$resourcesPath/uniqwithoutheader.csv' into table par options" +
    +        s"('header'='false')")
    --- End diff --
   
    Actually spark expects the partition column to be the last. So the solution that you are proposing will not work incase of datasource because spark will parse the schema and would always add the partition column to the last


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #2923: [CARBONDATA-3101] Fixed dataload failure when...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ravipesala commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2923#discussion_r235270774
 
    --- Diff: integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/standardpartition/StandardPartitionTableQueryTestCase.scala ---
    @@ -437,6 +437,20 @@ test("Creation of partition table should fail if the colname in table schema and
         sql("drop datamap if exists preaggTable on table partitionTable")
       }
     
    +  test("validate data in partition table after dropping and adding a column") {
    +    sql("drop table if exists par")
    +    sql("create table par(name string) partitioned by (age double) stored by " +
    +              "'carbondata'")
    +    sql(s"load data local inpath '$resourcesPath/uniqwithoutheader.csv' into table par options" +
    +        s"('header'='false')")
    +    sql("alter table par drop columns(name)")
    +    sql("alter table par add columns(name string)")
    +    sql(s"load data local inpath '$resourcesPath/uniqwithoutheader.csv' into table par options" +
    +        s"('header'='false')")
    --- End diff --
   
    @kunal642 Please check the behaviour of all version spark. Are all versions of spark behave the same way?


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #2923: [CARBONDATA-3101] Fixed dataload failure when...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user kunal642 commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2923#discussion_r235681603
 
    --- Diff: integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/standardpartition/StandardPartitionTableQueryTestCase.scala ---
    @@ -437,6 +437,20 @@ test("Creation of partition table should fail if the colname in table schema and
         sql("drop datamap if exists preaggTable on table partitionTable")
       }
     
    +  test("validate data in partition table after dropping and adding a column") {
    +    sql("drop table if exists par")
    +    sql("create table par(name string) partitioned by (age double) stored by " +
    +              "'carbondata'")
    +    sql(s"load data local inpath '$resourcesPath/uniqwithoutheader.csv' into table par options" +
    +        s"('header'='false')")
    +    sql("alter table par drop columns(name)")
    +    sql("alter table par add columns(name string)")
    +    sql(s"load data local inpath '$resourcesPath/uniqwithoutheader.csv' into table par options" +
    +        s"('header'='false')")
    --- End diff --
   
    @ravipesala Spark-2.1 and 2.2 both put partition column at the last even if a new column is added.


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2923: [CARBONDATA-3101] Fixed dataload failure when a colu...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ravipesala commented on the issue:

    https://github.com/apache/carbondata/pull/2923
 
    LGTM


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2923: [CARBONDATA-3101] Fixed dataload failure when a colu...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user manishgupta88 commented on the issue:

    https://github.com/apache/carbondata/pull/2923
 
    LGTM


---
12