GitHub user kunal642 opened a pull request:
https://github.com/apache/carbondata/pull/2923 [WIP] added partition columns to the last when collecting columns Be sure to do all of the following checklist to help us incorporate your contribution quickly and easily: - [ ] Any interfaces changed? - [ ] Any backward compatibility impacted? - [ ] Document update required? - [ ] Testing done Please provide details on - Whether new unit test cases have been added or why no new tests are required? - How it is tested? Please attach test report. - Is it a performance related change? Please attach the performance test report. - Any additional information to help reviewers in testing this change. - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA. You can merge this pull request into a Git repository by running: $ git pull https://github.com/kunal642/carbondata partition_alter_bugfix Alternatively you can review and apply these changes as the patch at: https://github.com/apache/carbondata/pull/2923.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2923 ---- commit c399910c682f2573827214b1f76aea0225d7ecf8 Author: kunal642 <kunalkapoor642@...> Date: 2018-11-15T04:40:00Z added partition columns to the last when collecting columns ---- --- |
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2923 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1413/ --- |
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2923 Build Failed with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9671/ --- |
In reply to this post by qiuchenjian-2
Github user kunal642 commented on the issue:
https://github.com/apache/carbondata/pull/2923 retest this please --- |
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2923 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1419/ --- |
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2923 Build Failed with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1628/ --- |
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2923 Build Failed with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9676/ --- |
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2923 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1427/ --- |
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2923 Build Failed with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1636/ --- |
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2923 Build Failed with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9684/ --- |
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2923 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1428/ --- |
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2923 Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1637/ --- |
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2923 Build Success with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9685/ --- |
In reply to this post by qiuchenjian-2
Github user kunal642 commented on the issue:
https://github.com/apache/carbondata/pull/2923 @ravipesala @manishgupta88 Please review --- |
In reply to this post by qiuchenjian-2
Github user manishgupta88 commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2923#discussion_r234499385 --- Diff: integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/standardpartition/StandardPartitionTableQueryTestCase.scala --- @@ -437,6 +437,20 @@ test("Creation of partition table should fail if the colname in table schema and sql("drop datamap if exists preaggTable on table partitionTable") } + test("validate data in partition table after dropping and adding a column") { + sql("drop table if exists par") + sql("create table par(name string) partitioned by (age double) stored by " + + "'carbondata'") + sql(s"load data local inpath '$resourcesPath/uniqwithoutheader.csv' into table par options" + + s"('header'='false')") + sql("alter table par drop columns(name)") + sql("alter table par add columns(name string)") + sql(s"load data local inpath '$resourcesPath/uniqwithoutheader.csv' into table par options" + + s"('header'='false')") --- End diff -- keeping partition column at the end is carbondata behavior which may or may not be known to user. For a normal table whenever a column is dropped and added, the added column data should either be added as the last column in csv file or it should be mapped through fileheader which is the correct behavior. As you are using the same csv file in your test case without changing the order of data and providing header the above explained behavior might not hold true. Please revisit the changes and take opinion from other PMC's/Committers on this behavioral change --- |
In reply to this post by qiuchenjian-2
Github user kunal642 commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2923#discussion_r234966352 --- Diff: integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/standardpartition/StandardPartitionTableQueryTestCase.scala --- @@ -437,6 +437,20 @@ test("Creation of partition table should fail if the colname in table schema and sql("drop datamap if exists preaggTable on table partitionTable") } + test("validate data in partition table after dropping and adding a column") { + sql("drop table if exists par") + sql("create table par(name string) partitioned by (age double) stored by " + + "'carbondata'") + sql(s"load data local inpath '$resourcesPath/uniqwithoutheader.csv' into table par options" + + s"('header'='false')") + sql("alter table par drop columns(name)") + sql("alter table par add columns(name string)") + sql(s"load data local inpath '$resourcesPath/uniqwithoutheader.csv' into table par options" + + s"('header'='false')") --- End diff -- Actually spark expects the partition column to be the last. So the solution that you are proposing will not work incase of datasource because spark will parse the schema and would always add the partition column to the last --- |
In reply to this post by qiuchenjian-2
Github user ravipesala commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2923#discussion_r235270774 --- Diff: integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/standardpartition/StandardPartitionTableQueryTestCase.scala --- @@ -437,6 +437,20 @@ test("Creation of partition table should fail if the colname in table schema and sql("drop datamap if exists preaggTable on table partitionTable") } + test("validate data in partition table after dropping and adding a column") { + sql("drop table if exists par") + sql("create table par(name string) partitioned by (age double) stored by " + + "'carbondata'") + sql(s"load data local inpath '$resourcesPath/uniqwithoutheader.csv' into table par options" + + s"('header'='false')") + sql("alter table par drop columns(name)") + sql("alter table par add columns(name string)") + sql(s"load data local inpath '$resourcesPath/uniqwithoutheader.csv' into table par options" + + s"('header'='false')") --- End diff -- @kunal642 Please check the behaviour of all version spark. Are all versions of spark behave the same way? --- |
In reply to this post by qiuchenjian-2
Github user kunal642 commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2923#discussion_r235681603 --- Diff: integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/standardpartition/StandardPartitionTableQueryTestCase.scala --- @@ -437,6 +437,20 @@ test("Creation of partition table should fail if the colname in table schema and sql("drop datamap if exists preaggTable on table partitionTable") } + test("validate data in partition table after dropping and adding a column") { + sql("drop table if exists par") + sql("create table par(name string) partitioned by (age double) stored by " + + "'carbondata'") + sql(s"load data local inpath '$resourcesPath/uniqwithoutheader.csv' into table par options" + + s"('header'='false')") + sql("alter table par drop columns(name)") + sql("alter table par add columns(name string)") + sql(s"load data local inpath '$resourcesPath/uniqwithoutheader.csv' into table par options" + + s"('header'='false')") --- End diff -- @ravipesala Spark-2.1 and 2.2 both put partition column at the last even if a new column is added. --- |
In reply to this post by qiuchenjian-2
Github user ravipesala commented on the issue:
https://github.com/apache/carbondata/pull/2923 LGTM --- |
In reply to this post by qiuchenjian-2
Github user manishgupta88 commented on the issue:
https://github.com/apache/carbondata/pull/2923 LGTM --- |
Free forum by Nabble | Edit this page |