Apache CarbonData Dev Mailing List archive › Apache CarbonData JIRA issues

[GitHub] carbondata pull request #2923: [WIP] added partition columns to the last whe...

Classic

List

21 messages Options

Options

12

[GitHub] carbondata pull request #2923: [WIP] added partition columns to the last whe...

GitHub user kunal642 opened a pull request:

https://github.com/apache/carbondata/pull/2923

[WIP] added partition columns to the last when collecting columns

Be sure to do all of the following checklist to help us incorporate
your contribution quickly and easily:

- [ ] Any interfaces changed?

- [ ] Any backward compatibility impacted?

- [ ] Document update required?

- [ ] Testing done
Please provide details on
- Whether new unit test cases have been added or why no new tests are required?
- How it is tested? Please attach test report.
- Is it a performance related change? Please attach the performance test report.
- Any additional information to help reviewers in testing this change.

- [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/kunal642/carbondata partition_alter_bugfix

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/carbondata/pull/2923.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2923

----
commit c399910c682f2573827214b1f76aea0225d7ecf8
Author: kunal642 <kunalkapoor642@...>
Date: 2018-11-15T04:40:00Z

added partition columns to the last when collecting columns

----

---

[GitHub] carbondata issue #2923: [WIP] Fixed dataload failure when a column is droppe...

Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2923

Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1413/

---

[GitHub] carbondata issue #2923: [CARBONDATA-3101] Fixed dataload failure when a colu...

In reply to this post by qiuchenjian-2

Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2923

Build Failed with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9671/

---

[GitHub] carbondata issue #2923: [CARBONDATA-3101] Fixed dataload failure when a colu...

In reply to this post by qiuchenjian-2

Github user kunal642 commented on the issue:

https://github.com/apache/carbondata/pull/2923

retest this please

---

[GitHub] carbondata issue #2923: [CARBONDATA-3101] Fixed dataload failure when a colu...

In reply to this post by qiuchenjian-2

Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2923

Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1419/

---

[GitHub] carbondata issue #2923: [CARBONDATA-3101] Fixed dataload failure when a colu...

In reply to this post by qiuchenjian-2

Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2923

Build Failed with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1628/

---

[GitHub] carbondata issue #2923: [CARBONDATA-3101] Fixed dataload failure when a colu...

In reply to this post by qiuchenjian-2

Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2923

Build Failed with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9676/

---

[GitHub] carbondata issue #2923: [CARBONDATA-3101] Fixed dataload failure when a colu...

In reply to this post by qiuchenjian-2

Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2923

Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1427/

---

[GitHub] carbondata issue #2923: [CARBONDATA-3101] Fixed dataload failure when a colu...

In reply to this post by qiuchenjian-2

Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2923

Build Failed with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1636/

---

[GitHub] carbondata issue #2923: [CARBONDATA-3101] Fixed dataload failure when a colu...

In reply to this post by qiuchenjian-2

Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2923

Build Failed with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9684/

---

[GitHub] carbondata issue #2923: [CARBONDATA-3101] Fixed dataload failure when a colu...

In reply to this post by qiuchenjian-2

Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2923

Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1428/

---

[GitHub] carbondata issue #2923: [CARBONDATA-3101] Fixed dataload failure when a colu...

In reply to this post by qiuchenjian-2

Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2923

Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1637/

---

[GitHub] carbondata issue #2923: [CARBONDATA-3101] Fixed dataload failure when a colu...

In reply to this post by qiuchenjian-2

Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2923

Build Success with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9685/

---

[GitHub] carbondata issue #2923: [CARBONDATA-3101] Fixed dataload failure when a colu...

In reply to this post by qiuchenjian-2

Github user kunal642 commented on the issue:

https://github.com/apache/carbondata/pull/2923

@ravipesala @manishgupta88 Please review

---

[GitHub] carbondata pull request #2923: [CARBONDATA-3101] Fixed dataload failure when...

In reply to this post by qiuchenjian-2

Github user manishgupta88 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2923#discussion_r234499385

--- Diff: integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/standardpartition/StandardPartitionTableQueryTestCase.scala ---
@@ -437,6 +437,20 @@ test("Creation of partition table should fail if the colname in table schema and
sql("drop datamap if exists preaggTable on table partitionTable")
}

+ test("validate data in partition table after dropping and adding a column") {
+ sql("drop table if exists par")
+ sql("create table par(name string) partitioned by (age double) stored by " +
+ "'carbondata'")
+ sql(s"load data local inpath '$resourcesPath/uniqwithoutheader.csv' into table par options" +
+ s"('header'='false')")
+ sql("alter table par drop columns(name)")
+ sql("alter table par add columns(name string)")
+ sql(s"load data local inpath '$resourcesPath/uniqwithoutheader.csv' into table par options" +
+ s"('header'='false')")
--- End diff --

keeping partition column at the end is carbondata behavior which may or may not be known to user. For a normal table whenever a column is dropped and added, the added column data should either be added as the last column in csv file or it should be mapped through fileheader which is the correct behavior.
As you are using the same csv file in your test case without changing the order of data and providing header the above explained behavior might not hold true. Please revisit the changes and take opinion from other PMC's/Committers on this behavioral change

---

[GitHub] carbondata pull request #2923: [CARBONDATA-3101] Fixed dataload failure when...

In reply to this post by qiuchenjian-2

Github user kunal642 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2923#discussion_r234966352

--- Diff: integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/standardpartition/StandardPartitionTableQueryTestCase.scala ---
@@ -437,6 +437,20 @@ test("Creation of partition table should fail if the colname in table schema and
sql("drop datamap if exists preaggTable on table partitionTable")
}

+ test("validate data in partition table after dropping and adding a column") {
+ sql("drop table if exists par")
+ sql("create table par(name string) partitioned by (age double) stored by " +
+ "'carbondata'")
+ sql(s"load data local inpath '$resourcesPath/uniqwithoutheader.csv' into table par options" +
+ s"('header'='false')")
+ sql("alter table par drop columns(name)")
+ sql("alter table par add columns(name string)")
+ sql(s"load data local inpath '$resourcesPath/uniqwithoutheader.csv' into table par options" +
+ s"('header'='false')")
--- End diff --

Actually spark expects the partition column to be the last. So the solution that you are proposing will not work incase of datasource because spark will parse the schema and would always add the partition column to the last

---

[GitHub] carbondata pull request #2923: [CARBONDATA-3101] Fixed dataload failure when...

In reply to this post by qiuchenjian-2

Github user ravipesala commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2923#discussion_r235270774

--- Diff: integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/standardpartition/StandardPartitionTableQueryTestCase.scala ---
@@ -437,6 +437,20 @@ test("Creation of partition table should fail if the colname in table schema and
sql("drop datamap if exists preaggTable on table partitionTable")
}

+ test("validate data in partition table after dropping and adding a column") {
+ sql("drop table if exists par")
+ sql("create table par(name string) partitioned by (age double) stored by " +
+ "'carbondata'")
+ sql(s"load data local inpath '$resourcesPath/uniqwithoutheader.csv' into table par options" +
+ s"('header'='false')")
+ sql("alter table par drop columns(name)")
+ sql("alter table par add columns(name string)")
+ sql(s"load data local inpath '$resourcesPath/uniqwithoutheader.csv' into table par options" +
+ s"('header'='false')")
--- End diff --

@kunal642 Please check the behaviour of all version spark. Are all versions of spark behave the same way?

---

[GitHub] carbondata pull request #2923: [CARBONDATA-3101] Fixed dataload failure when...

In reply to this post by qiuchenjian-2

Github user kunal642 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2923#discussion_r235681603

--- Diff: integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/standardpartition/StandardPartitionTableQueryTestCase.scala ---
@@ -437,6 +437,20 @@ test("Creation of partition table should fail if the colname in table schema and
sql("drop datamap if exists preaggTable on table partitionTable")
}

+ test("validate data in partition table after dropping and adding a column") {
+ sql("drop table if exists par")
+ sql("create table par(name string) partitioned by (age double) stored by " +
+ "'carbondata'")
+ sql(s"load data local inpath '$resourcesPath/uniqwithoutheader.csv' into table par options" +
+ s"('header'='false')")
+ sql("alter table par drop columns(name)")
+ sql("alter table par add columns(name string)")
+ sql(s"load data local inpath '$resourcesPath/uniqwithoutheader.csv' into table par options" +
+ s"('header'='false')")
--- End diff --

@ravipesala Spark-2.1 and 2.2 both put partition column at the last even if a new column is added.

---

[GitHub] carbondata issue #2923: [CARBONDATA-3101] Fixed dataload failure when a colu...

In reply to this post by qiuchenjian-2

Github user ravipesala commented on the issue:

https://github.com/apache/carbondata/pull/2923

LGTM

---

[GitHub] carbondata issue #2923: [CARBONDATA-3101] Fixed dataload failure when a colu...

In reply to this post by qiuchenjian-2

Github user manishgupta88 commented on the issue:

https://github.com/apache/carbondata/pull/2923

LGTM

---

12