[jira] [Assigned] (CARBONDATA-1387) Incorrect partition creation while inserting data from another table

classic Classic list List threaded Threaded
1 message Options
Reply | Threaded
Open this post in threaded view
|

[jira] [Assigned] (CARBONDATA-1387) Incorrect partition creation while inserting data from another table

Akash R Nilugal (Jira)

     [ https://issues.apache.org/jira/browse/CARBONDATA-1387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Prabhat Kashyap reassigned CARBONDATA-1387:
-------------------------------------------

    Assignee: Prabhat Kashyap

> Incorrect partition creation while inserting data from another table
> --------------------------------------------------------------------
>
>                 Key: CARBONDATA-1387
>                 URL: https://issues.apache.org/jira/browse/CARBONDATA-1387
>             Project: CarbonData
>          Issue Type: Bug
>          Components: data-query
>    Affects Versions: 1.2.0
>         Environment: spark 2.1
>            Reporter: Vandana Yadav
>            Assignee: Prabhat Kashyap
>
> Incorrect partition creation while inserting data from another table.
> Description: While inserting data from another table no of rows in each partition remain same although no of rows in partitioned table get increased.(no of rows in each partition should also increase as we are inserting new data into the partitioned table.
> Steps to reproduce:
> 1) Create Partitioned table:
> CREATE TABLE uniqdata_part (CUST_ID int,CUST_NAME String,ACTIVE_EMUI_VERSION string,DOJ timestamp, BIGINT_COLUMN1 bigint,BIGINT_COLUMN2 bigint,DECIMAL_COLUMN1 decimal(30,10), DECIMAL_COLUMN2 decimal(36,10),Double_COLUMN1 double, Double_COLUMN2 double,INTEGER_COLUMN1 int) PARTITIONED BY (DOB Timestamp) STORED BY 'org.apache.carbondata.format' TBLPROPERTIES ('PARTITION_TYPE'='RANGE','RANGE_INFO'='1971-01-01 01:00:03, 1972-01-01 01:00:03, 1974-01-01 01:00:03',"TABLE_BLOCKSIZE"= "256 MB")
> 2) Load data into the partitioned table:
> LOAD DATA INPATH 'hdfs://localhost:54310/uniqdata/2000_UniqData.csv' into table uniqdata_part OPTIONS('DELIMITER'=',' , 'QUOTECHAR'='"','BAD_RECORDS_ACTION'='FORCE','FILEHEADER'='CUST_ID,CUST_NAME,ACTIVE_EMUI_VERSION,DOB,DOJ,BIGINT_COLUMN1,BIGINT_COLUMN2,DECIMAL_COLUMN1,DECIMAL_COLUMN2,Double_COLUMN1,Double_COLUMN2,INTEGER_COLUMN1')
> 3) Create another table:
> CREATE TABLE uniqdata_1 (CUST_ID int,CUST_NAME String,ACTIVE_EMUI_VERSION string, DOB timestamp, DOJ timestamp, BIGINT_COLUMN1 bigint,BIGINT_COLUMN2 bigint,DECIMAL_COLUMN1 decimal(30,10), DECIMAL_COLUMN2 decimal(36,10),Double_COLUMN1 double, Double_COLUMN2 double,INTEGER_COLUMN1 int) STORED BY 'org.apache.carbondata.format' TBLPROPERTIES ("TABLE_BLOCKSIZE"= "256 MB")
> 4) Load data into this table:
> LOAD DATA INPATH 'hdfs://localhost:54310/uniqdata/2000_UniqData.csv' into table uniqdata_1 OPTIONS('DELIMITER'=',' , 'QUOTECHAR'='"','BAD_RECORDS_ACTION'='FORCE','FILEHEADER'='CUST_ID,CUST_NAME,ACTIVE_EMUI_VERSION,DOB,DOJ,BIGINT_COLUMN1,BIGINT_COLUMN2,DECIMAL_COLUMN1,DECIMAL_COLUMN2,Double_COLUMN1,Double_COLUMN2,INTEGER_COLUMN1')
> 5)Execute Queries:
> a) show partitions query:
> show partitions uniqdata_part
> Output:
> 0, dob = DEFAULT                                    
>  1, dob < 1971-01-01 01:00:03                        
>  2, 1971-01-01 01:00:03 <= dob < 1972-01-01 01:00:03  
>  3, 1972-01-01 01:00:03 <= dob < 1974-01-01 01:00:03
> b) Query for row count in the partitioned table:
> select count(*) from uniqdata_part
> Output:
>  count(1)  |
> +-----------+--+
> | 2013
> c)query for row count in partition 0:
> select count(*) from uniqdata_part where dob >= '1974-01-01 01:00:03'
> Output:
>  count(1)  |
> +-----------+--+
> | 539
> d) query for row count in partition 1 :
> select count(*) from uniqdata_part where dob < '1971-01-01 01:00:03'
> Output:
>  count(1)  |
> +-----------+--+
> | 366  
> e) query for row count in partition 3:
> select count(*) from uniqdata_part where dob >= '1971-01-01 01:00:03' and dob < '1972-01-01 01:00:03'
> Output:
>  count(1)  |
> +-----------+--+
> | 365
> f) query for row count in partition 4:
> select count(*) from uniqdata_part where dob >= '1972-01-01 01:00:03' and dob < '1974-01-01 01:00:03'
> Output:
> count(1)  |
> +-----------+--+
> | 731
> g) Insert data in partitioned table through the normal table:
> insert into uniqdata_part select * from uniqdata_1;
> h) Query for row count in the partitioned table after insertion operation:
> select count(*) from uniqdata_part
> Output:
>  count(1)  |
> +-----------+--+
> | 4026  
> i) Repeat queries from (c) to (f) for row count in the each partition.
> 6) Actual Result: it shows the same row count for each partition but the partitioned table has more rows in it.
> 7)Expected Result: No of rows in each partition should increase as no of rows increases in partitioned table



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)