Vyom Rastogi created CARBONDATA-662:
--------------------------------------- Summary: Minor Compaction results of count of individual column is incorrect Key: CARBONDATA-662 URL: https://issues.apache.org/jira/browse/CARBONDATA-662 Project: CarbonData Issue Type: Bug Components: data-load Affects Versions: 1.0.0-incubating Reporter: Vyom Rastogi Priority: Minor Carbon_Properties: carbon.allowed.compaction.days = 2 carbon.enable.auto.load.merge = true carbon.compaction.level.threshold = 4,3 carbon.numberof.preserve.segments = 2 carbon.sort.file.buffer.size = 20 carbon.leaf.node.size = 120000 max.query.execution.time = 60 carbon.number.of.cores.while.loading = 6 carbon.number.of.cores = 4 carbon.sort.size = 500000 Steps for compaction : 1)Create table uniqdata: CREATE TABLE uniqdata (CUST_ID int,CUST_NAME String,ACTIVE_EMUI_VERSION string, DOB timestamp, DOJ timestamp, BIGINT_COLUMN1 bigint,BIGINT_COLUMN2 bigint,DECIMAL_COLUMN1 decimal(30,10), DECIMAL_COLUMN2 decimal(36,10),Double_COLUMN1 double, Double_COLUMN2 double,INTEGER_COLUMN1 int) STORED BY 'org.apache.carbondata.format'; 2)load the Uniqdata: LOAD DATA INPATH 'hdfs://hacluster/vyom/2000_UniqData.csv' into table uniqdata OPTIONS('DELIMITER'=',' , 'QUOTECHAR'='"','BAD_RECORDS_ACTION'='FORCE','FILEHEADER'='CUST_ID,CUST_NAME,ACTIVE_EMUI_VERSION,DOB,DOJ,BIGINT_COLUMN1,BIGINT_COLUMN2,DECIMAL_COLUMN1,DECIMAL_COLUMN2,Double_COLUMN1,Double_COLUMN2,INTEGER_COLUMN1'); 3)Select the count of cust_name by filtering Cust_id 0: jdbc:hive2://172.168.100.212:23040> select count( (CUST_NAME)) from uniqdata where CUST_ID=9000 ; +------+--+ | _c0 | +------+--+ | 1 | +------+--+ 4)load the Uniqdata: LOAD DATA INPATH 'hdfs://hacluster/vyom/2000_UniqData.csv' into table uniqdata OPTIONS('DELIMITER'=',' , 'QUOTECHAR'='"','BAD_RECORDS_ACTION'='FORCE','FILEHEADER'='CUST_ID,CUST_NAME,ACTIVE_EMUI_VERSION,DOB,DOJ,BIGINT_COLUMN1,BIGINT_COLUMN2,DECIMAL_COLUMN1,DECIMAL_COLUMN2,Double_COLUMN1,Double_COLUMN2,INTEGER_COLUMN1'); 5)Select 0: jdbc:hive2://172.168.100.212:23040> select count( (CUST_NAME)) from uniqdata where CUST_ID=9000 ; +------+--+ | _c0 | +------+--+ | 2 | +------+--+ 6)Load into uniqdata 0: jdbc:hive2://172.168.100.212:23040> LOAD DATA INPATH 'hdfs://hacluster/vyom/2000_UniqData.csv' into table uniqdata OPTIONS('DELIMITER'=',' , 'QUOTECHAR'='"','BAD_RECORDS_ACTION'='FORCE','FILEHEADER'='CUST_ID,CUST_NAME,ACTIVE_EMUI_VERSION,DOB,DOJ,BIGINT_COLUMN1,BIGINT_COLUMN2,DECIMAL_COLUMN1,DECIMAL_COLUMN2,Double_COLUMN1,Double_COLUMN2,INTEGER_COLUMN1'); +---------+--+ | Result | +---------+--+ +---------+--+ No rows selected (0.327 seconds) 7) 0: jdbc:hive2://172.168.100.212:23040> select count( (CUST_NAME)) from uniqdata where CUST_ID=9000 ; +------+--+ | _c0 | +------+--+ | 3 | +------+--+ 8) 0: jdbc:hive2://172.168.100.212:23040> LOAD DATA INPATH 'hdfs://hacluster/vyom/2000_UniqData.csv' into table uniqdata OPTIONS('DELIMITER'=',' , 'QUOTECHAR'='"','BAD_RECORDS_ACTION'='FORCE','FILEHEADER'='CUST_ID,CUST_NAME,ACTIVE_EMUI_VERSION,DOB,DOJ,BIGINT_COLUMN1,BIGINT_COLUMN2,DECIMAL_COLUMN1,DECIMAL_COLUMN2,Double_COLUMN1,Double_COLUMN2,INTEGER_COLUMN1'); +---------+--+ | Result | +---------+--+ +---------+--+ 9)Select the Count 0: jdbc:hive2://172.168.100.212:23040> select count( (CUST_NAME)) from uniqdata where CUST_ID=9000 ; +------+--+ | _c0 | +------+--+ | 4 | +------+--+ 1 row selected (5.194 seconds) 10)Load the data 0: jdbc:hive2://172.168.100.212:23040> LOAD DATA INPATH 'hdfs://hacluster/vyom/2000_UniqData.csv' into table uniqdata OPTIONS('DELIMITER'=',' , 'QUOTECHAR'='"','BAD_RECORDS_ACTION'='FORCE','FILEHEADER'='CUST_ID,CUST_NAME,ACTIVE_EMUI_VERSION,DOB,DOJ,BIGINT_COLUMN1,BIGINT_COLUMN2,DECIMAL_COLUMN1,DECIMAL_COLUMN2,Double_COLUMN1,Double_COLUMN2,INTEGER_COLUMN1'); +---------+--+ | Result | +---------+--+ +---------+--+ No rows selected (0.342 seconds) 11)Select into cust_name 0: jdbc:hive2://172.168.100.212:23040> select count( (CUST_NAME)) from uniqdata where CUST_ID=9000 ; +------+--+ | _c0 | +------+--+ | 5 | +------+--+ 1 row selected (5.19 seconds) 12)Show Segments for table uniqdata; +--------------------+------------------+--------------------------+--------------------------+--+ | SegmentSequenceId | Status | Load Start Time | Load End Time | +--------------------+------------------+--------------------------+--------------------------+--+ | 5 | Partial Success | 2017-01-19 16:33:14.006 | 2017-01-19 16:33:14.186 | | 4 | Partial Success | 2017-01-19 16:32:49.484 | 2017-01-19 16:32:49.691 | | 3 | Compacted | 2017-01-19 16:32:41.044 | 2017-01-19 16:32:41.252 | | 2 | Compacted | 2017-01-19 16:32:36.952 | 2017-01-19 16:32:37.159 | | 1 | Compacted | 2017-01-19 16:32:33.251 | 2017-01-19 16:32:33.447 | | 0.1 | Success | 2017-01-19 16:33:14.205 | 2017-01-19 16:33:14.531 | | 0 | Compacted | 2017-01-19 16:32:08.457 | 2017-01-19 16:32:08.657 | +--------------------+------------------+--------------------------+--------------------------+--+ 13)Select the cust_name by Cust_id ideally Count should be 6 but it was coming as 14 0: jdbc:hive2://172.168.100.212:23040> select count( (CUST_NAME)) from uniqdata where CUST_ID=9000 ; +------+--+ | _c0 | +------+--+ | 14 | +------+--+ -- This message was sent by Atlassian JIRA (v6.3.4#6332) |
Free forum by Nabble | Edit this page |