[ https://issues.apache.org/jira/browse/CARBONDATA-1029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vinod Rohilla updated CARBONDATA-1029:
--------------------------------------
Description:
Loading data without single pass takes less time than a single-pass load.
Note: CSV size is 4.00 GB.

Result:
A) Data load without single pass:
0: jdbc:hive2://localhost:10000> LOAD DATA INPATH 'hdfs://hadoop-master:54310/data/uniqdata_bench14.csv' into table uniqdata OPTIONS('DELIMITER'=',' , 'QUOTECHAR'='"','BAD_RECORDS_ACTION'='FORCE','FILEHEADER'='CUST_ID,CUST_NAME,ACTIVE_EMUI_VERSION,DOB,DOJ,BIGINT_COLUMN1,BIGINT_COLUMN2,DECIMAL_COLUMN1,DECIMAL_COLUMN2,Double_COLUMN1,Double_COLUMN2,INTEGER_COLUMN1');
+---------+--+
| Result  |
+---------+--+
+---------+--+
No rows selected (114.641 seconds)

B) Data load with single pass:
0: jdbc:hive2://localhost:10000> LOAD DATA INPATH 'hdfs://hadoop-master:54310/data/uniqdata_bench14.csv' into table uniqdata OPTIONS('DELIMITER'=',' , 'QUOTECHAR'='"','BAD_RECORDS_ACTION'='FORCE','FILEHEADER'='CUST_ID,CUST_NAME,ACTIVE_EMUI_VERSION,DOB,DOJ,BIGINT_COLUMN1,BIGINT_COLUMN2,DECIMAL_COLUMN1,DECIMAL_COLUMN2,Double_COLUMN1,Double_COLUMN2,INTEGER_COLUMN1','SINGLE_Pass'='true');
+---------+--+
| Result  |
+---------+--+
+---------+--+
No rows selected (123.858 seconds)

Expected Result: If the user loads data with single pass, the load should take less time than a load without single pass.

was:
Loading data without single pass takes less time than a single-pass load.
Note: CSV size is 10.21 GB.
Result:
A) Data load without single pass:
0: jdbc:hive2://localhost:10000> LOAD DATA INPATH 'hdfs://hadoop-master:54310/data/uniqdata_bench14.csv' into table uniqdata OPTIONS('DELIMITER'=',' , 'QUOTECHAR'='"','BAD_RECORDS_ACTION'='FORCE','FILEHEADER'='CUST_ID,CUST_NAME,ACTIVE_EMUI_VERSION,DOB,DOJ,BIGINT_COLUMN1,BIGINT_COLUMN2,DECIMAL_COLUMN1,DECIMAL_COLUMN2,Double_COLUMN1,Double_COLUMN2,INTEGER_COLUMN1');
+---------+--+
| Result  |
+---------+--+
+---------+--+
No rows selected (114.641 seconds)

B) Data load with single pass:
0: jdbc:hive2://localhost:10000> LOAD DATA INPATH 'hdfs://hadoop-master:54310/data/uniqdata_bench14.csv' into table uniqdata OPTIONS('DELIMITER'=',' , 'QUOTECHAR'='"','BAD_RECORDS_ACTION'='FORCE','FILEHEADER'='CUST_ID,CUST_NAME,ACTIVE_EMUI_VERSION,DOB,DOJ,BIGINT_COLUMN1,BIGINT_COLUMN2,DECIMAL_COLUMN1,DECIMAL_COLUMN2,Double_COLUMN1,Double_COLUMN2,INTEGER_COLUMN1','SINGLE_Pass'='true');
+---------+--+
| Result  |
+---------+--+
+---------+--+
No rows selected (123.858 seconds)

Expected Result: If the user loads data with single pass, the load should take less time than a load without single pass.

> Load data time difference with Single-pass load.
> -------------------------------------------------
>
> Key: CARBONDATA-1029
> URL: https://issues.apache.org/jira/browse/CARBONDATA-1029
> Project: CarbonData
> Issue Type: Bug
> Components: data-load
> Affects Versions: 1.1.0
> Environment: Spark 2.1, AWS Cluster
> Reporter: Vinod Rohilla
> Priority: Minor
>

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
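For reference, the size of the regression can be worked out from the two timings in the description (4.00 GB CSV, one run of each load). This is only a sketch of the arithmetic: the variable and function names below are illustrative and are not part of CarbonData, and a single run per mode does not account for run-to-run variance.

```python
# Wall-clock times reported in the description, in seconds (one run each).
baseline_s = 114.641     # LOAD DATA without SINGLE_PASS
single_pass_s = 123.858  # LOAD DATA with 'SINGLE_Pass'='true'


def slowdown(baseline: float, candidate: float) -> tuple[float, float]:
    """Return (absolute difference in seconds, relative difference in percent)."""
    diff = candidate - baseline
    return diff, 100.0 * diff / baseline


diff_s, pct = slowdown(baseline_s, single_pass_s)
print(f"single pass is {diff_s:.3f} s slower ({pct:.1f}%)")
```

With these inputs the single-pass load comes out roughly 9.2 seconds (about 8%) slower than the load without single pass.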