[ https://issues.apache.org/jira/browse/CARBONDATA-4187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17343932#comment-17343932 ]

Sushant Sammanwar commented on CARBONDATA-4187:
-----------------------------------------------

Below is my carbon.properties file:

carbon.lock.retries=15
spark.hadoop.javax.jdo.option.ConnectionDriverName=org.postgresql.Driver
upload.threads=256
spark.deploy.zookeeper.url=zookeeper:2181
carbon.lock.retry.timeout.sec=1
spark.hadoop.javax.jdo.option.ConnectionURL=jdbc:postgresql://postgres:5432/postgres
query.max.parallel=32
data.location=/opt/basecamp/timeseries/diamond/warehouse
spark.files.maxPartitionBytes=16777216
http.port=30014
import.max.parallel=8
carbon.unsafe.working.memory.in.mb=4958
http.max-request-size=1000000
carbon.enable.auto.load.merge=false
schedule.threads=256
spark.master.cores.ratio=1
telnet.port=31008
sort.inmemory.size.inmb=2125
cluster.master.host=timeseries-0.timeseries
max.rest.call=256
database.read.url=diamonddb://diamond-db-read:30110
carbon.lock.path=LogPath
spark.hadoop.javax.jdo.option.ConnectionPassword=postgres
carbon.lock.type=ZOOKEEPERLOCK
mv=false
spark.sql.autoBroadcastJoinThreshold=1024288000
carbon.compaction.level.threshold=10,6
spark.deploy.zookeeper.url=zookeeper:2181
cluster.master.port=30014
carbon.segment.lock.files.preserve.hours=1
database.url=diamonddb-direct://localhost
spark.hadoop.javax.jdo.option.ConnectionUserName=postgres
carbon.push.rowfilters.for.vector=true

Below is my spark-defaults config:

spark.hadoop.javax.jdo.option.ConnectionDriverName=org.postgresql.Driver
spark.deploy.zookeeper.url=zookeeper:2181
spark.hadoop.javax.jdo.option.ConnectionURL=jdbc:postgresql://postgres:5432/postgres
spark.files.maxPartitionBytes=16777216
spark.master.cores.ratio=1
spark.hadoop.javax.jdo.option.ConnectionPassword=postgres
spark.sql.autoBroadcastJoinThreshold=1024288000
spark.hadoop.javax.jdo.option.ConnectionUserName=postgres
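A side note on the configuration above: some carbon.* settings can also be overridden per Spark SQL session with SET/RESET instead of editing carbon.properties and restarting. A minimal sketch, assuming a spark-sql/beeline session against the same instance and that the chosen property is one of CarbonData's dynamically configurable properties (not all of the ones listed above are):

SET carbon.enable.auto.load.merge=false;  -- session-level override of a value from the file above
RESET;                                    -- drop session overrides, fall back to carbon.properties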
> Performance Issue with Materialized views - increased loading time due to full refresh
> ---------------------------------------------------------------------------------------
>
>                 Key: CARBONDATA-4187
>                 URL: https://issues.apache.org/jira/browse/CARBONDATA-4187
>             Project: CarbonData
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 2.1.0
>            Reporter: Sushant Sammanwar
>            Priority: Major
>              Labels: materializedviews, performance
>
> Hi Team,
> We have been doing a POC with Carbon 2.1.0, using a wrapper written around Carbon and deployed as a docker container.
> Concurrent data loading is happening in many tables.
> Our objective is to get optimal performance for aggregated queries by using materialized views.
> Our observation is that after creating MVs, data loading becomes slow and cannot keep pace with the incoming data.
> The process also consumes a lot of memory once the MVs are created.
> Data is received continuously, and the MVs are refreshed on every load, which increases the load time.
> Ideally an MV should only perform an incremental refresh, as it does not need to recalculate old data.
> Instead, a full refresh appears to be causing the high memory usage and the increased loading time.
> Testing involved loading data without MVs for 6 hours, then creating the MVs and loading data again for 4 hours.
> Loading time with MVs increased, creating a backlog of data (only about 1/5th of the expected number of rows was loaded).
> The major bottlenecks observed are:
> 1. High memory consumption after creating MVs
> 2. MVs doing a full refresh
> Please find attached the details of the testing along with the list of tables.
> Below is the definition of the table:
> create table if not exists fact_365_1_eutrancell_1 (ts timestamp, metric STRING, tags_id STRING, value DOUBLE, epoch bigint) partitioned by (ts2 timestamp) STORED AS carbondata TBLPROPERTIES ('SORT_COLUMNS'='metric')
> Below is the definition of the MV:
> create materialized view if not exists fact_365_1_eutrancell_1_hour as select tags_id, metric, timeseries(ts,'hour') as ts, sum(value), avg(value), min(value), max(value) from fact_365_1_eutrancell_1 group by metric, tags_id, timeseries(ts,'hour')
> Can you suggest why MV creation slows down ingestion so much, and what can be done to improve it?
> Is there any way to have an incremental refresh of the MV - refreshing only the hour for which we are loading data?
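On the incremental-refresh question in the quoted report: a minimal sketch of how the MV's status could be inspected and a refresh triggered manually, assuming the MV management commands documented for CarbonData 2.x (SHOW MATERIALIZED VIEWS / REFRESH MATERIALIZED VIEW); whether this particular MV qualifies for incremental refresh depends on the release and on the aggregate functions it uses:

-- list MVs defined on the fact table, including their enabled/refresh status
SHOW MATERIALIZED VIEWS ON TABLE fact_365_1_eutrancell_1;

-- trigger a refresh of the hourly MV outside the data-loading path
REFRESH MATERIALIZED VIEW fact_365_1_eutrancell_1_hour;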