Load data into carbondata executors distributed unevenly
Posted by
a on
URL: http://apache-carbondata-dev-mailing-list-archive.168.s1.nabble.com/Load-data-into-carbondata-executors-distributed-unevenly-tp9831.html
Hello!
Test result:
When I load CSV data into a CarbonData table three times, the load tasks are distributed unevenly across the executors. My
goal is one task per node, but in practice some nodes get 2 tasks and some nodes get none.
See load data 1.png, load data 2.png, and load data 3.png.
carbondata data.PNG shows the data structure in Hadoop.
Loading 400,000,000 records into the CarbonData table takes 2629 seconds, which is too long.
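For scale, here is a rough throughput check using only the two figures quoted above (400,000,000 records, 2629 seconds). It is a minimal sketch with integer arithmetic. The variable names are illustrative, and the result is an average over the whole cluster, not a per-executor measurement.

```shell
#!/bin/sh
# Figures taken from the post above.
records=400000000   # rows loaded
seconds=2629        # wall-clock load time in seconds

# Integer division: average rows loaded per second across the whole cluster.
rate=$((records / seconds))
# Elapsed time in whole minutes (truncated).
minutes=$((seconds / 60))

echo "overall: ${rate} rows/s over about ${minutes} min"
```

That is about 152,000 records per second in total, over roughly 43 minutes.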
Question:
How can I make the load tasks distribute evenly across the executors?
The environment:
Spark 2.1 + CarbonData 1.1, with 7 datanodes.
./bin/spark-shell \
--master yarn \
--deploy-mode client \
--num-executors n \
--executor-cores 10 \
--executor-memory 40G \
--driver-memory 8G
(n was 7 for the first load (result in load data 1.png), 6 for the second load (result in load data 2.png), and 8 for the third load (result in load data 3.png).)
carbon.properties
######## DataLoading Configuration ########
carbon.sort.file.buffer.size=20
carbon.graph.rowset.size=10000
carbon.number.of.cores.while.loading=10
carbon.sort.size=50000
carbon.number.of.cores.while.compacting=10
carbon.number.of.cores=10
Best regards!