Apache CarbonData Dev Mailing List archive

Load data into carbondata executors distributed unevenly

Posted by a on
URL: http://apache-carbondata-dev-mailing-list-archive.168.s1.nabble.com/Load-data-into-carbondata-executors-distributed-unevenly-tp9831.html

Hello!

Test result:

When I load CSV data into a CarbonData table three times, the load tasks are distributed unevenly across the executors. My goal is one task per node, but in practice some nodes get 2 tasks and some nodes get none.

See load data 1.png, load data 2.png, and load data 3.png.

carbondata data.PNG shows the data structure in Hadoop.

Loading 400,000,000 records into the CarbonData table takes 2629 seconds, which is too long.
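To put that load time in perspective, here is a rough back-of-the-envelope throughput calculation. It is only a sketch: the record count, the elapsed time, and the 7 datanodes are the figures from this post, while the per-node number assumes every node takes an equal share of the work, which is exactly what does not happen in my tests.

```python
# Rough load throughput from the figures quoted in this post.
RECORDS = 400_000_000   # rows loaded
ELAPSED_S = 2629        # load time in seconds
DATANODES = 7           # datanodes in the cluster


def throughput(records: int, elapsed_s: float) -> float:
    """Average number of rows loaded per second."""
    return records / elapsed_s


overall = throughput(RECORDS, ELAPSED_S)
# Assumption: the load is spread evenly over all datanodes.
per_node = overall / DATANODES

print(f"overall: {overall:,.0f} rows/s")
print(f"per node if evenly spread: {per_node:,.0f} rows/s")
```

If some nodes run 2 tasks while others run none, the busy nodes have to sustain roughly double the even-split rate, so the skew probably accounts for part of the long load time.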

Question:

How can I make the tasks distribute evenly across the executors?

The environment:

Spark 2.1 + CarbonData 1.1, with 7 datanodes.

./bin/spark-shell \
--master yarn \
--deploy-mode client \
--num-executors n \
--executor-cores 10 \
--executor-memory 40G \
--driver-memory 8G

Here n was 7 for the first load (result in load data 1.png), 6 for the second load (result in load data 2.png), and 8 for the third load (result in load data 3.png).

carbon.properties

######## DataLoading Configuration ########

carbon.sort.file.buffer.size=20

carbon.graph.rowset.size=10000

carbon.number.of.cores.while.loading=10

carbon.sort.size=50000

carbon.number.of.cores.while.compacting=10

carbon.number.of.cores=10

Best regards!