Re: Query About Carbon Write Process: why are 10 tasks always created when we write a DataFrame or RDD in Carbon format in a write or save job
Posted by Jacky Li on May 26, 2019; 4:17am
URL: http://apache-carbondata-dev-mailing-list-archive.168.s1.nabble.com/Query-About-Carbon-Write-Process-why-always-10-Task-get-created-when-we-write-dataframe-or-rdd-in-cab-tp79200p79444.html
Hi Anshul Jain,
If you have specified the SORT_COLUMNS table property when creating the table,
Carbon will by default sort the input data during data loading (to build the
index). The sorting is controlled by a table property called SORT_SCOPE; the
default is LOCAL_SORT, which means the data is sorted locally within each
Spark executor, without shuffling across executors. There are other options
too, see
http://carbondata.apache.org/ddl-of-carbondata.html
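
For illustration, a minimal sketch of such a table definition. The table and
column names are made up, it assumes a SparkSession with CarbonData support
available as `spark`, and the exact DDL clause (STORED AS carbondata here)
can differ between CarbonData versions:

  // Hypothetical table: SORT_COLUMNS chooses the index columns,
  // SORT_SCOPE chooses how they are sorted during load.
  spark.sql("""
    CREATE TABLE sales (id INT, city STRING, amount DOUBLE)
    STORED AS carbondata
    TBLPROPERTIES (
      'SORT_COLUMNS' = 'city, id',
      'SORT_SCOPE'   = 'LOCAL_SORT'
    )
  """)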
In your case, I guess it is using LOCAL_SORT. This sort is multi-threaded
inside the executor, controlled by a CarbonProperty called
"NUM_THREAD_WHILE_LOADING".
If you want the default Spark behavior, as when loading Parquet, you can set
SORT_SCOPE to NO_SORT.
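
For example, a sketch of a table that skips the sort step at load time
entirely (again with made-up names):

  // No sorting during load, so the write behaves like a plain
  // Parquet write in terms of task layout.
  spark.sql("""
    CREATE TABLE sales_nosort (id INT, city STRING, amount DOUBLE)
    STORED AS carbondata
    TBLPROPERTIES ('SORT_SCOPE' = 'NO_SORT')
  """)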
Regards,
Jacky