Apache CarbonData Dev Mailing List archive › Apache CarbonData JIRA issues

[GitHub] [carbondata] MarvinLitt opened a new pull request #3518: [DOC] add performance-tuning with codegen parameters support

Classic

List

Threaded

37 messages Options

GitBox

[GitHub] [carbondata] MarvinLitt opened a new pull request #3518: [DOC] add performance-tuning with codegen parameters support

MarvinLitt opened a new pull request #3518: [DOC] add performance-tuning with codegen parameters support
URL: https://github.com/apache/carbondata/pull/3518

Be sure to do all of the following checklist to help us incorporate
your contribution quickly and easily:

- [ ] Any interfaces changed?

- [ ] Any backward compatibility impacted?

- [ ] Document update required?

- [ ] Testing done
Please provide details on
- Whether new unit test cases have been added or why no new tests are required?
- How it is tested? Please attach test report.
- Is it a performance related change? Please attach the performance test report.
- Any additional information to help reviewers in testing this change.

- [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]

With regards,
Apache Git Services

GitBox

[GitHub] [carbondata] jackylk commented on a change in pull request #3518: [DOC] add performance-tuning with codegen parameters support

jackylk commented on a change in pull request #3518: [DOC] add performance-tuning with codegen parameters support
URL: https://github.com/apache/carbondata/pull/3518#discussion_r361096657

##########
File path: docs/performance-tuning.md
##########
@@ -173,6 +173,8 @@
| carbon.sort.temp.compressor | spark/carbonlib/carbon.properties | Data loading | Specify the name of compressor to compress the intermediate sort temporary files during sort procedure in data loading. | The optional values are 'SNAPPY','GZIP','BZIP2','LZ4','ZSTD', and empty. Specially, empty means that Carbondata will not compress the sort temp files. This parameter will be useful if you encounter disk bottleneck. |
| carbon.load.skewedDataOptimization.enabled | spark/carbonlib/carbon.properties | Data loading | Whether to enable size based block allocation strategy for data loading. | When loading, carbondata will use file size based block allocation strategy for task distribution. It will make sure that all the executors process the same size of data -- It's useful if the size of your input data files varies widely, say 1MB to 1GB. |
| carbon.load.min.size.enabled | spark/carbonlib/carbon.properties | Data loading | Whether to enable node minumun input data size allocation strategy for data loading.| When loading, carbondata will use node minumun input data size allocation strategy for task distribution. It will make sure the nodes load the minimum amount of data -- It's useful if the size of your input data files very small, say 1MB to 256MB,Avoid generating a large number of small files. |
+| spark.sql.codegen.wholeStage | spark/conf/spark-defaults.conf | Querying | improves the execution performance of a query by collapsing a query tree into a single optimized function that eliminates virtual function calls and leverages CPU registers for intermediate data. | The whole stage CodeGen mechanism introduced by spark SQL in version 2. X causes. This configuration is recommended to be off at spark 2.1 and on at spark 2.3. Because under spark2.1 user can only use spark.sql.codegen.wholeStage to control whether to use codegen, but can not config the size of the method. In fact, this parameter should be configured to be the same as the local JDK. Under spark2.3 support spark.sql.codegen.hugeMethodLimit use can use that to config the method size. |

Review comment:
This is spark configuration, suggest not to add in carbon's document. Or maybe you can add a link in the bottom of this section to point to the performance tuning page of spark community

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]

With regards,
Apache Git Services

GitBox