GitHub user sgururajshetty opened a pull request:
https://github.com/apache/carbondata/pull/1903

[CARBONDATA-1880] Documentation for merging small files

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/sgururajshetty/carbondata 1880_2

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/carbondata/pull/1903.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #1903

----
commit 1a0820350d57c50723070e172ec5d89edb90591b
Author: sgururajshetty <sgururajshetty@...>
Date:   2018-01-31T13:55:16Z

    Documentation for small files

commit 253b5176338e58902e057300eb812e704fa51113
Author: sgururajshetty <sgururajshetty@...>
Date:   2018-02-01T11:58:06Z

    Fixed the review comment from QiangCai
----

---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/1903

Can one of the admins verify this patch?

---
Github user ravipesala commented on the issue:
https://github.com/apache/carbondata/pull/1903

Can one of the admins verify this patch?

---
Github user QiangCai commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/1903#discussion_r165542386

--- Diff: docs/configuration-parameters.md ---
@@ -61,6 +61,7 @@ This section provides the details of all the configurations required for CarbonD
 | carbon.options.bad.record.path | | Specifies the HDFS path where bad records are stored. By default the value is Null. This path must be configured by the user if the bad record logger is enabled or the bad record action is set to redirect. | |
 | carbon.enable.vector.reader | true | This parameter increases the performance of select queries, as it fetches a columnar batch of 4*1024 rows instead of fetching data row by row. | |
 | carbon.blockletgroup.size.in.mb | 64 MB | The data is read as a group of blocklets, called a blocklet group. This parameter specifies the size of the blocklet group. A higher value results in better sequential IO access. The minimum value is 16 MB; any value smaller than 16 MB is reset to the default value (64 MB). | |
+| carbon.task.distribution | block | **block**: launches one task per block. Suggested for concurrent queries and queries with heavy shuffling. **custom**: groups the blocks and distributes them uniformly across the available resources in the cluster. This enhances query performance but is not suggested for concurrent queries and queries with heavy shuffling. **blocklet**: launches one task per blocklet. Suggested for concurrent queries and queries with heavy shuffling. **merge_small_files**: merges all small partitions up to a size of 128 MB during querying. The small partitions are combined into one map task to reduce the number of read tasks, which enhances performance. | |
--- End diff --

1. carbon.custom.block.distribution in this file is unused; please remove it.
2. 128 MB is the default value of spark.sql.files.maxPartitionBytes. The user can configure this Spark setting.

---
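For context, a minimal sketch of how the two settings discussed in this review might be applied together, assuming CarbonData and Spark are on the classpath. The property names (carbon.task.distribution, spark.sql.files.maxPartitionBytes) come from the diff and review comments above; the surrounding application code and names are illustrative only, not taken from the PR:

    import org.apache.spark.sql.SparkSession
    import org.apache.carbondata.core.util.CarbonProperties

    // Illustrative example (hypothetical object name), not part of the PR.
    object MergeSmallFilesExample {
      def main(args: Array[String]): Unit = {
        // Select the merge_small_files task distribution described in the diff;
        // the alternatives are "block" (the default), "blocklet", and "custom".
        CarbonProperties.getInstance()
          .addProperty("carbon.task.distribution", "merge_small_files")

        val spark = SparkSession.builder()
          .appName("MergeSmallFilesExample")
          // 128 MB is Spark's default for this option; merge_small_files
          // combines small files up to this limit, so changing it changes
          // the merge target (per QiangCai's second review comment).
          .config("spark.sql.files.maxPartitionBytes", 128L * 1024 * 1024)
          .getOrCreate()

        // ... run queries against a CarbonData table here ...

        spark.stop()
      }
    }

Setting spark.sql.files.maxPartitionBytes explicitly, rather than relying on the documented 128 MB figure, makes the merge target visible and tunable, which is the point of the second review comment.

---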