GitHub user sgururajshetty opened a pull request:
https://github.com/apache/carbondata/pull/1903

[CARBONDATA-1880] Documentation for merging small files

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/sgururajshetty/carbondata 1880_2

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/carbondata/pull/1903.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #1903

----
commit 1a0820350d57c50723070e172ec5d89edb90591b
Author: sgururajshetty <sgururajshetty@...>
Date:   2018-01-31T13:55:16Z

    Documentation for small files

commit 253b5176338e58902e057300eb812e704fa51113
Author: sgururajshetty <sgururajshetty@...>
Date:   2018-02-01T11:58:06Z

    Fixed the review comment from QiangCai
----

---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/1903

Can one of the admins verify this patch?

---
Github user ravipesala commented on the issue:
https://github.com/apache/carbondata/pull/1903

Can one of the admins verify this patch?

---
Github user QiangCai commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/1903#discussion_r165542386

--- Diff: docs/configuration-parameters.md ---
@@ -61,6 +61,7 @@ This section provides the details of all the configurations required for CarbonD
 | carbon.options.bad.record.path | | Specifies the HDFS path where bad records are stored. By default the value is Null. This path must be configured by the user if the bad record logger is enabled or the bad record action is set to redirect. | |
 | carbon.enable.vector.reader | true | This parameter increases the performance of select queries, as it fetches a columnar batch of 4*1024 rows instead of fetching data row by row. | |
 | carbon.blockletgroup.size.in.mb | 64 MB | The data is read as a group of blocklets, called a blocklet group. This parameter specifies the size of the blocklet group. A higher value results in better sequential IO access. The minimum value is 16 MB; any value smaller than 16 MB is reset to the default value (64 MB). | |
+| carbon.task.distribution | block | **block**: launches one task per block. Suggested for concurrent queries and queries with heavy shuffling. **custom**: groups the blocks and distributes them uniformly across the available resources in the cluster. This enhances query performance but is not suggested for concurrent queries and queries with heavy shuffling. **blocklet**: launches one task per blocklet. Suggested for concurrent queries and queries with heavy shuffling. **merge_small_files**: merges all small partitions up to a size of 128 MB during querying. The small partitions are combined into one map task to reduce the number of read tasks, which enhances performance. | |
--- End diff --

1. carbon.custom.block.distribution in this file is unused; please remove it.
2. 128 MB is the default value of spark.sql.files.maxPartitionBytes. The user can configure this Spark setting.

---
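For context, a minimal sketch of how the two settings discussed in this review might be applied together, assuming CarbonData and Spark are on the classpath. The property names (carbon.task.distribution, spark.sql.files.maxPartitionBytes) come from the diff and review comments above; the surrounding application code and names are illustrative only, not taken from the PR:

    import org.apache.spark.sql.SparkSession
    import org.apache.carbondata.core.util.CarbonProperties

    // Illustrative example (hypothetical object name), not part of the PR.
    object MergeSmallFilesExample {
      def main(args: Array[String]): Unit = {
        // Select the merge_small_files task distribution described in the diff;
        // the alternatives are "block" (the default), "blocklet", and "custom".
        CarbonProperties.getInstance()
          .addProperty("carbon.task.distribution", "merge_small_files")

        val spark = SparkSession.builder()
          .appName("MergeSmallFilesExample")
          // 128 MB is Spark's default for this option; merge_small_files
          // combines small files up to this limit, so changing it changes
          // the merge target (per QiangCai's second review comment).
          .config("spark.sql.files.maxPartitionBytes", 128L * 1024 * 1024)
          .getOrCreate()

        // ... run queries against a CarbonData table here ...

        spark.stop()
      }
    }

Setting spark.sql.files.maxPartitionBytes explicitly, rather than relying on the documented 128 MB figure, makes the merge target visible and tunable, which is the point of the second review comment.

---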