[GitHub] carbondata pull request #1205: [CARBONDATA-1086] updated configuration-param...

GitHub user vandana7 opened a pull request:

https://github.com/apache/carbondata/pull/1205

[CARBONDATA-1086] updated configuration-parameters.md and dml-operation-on-carbondata for SORT_SCOPE

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/vandana7/incubator-carbondata sort_scope_update

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/carbondata/pull/1205.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1205

----
commit 31f237e9c46247b60f84917da3c0368a5114531b
Author: vandana <[hidden email]>
Date: 2017-07-28T07:07:07Z

updated configuration-parameters.md and dml-operation-on-carbondata.md for sort_scope feature

----

---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [hidden email] or file a JIRA ticket
with INFRA.
---

[GitHub] carbondata issue #1205: [CARBONDATA-1086] updated configuration-parameters.m...

Github user asfgit commented on the issue:

https://github.com/apache/carbondata/pull/1205

Can one of the admins verify this patch?

[GitHub] carbondata issue #1205: [CARBONDATA-1086] updated configuration-parameters.m...

Github user asfgit commented on the issue:

https://github.com/apache/carbondata/pull/1205

Can one of the admins verify this patch?

[GitHub] carbondata issue #1205: [CARBONDATA-1086] updated configuration-parameters.m...

Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/1205

Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/3236/

[GitHub] carbondata issue #1205: [CARBONDATA-1086] updated configuration-parameters.m...

Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/1205

Build Success with Spark 1.6, Please check CI http://144.76.159.231:8080/job/ApacheCarbonPRBuilder/641/

[GitHub] carbondata issue #1205: [CARBONDATA-1086] updated configuration-parameters.m...

Github user PallaviSingh1992 commented on the issue:

https://github.com/apache/carbondata/pull/1205

The default value of sort.inmemory.size.inmb is 1024, as per the code.

[GitHub] carbondata issue #1205: [CARBONDATA-1086] updated configuration-parameters.m...

Github user vandana7 commented on the issue:

https://github.com/apache/carbondata/pull/1205

@sgururajshetty I have made the required changes. Please review.

[GitHub] carbondata issue #1205: [CARBONDATA-1086] updated configuration-parameters.m...

Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/1205

Build Success with Spark 1.6, Please check CI http://144.76.159.231:8080/job/ApacheCarbonPRBuilder/695/

[GitHub] carbondata issue #1205: [CARBONDATA-1086] updated configuration-parameters.m...

Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/1205

Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/3290/

[GitHub] carbondata issue #1205: [CARBONDATA-1086] updated configuration-parameters.m...

Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/1205

Build Failed with Spark 1.6, Please check CI http://144.76.159.231:8080/job/ApacheCarbonPRBuilder/706/

[GitHub] carbondata issue #1205: [CARBONDATA-1086] updated configuration-parameters.m...

Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/1205

Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/3302/

[GitHub] carbondata issue #1205: [CARBONDATA-1086] updated configuration-parameters.m...

Github user sgururajshetty commented on the issue:

https://github.com/apache/carbondata/pull/1205

LGTM
@chenliang613 kindly review and merge.

[GitHub] carbondata pull request #1205: [CARBONDATA-1086] updated configuration-param...

Github user zzcclp commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/1205#discussion_r130534145

--- Diff: docs/dml-operation-on-carbondata.md ---
@@ -149,6 +149,50 @@ You can use the following options to load data:

* If this option is set to TRUE, then high.cardinality.identify.enable property will be disabled during data load.

+- **SORT_SCOPE:** This property can have four possible values:
+
+ * BATCH_SORT : The sorting scope is smaller and more index trees will be created, thus loading is faster but queries may be slower.
+
+ * LOCAL_SORT : The sorting scope is bigger and one index tree per data node will be created, thus loading is slower but queries are faster.
+
+ * GLOBAL_SORT : The sorting scope is bigger and one index tree per task will be created, thus loading is slower but queries are faster.
+
+ * NO_SORT : Suitable when the data should be loaded unsorted.
+
+ For BATCH_SORT:
+
+ ```
+ OPTIONS ('SORT_SCOPE'='BATCH_SORT')
+ ```
+
+ You can also specify the batch sort size along with the sort scope.
+
+ ```
+ OPTIONS('SORT_SCOPE'='BATCH_SORT', 'batch_sort_size_inmb'='7')
+ ```
+
+ Note :
+
+ * batch_sort_size_inmb : Size of data in MB to be processed in a batch. By default it is 45 percent of sort.inmemory.size.inmb (the memory size in MB available for in-memory sort).
+
+ For GLOBAL_SORT :
--- End diff --

Suggestion: add the note below:
`'SINGLE_PASS' must be false.`
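
For reference, the default mentioned in the batch_sort_size_inmb note of the diff can be worked out by hand. A minimal Python sketch, assuming the 45 percent rule from the diff and a sort.inmemory.size.inmb of 1024 (the default reported earlier in this thread). The function name is hypothetical, and how CarbonData rounds the result is not stated here:

```python
def default_batch_sort_size_mb(sort_inmemory_size_mb: int = 1024) -> float:
    """Return 45 percent of sort.inmemory.size.inmb, per the note in the diff.

    Hypothetical helper for illustration only; it is not CarbonData code.
    """
    return sort_inmemory_size_mb * 45 / 100


print(default_batch_sort_size_mb())      # 460.8
print(default_batch_sort_size_mb(2048))  # 921.6
```

So with the default in-memory sort size, a batch would hold roughly 460 MB unless 'batch_sort_size_inmb' is set explicitly.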

[GitHub] carbondata issue #1205: [CARBONDATA-1086] updated configuration-parameters.m...

Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/1205

Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/3307/

[GitHub] carbondata issue #1205: [CARBONDATA-1086] updated configuration-parameters.m...

Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/1205

Build Success with Spark 1.6, Please check CI http://144.76.159.231:8080/job/ApacheCarbonPRBuilder/711/

[GitHub] carbondata pull request #1205: [CARBONDATA-1086] updated configuration-param...

Github user vandana7 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/1205#discussion_r130578370

Hi,
I tried to execute the LOAD query with single_pass='true' and sort_scope='BATCH_SORT'. It executed successfully, and I was able to fetch the records in sorted order.
The syntax I used for the load query:
LOAD DATA INPATH 'hdfs://localhost:54310/uniqdata/2000_UniqData.csv' into table uniqdata OPTIONS('DELIMITER'=',' , 'QUOTECHAR'='"','BAD_RECORDS_ACTION'='FORCE','FILEHEADER'='CUST_ID,CUST_NAME,ACTIVE_EMUI_VERSION,DOB,DOJ,BIGINT_COLUMN1,BIGINT_COLUMN2,DECIMAL_COLUMN1,DECIMAL_COLUMN2,Double_COLUMN1,Double_COLUMN2,INTEGER_COLUMN1','SINGLE_PASS'='TRUE','SORT_SCOPE'='BATCH_SORT','batch_sort_size_inmb'='7');

Please let me know if I am doing anything wrong.
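
For anyone repeating this test with different option combinations, the statement text can be assembled programmatically. A minimal Python sketch; the helper name is hypothetical, it only builds the SQL string (no escaping, no execution), and the option set is a shortened version of the query above:

```python
def build_load_statement(path: str, table: str, options: dict) -> str:
    """Format a LOAD DATA statement with an OPTIONS clause.

    Hypothetical helper; it only assembles the SQL text and does no escaping.
    """
    clause = ", ".join(f"'{key}'='{value}'" for key, value in options.items())
    return f"LOAD DATA INPATH '{path}' INTO TABLE {table} OPTIONS({clause})"


sql = build_load_statement(
    "hdfs://localhost:54310/uniqdata/2000_UniqData.csv",
    "uniqdata",
    {
        "DELIMITER": ",",
        "BAD_RECORDS_ACTION": "FORCE",
        "SINGLE_PASS": "TRUE",
        "SORT_SCOPE": "BATCH_SORT",
        "batch_sort_size_inmb": "7",
    },
)
print(sql)
```

Swapping the 'SORT_SCOPE' and 'SINGLE_PASS' values in the dictionary gives the other combinations discussed in this review.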

[GitHub] carbondata pull request #1205: [CARBONDATA-1086] updated configuration-param...

Github user zzcclp commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/1205#discussion_r130581766

I mean that if SORT_SCOPE=GLOBAL_SORT, SINGLE_PASS must be false.

[GitHub] carbondata pull request #1205: [CARBONDATA-1086] updated configuration-param...

Github user zzcclp commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/1205#discussion_r130792549

Sorry for my mistake. Now, when sort_scope='GLOBAL_SORT', single_pass can be 'true'; I have raised a PR to remove this restriction from the code (PR-1224).
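
The restriction under discussion can be pictured as a check on the load options. A hedged Python sketch: the function, its flag, and the error text are hypothetical and do not mirror CarbonData's actual validation code. With the flag off, it behaves the way this comment describes once the restriction is removed:

```python
def check_load_options(options: dict, forbid_single_pass_with_global_sort: bool) -> None:
    """Raise ValueError if GLOBAL_SORT is combined with SINGLE_PASS=true
    while the (hypothetical) restriction flag is enabled."""
    # Comparing option names and values case-insensitively is an assumption.
    normalized = {k.upper(): str(v).upper() for k, v in options.items()}
    is_global = normalized.get("SORT_SCOPE") == "GLOBAL_SORT"
    is_single_pass = normalized.get("SINGLE_PASS") == "TRUE"
    if forbid_single_pass_with_global_sort and is_global and is_single_pass:
        raise ValueError("SINGLE_PASS must be false when SORT_SCOPE is GLOBAL_SORT")


opts = {"SORT_SCOPE": "GLOBAL_SORT", "SINGLE_PASS": "true"}
check_load_options(opts, forbid_single_pass_with_global_sort=False)  # accepted
try:
    check_load_options(opts, forbid_single_pass_with_global_sort=True)
except ValueError as err:
    print(err)
```

BATCH_SORT with SINGLE_PASS=true passes in both modes, which matches the load test reported above.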

[GitHub] carbondata issue #1205: [CARBONDATA-1086] updated configuration-parameters.m...

Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/1205

Can one of the admins verify this patch?

[GitHub] carbondata pull request #1205: [CARBONDATA-1086] updated configuration-param...

Github user jackylk commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/1205#discussion_r130923366

--- Diff: docs/dml-operation-on-carbondata.md ---
@@ -149,6 +149,50 @@ You can use the following options to load data:

* If this option is set to TRUE, then high.cardinality.identify.enable property will be disabled during data load.

+- **SORT_SCOPE:** This property can have four possible values:
+
+ * BATCH_SORT : The sorting scope is smaller and more index trees will be created, thus loading is faster but queries may be slower.
+
+ * LOCAL_SORT : The sorting scope is bigger and one index tree per data node will be created, thus loading is slower but queries are faster.
+
+ * GLOBAL_SORT : The sorting scope is bigger and one index tree per task will be created, thus loading is slower but queries are faster.
+
+ * NO_SORT : Suitable when the data should be loaded unsorted.
--- End diff --

Introduce this first
