[GitHub] carbondata pull request #1144: [WIP] String datatype will be no dictionary c...

[GitHub] carbondata pull request #1144: [WIP] String datatype will be no dictionary c...

GitHub user QiangCai opened a pull request:

https://github.com/apache/carbondata/pull/1144

[WIP] String datatype will be no dictionary column by default

1. Table creation
String columns will be no-dictionary columns by default.
The dictionary_exclude property will be deprecated.

2. Remove HIGH_CARDINALITY_IDENTIFY_ENABLE

3. Remove columngroup testcase
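
As a rough illustration of item 1, the sketch below models how a string column could be resolved to dictionary or no-dictionary encoding from the table properties. It is a simplified stand-in, not the CarbonDDLSqlParser code: the object and method names are hypothetical, only string columns are considered, and the warning text is an assumption that combines the deprecation notice with the wording suggested later in this thread.

```scala
// Hypothetical, simplified model; none of these names are CarbonData APIs.
object DictionaryDefaults {

  // Property keys are matched case-insensitively in this sketch.
  private def normalize(props: Map[String, String]): Map[String, String] =
    props.map { case (key, value) => key.toLowerCase -> value }

  // Splits a comma-separated property value into trimmed, lower-cased column names.
  private def columnNames(props: Map[String, String], key: String): Set[String] =
    normalize(props).get(key).toSeq
      .flatMap(_.split(","))
      .map(_.trim.toLowerCase)
      .filter(_.nonEmpty)
      .toSet

  /** A string column is dictionary encoded only if dictionary_include lists it. */
  def stringColumnUsesDictionary(
      columnName: String,
      tableProperties: Map[String, String]): Boolean =
    columnNames(tableProperties, "dictionary_include").contains(columnName.toLowerCase)

  /** Returns a warning when the deprecated dictionary_exclude property is present. */
  def deprecationWarnings(tableProperties: Map[String, String]): Seq[String] =
    if (normalize(tableProperties).contains("dictionary_exclude")) {
      Seq("dictionary_exclude option is deprecated; " +
        "by default string column does not use global dictionary")
    } else {
      Seq.empty
    }
}
```

Under this model, a string column such as channelsId keeps its global dictionary only when the table is created with it listed in DICTIONARY_INCLUDE, which matches the test changes reviewed further down.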

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/QiangCai/carbondata nodictionarybydefault

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/carbondata/pull/1144.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1144

----
commit 3bbae82e2fb52330e8efd855cda1ddd74cdcdbeb
Author: QiangCai <[hidden email]>
Date: 2017-07-06T02:41:55Z

no dictionary by default

----

---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [hidden email] or file a JIRA ticket
with INFRA.
---

[GitHub] carbondata issue #1144: [WIP] String datatype will be no dictionary column b...

Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/1144

Build Success with Spark 1.6, Please check CI http://144.76.159.231:8080/job/ApacheCarbonPRBuilder/348/

[GitHub] carbondata issue #1144: [WIP] String datatype will be no dictionary column b...

Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/1144

Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/2934/

[GitHub] carbondata pull request #1144: [WIP] String datatype will be no dictionary c...

Github user jackylk commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/1144#discussion_r126050294

--- Diff: integration/spark-common/src/main/scala/org/apache/carbondata/spark/rdd/CarbonGlobalDictionaryRDD.scala ---
@@ -375,90 +372,72 @@ class CarbonGlobalDictionaryGenerateRDD(
val distinctValueList = rddIter.next()._2
valuesBuffer ++= distinctValueList.values
rowCount += distinctValueList.rowCount
- // check high cardinality
- if (model.isFirstLoad && model.highCardIdentifyEnable
- && !model.isComplexes(split.index)
- && model.primDimensions(split.index).isColumnar) {
- isHighCardinalityColumn = GlobalDictionaryUtil.isHighCardinalityColumn(
- valuesBuffer.size, model)
- if (isHighCardinalityColumn) {
- break
- }
- }
}
}
val combineListTime = System.currentTimeMillis() - t1
- if (isHighCardinalityColumn) {
- LOGGER.info(s"column ${ model.table.getTableUniqueName }." +
- s"${
- model.primDimensions(split.index)
- .getColName
- } is high cardinality column")
+ isDictionaryLocked = dictLock.lockWithRetries()
+ if (isDictionaryLocked) {
+ logInfo(s"Successfully able to get the dictionary lock for ${
+ model.primDimensions(split.index).getColName
+ }")
} else {
- isDictionaryLocked = dictLock.lockWithRetries()
- if (isDictionaryLocked) {
- logInfo(s"Successfully able to get the dictionary lock for ${
+ sys
+ .error(s"Dictionary file ${
--- End diff --

should use log only

[GitHub] carbondata pull request #1144: [WIP] String datatype will be no dictionary c...

Github user jackylk commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/1144#discussion_r126050365

--- Diff: integration/spark-common/src/main/scala/org/apache/carbondata/spark/rdd/CarbonGlobalDictionaryRDD.scala ---
@@ -375,90 +372,72 @@ class CarbonGlobalDictionaryGenerateRDD(
val distinctValueList = rddIter.next()._2
valuesBuffer ++= distinctValueList.values
rowCount += distinctValueList.rowCount
- // check high cardinality
- if (model.isFirstLoad && model.highCardIdentifyEnable
- && !model.isComplexes(split.index)
- && model.primDimensions(split.index).isColumnar) {
- isHighCardinalityColumn = GlobalDictionaryUtil.isHighCardinalityColumn(
- valuesBuffer.size, model)
- if (isHighCardinalityColumn) {
- break
- }
- }
}
}
val combineListTime = System.currentTimeMillis() - t1
- if (isHighCardinalityColumn) {
- LOGGER.info(s"column ${ model.table.getTableUniqueName }." +
- s"${
- model.primDimensions(split.index)
- .getColName
- } is high cardinality column")
+ isDictionaryLocked = dictLock.lockWithRetries()
+ if (isDictionaryLocked) {
+ logInfo(s"Successfully able to get the dictionary lock for ${
+ model.primDimensions(split.index).getColName
+ }")
} else {
- isDictionaryLocked = dictLock.lockWithRetries()
- if (isDictionaryLocked) {
- logInfo(s"Successfully able to get the dictionary lock for ${
+ sys
+ .error(s"Dictionary file ${
model.primDimensions(split.index).getColName
- }")
- } else {
- sys
- .error(s"Dictionary file ${
- model.primDimensions(split.index).getColName
- } is locked for updation. Please try after some time")
- }
- val t2 = System.currentTimeMillis
- val fileType = FileFactory.getFileType(model.dictFilePaths(split.index))
- model.dictFileExists(split.index) = FileFactory
- .isFileExist(model.dictFilePaths(split.index), fileType)
- dictionaryForDistinctValueLookUp = if (model.dictFileExists(split.index)) {
- CarbonLoaderUtil.getDictionary(model.table,
- model.columnIdentifier(split.index),
- model.hdfsLocation,
- model.primDimensions(split.index).getDataType
- )
- } else {
- null
- }
- val dictCacheTime = System.currentTimeMillis - t2
- val t3 = System.currentTimeMillis()
- val dictWriteTask = new DictionaryWriterTask(valuesBuffer,
- dictionaryForDistinctValueLookUp,
- model.table,
+ } is locked for updation. Please try after some time")
+ }
+ val t2 = System.currentTimeMillis
+ val fileType = FileFactory.getFileType(model.dictFilePaths(split.index))
+ model.dictFileExists(split.index) = FileFactory
--- End diff --

I think you can move this check into the `if` clause below.
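
One possible reading of this suggestion, sketched under assumptions: the file-existence test becomes the condition of the conditional that loads the dictionary, instead of being stored in a flag first. The names below are hypothetical and use plain java.nio and scala.io calls rather than CarbonData's FileFactory and CarbonLoaderUtil. Whether the stored `dictFileExists` flag is needed elsewhere is not visible in this thread.

```scala
import java.nio.file.{Files, Paths}
import scala.io.Source

// Hypothetical stand-in for the dictionary loading step; not CarbonData's API.
object DictionaryLookupSketch {

  /** Loads existing dictionary values, or None when no dictionary file exists yet. */
  def existingDictionary(dictFilePath: String): Option[List[String]] = {
    // The existence check is the condition itself, so no separate flag is kept.
    if (Files.exists(Paths.get(dictFilePath))) {
      val source = Source.fromFile(dictFilePath)
      // Read all lines eagerly, then release the file handle.
      try Some(source.getLines().toList) finally source.close()
    } else {
      None
    }
  }
}
```

Returning an Option here replaces the `null` used in the quoted diff; that is a choice made for this sketch, not something the reviewer asked for.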

[GitHub] carbondata pull request #1144: [WIP] String datatype will be no dictionary c...

Github user jackylk commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/1144#discussion_r126050555

--- Diff: integration/spark-common/src/main/scala/org/apache/spark/sql/catalyst/CarbonDDLSqlParser.scala ---
@@ -572,6 +572,7 @@ abstract class CarbonDDLSqlParser extends AbstractCarbonSparkSQLParser {

// All excluded cols should be there in create table cols
if (tableProperties.get(CarbonCommonConstants.DICTIONARY_EXCLUDE).isDefined) {
+ LOGGER.warn("dictionary_exclude option was deprecated")
--- End diff --

add one more message: "by default string column does not use global dictionary"

[GitHub] carbondata pull request #1144: [WIP] String datatype will be no dictionary c...

Github user jackylk commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/1144#discussion_r126050694

--- Diff: integration/spark2/src/test/scala/org/apache/carbondata/spark/util/ExternalColumnDictionaryTestCase.scala ---
@@ -78,7 +78,7 @@ class ExternalColumnDictionaryTestCase extends QueryTest with BeforeAndAfterAll
proddate struct<productionDate:string,activeDeactivedate:array<string>>,
gamePointId double,contractNumber double)
STORED BY 'org.apache.carbondata.format'
- TBLPROPERTIES('DICTIONARY_INCLUDE' = 'deviceInformationId')
+ TBLPROPERTIES('DICTIONARY_INCLUDE' = 'deviceInformationId,channelsId')
--- End diff --

add space after `,`

[GitHub] carbondata pull request #1144: [WIP] String datatype will be no dictionary c...

Github user jackylk commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/1144#discussion_r126050712

--- Diff: integration/spark2/src/test/scala/org/apache/carbondata/spark/util/ExternalColumnDictionaryTestCase.scala ---
@@ -89,7 +89,7 @@ class ExternalColumnDictionaryTestCase extends QueryTest with BeforeAndAfterAll
"""CREATE TABLE verticalDelimitedTable (deviceInformationId int,
channelsId string,contractNumber double)
STORED BY 'org.apache.carbondata.format'
- TBLPROPERTIES('DICTIONARY_INCLUDE' = 'deviceInformationId')
+ TBLPROPERTIES('DICTIONARY_INCLUDE' = 'deviceInformationId,channelsId')
--- End diff --

add space after `,`

[GitHub] carbondata issue #1144: [WIP] String datatype will be no dictionary column b...

Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/1144

Build Success with Spark 1.6, Please check CI http://144.76.159.231:8080/job/ApacheCarbonPRBuilder/354/

[GitHub] carbondata issue #1144: [WIP] String datatype will be no dictionary column b...

Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/1144

Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/2941/

[GitHub] carbondata issue #1144: [WIP] String datatype will be no dictionary column b...

Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/1144

Build Success with Spark 1.6, Please check CI http://144.76.159.231:8080/job/ApacheCarbonPRBuilder/355/

[GitHub] carbondata issue #1144: [WIP] String datatype will be no dictionary column b...

Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/1144

Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/2942/

[GitHub] carbondata issue #1144: [CARBONDATA-1273] String datatype will be no diction...

Github user QiangCai commented on the issue:

https://github.com/apache/carbondata/pull/1144

fixed all comments

[GitHub] carbondata issue #1144: [CARBONDATA-1273] String datatype will be no diction...

Github user jackylk commented on the issue:

https://github.com/apache/carbondata/pull/1144

LGTM

[GitHub] carbondata pull request #1144: [CARBONDATA-1273] String datatype will be no ...

Github user QiangCai closed the pull request at:

https://github.com/apache/carbondata/pull/1144
