[GitHub] carbondata pull request #1831: [CARBONDATA-1993] Carbon properties default ...


[GitHub] carbondata issue #1831: [CARBONDATA-1993] Carbon properties default values f...

qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1831
 
    Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/2120/



---

[GitHub] carbondata issue #1831: [CARBONDATA-1993] Carbon properties default values f...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ravipesala commented on the issue:

    https://github.com/apache/carbondata/pull/1831
 
    SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/3275/



---

[GitHub] carbondata issue #1831: [CARBONDATA-1993] Carbon properties default values f...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1831
 
    Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/2152/



---

[GitHub] carbondata issue #1831: [CARBONDATA-1993] Carbon properties default values f...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1831
 
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/3389/



---

[GitHub] carbondata pull request #1831: [CARBONDATA-1993] Carbon properties default v...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ravipesala commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1831#discussion_r165543880
 
    --- Diff: conf/carbon.properties.template ---
    @@ -17,29 +17,25 @@
     #
     
     #################### System Configuration ##################
    -#Mandatory. Carbon Store path
    -carbon.storelocation=hdfs://hacluster/Opt/CarbonStore
    +#Optional. Carbon Store path
    --- End diff --
   
    Mention that if it is not specified, it takes the Spark warehouse path.
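
    The fallback the reviewer asks to document could be sketched roughly as follows. This is only an illustration under stated assumptions, not CarbonData's actual code: `StoreLocationSketch` and `resolveStoreLocation` are hypothetical names, and a plain `java.util.Properties` stands in for the real configuration object.

```scala
import java.util.Properties

// Hypothetical helper: illustrates the "optional with fallback" behaviour only.
object StoreLocationSketch {
  // Property key as it appears in carbon.properties.template.
  val StoreLocationKey = "carbon.storelocation"

  // Returns the configured store path, or the Spark warehouse path when the
  // property is missing or blank.
  def resolveStoreLocation(props: Properties, sparkWarehousePath: String): String =
    Option(props.getProperty(StoreLocationKey))
      .map(_.trim)
      .filter(_.nonEmpty)
      .getOrElse(sparkWarehousePath)
}
```

    With an empty `Properties` object the function returns whatever warehouse path the caller passes in; once `carbon.storelocation` is set, that value takes precedence.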


---

[GitHub] carbondata pull request #1831: [CARBONDATA-1993] Carbon properties default v...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ravipesala commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1831#discussion_r165544265
 
    --- Diff: conf/carbon.properties.template ---
    @@ -76,22 +72,16 @@ carbon.enable.quick.filter=false
     #carbon.block.meta.size.reserved.percentage=10
     ##csv reading buffer size.
     #carbon.csv.read.buffersize.byte=1048576
    -##To identify and apply compression for non-high cardinality columns
    -#high.cardinality.value=100000
     ##maximum no of threads used for reading intermediate files for final merging.
     #carbon.merge.sort.reader.thread=3
     ##Carbon blocklet size. Note: this configuration cannot be change once store is generated
     #carbon.blocklet.size=120000
    -##number of retries to get the metadata lock for loading data to table
    -#carbon.load.metadata.lock.retries=3
     ##Minimum blocklets needed for distribution.
     #carbon.blockletdistribution.min.blocklet.size=10
     ##Interval between the retries to get the lock
     #carbon.load.metadata.lock.retry.timeout.sec=5
     ##Temporary store location, By default it will take System.getProperty("java.io.tmpdir")
    -#carbon.tempstore.location=/opt/Carbon/TempStoreLoc
    -##data loading records count logger
    -#carbon.load.log.counter=500000
    +#carbon.tempstore.location
    --- End diff --
   
    Are we really using this? I think we always depend on either the Java tmp dir or the tmp directories obtained from Spark/YARN. Please re-verify and remove it if it is not used.


---

[GitHub] carbondata pull request #1831: [CARBONDATA-1993] Carbon properties default v...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ravipesala commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1831#discussion_r165544424
 
    --- Diff: conf/carbon.properties.template ---
    @@ -110,7 +100,7 @@ carbon.enable.quick.filter=false
     ##Percentage to identify whether column cardinality is more than configured percent of total row count
     #high.cardinality.row.count.percentage=80
    --- End diff --
   
    I guess this is also not used; please check and remove it.


---

[GitHub] carbondata pull request #1831: [CARBONDATA-1993] Carbon properties default v...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ravipesala commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1831#discussion_r165544641
 
    --- Diff: conf/carbon.properties.template ---
    @@ -76,22 +72,16 @@ carbon.enable.quick.filter=false
     #carbon.block.meta.size.reserved.percentage=10
     ##csv reading buffer size.
     #carbon.csv.read.buffersize.byte=1048576
    -##To identify and apply compression for non-high cardinality columns
    -#high.cardinality.value=100000
     ##maximum no of threads used for reading intermediate files for final merging.
     #carbon.merge.sort.reader.thread=3
     ##Carbon blocklet size. Note: this configuration cannot be change once store is generated
     #carbon.blocklet.size=120000
    -##number of retries to get the metadata lock for loading data to table
    -#carbon.load.metadata.lock.retries=3
     ##Minimum blocklets needed for distribution.
     #carbon.blockletdistribution.min.blocklet.size=10
     ##Interval between the retries to get the lock
     #carbon.load.metadata.lock.retry.timeout.sec=5
     ##Temporary store location, By default it will take System.getProperty("java.io.tmpdir")
    -#carbon.tempstore.location=/opt/Carbon/TempStoreLoc
    -##data loading records count logger
    -#carbon.load.log.counter=500000
    +#carbon.tempstore.location
     ##To dissable/enable carbon block distribution
     #carbon.custom.block.distribution=false
    --- End diff --
   
    This property has now been changed to `carbon.task.distribution`, and its default value is `block`.


---

[GitHub] carbondata pull request #1831: [CARBONDATA-1993] Carbon properties default v...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ravipesala commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1831#discussion_r165545325
 
    --- Diff: docs/configuration-parameters.md ---
    @@ -32,10 +32,10 @@ This section provides the details of all the configurations required for the Car
     
     | Property | Default Value | Description |
     |----------------------------|-------------------------------------||
    -| carbon.storelocation | /user/hive/warehouse/carbon.store | Location where CarbonData will create the store, and write the data in its own format. NOTE: Store location should be in HDFS. |
    -| carbon.ddl.base.hdfs.url | hdfs://hacluster/opt/data | This property is used to configure the HDFS relative path, the path configured in carbon.ddl.base.hdfs.url will be appended to the HDFS path configured in fs.defaultFS. If this path is configured, then user need not pass the complete path while dataload. For example: If absolute path of the csv file is hdfs://10.18.101.155:54310/data/cnbc/2016/xyz.csv, the path "hdfs://10.18.101.155:54310" will come from property fs.defaultFS and user can configure the /data/cnbc/ as carbon.ddl.base.hdfs.url. Now while dataload user can specify the csv path as /2016/xyz.csv. |
    -| carbon.badRecords.location | /opt/Carbon/Spark/badrecords | Path where the bad records are stored. |
    -| carbon.data.file.version | 3 | If this parameter value is set to 1, then CarbonData will support the data load which is in old format(0.x version). If the value is set to 2(1.x onwards version), then CarbonData will support the data load of new format only. The default value for this parameter is 3(latest version is set as default version). It improves the query performance by ~20% to 50%. For configuring V3 format explicitly, add carbon.data.file.version = V3 in carbon.properties file. |
    +| carbon.storelocation |  | Location where CarbonData will create the store, and write the data in its own format. NOTE: Store location should be in HDFS. |
    --- End diff --
   
    Here also, mention that if it is not specified, it takes the Spark warehouse path.


---

[GitHub] carbondata pull request #1831: [CARBONDATA-1993] Carbon properties default v...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user mohammadshahidkhan commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1831#discussion_r165741342
 
    --- Diff: conf/carbon.properties.template ---
    @@ -76,22 +72,16 @@ carbon.enable.quick.filter=false
     #carbon.block.meta.size.reserved.percentage=10
     ##csv reading buffer size.
     #carbon.csv.read.buffersize.byte=1048576
    -##To identify and apply compression for non-high cardinality columns
    -#high.cardinality.value=100000
     ##maximum no of threads used for reading intermediate files for final merging.
     #carbon.merge.sort.reader.thread=3
     ##Carbon blocklet size. Note: this configuration cannot be change once store is generated
     #carbon.blocklet.size=120000
    -##number of retries to get the metadata lock for loading data to table
    -#carbon.load.metadata.lock.retries=3
     ##Minimum blocklets needed for distribution.
     #carbon.blockletdistribution.min.blocklet.size=10
     ##Interval between the retries to get the lock
     #carbon.load.metadata.lock.retry.timeout.sec=5
     ##Temporary store location, By default it will take System.getProperty("java.io.tmpdir")
    -#carbon.tempstore.location=/opt/Carbon/TempStoreLoc
    -##data loading records count logger
    -#carbon.load.log.counter=500000
    +#carbon.tempstore.location
     ##To dissable/enable carbon block distribution
     #carbon.custom.block.distribution=false
    --- End diff --
   
    The property is still in use:

        val useCustomDistribution =
          CarbonProperties.getInstance().getProperty(
            CarbonCommonConstants.CARBON_CUSTOM_BLOCK_DISTRIBUTION,
            "false").toBoolean ||
          carbonDistribution.equalsIgnoreCase(CarbonCommonConstants.CARBON_TASK_DISTRIBUTION_CUSTOM)
        if (useCustomDistribution)
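
    For readers without the CarbonData sources at hand, the check quoted above can be approximated in a self-contained form. This is a sketch under assumptions rather than the project's code: `DistributionSketch` is a hypothetical name, a plain `java.util.Properties` replaces `CarbonProperties`, and the string `"custom"` is an assumed value for `CARBON_TASK_DISTRIBUTION_CUSTOM` that this thread does not confirm. The two property keys and the `block` default are the ones named in the review comments.

```scala
import java.util.Properties

// Hypothetical stand-in for the quoted check; not CarbonData's implementation.
object DistributionSketch {
  // Property keys as named in this thread.
  val CustomBlockDistributionKey = "carbon.custom.block.distribution"
  val TaskDistributionKey = "carbon.task.distribution"
  // Assumed value of CARBON_TASK_DISTRIBUTION_CUSTOM (not confirmed by the thread).
  val TaskDistributionCustom = "custom"

  // True when either the older boolean flag or the newer property
  // selects custom distribution.
  def useCustomDistribution(props: Properties): Boolean = {
    val olderFlag =
      props.getProperty(CustomBlockDistributionKey, "false").trim.toBoolean
    val taskDistribution = props.getProperty(TaskDistributionKey, "block")
    olderFlag || taskDistribution.equalsIgnoreCase(TaskDistributionCustom)
  }
}
```

    The sketch shows why the older key cannot simply be dropped from the template while the code still reads it: setting either property is enough to turn custom distribution on.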


---

[GitHub] carbondata pull request #1831: [CARBONDATA-1993] Carbon properties default v...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user mohammadshahidkhan commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1831#discussion_r165742856
 
    --- Diff: conf/carbon.properties.template ---
    @@ -110,7 +100,7 @@ carbon.enable.quick.filter=false
     ##Percentage to identify whether column cardinality is more than configured percent of total row count
     #high.cardinality.row.count.percentage=80
    --- End diff --
   
    Not used, so removed.


---

[GitHub] carbondata pull request #1831: [CARBONDATA-1993] Carbon properties default v...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user mohammadshahidkhan commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1831#discussion_r165743651
 
    --- Diff: conf/carbon.properties.template ---
    @@ -76,22 +72,16 @@ carbon.enable.quick.filter=false
     #carbon.block.meta.size.reserved.percentage=10
     ##csv reading buffer size.
     #carbon.csv.read.buffersize.byte=1048576
    -##To identify and apply compression for non-high cardinality columns
    -#high.cardinality.value=100000
     ##maximum no of threads used for reading intermediate files for final merging.
     #carbon.merge.sort.reader.thread=3
     ##Carbon blocklet size. Note: this configuration cannot be change once store is generated
     #carbon.blocklet.size=120000
    -##number of retries to get the metadata lock for loading data to table
    -#carbon.load.metadata.lock.retries=3
     ##Minimum blocklets needed for distribution.
     #carbon.blockletdistribution.min.blocklet.size=10
     ##Interval between the retries to get the lock
     #carbon.load.metadata.lock.retry.timeout.sec=5
     ##Temporary store location, By default it will take System.getProperty("java.io.tmpdir")
    -#carbon.tempstore.location=/opt/Carbon/TempStoreLoc
    -##data loading records count logger
    -#carbon.load.log.counter=500000
    +#carbon.tempstore.location
    --- End diff --
   
    We used this in CarbonAlterTableCompactionCommand, but I think we can use the Java tmp dir there as well, so I removed both the property and its usage.
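
    Relying on the JVM temporary directory instead of a Carbon-specific property might look like the sketch below. `TempDirSketch`, `createWorkDir`, and the `carbon-compaction-` prefix in the usage note are hypothetical names chosen for illustration; this is not the code from `CarbonAlterTableCompactionCommand`.

```scala
import java.nio.file.{Files, Path, Paths}

// Hypothetical helper: uses only the JVM's java.io.tmpdir setting.
object TempDirSketch {
  // Base temp directory taken from the JVM, with no Carbon-specific property.
  def baseTempDir: Path = Paths.get(System.getProperty("java.io.tmpdir"))

  // Creates a uniquely named working directory under the JVM temp directory.
  def createWorkDir(prefix: String): Path =
    Files.createTempDirectory(baseTempDir, prefix)
}
```

    A caller could then write `TempDirSketch.createWorkDir("carbon-compaction-")` and delete the directory when finished.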


---

[GitHub] carbondata pull request #1831: [CARBONDATA-1993] Carbon properties default v...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user mohammadshahidkhan commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1831#discussion_r165743745
 
    --- Diff: docs/configuration-parameters.md ---
    @@ -32,10 +32,10 @@ This section provides the details of all the configurations required for the Car
     
     | Property | Default Value | Description |
     |----------------------------|-------------------------------------||
    -| carbon.storelocation | /user/hive/warehouse/carbon.store | Location where CarbonData will create the store, and write the data in its own format. NOTE: Store location should be in HDFS. |
    -| carbon.ddl.base.hdfs.url | hdfs://hacluster/opt/data | This property is used to configure the HDFS relative path, the path configured in carbon.ddl.base.hdfs.url will be appended to the HDFS path configured in fs.defaultFS. If this path is configured, then user need not pass the complete path while dataload. For example: If absolute path of the csv file is hdfs://10.18.101.155:54310/data/cnbc/2016/xyz.csv, the path "hdfs://10.18.101.155:54310" will come from property fs.defaultFS and user can configure the /data/cnbc/ as carbon.ddl.base.hdfs.url. Now while dataload user can specify the csv path as /2016/xyz.csv. |
    -| carbon.badRecords.location | /opt/Carbon/Spark/badrecords | Path where the bad records are stored. |
    -| carbon.data.file.version | 3 | If this parameter value is set to 1, then CarbonData will support the data load which is in old format(0.x version). If the value is set to 2(1.x onwards version), then CarbonData will support the data load of new format only. The default value for this parameter is 3(latest version is set as default version). It improves the query performance by ~20% to 50%. For configuring V3 format explicitly, add carbon.data.file.version = V3 in carbon.properties file. |
    +| carbon.storelocation |  | Location where CarbonData will create the store, and write the data in its own format. NOTE: Store location should be in HDFS. |
    --- End diff --
   
    Added


---

[GitHub] carbondata pull request #1831: [CARBONDATA-1993] Carbon properties default v...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user mohammadshahidkhan commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1831#discussion_r165743860
 
    --- Diff: conf/carbon.properties.template ---
    @@ -17,29 +17,25 @@
     #
     
     #################### System Configuration ##################
    -#Mandatory. Carbon Store path
    -carbon.storelocation=hdfs://hacluster/Opt/CarbonStore
    +#Optional. Carbon Store path
    --- End diff --
   
    Added


---

[GitHub] carbondata issue #1831: [CARBONDATA-1993] Carbon properties default values f...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1831
 
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/3463/



---

[GitHub] carbondata issue #1831: [CARBONDATA-1993] Carbon properties default values f...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1831
 
    Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/2223/



---

[GitHub] carbondata issue #1831: [CARBONDATA-1993] Carbon properties default values f...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1831
 
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/3466/



---

[GitHub] carbondata issue #1831: [CARBONDATA-1993] Carbon properties default values f...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1831
 
    Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/2226/



---

[GitHub] carbondata issue #1831: [CARBONDATA-1993] Carbon properties default values f...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ravipesala commented on the issue:

    https://github.com/apache/carbondata/pull/1831
 
    SDV Build Fail , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/3331/



---

[GitHub] carbondata issue #1831: [CARBONDATA-1993] Carbon properties default values f...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user mohammadshahidkhan commented on the issue:

    https://github.com/apache/carbondata/pull/1831
 
    retest SDV please


---