Github user ndwangsen commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2864#discussion_r228708277 --- Diff: docs/ddl-of-carbondata.md --- @@ -474,7 +475,22 @@ CarbonData DDL statements are documented here, which includes: be later viewed in table description for reference. ``` - TBLPROPERTIES('BAD_RECORD_PATH'='/opt/badrecords'') + TBLPROPERTIES('BAD_RECORD_PATH'='/opt/badrecords') + ``` + + - ##### Load minimum data size + This property determines whether to enable the node minimum input data size allocation strategy + for data loading. It ensures that each node loads at least the minimum amount of data, thereby + reducing the number of carbondata files. This property is useful if the input data + files are very small, like 1MB to 256MB. This property can also be specified + in the load options; only an int value is supported. + + ``` + TBLPROPERTIES('LOAD_MIN_SIZE_INMB'='256 MB') --- End diff -- Has been modified based on the review ---
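For context, the sizing arithmetic behind a per-node minimum can be sketched as follows. This is a minimal illustration only: `MinSizeSketch` and `nodesToUse` are hypothetical names, not CarbonData APIs, and the real strategy in the PR also has to account for block boundaries and data locality.

```java
// Minimal sketch of the LOAD_MIN_SIZE_INMB idea; MinSizeSketch and
// nodesToUse are illustrative names, not part of CarbonData.
public class MinSizeSketch {
    // Cap the node count so each node loads at least minSizePerNodeMB;
    // a non-positive minimum (the default 0) disables the strategy.
    public static int nodesToUse(long totalSizeMB, long minSizePerNodeMB, int clusterNodes) {
        if (minSizePerNodeMB <= 0) {
            return clusterNodes; // strategy disabled: spread across all nodes
        }
        long byMinSize = Math.max(1, totalSizeMB / minSizePerNodeMB);
        return (int) Math.min(clusterNodes, byMinSize);
    }

    public static void main(String[] args) {
        // 100 MB of tiny input files with a 256 MB minimum: one node handles all of it
        System.out.println(nodesToUse(100, 256, 10));  // 1
        // 1024 MB of input: four nodes, each loading about 256 MB
        System.out.println(nodesToUse(1024, 256, 10)); // 4
    }
}
```

With fewer nodes participating, each writes fewer, larger carbondata files, which is the stated goal of the property.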
In reply to this post by qiuchenjian-2
Github user ndwangsen commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2864#discussion_r228708281 --- Diff: integration/spark2/src/main/scala/org/apache/carbondata/spark/rdd/CarbonDataRDDFactory.scala --- @@ -1171,12 +1171,27 @@ object CarbonDataRDDFactory { .ensureExecutorsAndGetNodeList(blockList, sqlContext.sparkContext) val skewedDataOptimization = CarbonProperties.getInstance() .isLoadSkewedDataOptimizationEnabled() - val loadMinSizeOptimization = CarbonProperties.getInstance() - .isLoadMinSizeOptimizationEnabled() // get user ddl input the node loads the smallest amount of data - val expectedMinSizePerNode = carbonLoadModel.getLoadMinSize() + val carbonTable = carbonLoadModel.getCarbonDataLoadSchema.getCarbonTable + val loadMinSize = carbonTable.getTableInfo.getFactTable.getTableProperties.asScala --- End diff -- Has been modified based on the review ---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2864 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1089/ ---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2864 Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1301/ ---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2864 Build Success with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9353/ ---
Github user xuchuanyin commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2864#discussion_r228777044 --- Diff: integration/spark2/src/main/scala/org/apache/carbondata/spark/rdd/CarbonDataRDDFactory.scala --- @@ -1171,21 +1171,25 @@ object CarbonDataRDDFactory { .ensureExecutorsAndGetNodeList(blockList, sqlContext.sparkContext) val skewedDataOptimization = CarbonProperties.getInstance() .isLoadSkewedDataOptimizationEnabled() - val loadMinSizeOptimization = CarbonProperties.getInstance() - .isLoadMinSizeOptimizationEnabled() // get user ddl input the node loads the smallest amount of data - val expectedMinSizePerNode = carbonLoadModel.getLoadMinSize() - val blockAssignStrategy = if (skewedDataOptimization) { - CarbonLoaderUtil.BlockAssignmentStrategy.BLOCK_SIZE_FIRST - } else if (loadMinSizeOptimization) { + val carbonTable = carbonLoadModel.getCarbonDataLoadSchema.getCarbonTable + var loadMinSize = carbonLoadModel.getLoadMinSize() + if (loadMinSize == "0" ) { --- End diff -- For string comparison, use String.equals instead of ==. ---
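The advice above guards against reference equality. A minimal Java sketch of the pitfall (the `loadMinSize` variable here is a stand-in, not the code under review); Scala's `==` does delegate to `equals`, so in Scala the behavior is the same either way, but calling `String.equals` keeps the intent explicit:

```java
// In Java, == on String compares object references, not characters.
// A value parsed at runtime may fail == even when its characters match.
public class StringEqualitySketch {
    public static void main(String[] args) {
        String loadMinSize = new String("0"); // stand-in for a value read from table properties
        System.out.println(loadMinSize == "0");      // false: distinct objects
        System.out.println("0".equals(loadMinSize)); // true: same character content
    }
}
```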
Github user xuchuanyin commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2864#discussion_r228776762 --- Diff: docs/ddl-of-carbondata.md --- @@ -474,7 +475,19 @@ CarbonData DDL statements are documented here, which includes: be later viewed in table description for reference. ``` - TBLPROPERTIES('BAD_RECORD_PATH'='/opt/badrecords'') + TBLPROPERTIES('BAD_RECORD_PATH'='/opt/badrecords') + ``` + + - ##### Load minimum data size --- End diff -- You can check it by this: [screenshot] ---
Github user xuchuanyin commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2864#discussion_r228776926 --- Diff: integration/spark-common/src/main/scala/org/apache/carbondata/spark/util/CommonUtil.scala --- @@ -833,4 +833,32 @@ object CommonUtil { }) } } + + /** + * This method will validate single node minimum load data volume of table specified by the user + * + * @param tableProperties table property specified by user + * @param propertyName property name + */ + def validateLoadMinSize(tableProperties: Map[String, String], propertyName: String): Unit = { + var size: Integer = 0 + if (tableProperties.get(propertyName).isDefined) { + val loadSizeStr: String = + parsePropertyValueStringInMB(tableProperties(propertyName)) + try { + size = Integer.parseInt(loadSizeStr) + } catch { + case e: NumberFormatException => + throw new MalformedCarbonCommandException(s"Invalid $propertyName value found: " + + s"$loadSizeStr, only int value greater " + + s"than 0 is supported.") + } + // if the value is not positive, set it to 0 + if(size > 0) { + tableProperties.put(propertyName, loadSizeStr) + } else { + tableProperties.put(propertyName, "0") --- End diff -- For the default value '0', please use a static variable instead of a magic number. ---
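One way to follow this suggestion, sketched with hypothetical names (`LoadMinSizeValidator` and `LOAD_MIN_SIZE_DEFAULT` are illustrative, not CarbonData's actual constants): hoist the default into a named constant so the validator and any later comparisons refer to the same value.

```java
// Sketch of the suggested fix: replace the magic "0" with a named default.
// Names are illustrative, not CarbonData's actual constants or API.
public class LoadMinSizeValidator {
    public static final String LOAD_MIN_SIZE_DEFAULT = "0";

    /** Returns the value to store: the parsed size if positive, else the named default. */
    public static String validate(String loadSizeStr) {
        int size;
        try {
            size = Integer.parseInt(loadSizeStr.trim());
        } catch (NumberFormatException e) {
            throw new IllegalArgumentException(
                "Invalid value: " + loadSizeStr + ", only int value greater than 0 is supported.");
        }
        // Non-positive values fall back to the named default instead of a literal "0".
        return size > 0 ? String.valueOf(size) : LOAD_MIN_SIZE_DEFAULT;
    }

    public static void main(String[] args) {
        System.out.println(validate("256")); // 256
        System.out.println(validate("-5"));  // 0
    }
}
```

The point of the constant is that the later check in CarbonDataRDDFactory can compare against the same symbol rather than repeating the literal.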
Github user xuchuanyin commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2864#discussion_r228776708 --- Diff: docs/ddl-of-carbondata.md --- @@ -474,7 +475,19 @@ CarbonData DDL statements are documented here, which includes: be later viewed in table description for reference. ``` - TBLPROPERTIES('BAD_RECORD_PATH'='/opt/badrecords'') + TBLPROPERTIES('BAD_RECORD_PATH'='/opt/badrecords') + ``` + + - ##### Load minimum data size --- End diff -- Please note that the current TOC is incorrect. [screenshot] ---
Github user ndwangsen commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2864#discussion_r228791841 --- Diff: docs/ddl-of-carbondata.md --- @@ -474,7 +475,19 @@ CarbonData DDL statements are documented here, which includes: be later viewed in table description for reference. ``` - TBLPROPERTIES('BAD_RECORD_PATH'='/opt/badrecords'') + TBLPROPERTIES('BAD_RECORD_PATH'='/opt/badrecords') + ``` + + - ##### Load minimum data size --- End diff -- ok, i modify it ---
Github user ndwangsen commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2864#discussion_r228791974 --- Diff: integration/spark-common/src/main/scala/org/apache/carbondata/spark/util/CommonUtil.scala --- @@ -833,4 +833,32 @@ object CommonUtil { }) } } + + /** + * This method will validate single node minimum load data volume of table specified by the user + * + * @param tableProperties table property specified by user + * @param propertyName property name + */ + def validateLoadMinSize(tableProperties: Map[String, String], propertyName: String): Unit = { + var size: Integer = 0 + if (tableProperties.get(propertyName).isDefined) { + val loadSizeStr: String = + parsePropertyValueStringInMB(tableProperties(propertyName)) + try { + size = Integer.parseInt(loadSizeStr) + } catch { + case e: NumberFormatException => + throw new MalformedCarbonCommandException(s"Invalid $propertyName value found: " + + s"$loadSizeStr, only int value greater " + + s"than 0 is supported.") + } + // if the value is not positive, set it to 0 + if(size > 0) { + tableProperties.put(propertyName, loadSizeStr) + } else { + tableProperties.put(propertyName, "0") --- End diff -- ok, i modify it ---
Github user ndwangsen commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2864#discussion_r228792020 --- Diff: integration/spark2/src/main/scala/org/apache/carbondata/spark/rdd/CarbonDataRDDFactory.scala --- @@ -1171,21 +1171,25 @@ object CarbonDataRDDFactory { .ensureExecutorsAndGetNodeList(blockList, sqlContext.sparkContext) val skewedDataOptimization = CarbonProperties.getInstance() .isLoadSkewedDataOptimizationEnabled() - val loadMinSizeOptimization = CarbonProperties.getInstance() - .isLoadMinSizeOptimizationEnabled() // get user ddl input the node loads the smallest amount of data - val expectedMinSizePerNode = carbonLoadModel.getLoadMinSize() - val blockAssignStrategy = if (skewedDataOptimization) { - CarbonLoaderUtil.BlockAssignmentStrategy.BLOCK_SIZE_FIRST - } else if (loadMinSizeOptimization) { + val carbonTable = carbonLoadModel.getCarbonDataLoadSchema.getCarbonTable + var loadMinSize = carbonLoadModel.getLoadMinSize() + if (loadMinSize == "0" ) { --- End diff -- ok, i modify it ---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2864 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1099/ ---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2864 Build Failed with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1310/ ---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2864 Build Success with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9362/ ---
Github user ndwangsen commented on the issue:
https://github.com/apache/carbondata/pull/2864 retest this please ---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2864 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1103/ ---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2864 Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1314/ ---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2864 Build Success with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9366/ ---
Github user xuchuanyin commented on the issue:
https://github.com/apache/carbondata/pull/2864 LGTM ---