[GitHub] [carbondata] shenh062326 opened a new pull request #3546: [CARBONDATA-3642] Add column idx in error msg when string length exceed 32000

GitBox
shenh062326 opened a new pull request #3546: [CARBONDATA-3642] Add column idx in error msg when string length exceed 32000
URL: https://github.com/apache/carbondata/pull/3546
 
 
   Before this change, when a string's length exceeds 32000, the error message is:
   ```
   Previous exception in task: Dataload failed, String length cannot exceed 32000 characters
    org.apache.carbondata.streaming.parser.FieldConverter$.objectToString(FieldConverter.scala:53)
    org.apache.carbondata.spark.util.CarbonScalaUtil$.getString(CarbonScalaUtil.scala:71)
    org.apache.carbondata.spark.rdd.NewRddIterator$$anonfun$next$1.apply$mcVI$sp(NewCarbonDataLoadRDD.scala:360)
    scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:160)
    org.apache.carbondata.spark.rdd.NewRddIterator.next(NewCarbonDataLoadRDD.scala:359)
    org.apache.carbondata.spark.load.DataLoadProcessorStepOnSpark$$anon$1.next(DataLoadProcessorStepOnSpark.scala:66)
   ... ...
   ```
   
   After this change, when a string's length exceeds 32000, the error message is:
   ```
       Previous exception in task: Column idx 49 too long
        org.apache.carbondata.spark.util.CarbonScalaUtil$.getString(CarbonScalaUtil.scala:80)
        org.apache.carbondata.spark.rdd.NewRddIterator$$anonfun$next$1.apply$mcVI$sp(NewCarbonDataLoadRDD.scala:360)
        scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:160)
        org.apache.carbondata.spark.rdd.NewRddIterator.next(NewCarbonDataLoadRDD.scala:359)
        org.apache.carbondata.spark.load.DataLoadProcessorStepOnSpark$$anon$1.next(DataLoadProcessorStepOnSpark.scala:66)
        ... ...
       Caused by: java.lang.Exception: Dataload failed, String length cannot exceed 32000 characters
        at org.apache.carbondata.streaming.parser.FieldConverter$.objectToString(FieldConverter.scala:54)
        at org.apache.carbondata.spark.util.CarbonScalaUtil$.getString(CarbonScalaUtil.scala:74)
        ... 31 more
   ```
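
The second trace differs from the first because of exception chaining: the outer exception names the column, and the original exception is kept as its cause (the `Caused by:` section). A minimal, self-contained sketch of that pattern follows. `ColumnErrorSketch`, `convert`, and `convertColumn` are hypothetical stand-ins for illustration, not the CarbonData methods changed in this PR.

```scala
object ColumnErrorSketch {
  // Hypothetical constants standing in for the 32000-character cap.
  val MaxChars = 32000
  val ExceedErrorMsg = "Dataload failed, String length cannot exceed "

  // Stand-in for the low-level conversion: it sees the value, not the column.
  def convert(value: Any): String = value match {
    case s: String if s.length > MaxChars =>
      throw new Exception(ExceedErrorMsg + MaxChars + " characters")
    case null => ""
    case other => other.toString
  }

  // Stand-in for the caller: it knows the column index, so it adds the index
  // to a new exception and keeps the original exception as the cause.
  def convertColumn(row: Seq[Any], idx: Int): String =
    try convert(row(idx))
    catch {
      case e: Exception
          if e.getMessage != null && e.getMessage.startsWith(ExceedErrorMsg) =>
        throw new Exception("Column idx " + idx + " too long", e)
    }
}
```

Exceptions that do not match the guard are not caught by this sketch and propagate unchanged.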
   
   Be sure to do all of the following checklist to help us incorporate
   your contribution quickly and easily:
   
    - [ ] Any interfaces changed? No
   
    - [ ] Any backward compatibility impacted? No
   
    - [ ] Document update required? No
   
    - [ ] Testing done Yes
         
    - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.
   
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[hidden email]


With regards,
Apache Git Services

[GitHub] [carbondata] jackylk commented on a change in pull request #3546: [CARBONDATA-3642] Add column idx in error msg when string length exceed 32000

GitBox
jackylk commented on a change in pull request #3546: [CARBONDATA-3642] Add column idx in error msg when string length exceed 32000
URL: https://github.com/apache/carbondata/pull/3546#discussion_r361907045
 
 

 ##########
 File path: integration/spark-common/src/main/scala/org/apache/carbondata/spark/rdd/CarbonGlobalDictionaryRDD.scala
 ##########
 @@ -297,7 +297,8 @@ class CarbonBlockDistinctValuesCombineRDD(
           val complexDelimiters = new util.ArrayList[String]
           model.delimiters.foreach(x => complexDelimiters.add(x))
           for (i <- 0 until dimNum) {
-            dimensionParsers(i).parseString(CarbonScalaUtil.getString(row.get(i),
+            dimensionParsers(i).parseString(CarbonScalaUtil.getString(row,
 
 Review comment:
   Move `CarbonScalaUtil.getString(row,` to the next line, like `CarbonScalaUtil.getString(row, i),`


[GitHub] [carbondata] jackylk commented on a change in pull request #3546: [CARBONDATA-3642] Add column idx in error msg when string length exceed 32000

GitBox
In reply to this post by GitBox
jackylk commented on a change in pull request #3546: [CARBONDATA-3642] Add column idx in error msg when string length exceed 32000
URL: https://github.com/apache/carbondata/pull/3546#discussion_r361907089
 
 

 ##########
 File path: integration/spark-common/src/main/scala/org/apache/carbondata/spark/util/CarbonScalaUtil.scala
 ##########
 @@ -60,17 +60,27 @@ object CarbonScalaUtil {
 
   private val LOGGER: Logger = LogServiceFactory.getLogService(this.getClass.getCanonicalName)
 
-  def getString(value: Any,
+  def getString(row: Row,
 
 Review comment:
   Move `row: Row` to the next line.


[GitHub] [carbondata] jackylk commented on a change in pull request #3546: [CARBONDATA-3642] Add column idx in error msg when string length exceed 32000

GitBox
In reply to this post by GitBox
jackylk commented on a change in pull request #3546: [CARBONDATA-3642] Add column idx in error msg when string length exceed 32000
URL: https://github.com/apache/carbondata/pull/3546#discussion_r361907260
 
 

 ##########
 File path: streaming/src/main/scala/org/apache/carbondata/streaming/parser/FieldConverter.scala
 ##########
 @@ -50,7 +51,7 @@ object FieldConverter {
       value match {
         case s: String => if (!isVarcharType && !isComplexType &&
                               s.length > CarbonCommonConstants.MAX_CHARS_PER_COLUMN_DEFAULT) {
-          throw new Exception("Dataload failed, String length cannot exceed " +
+          throw new Exception( exceedErrorMsg +
 
 Review comment:
   I suggest using `IllegalArgumentException`, so that it can be caught in CarbonScalaUtil.scala.
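
One way to read this suggestion: if the converter throws a specific exception type, the caller can catch by type instead of inspecting the message text. The sketch below illustrates that idea under stated assumptions. `TypedCatchSketch`, `checkLength`, and `withColumn` are hypothetical names, and the code does not reproduce the actual `FieldConverter` or `CarbonScalaUtil` implementation.

```scala
object TypedCatchSketch {
  // Hypothetical limit mirroring the 32000-character cap discussed in this PR.
  val MaxChars = 32000

  // Throws IllegalArgumentException when the string is over the limit.
  def checkLength(s: String): String = {
    if (s.length > MaxChars) {
      throw new IllegalArgumentException(
        "Dataload failed, String length cannot exceed " + MaxChars + " characters")
    }
    s
  }

  // Catches only IllegalArgumentException and wraps it with the column index;
  // any other exception type propagates unchanged.
  def withColumn(idx: Int)(body: => String): String =
    try body
    catch {
      case e: IllegalArgumentException =>
        throw new IllegalArgumentException("Column idx " + idx + " too long", e)
    }
}
```

A trade-off worth noting: catching by type relabels every `IllegalArgumentException` raised inside `body`, not only the length check, so the wrapped call should be kept narrow.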


[GitHub] [carbondata] jackylk commented on a change in pull request #3546: [CARBONDATA-3642] Add column idx in error msg when string length exceed 32000

GitBox
In reply to this post by GitBox
jackylk commented on a change in pull request #3546: [CARBONDATA-3642] Add column idx in error msg when string length exceed 32000
URL: https://github.com/apache/carbondata/pull/3546#discussion_r361907260
 
 

 ##########
 File path: streaming/src/main/scala/org/apache/carbondata/streaming/parser/FieldConverter.scala
 ##########
 @@ -50,7 +51,7 @@ object FieldConverter {
       value match {
         case s: String => if (!isVarcharType && !isComplexType &&
                               s.length > CarbonCommonConstants.MAX_CHARS_PER_COLUMN_DEFAULT) {
-          throw new Exception("Dataload failed, String length cannot exceed " +
+          throw new Exception( exceedErrorMsg +
 
 Review comment:
   I suggest using `IllegalArgumentException`.


[GitHub] [carbondata] jackylk commented on a change in pull request #3546: [CARBONDATA-3642] Add column idx in error msg when string length exceed 32000

GitBox
In reply to this post by GitBox
jackylk commented on a change in pull request #3546: [CARBONDATA-3642] Add column idx in error msg when string length exceed 32000
URL: https://github.com/apache/carbondata/pull/3546#discussion_r361907409
 
 

 ##########
 File path: integration/spark-common/src/main/scala/org/apache/carbondata/spark/util/CarbonScalaUtil.scala
 ##########
 @@ -60,17 +60,27 @@ object CarbonScalaUtil {
 
   private val LOGGER: Logger = LogServiceFactory.getLogService(this.getClass.getCanonicalName)
 
-  def getString(value: Any,
+  def getString(row: Row,
+      idx: Int,
       serializationNullFormat: String,
       complexDelimiters: util.ArrayList[String],
       timeStampFormat: SimpleDateFormat,
       dateFormat: SimpleDateFormat,
       isVarcharType: Boolean = false,
       isComplexType: Boolean = false,
       level: Int = 0): String = {
-    FieldConverter.objectToString(value, serializationNullFormat, complexDelimiters,
-      timeStampFormat, dateFormat, isVarcharType = isVarcharType, isComplexType = isComplexType,
-      level)
+    try {
+      FieldConverter.objectToString(row.get(idx), serializationNullFormat, complexDelimiters,
+        timeStampFormat, dateFormat, isVarcharType = isVarcharType, isComplexType = isComplexType,
+        level)
+    } catch {
+      case e: Exception =>
+        if (e.getMessage.startsWith(FieldConverter.exceedErrorMsg)) {
+          throw new Exception("Column idx " + idx + " too long", e)
 
 Review comment:
   Why change the content and throw again? Why not throw it directly?


[GitHub] [carbondata] CarbonDataQA1 commented on issue #3546: [CARBONDATA-3642] Add column idx in error msg when string length exceed 32000

GitBox
In reply to this post by GitBox
CarbonDataQA1 commented on issue #3546: [CARBONDATA-3642] Add column idx in error msg when string length exceed 32000
URL: https://github.com/apache/carbondata/pull/3546#issuecomment-569596183
 
 
   Build Success with Spark 2.1.0, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.1/1349/
   


[GitHub] [carbondata] CarbonDataQA1 commented on issue #3546: [CARBONDATA-3642] Add column idx in error msg when string length exceed 32000

GitBox
In reply to this post by GitBox
CarbonDataQA1 commented on issue #3546: [CARBONDATA-3642] Add column idx in error msg when string length exceed 32000
URL: https://github.com/apache/carbondata/pull/3546#issuecomment-569603453
 
 
   Build Failed with Spark 2.2.1, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.2/1359/
   


[GitHub] [carbondata] CarbonDataQA1 commented on issue #3546: [CARBONDATA-3642] Add column idx in error msg when string length exceed 32000

GitBox
In reply to this post by GitBox
CarbonDataQA1 commented on issue #3546: [CARBONDATA-3642] Add column idx in error msg when string length exceed 32000
URL: https://github.com/apache/carbondata/pull/3546#issuecomment-569605699
 
 
   Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/1370/
   


[GitHub] [carbondata] shenh062326 commented on a change in pull request #3546: [CARBONDATA-3642] Add column idx in error msg when string length exceed 32000

GitBox
In reply to this post by GitBox
shenh062326 commented on a change in pull request #3546: [CARBONDATA-3642] Add column idx in error msg when string length exceed 32000
URL: https://github.com/apache/carbondata/pull/3546#discussion_r362154198
 
 

 ##########
 File path: integration/spark-common/src/main/scala/org/apache/carbondata/spark/util/CarbonScalaUtil.scala
 ##########
 @@ -60,17 +60,27 @@ object CarbonScalaUtil {
 
   private val LOGGER: Logger = LogServiceFactory.getLogService(this.getClass.getCanonicalName)
 
-  def getString(value: Any,
+  def getString(row: Row,
+      idx: Int,
       serializationNullFormat: String,
       complexDelimiters: util.ArrayList[String],
       timeStampFormat: SimpleDateFormat,
       dateFormat: SimpleDateFormat,
       isVarcharType: Boolean = false,
       isComplexType: Boolean = false,
       level: Int = 0): String = {
-    FieldConverter.objectToString(value, serializationNullFormat, complexDelimiters,
-      timeStampFormat, dateFormat, isVarcharType = isVarcharType, isComplexType = isComplexType,
-      level)
+    try {
+      FieldConverter.objectToString(row.get(idx), serializationNullFormat, complexDelimiters,
+        timeStampFormat, dateFormat, isVarcharType = isVarcharType, isComplexType = isComplexType,
+        level)
+    } catch {
+      case e: Exception =>
+        if (e.getMessage.startsWith(FieldConverter.exceedErrorMsg)) {
+          throw new Exception("Column idx " + idx + " too long", e)
 
 Review comment:
   I want to add the column index to the error message.
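
The quoted hunk ends before showing what happens when the message prefix does not match, and `Throwable.getMessage` may return `null`, in which case calling `startsWith` on it would fail. A hedged sketch of a null-safe check that leaves unrelated exceptions untouched is shown below. `RethrowSketch`, `isTooLong`, and `annotate` are hypothetical helpers, not part of the PR.

```scala
object RethrowSketch {
  // Hypothetical copy of the message prefix used to recognise length errors.
  val ExceedErrorMsg = "Dataload failed, String length cannot exceed "

  // True only when the exception carries a message with the expected prefix.
  // Option(...) turns a null message into None instead of failing.
  def isTooLong(e: Throwable): Boolean =
    Option(e.getMessage).exists(_.startsWith(ExceedErrorMsg))

  // Wraps length errors with the column index (original kept as the cause);
  // returns every other exception unchanged so the caller can rethrow it.
  def annotate(idx: Int, e: Exception): Exception =
    if (isTooLong(e)) new Exception("Column idx " + idx + " too long", e) else e
}
```

A caller would use it as `catch { case e: Exception => throw RethrowSketch.annotate(idx, e) }`, so no exception is silently dropped.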


[GitHub] [carbondata] shenh062326 commented on a change in pull request #3546: [CARBONDATA-3642] Add column idx in error msg when string length exceed 32000

GitBox
In reply to this post by GitBox
shenh062326 commented on a change in pull request #3546: [CARBONDATA-3642] Add column idx in error msg when string length exceed 32000
URL: https://github.com/apache/carbondata/pull/3546#discussion_r362154286
 
 

 ##########
 File path: integration/spark-common/src/main/scala/org/apache/carbondata/spark/rdd/CarbonGlobalDictionaryRDD.scala
 ##########
 @@ -297,7 +297,8 @@ class CarbonBlockDistinctValuesCombineRDD(
           val complexDelimiters = new util.ArrayList[String]
           model.delimiters.foreach(x => complexDelimiters.add(x))
           for (i <- 0 until dimNum) {
-            dimensionParsers(i).parseString(CarbonScalaUtil.getString(row.get(i),
+            dimensionParsers(i).parseString(CarbonScalaUtil.getString(row,
 
 Review comment:
   done


[GitHub] [carbondata] shenh062326 commented on a change in pull request #3546: [CARBONDATA-3642] Add column idx in error msg when string length exceed 32000

GitBox
In reply to this post by GitBox
shenh062326 commented on a change in pull request #3546: [CARBONDATA-3642] Add column idx in error msg when string length exceed 32000
URL: https://github.com/apache/carbondata/pull/3546#discussion_r362154300
 
 

 ##########
 File path: integration/spark-common/src/main/scala/org/apache/carbondata/spark/util/CarbonScalaUtil.scala
 ##########
 @@ -60,17 +60,27 @@ object CarbonScalaUtil {
 
   private val LOGGER: Logger = LogServiceFactory.getLogService(this.getClass.getCanonicalName)
 
-  def getString(value: Any,
+  def getString(row: Row,
 
 Review comment:
   done


[GitHub] [carbondata] shenh062326 commented on a change in pull request #3546: [CARBONDATA-3642] Add column idx in error msg when string length exceed 32000

GitBox
In reply to this post by GitBox
shenh062326 commented on a change in pull request #3546: [CARBONDATA-3642] Add column idx in error msg when string length exceed 32000
URL: https://github.com/apache/carbondata/pull/3546#discussion_r362154321
 
 

 ##########
 File path: streaming/src/main/scala/org/apache/carbondata/streaming/parser/FieldConverter.scala
 ##########
 @@ -50,7 +51,7 @@ object FieldConverter {
       value match {
         case s: String => if (!isVarcharType && !isComplexType &&
                               s.length > CarbonCommonConstants.MAX_CHARS_PER_COLUMN_DEFAULT) {
-          throw new Exception("Dataload failed, String length cannot exceed " +
+          throw new Exception( exceedErrorMsg +
 
 Review comment:
   done


[GitHub] [carbondata] CarbonDataQA1 commented on issue #3546: [CARBONDATA-3642] Add column idx in error msg when string length exceed 32000

GitBox
In reply to this post by GitBox
CarbonDataQA1 commented on issue #3546: [CARBONDATA-3642] Add column idx in error msg when string length exceed 32000
URL: https://github.com/apache/carbondata/pull/3546#issuecomment-569880578
 
 
   Build Success with Spark 2.1.0, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.1/1372/
   


[GitHub] [carbondata] CarbonDataQA1 commented on issue #3546: [CARBONDATA-3642] Add column idx in error msg when string length exceed 32000

GitBox
In reply to this post by GitBox
CarbonDataQA1 commented on issue #3546: [CARBONDATA-3642] Add column idx in error msg when string length exceed 32000
URL: https://github.com/apache/carbondata/pull/3546#issuecomment-569887598
 
 
   Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/1392/
   


[GitHub] [carbondata] CarbonDataQA1 commented on issue #3546: [CARBONDATA-3642] Add column idx in error msg when string length exceed 32000

GitBox
In reply to this post by GitBox
CarbonDataQA1 commented on issue #3546: [CARBONDATA-3642] Add column idx in error msg when string length exceed 32000
URL: https://github.com/apache/carbondata/pull/3546#issuecomment-569889012
 
 
   Build Failed with Spark 2.2.1, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.2/1382/
   


[GitHub] [carbondata] CarbonDataQA1 commented on issue #3546: [CARBONDATA-3642] Add column idx in error msg when string length exceed 32000

GitBox
In reply to this post by GitBox
CarbonDataQA1 commented on issue #3546: [CARBONDATA-3642] Add column idx in error msg when string length exceed 32000
URL: https://github.com/apache/carbondata/pull/3546#issuecomment-569907489
 
 
   Build Success with Spark 2.1.0, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.1/1377/
   


[GitHub] [carbondata] CarbonDataQA1 commented on issue #3546: [CARBONDATA-3642] Add column idx in error msg when string length exceed 32000

GitBox
In reply to this post by GitBox
CarbonDataQA1 commented on issue #3546: [CARBONDATA-3642] Add column idx in error msg when string length exceed 32000
URL: https://github.com/apache/carbondata/pull/3546#issuecomment-569913567
 
 
   Build Failed with Spark 2.2.1, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.2/1387/
   


[GitHub] [carbondata] CarbonDataQA1 commented on issue #3546: [CARBONDATA-3642] Add column idx in error msg when string length exceed 32000

GitBox
In reply to this post by GitBox
CarbonDataQA1 commented on issue #3546: [CARBONDATA-3642] Add column idx in error msg when string length exceed 32000
URL: https://github.com/apache/carbondata/pull/3546#issuecomment-569919488
 
 
   Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/1398/
   


[GitHub] [carbondata] CarbonDataQA1 commented on issue #3546: [CARBONDATA-3642] Add column idx in error msg when string length exceed 32000

GitBox
In reply to this post by GitBox
CarbonDataQA1 commented on issue #3546: [CARBONDATA-3642] Add column idx in error msg when string length exceed 32000
URL: https://github.com/apache/carbondata/pull/3546#issuecomment-570111631
 
 
   Build Success with Spark 2.1.0, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.1/1387/
   
