[GitHub] carbondata pull request #1785: [CARBONDATA-2015] Restricted maximum length o...

classic Classic list List threaded Threaded
16 messages Options
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #1785: [CARBONDATA-2015] Restricted maximum length o...

qiuchenjian-2
GitHub user dhatchayani opened a pull request:

    https://github.com/apache/carbondata/pull/1785

    [CARBONDATA-2015] Restricted maximum length of bytes per column

    Validation for number of bytes for a column is added.
   
    We have limited the number of characters per column to 32000.
    For example, a single unicode character takes 3 bytes. So in this case, if my column has 30,000 unicode characters, then 32000 * 3 exceeds the short range. So, load will fail.
   
     - [ ] Any interfaces changed?
     
     - [ ] Any backward compatibility impacted?
     
     - [ ] Document update required?
   
     - [x] Testing done
            UT Added
           
     - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.
   


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/dhatchayani/incubator-carbondata 32000_bytes

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/carbondata/pull/1785.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1785
   
----
commit e380a1d6b2ffae8611f6045e9f63d2ca6e710652
Author: dhatchayani <dhatcha.official@...>
Date:   2018-01-10T10:59:14Z

    [CARBONDATA-2015] Restricted maximum length of bytes per column

----


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #1785: [CARBONDATA-2015] Restricted maximum length of bytes...

qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1785
 
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/2677/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #1785: [CARBONDATA-2015] Restricted maximum length of bytes...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1785
 
    Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/1443/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #1785: [CARBONDATA-2015] Restricted maximum length of bytes...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ravipesala commented on the issue:

    https://github.com/apache/carbondata/pull/1785
 
    SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/2814/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #1785: [CARBONDATA-2015] Restricted maximum length o...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user manishgupta88 commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1785#discussion_r161144886
 
    --- Diff: integration/spark-common/src/main/scala/org/apache/carbondata/spark/util/CarbonScalaUtil.scala ---
    @@ -113,6 +113,9 @@ object CarbonScalaUtil {
             case s: String => if (s.length > CSVInputFormat.MAX_CHARS_PER_COLUMN_DEFAULT) {
               throw new Exception("Dataload failed, String length cannot exceed " +
                                   CSVInputFormat.MAX_CHARS_PER_COLUMN_DEFAULT + " characters")
    +        } else if (ByteUtil.toBytes(s).length > CSVInputFormat.MAX_CHARS_PER_COLUMN_DEFAULT) {
    --- End diff --
   
    this can impact the data load performance. Move this check to CarbonDictionaryWriterImpl class where we are already converting the string values to byte array. This will not add any extra cost for for conversion of string to byte array


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #1785: [CARBONDATA-2015] Restricted maximum length o...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user manishgupta88 commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1785#discussion_r161144435
 
    --- Diff: integration/spark-common-test/src/test/scala/org/apache/carbondata/integration/spark/testsuite/dataload/TestLoadDataGeneral.scala ---
    @@ -187,6 +187,75 @@ class TestLoadDataGeneral extends QueryTest with BeforeAndAfterAll {
         } catch {
           case _:Exception => assert(true)
         }
    +  }
    +
    +  test("test load / insert / update with data more than 32000 bytes - dictionary_exclude") {
    +    val testdata = s"$resourcesPath/unicodechar.csv"
    +    sql("drop table if exists load32000bytes")
    +    sql("create table load32000bytes(name string) stored by 'carbondata'")
    +    sql("insert into table load32000bytes select 'aaa'")
    +
    +    assert(intercept[Exception] {
    +      sql(s"load data local inpath '$testdata' into table load32000bytes OPTIONS ('FILEHEADER'='name')")
    +    }.getMessage.equals("DataLoad failure: Dataload failed, String size cannot exceed 32000 bytes"))
    +
    +    val source = scala.io.Source.fromFile(testdata)
    +    val data = source.mkString
    +
    +    try {
    +      sql(s"insert into load32000bytes values('$data')")
    --- End diff --
   
    avoid using try catch and use intercept[Exception]


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #1785: [CARBONDATA-2015] Restricted maximum length o...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user manishgupta88 commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1785#discussion_r161144455
 
    --- Diff: integration/spark-common-test/src/test/scala/org/apache/carbondata/integration/spark/testsuite/dataload/TestLoadDataGeneral.scala ---
    @@ -187,6 +187,75 @@ class TestLoadDataGeneral extends QueryTest with BeforeAndAfterAll {
         } catch {
           case _:Exception => assert(true)
         }
    +  }
    +
    +  test("test load / insert / update with data more than 32000 bytes - dictionary_exclude") {
    +    val testdata = s"$resourcesPath/unicodechar.csv"
    +    sql("drop table if exists load32000bytes")
    +    sql("create table load32000bytes(name string) stored by 'carbondata'")
    +    sql("insert into table load32000bytes select 'aaa'")
    +
    +    assert(intercept[Exception] {
    +      sql(s"load data local inpath '$testdata' into table load32000bytes OPTIONS ('FILEHEADER'='name')")
    +    }.getMessage.equals("DataLoad failure: Dataload failed, String size cannot exceed 32000 bytes"))
    +
    +    val source = scala.io.Source.fromFile(testdata)
    +    val data = source.mkString
    +
    +    try {
    +      sql(s"insert into load32000bytes values('$data')")
    +      assert(false)
    +    } catch {
    +      case _: Exception =>
    +        assert(true)
    +    }
    +
    +    try {
    --- End diff --
   
    same comment as above


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #1785: [CARBONDATA-2015] Restricted maximum length o...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user manishgupta88 commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1785#discussion_r161144939
 
    --- Diff: integration/spark-common-test/src/test/scala/org/apache/carbondata/integration/spark/testsuite/dataload/TestLoadDataGeneral.scala ---
    @@ -187,6 +187,75 @@ class TestLoadDataGeneral extends QueryTest with BeforeAndAfterAll {
         } catch {
           case _:Exception => assert(true)
         }
    +  }
    +
    +  test("test load / insert / update with data more than 32000 bytes - dictionary_exclude") {
    +    val testdata = s"$resourcesPath/unicodechar.csv"
    +    sql("drop table if exists load32000bytes")
    +    sql("create table load32000bytes(name string) stored by 'carbondata'")
    +    sql("insert into table load32000bytes select 'aaa'")
    +
    +    assert(intercept[Exception] {
    +      sql(s"load data local inpath '$testdata' into table load32000bytes OPTIONS ('FILEHEADER'='name')")
    +    }.getMessage.equals("DataLoad failure: Dataload failed, String size cannot exceed 32000 bytes"))
    +
    +    val source = scala.io.Source.fromFile(testdata)
    +    val data = source.mkString
    +
    +    try {
    +      sql(s"insert into load32000bytes values('$data')")
    +      assert(false)
    +    } catch {
    +      case _: Exception =>
    +        assert(true)
    +    }
    +
    +    try {
    +      sql(s"update load32000bytes set(name)= ('$data')").show()
    +      assert(false)
    +    } catch {
    +      case _: Exception =>
    +        assert(true)
    +    }
    +
    +    sql("drop table if exists load32000bytes")
    +
    +  }
    +
    +  test("test load / insert / update with data more than 32000 bytes - dictionary_include") {
    +    val testdata = s"$resourcesPath/unicodechar.csv"
    +    sql("drop table if exists load32000bytes")
    +    sql("create table load32000bytes(name string) stored by 'carbondata' TBLPROPERTIES('DICTIONARY_INCLUDE'='name')")
    +    sql("insert into table load32000bytes select 'aaa'")
    +
    +    val error = intercept[Exception] {
    +      sql(s"load data local inpath '$testdata' into table load32000bytes OPTIONS ('FILEHEADER'='name')")
    +    }.getMessage
    +
    +    assert(intercept[Exception] {
    +      sql(s"load data local inpath '$testdata' into table load32000bytes OPTIONS ('FILEHEADER'='name')")
    +    }.getMessage.contains("generate global dictionary failed, Dataload failed, String size cannot exceed 32000 bytes"))
    +
    +    val source = scala.io.Source.fromFile(testdata)
    +    val data = source.mkString
    +
    +    try {
    +      sql(s"insert into load32000bytes values('$data')")
    +      assert(false)
    +    } catch {
    +      case _: Exception =>
    +        assert(true)
    +    }
    +
    +    try {
    --- End diff --
   
    avoid using try catch at all the places in the modified code


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #1785: [CARBONDATA-2015] Restricted maximum length of bytes...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1785
 
    Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/1489/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #1785: [CARBONDATA-2015] Restricted maximum length of bytes...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1785
 
    Build Failed  with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/2722/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #1785: [CARBONDATA-2015] Restricted maximum length of bytes...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user dhatchayani commented on the issue:

    https://github.com/apache/carbondata/pull/1785
 
    retest this please


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #1785: [CARBONDATA-2015] Restricted maximum length of bytes...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ravipesala commented on the issue:

    https://github.com/apache/carbondata/pull/1785
 
    SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/2849/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #1785: [CARBONDATA-2015] Restricted maximum length of bytes...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1785
 
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/2725/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #1785: [CARBONDATA-2015] Restricted maximum length of bytes...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1785
 
    Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/1492/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #1785: [CARBONDATA-2015] Restricted maximum length of bytes...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user manishgupta88 commented on the issue:

    https://github.com/apache/carbondata/pull/1785
 
    LGTM


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #1785: [CARBONDATA-2015] Restricted maximum length o...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user asfgit closed the pull request at:

    https://github.com/apache/carbondata/pull/1785


---