[GitHub] carbondata pull request #1265: [CARBONDATA-1128] Add direct string encoding ...


qiuchenjian-2
GitHub user jackylk opened a pull request:

    https://github.com/apache/carbondata/pull/1265

    [CARBONDATA-1128] Add direct string encoding for short string column

    For short string columns (less than 128 bytes), add a new encoding to improve compression and loading speed.
    DirectStringCodec encodes the input column into two arrays:
    1. one for the string content, stored in the data page of DataChunk2
    2. another for the string lengths, stored in EncoderMeta in DataChunk2
    The two arrays are compressed separately by the compressor.
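
The two-array layout described above can be illustrated with a minimal Python sketch. This is not the PR's DirectStringCodec: the function names `encode_direct` and `decode_direct` are hypothetical, `zlib` is only a stand-in for whatever compressor the project uses, and storing each length in a single byte is an assumption drawn from the "less than 128 bytes" limit.

```python
import zlib
from typing import List, Tuple

# Assumption: "less than 128 bytes" means every length fits in one byte.
MAX_LEN = 127


def encode_direct(values: List[str]) -> Tuple[bytes, bytes]:
    """Split a column into concatenated content bytes and a length array."""
    content = bytearray()
    lengths = bytearray()
    for v in values:
        raw = v.encode("utf-8")
        if len(raw) > MAX_LEN:
            raise ValueError("value too long for direct string encoding")
        content += raw            # array 1: string content, back to back
        lengths.append(len(raw))  # array 2: one length per value
    # Compress the two arrays separately (zlib is a stand-in compressor).
    return zlib.compress(bytes(content)), zlib.compress(bytes(lengths))


def decode_direct(content_z: bytes, lengths_z: bytes) -> List[str]:
    """Rebuild the column by slicing the content array with the lengths."""
    content = zlib.decompress(content_z)
    lengths = zlib.decompress(lengths_z)
    out, pos = [], 0
    for n in lengths:
        out.append(content[pos:pos + n].decode("utf-8"))
        pos += n
    return out
```

Keeping the lengths apart from the content means each array holds one kind of data, which is the property the description relies on when it compresses them separately.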
   
    I have tested using TPC-H generated data (1GB)
    1.  For high cardinality columns (L_COMMENT in LINEITEM table, 4580667 distinct values)
    CREATE TABLE LINEITEM (
    L_COMMENT VARCHAR(44)
    )
    STORED BY 'carbondata'
   
    - Use direct string encoding
    loading time: 12496 ms
    size: 42M
   
    - Use existing encoding
    loading time: 12230 ms
    size: 45M
   
    2.  For low cardinality columns (L_SHIPMODE in LINEITEM table, 7 distinct values)
    CREATE TABLE LINEITEM (
    L_SHIPMODE CHAR(10)
    )
    STORED BY 'carbondata'
   
    - Use direct string encoding
    loading time: 6089 ms
    size: 1.4M
   
    - Use existing encoding
    loading time: 6556 ms
    size: 1.7M

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/jackylk/incubator-carbondata direct_string

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/carbondata/pull/1265.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1265
   
----
commit 828f108fc5c312a087f80e4470cae7666293fc3f
Author: Jacky Li <[hidden email]>
Date:   2017-08-17T01:57:43Z

    add integral rle codec

commit 37d1c0977220eab37ba4695e0585e0045258e702
Author: Jacky Li <[hidden email]>
Date:   2017-08-17T13:32:02Z

    decode by meta

commit 05eaa0b6ef64ba23425259006e3c57fd867f95c4
Author: Jacky Li <[hidden email]>
Date:   2017-08-18T06:05:38Z

    add direct string codec for short string column

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [hidden email] or file a JIRA ticket
with INFRA.
---

[GitHub] carbondata issue #1265: [CARBONDATA-1128] Add direct string encoding for sho...

qiuchenjian-2
Github user ravipesala commented on the issue:

    https://github.com/apache/carbondata/pull/1265
 
    SDV Build Failed with Spark 2.1, Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/226/



---

[GitHub] carbondata issue #1265: [CARBONDATA-1128] Add direct string encoding for sho...

qiuchenjian-2
Github user ravipesala commented on the issue:

    https://github.com/apache/carbondata/pull/1265
 
    SDV Build Failed with Spark 2.1, Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/233/



---

[GitHub] carbondata issue #1265: [CARBONDATA-1128] Add direct string encoding for sho...

qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1265
 
    SDV Build Failed with Spark 2.1, Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/252/



---

[GitHub] carbondata issue #1265: [CARBONDATA-1128] Add direct string encoding for sho...

qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1265
 
    SDV Build Failed with Spark 2.1, Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/275/



---

[GitHub] carbondata issue #1265: [CARBONDATA-1128] Add direct string encoding for sho...

qiuchenjian-2
Github user jackylk commented on the issue:

    https://github.com/apache/carbondata/pull/1265
 
    retest this please


---

[GitHub] carbondata issue #1265: [CARBONDATA-1128] Add direct string encoding for sho...

qiuchenjian-2
Github user ravipesala commented on the issue:

    https://github.com/apache/carbondata/pull/1265
 
    @jackylk it seems a lot of tests are failing, can you check?


---

[GitHub] carbondata issue #1265: [CARBONDATA-1128] Add direct string encoding for sho...

qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1265
 
    SDV Build Failed with Spark 2.1, Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/311/



---

[GitHub] carbondata issue #1265: [CARBONDATA-1128] Add direct string encoding for sho...

qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1265
 
    SDV Build Failed with Spark 2.1, Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/335/



---

[GitHub] carbondata issue #1265: [CARBONDATA-1128] Add direct string encoding for sho...

qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1265
 
    SDV Build Failed with Spark 2.1, Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/339/



---

[GitHub] carbondata issue #1265: [CARBONDATA-1128] Add encoding for non-dictionary di...

qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1265
 
    SDV Build Failed with Spark 2.1, Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/354/



---

[GitHub] carbondata issue #1265: [CARBONDATA-1128] Add encoding for non-dictionary di...

qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1265
 
    SDV Build Failed with Spark 2.1, Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/360/



---

[GitHub] carbondata issue #1265: [CARBONDATA-1128] Add encoding for non-dictionary di...

qiuchenjian-2
Github user ravipesala commented on the issue:

    https://github.com/apache/carbondata/pull/1265
 
    SDV Build Fail , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/493/



---

[GitHub] carbondata issue #1265: [CARBONDATA-1128] Add encoding for non-dictionary di...

qiuchenjian-2
Github user ravipesala commented on the issue:

    https://github.com/apache/carbondata/pull/1265
 
    SDV Build Fail , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/494/



---

[GitHub] carbondata issue #1265: [CARBONDATA-1128] Add encoding for non-dictionary di...

qiuchenjian-2
Github user ravipesala commented on the issue:

    https://github.com/apache/carbondata/pull/1265
 
    SDV Build Fail , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/507/



---

[GitHub] carbondata issue #1265: [CARBONDATA-1128] Add encoding for non-dictionary di...

qiuchenjian-2
Github user ravipesala commented on the issue:

    https://github.com/apache/carbondata/pull/1265
 
    SDV Build Fail , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/512/



---

[GitHub] carbondata issue #1265: [CARBONDATA-1128] Add encoding for non-dictionary di...

qiuchenjian-2
Github user ravipesala commented on the issue:

    https://github.com/apache/carbondata/pull/1265
 
    SDV Build Fail , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/513/



---

[GitHub] carbondata issue #1265: [CARBONDATA-1128] Add encoding for non-dictionary di...

qiuchenjian-2
Github user ravipesala commented on the issue:

    https://github.com/apache/carbondata/pull/1265
 
    SDV Build Fail , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/518/



---

[GitHub] carbondata issue #1265: [CARBONDATA-1128] Add encoding for non-dictionary di...

qiuchenjian-2
Github user ravipesala commented on the issue:

    https://github.com/apache/carbondata/pull/1265
 
    SDV Build Fail , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/519/



---

[GitHub] carbondata issue #1265: [CARBONDATA-1128] Add encoding for non-dictionary di...

qiuchenjian-2
Github user ravipesala commented on the issue:

    https://github.com/apache/carbondata/pull/1265
 
    SDV Build Fail , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/537/


