[GitHub] carbondata pull request #1695: [CARBONDATA-1920] [PrestoIntegration] Sparksq...


qiuchenjian-2
GitHub user anubhav100 opened a pull request:

    https://github.com/apache/carbondata/pull/1695

    [CARBONDATA-1920] [PrestoIntegration] Sparksql query result is not same as presto on same sql

    **Sparksql query result is not same as presto on same sql**
   
    **Problem:** The stream readers decode dictionary values only when the Java type is Slice and the column is not a decimal type.

    **Solution:** Dictionary decoding must be performed for every data type whose column has a dictionary, because a user can put a column of any data type in dictionary_include when creating a carbon table.
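
As a rough illustration of the solution described above, the following Java sketch decodes dictionary surrogate keys for several data types instead of only for strings. The class `DictionaryDecodeSketch`, its methods, the type names, and the 0-based key layout are assumptions made for this example; they are not the stream reader classes touched by this pull request.

```java
import java.math.BigDecimal;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: not the actual CarbonData or Presto reader code.
final class DictionaryDecodeSketch {

    // Looks up each surrogate key in the column dictionary and converts the
    // stored text to the column's data type. Keys are assumed to be 0-based
    // indexes into the dictionary list.
    static Object[] decode(int[] surrogateKeys, List<String> dictionary, String dataType) {
        Object[] decoded = new Object[surrogateKeys.length];
        for (int i = 0; i < surrogateKeys.length; i++) {
            decoded[i] = parse(dictionary.get(surrogateKeys[i]), dataType);
        }
        return decoded;
    }

    // Converts one dictionary entry to a typed value. Decoding is applied to
    // every supported type, not only to strings.
    static Object parse(String raw, String dataType) {
        switch (dataType) {
            case "smallint": return Short.parseShort(raw);
            case "int":      return Integer.parseInt(raw);
            case "bigint":   return Long.parseLong(raw);
            case "double":   return Double.parseDouble(raw);
            case "decimal":  return new BigDecimal(raw);
            case "string":   return raw;
            default:
                throw new IllegalArgumentException("Unsupported data type: " + dataType);
        }
    }

    public static void main(String[] args) {
        List<String> dictionary = Arrays.asList("10", "20", "30");
        Object[] values = decode(new int[] {2, 0, 1, 2}, dictionary, "int");
        System.out.println(Arrays.toString(values)); // prints [30, 10, 20, 30]
    }
}
```

The point of the sketch is only that the lookup step is shared by all types and the per-type conversion happens after it, so a non-string column listed in dictionary_include is decoded rather than returned as raw surrogate keys.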
   
    **This PR includes the following changes**

    1. Added dictionary decoding to the stream readers.
    2. Removed duplicate variables.
    3. Added the missing smallint case to the ConvertDataByType method in the Presto filter util.
    4. Refactored carbondatastorecreator to include dictionary encoding.
    5. CarbonDictionaryDecodeReadSupport now creates a sliceArrayBlock only if the data type is string. For any other data type no sliceArrayBlock is created; the dictionary values are decoded using the dictionary array, the same way ORC does on Presto.
    6. The test cases are the same, but now run with dictionary encoding.
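
Item 3 can be pictured with a small Java sketch of a per-type conversion switch in which the smallint branch is present. The names `FilterValueSketch`, `ColumnType`, and `convertByType` are hypothetical stand-ins; the real ConvertDataByType method in the Presto filter util may differ in its signature and in the types it returns.

```java
// Hypothetical sketch: not the actual Presto filter util from CarbonData.
final class FilterValueSketch {

    // Illustrative subset of column types.
    enum ColumnType { SMALLINT, INT, BIGINT, DOUBLE, STRING }

    // Converts a filter literal to the Java type that matches the column.
    // Without the SMALLINT branch, a smallint filter would fall through to
    // the default branch and fail.
    static Object convertByType(Object raw, ColumnType type) {
        switch (type) {
            case SMALLINT: return ((Number) raw).shortValue();
            case INT:      return ((Number) raw).intValue();
            case BIGINT:   return ((Number) raw).longValue();
            case DOUBLE:   return ((Number) raw).doubleValue();
            case STRING:   return raw.toString();
            default:
                throw new IllegalArgumentException("Unsupported type: " + type);
        }
    }

    public static void main(String[] args) {
        Object v = convertByType(7L, ColumnType.SMALLINT);
        System.out.println(v.getClass().getSimpleName() + " " + v); // prints Short 7
    }
}
```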
   
    **How testing is done**

    1. mvn -Pspark-2.1 clean install passes.
    2. Manually checked all the TPC-H Presto queries.
    3. Checked all the possible queries with both dictionary_include and dictionary_exclude.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/anubhav100/incubator-carbondata presto-dict

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/carbondata/pull/1695.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1695
   
----
commit 0d0631f3d1bcefc8264b31811fff9a932a972b86
Author: anubhav100 <anubhav.tarar@...>
Date:   2017-12-05T06:55:58Z

    Spark SQL query result is not the same as Presto's for the same SQL because dictionary decoding logic is missing for all stream readers except the string reader

    Refactored the code to add back the accidentally deleted object stream reader file

    Alter the check style

    Refactored code for the short int data type, as it was failing for the greater than and less than operators
   
    Refactored the carbondatastorecreator to include the dictionary encoding for all columns
   
    Modified store creator for creation of dictionary files for short data type
   
    Refactored the code
   
    Resolved the pr comments

----


---

[GitHub] carbondata issue #1695: [CARBONDATA-1920] [PrestoIntegration] Sparksql query...

qiuchenjian-2
Github user anubhav100 commented on the issue:

    https://github.com/apache/carbondata/pull/1695
 
    @chenliang, can you please review this PR?


---

[GitHub] carbondata issue #1695: [CARBONDATA-1920] [PrestoIntegration] Sparksql query...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user chenliang613 commented on the issue:

    https://github.com/apache/carbondata/pull/1695
 
    Sure, I will review it. Thanks for your contribution.


---

[GitHub] carbondata issue #1695: [CARBONDATA-1920] [PrestoIntegration] Sparksql query...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user chenliang613 commented on the issue:

    https://github.com/apache/carbondata/pull/1695
 
    Have you tested with the CarbonData 1.3.0-master code and Spark 2.1?
    Do the same issues occur?


---

[GitHub] carbondata issue #1695: [CARBONDATA-1920] [PrestoIntegration] Sparksql query...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user anubhav100 commented on the issue:

    https://github.com/apache/carbondata/pull/1695
 
    @chenliang613 Yes, the same issues reproduce on both 1.2.0 and 1.3.0.


---

[GitHub] carbondata issue #1695: [CARBONDATA-1920] [PrestoIntegration] Sparksql query...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user chenliang613 commented on the issue:

    https://github.com/apache/carbondata/pull/1695
 
    This PR is verified, looks good to me.


---

[GitHub] carbondata pull request #1695: [CARBONDATA-1920] [PrestoIntegration] Sparksq...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user asfgit closed the pull request at:

    https://github.com/apache/carbondata/pull/1695


---