GitHub user anubhav100 opened a pull request:
https://github.com/apache/carbondata/pull/1695

[CARBONDATA-1920] [PrestoIntegration] Sparksql query result is not same as presto on same sql

**Spark SQL query result is not the same as Presto for the same SQL**

**Problem:** In the stream readers, dictionary values are decoded only when the Java type is Slice and the column is not a decimal type.

**Solution:** Dictionary decoding needs to be done for every data type whenever a dictionary exists, because the user can put a column of any data type in dictionary_include when creating a carbon table.

**This PR includes the following changes**

1. Dictionary decoding is provided in the stream readers.
2. Removed duplicate variables.
3. In the Presto filter util, the smallint case was missing in the ConvertDataByType method; it is added.
4. Refactored the code of carbondatastorecreator to include dictionary encoding.
5. In CarbonDictionaryDecodeReadSupport, a sliceArrayBlock is created only if the data type is string. For any other data type no sliceArrayBlock is created; the dictionary values can be decoded using the dictionary array, the same way ORC does on Presto.
6. The test cases are the same, but now run with dictionary encoding.

**How testing is done**

1. mvn -Pspark-2.1 clean install is passing.
2. Manually checked all the TPC-H Presto queries.
3. Checked all the possible queries with both dictionary_include and dictionary_exclude.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/anubhav100/incubator-carbondata presto-dict

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/carbondata/pull/1695.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #1695

----
commit 0d0631f3d1bcefc8264b31811fff9a932a972b86
Author: anubhav100 <anubhav.tarar@...>
Date: 2017-12-05T06:55:58Z

    Sparksql query result is not same as presto on same sql because dictionary decoding logic is missing for all stream readers except the string one

    Refactored the code to add back the accidentally deleted object stream reader file

    Altered the check style

    Refactored the code for the short int data type, as it was failing for the greater than and less than operators

    Refactored the carbondatastorecreator to include the dictionary encoding for all columns

    Modified the store creator to create dictionary files for the short data type

    Refactored the code

    Resolved the PR comments

----

---
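For readers following the thread, the idea behind the fix described in the pull request above (decode dictionary values for any column type that has a dictionary, not only for string columns) can be illustrated with a minimal sketch. This is not the code from the pull request: the class `DictionaryDecodeSketch`, its methods, and the representation of a dictionary as a plain `String[]` indexed by surrogate key are all assumptions made purely for illustration.

```java
import java.util.Arrays;

// Hypothetical illustration only; none of these names come from CarbonData or Presto.
final class DictionaryDecodeSketch {

    // Assumption: the dictionary is an array where the index is the surrogate key
    // stored in the column data and the element is the original value as text.
    static String[] decodeStringColumn(int[] surrogateKeys, String[] dictionary) {
        String[] decoded = new String[surrogateKeys.length];
        for (int i = 0; i < surrogateKeys.length; i++) {
            decoded[i] = dictionary[surrogateKeys[i]];
        }
        return decoded;
    }

    // A numeric column listed in dictionary_include also stores surrogate keys,
    // so it must be looked up in the dictionary before being read as a number.
    static long[] decodeLongColumn(int[] surrogateKeys, String[] dictionary) {
        long[] decoded = new long[surrogateKeys.length];
        for (int i = 0; i < surrogateKeys.length; i++) {
            decoded[i] = Long.parseLong(dictionary[surrogateKeys[i]]);
        }
        return decoded;
    }

    public static void main(String[] args) {
        String[] dictionary = {"10", "20", "30"};
        int[] surrogateKeys = {2, 0, 0, 1};
        // Without the lookup a reader would return the keys 2, 0, 0, 1 as if they were data.
        System.out.println(Arrays.toString(decodeLongColumn(surrogateKeys, dictionary)));
        // prints [30, 10, 10, 20]
    }
}
```

Skipping the lookup for non-string columns would return the surrogate keys themselves as column values, which is one plausible way the Spark SQL and Presto results could diverge for the same query.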
Github user anubhav100 commented on the issue:
https://github.com/apache/carbondata/pull/1695

@chenliang can you please review this PR?

---
Github user chenliang613 commented on the issue:
https://github.com/apache/carbondata/pull/1695

Sure, I will review it. Thanks for your contribution.

---
Github user chenliang613 commented on the issue:
https://github.com/apache/carbondata/pull/1695

Have you tested with the carbondata 1.3.0-master code and Spark 2.1? Do you see the same issues?

---
Github user anubhav100 commented on the issue:
https://github.com/apache/carbondata/pull/1695

@chenliang613 Yes, the same issues are reproduced on both 1.2.0 and 1.3.0.

---
Github user chenliang613 commented on the issue:
https://github.com/apache/carbondata/pull/1695

This PR is verified; looks good to me.

---