[GitHub] carbondata pull request #1695: [CARBONDATA-1920] [PrestoIntegration] Sparksq...

qiuchenjian-2

[GitHub] carbondata pull request #1695: [CARBONDATA-1920] [PrestoIntegration] Sparksq...

GitHub user anubhav100 opened a pull request:

https://github.com/apache/carbondata/pull/1695

[CARBONDATA-1920] [PrestoIntegration] Sparksql query result is not same as presto on same sql

**Sparksql query result is not same as presto on same sql**

**Problem:** In the stream readers, dictionary values are decoded only when the Java type is Slice and the column is not a decimal type.

**Solution:** Decode dictionary values for every data type whenever a dictionary exists for the column, because a user can put a column of any data type in dictionary_include when creating a CarbonData table.
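
A minimal sketch of the idea, not the code in this PR: the class, enum, and method names below are hypothetical, and the dictionary is modeled as a plain list of strings indexed by surrogate key rather than through CarbonData's actual dictionary API. It only illustrates decoding a dictionary value for any column type instead of for strings alone.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration; these are not CarbonData or Presto classes.
class DictionaryDecodeSketch {

    // Assumed, simplified set of column types.
    enum ColumnType { STRING, SHORT, INT, LONG, DOUBLE }

    // Looks up the surrogate key in the dictionary and converts the stored
    // string to the column's type. Returns null for a null dictionary entry.
    static Object decode(int surrogateKey, List<String> dictionary, ColumnType type) {
        String raw = dictionary.get(surrogateKey);
        if (raw == null) {
            return null;
        }
        switch (type) {
            case SHORT:
                return Short.parseShort(raw);
            case INT:
                return Integer.parseInt(raw);
            case LONG:
                return Long.parseLong(raw);
            case DOUBLE:
                return Double.parseDouble(raw);
            default:
                return raw; // STRING: the dictionary value is used as-is
        }
    }

    public static void main(String[] args) {
        List<String> dictionary = Arrays.asList("10", "20", "30");
        System.out.println(decode(1, dictionary, ColumnType.INT));
        System.out.println(decode(2, dictionary, ColumnType.STRING));
    }
}
```

The point of the sketch is that the lookup happens whenever a dictionary is present, and the column type only decides how the looked-up value is converted.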

**This PR includes the following changes**

1. Added dictionary decoding to the stream readers.
2. Removed duplicate variables.
3. Added the missing smallint case to the ConvertDataByType method in the Presto filter util.
4. Refactored the code of carbondatastorecreator to include dictionary encoding.
5. In CarbonDictionaryDecodeReadSupport, create a sliceArrayBlock only if the data type is string. For any other data type, do not create a sliceArrayBlock; the dictionary values can be decoded using the dictionary array, the same way ORC does on Presto.
6. The test cases are the same, but now run with dictionary encoding.
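
As a rough illustration of item 3, the sketch below shows a type-dispatched conversion of a filter literal that handles smallint alongside the other integer types. The class, enum, and method here are hypothetical stand-ins and do not reproduce the real ConvertDataByType signature; the assumption is simply that the literal arrives as a boxed number and must be narrowed to the column's type before comparison.

```java
// Hypothetical illustration; not the Presto filter util from this PR.
class FilterLiteralSketch {

    // Assumed, simplified set of SQL types.
    enum SqlType { SMALLINT, INTEGER, BIGINT, VARCHAR }

    // Converts a raw filter literal to the Java type matching the column type.
    // Without a SMALLINT case the literal would fall through to the default
    // branch and be compared as a string.
    static Object convertByType(Object raw, SqlType type) {
        if (raw == null) {
            return null;
        }
        switch (type) {
            case SMALLINT:
                // Note: shortValue() truncates silently if the value is out of range.
                return ((Number) raw).shortValue();
            case INTEGER:
                return ((Number) raw).intValue();
            case BIGINT:
                return ((Number) raw).longValue();
            default:
                return raw.toString();
        }
    }

    public static void main(String[] args) {
        Object converted = convertByType(5L, SqlType.SMALLINT);
        System.out.println(converted.getClass().getSimpleName() + " " + converted);
    }
}
```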

**How testing was done**

1. mvn -Pspark-2.1 clean install passes.
2. Manually checked all the TPC-H Presto queries.
3. Checked all the possible queries with both dictionary_include and dictionary_exclude.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/anubhav100/incubator-carbondata presto-dict

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/carbondata/pull/1695.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1695

----
commit 0d0631f3d1bcefc8264b31811fff9a932a972b86
Author: anubhav100 <anubhav.tarar@...>
Date: 2017-12-05T06:55:58Z

Sparksql query result is not same as presto on same sql because dictionary decoding logic is missing for all stream readers except the string one

Refactored the code to add back the accidentally deleted object stream reader file

Altered the check style

Refactored code for short int data type as it was failing for the greater than and less than operators

Refactored the carbondatastorecreator to include the dictionary encoding for all columns

Modified store creator for creation of dictionary files for short data type

Refactored the code

Resolved the pr comments

----

---

qiuchenjian-2

[GitHub] carbondata issue #1695: [CARBONDATA-1920] [PrestoIntegration] Sparksql query...

GitHub user anubhav100 commented on the issue:

https://github.com/apache/carbondata/pull/1695

@chenliang, can you please review this PR?

---

qiuchenjian-2