[GitHub] incubator-carbondata pull request #709: [WIP] Improvements in query


[GitHub] incubator-carbondata pull request #709: [WIP] Improvements in query

qiuchenjian-2
GitHub user ravipesala opened a pull request:

    https://github.com/apache/incubator-carbondata/pull/709

    [WIP] Improvements in query

    The following improvements are made in this PR:
    1. Removed repeated array creation and copying in the dimension and measure chunk readers.
    2. Simplified the logic for finding the offsets of no-dictionary keys in the class SafeVariableLengthDimensionDataChunkStore.
    3. Avoided byte array creation and copying for no-dictionary columns in the vectorized reader; the offset and length are now passed directly to the vector instead.
    4. Removed unnecessary decoder plans that were being added to the optimized plan, which can streamline the codegen flow.
    5. Updated CompareTest to accept a table block size, set to 32 MB, so that Spark can use smaller sorts when performing take-ordered.
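Items 2 and 3 can be illustrated with a small sketch. It assumes, purely for illustration, that each no-dictionary value is stored as a 2-byte big-endian length followed by its bytes; the real layout inside SafeVariableLengthDimensionDataChunkStore may differ. `BytesSink`, `NoDictionaryOffsets` and `NoDictionaryDemo` are hypothetical names, not CarbonData or Spark APIs. Offsets are computed in a single pass, and each value is then handed to the sink as (array, offset, length) rather than copied into a fresh byte array.

```scala
import java.io.{ByteArrayOutputStream, DataOutputStream}
import java.nio.charset.StandardCharsets
import scala.collection.mutable.ArrayBuffer

// Hypothetical stand-in for a columnar vector; NOT a CarbonData or Spark API.
trait BytesSink {
  def putBytes(rowId: Int, src: Array[Byte], offset: Int, length: Int): Unit
}

object NoDictionaryOffsets {
  // Assumed layout (illustration only): 2-byte big-endian length, then the value bytes.
  private def lengthAt(data: Array[Byte], pos: Int): Int =
    ((data(pos) & 0xFF) << 8) | (data(pos + 1) & 0xFF)

  // Single pass over the buffer: record where each row's length prefix starts.
  def computeOffsets(data: Array[Byte], numRows: Int): Array[Int] = {
    val offsets = new Array[Int](numRows)
    var pos = 0
    var row = 0
    while (row < numRows) {
      offsets(row) = pos
      pos += 2 + lengthAt(data, pos)
      row += 1
    }
    offsets
  }

  // Pass (offset, length) into the shared buffer instead of copying each value out.
  def fillVector(data: Array[Byte], offsets: Array[Int], sink: BytesSink): Unit = {
    var row = 0
    while (row < offsets.length) {
      val start = offsets(row)
      sink.putBytes(row, data, start + 2, lengthAt(data, start))
      row += 1
    }
  }

  // Helper: encode strings in the assumed length-prefixed layout.
  def encode(values: Seq[String]): Array[Byte] = {
    val bytesOut = new ByteArrayOutputStream()
    val out = new DataOutputStream(bytesOut)
    values.foreach { v =>
      val bytes = v.getBytes(StandardCharsets.UTF_8)
      out.writeShort(bytes.length) // big-endian, matching lengthAt
      out.write(bytes)
    }
    out.flush()
    bytesOut.toByteArray
  }
}

object NoDictionaryDemo {
  // Round-trips values through the offset/length path and collects what the sink received.
  def decodeAll(values: Seq[String]): Seq[String] = {
    val data = NoDictionaryOffsets.encode(values)
    val offsets = NoDictionaryOffsets.computeOffsets(data, values.length)
    val seen = ArrayBuffer[String]()
    NoDictionaryOffsets.fillVector(data, offsets, new BytesSink {
      def putBytes(rowId: Int, src: Array[Byte], offset: Int, length: Int): Unit =
        seen += new String(src, offset, length, StandardCharsets.UTF_8)
    })
    seen.toSeq
  }

  def main(args: Array[String]): Unit =
    println(decodeAll(Seq("ab", "", "xyz")).mkString("|"))
}
```

Running `NoDictionaryDemo` prints `ab||xyz`. The point of the design is that the reader allocates no per-row arrays; whether a copy happens at all is left to the vector implementation.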

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/ravipesala/incubator-carbondata minor-perf-improv

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-carbondata/pull/709.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #709
   
----
commit eaa964425ffa784d905b12045e6e719c55eb1164
Author: ravipesala <[hidden email]>
Date:   2017-03-05T15:02:35Z

    Removed unnecessary array copy and bitset checking

commit 62914d866063a2606f6396b9912cf4466cbacef9
Author: ravipesala <[hidden email]>
Date:   2017-03-28T15:24:26Z

    Optimized code

commit 45a4dcab42842f61d7cf28c5834bb4810c77bcbc
Author: ravipesala <[hidden email]>
Date:   2017-03-29T13:57:11Z

    Added table_blocksize option.

commit 57d135937843ce89eb3805cddad6034cf9db3aaf
Author: ravipesala <[hidden email]>
Date:   2017-03-29T18:49:36Z

    Removed unnecessary plan from optimized plan.

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [hidden email] or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-carbondata issue #709: [WIP] Improvements in query

qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/incubator-carbondata/pull/709
 
    Build Success with Spark 1.6.2, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/1380/




[GitHub] incubator-carbondata pull request #709: [CARBONDATA-861] Improvements in que...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user jackylk commented on a diff in the pull request:

    https://github.com/apache/incubator-carbondata/pull/709#discussion_r109959185
 
    --- Diff: examples/spark2/src/main/scala/org/apache/carbondata/examples/CompareTest.scala ---
    @@ -306,6 +307,8 @@ object CompareTest {
         // do GC and sleep for some time before running next table
         System.gc()
         Thread.sleep(1000)
    +    System.gc()
    --- End diff --
   
    Is this required?



[GitHub] incubator-carbondata pull request #709: [CARBONDATA-861] Improvements in que...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user jackylk commented on a diff in the pull request:

    https://github.com/apache/incubator-carbondata/pull/709#discussion_r109959692
 
    --- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/CarbonDataFrameWriter.scala ---
    @@ -171,11 +171,20 @@ class CarbonDataFrameWriter(sqlContext: SQLContext, val dataFrame: DataFrame) {
           }
         ).append(
           if (options.dictionaryExclude.isDefined) {
    -        s"'DICTIONARY_EXCLUDE' = '${options.dictionaryExclude.get}'"
    +        s"'DICTIONARY_EXCLUDE' = '${options.dictionaryExclude.get}' ,"
    +      } else {
    +        ""
    +      }
    +    ).append(
    +      if (options.tableBlockSize.isDefined) {
    +        s"'table_blocksize' = '${options.tableBlockSize.get}'"
           } else {
             ""
           }
         )
    +    if (property.nonEmpty && property.charAt(property.length-1) == ',') {
    +      property = property.replace(property.length-1, property.length, "")
    --- End diff --
   
    change `property.length-1` to `property.length - 1`
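A possible alternative, not what this PR does, is to collect only the defined properties and join them, which removes the need to strip a trailing comma (and the `property.length - 1` arithmetic) altogether. `WriterOptions` and `TableProperties` below are hypothetical stand-ins that model only the two options visible in the diff.

```scala
// Hypothetical stand-in for the writer options; only the two options from the diff are modelled.
final case class WriterOptions(
    dictionaryExclude: Option[String] = None,
    tableBlockSize: Option[String] = None)

object TableProperties {
  // Keep only defined options, then join them: no trailing comma is ever produced.
  def build(options: WriterOptions): String =
    Seq(
      options.dictionaryExclude.map(v => s"'DICTIONARY_EXCLUDE' = '$v'"),
      options.tableBlockSize.map(v => s"'table_blocksize' = '$v'")
    ).flatten.mkString(", ")

  def main(args: Array[String]): Unit = {
    println(build(WriterOptions(dictionaryExclude = Some("c1"), tableBlockSize = Some("32"))))
    println(build(WriterOptions(tableBlockSize = Some("32"))))
  }
}
```

Adding another option then means adding one more `Option` to the sequence, with no separator bookkeeping.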



[GitHub] incubator-carbondata pull request #709: [CARBONDATA-861] Improvements in que...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ravipesala commented on a diff in the pull request:

    https://github.com/apache/incubator-carbondata/pull/709#discussion_r110081787
 
    --- Diff: examples/spark2/src/main/scala/org/apache/carbondata/examples/CompareTest.scala ---
    @@ -306,6 +307,8 @@ object CompareTest {
         // do GC and sleep for some time before running next table
         System.gc()
         Thread.sleep(1000)
    +    System.gc()
    --- End diff --
   
    There is no guarantee that GC will run after calling System.gc(); that's why it is called again after waiting for 1 second, to increase the probability that GC actually runs.
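The pattern discussed here can be sketched as a small helper. `System.gc()` is only a request that the JVM may ignore (for example when started with `-XX:+DisableExplicitGC`), so calling it twice with a pause in between raises the odds of a collection but still guarantees nothing. The name `BestEffortGc` is hypothetical; the helper also reports how many collections the JVM's garbage-collector MXBeans observed, which may legitimately be 0.

```scala
import java.lang.management.ManagementFactory

object BestEffortGc {
  // Sums collection counts over all collectors; a bean may report -1 if the count is undefined.
  def totalCollections(): Long = {
    var total = 0L
    val it = ManagementFactory.getGarbageCollectorMXBeans.iterator()
    while (it.hasNext) {
      val count = it.next().getCollectionCount
      if (count > 0) total += count
    }
    total
  }

  // Request GC, pause, request again; returns the number of collections observed meanwhile.
  def run(pauseMillis: Long = 1000L): Long = {
    val before = totalCollections()
    System.gc()
    Thread.sleep(pauseMillis)
    System.gc()
    totalCollections() - before
  }

  def main(args: Array[String]): Unit =
    println(s"collections observed: ${run()}")
}
```

Checking the MXBean counts makes the "best effort" nature visible in a benchmark log instead of assuming a collection happened.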



[GitHub] incubator-carbondata pull request #709: [CARBONDATA-861] Improvements in que...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ravipesala commented on a diff in the pull request:

    https://github.com/apache/incubator-carbondata/pull/709#discussion_r110081790
 
    --- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/CarbonDataFrameWriter.scala ---
    @@ -171,11 +171,20 @@ class CarbonDataFrameWriter(sqlContext: SQLContext, val dataFrame: DataFrame) {
           }
         ).append(
           if (options.dictionaryExclude.isDefined) {
    -        s"'DICTIONARY_EXCLUDE' = '${options.dictionaryExclude.get}'"
    +        s"'DICTIONARY_EXCLUDE' = '${options.dictionaryExclude.get}' ,"
    +      } else {
    +        ""
    +      }
    +    ).append(
    +      if (options.tableBlockSize.isDefined) {
    +        s"'table_blocksize' = '${options.tableBlockSize.get}'"
           } else {
             ""
           }
         )
    +    if (property.nonEmpty && property.charAt(property.length-1) == ',') {
    +      property = property.replace(property.length-1, property.length, "")
    --- End diff --
   
    ok



[GitHub] incubator-carbondata issue #709: [CARBONDATA-861] Improvements in query

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/incubator-carbondata/pull/709
 
    Build Success with Spark 1.6.2, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/1461/




[GitHub] incubator-carbondata issue #709: [CARBONDATA-861] Improvements in query

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user jackylk commented on the issue:

    https://github.com/apache/incubator-carbondata/pull/709
 
    @ravipesala please rebase



[GitHub] incubator-carbondata issue #709: [CARBONDATA-861] Improvements in query

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/incubator-carbondata/pull/709
 
    Build Success with Spark 1.6.2, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/1560/




[GitHub] incubator-carbondata issue #709: [CARBONDATA-861] Improvements in query

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user jackylk commented on the issue:

    https://github.com/apache/incubator-carbondata/pull/709
 
    LGTM



[GitHub] incubator-carbondata pull request #709: [CARBONDATA-861] Improvements in que...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user asfgit closed the pull request at:

    https://github.com/apache/incubator-carbondata/pull/709

