GitHub user ravipesala opened a pull request:
https://github.com/apache/incubator-carbondata/pull/709

[WIP] Improvements in query

The following improvements are made in this part of the PR:
1. Removed repeated array creation and copying in the dimension and measure chunk readers.
2. Simplified the logic for finding the offsets of no-dictionary keys in the class SafeVariableLengthDimensionDataChunkStore.
3. Avoided byte array creation and copying for no-dictionary columns in the vectorized reader. The offset and length are now passed directly to the vector instead.
4. Removed unnecessary decoder plan additions from the optimized plan. This can optimize the codegen flow.
5. Updated CompareTest to take the table block size, set to 32 MB, in order to make use of small sorting when Spark does take ordered.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/ravipesala/incubator-carbondata minor-perf-improv

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-carbondata/pull/709.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #709

----
commit eaa964425ffa784d905b12045e6e719c55eb1164
Author: ravipesala <[hidden email]>
Date:   2017-03-05T15:02:35Z

    Removed unnecessary array copy and bitset checking

commit 62914d866063a2606f6396b9912cf4466cbacef9
Author: ravipesala <[hidden email]>
Date:   2017-03-28T15:24:26Z

    OPtimized code

commit 45a4dcab42842f61d7cf28c5834bb4810c77bcbc
Author: ravipesala <[hidden email]>
Date:   2017-03-29T13:57:11Z

    Added table_blocksize option.

commit 57d135937843ce89eb3805cddad6034cf9db3aaf
Author: ravipesala <[hidden email]>
Date:   2017-03-29T18:49:36Z

    Removed unnecessary plan from optimized plan.
----

---
If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at [hidden email] or file a JIRA ticket with INFRA.
---
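Items 2 and 3 of the PR description (finding no-dictionary offsets and handing offset/length to the vector instead of a copied byte array) can be illustrated with a minimal sketch. It rests on assumptions: the page layout (each value stored as a 2-byte big-endian length followed by its bytes) is assumed for illustration, and `ByteSliceVector` and `VarLengthPage` are hypothetical names. This is not CarbonData's `SafeVariableLengthDimensionDataChunkStore` or Spark's column vector API.

```scala
import java.nio.ByteBuffer
import java.nio.charset.StandardCharsets

// Hypothetical columnar sink: it keeps (offset, length) views into one shared
// backing array instead of receiving a freshly copied byte[] per row.
final class ByteSliceVector(capacity: Int) {
  private var backing: Array[Byte] = Array.emptyByteArray
  private val offsets = new Array[Int](capacity)
  private val lengths = new Array[Int](capacity)

  def putBytes(rowId: Int, data: Array[Byte], offset: Int, length: Int): Unit = {
    backing = data // every row in this sketch points into the same page
    offsets(rowId) = offset
    lengths(rowId) = length
  }

  // A String is materialized only when a row is actually read.
  def getString(rowId: Int): String =
    new String(backing, offsets(rowId), lengths(rowId), StandardCharsets.UTF_8)
}

object VarLengthPage {
  // Assumed layout: a 2-byte big-endian length, then the value's bytes.
  def encode(values: Seq[String]): Array[Byte] = {
    val encoded = values.map(_.getBytes(StandardCharsets.UTF_8))
    val buf = ByteBuffer.allocate(encoded.map(_.length + 2).sum)
    encoded.foreach { bytes =>
      buf.putShort(bytes.length.toShort)
      buf.put(bytes)
    }
    buf.array()
  }

  // One forward pass: record where each row's bytes start (just after its
  // length prefix) and how long they are. No value bytes are copied.
  def index(page: Array[Byte], numRows: Int): (Array[Int], Array[Int]) = {
    val offsets = new Array[Int](numRows)
    val lengths = new Array[Int](numRows)
    val buf = ByteBuffer.wrap(page) // ByteBuffer is big-endian by default
    var pos = 0
    var row = 0
    while (row < numRows) {
      val len = buf.getShort(pos) & 0xFFFF // read the prefix as unsigned
      offsets(row) = pos + 2
      lengths(row) = len
      pos += 2 + len
      row += 1
    }
    (offsets, lengths)
  }

  // Passes offset/length pairs to the vector rather than per-row byte[] copies.
  def fillVector(page: Array[Byte], numRows: Int, vector: ByteSliceVector): Unit = {
    val (offsets, lengths) = index(page, numRows)
    var row = 0
    while (row < numRows) {
      vector.putBytes(row, page, offsets(row), lengths(row))
      row += 1
    }
  }
}
```

Compared with allocating a new `Array[Byte]` per row and copying into it, this saves one allocation and one copy per value. Decoding is deferred until a row is actually read.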
Github user CarbonDataQA commented on the issue:
https://github.com/apache/incubator-carbondata/pull/709 Build Success with Spark 1.6.2, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/1380/
In reply to this post by qiuchenjian-2
Github user jackylk commented on a diff in the pull request:
https://github.com/apache/incubator-carbondata/pull/709#discussion_r109959185

--- Diff: examples/spark2/src/main/scala/org/apache/carbondata/examples/CompareTest.scala ---
@@ -306,6 +307,8 @@ object CompareTest {
       // do GC and sleep for some time before running next table
       System.gc()
       Thread.sleep(1000)
+      System.gc()
--- End diff --

Is this required?
Github user jackylk commented on a diff in the pull request:
https://github.com/apache/incubator-carbondata/pull/709#discussion_r109959692

--- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/CarbonDataFrameWriter.scala ---
@@ -171,11 +171,20 @@ class CarbonDataFrameWriter(sqlContext: SQLContext, val dataFrame: DataFrame) {
       }
     ).append(
       if (options.dictionaryExclude.isDefined) {
-        s"'DICTIONARY_EXCLUDE' = '${options.dictionaryExclude.get}'"
+        s"'DICTIONARY_EXCLUDE' = '${options.dictionaryExclude.get}' ,"
+      } else {
+        ""
+      }
+    ).append(
+      if (options.tableBlockSize.isDefined) {
+        s"'table_blocksize' = '${options.tableBlockSize.get}'"
       } else {
         ""
       }
     )
+    if (property.nonEmpty && property.charAt(property.length-1) == ',') {
+      property = property.replace(property.length-1, property.length, "")
--- End diff --

change `property.length-1` to `property.length - 1`
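The diff above appends a trailing `,` after `DICTIONARY_EXCLUDE` and then strips a dangling comma when `table_blocksize` is absent. One way to sidestep the trimming step is to collect only the defined options and join them. The sketch below illustrates that alternative and is not the code that was merged. `WriterOptions` and `TableProperties.render` are hypothetical names, and only the two property keys are taken from the diff.

```scala
// Hypothetical stand-in for the two writer options that appear in the diff.
final case class WriterOptions(
    dictionaryExclude: Option[String] = None,
    tableBlockSize: Option[String] = None)

object TableProperties {
  // Keep only the options that are set, then join them. mkString places the
  // separator strictly between elements, so no trailing comma is ever
  // produced and nothing has to be stripped afterwards.
  def render(options: WriterOptions): String =
    Seq(
      options.dictionaryExclude.map(v => s"'DICTIONARY_EXCLUDE' = '$v'"),
      options.tableBlockSize.map(v => s"'table_blocksize' = '$v'")
    ).flatten.mkString(", ")
}
```

For example, `TableProperties.render(WriterOptions(Some("c1"), Some("32")))` yields `'DICTIONARY_EXCLUDE' = 'c1', 'table_blocksize' = '32'`, and with no options set it yields an empty string.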
Github user ravipesala commented on a diff in the pull request:
https://github.com/apache/incubator-carbondata/pull/709#discussion_r110081787

--- Diff: examples/spark2/src/main/scala/org/apache/carbondata/examples/CompareTest.scala ---
@@ -306,6 +307,8 @@ object CompareTest {
       // do GC and sleep for some time before running next table
       System.gc()
       Thread.sleep(1000)
+      System.gc()
--- End diff --

There is no guarantee that GC will run after System.gc() is called. That's why it is called again after waiting for 1 second, to increase the probability that GC runs.
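The reasoning above (System.gc() is only a request, so repeat it after a pause) can be packaged as a small helper. This is a sketch under assumptions: `GcHint.requestGc` is a hypothetical name and is not part of CompareTest. Its defaults reproduce the gc, sleep(1000), gc sequence in the diff, and the `gc` parameter is injectable only so the call pattern can be checked without relying on a real collection.

```scala
object GcHint {
  // System.gc() only *requests* a collection, and the JVM may ignore it
  // (HotSpot ignores it entirely under -XX:+DisableExplicitGC). Repeating the
  // request after a pause raises the chance that a collection actually runs.
  def requestGc(rounds: Int = 2,
                pauseMillis: Long = 1000L,
                gc: () => Unit = () => System.gc()): Unit = {
    var i = 0
    while (i < rounds) {
      gc()
      // Sleep between requests, but not after the last one.
      if (i < rounds - 1) Thread.sleep(pauseMillis)
      i += 1
    }
  }
}
```

Calling `GcHint.requestGc()` between tables is equivalent to the three lines in the diff. It is still best-effort, so benchmark timings should not assume the heap is clean afterwards.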
Github user ravipesala commented on a diff in the pull request:
https://github.com/apache/incubator-carbondata/pull/709#discussion_r110081790

--- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/CarbonDataFrameWriter.scala ---
@@ -171,11 +171,20 @@ class CarbonDataFrameWriter(sqlContext: SQLContext, val dataFrame: DataFrame) {
       }
     ).append(
       if (options.dictionaryExclude.isDefined) {
-        s"'DICTIONARY_EXCLUDE' = '${options.dictionaryExclude.get}'"
+        s"'DICTIONARY_EXCLUDE' = '${options.dictionaryExclude.get}' ,"
+      } else {
+        ""
+      }
+    ).append(
+      if (options.tableBlockSize.isDefined) {
+        s"'table_blocksize' = '${options.tableBlockSize.get}'"
       } else {
         ""
       }
     )
+    if (property.nonEmpty && property.charAt(property.length-1) == ',') {
+      property = property.replace(property.length-1, property.length, "")
--- End diff --

ok
Github user CarbonDataQA commented on the issue:
https://github.com/apache/incubator-carbondata/pull/709 Build Success with Spark 1.6.2, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/1461/
Github user jackylk commented on the issue:
https://github.com/apache/incubator-carbondata/pull/709 @ravipesala please rebase
Github user CarbonDataQA commented on the issue:
https://github.com/apache/incubator-carbondata/pull/709 Build Success with Spark 1.6.2, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/1560/
Github user jackylk commented on the issue:
https://github.com/apache/incubator-carbondata/pull/709 LGTM
Github user asfgit closed the pull request at:
https://github.com/apache/incubator-carbondata/pull/709