kunal642 opened a new pull request #3738:
URL: https://github.com/apache/carbondata/pull/3738

### Why is this PR needed?
Fix documentation for various features.

### What changes were proposed in this PR?
1. Added write with hive doc
2. Added alter upgrade segment doc
3. Fix other random issues

### Does this PR introduce any user interface change?
- No
- Yes. (please explain the change and update document)

### Is any new testcase added?
- No
- Yes
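For context on item 1, a minimal sketch of the kind of Hive write flow the new doc is expected to describe. The storage handler class name, table name, and columns below are assumptions based on CarbonData's Hive integration, not text from this PR; verify against the merged hive guide.

```sql
-- Illustrative only: create a Hive table backed by CarbonData and write to it.
-- The handler class 'org.apache.carbondata.hive.CarbonStorageHandler' is assumed,
-- as are the table name and columns.
CREATE TABLE hive_carbon_example (id INT, name STRING, score DECIMAL(10,2))
STORED BY 'org.apache.carbondata.hive.CarbonStorageHandler';

INSERT INTO hive_carbon_example VALUES (1, 'carbon', 100.00);

SELECT * FROM hive_carbon_example;
```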
Indhumathi27 commented on a change in pull request #3738:
URL: https://github.com/apache/carbondata/pull/3738#discussion_r419235138

File path: docs/ddl-of-carbondata.md

@@ -20,7 +20,6 @@ CarbonData DDL statements are documented here,which includes:
 * [CREATE TABLE](#create-table)
-  * [Dictionary Encoding](#dictionary-encoding-configuration)

Review comment:
Datamap keyword exists in ddl guide. Please check.
CarbonDataQA1 commented on pull request #3738:
URL: https://github.com/apache/carbondata/pull/3738#issuecomment-623307157

Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/2917/
CarbonDataQA1 commented on pull request #3738:
URL: https://github.com/apache/carbondata/pull/3738#issuecomment-623310559

Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1200/
CarbonDataQA1 commented on pull request #3738:
URL: https://github.com/apache/carbondata/pull/3738#issuecomment-623361798

Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1205/
CarbonDataQA1 commented on pull request #3738:
URL: https://github.com/apache/carbondata/pull/3738#issuecomment-623366636

Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/2924/
Indhumathi27 commented on a change in pull request #3738:
URL: https://github.com/apache/carbondata/pull/3738#discussion_r419871090

File path: docs/index-server.md

@@ -82,6 +82,15 @@ the pruned blocklets which would be further used for result fetching.

 **Note:** Multiple JDBC drivers can connect to the index server to use the cache.

+## Enabling Size based distribution for Legacy stores
+The default round robin based distribution causes unequal distribution of cache amoung the executors, which can cause any 1 of the executors to be bloated with too much cache and cause performance degrade.

Review comment:
```suggestion
The default round robin based distribution causes unequal distribution of cache among the executors, which can cause any 1 of the executors to be bloated with too much cache and cause performance degrade.
```
Indhumathi27 commented on a change in pull request #3738:
URL: https://github.com/apache/carbondata/pull/3738#discussion_r419871344

File path: docs/index-server.md

@@ -82,6 +82,15 @@ the pruned blocklets which would be further used for result fetching.

 **Note:** Multiple JDBC drivers can connect to the index server to use the cache.

+## Enabling Size based distribution for Legacy stores
+The default round robin based distribution causes unequal distribution of cache amoung the executors, which can cause any 1 of the executors to be bloated with too much cache and cause performance degrade.
+This problem can be solved by running the upgrade_segment command which will fill the data size values for each segment in the tablestatus file. Any cache loaded after this can use the traditional size based distribution.

Review comment:
```suggestion
This problem can be solved by running the `upgrade_segment` command which will fill the data size values for each segment in the tablestatus file. Any cache loaded after this can use the traditional size based distribution.
```
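For readers of this thread, a hedged sketch of how the `upgrade_segment` command referenced above might be invoked. The exact keyword and the database/table names are assumptions modeled on CarbonData's `ALTER TABLE ... COMPACT` syntax; check the alter upgrade segment doc added by this PR for the authoritative form.

```sql
-- Assumed syntax (modeled on CarbonData's ALTER TABLE ... COMPACT commands);
-- fills per-segment data size values in the tablestatus file of a legacy store.
ALTER TABLE my_db.legacy_table COMPACT 'UPGRADE_SEGMENT';
```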
Indhumathi27 commented on a change in pull request #3738:
URL: https://github.com/apache/carbondata/pull/3738#discussion_r419872076

File path: docs/index-server.md

@@ -19,8 +19,8 @@
 ## Background

-Carbon currently prunes and caches all block/blocklet datamap index information into the driver for
-normal table, for Bloom/Index datamaps the JDBC driver will launch a job to prune and cache the
+Carbon currently prunes and caches all block/blocklet index information into the driver for
+normal table, for Bloom/Index indexes the JDBC driver will launch a job to prune and cache the

Review comment:
```suggestion
normal table, for Bloom/Lucene indexes the JDBC driver will launch a job to prune and cache the
```
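As background for this index-server discussion, enabling the feature the quoted paragraph describes is driven by carbon properties. A hedged sketch follows; the property names and the db/table placeholders are assumptions drawn from the index server guide and should be verified there.

```sql
-- Assumed property names (see docs/index-server.md); db/table names are placeholders.
-- Enable the index server for the current session:
SET carbon.enable.index.server = true;
-- Or enable it only for a specific table:
SET carbon.enable.index.server.my_db.my_table = true;
```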
Indhumathi27 commented on a change in pull request #3738:
URL: https://github.com/apache/carbondata/pull/3738#discussion_r419872161

File path: docs/index-server.md

@@ -19,8 +19,8 @@
 ## Background

-Carbon currently prunes and caches all block/blocklet datamap index information into the driver for
-normal table, for Bloom/Index datamaps the JDBC driver will launch a job to prune and cache the
+Carbon currently prunes and caches all block/blocklet index information into the driver for
+normal table, for Bloom/Index indexes the JDBC driver will launch a job to prune and cache the
 datamaps in executors.

Review comment:
```suggestion
indexes in executors.
```
Indhumathi27 commented on a change in pull request #3738:
URL: https://github.com/apache/carbondata/pull/3738#discussion_r419877825

File path: docs/index-server.md

@@ -82,6 +82,15 @@ the pruned blocklets which would be further used for result fetching.

 **Note:** Multiple JDBC drivers can connect to the index server to use the cache.

+## Enabling Size based distribution for Legacy stores

Review comment:
Please correct the format in Line No. 55.
akashrn5 commented on a change in pull request #3738:
URL: https://github.com/apache/carbondata/pull/3738#discussion_r419892047

File path: docs/ddl-of-carbondata.md

@@ -608,12 +607,10 @@ CarbonData DDL statements are documented here,which includes:
   This can be SDK output or C++ SDK output. Refer [SDK Guide](./sdk-guide.md) and [C++ SDK Guide](./csdk-guide.md).
   **Note:**
-  1. Dropping of the external table should not delete the files present in the location.
+  1. Dropping of the external table will not delete the files present in the location.
   2. When external table is created on non-transactional table data, external table will be registered with the schema of carbondata files.
-     If multiple files with different schema is present, exception will be thrown.
-     So, If table registered with one schema and files are of different schema,
-     suggest to drop the external table and create again to register table with new schema.
+     If multiple files with different schema is present, exception will be thrown.

Review comment:
Actually, if multiple files with different schemas are present, we check whether the same column is present with a different schema and throw an exception in that case, so here we can be more specific.

File path: docs/index-server.md

@@ -82,6 +82,15 @@ the pruned blocklets which would be further used for result fetching.

 **Note:** Multiple JDBC drivers can connect to the index server to use the cache.

+## Enabling Size based distribution for Legacy stores
+The default round robin based distribution causes unequal distribution of cache amoung the executors, which can cause any 1 of the executors to be bloated with too much cache and cause performance degrade.

Review comment:
```suggestion
The default round robin based distribution causes unequal distribution of cache among the executors, which can cause any one of the executors to be bloated with too much cache resulting in performance degrade.
```
kunal642 commented on a change in pull request #3738:
URL: https://github.com/apache/carbondata/pull/3738#discussion_r420111629

File path: docs/ddl-of-carbondata.md (same hunk as above)

Review comment:
If a column is not present, then we also throw an exception, right?
CarbonDataQA1 commented on pull request #3738:
URL: https://github.com/apache/carbondata/pull/3738#issuecomment-624138362

Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/2942/
CarbonDataQA1 commented on pull request #3738:
URL: https://github.com/apache/carbondata/pull/3738#issuecomment-624139653

Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1224/
akashrn5 commented on a change in pull request #3738:
URL: https://github.com/apache/carbondata/pull/3738#discussion_r420575005

File path: docs/ddl-of-carbondata.md (same hunk as above)

Review comment:
It basically checks the latest file and gives data based on the columns present in that file.
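For readers following this external-table discussion, a hedged sketch of the DDL being documented. The table name and location are placeholders; the `STORED AS carbondata LOCATION` form is taken from CarbonData's external table support rather than from this thread.

```sql
-- Illustrative only: register an external table over existing CarbonData files
-- (for example, SDK output). Path and table name are placeholders.
CREATE EXTERNAL TABLE sdk_output_table
STORED AS carbondata
LOCATION '/path/to/sdk/output';

-- Dropping the external table will not delete the files at the location.
DROP TABLE sdk_output_table;
```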
kunal642 commented on a change in pull request #3738:
URL: https://github.com/apache/carbondata/pull/3738#discussion_r420770234

File path: docs/index-server.md (same hunks as in the three review comments above)

Review comment: done
Review comment: done
Review comment: done
kunal642 commented on a change in pull request #3738:
URL: https://github.com/apache/carbondata/pull/3738#discussion_r420770408

File path: docs/index-server.md (same hunk as above)

Review comment: done
kunal642 commented on a change in pull request #3738:
URL: https://github.com/apache/carbondata/pull/3738#discussion_r420770573

File path: docs/index-server.md (same hunk as above)

Review comment: done
kunal642 commented on a change in pull request #3738:
URL: https://github.com/apache/carbondata/pull/3738#discussion_r420772644

File path: docs/index-server.md (same hunk as above)

Review comment: done