GitHub user akashrn5 opened a pull request:
https://github.com/apache/carbondata/pull/2828

Added DDL support for cli and added more info in carbon file footer

### **Changes Proposed in this PR:**

1. Add more info to the carbon file footer, such as `written_by` (which will be the Spark application name) for the insert into and load commands. This info can be read using the CLI. For the SDK, one API will be exposed to write this info to the footer and one API will be exposed to read it.
2. The footer will record the `version` of carbon in which the file was written, which will be helpful for getting details, for compatibility checks, etc.

Enhancement in CLI tool

3. A new option, "`-v`", is added to get the `written_by` and version details, in addition to the existing options.
4. SQL support is added for the CLI. This is introduced so that the user can get the details in beeline itself, with no need to run the CLI separately. Currently the command is as below; based on comments we can change this DDL:
   `Show summary for table <table_name> options('command'='-cmd,summary,-a,-p,-b');`
5. When we get details for column statistics, we get the Min and Max percentages. Adding the actual min and max values will be helpful for the developer.

```
BLK  BLKLT  Meta Size  Data Size  LocalDict  DictEntries  DictSize  AvgPageSize  Min%  Max%   Min  Max
0    0      2.90KB     4.87MB     false      0            0.0B      93.76KB      0.0   100.0  0    2999990
0    1      2.90KB     2.29MB     false      0            0.0B      93.76KB      0.0   100.0  1    2999992
1    0      2.90KB     4.87MB     false      0            0.0B      93.76KB      0.0   100.0  3    2999993
1    1      2.90KB     2.29MB     false      0            0.0B      93.76KB      0.0   100.0  4    2999995
2    0      2.90KB     5.52MB     false      0            0.0B      93.76KB      0.0   100.0  6    2999997
2    1      2.90KB     2.94MB     false      0            0.0B      93.76KB      0.0   100.0  8    2999998
2    2      830.0B     586.81KB   false      0            0.0B      83.71KB      0.0   100.0  9    2999999
```

6. Currently the CLI tool gets blocklet details for all the block files, so with a large number of carbondata files it takes a long time to get the details. A limit is therefore added: by default, when the option "`-b`" is given, only 4 outputs are returned. To get more, pass the limit as the option value for b. Example: "`-b 30`" raises the limit to 30.
7. A new option, "`-B`", is added, which takes a mandatory option value: a block path. It returns the details of that block only, such as the number of blocklets, number of pages, rows and size. Example: "`-B /home/ss/ss/part-0-0_batchno0-0-0-1539782855178.carbondata`"

```
## Filtered Block Details for: part-0-0_batchno0-0-0-1539782855178.carbondata
BLKT  NumPages  NumRows  Size
0     4         20       10.0B
```

Be sure to do all of the following checklist to help us incorporate your contribution quickly and easily:

- [x] Any interfaces changed?
  NA
- [x] Any backward compatibility impacted?
  Need to handle
- [x] Document update required?
  Yes, will be raised in a separate PR
- [x] Testing done
  UTs are added
  Please provide details on
  - Whether new unit test cases have been added or why no new tests are required?
  - How it is tested? Please attach test report.
  - Is it a performance related change? Please attach the performance test report.
  - Any additional information to help reviewers in testing this change.
- [x] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.
  NA

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/akashrn5/incubator-carbondata integrate

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/carbondata/pull/2828.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #2828

----
commit b747030671155067bf729e032ebdc88b87033534
Author: akashrn5 <akashnilugal@...>
Date:   2018-10-10T13:15:31Z

    added DDL support for cli and add more info in carbon file footer

----

---
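As an illustration of the comma-separated `command` option in the proposed DDL and the default limit described for `-b`, here is a minimal Python sketch. It is not the PR's implementation: the function names `split_command` and `blocklet_limit` are hypothetical, and it assumes the option string is split on commas into CLI arguments and that an integer directly after `-b` is the limit.

```python
DEFAULT_BLOCKLET_LIMIT = 4  # default for "-b" as stated in the PR description


def split_command(command):
    """Split the comma-separated 'command' option value into CLI-style arguments.

    Assumption: arguments are separated by commas, as in
    '-cmd,summary,-a,-p,-b'.
    """
    return [part.strip() for part in command.split(",") if part.strip()]


def blocklet_limit(args):
    """Return the output limit for "-b", or None if "-b" is absent.

    Assumption: an optional integer directly after "-b" overrides the default.
    """
    if "-b" not in args:
        return None
    idx = args.index("-b")
    if idx + 1 < len(args) and args[idx + 1].isdigit():
        return int(args[idx + 1])
    return DEFAULT_BLOCKLET_LIMIT


args = split_command("-cmd,summary,-a,-p,-b")
print(args)                                                  # argument list
print(blocklet_limit(args))                                  # default limit
print(blocklet_limit(split_command("-cmd,summary,-b,30")))   # explicit limit
```

With these assumptions, a bare `-b` resolves to the default of 4 and `-b,30` resolves to 30, matching the behaviour the description gives for "`-b 30`".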
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2828

Build Failed with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9101/

---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2828

Build Failed with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1033/

---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2828

Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/836/

---
Github user jackylk commented on the issue:
https://github.com/apache/carbondata/pull/2828

@akashrn5 Can you break this PR into independent PRs, separating the format modification from the CLI enhancement?

---