[GitHub] carbondata pull request #2828: Added DDL support for cli and added more info...


qiuchenjian-2
GitHub user akashrn5 opened a pull request:

    https://github.com/apache/carbondata/pull/2828

    Added DDL support for cli and added more info in carbon file footer

    ### **Changes Proposed in this PR:**
   
    1. Add more info to the carbon file footer, such as `written_by` (the Spark application name) for insert into and load commands. This info can be read with the CLI.
       For the SDK, one API will be exposed to write this info to the footer and another API to read it.
       
    2. The footer will record the `version` of carbon in which the file was written, which will help when getting details, checking compatibility, etc.
   
    Enhancement in CLI tool
    3.  A new option "`-v`" is added to get the written_by and version details, in addition to the existing options.
    4.  SQL support is added for the CLI.
    This is introduced so that the user can get the details from beeline itself, with no need to run the tool separately. Currently the command is as below; the DDL can be changed based on review comments.
    `Show summary for table <table_name> options('command'='-cmd,summary,-a,-p,-b');`
   
    5.  The column statistics details currently show only the Min and Max percentages; adding the actual min and max values will be helpful for developers.
    ```
    BLK  BLKLT  Meta Size  Data Size  LocalDict  DictEntries  DictSize  AvgPageSize  Min%  Max%   Min  Max
    0    0      2.90KB     4.87MB     false      0            0.0B      93.76KB      0.0   100.0  0    2999990
    0    1      2.90KB     2.29MB     false      0            0.0B      93.76KB      0.0   100.0  1    2999992
    1    0      2.90KB     4.87MB     false      0            0.0B      93.76KB      0.0   100.0  3    2999993
    1    1      2.90KB     2.29MB     false      0            0.0B      93.76KB      0.0   100.0  4    2999995
    2    0      2.90KB     5.52MB     false      0            0.0B      93.76KB      0.0   100.0  6    2999997
    2    1      2.90KB     2.94MB     false      0            0.0B      93.76KB      0.0   100.0  8    2999998
    2    2      830.0B     586.81KB   false      0            0.0B      83.71KB      0.0   100.0  9    2999999
    ```
    6.  Currently the CLI tool fetches blocklet details for all the block files, so with a large number of carbondata files it takes a long time to get the details.
    A limit is therefore added: by default, the "-b" option gives only 4 outputs; to get more, pass the limit as the option value for b.
    Example:   "`-b 30`"   =>  the limit is increased to 30
   
    7. A new option "`-B`" is added, which takes a mandatory block path as its value. It is added to get block details such as the number of blocklets, number of pages, rows and size.
   
    Example:  "`-B /home/ss/ss/part-0-0_batchno0-0-0-1539782855178.carbondata`"
    ```
    ## Filtered Block Details for: part-0-0_batchno0-0-0-1539782855178.carbondata
    BLKT  NumPages  NumRows  Size
    0     4         20       10.0B
    ```
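
As a rough illustration of items 4 and 6, the sketch below shows how the comma-separated `'command'` value from the DDL might be turned into CLI-style arguments, and how a bare `-b` could fall back to the default limit of 4. It is not the code from this PR: the function names `split_command_option` and `resolve_blocklet_limit` are hypothetical, and passing the limit through the DDL as `-b,30` is an assumption.

```python
from typing import List, Optional

# Default number of outputs for a bare "-b", as stated in the PR description.
DEFAULT_BLOCKLET_LIMIT = 4


def split_command_option(value: str) -> List[str]:
    """Split the DDL 'command' option value into CLI-style arguments.

    Hypothetical helper: assumes arguments are separated by commas only.
    """
    return [token.strip() for token in value.split(",") if token.strip()]


def resolve_blocklet_limit(args: List[str]) -> Optional[int]:
    """Return the blocklet output limit, or None when "-b" is not given.

    Hypothetical helper: assumes an optional numeric value directly follows "-b".
    """
    if "-b" not in args:
        return None
    index = args.index("-b")
    # A following token that is not another flag is treated as the limit value.
    if index + 1 < len(args) and not args[index + 1].startswith("-"):
        return int(args[index + 1])
    return DEFAULT_BLOCKLET_LIMIT
```

With the command string from the DDL above, `split_command_option("-cmd,summary,-a,-p,-b")` yields the five arguments in order, and `resolve_blocklet_limit` on that list returns the default of 4.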
   
    Be sure to do all of the following checklist to help us incorporate
    your contribution quickly and easily:
   
     - [x] Any interfaces changed?
     NA
     - [x] Any backward compatibility impacted?
     Need to handle
     - [x] Document update required?
    Yes, will be raised in separate PR
     - [x] Testing done
    UTs are added
            Please provide details on
            - Whether new unit test cases have been added or why no new tests are required?
            - How it is tested? Please attach test report.
            - Is it a performance related change? Please attach the performance test report.
            - Any additional information to help reviewers in testing this change.
           
     - [x] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.
    NA
   


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/akashrn5/incubator-carbondata integrate

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/carbondata/pull/2828.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #2828
   
----
commit b747030671155067bf729e032ebdc88b87033534
Author: akashrn5 <akashnilugal@...>
Date:   2018-10-10T13:15:31Z

    added DDL support for cli and add more info in carbon file footer

----


---

[GitHub] carbondata issue #2828: [CARBONDATA-3025]Added DDL support for cli and added...

qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2828
 
    Build Failed  with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9101/



---

[GitHub] carbondata issue #2828: [CARBONDATA-3025]Added DDL support for cli and added...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2828
 
    Build Failed with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1033/



---

[GitHub] carbondata issue #2828: [CARBONDATA-3025]Added DDL support for cli and added...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2828
 
    Build Failed  with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/836/



---

[GitHub] carbondata issue #2828: [CARBONDATA-3025]Added DDL support for cli and added...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user jackylk commented on the issue:

    https://github.com/apache/carbondata/pull/2828
 
    @akashrn5 Can you break this PR into independent PRs, separating the format modification from the CLI enhancement?


---

[GitHub] carbondata pull request #2828: [CARBONDATA-3025]Added DDL support for cli an...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user akashrn5 closed the pull request at:

    https://github.com/apache/carbondata/pull/2828


---