[jira] [Updated] (CARBONDATA-2951) CSDK: Provide C++ interface for SDK

Akash R Nilugal (Jira)

     [ https://issues.apache.org/jira/browse/CARBONDATA-2951?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

xubo245 updated CARBONDATA-2951:
--------------------------------
    Description:
CSDK: Provide C++ interface for SDK
1. Provide CarbonReader for SDK, so that carbon data can be read in C++
        ##features/interfaces
        1.1. create CarbonReader
        1.2. hasNext()
        1.3. readNextRow()
        1.4. close()
        1.5. support OBS (AK/SK/Endpoint)
        1.6. support batch read (withBatch, readNextBatchRow)
        1.7. support vector read (default) and carbonrecordreader (withRowRecordReader)
        1.8. projection
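
The call sequence in the list above (create, hasNext, readNextRow or readNextBatchRow, close) can be sketched with a small in-memory stand-in. This is only an illustration of the intended iteration pattern: `MockReader`, `Row`, `countRows` and `isClosed` are hypothetical names introduced here, and none of this is the actual CSDK class or its signatures.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a row: one string per projected column.
using Row = std::vector<std::string>;

// Hypothetical in-memory reader that mimics the call sequence listed above.
// It is NOT the CSDK CarbonReader; it only demonstrates the usage pattern.
class MockReader {
public:
    explicit MockReader(std::vector<Row> rows) : rows_(std::move(rows)) {}

    bool hasNext() const { return !closed_ && pos_ < rows_.size(); }

    // Returns the next row; callers must check hasNext() first.
    Row readNextRow() {
        assert(hasNext());
        return rows_[pos_++];
    }

    // Returns up to batchSize rows, fewer when the data runs out.
    std::vector<Row> readNextBatchRow(std::size_t batchSize) {
        std::vector<Row> batch;
        while (batch.size() < batchSize && hasNext()) {
            batch.push_back(readNextRow());
        }
        return batch;
    }

    void close() { closed_ = true; }
    bool isClosed() const { return closed_; }

private:
    std::vector<Row> rows_;
    std::size_t pos_ = 0;
    bool closed_ = false;
};

// Drains the reader row by row, closes it, and returns the number of rows read.
std::size_t countRows(MockReader& reader) {
    std::size_t n = 0;
    while (reader.hasNext()) {
        reader.readNextRow();
        ++n;
    }
    reader.close();
    return n;
}
```

The point of the sketch is that close() is always reached after the loop, and that a batch read simply returns a short final batch instead of failing.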
       
        ##supported data types:
        String, Long, Varchar(string), Short, Int, Date(int), Timestamp(long), Boolean, Decimal(string), Float
        Array<String>: supported in carbonrecordreader, not supported in vectorreader
        byte: supported in Java RowUtil, not in the C++ carbon reader
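
Since Date arrives as an int and Timestamp as a long, the C++ caller has to decode them. The sketch below assumes, without confirmation from this issue, that Date(int) counts days since 1970-01-01 and that Timestamp(long) counts microseconds since the epoch in UTC. `CivilDate`, `civilFromDays` and `formatTimestampMicros` are hypothetical helper names; the day-to-date conversion uses Howard Hinnant's well-known civil-from-days algorithm.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <string>

// Calendar date in the proleptic Gregorian calendar.
struct CivilDate { int year; int month; int day; };

// Converts days since 1970-01-01 into year/month/day.
// ASSUMPTION: this is how a Date(int) value is encoded.
CivilDate civilFromDays(std::int64_t days) {
    days += 719468;  // shift the epoch to 0000-03-01
    const std::int64_t era = (days >= 0 ? days : days - 146096) / 146097;
    const std::int64_t doe = days - era * 146097;  // day of era
    const std::int64_t yoe = (doe - doe / 1460 + doe / 36524 - doe / 146096) / 365;
    const std::int64_t y = yoe + era * 400;
    const std::int64_t doy = doe - (365 * yoe + yoe / 4 - yoe / 100);  // day of year, March-based
    const std::int64_t mp = (5 * doy + 2) / 153;                       // month index, March = 0
    const int d = static_cast<int>(doy - (153 * mp + 2) / 5 + 1);
    const int m = static_cast<int>(mp < 10 ? mp + 3 : mp - 9);
    return {static_cast<int>(m <= 2 ? y + 1 : y), m, d};
}

// Formats microseconds since the epoch as "YYYY-MM-DD hh:mm:ss" in UTC.
// ASSUMPTION: this is how a Timestamp(long) value is encoded.
std::string formatTimestampMicros(std::int64_t micros) {
    assert(micros >= 0);  // this sketch handles post-epoch values only
    const std::int64_t secs = micros / 1000000;
    const CivilDate date = civilFromDays(secs / 86400);
    const int sod = static_cast<int>(secs % 86400);  // second of day
    char buf[32];
    std::snprintf(buf, sizeof buf, "%04d-%02d-%02d %02d:%02d:%02d",
                  date.year, date.month, date.day,
                  sod / 3600, (sod / 60) % 60, sod % 60);
    return buf;
}
```

Under these assumptions, the sample value 1538015497000000 that appears in this description formats as 2018-09-27 02:31:37 UTC.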
         
        ## Schema and data
         Create table tbl_email_form_to_for_XX(
                Event_Time Timestamp,
                Ingestion_Time Timestamp,
                From_Email String,
                To_Email String,
                From_To_type String,
                Event_ID String
                ) using carbon options(path 'obs://X/tbl_email_form_to_for_XX')
                ETL: 6 columns taken from an 18-column table
               
                example data:
                [hidden email] [hidden email] from_to <29528303.1075855666657.JavaMail.evans@thyme> 1538015497000000 9755149200000

2. Performance should reach X million records/s/node

3. Provide CarbonWriter for SDK, so that carbon data can be written in C++
        ##features/interfaces
        3.1. create CarbonWriter, including creating the schema (withCsvInput), setting outputPath, and build
        3.2. write()
        3.3. close()
        3.4. support OBS (AK/SK/Endpoint) (withHadoopConf)
        3.5. writtenBy
        3.6. support withTableProperty, withLoadOption, taskNo, uniqueIdentifier, withThreadSafe, withBlockSize, withBlockletSize, localDictionaryThreshold, enableLocalDictionary in C++ SDK (PR2899, to be reviewed)
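
The writer steps listed above (configure, build, write, close) follow a builder pattern. A minimal in-memory sketch of that flow is given here. `MockWriterBuilder`, `MockWriter`, `Field` and `rowCount` are hypothetical names, the method signatures are guesses made for illustration, and the rule that all three settings are required is an assumption of this sketch rather than documented CSDK behaviour.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical column description: column name plus carbon type name.
struct Field { std::string name; std::string type; };

// Hypothetical writer that only buffers rows in memory to show the call order.
class MockWriter {
public:
    MockWriter(std::string path, std::vector<Field> schema)
        : path_(std::move(path)), schema_(std::move(schema)) {}

    // Accepts one row; its column count must match the schema.
    void write(const std::vector<std::string>& row) {
        if (closed_) throw std::logic_error("writer is closed");
        if (row.size() != schema_.size()) throw std::invalid_argument("column count mismatch");
        rows_.push_back(row);
    }

    void close() { closed_ = true; }
    std::size_t rowCount() const { return rows_.size(); }

private:
    std::string path_;
    std::vector<Field> schema_;
    std::vector<std::vector<std::string>> rows_;
    bool closed_ = false;
};

// Hypothetical builder mirroring the configuration steps named above.
class MockWriterBuilder {
public:
    MockWriterBuilder& outputPath(std::string p) { path_ = std::move(p); return *this; }
    MockWriterBuilder& withCsvInput(std::vector<Field> s) { schema_ = std::move(s); return *this; }
    MockWriterBuilder& writtenBy(std::string app) { app_ = std::move(app); return *this; }

    // Fails fast when a setting is missing (an assumption of this sketch).
    MockWriter build() const {
        if (path_.empty() || schema_.empty() || app_.empty()) {
            throw std::logic_error("outputPath, withCsvInput and writtenBy must be set");
        }
        return MockWriter(path_, schema_);
    }

private:
    std::string path_;
    std::vector<Field> schema_;
    std::string app_;
};
```

Validating in build() rather than in write() means a misconfigured writer is rejected before any data is accepted.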
       
        ##Data types:
        Carbon needs to support the base data types: string, float, double, int, long, date, timestamp, bool, array<String>.
        Other types can be converted:
                char array => carbon string
                enum => carbon string
                set and list => carbon array<String>
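
The three conversions above can be done on the caller side before a row is handed to the writer. The following is a sketch in plain standard C++; `FromToType`, `toCarbonString` and `toCarbonStringArray` are hypothetical names chosen for illustration and are not part of the SDK.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <set>
#include <string>
#include <vector>

// Hypothetical enum used only for illustration.
enum class FromToType { From, To, Cc };

// Enum => string: an explicit name table keeps the stored text stable
// even if the enumerators are later reordered.
std::string toCarbonString(FromToType t) {
    switch (t) {
        case FromToType::From: return "from";
        case FromToType::To:   return "to";
        case FromToType::Cc:   return "cc";
    }
    return "unknown";
}

// Char array => string: copy up to the first NUL or up to the buffer capacity,
// so fixed-size buffers without a terminator are handled safely.
std::string toCarbonString(const char* buf, std::size_t capacity) {
    assert(buf != nullptr);
    const void* nul = std::memchr(buf, '\0', capacity);
    const std::size_t len =
        nul ? static_cast<std::size_t>(static_cast<const char*>(nul) - buf) : capacity;
    return std::string(buf, len);
}

// Set or list => array<String>: copy any container of strings into a vector,
// preserving the container's iteration order.
template <typename Container>
std::vector<std::string> toCarbonStringArray(const Container& items) {
    return std::vector<std::string>(items.begin(), items.end());
}
```

Note that a std::set yields its elements in sorted order, so the resulting array is sorted as well, while a std::list keeps insertion order.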

        ##performance
        Write performance is not a requirement for now
       
4. read schema function
        readSchema
        getVersionDetails => TODO

5. support carbonproperties
        5.1 addProperty
        5.2 getProperty
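
The addProperty/getProperty pair amounts to a string key-value store. A minimal sketch of that contract follows. `MockProperties` is a hypothetical class, the key names in any usage are made up, and both the default-value parameter and the chaining return value are assumptions added for illustration.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical stand-in for a properties holder: a string-to-string map.
class MockProperties {
public:
    // Adds or overwrites a property; returns *this so calls can be chained.
    MockProperties& addProperty(const std::string& key, const std::string& value) {
        props_[key] = value;
        return *this;
    }

    // Returns the stored value, or defaultValue when the key is absent.
    std::string getProperty(const std::string& key,
                            const std::string& defaultValue = "") const {
        const auto it = props_.find(key);
        return it == props_.end() ? defaultValue : it->second;
    }

private:
    std::map<std::string, std::string> props_;
};
```

In this sketch a later addProperty call for the same key replaces the earlier value, and a lookup of an unknown key returns the supplied default instead of failing.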
       
6. TODO:
        6.1. getVersionDetails
        6.2. update the SDK/CSDK reader doc
        6.3. support byte (write and read)
        6.4. support long string columns
        6.5. support sortBy
        6.6. support withCsvInput(Schema schema); create schema (Java)
        6.7. optimize the write doc
                /**
                 * Create a {@link CarbonWriterBuilder} to build a {@link CarbonWriter}
                 */
                public static CarbonWriterBuilder builder() {
                        return new CarbonWriterBuilder();
                }

  was:
CSDK: Provide C++ interface for SDK
1.Provide CarbonReader for SDK, it can read carbon data in C++ language
2.Provide CarbonWriter for SDK, it can write carbon data in C++ language


> CSDK: Provide C++ interface for SDK
> -----------------------------------
>
>                 Key: CARBONDATA-2951
>                 URL: https://issues.apache.org/jira/browse/CARBONDATA-2951
>             Project: CarbonData
>          Issue Type: Task
>          Components: other
>    Affects Versions: 1.5.0
>            Reporter: xubo245
>            Assignee: xubo245
>            Priority: Critical
>             Fix For: NONE
>



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)