[jira] [Updated] (CARBONDATA-3254) PyCarbon: provide python interface for users to use CarbonData by python code

Akash R Nilugal (Jira)

     [ https://issues.apache.org/jira/browse/CARBONDATA-3254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bo Xu updated CARBONDATA-3254:
------------------------------
    Description:
      More and more people use big data to optimize their algorithms, train their models, deploy their models as services, and run inference on images. Storing, managing, and analyzing large volumes of structured and unstructured data is a big challenge, especially for unstructured data such as images, video, and audio.


     Many users build their projects in Python for these scenarios. Apache CarbonData is an indexed columnar data store for fast analytics on big data platforms. It offers many features and high performance for storing, managing, and analyzing big data. Apache CarbonData already supports the String, Int, Double, Boolean, Char, Date, and Timestamp data types, and it also supports Binary [(CARBONDATA-3336)|https://issues.apache.org/jira/browse/CARBONDATA-3336], which avoids the small binary files problem and can speed up S3 access by dozens or even hundreds of times. It can also reduce the cost of accessing OBS by reducing the number of S3 API calls. However, it is not easy for these users to use CarbonData from Java, Scala, or C++, so it would be better to provide a Python interface that lets them use CarbonData from Python code.

We have already been working on these features for several months in https://github.com/xubo245/pycarbon

     Goals:
1. Apache CarbonData should provide a Python interface for writing and reading structured and unstructured data in CarbonData, such as String, Int, and binary data (image/voice/video). It should not depend on Apache Spark.
2. Apache CarbonData should provide a Python interface that lets deep learning frameworks such as TensorFlow, MXNet, and PyTorch read and write data from/to CarbonData. It should not depend on Apache Spark.
3. Apache CarbonData should provide a Python interface for managing and analyzing data based on Apache Spark. Apache CarbonData should support the DDL, DML, and DataMap features in Python.

 
   


  was:
      More and more people use big data to optimize their algorithms, train their models, deploy their models as services, and run inference on images. Storing, managing, and analyzing large volumes of structured and unstructured data is a big challenge, especially for unstructured data such as images, video, and audio.


     Many users build their projects in Python for these scenarios. Apache CarbonData is an indexed columnar data store for fast analytics on big data platforms. It offers many features and high performance for storing, managing, and analyzing big data. Apache CarbonData already supports the String, Int, Double, Boolean, Char, Date, and Timestamp data types, and it also supports Binary [(CARBONDATA-3336)|https://issues.apache.org/jira/browse/CARBONDATA-3336], which avoids the small binary files problem and can speed up S3 access by dozens or even hundreds of times. It can also reduce the cost of accessing OBS by reducing the number of S3 API calls. However, it is not easy for these users to use CarbonData from Java, Scala, or C++, so it would be better to provide a Python interface that lets them use CarbonData from Python code.

     Goals:
1. Apache CarbonData should provide a Python interface for writing and reading structured and unstructured data in CarbonData, such as String, Int, and binary data (image/voice/video). It should not depend on Apache Spark.
2. Apache CarbonData should provide a Python interface that lets deep learning frameworks such as TensorFlow, MXNet, and PyTorch read and write data from/to CarbonData. It should not depend on Apache Spark.
3. Apache CarbonData should provide a Python interface for managing and analyzing data based on Apache Spark. Apache CarbonData should support the DDL, DML, and DataMap features in Python.

 
   



> PyCarbon: provide python interface for users to use CarbonData by python code
> -----------------------------------------------------------------------------
>
>                 Key: CARBONDATA-3254
>                 URL: https://issues.apache.org/jira/browse/CARBONDATA-3254
>             Project: CarbonData
>          Issue Type: New Feature
>            Reporter: Bo Xu
>            Assignee: Bo Xu
>            Priority: Major
>
>       More and more people use big data to optimize their algorithms, train their models, deploy their models as services, and run inference on images. Storing, managing, and analyzing large volumes of structured and unstructured data is a big challenge, especially for unstructured data such as images, video, and audio.
>      Many users build their projects in Python for these scenarios. Apache CarbonData is an indexed columnar data store for fast analytics on big data platforms. It offers many features and high performance for storing, managing, and analyzing big data. Apache CarbonData already supports the String, Int, Double, Boolean, Char, Date, and Timestamp data types, and it also supports Binary [(CARBONDATA-3336)|https://issues.apache.org/jira/browse/CARBONDATA-3336], which avoids the small binary files problem and can speed up S3 access by dozens or even hundreds of times. It can also reduce the cost of accessing OBS by reducing the number of S3 API calls. However, it is not easy for these users to use CarbonData from Java, Scala, or C++, so it would be better to provide a Python interface that lets them use CarbonData from Python code.
> We have already been working on these features for several months in https://github.com/xubo245/pycarbon
>      Goals:
> 1. Apache CarbonData should provide a Python interface for writing and reading structured and unstructured data in CarbonData, such as String, Int, and binary data (image/voice/video). It should not depend on Apache Spark.
> 2. Apache CarbonData should provide a Python interface that lets deep learning frameworks such as TensorFlow, MXNet, and PyTorch read and write data from/to CarbonData. It should not depend on Apache Spark.
> 3. Apache CarbonData should provide a Python interface for managing and analyzing data based on Apache Spark. Apache CarbonData should support the DDL, DML, and DataMap features in Python.
>  
>    



--
This message was sent by Atlassian Jira
(v8.3.4#803005)