[ https://issues.apache.org/jira/browse/CARBONDATA-3271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bo Xu updated CARBONDATA-3271:
------------------------------
Description:
Apache CarbonData should provide a Python interface so that deep learning frameworks such as TensorFlow, MXNet and PyTorch can read and write data from/to CarbonData.

Basic framework:
Supports shuffle read, which reads the data in a random order each epoch when feeding it to the training model.
Supports a data cache, on local disk or in memory, to speed up reading across multiple epochs.
Supports parallel reading using a thread pool or a process pool in Python.
Supports reading data from object storage.
Supports the manifest format and CarbonData folders.

AI compute engine integration:
TensorFlow integration: a new Python API in pycarbon lets TensorFlow read data from CarbonData files for model training.

was:
Apache CarbonData should provide a Python interface so that deep learning frameworks such as TensorFlow, MXNet and PyTorch can read and write data from/to CarbonData. It should not depend on Apache Spark.

Goals:
1. CarbonData provides a Python interface that lets TensorFlow read data from CarbonData for model training.
2. CarbonData provides a Python interface that lets MXNet read data from CarbonData for model training.
3. CarbonData provides a Python interface that lets PyTorch read data from CarbonData for model training.
4. CarbonData should support the epoch function.
5. CarbonData should support a cache to speed up performance.
Summary: Apache CarbonData should provide a Python interface that lets the deep learning framework TensorFlow read and write data from/to CarbonData (was: Apache CarbonData should provide a Python interface that lets deep learning frameworks read and write data from/to CarbonData)

> Apache CarbonData should provide a Python interface that lets the deep learning framework TensorFlow read and write data from/to CarbonData
> -------------------------------------------------------------------------------------------------------------------------------------------
>
> Key: CARBONDATA-3271
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3271
> Project: CarbonData
> Issue Type: Sub-task
> Reporter: Bo Xu
> Assignee: Bo Xu
> Priority: Major
> Fix For: 2.0.0
>
> Time Spent: 14h 40m
> Remaining Estimate: 0h
>

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
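The issue describes three read-side features: shuffle read per epoch, a cache for multi-epoch training, and parallel reading with a thread pool. The sketch below shows how these can fit together, using only the Python standard library. It is not the pycarbon API. The class `ShuffledCachedReader` and the `read_block` callable are hypothetical: `read_block` stands in for whatever loads one CarbonData file or block. The in-memory cache also assumes the whole dataset fits in RAM.

```python
import random
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Hashable, Iterator, List, Optional, Sequence


class ShuffledCachedReader:
    """Hypothetical sketch (not the pycarbon API).

    Yields rows from a set of data blocks in a new random order every epoch.
    Blocks are loaded in parallel with a thread pool on first use and kept in
    an in-memory cache, so later epochs do not touch storage again.
    """

    def __init__(
        self,
        block_ids: Sequence[Hashable],
        read_block: Callable[[Hashable], List],
        workers: int = 4,
        seed: Optional[int] = None,
    ) -> None:
        self.block_ids = list(block_ids)
        # Hypothetical loader: returns all rows of one block (e.g. one file).
        self.read_block = read_block
        self.workers = workers
        self.rng = random.Random(seed)
        # Memory cache: block id -> list of rows.
        self.cache: Dict[Hashable, List] = {}

    def _ensure_cached(self, block_ids: Sequence[Hashable]) -> None:
        missing = [b for b in block_ids if b not in self.cache]
        if not missing:
            return
        with ThreadPoolExecutor(max_workers=self.workers) as pool:
            # pool.map returns results in input order, so they line up with `missing`.
            for block_id, rows in zip(missing, pool.map(self.read_block, missing)):
                self.cache[block_id] = rows

    def epoch(self) -> Iterator:
        order = list(self.block_ids)
        self.rng.shuffle(order)  # block-level shuffle
        self._ensure_cached(order)  # parallel read on cache misses only
        for block_id in order:
            rows = list(self.cache[block_id])
            self.rng.shuffle(rows)  # row-level shuffle within the block
            yield from rows


if __name__ == "__main__":
    reads = []

    def fake_read(block_id):
        reads.append(block_id)
        return [(block_id, i) for i in range(3)]

    reader = ShuffledCachedReader(["a", "b", "c"], fake_read, workers=2, seed=0)
    for n in range(2):
        print(n, list(reader.epoch()))
    print("blocks read from storage:", sorted(reads))  # each block once
```

The reader shuffles the block order and then the rows within each block. This gives a different order every epoch while keeping reads block-sized. The same structure could use a local-disk cache in place of the dict, or `concurrent.futures.ProcessPoolExecutor` in place of the thread pool. The process pool would require a picklable `read_block`.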