GitHub user NamanRastogi opened a pull request:
https://github.com/apache/carbondata/pull/2850

**Added concurrent reading through SDK**

Added a new API, `CarbonReader.split`, to enable concurrent reading of carbondata files through the SDK:

```java
List<CarbonReader> multipleReaders = reader.split(maxSplits);
```

For detailed information on how to use this API for concurrent reading, please refer to **ConcurrentSdkReaderTest.java**.

## Performance Metrics:

| | configured table block size: 1 MB | configured table block size: 10 MB | configured table block size: 100 MB |
| --- | --- | --- | --- |
| **# rows: 1e6**<br>**Store: 7.6 MB** | # files generated: 11<br>Sequential Read: 274 ms<br>Parallel Read: 123 ms | # files generated: 1<br>Sequential Read: 247 ms<br>Parallel Read: 248 ms | # files generated: 1<br>Sequential Read: 252 ms<br>Parallel Read: 254 ms |
| **# rows: 1e7**<br>**Store: 78 MB** | # files generated: 104<br>Sequential Read: 2685 ms<br>Parallel Read: 1230 ms | # files generated: 9<br>Sequential Read: 2499 ms<br>Parallel Read: 1357 ms | # files generated: 1<br>Sequential Read: 2527 ms<br>Parallel Read: 2597 ms |
| **# rows: 1e8**<br>**Store: 865 MB** | | # files generated: 95<br>Sequential Read: 27069 ms<br>Parallel Read: 16082 ms | # files generated: 15<br>Sequential Read: 25841 ms<br>Parallel Read: 13256 ms |

- [ ] Any interfaces changed?
- [x] Any backward compatibility impacted: No
- [ ] Document update required?
- [x] Testing done - a new unit test case has been added.
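As a rough illustration of the usage pattern referred to above (from **ConcurrentSdkReaderTest.java**), the sketch below drains each split reader in its own task and sums the row counts from the Futures. Plain `Iterator`s stand in for `CarbonReader`, since the SDK classes are not available here, and all names are illustrative:

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Sketch of the parallel-read pattern: one task per split reader,
// row counts collected via Futures. Iterators simulate split readers.
public class ParallelReadSketch {
  public static long readAll(List<Iterator<String>> splits, int numThreads)
      throws InterruptedException, ExecutionException {
    ExecutorService pool = Executors.newFixedThreadPool(numThreads);
    List<Future<Long>> results = new ArrayList<>();
    for (Iterator<String> split : splits) {
      results.add(pool.submit(() -> {  // drain one split per task
        long rows = 0;
        while (split.hasNext()) {
          split.next();
          rows++;
        }
        return rows;
      }));
    }
    long total = 0;
    for (Future<Long> r : results) {
      total += r.get();  // sum per-split counts
    }
    pool.shutdown();
    return total;
  }

  public static void main(String[] args) throws Exception {
    // Simulate 4 splits of 250 rows each.
    List<Iterator<String>> splits = new ArrayList<>();
    for (int f = 0; f < 4; f++) {
      List<String> rows = new ArrayList<>();
      for (int i = 0; i < 250; i++) rows.add("row_" + f + "_" + i);
      splits.add(rows.iterator());
    }
    System.out.println(readAll(splits, 4)); // 1000
  }
}
```

The shape mirrors the test: the consumer owns the thread pool, and each split reader is read to exhaustion inside its own task.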
You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/NamanRastogi/carbondata sdk_reader

Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/carbondata/pull/2850.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #2850

----

commit 55383136232203ca9de97a9304033c20cf7085f8
Author: Naman Rastogi <naman.rastogi.52@...>
Date: 2018-10-18T12:54:23Z

    Added split for CarbonReader to enable multithreaded reading of carbondata files

commit 79871f291262a05a1970b765232bf2f43f75e5d5
Author: Naman Rastogi <naman.rastogi.52@...>
Date: 2018-10-22T14:07:06Z

    Added reader.close in CarbonSdkReaderTest

commit cd44ee7efbe09c46cd4f6b84c431261b18a13d3d
Author: Naman Rastogi <naman.rastogi.52@...>
Date: 2018-10-22T14:07:06Z

    Added reader.close in CarbonSdkReaderTest

commit 201d98ea157590c1d5f4decba89fcabae684c755
Author: Naman Rastogi <naman.rastogi.52@...>
Date: 2018-10-24T06:53:44Z

    Merge branch 'sdk_reader' of https://github.com/NamanRastogi/carbondata into sdk_reader

----
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2850 Can one of the admins verify this patch?

---
In reply to this post by qiuchenjian-2
Github user xuchuanyin commented on the issue:
https://github.com/apache/carbondata/pull/2850

@NamanRastogi I think we can further optimize this function.

1. We can enable parallel reading and set the parallelism while creating a CarbonReader;
2. Inside CarbonReader, we handle the concurrent processing;
3. The interfaces of CarbonReader should be kept the same as before; there is no need to add more. By calling hasNext or next, the user gets the next record and does not need to care which RecordReader the record belongs to.

The user interface looks like below:

```java
CarbonReader reader = CarbonReader.builder(dataDir).parallelism(3).build();
while (reader.hasNext()) {
  reader.next();
}
reader.close();
```

To keep it simple, the default parallelism can be 1, which means we process each RecordReader one by one. Setting the parallelism to a higher value means we process the RecordReaders in a thread pool of that size.

---
Github user NamanRastogi commented on the issue:
https://github.com/apache/carbondata/pull/2850 @xuchuanyin Since `CarbonReader` is iterator based, we can only read a row when the user asks for one. So even if we set the parallelism internally via the builder, the files will still be read row by row (even though in different threads), and successive rows will still be read sequentially. Please look at the test file **ConcurrentSdkReaderTest.java**: the reading happens inside each thread, and multiple threads read the files (row by row) concurrently. The actual reading happens inside the `CarbonReader.readNextRow()` method; as long as that call is sequential, read performance will not improve, so we have to call `CarbonReader.readNextRow()` from different threads.

---
Github user xuchuanyin commented on the issue:
https://github.com/apache/carbondata/pull/2850 emm, but in your implementation, most of the work (the multi-thread handling) has to be done by the user. CarbonData itself only splits the input and returns multiple readers. If that is the solution, why not just tell the user to create multiple CarbonReaders, passing only part of the input dir each time? In addition to my proposal, I think we can add a buffer for the records. When `CarbonReader.next` is called, we retrieve the record from the buffer and refill the buffer asynchronously. When `CarbonReader.hasNext` is called, we first check the buffer; if it is empty, we check the underlying RecordReader and fill the buffer asynchronously.

---
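A minimal sketch of this buffered design, under stated assumptions: all names are hypothetical, and plain `Iterator`s stand in for `CarbonRecordReader`. Producer tasks fill a bounded buffer asynchronously; `hasNext`/`next` serve records from it:

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical buffered reader: producers drain the record readers into a
// shared bounded queue, consumers call hasNext()/next() as before.
public class BufferedReaderSketch {
  private final LinkedBlockingQueue<Object> buffer = new LinkedBlockingQueue<>(1024);
  private final ExecutorService pool;
  private final CountDownLatch producersDone;

  BufferedReaderSketch(List<Iterator<Object>> recordReaders, int parallelism) {
    pool = Executors.newFixedThreadPool(parallelism);
    producersDone = new CountDownLatch(recordReaders.size());
    for (Iterator<Object> rr : recordReaders) {
      pool.submit(() -> {
        try {
          while (rr.hasNext()) buffer.put(rr.next()); // fill asynchronously
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
        } finally {
          producersDone.countDown();
        }
      });
    }
  }

  // True if a record is buffered or may still be produced.
  boolean hasNext() throws InterruptedException {
    while (buffer.isEmpty()) {
      // Once all producers finish, whatever is buffered is all that's left.
      if (producersDone.await(1, TimeUnit.MILLISECONDS)) return !buffer.isEmpty();
    }
    return true;
  }

  Object next() throws InterruptedException {
    return buffer.take();
  }

  void close() {
    pool.shutdown();
  }

  // Reads 3 simulated files of 100 rows each; returns the total row count.
  static long demo() throws InterruptedException {
    List<Iterator<Object>> readers = new ArrayList<>();
    for (int f = 0; f < 3; f++) {
      List<Object> rows = new ArrayList<>();
      for (int i = 0; i < 100; i++) rows.add("file" + f + "_row" + i);
      readers.add(rows.iterator());
    }
    BufferedReaderSketch reader = new BufferedReaderSketch(readers, 3);
    long count = 0;
    while (reader.hasNext()) {
      reader.next();
      count++;
    }
    reader.close();
    return count;
  }

  public static void main(String[] args) throws InterruptedException {
    System.out.println(demo());
  }
}
```

Note the trade-off raised later in the thread: with parallelism above 1, records from different files interleave in the buffer, so cross-file ordering is not preserved.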
Github user NamanRastogi commented on the issue:
https://github.com/apache/carbondata/pull/2850 @xuchuanyin With this API the overhead of reading concurrently is on the consumer of CarbonReader, so yes, what you said is right. But this API was designed precisely for the case where a user wants more manual control over concurrent reading. Consider the scenario where the user wants to read different files on different machines: if we take care of concurrent reading internally (using a buffer, as you suggested), the user cannot do that. Another thing you mentioned was the user passing only part of the input dir to each SDK reader, and that does not work in the current implementation of the SDK. As for faster reading, there is another pull request, [#2816](https://github.com/apache/carbondata/pull/2816), which supports batch reading for better read performance. It is still not concurrent reading, but it is better than iterator-based row-by-row reading.

---
Github user xubo245 commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2850#discussion_r228706276 --- Diff: store/sdk/src/test/java/org/apache/carbondata/sdk/file/ConcurrentSdkReaderTest.java --- @@ -0,0 +1,155 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.carbondata.sdk.file; + +import java.io.File; +import java.io.IOException; +import java.util.ArrayList; +import java.util.HashMap; +import java.util.List; +import java.util.Map; +import java.util.concurrent.Callable; +import java.util.concurrent.ExecutionException; +import java.util.concurrent.ExecutorService; +import java.util.concurrent.Executors; +import java.util.concurrent.Future; + +import org.apache.carbondata.core.metadata.datatype.DataTypes; + +import junit.framework.TestCase; +import org.apache.commons.io.FileUtils; +import org.apache.commons.io.IOExceptionWithCause; +import org.junit.*; + +/** + * multi-thread Test suite for {@link CarbonReader} + */ +public class ConcurrentSdkReaderTest extends TestCase { + + private static final String dataDir = "./testReadFiles"; + + public void cleanTestData() { --- End diff -- you can add @Before or @After got it. especially there are many test case --- |
Github user xubo245 commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2850#discussion_r228706363 --- Diff: store/sdk/src/test/java/org/apache/carbondata/sdk/file/ConcurrentSdkReaderTest.java --- @@ -0,0 +1,155 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.carbondata.sdk.file; + +import java.io.File; +import java.io.IOException; +import java.util.ArrayList; +import java.util.HashMap; +import java.util.List; +import java.util.Map; +import java.util.concurrent.Callable; +import java.util.concurrent.ExecutionException; +import java.util.concurrent.ExecutorService; +import java.util.concurrent.Executors; +import java.util.concurrent.Future; + +import org.apache.carbondata.core.metadata.datatype.DataTypes; + +import junit.framework.TestCase; +import org.apache.commons.io.FileUtils; +import org.apache.commons.io.IOExceptionWithCause; +import org.junit.*; + +/** + * multi-thread Test suite for {@link CarbonReader} + */ +public class ConcurrentSdkReaderTest extends TestCase { + + private static final String dataDir = "./testReadFiles"; + + public void cleanTestData() { + try { + FileUtils.deleteDirectory(new File(dataDir)); + } catch (Exception e) { + e.printStackTrace(); + Assert.fail(e.getMessage()); + } + } + + private 
void writeTestData(long numRows, int tableBlockSize) { + cleanTestData(); + + Field[] fields = new Field[2]; + fields[0] = new Field("stringField", DataTypes.STRING); + fields[1] = new Field("intField", DataTypes.INT); + + Map<String, String> tableProperties = new HashMap<>(); + tableProperties.put("table_blocksize", Integer.toString(tableBlockSize)); + + CarbonWriterBuilder builder = + CarbonWriter.builder().outputPath(dataDir).withTableProperties(tableProperties) + .withCsvInput(new Schema(fields)); + + try { + CarbonWriter writer = builder.build(); + + for (long i = 0; i < numRows; ++i) { + writer.write(new String[] { "robot_" + i, String.valueOf(i) }); + } + writer.close(); + } catch (Exception e) { + e.printStackTrace(); + Assert.fail(e.getMessage()); + } + } + + @Test public void testReadParallely() throws IOException, InterruptedException { + long numRows = 10000000; + int tableBlockSize = 10; + short numThreads = 4; + writeTestData(numRows, tableBlockSize); + long count; + + CarbonReader reader = CarbonReader.builder(dataDir).build(); + try { + count = 0; + long start = System.currentTimeMillis(); + while (reader.hasNext()) { + reader.readNextRow(); + count += 1; + } + long end = System.currentTimeMillis(); + System.out.println("[Sequential read] Time:" + (end - start)); + Assert.assertEquals(numRows, count); + } catch (Exception e) { + e.printStackTrace(); + Assert.fail(e.getMessage()); + } finally { + reader.close(); + } + + ExecutorService executorService = Executors.newFixedThreadPool(numThreads); + CarbonReader reader2 = CarbonReader.builder(dataDir).build(); + try { + List<CarbonReader> multipleReaders = reader2.split(numThreads); + List<Future> results = new ArrayList<>(); + count = 0; + long start = System.currentTimeMillis(); + for (CarbonReader reader_i : multipleReaders) { + results.add(executorService.submit(new ReadLogic(reader_i))); + } + for (Future result_i : results) { + count += (long) result_i.get(); + } + long end = 
System.currentTimeMillis(); + System.out.println("[Parallel read] Time:" + (end - start)); --- End diff -- Please add a unit to the printed time, such as ms. ---
Github user xubo245 commented on the issue:
https://github.com/apache/carbondata/pull/2850 @NamanRastogi Hi, a customer requires that carbon return the data in the same order whether it is read with one thread or with multiple threads.

---
Github user NamanRastogi commented on the issue:
https://github.com/apache/carbondata/pull/2850 Yes, data coming from one file will always be in order. Please check the `split` method: it splits the list of `CarbonRecordReader`s into multiple `CarbonReader`s. Suppose there are 10 carbondata files and the user wants 3 splits; he will get a list like this:

---
Github user NamanRastogi commented on the issue:
https://github.com/apache/carbondata/pull/2850

Please check the `split` method: it splits the list of `CarbonRecordReader`s into multiple `CarbonReader`s. It does not jumble the order of the `CarbonRecordReader`s; it keeps them sequential. Suppose there are 10 *carbondata* files, and thus 10 `CarbonRecordReader`s in the `CarbonReader.readers` object, and the user wants 3 splits:

```java
CarbonReader reader = CarbonReader.builder(dataDir).build();
List<CarbonReader> multipleReaders = reader.split(3);
```

The indices of the `CarbonRecordReader`s in `multipleReaders` will be:

`multipleReaders.get(0).readers` points to indices {0,1,2,3} of the *carbondata* files
`multipleReaders.get(1).readers` points to indices {4,5,6} of the *carbondata* files
`multipleReaders.get(2).readers` points to indices {7,8,9} of the *carbondata* files

Now, if you read the rows as in the following code, the rows will still be in order:

```java
for (CarbonReader reader_i : multipleReaders) {
  reader_i.readNextRow();
}
```

Earlier, you got data from the 5th `CarbonRecordReader` only after you had exhausted the 4th. Now you can get it earlier, maybe even before the 0th. So if order matters, the user has to make sure the 5th file is consumed only after the 4th is used up; if order does not matter, it can be consumed earlier as well. For example, to count the total number of rows, the user does not need the original order.

---
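The index assignment described here can be reproduced with the same ceil-based arithmetic the `split` method uses. This is a standalone sketch with hypothetical names, not SDK code:

```java
import java.util.ArrayList;
import java.util.List;

// Reproduces the split-to-file-index assignment from the discussion:
// 10 files across 3 splits -> {0,1,2,3}, {4,5,6}, {7,8,9}.
public class SplitIndices {
  static List<List<Integer>> assign(int numFiles, int maxSplits) {
    List<List<Integer>> splits = new ArrayList<>();
    int n = Math.min(maxSplits, numFiles); // never more splits than files
    for (int i = 0; i < n; i++) {
      int start = (int) Math.ceil((float) (i * numFiles) / n);
      int end = (int) Math.ceil((float) ((i + 1) * numFiles) / n);
      List<Integer> part = new ArrayList<>();
      for (int j = start; j < end; j++) part.add(j);
      splits.add(part);
    }
    return splits;
  }

  public static void main(String[] args) {
    System.out.println(assign(10, 3)); // [[0, 1, 2, 3], [4, 5, 6], [7, 8, 9]]
  }
}
```

For 10 files and 3 splits this yields exactly the index groups listed in the comment, and consecutive splits always cover consecutive file ranges, which is why per-file order is preserved.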
Github user ajantha-bhat commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2850#discussion_r229179428 --- Diff: store/sdk/src/main/java/org/apache/carbondata/sdk/file/CarbonReader.java --- @@ -114,6 +117,43 @@ public static CarbonReaderBuilder builder(String tablePath) { return builder(tablePath, tableName); } + /** + * Return a new list of {@link CarbonReader} objects + * + * @param maxSplits + */ + public List<CarbonReader> split(int maxSplits) throws IOException { --- End diff -- Need to add new interfaces exposed in sdk-guide.md. you can add this API info there. --- |
Github user ajantha-bhat commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2850#discussion_r229179713 --- Diff: store/sdk/src/main/java/org/apache/carbondata/sdk/file/CarbonReader.java --- @@ -114,6 +117,43 @@ public static CarbonReaderBuilder builder(String tablePath) { return builder(tablePath, tableName); } + /** + * Return a new list of {@link CarbonReader} objects + * + * @param maxSplits + */ + public List<CarbonReader> split(int maxSplits) throws IOException { + validateReader(); + if (maxSplits < 1) { + throw new RuntimeException( + this.getClass().getSimpleName() + ".split: maxSplits must be positive"); + } + + List<CarbonReader> carbonReaders = new ArrayList<>(); + + // If maxSplits < readers.size + // Split the reader into maxSplits splits with each + // element contains >= 1 CarbonRecordReader objects + if (maxSplits < this.readers.size()) { + for (int i = 0; i < maxSplits; ++i) { + carbonReaders.add(new CarbonReader<>(this.readers + .subList((int) Math.ceil((float) (i * this.readers.size()) / maxSplits), + (int) Math.ceil((float) ((i + 1) * this.readers.size()) / maxSplits)))); + } + } + // If maxSplits >= readers.size + // Split the reader into reader.size splits with each --- End diff -- keep comments inside else block for easy reading of the code --- |
Github user ajantha-bhat commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2850#discussion_r229179978 --- Diff: store/sdk/src/main/java/org/apache/carbondata/sdk/file/CarbonReader.java --- @@ -114,6 +117,43 @@ public static CarbonReaderBuilder builder(String tablePath) { return builder(tablePath, tableName); } + /** + * Return a new list of {@link CarbonReader} objects + * --- End diff -- Add a clear description, mention what happens if splits greater than the number of files and what happens if splits are lesser than the number of files --- |
Github user ajantha-bhat commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2850#discussion_r229183179 --- Diff: store/sdk/src/main/java/org/apache/carbondata/sdk/file/CarbonReader.java --- @@ -114,6 +117,43 @@ public static CarbonReaderBuilder builder(String tablePath) { return builder(tablePath, tableName); } + /** + * Return a new list of {@link CarbonReader} objects + * + * @param maxSplits + */ + public List<CarbonReader> split(int maxSplits) throws IOException { + validateReader(); + if (maxSplits < 1) { + throw new RuntimeException( + this.getClass().getSimpleName() + ".split: maxSplits must be positive"); + } + + List<CarbonReader> carbonReaders = new ArrayList<>(); + + // If maxSplits < readers.size + // Split the reader into maxSplits splits with each + // element contains >= 1 CarbonRecordReader objects + if (maxSplits < this.readers.size()) { + for (int i = 0; i < maxSplits; ++i) { + carbonReaders.add(new CarbonReader<>(this.readers + .subList((int) Math.ceil((float) (i * this.readers.size()) / maxSplits), --- End diff -- this is constant, do this outside loop and use it each time --- |
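One possible shape of the hoisting the reviewer suggests: compute the per-split step once outside the loop. Note this ceils `i * step` where the original ceils `(float) (i * n) / maxSplits`; for realistic reader counts the float results agree, but that equivalence is an assumption, and the names below are illustrative:

```java
// Sketch of the reviewer's suggestion: hoist the constant ratio out of
// the loop. Plain ints stand in for the reader list.
public class SplitBoundaries {
  public static int[][] boundaries(int numReaders, int maxSplits) {
    float step = (float) numReaders / maxSplits; // hoisted constant
    int[][] out = new int[maxSplits][2];
    for (int i = 0; i < maxSplits; ++i) {
      out[i][0] = (int) Math.ceil(i * step);       // subList start
      out[i][1] = (int) Math.ceil((i + 1) * step); // subList end (exclusive)
    }
    return out;
  }

  public static void main(String[] args) {
    for (int[] b : boundaries(10, 3)) {
      System.out.println(b[0] + ".." + b[1]); // 0..4, 4..7, 7..10
    }
  }
}
```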
Github user ajantha-bhat commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2850#discussion_r229183542 --- Diff: store/sdk/src/test/java/org/apache/carbondata/sdk/file/ConcurrentSdkReaderTest.java --- @@ -0,0 +1,153 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.carbondata.sdk.file; + +import java.io.File; +import java.io.IOException; +import java.util.ArrayList; +import java.util.HashMap; +import java.util.List; +import java.util.Map; +import java.util.concurrent.Callable; +import java.util.concurrent.ExecutionException; +import java.util.concurrent.ExecutorService; +import java.util.concurrent.Executors; +import java.util.concurrent.Future; + +import org.apache.carbondata.core.metadata.datatype.DataTypes; + +import junit.framework.TestCase; +import org.apache.commons.io.FileUtils; +import org.apache.commons.io.IOExceptionWithCause; +import org.junit.*; + +/** + * multi-thread Test suite for {@link CarbonReader} + */ +public class ConcurrentSdkReaderTest { + + private static final String dataDir = "./testReadFiles"; + + @Before + @After + public void cleanTestData() { + try { + FileUtils.deleteDirectory(new File(dataDir)); + } catch (Exception e) { + e.printStackTrace(); + Assert.fail(e.getMessage()); + } + } + + private 
void writeTestData(long numRows, int tableBlockSize) { + Field[] fields = new Field[2]; + fields[0] = new Field("stringField", DataTypes.STRING); + fields[1] = new Field("intField", DataTypes.INT); + + Map<String, String> tableProperties = new HashMap<>(); + tableProperties.put("table_blocksize", Integer.toString(tableBlockSize)); + + CarbonWriterBuilder builder = + CarbonWriter.builder().outputPath(dataDir).withTableProperties(tableProperties) + .withCsvInput(new Schema(fields)); + + try { + CarbonWriter writer = builder.build(); + + for (long i = 0; i < numRows; ++i) { + writer.write(new String[] { "robot_" + i, String.valueOf(i) }); + } + writer.close(); + } catch (Exception e) { + e.printStackTrace(); + Assert.fail(e.getMessage()); + } + } + + @Test public void testReadParallely() throws IOException, InterruptedException { + long numRows = 10000000; --- End diff -- We must not add huge record test case in UT. PR builder time for all the PR will affect by this. Locally test with huge data but update UT with fewer rows. say 10 rows. --- |
Github user NamanRastogi commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2850#discussion_r229299196 --- Diff: store/sdk/src/main/java/org/apache/carbondata/sdk/file/CarbonReader.java --- @@ -114,6 +117,43 @@ public static CarbonReaderBuilder builder(String tablePath) { return builder(tablePath, tableName); } + /** + * Return a new list of {@link CarbonReader} objects + * --- End diff -- Done! --- |
Github user shardul-cr7 commented on the issue:
https://github.com/apache/carbondata/pull/2850 retest this please

---
Github user ravipesala commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2850#discussion_r230244674 --- Diff: store/sdk/src/main/java/org/apache/carbondata/sdk/file/CarbonReader.java --- @@ -114,6 +115,57 @@ public static CarbonReaderBuilder builder(String tablePath) { return builder(tablePath, tableName); } + /** + * Breaks the list of CarbonRecordReader in CarbonReader into multiple + * CarbonReader objects, each iterating through some 'carbondata' files + * and return that list of CarbonReader objects + * + * If the no. of files is greater than maxSplits, then break the + * CarbonReader into maxSplits splits, with each split iterating + * through >= 1 file. + * + * If the no. of files is less than maxSplits, then return list of + * CarbonReader with size as the no. of files, with each CarbonReader + * iterating through exactly one file + * + * @param maxSplits: Int + * @return list of {@link CarbonReader} objects + */ + public List<CarbonReader> split(int maxSplits) throws IOException { --- End diff -- I feel this method should be moved to builder. Add another method in builder `build(int splits)` and return List of readers. --- |
Github user ravipesala commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2850#discussion_r230246531 --- Diff: store/sdk/src/main/java/org/apache/carbondata/sdk/file/CarbonReader.java --- @@ -114,6 +115,57 @@ public static CarbonReaderBuilder builder(String tablePath) { return builder(tablePath, tableName); } + /** + * Breaks the list of CarbonRecordReader in CarbonReader into multiple + * CarbonReader objects, each iterating through some 'carbondata' files + * and return that list of CarbonReader objects + * + * If the no. of files is greater than maxSplits, then break the + * CarbonReader into maxSplits splits, with each split iterating + * through >= 1 file. + * + * If the no. of files is less than maxSplits, then return list of + * CarbonReader with size as the no. of files, with each CarbonReader + * iterating through exactly one file + * + * @param maxSplits: Int + * @return list of {@link CarbonReader} objects + */ + public List<CarbonReader> split(int maxSplits) throws IOException { + validateReader(); + if (maxSplits < 1) { + throw new RuntimeException( + this.getClass().getSimpleName() + ".split: maxSplits must be positive"); + } + + List<CarbonReader> carbonReaders = new ArrayList<>(); + + if (maxSplits < this.readers.size()) { --- End diff -- Add UT only to this method to make sure splits happen correctly with multiple splits combinations and readers size, --- |
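A unit test along the lines suggested could sweep (reader count, maxSplits) combinations and assert that the ceil-based boundaries partition the reader list exactly. The sketch below uses plain ints in place of the reader list, with hypothetical names:

```java
// Property check for the split boundary math: every reader index must land
// in exactly one split, and no split may be empty.
public class SplitCombinationsCheck {
  // Returns the number of splits, or -1 if coverage is broken.
  static int coveredOnce(int numReaders, int maxSplits) {
    int splits = Math.min(maxSplits, numReaders);
    int[] seen = new int[numReaders];
    for (int i = 0; i < splits; i++) {
      int start = (int) Math.ceil((float) (i * numReaders) / splits);
      int end = (int) Math.ceil((float) ((i + 1) * numReaders) / splits);
      if (start >= end) return -1;              // empty split: not allowed
      for (int j = start; j < end; j++) seen[j]++;
    }
    for (int c : seen) {
      if (c != 1) return -1;                    // gap or overlap
    }
    return splits;
  }

  public static void main(String[] args) {
    for (int readers = 1; readers <= 30; readers++) {
      for (int maxSplits = 1; maxSplits <= 10; maxSplits++) {
        if (coveredOnce(readers, maxSplits) < 0) {
          throw new AssertionError(readers + "/" + maxSplits);
        }
      }
    }
    System.out.println("all combinations ok");
  }
}
```

Sweeping small ranges like this keeps the test fast, in line with the earlier review comment about not putting huge-row test cases in the UT.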