GitHub user sounakr opened a pull request:
https://github.com/apache/carbondata/pull/2055 [CARBONDATA-2224][File Level Reader Support] External file level reader support

The file level reader reads any CarbonData file placed in an external file path. Reading can be done through three methods:

a) Reading as a datasource from Spark. CarbonFileLevelFormat.scala is used in this case to read the file. To create a Spark datasource external table: "CREATE TABLE sdkOutputTable USING CarbonDataFileFormat LOCATION '$writerOutputFilePath1'". For more details, please refer to the test file org/apache/carbondata/spark/testsuite/createTable/TestCreateTableUsingCarbonFileLevelFormat.scala.

b) Reading from Spark SQL as an external table. CarbonFileInputFormat.java is used for reading the files. The create table syntax for this is: "CREATE EXTERNAL TABLE sdkOutputTable STORED BY 'carbondatafileformat' LOCATION '$writerOutputFilePath6'". For more details, see org/apache/carbondata/spark/testsuite/createTable/TestCarbonFileInputFormatWithExternalCarbonTable.scala.

c) Reading through a Hadoop MapReduce job. Please refer to org/apache/carbondata/mapred/TestMapReduceCarbonFileInputFormat.java for more details.

Limitation: this implementation depends on the writer SDK file path following the layout table_name/Fact/Part0/Segment_null. The reader and writer should be independent of this static path; because of it, the reader currently does not work with standard partitioning either. This will be handled in future PRs.

 - [ ] Any interfaces changed?
 - [ ] Any backward compatibility impacted?
 - [ ] Document update required?
 - [ ] Testing done
        Please provide details on
        - Whether new unit test cases have been added or why no new tests are required?
        - How it is tested? Please attach test report.
        - Is it a performance related change? Please attach the performance test report.
        - Any additional information to help reviewers in testing this change.
 - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.
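Taken together, the two Spark-side read paths (a) and (b) boil down to the following DDL, as exercised in the referenced test suites. The LOCATION paths here are illustrative placeholders for the SDK writer's output directory, not values from this PR:

```sql
-- a) Spark datasource table, read via CarbonFileLevelFormat
CREATE TABLE sdkOutputTable
USING CarbonDataFileFormat
LOCATION '/tmp/carbon_sdk_output';

-- b) Spark SQL external table, read via CarbonFileInputFormat
CREATE EXTERNAL TABLE sdkOutputTable2
STORED BY 'carbondatafileformat'
LOCATION '/tmp/carbon_sdk_output';
```

After either statement, the table can be queried with ordinary SELECT statements against the files the SDK wrote at that location.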
You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/sounakr/incubator-carbondata file_level_reader_master

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/carbondata/pull/2055.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #2055

----

commit 5e65f3f97749571a74b6c04a05f5b09aec709787
Author: sounakr <sounakr@...>
Date:   2018-02-24T02:25:14Z

    File Format Reader

commit bcb8f64d61e19787fb3303a00d59cb61a6ebce32
Author: sounakr <sounakr@...>
Date:   2018-02-26T11:58:47Z

    File Format Phase 2

commit 35b09072d7d75677f473e9d54b3a5db0ff1b64dc
Author: Ajantha-Bhat <ajanthabhat@...>
Date:   2018-02-27T06:06:56Z

    * File Format Phase 2 (cleanup code)

commit 466abfad2fdcc50d69dbbf32791466b7fc4836d1
Author: Ajantha-Bhat <ajanthabhat@...>
Date:   2018-02-27T06:36:28Z

    * File Format Phase 2 (cleanup code)

commit 5b2ad29bc9402e223af22124cc6d3d91962e72f4
Author: Ajantha-Bhat <ajanthabhat@...>
Date:   2018-02-27T09:54:43Z

    * File Format Phase 2 (cleanup code and adding testCase)

commit 994372f0d2c7e8c528f9900c7b17ff8c8a857698
Author: Ajantha-Bhat <ajanthabhat@...>
Date:   2018-02-27T11:58:37Z

    * File Format Phase 2 (filter issue fix)

commit e3160888dcac715928f9d18febd33b22177513a0
Author: Ajantha-Bhat <ajanthabhat@...>
Date:   2018-02-27T12:20:46Z

    * File Format Phase 2 (filter issue fix return value)

commit 949e6a97680f46a91808be094505a519340a1a53
Author: sounakr <sounakr@...>
Date:   2018-02-27T13:55:16Z

    Clear DataMap Cache

commit 7fdccc3885ab1c731d7066e36a2237372198ae22
Author: Ajantha-Bhat <ajanthabhat@...>
Date:   2018-02-27T14:02:35Z

    * File Format Phase 2 (test cases)

commit 528e8120527a712308adee4b91d516a9891975ea
Author: sounakr <sounakr@...>
Date:   2018-02-28T03:18:45Z

    Refactor CarbonFileInputFormat

commit 0a2b2249ea8486d2a217ff245b2311bb96936d64
Author: Ajantha-Bhat <ajanthabhat@...>
Date:   2018-02-28T10:02:08Z

    * File Format Phase 2
      a. test cases addition
      b. Exception handling when the files are not present
      c. Setting the filter expression in carbonTableInputFormat

commit fdfe2f405a2bb8ca122a785919290bc82a72c01c
Author: Ajantha-Bhat <ajanthabhat@...>
Date:   2018-02-28T10:02:08Z

    * File Format Phase 2
      a. test cases addition
      b. Exception handling when the files are not present
      c. Setting the filter expression in carbonTableInputFormat

commit 64627d2f2953779a9ee32f23be0b552b6b18f1d9
Author: Ajantha-Bhat <ajanthabhat@...>
Date:   2018-02-28T10:02:08Z

    * File Format Phase 2
      a. test cases addition
      b. Exception handling when the files are not present
      c. Setting the filter expression in carbonTableInputFormat

commit 8871e3140afa008794dfa0e8e2df58f5b29f46bd
Author: sounakr <sounakr@...>
Date:   2018-03-01T11:23:39Z

    Map Reduce Test Case for CarbonInputFileFormat

commit 51403245ce250625de7a0bd20e369d3011f2eeb9
Author: Ajantha-Bhat <ajanthabhat@...>
Date:   2018-03-01T11:41:03Z

    * fixed the issues
      Existing external table flow got impacted
      Added a new storage (provider) carbondatafileformat for external table creation

commit 1f89d92c947e4b4a1248493552187b70d1f51dba
Author: Ajantha-Bhat <ajanthabhat@...>
Date:   2018-03-01T15:32:07Z

    * Bug fixes in CarbonFileInputFormat flow: 3 issue fixes.
      a. schema ordinal
      b. table path problem in absolute identifier
      c. drop of external table fix
      d. unwanted code cleanup

commit e1e2ae5019c863d1d43d91d8f5f6852c6d92be29
Author: sounakr <sounakr@...>
Date:   2018-03-02T05:09:45Z

    Review Code

commit 1e374feadd7dd86848b31fed113cf234f0ddb542
Author: Ajantha-Bhat <ajanthabhat@...>
Date:   2018-03-05T11:07:10Z

    merge conflict fix

commit 97d90a1d2bf461dea0259153ab9b28247c2a75ab
Author: Ajantha-Bhat <ajanthabhat@...>
Date:   2018-03-06T10:08:20Z

    * Fixed the test script failure for spark 2.1

commit b3dc89c278b6f89ce9c63ea9f3597124f6916543
Author: Ajantha-Bhat <ajanthabhat@...>
Date:   2018-03-06T11:58:32Z

    * Fixed the test script failure for spark 2.1, 2.2

commit eca6617089702b246dcfb9b039be04d61ede5c6b
Author: Ajantha-Bhat <ajanthabhat@...>
Date:   2018-03-12T12:46:10Z

    * Fix the compilation errors after rebase to master.

commit 761a7ba32b7a4fc990f80e4ed6dc4e0294d7747c
Author: Ajantha-Bhat <ajanthabhat@...>
Date:   2018-03-12T12:59:00Z

    * Fixing the test case of this requirement

commit 16745af45b0683d2121a40272dde92cc07275c93
Author: sounakr <sounakr@...>
Date:   2018-03-12T18:45:19Z

    Review Comments

----
Github user sounakr commented on the issue:
https://github.com/apache/carbondata/pull/2055 Retest this please --- |
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2055 Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/2975/ --- |
Github user sounakr commented on the issue:
https://github.com/apache/carbondata/pull/2055 Retest this please --- |
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2055 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/4221/ --- |
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2055 Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/2976/ --- |
Github user ravipesala commented on the issue:
https://github.com/apache/carbondata/pull/2055 SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/3868/ --- |
Github user ajantha-bhat commented on the issue:
https://github.com/apache/carbondata/pull/2055 retest this, please ... --- |
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2055 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/4239/ --- |
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2055 Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/2995/ --- |
Github user ajantha-bhat commented on the issue:
https://github.com/apache/carbondata/pull/2055 retest this, please... --- |
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2055 Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/2998/ --- |
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2055 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/4242/ --- |
Github user ravipesala commented on the issue:
https://github.com/apache/carbondata/pull/2055 SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/3881/ --- |
Github user ravipesala commented on the issue:
https://github.com/apache/carbondata/pull/2055 SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/3883/ --- |
Github user jackylk commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2055#discussion_r174208296 --- Diff: hadoop/src/main/java/org/apache/carbondata/hadoop/api/CarbonFileInputFormat.java --- @@ -0,0 +1,678 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.carbondata.hadoop.api; + +import java.io.ByteArrayInputStream; +import java.io.DataInputStream; +import java.io.IOException; +import java.io.Serializable; +import java.lang.reflect.Constructor; +import java.util.ArrayList; +import java.util.BitSet; +import java.util.LinkedList; +import java.util.List; +import java.util.Map; + +import org.apache.carbondata.core.constants.CarbonCommonConstants; +import org.apache.carbondata.core.datamap.DataMapChooser; +import org.apache.carbondata.core.datamap.DataMapLevel; +import org.apache.carbondata.core.datamap.Segment; +import org.apache.carbondata.core.datamap.dev.expr.DataMapExprWrapper; +import org.apache.carbondata.core.datastore.impl.FileFactory; +import org.apache.carbondata.core.exception.InvalidConfigurationException; +import org.apache.carbondata.core.indexstore.ExtendedBlocklet; +import org.apache.carbondata.core.indexstore.PartitionSpec; +import org.apache.carbondata.core.indexstore.blockletindex.BlockletDataMapFactory; 
+import org.apache.carbondata.core.indexstore.blockletindex.SegmentIndexFileStore; +import org.apache.carbondata.core.metadata.AbsoluteTableIdentifier; +import org.apache.carbondata.core.metadata.ColumnarFormatVersion; +import org.apache.carbondata.core.metadata.schema.PartitionInfo; +import org.apache.carbondata.core.metadata.schema.partition.PartitionType; +import org.apache.carbondata.core.metadata.schema.table.CarbonTable; +import org.apache.carbondata.core.metadata.schema.table.TableInfo; +import org.apache.carbondata.core.mutate.UpdateVO; +import org.apache.carbondata.core.scan.expression.Expression; +import org.apache.carbondata.core.scan.filter.SingleTableProvider; +import org.apache.carbondata.core.scan.filter.TableProvider; +import org.apache.carbondata.core.scan.filter.resolver.FilterResolverIntf; +import org.apache.carbondata.core.scan.model.QueryModel; +import org.apache.carbondata.core.stats.QueryStatistic; +import org.apache.carbondata.core.stats.QueryStatisticsConstants; +import org.apache.carbondata.core.stats.QueryStatisticsRecorder; +import org.apache.carbondata.core.statusmanager.SegmentUpdateStatusManager; +import org.apache.carbondata.core.util.CarbonProperties; +import org.apache.carbondata.core.util.CarbonTimeStatisticsFactory; +import org.apache.carbondata.core.util.CarbonUtil; +import org.apache.carbondata.core.util.DataTypeConverter; +import org.apache.carbondata.core.util.DataTypeConverterImpl; +import org.apache.carbondata.core.util.path.CarbonTablePath; +import org.apache.carbondata.hadoop.CarbonInputSplit; +import org.apache.carbondata.hadoop.CarbonMultiBlockSplit; +import org.apache.carbondata.hadoop.CarbonProjection; +import org.apache.carbondata.hadoop.CarbonRecordReader; +import org.apache.carbondata.hadoop.readsupport.CarbonReadSupport; +import org.apache.carbondata.hadoop.readsupport.impl.DictionaryDecodeReadSupport; +import org.apache.carbondata.hadoop.util.CarbonInputFormatUtil; +import 
org.apache.carbondata.hadoop.util.ObjectSerializationUtil; +import org.apache.carbondata.hadoop.util.SchemaReader; + +import org.apache.commons.logging.Log; +import org.apache.commons.logging.LogFactory; +import org.apache.hadoop.conf.Configuration; +import org.apache.hadoop.fs.FileSystem; +import org.apache.hadoop.fs.LocalFileSystem; +import org.apache.hadoop.fs.Path; +import org.apache.hadoop.mapred.JobConf; +import org.apache.hadoop.mapred.Reporter; +import org.apache.hadoop.mapreduce.InputSplit; +import org.apache.hadoop.mapreduce.JobContext; +import org.apache.hadoop.mapreduce.RecordReader; +import org.apache.hadoop.mapreduce.TaskAttemptContext; +import org.apache.hadoop.mapreduce.lib.input.FileInputFormat; +import org.apache.hadoop.mapreduce.lib.input.FileSplit; +import org.apache.hadoop.mapreduce.security.TokenCache; + +/** + * Input format of CarbonData file. + * + * @param <T> + */ +public class CarbonFileInputFormat<T> extends FileInputFormat<Void, T> implements Serializable { --- End diff -- Please annotate this class using InterfaceAudience.User and InterfaceStability.Evolving --- |
Github user jackylk commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2055#discussion_r174208602 --- Diff: integration/spark-common-test/src/test/scala/org/apache/carbondata/mapred/TestMapReduceCarbonFileInputFormat.java --- @@ -0,0 +1,193 @@ +/* --- End diff -- There are some binary files in this PR, please remove them --- |
Github user jackylk commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2055#discussion_r174208671 --- Diff: integration/spark-common-test/src/test/scala/org/apache/carbondata/mapred/TestMapReduceCarbonFileInputFormat.java --- @@ -0,0 +1,193 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.carbondata.mapred; + + +import java.io.BufferedReader; +import java.io.BufferedWriter; +import java.io.File; +import java.io.FileFilter; +import java.io.FileReader; +import java.io.FileWriter; +import java.io.IOException; +import java.util.List; +import java.util.UUID; + +import org.apache.carbondata.core.metadata.AbsoluteTableIdentifier; +import org.apache.carbondata.core.scan.expression.Expression; +import org.apache.carbondata.core.util.CarbonUtil; +import org.apache.carbondata.core.util.path.CarbonTablePath; +import org.apache.carbondata.hadoop.CarbonProjection; +import org.apache.carbondata.hadoop.api.CarbonFileInputFormat; +import org.apache.carbondata.hadoop.api.CarbonTableInputFormat; + +import org.apache.commons.logging.Log; +import org.apache.commons.logging.LogFactory; +import org.apache.hadoop.conf.Configuration; +import org.apache.hadoop.conf.Configured; +import org.apache.hadoop.fs.Path; +import org.apache.hadoop.io.IntWritable; +import org.apache.hadoop.io.Text; +import org.apache.hadoop.mapred.FileInputFormat; +import org.apache.hadoop.mapred.JobClient; +import org.apache.hadoop.mapred.JobConf; +import org.apache.hadoop.mapreduce.Job; +import org.apache.hadoop.mapreduce.Mapper; +import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat; +import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat; +import org.apache.hadoop.util.Tool; +import org.apache.hadoop.util.ToolRunner; +import org.junit.Assert; +import org.junit.Test; + +public class TestMapReduceCarbonFileInputFormat { + + private static final Log LOG = LogFactory.getLog(TestMapReduceCarbonFileInputFormat.class); + + private int countTheLines(String outPath) throws Exception { + File file = new File(outPath); + if (file.exists()) { + BufferedReader reader = new BufferedReader(new FileReader(file)); + int i = 0; + while (reader.readLine() != null) { + i++; + } + reader.close(); + return i; + } + return 0; + } + + private int countTheColumns(String outPath) 
throws Exception { + File file = new File(outPath); + if (file.exists()) { + BufferedReader reader = new BufferedReader(new FileReader(file)); + String[] split = reader.readLine().split(","); + reader.close(); + return split.length; + } + return 0; + } + + --- End diff -- remove empty lines --- |
Github user jackylk commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2055#discussion_r174208978 --- Diff: hadoop/src/main/java/org/apache/carbondata/hadoop/util/SchemaReader.java --- @@ -79,4 +79,19 @@ public static TableInfo getTableInfo(AbsoluteTableIdentifier identifier) carbonTableIdentifier.getTableName(), identifier.getTablePath()); } + + + public static TableInfo inferSchemaForExternalTable(AbsoluteTableIdentifier identifier) --- End diff -- rename to `inferSchema`, and can you pass the tablePath only --- |
Github user jackylk commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2055#discussion_r174211199 --- Diff: integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/createTable/TestCreateTableUsingCarbonFileLevelFormat.scala --- @@ -0,0 +1,292 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.carbondata.spark.testsuite.createTable + +import java.io.File + +import org.apache.spark.sql.{AnalysisException, CarbonEnv} +import org.apache.spark.sql.test.util.QueryTest +import org.scalatest.BeforeAndAfterAll +import org.apache.carbondata.common.exceptions.sql.MalformedCarbonCommandException + +class TestCreateTableUsingCarbonFileLevelFormat extends QueryTest with BeforeAndAfterAll { --- End diff -- This suite is fine, but can you add one suite using SparkSession instead of CarbonSession? --- |