GitHub user sounakr opened a pull request:
https://github.com/apache/carbondata/pull/2032 [CARBONDATA-2224] External file level reader support

The file level reader reads any CarbonData file placed in an external file path. Reading can be done through three methods:

a) Reading as a datasource from Spark. CarbonFileLevelFormat.scala is used in this case to read the file. To create a Spark datasource external table: "CREATE TABLE sdkOutputTable **USING CarbonDataFileFormat** LOCATION '$writerOutputFilePath1'". For more details please refer to the test file org/apache/carbondata/spark/testsuite/createTable/TestCreateTableUsingCarbonFileLevelFormat.scala.

b) Reading from Spark SQL as an external table. CarbonFileinputFormat.java is used for reading the files. The create table syntax for this is: "CREATE EXTERNAL TABLE sdkOutputTable **STORED BY 'carbondatafileformat'** LOCATION '$writerOutputFilePath6'". For more details see org/apache/carbondata/spark/testsuite/createTable/TestCarbonFileInputFormatWithExternalCarbonTable.scala.

c) Reading through a Hadoop MapReduce job. Please refer to org/apache/carbondata/mapred/TestMapReduceCarbonFileInputFormat.java for more details.

- [ ] Any interfaces changed?
- [ ] Any backward compatibility impacted?
- [ ] Document update required?
- [ ] Testing done
      Please provide details on
      - Whether new unit test cases have been added or why no new tests are required?
      - How it is tested? Please attach test report.
      - Is it a performance related change? Please attach the performance test report.
      - Any additional information to help reviewers in testing this change.
- [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.
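For quick reference, the two SQL variants described in (a) and (b) can be written out in full; the table name and the `$writerOutputFilePath...` LOCATION values are the placeholders used in the PR's test files:

```sql
-- (a) Spark datasource table backed by the file-level format
CREATE TABLE sdkOutputTable
USING CarbonDataFileFormat
LOCATION '$writerOutputFilePath1';

-- (b) External table read through CarbonFileInputFormat
CREATE EXTERNAL TABLE sdkOutputTable
STORED BY 'carbondatafileformat'
LOCATION '$writerOutputFilePath6';
```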
You can merge this pull request into a Git repository by running:

$ git pull https://github.com/sounakr/incubator-carbondata file_level_reader

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/carbondata/pull/2032.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

This closes #2032

----

commit 65ce23b1f6e35c3c6722c7f0c14c19b7c8536d23
Author: Jacky Li <jacky.likun@...>
Date: 2018-01-06T12:28:44Z

[CARBONDATA-1992] Remove partitionId in CarbonTablePath
In CarbonTablePath there is a deprecated partition id which is always 0; it should be removed to avoid confusion.
This closes #1765

commit c9ceaaae66574c98a13cc65bc3b91ab8346a456b
Author: Jacky Li <jacky.likun@...>
Date: 2018-01-30T13:24:04Z

[CARBONDATA-2099] Refactor query scan process to improve readability
Unified concepts in the scan process flow:
1. QueryModel contains all parameters for a scan; it is created by an API in CarbonTable. (In future, CarbonTable will be the entry point for various table operations.)
2. Use the term ColumnChunk to represent one column in one blocklet, and use ChunkIndex in the reader to read a specified column chunk.
3. Use the term ColumnPage to represent one page in one ColumnChunk.
4. QueryColumn => ProjectionColumn, indicating it is for projection.
This closes #1874

commit 01fcd539af815956975eb4ea480f14e4bb1a2062
Author: ravipesala <ravi.pesala@...>
Date: 2017-11-15T14:18:40Z

[CARBONDATA-1544][Datamap] Datamap FineGrain implementation
Implemented interfaces for the FG datamap and integrated them into the filter scanner to use the pruned bitset from the FG datamap. The FG query flow is as follows:
1. The user can add an FG datamap to any table and implement its interfaces.
2. Any filter query which hits a table with a datamap will call the prune method of the FG datamap.
3. The prune method of the FG datamap returns a list of FineGrainBlocklet; these blocklets contain block, blocklet, page and rowid information as well.
4. The pruned blocklets are internally written to a file, and only the block, blocklet and file path information is returned as part of the splits.
5. Based on the splits, ScanRDD schedules the tasks.
6. In the filter scanner we check the datamap writer path from the split, read the bitset if it exists, and pass this bitset as input.
This closes #1471

commit da82cdbda4f45fa741f56594e23c61a575c2fd2c
Author: Jacky Li <jacky.likun@...>
Date: 2018-02-27T00:51:25Z

[REBASE] resolve conflict after rebasing to master

commit 072c95a6770a2b847e111f3349df271bade62675
Author: Jacky Li <jacky.likun@...>
Date: 2018-02-10T02:34:59Z

Revert "[CARBONDATA-2023][DataLoad] Add size base block allocation in data loading"
This reverts commit 6dd8b038fc898dbf48ad30adfc870c19eb38e3d0.

commit 50af4d91ca2415d12e559b6070f72bfe5a881641
Author: Jacky Li <jacky.likun@...>
Date: 2018-02-11T13:37:04Z

[CARBONDATA-2159] Remove carbon-spark dependency in store-sdk module
To make it possible to assemble a JAR of the store-sdk module, it should not depend on the carbon-spark module.
This closes #1970

commit e77fcac978a87d9d526ea7012954fc8e48e9e34c
Author: xuchuanyin <xuchuanyin@...>
Date: 2018-02-08T06:42:39Z

[CARBONDATA-2023][DataLoad] Add size base block allocation in data loading
CarbonData assigns blocks to nodes at the beginning of data loading. The previous block allocation strategy was block-number based and suffers from the skewed-data problem when the sizes of the input files differ a lot. We introduced a size-based block allocation strategy to optimize data loading performance in skewed-data scenarios.
This closes #1808

commit 00e5208a6da5cc13aabd3ed6c437d2d1c5fa06ff
Author: sounakr <sounakr@...>
Date: 2017-09-28T10:51:05Z

[CARBONDATA-1480] Min Max Index Example for DataMap
Datamap example: implementation of a Min Max index through a datamap, and using the index while pruning.
This closes #1359

commit 3212c0c025191c754c454ad88de3adbec26dc58b
Author: ravipesala <ravi.pesala@...>
Date: 2017-11-15T14:18:40Z

[CARBONDATA-1544][Datamap] Datamap FineGrain implementation
Implemented interfaces for the FG datamap and integrated them into the filter scanner to use the pruned bitset from the FG datamap. The FG query flow is as follows:
1. The user can add an FG datamap to any table and implement its interfaces.
2. Any filter query which hits a table with a datamap will call the prune method of the FG datamap.
3. The prune method of the FG datamap returns a list of FineGrainBlocklet; these blocklets contain block, blocklet, page and rowid information as well.
4. The pruned blocklets are internally written to a file, and only the block, blocklet and file path information is returned as part of the splits.
5. Based on the splits, ScanRDD schedules the tasks.
6. In the filter scanner we check the datamap writer path from the split, read the bitset if it exists, and pass this bitset as input.
This closes #1471

commit aa3f2ff731fa6e0004dea827417c0d932d4a6291
Author: Jacky Li <jacky.likun@...>
Date: 2018-01-06T12:28:44Z

[CARBONDATA-1992] Remove partitionId in CarbonTablePath
In CarbonTablePath there is a deprecated partition id which is always 0; it should be removed to avoid confusion.
This closes #1765

commit 3ba31a162dc66bc5ee9023c7ff466c7de4c31c50
Author: Jacky Li <jacky.likun@...>
Date: 2018-01-30T13:24:04Z

[CARBONDATA-2099] Refactor query scan process to improve readability
Unified concepts in the scan process flow:
1. QueryModel contains all parameters for a scan; it is created by an API in CarbonTable. (In future, CarbonTable will be the entry point for various table operations.)
2. Use the term ColumnChunk to represent one column in one blocklet, and use ChunkIndex in the reader to read a specified column chunk.
3. Use the term ColumnPage to represent one page in one ColumnChunk.
4. QueryColumn => ProjectionColumn, indicating it is for projection.
This closes #1874

commit 810f093c28dc9e8a70a04bef1bc701569ec4261e
Author: Jacky Li <jacky.likun@...>
Date: 2018-01-31T08:14:27Z

[CARBONDATA-2025] Unify all path construction through CarbonTablePath static method
Refactor CarbonTablePath:
1. Remove CarbonStorePath and use CarbonTablePath only.
2. Make CarbonTablePath a utility without object creation; this avoids creating an object before using it, so the code is cleaner and there is less GC.
This closes #1768

commit 5a91a4cf49e3554f95f88637d93b51c80bf5329f
Author: xuchuanyin <xuchuanyin@...>
Date: 2018-02-08T06:42:39Z

[CARBONDATA-2023][DataLoad] Add size base block allocation in data loading
CarbonData assigns blocks to nodes at the beginning of data loading. The previous block allocation strategy was block-number based and suffers from the skewed-data problem when the sizes of the input files differ a lot. We introduced a size-based block allocation strategy to optimize data loading performance in skewed-data scenarios.
This closes #1808

commit 667303e7dfa515cda7cd3e34c736b74b5e246c29
Author: xuchuanyin <xuchuanyin@...>
Date: 2018-02-08T07:39:45Z

[HotFix][CheckStyle] Fix import related checkstyle
This closes #1952

commit 442350f6cbc908ea02ec6ef5f8d5b748b63d73d9
Author: Jacky Li <jacky.likun@...>
Date: 2018-02-27T03:26:30Z

[REBASE] Solve conflict after merging master

commit ea51dbf0d0d03d5cf9a946594cec61e4d9a2a46d
Author: Jacky Li <jacky.likun@...>
Date: 2018-02-10T02:34:59Z

Revert "[CARBONDATA-2023][DataLoad] Add size base block allocation in data loading"
This reverts commit 6dd8b038fc898dbf48ad30adfc870c19eb38e3d0.
commit d13f01bfb7bf84fd8a231300219cbc4818eabe5b
Author: sounakr <sounakr@...>
Date: 2018-02-24T02:25:14Z

File Format Reader

commit 06b0c74edbc6097ada28382f27c54905a1b07159
Author: sounakr <sounakr@...>
Date: 2018-02-26T11:58:47Z

File Format Phase 2

commit 372b380470600c03a2f723b53a106a5ce0087ae9
Author: Ajantha-Bhat <ajanthabhat@...>
Date: 2018-02-27T06:06:56Z

* File Format Phase 2 (cleanup code)

commit 8eb20a5dd9543029239a051bd978e855a69d805c
Author: Ajantha-Bhat <ajanthabhat@...>
Date: 2018-02-27T06:36:28Z

* File Format Phase 2 (cleanup code)

commit 462fd28cbc1268bbb529f947ee2e93c068e0d682
Author: Ajantha-Bhat <ajanthabhat@...>
Date: 2018-02-27T09:54:43Z

* File Format Phase 2 (cleanup code and adding testCase)

commit 952688b8cf1b17954b85af6143abcab77d081da8
Author: Ajantha-Bhat <ajanthabhat@...>
Date: 2018-02-27T11:58:37Z

* File Format Phase 2 (filter issue fix)

commit 87c84943122c8523291cc25751829ac143161469
Author: Ajantha-Bhat <ajanthabhat@...>
Date: 2018-02-27T12:20:46Z

* File Format Phase 2 (filter issue fix return value)

commit 3a0c3b9448c3cca0742db0f557518ffa12d0dabb
Author: sounakr <sounakr@...>
Date: 2018-02-27T13:55:16Z

Clear DataMap Cache

commit 1943cf6dcd266cd78483f137e0499083d95e4332
Author: Ajantha-Bhat <ajanthabhat@...>
Date: 2018-02-27T14:02:35Z

* File Format Phase 2 (test cases)

commit 4f97c7e35fade5fe0abb58b0c781a6b7f5b744e9
Author: sounakr <sounakr@...>
Date: 2018-02-28T03:18:45Z

Refactor CarbonFileInputFormat

commit 7df78cf50b658cc6fb79e28b0ad76f74dc8a680a
Author: Ajantha-Bhat <ajanthabhat@...>
Date: 2018-02-28T10:02:08Z

* File Format Phase 2
a. test cases addition
b. Exception handling when the files are not present
c. Setting the filter expression in carbonTableInputFormat

commit 4825fcc8d023c2b1a031ee0417addf5b6f2d5763
Author: Ajantha-Bhat <ajanthabhat@...>
Date: 2018-02-28T10:02:08Z

* File Format Phase 2
a. test cases addition
b. Exception handling when the files are not present
c. Setting the filter expression in carbonTableInputFormat

commit 5e5adbe21b8b786c13fda13e7e052bc5e46f22b4
Author: Ajantha-Bhat <ajanthabhat@...>
Date: 2018-02-28T10:02:08Z

* File Format Phase 2
a. test cases addition
b. Exception handling when the files are not present
c. Setting the filter expression in carbonTableInputFormat

commit b510faa9e033fb2ca0ae64125aee10709201e69f
Author: sounakr <sounakr@...>
Date: 2018-03-01T11:23:39Z

Map Reduce Test Case for CarbonInputFileFormat

----
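Several commits in the log above add, revert, and re-add the size-based block allocation strategy from [CARBONDATA-2023]. As a rough illustration only (a toy sketch, not CarbonData's actual code), the idea of assigning each block to the currently least-loaded node by total bytes, rather than balancing by block count, can be shown as:

```python
from typing import Dict, List

def assign_by_size(blocks: Dict[str, int], nodes: List[str]) -> Dict[str, List[str]]:
    """Greedily assign each block (name -> size in bytes) to the node
    with the smallest total assigned size so far. With skewed input file
    sizes this balances load far better than counting blocks per node."""
    load = {n: 0 for n in nodes}
    assignment: Dict[str, List[str]] = {n: [] for n in nodes}
    # Place the largest blocks first so the greedy choice stays balanced.
    for name, size in sorted(blocks.items(), key=lambda kv: -kv[1]):
        target = min(nodes, key=lambda n: load[n])
        assignment[target].append(name)
        load[target] += size
    return assignment

if __name__ == "__main__":
    # Two huge blocks and two tiny ones: count-based assignment could put
    # both huge blocks on one node; size-based assignment splits them.
    blocks = {"b1": 1000, "b2": 10, "b3": 10, "b4": 990}
    print(assign_by_size(blocks, ["node1", "node2"]))
```

The block names, sizes, and node labels here are hypothetical; the real strategy lives in CarbonData's data-loading block distribution code.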
Github user ravipesala commented on the issue:
https://github.com/apache/carbondata/pull/2032 SDV Build Fail, Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/3775/ ---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2032 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/4077/ ---
Github user ravipesala commented on the issue:
https://github.com/apache/carbondata/pull/2032 SDV Build Fail, Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/3776/ ---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2032 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/4078/ ---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2032 Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/2832/ ---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2032 Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/2833/ ---
Github user ravipesala commented on the issue:
https://github.com/apache/carbondata/pull/2032 SDV Build Fail, Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/3777/ ---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2032 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/4079/ ---
Github user sounakr commented on the issue:
https://github.com/apache/carbondata/pull/2032 Retest this please. ---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2032 Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/2838/ ---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2032 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/4083/ ---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2032 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/4086/ ---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2032 Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/2841/ ---
Github user ajantha-bhat commented on the issue:
https://github.com/apache/carbondata/pull/2032 Retest this please ---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2032 Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/2847/ ---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2032 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/4092/ ---
Github user ajantha-bhat commented on the issue:
https://github.com/apache/carbondata/pull/2032 retest this please ---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2032 Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/2849/ ---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2032 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/4094/ ---