GitHub user ravipesala opened a pull request:
https://github.com/apache/carbondata/pull/2661

    [CARBONDATA-2888] Support multi level subfolder for SDK read and fileformat read

    This PR supports multi-level subfolder reads for the SDK reader and Spark's carbon file format reader.

    Be sure to do all of the following checklist to help us incorporate your contribution quickly and easily:

    - [ ] Any interfaces changed?
    - [ ] Any backward compatibility impacted?
    - [ ] Document update required?
    - [ ] Testing done
          Please provide details on
          - Whether new unit test cases have been added or why no new tests are required?
          - How is it tested? Please attach the test report.
          - Is it a performance-related change? Please attach the performance test report.
          - Any additional information to help reviewers in testing this change.
    - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/ravipesala/incubator-carbondata sdk-multi-folder-support

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/carbondata/pull/2661.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #2661

----
commit a71189330b223a76c0b40ee8abf28ce2fd9733c1
Author: ravipesala <ravi.pesala@...>
Date: 2018-08-26T17:17:14Z

    Support multi level subfolder for SDK read and fileformat read
----
---
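For context, a minimal sketch of how a store with nested subfolders could be consumed through the SDK once this change is in. It assumes the builder-style CarbonReader API of the CarbonData SDK around the time of this PR (builder/projection/build/readNextRow); the store path and projection column names are illustrative assumptions, not values from the PR:

    import org.apache.carbondata.sdk.file.CarbonReader

    // Hypothetical store root whose carbon data/index files sit in
    // nested subfolders rather than directly under the root.
    val path = "/tmp/carbon-store/nested"

    // Build a reader over the root folder; with this PR the reader also
    // discovers files in subfolders below the root.
    val reader: CarbonReader[AnyRef] = CarbonReader.builder(path)
      .projection(Array("name", "age"))
      .build()

    // Iterate the rows until exhausted, then release the reader.
    while (reader.hasNext) {
      val row = reader.readNextRow.asInstanceOf[Array[AnyRef]]
      println(row.mkString(","))
    }
    reader.close()

---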
Github user ravipesala commented on the issue:
https://github.com/apache/carbondata/pull/2661 SDV Build Failed, please check CI http://144.76.159.231:8080/job/ApacheSDVTests/6413/ ---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2661 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/8072/ ---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2661 Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/14/ ---
Github user xuchuanyin commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2661#discussion_r212897009

    --- Diff: core/src/main/java/org/apache/carbondata/core/indexstore/blockletindex/SegmentIndexFileStore.java ---
    @@ -338,6 +338,24 @@ private MergedBlockIndex readMergeBlockIndex(ThriftReader thriftReader) throws I
         });
       }
    +  /**
    +   * List all the index files of the segment.
    +   *
    +   * @param carbonFile directory
    +   */
    +  public static void getCarbonIndexFilesRecursively(CarbonFile carbonFile,
    --- End diff --

    Why not return the list instead of passing it as a parameter?

---
Github user xuchuanyin commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2661#discussion_r212897508

    --- Diff: core/src/main/java/org/apache/carbondata/core/metadata/schema/table/CarbonTable.java ---
    @@ -268,6 +257,18 @@ public boolean accept(CarbonFile file) {
         return CarbonTable.buildFromTableInfo(tableInfoInfer);
       }
    +  private static CarbonFile getFirstIndexFile(CarbonFile tablePath) {
    +    CarbonFile[] carbonFiles = tablePath.listFiles();
    +    for (CarbonFile carbonFile : carbonFiles) {
    +      if (carbonFile.isDirectory()) {
    +        return getFirstIndexFile(carbonFile);
    +      } else if (carbonFile.getName().endsWith(CarbonTablePath.INDEX_FILE_EXT)) {
    +        return carbonFile;
    +      }
    +    }
    +    return null;
    --- End diff --

    Why not just throw an exception if the file does not exist?

---
Github user jackylk commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2661#discussion_r213570439

    --- Diff: integration/spark-datasource/src/main/scala/org/apache/spark/sql/carbondata/execution/datasources/CarbonFileIndexReplaceRule.scala ---
    @@ -82,4 +82,23 @@ class CarbonFileIndexReplaceRule extends Rule[LogicalPlan] {
           fileIndex
         }
       }
    +
    +  /**
    +   * Get datafolders recursively
    +   */
    +  private def getDataFolders(carbonFile: CarbonFile): Seq[CarbonFile] = {
    --- End diff --

    Is the input CarbonFile a table folder? If so, please rename it to tableFolder.

---
Github user jackylk commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2661#discussion_r213574978

    --- Diff: integration/spark-datasource/src/main/scala/org/apache/spark/sql/carbondata/execution/datasources/CarbonFileIndexReplaceRule.scala ---
    @@ -82,4 +82,23 @@ class CarbonFileIndexReplaceRule extends Rule[LogicalPlan] {
           fileIndex
         }
       }
    +
    +  /**
    +   * Get datafolders recursively
    +   */
    +  private def getDataFolders(carbonFile: CarbonFile): Seq[CarbonFile] = {
    +    val files = carbonFile.listFiles()
    +    var folders: Seq[CarbonFile] = Seq()
    +    files.foreach { f =>
    +      if (f.isDirectory) {
    +        val files = f.listFiles()
    +        if (files.nonEmpty && !files(0).isDirectory) {
    +          folders = Seq(f) ++ folders
    --- End diff --

    Will this create too many immutable objects? Would an ArrayBuffer be better?

---
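For illustration, the ArrayBuffer variant suggested here might look like the sketch below (also renaming the parameter to tableFolder, per the earlier review comment). This is a hypothetical rewrite against the same CarbonFile listing API as the diff above, not code from the PR:

    import scala.collection.mutable.ArrayBuffer
    import org.apache.carbondata.core.datastore.filesystem.CarbonFile

    // One mutable buffer is threaded through the recursion instead of
    // prepending a fresh immutable Seq at every level.
    def getDataFolders(
        tableFolder: CarbonFile,
        folders: ArrayBuffer[CarbonFile] = ArrayBuffer()): Seq[CarbonFile] = {
      tableFolder.listFiles().foreach { f =>
        if (f.isDirectory) {
          val children = f.listFiles()
          if (children.nonEmpty && !children(0).isDirectory) {
            folders += f               // leaf folder holding data files
          } else {
            getDataFolders(f, folders) // descend into nested subfolders
          }
        }
      }
      folders
    }

---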
Github user KanakaKumar commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2661#discussion_r214273342

    --- Diff: integration/spark-datasource/src/main/scala/org/apache/spark/sql/carbondata/execution/datasources/CarbonFileIndexReplaceRule.scala ---
    @@ -82,4 +82,23 @@ class CarbonFileIndexReplaceRule extends Rule[LogicalPlan] {
           fileIndex
         }
       }
    +
    +  /**
    +   * Get datafolders recursively
    +   */
    +  private def getDataFolders(carbonFile: CarbonFile): Seq[CarbonFile] = {
    +    val files = carbonFile.listFiles()
    +    var folders: Seq[CarbonFile] = Seq()
    +    files.foreach { f =>
    +      if (f.isDirectory) {
    +        val files = f.listFiles()
    +        if (files.nonEmpty && !files(0).isDirectory) {
    +          folders = Seq(f) ++ folders
    +        } else {
    +          folders = getDataFolders(f) ++ folders
    --- End diff --

    This statement can be moved under the files.nonEmpty check.

---
Github user ravipesala commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2661#discussion_r214737914

    --- Diff: core/src/main/java/org/apache/carbondata/core/indexstore/blockletindex/SegmentIndexFileStore.java ---
    @@ -338,6 +338,24 @@ private MergedBlockIndex readMergeBlockIndex(ThriftReader thriftReader) throws I
         });
       }
    +  /**
    +   * List all the index files of the segment.
    +   *
    +   * @param carbonFile directory
    +   */
    +  public static void getCarbonIndexFilesRecursively(CarbonFile carbonFile,
    --- End diff --

    It's a recursive call, and it adds files to the passed list as it finds them.

---
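In other words, the accumulator pattern described here looks roughly as follows. This is a Scala paraphrase of the Java method in the diff, under the same CarbonFile/CarbonTablePath APIs, not the exact PR code:

    import java.util
    import org.apache.carbondata.core.datastore.filesystem.CarbonFile
    import org.apache.carbondata.core.util.path.CarbonTablePath

    // The caller allocates a single list and every recursive call appends
    // into it, so no per-level list creation or merging is needed.
    def getCarbonIndexFilesRecursively(
        carbonFile: CarbonFile,
        indexFiles: util.List[CarbonFile]): Unit = {
      carbonFile.listFiles().foreach { f =>
        if (f.isDirectory) {
          getCarbonIndexFilesRecursively(f, indexFiles)
        } else if (f.getName.endsWith(CarbonTablePath.INDEX_FILE_EXT)) {
          indexFiles.add(f)
        }
      }
    }

---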
Github user ravipesala commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2661#discussion_r214737987

    --- Diff: core/src/main/java/org/apache/carbondata/core/metadata/schema/table/CarbonTable.java ---
    @@ -268,6 +257,18 @@ public boolean accept(CarbonFile file) {
         return CarbonTable.buildFromTableInfo(tableInfoInfer);
       }
    +  private static CarbonFile getFirstIndexFile(CarbonFile tablePath) {
    +    CarbonFile[] carbonFiles = tablePath.listFiles();
    +    for (CarbonFile carbonFile : carbonFiles) {
    +      if (carbonFile.isDirectory()) {
    +        return getFirstIndexFile(carbonFile);
    +      } else if (carbonFile.getName().endsWith(CarbonTablePath.INDEX_FILE_EXT)) {
    +        return carbonFile;
    +      }
    +    }
    +    return null;
    --- End diff --

    It is handled in the caller method.

---
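For completeness, the caller-side handling referred to here could look like the sketch below: the lookup itself returns null, and the caller converts a missing index file into an explicit error. This is a hedged Scala paraphrase, not the literal CarbonTable code; the variant below also keeps scanning sibling entries when a subfolder yields nothing:

    import org.apache.carbondata.core.datastore.filesystem.CarbonFile
    import org.apache.carbondata.core.util.path.CarbonTablePath

    // Depth-first search for the first index file under the table path.
    def getFirstIndexFile(folder: CarbonFile): CarbonFile = {
      folder.listFiles().foreach { f =>
        if (f.isDirectory) {
          val found = getFirstIndexFile(f)
          if (found != null) return found
        } else if (f.getName.endsWith(CarbonTablePath.INDEX_FILE_EXT)) {
          return f
        }
      }
      null
    }

    // Hypothetical caller: turn the null into an explicit error instead of
    // letting it propagate.
    def requireFirstIndexFile(tablePath: CarbonFile): CarbonFile = {
      val indexFile = getFirstIndexFile(tablePath)
      if (indexFile == null) {
        throw new IllegalArgumentException(
          "No index file found under " + tablePath.getAbsolutePath)
      }
      indexFile
    }

---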
Github user ravipesala commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2661#discussion_r214738056

    --- Diff: integration/spark-datasource/src/main/scala/org/apache/spark/sql/carbondata/execution/datasources/CarbonFileIndexReplaceRule.scala ---
    @@ -82,4 +82,23 @@ class CarbonFileIndexReplaceRule extends Rule[LogicalPlan] {
           fileIndex
         }
       }
    +
    +  /**
    +   * Get datafolders recursively
    +   */
    +  private def getDataFolders(carbonFile: CarbonFile): Seq[CarbonFile] = {
    --- End diff --

    ok

---
Github user ravipesala commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2661#discussion_r214738731

    --- Diff: integration/spark-datasource/src/main/scala/org/apache/spark/sql/carbondata/execution/datasources/CarbonFileIndexReplaceRule.scala ---
    @@ -82,4 +82,23 @@ class CarbonFileIndexReplaceRule extends Rule[LogicalPlan] {
           fileIndex
         }
       }
    +
    +  /**
    +   * Get datafolders recursively
    +   */
    +  private def getDataFolders(carbonFile: CarbonFile): Seq[CarbonFile] = {
    +    val files = carbonFile.listFiles()
    +    var folders: Seq[CarbonFile] = Seq()
    +    files.foreach { f =>
    +      if (f.isDirectory) {
    +        val files = f.listFiles()
    +        if (files.nonEmpty && !files(0).isDirectory) {
    +          folders = Seq(f) ++ folders
    --- End diff --

    ok

---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2661 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/8273/ ---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2661 Build Failed with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/202/ ---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2661 Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/207/ ---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2661 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/8278/ ---
Github user KanakaKumar commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2661#discussion_r214899510

    --- Diff: integration/spark-datasource/src/main/scala/org/apache/spark/sql/carbondata/execution/datasources/CarbonFileIndexReplaceRule.scala ---
    @@ -82,4 +82,23 @@ class CarbonFileIndexReplaceRule extends Rule[LogicalPlan] {
           fileIndex
         }
       }
    +
    +  /**
    +   * Get datafolders recursively
    +   */
    +  private def getDataFolders(carbonFile: CarbonFile): Seq[CarbonFile] = {
    +    val files = carbonFile.listFiles()
    +    var folders: Seq[CarbonFile] = Seq()
    +    files.foreach { f =>
    +      if (f.isDirectory) {
    +        val files = f.listFiles()
    +        if (files.nonEmpty && !files(0).isDirectory) {
    +          folders = Seq(f) ++ folders
    +        } else {
    +          folders = getDataFolders(f) ++ folders
    --- End diff --

    In the next call to getDataFolders() for an empty folder, the iteration is skipped, so there is no problem with the logic. You can ignore the comment.

---
Github user KanakaKumar commented on the issue:
https://github.com/apache/carbondata/pull/2661 LGTM ---