GitHub user QiangCai opened a pull request:
https://github.com/apache/incubator-carbondata/pull/518

[WIP]unify file header reader

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/QiangCai/incubator-carbondata fileheader

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-carbondata/pull/518.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #518

----
commit 5440b9c16799d935f9da1728344564a65a2d6ef2
Author: QiangCai <[hidden email]>
Date: 2017-01-10T13:32:51Z

    readfileheader
----

--- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at [hidden email] or file a JIRA ticket with INFRA. ---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/incubator-carbondata/pull/518 Build Success with Spark 1.5.2, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/542/
Github user CarbonDataQA commented on the issue:
https://github.com/apache/incubator-carbondata/pull/518 Build Failed with Spark 1.5.2, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/543/
Github user CarbonDataQA commented on the issue:
https://github.com/apache/incubator-carbondata/pull/518 Build Success with Spark 1.5.2, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/544/
Github user jackylk commented on a diff in the pull request:
https://github.com/apache/incubator-carbondata/pull/518#discussion_r95505900 --- Diff: integration/spark-common/src/main/scala/org/apache/carbondata/spark/util/CommonUtil.scala --- @@ -301,4 +304,45 @@ object CommonUtil { LOGGER.info(s"mapreduce.input.fileinputformat.split.maxsize: ${ newSplitSize.toString }") } } + + def getCsvHeaderColumns(carbonLoadModel: CarbonLoadModel): Array[String] = { + val delimiter = if (StringUtils.isEmpty(carbonLoadModel.getCsvDelimiter)) { --- End diff -- I think the delimiter cannot be " ", right? So it is better to use isBlank instead of isEmpty.
Github user jackylk commented on a diff in the pull request:
https://github.com/apache/incubator-carbondata/pull/518#discussion_r95506643 --- Diff: integration/spark-common/src/main/scala/org/apache/carbondata/spark/util/CommonUtil.scala --- @@ -301,4 +304,45 @@ object CommonUtil { LOGGER.info(s"mapreduce.input.fileinputformat.split.maxsize: ${ newSplitSize.toString }") } } + + def getCsvHeaderColumns(carbonLoadModel: CarbonLoadModel): Array[String] = { + val delimiter = if (StringUtils.isEmpty(carbonLoadModel.getCsvDelimiter)) { + CarbonCommonConstants.COMMA + } else { + CarbonUtil.delimiterConverter(carbonLoadModel.getCsvDelimiter) + } + var csvFile: String = null + var csvHeader: String = carbonLoadModel.getCsvHeader + val csvColumns = if (StringUtils.isBlank(csvHeader)) { + // read header from csv file + csvFile = carbonLoadModel.getFactFilePath.split(",")(0) + csvHeader = CarbonUtil.readHeader(csvFile) + if (StringUtils.isBlank(csvHeader)) { + throw new CarbonDataLoadingException("First line of the csv is not valid.") + } + csvHeader.toLowerCase().split(delimiter).map(_.replaceAll("\"", "").trim) + } else { + csvHeader.toLowerCase.split(CarbonCommonConstants.COMMA).map(_.trim) + } + + if (!CarbonDataProcessorUtil.isHeaderValid(carbonLoadModel.getTableName, csvColumns, + carbonLoadModel.getCarbonDataLoadSchema)) { + if (csvFile == null) { + LOGGER.error("CSV header provided in DDL is not proper." + + " Column names in schema and CSV header are not the same.") + throw new CarbonDataLoadingException( + "CSV header provided in DDL is not proper. Column names in schema and CSV header are " + + "not the same.") + } else { + LOGGER.error( + "CSV File provided is not proper. Column names in schema and csv header are not same. " --- End diff -- Better to say "CSV header in the input file ($csvFile) is not proper."
Github user jackylk commented on a diff in the pull request:
https://github.com/apache/incubator-carbondata/pull/518#discussion_r95506953 --- Diff: processing/src/main/java/org/apache/carbondata/processing/util/CarbonDataProcessorUtil.java --- @@ -373,83 +368,15 @@ private static void addAllComplexTypeChildren(CarbonDimension dimension, StringB return complexTypesMap; } - /** - * Get the csv file to read if it the path is file otherwise get the first file of directory. - * - * @param csvFilePath - * @return File - */ - public static CarbonFile getCsvFileToRead(String csvFilePath) { - CarbonFile csvFile = - FileFactory.getCarbonFile(csvFilePath, FileFactory.getFileType(csvFilePath)); - - CarbonFile[] listFiles = null; - if (csvFile.isDirectory()) { - listFiles = csvFile.listFiles(new CarbonFileFilter() { - @Override public boolean accept(CarbonFile pathname) { - if (!pathname.isDirectory()) { - if (pathname.getName().endsWith(CarbonCommonConstants.CSV_FILE_EXTENSION) || pathname - .getName().endsWith(CarbonCommonConstants.CSV_FILE_EXTENSION - + CarbonCommonConstants.FILE_INPROGRESS_STATUS)) { - return true; - } - } - return false; - } - }); - } else { - listFiles = new CarbonFile[1]; - listFiles[0] = csvFile; - } - return listFiles[0]; - } - - /** - * Get the file header from csv file. 
- */ - public static String getFileHeader(CarbonFile csvFile) - throws DataLoadingException { - DataInputStream fileReader = null; - BufferedReader bufferedReader = null; - String readLine = null; - - FileType fileType = FileFactory.getFileType(csvFile.getAbsolutePath()); - - if (!csvFile.exists()) { - csvFile = FileFactory - .getCarbonFile(csvFile.getAbsolutePath() + CarbonCommonConstants.FILE_INPROGRESS_STATUS, - fileType); - } - - try { - fileReader = FileFactory.getDataInputStream(csvFile.getAbsolutePath(), fileType); - bufferedReader = - new BufferedReader(new InputStreamReader(fileReader, Charset.defaultCharset())); - readLine = bufferedReader.readLine(); - } catch (FileNotFoundException e) { - LOGGER.error(e, "CSV Input File not found " + e.getMessage()); - throw new DataLoadingException("CSV Input File not found ", e); - } catch (IOException e) { - LOGGER.error(e, "Not able to read CSV input File " + e.getMessage()); - throw new DataLoadingException("Not able to read CSV input File ", e); - } finally { - CarbonUtil.closeStreams(fileReader, bufferedReader); - } - - return readLine; - } - - public static boolean isHeaderValid(String tableName, String header, - CarbonDataLoadSchema schema, String delimiter) throws DataLoadingException { - delimiter = CarbonUtil.delimiterConverter(delimiter); + public static boolean isHeaderValid(String tableName, String[] csvHeader, + CarbonDataLoadSchema schema) throws DataLoadingException { --- End diff -- I think DataLoadingException can be removed; it is not thrown by the method body.
Github user jackylk commented on a diff in the pull request:
https://github.com/apache/incubator-carbondata/pull/518#discussion_r95507187 --- Diff: processing/src/main/java/org/apache/carbondata/processing/util/CarbonDataProcessorUtil.java --- @@ -462,6 +389,13 @@ public static boolean isHeaderValid(String tableName, String header, return count == columnNames.length; } + public static boolean isHeaderValid(String tableName, String header, + CarbonDataLoadSchema schema, String delimiter) throws DataLoadingException { + delimiter = CarbonUtil.delimiterConverter(delimiter); --- End diff -- Declare a local variable instead of reassigning the parameter.
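For readers following the thread, the suggestion above is to avoid reassigning a method parameter. A minimal sketch of the pattern, assuming a hypothetical converter method in place of CarbonUtil.delimiterConverter (the class and method names here are illustrative, not from the patch):

```java
import java.util.regex.Pattern;

public class LocalDelimiter {
    // Hypothetical stand-in for CarbonUtil.delimiterConverter: escapes the
    // delimiter so it is safe to pass to String.split (which takes a regex).
    static String convert(String delimiter) {
        return Pattern.quote(delimiter);
    }

    // Bind the converted value to a local variable instead of
    // reassigning the 'delimiter' parameter.
    static String[] splitHeader(String header, String delimiter) {
        String convertedDelimiter = convert(delimiter);
        return header.split(convertedDelimiter);
    }
}
```

Keeping the parameter untouched makes the method easier to read and avoids surprising callers who step through it in a debugger.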
Github user jackylk commented on a diff in the pull request:
https://github.com/apache/incubator-carbondata/pull/518#discussion_r95507253 --- Diff: processing/src/main/java/org/apache/carbondata/processing/util/CarbonDataProcessorUtil.java --- @@ -373,83 +368,15 @@ private static void addAllComplexTypeChildren(CarbonDimension dimension, StringB return complexTypesMap; } - /** - * Get the csv file to read if it the path is file otherwise get the first file of directory. - * - * @param csvFilePath - * @return File - */ - public static CarbonFile getCsvFileToRead(String csvFilePath) { - CarbonFile csvFile = - FileFactory.getCarbonFile(csvFilePath, FileFactory.getFileType(csvFilePath)); - - CarbonFile[] listFiles = null; - if (csvFile.isDirectory()) { - listFiles = csvFile.listFiles(new CarbonFileFilter() { - @Override public boolean accept(CarbonFile pathname) { - if (!pathname.isDirectory()) { - if (pathname.getName().endsWith(CarbonCommonConstants.CSV_FILE_EXTENSION) || pathname - .getName().endsWith(CarbonCommonConstants.CSV_FILE_EXTENSION - + CarbonCommonConstants.FILE_INPROGRESS_STATUS)) { - return true; - } - } - return false; - } - }); - } else { - listFiles = new CarbonFile[1]; - listFiles[0] = csvFile; - } - return listFiles[0]; - } - - /** - * Get the file header from csv file. 
- */ - public static String getFileHeader(CarbonFile csvFile) - throws DataLoadingException { - DataInputStream fileReader = null; - BufferedReader bufferedReader = null; - String readLine = null; - - FileType fileType = FileFactory.getFileType(csvFile.getAbsolutePath()); - - if (!csvFile.exists()) { - csvFile = FileFactory - .getCarbonFile(csvFile.getAbsolutePath() + CarbonCommonConstants.FILE_INPROGRESS_STATUS, - fileType); - } - - try { - fileReader = FileFactory.getDataInputStream(csvFile.getAbsolutePath(), fileType); - bufferedReader = - new BufferedReader(new InputStreamReader(fileReader, Charset.defaultCharset())); - readLine = bufferedReader.readLine(); - } catch (FileNotFoundException e) { - LOGGER.error(e, "CSV Input File not found " + e.getMessage()); - throw new DataLoadingException("CSV Input File not found ", e); - } catch (IOException e) { - LOGGER.error(e, "Not able to read CSV input File " + e.getMessage()); - throw new DataLoadingException("Not able to read CSV input File ", e); - } finally { - CarbonUtil.closeStreams(fileReader, bufferedReader); - } - - return readLine; - } - - public static boolean isHeaderValid(String tableName, String header, - CarbonDataLoadSchema schema, String delimiter) throws DataLoadingException { - delimiter = CarbonUtil.delimiterConverter(delimiter); + public static boolean isHeaderValid(String tableName, String[] csvHeader, + CarbonDataLoadSchema schema) throws DataLoadingException { String[] columnNames = CarbonDataProcessorUtil.getSchemaColumnNames(schema, tableName).toArray(new String[0]); - String[] csvHeader = header.toLowerCase().split(delimiter); - List<String> csvColumnsList = new ArrayList<String>(CarbonCommonConstants.CONSTANT_SIZE_TEN); + List<String> csvColumnsList = new ArrayList<String>(csvHeader.length); for (String column : csvHeader) { - csvColumnsList.add(column.replaceAll("\"", "").trim()); + csvColumnsList.add(column); --- End diff -- use `Collections.addAll` instead
Github user QiangCai commented on a diff in the pull request:
https://github.com/apache/incubator-carbondata/pull/518#discussion_r95507937 --- Diff: integration/spark-common/src/main/scala/org/apache/carbondata/spark/util/CommonUtil.scala --- @@ -301,4 +304,45 @@ object CommonUtil { LOGGER.info(s"mapreduce.input.fileinputformat.split.maxsize: ${ newSplitSize.toString }") } } + + def getCsvHeaderColumns(carbonLoadModel: CarbonLoadModel): Array[String] = { + val delimiter = if (StringUtils.isEmpty(carbonLoadModel.getCsvDelimiter)) { --- End diff -- I think the delimiter may be a blank " ".
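To illustrate the distinction the two reviewers are debating: in commons-lang, StringUtils.isEmpty(" ") is false while StringUtils.isBlank(" ") is true, so switching to isBlank would make a single-space delimiter fall back to the comma default. A minimal pure-Java sketch, with stand-in methods mirroring the commons-lang semantics (not the library itself):

```java
public class BlankVsEmpty {
    // Stand-in mirroring StringUtils.isEmpty: length check only.
    static boolean isEmpty(String s) {
        return s == null || s.length() == 0;
    }

    // Stand-in mirroring StringUtils.isBlank: whitespace also counts as empty.
    static boolean isBlank(String s) {
        return s == null || s.trim().isEmpty();
    }

    public static void main(String[] args) {
        // A single-space delimiter is non-empty but blank, so isBlank
        // would silently replace " " with the comma default.
        System.out.println(isEmpty(" ")); // false
        System.out.println(isBlank(" ")); // true
    }
}
```

This is why QiangCai keeps isEmpty for the delimiter check while using isBlank for the header string, where whitespace-only input really is invalid.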
Github user jackylk commented on a diff in the pull request:
https://github.com/apache/incubator-carbondata/pull/518#discussion_r95507943 --- Diff: processing/src/main/java/org/apache/carbondata/processing/util/CarbonDataProcessorUtil.java --- @@ -373,83 +368,15 @@ private static void addAllComplexTypeChildren(CarbonDimension dimension, StringB return complexTypesMap; } - /** - * Get the csv file to read if it the path is file otherwise get the first file of directory. - * - * @param csvFilePath - * @return File - */ - public static CarbonFile getCsvFileToRead(String csvFilePath) { - CarbonFile csvFile = - FileFactory.getCarbonFile(csvFilePath, FileFactory.getFileType(csvFilePath)); - - CarbonFile[] listFiles = null; - if (csvFile.isDirectory()) { - listFiles = csvFile.listFiles(new CarbonFileFilter() { - @Override public boolean accept(CarbonFile pathname) { - if (!pathname.isDirectory()) { - if (pathname.getName().endsWith(CarbonCommonConstants.CSV_FILE_EXTENSION) || pathname - .getName().endsWith(CarbonCommonConstants.CSV_FILE_EXTENSION - + CarbonCommonConstants.FILE_INPROGRESS_STATUS)) { - return true; - } - } - return false; - } - }); - } else { - listFiles = new CarbonFile[1]; - listFiles[0] = csvFile; - } - return listFiles[0]; - } - - /** - * Get the file header from csv file. 
- */ - public static String getFileHeader(CarbonFile csvFile) - throws DataLoadingException { - DataInputStream fileReader = null; - BufferedReader bufferedReader = null; - String readLine = null; - - FileType fileType = FileFactory.getFileType(csvFile.getAbsolutePath()); - - if (!csvFile.exists()) { - csvFile = FileFactory - .getCarbonFile(csvFile.getAbsolutePath() + CarbonCommonConstants.FILE_INPROGRESS_STATUS, - fileType); - } - - try { - fileReader = FileFactory.getDataInputStream(csvFile.getAbsolutePath(), fileType); - bufferedReader = - new BufferedReader(new InputStreamReader(fileReader, Charset.defaultCharset())); - readLine = bufferedReader.readLine(); - } catch (FileNotFoundException e) { - LOGGER.error(e, "CSV Input File not found " + e.getMessage()); - throw new DataLoadingException("CSV Input File not found ", e); - } catch (IOException e) { - LOGGER.error(e, "Not able to read CSV input File " + e.getMessage()); - throw new DataLoadingException("Not able to read CSV input File ", e); - } finally { - CarbonUtil.closeStreams(fileReader, bufferedReader); - } - - return readLine; - } - - public static boolean isHeaderValid(String tableName, String header, - CarbonDataLoadSchema schema, String delimiter) throws DataLoadingException { - delimiter = CarbonUtil.delimiterConverter(delimiter); + public static boolean isHeaderValid(String tableName, String[] csvHeader, + CarbonDataLoadSchema schema) throws DataLoadingException { --- End diff -- In this function you basically want to compare two String arrays to find out whether they are the same, case-insensitively. Take a look at http://stackoverflow.com/questions/2419061/compare-string-array-using-collection According to this link, using TreeSet is optimal in this case.
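The TreeSet approach jackylk points to can be sketched as follows; the class and method names are illustrative, not from the patch:

```java
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;

public class HeaderCompare {
    // Compare two string arrays as sets, ignoring both case and order,
    // by loading each into a TreeSet with a case-insensitive comparator.
    static boolean sameColumns(String[] a, String[] b) {
        Set<String> left = new TreeSet<>(String.CASE_INSENSITIVE_ORDER);
        left.addAll(Arrays.asList(a));
        Set<String> right = new TreeSet<>(String.CASE_INSENSITIVE_ORDER);
        right.addAll(Arrays.asList(b));
        return left.equals(right);
    }

    public static void main(String[] args) {
        // Order and case differ, but the column sets match.
        System.out.println(sameColumns(
            new String[]{"ID", "Name"}, new String[]{"name", "id"})); // true
    }
}
```

Note that TreeSet.equals delegates to containsAll, which uses the comparator, so the comparison stays case-insensitive on both sides; this also deduplicates columns that differ only in case.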
Github user QiangCai commented on a diff in the pull request:
https://github.com/apache/incubator-carbondata/pull/518#discussion_r95518309 --- Diff: processing/src/main/java/org/apache/carbondata/processing/util/CarbonDataProcessorUtil.java --- @@ -462,6 +389,13 @@ public static boolean isHeaderValid(String tableName, String header, return count == columnNames.length; } + public static boolean isHeaderValid(String tableName, String header, + CarbonDataLoadSchema schema, String delimiter) throws DataLoadingException { + delimiter = CarbonUtil.delimiterConverter(delimiter); --- End diff -- fixed
Github user QiangCai commented on a diff in the pull request:
https://github.com/apache/incubator-carbondata/pull/518#discussion_r95518311 --- Diff: processing/src/main/java/org/apache/carbondata/processing/util/CarbonDataProcessorUtil.java --- (quoting the same isHeaderValid hunk reviewed above) --- End diff -- fixed
Github user QiangCai commented on a diff in the pull request:
https://github.com/apache/incubator-carbondata/pull/518#discussion_r95518312 --- Diff: integration/spark-common/src/main/scala/org/apache/carbondata/spark/util/CommonUtil.scala --- (quoting the same getCsvHeaderColumns hunk reviewed above) --- End diff -- fixed
Github user CarbonDataQA commented on the issue:
https://github.com/apache/incubator-carbondata/pull/518 Build Success with Spark 1.5.2, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/547/
Github user CarbonDataQA commented on the issue:
https://github.com/apache/incubator-carbondata/pull/518 Build Success with Spark 1.5.2, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/549/
Github user jackylk commented on a diff in the pull request:
https://github.com/apache/incubator-carbondata/pull/518#discussion_r95521643 --- Diff: processing/src/main/java/org/apache/carbondata/processing/util/CarbonDataProcessorUtil.java --- @@ -373,93 +368,25 @@ private static void addAllComplexTypeChildren(CarbonDimension dimension, StringB return complexTypesMap; } - /** - * Get the csv file to read if it the path is file otherwise get the first file of directory. - * - * @param csvFilePath - * @return File - */ - public static CarbonFile getCsvFileToRead(String csvFilePath) { - CarbonFile csvFile = - FileFactory.getCarbonFile(csvFilePath, FileFactory.getFileType(csvFilePath)); - - CarbonFile[] listFiles = null; - if (csvFile.isDirectory()) { - listFiles = csvFile.listFiles(new CarbonFileFilter() { - @Override public boolean accept(CarbonFile pathname) { - if (!pathname.isDirectory()) { - if (pathname.getName().endsWith(CarbonCommonConstants.CSV_FILE_EXTENSION) || pathname - .getName().endsWith(CarbonCommonConstants.CSV_FILE_EXTENSION - + CarbonCommonConstants.FILE_INPROGRESS_STATUS)) { - return true; - } - } - return false; - } - }); - } else { - listFiles = new CarbonFile[1]; - listFiles[0] = csvFile; - } - return listFiles[0]; - } - - /** - * Get the file header from csv file. 
- */ - public static String getFileHeader(CarbonFile csvFile) - throws DataLoadingException { - DataInputStream fileReader = null; - BufferedReader bufferedReader = null; - String readLine = null; - - FileType fileType = FileFactory.getFileType(csvFile.getAbsolutePath()); - - if (!csvFile.exists()) { - csvFile = FileFactory - .getCarbonFile(csvFile.getAbsolutePath() + CarbonCommonConstants.FILE_INPROGRESS_STATUS, - fileType); - } + public static boolean isHeaderValid(String tableName, String[] csvHeader, + CarbonDataLoadSchema schema) { + Iterator<String> columnIterator = + CarbonDataProcessorUtil.getSchemaColumnNames(schema, tableName).iterator(); + Set<String> csvColumns = new HashSet<String>(Arrays.asList(csvHeader)); --- End diff -- You can use `Collections.addAll` instead of converting to a list and adding.
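The two variants differ only in how the array reaches the set: new HashSet<>(Arrays.asList(...)) builds an intermediate List wrapper, while Collections.addAll copies the array elements into a pre-sized set directly. A standalone sketch of the suggested form (class and method names are illustrative):

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

public class AddAllDemo {
    static Set<String> toSet(String[] csvHeader) {
        // Pre-size the set and copy the array in directly,
        // skipping the intermediate List from Arrays.asList.
        Set<String> csvColumns = new HashSet<String>(csvHeader.length);
        Collections.addAll(csvColumns, csvHeader);
        return csvColumns;
    }

    public static void main(String[] args) {
        // Duplicate "id" collapses in the set.
        System.out.println(toSet(new String[]{"id", "name", "id"}).size()); // 2
    }
}
```

Either form is correct; Collections.addAll is simply the more direct idiom for array-to-collection copies.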
Github user CarbonDataQA commented on the issue:
https://github.com/apache/incubator-carbondata/pull/518 Build Success with Spark 1.5.2, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/550/
Github user jackylk commented on a diff in the pull request:
https://github.com/apache/incubator-carbondata/pull/518#discussion_r95522233 --- Diff: processing/src/main/java/org/apache/carbondata/processing/util/CarbonDataProcessorUtil.java --- @@ -373,93 +369,26 @@ private static void addAllComplexTypeChildren(CarbonDimension dimension, StringB return complexTypesMap; } - /** - * Get the csv file to read if it the path is file otherwise get the first file of directory. - * - * @param csvFilePath - * @return File - */ - public static CarbonFile getCsvFileToRead(String csvFilePath) { - CarbonFile csvFile = - FileFactory.getCarbonFile(csvFilePath, FileFactory.getFileType(csvFilePath)); - - CarbonFile[] listFiles = null; - if (csvFile.isDirectory()) { - listFiles = csvFile.listFiles(new CarbonFileFilter() { - @Override public boolean accept(CarbonFile pathname) { - if (!pathname.isDirectory()) { - if (pathname.getName().endsWith(CarbonCommonConstants.CSV_FILE_EXTENSION) || pathname - .getName().endsWith(CarbonCommonConstants.CSV_FILE_EXTENSION - + CarbonCommonConstants.FILE_INPROGRESS_STATUS)) { - return true; - } - } - return false; - } - }); - } else { - listFiles = new CarbonFile[1]; - listFiles[0] = csvFile; - } - return listFiles[0]; - } - - /** - * Get the file header from csv file. 
- */ - public static String getFileHeader(CarbonFile csvFile) - throws DataLoadingException { - DataInputStream fileReader = null; - BufferedReader bufferedReader = null; - String readLine = null; - - FileType fileType = FileFactory.getFileType(csvFile.getAbsolutePath()); - - if (!csvFile.exists()) { - csvFile = FileFactory - .getCarbonFile(csvFile.getAbsolutePath() + CarbonCommonConstants.FILE_INPROGRESS_STATUS, - fileType); - } + public static boolean isHeaderValid(String tableName, String[] csvHeader, + CarbonDataLoadSchema schema) { + Iterator<String> columnIterator = + CarbonDataProcessorUtil.getSchemaColumnNames(schema, tableName).iterator(); + Set<String> csvColumns = new HashSet<String>(csvHeader.length); + Collections.addAll(csvColumns, csvHeader); - try { - fileReader = FileFactory.getDataInputStream(csvFile.getAbsolutePath(), fileType); - bufferedReader = - new BufferedReader(new InputStreamReader(fileReader, Charset.defaultCharset())); - readLine = bufferedReader.readLine(); - } catch (FileNotFoundException e) { - LOGGER.error(e, "CSV Input File not found " + e.getMessage()); - throw new DataLoadingException("CSV Input File not found ", e); - } catch (IOException e) { - LOGGER.error(e, "Not able to read CSV input File " + e.getMessage()); - throw new DataLoadingException("Not able to read CSV input File ", e); - } finally { - CarbonUtil.closeStreams(fileReader, bufferedReader); + while (columnIterator.hasNext()) { --- End diff -- Please add a comment to describe this logic: the column definitions in the schema should be a subset of the input CSV header.
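The logic being asked about is a subset check: every column declared in the schema must appear in the CSV header, while the header may carry extra columns. A minimal sketch of that rule, with illustrative names and assuming both sides are already lower-cased as in the load path:

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class HeaderSubsetCheck {
    // Schema columns must be a subset of the CSV header columns;
    // the CSV file may contain extra columns not in the schema.
    static boolean isHeaderValid(String[] csvHeader, List<String> schemaColumns) {
        Set<String> csvColumns = new HashSet<>(Arrays.asList(csvHeader));
        return csvColumns.containsAll(schemaColumns);
    }

    public static void main(String[] args) {
        // "extra" in the header is fine; a missing schema column is not.
        System.out.println(isHeaderValid(
            new String[]{"id", "name", "extra"},
            Arrays.asList("id", "name"))); // true
    }
}
```

Using containsAll on a HashSet makes the subset direction explicit, which is the comment the reviewer wants captured in the code itself.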
Github user CarbonDataQA commented on the issue:
https://github.com/apache/incubator-carbondata/pull/518 Build Success with Spark 1.5.2, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/552/