[GitHub] incubator-carbondata pull request #518: [WIP]unify file header reader


[GitHub] incubator-carbondata pull request #518: [WIP]unify file header reader

qiuchenjian-2
GitHub user QiangCai opened a pull request:

    https://github.com/apache/incubator-carbondata/pull/518

    [WIP]unify file header reader

   

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/QiangCai/incubator-carbondata fileheader

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-carbondata/pull/518.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #518
   
----
commit 5440b9c16799d935f9da1728344564a65a2d6ef2
Author: QiangCai <[hidden email]>
Date:   2017-01-10T13:32:51Z

    readfileheader

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [hidden email] or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-carbondata issue #518: [WIP]unify file header reader

qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/incubator-carbondata/pull/518
 
    Build Success with Spark 1.5.2, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/542/




[GitHub] incubator-carbondata issue #518: [WIP]unify file header reader

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/incubator-carbondata/pull/518
 
    Build Failed  with Spark 1.5.2, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/543/




[GitHub] incubator-carbondata issue #518: [WIP]unify file header reader

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/incubator-carbondata/pull/518
 
    Build Success with Spark 1.5.2, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/544/




[GitHub] incubator-carbondata pull request #518: [CARBONDATA-622]unify file header re...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user jackylk commented on a diff in the pull request:

    https://github.com/apache/incubator-carbondata/pull/518#discussion_r95505900
 
    --- Diff: integration/spark-common/src/main/scala/org/apache/carbondata/spark/util/CommonUtil.scala ---
    @@ -301,4 +304,45 @@ object CommonUtil {
           LOGGER.info(s"mapreduce.input.fileinputformat.split.maxsize: ${ newSplitSize.toString }")
         }
       }
    +
    +  def getCsvHeaderColumns(carbonLoadModel: CarbonLoadModel): Array[String] = {
    +    val delimiter = if (StringUtils.isEmpty(carbonLoadModel.getCsvDelimiter)) {
    --- End diff --
   
    I think the delimiter cannot be " ", right? If so, it would be better to use isBlank instead of isEmpty.



[GitHub] incubator-carbondata pull request #518: [CARBONDATA-622]unify file header re...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user jackylk commented on a diff in the pull request:

    https://github.com/apache/incubator-carbondata/pull/518#discussion_r95506643
 
    --- Diff: integration/spark-common/src/main/scala/org/apache/carbondata/spark/util/CommonUtil.scala ---
    @@ -301,4 +304,45 @@ object CommonUtil {
           LOGGER.info(s"mapreduce.input.fileinputformat.split.maxsize: ${ newSplitSize.toString }")
         }
       }
    +
    +  def getCsvHeaderColumns(carbonLoadModel: CarbonLoadModel): Array[String] = {
    +    val delimiter = if (StringUtils.isEmpty(carbonLoadModel.getCsvDelimiter)) {
    +      CarbonCommonConstants.COMMA
    +    } else {
    +      CarbonUtil.delimiterConverter(carbonLoadModel.getCsvDelimiter)
    +    }
    +    var csvFile: String = null
    +    var csvHeader: String = carbonLoadModel.getCsvHeader
    +    val csvColumns = if (StringUtils.isBlank(csvHeader)) {
    +      // read header from csv file
    +      csvFile = carbonLoadModel.getFactFilePath.split(",")(0)
    +      csvHeader = CarbonUtil.readHeader(csvFile)
    +      if (StringUtils.isBlank(csvHeader)) {
    +        throw new CarbonDataLoadingException("First line of the csv is not valid.")
    +      }
    +      csvHeader.toLowerCase().split(delimiter).map(_.replaceAll("\"", "").trim)
    +    } else {
    +      csvHeader.toLowerCase.split(CarbonCommonConstants.COMMA).map(_.trim)
    +    }
    +
    +    if (!CarbonDataProcessorUtil.isHeaderValid(carbonLoadModel.getTableName, csvColumns,
    +        carbonLoadModel.getCarbonDataLoadSchema)) {
    +      if (csvFile == null) {
    +        LOGGER.error("CSV header provided in DDL is not proper."
    +                     + " Column names in schema and CSV header are not the same.")
    +        throw new CarbonDataLoadingException(
    +          "CSV header provided in DDL is not proper. Column names in schema and CSV header are "
    +          + "not the same.")
    +      } else {
    +        LOGGER.error(
    +          "CSV File provided is not proper. Column names in schema and csv header are not same. "
    --- End diff --
   
    Better to say "CSV header in the input file ($csvFile) is not proper."



[GitHub] incubator-carbondata pull request #518: [CARBONDATA-622]unify file header re...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user jackylk commented on a diff in the pull request:

    https://github.com/apache/incubator-carbondata/pull/518#discussion_r95506953
 
    --- Diff: processing/src/main/java/org/apache/carbondata/processing/util/CarbonDataProcessorUtil.java ---
    @@ -373,83 +368,15 @@ private static void addAllComplexTypeChildren(CarbonDimension dimension, StringB
         return complexTypesMap;
       }
     
    -  /**
    -   * Get the csv file to read if it the path is file otherwise get the first file of directory.
    -   *
    -   * @param csvFilePath
    -   * @return File
    -   */
    -  public static CarbonFile getCsvFileToRead(String csvFilePath) {
    -    CarbonFile csvFile =
    -        FileFactory.getCarbonFile(csvFilePath, FileFactory.getFileType(csvFilePath));
    -
    -    CarbonFile[] listFiles = null;
    -    if (csvFile.isDirectory()) {
    -      listFiles = csvFile.listFiles(new CarbonFileFilter() {
    -        @Override public boolean accept(CarbonFile pathname) {
    -          if (!pathname.isDirectory()) {
    -            if (pathname.getName().endsWith(CarbonCommonConstants.CSV_FILE_EXTENSION) || pathname
    -                .getName().endsWith(CarbonCommonConstants.CSV_FILE_EXTENSION
    -                    + CarbonCommonConstants.FILE_INPROGRESS_STATUS)) {
    -              return true;
    -            }
    -          }
    -          return false;
    -        }
    -      });
    -    } else {
    -      listFiles = new CarbonFile[1];
    -      listFiles[0] = csvFile;
    -    }
    -    return listFiles[0];
    -  }
    -
    -  /**
    -   * Get the file header from csv file.
    -   */
    -  public static String getFileHeader(CarbonFile csvFile)
    -      throws DataLoadingException {
    -    DataInputStream fileReader = null;
    -    BufferedReader bufferedReader = null;
    -    String readLine = null;
    -
    -    FileType fileType = FileFactory.getFileType(csvFile.getAbsolutePath());
    -
    -    if (!csvFile.exists()) {
    -      csvFile = FileFactory
    -          .getCarbonFile(csvFile.getAbsolutePath() + CarbonCommonConstants.FILE_INPROGRESS_STATUS,
    -              fileType);
    -    }
    -
    -    try {
    -      fileReader = FileFactory.getDataInputStream(csvFile.getAbsolutePath(), fileType);
    -      bufferedReader =
    -          new BufferedReader(new InputStreamReader(fileReader, Charset.defaultCharset()));
    -      readLine = bufferedReader.readLine();
    -    } catch (FileNotFoundException e) {
    -      LOGGER.error(e, "CSV Input File not found  " + e.getMessage());
    -      throw new DataLoadingException("CSV Input File not found ", e);
    -    } catch (IOException e) {
    -      LOGGER.error(e, "Not able to read CSV input File  " + e.getMessage());
    -      throw new DataLoadingException("Not able to read CSV input File ", e);
    -    } finally {
    -      CarbonUtil.closeStreams(fileReader, bufferedReader);
    -    }
    -
    -    return readLine;
    -  }
    -
    -  public static boolean isHeaderValid(String tableName, String header,
    -      CarbonDataLoadSchema schema, String delimiter) throws DataLoadingException {
    -    delimiter = CarbonUtil.delimiterConverter(delimiter);
    +  public static boolean isHeaderValid(String tableName, String[] csvHeader,
    +      CarbonDataLoadSchema schema) throws DataLoadingException {
    --- End diff --
   
    I think the `throws DataLoadingException` clause can be removed; it is not thrown by the method body.



[GitHub] incubator-carbondata pull request #518: [CARBONDATA-622]unify file header re...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user jackylk commented on a diff in the pull request:

    https://github.com/apache/incubator-carbondata/pull/518#discussion_r95507187
 
    --- Diff: processing/src/main/java/org/apache/carbondata/processing/util/CarbonDataProcessorUtil.java ---
    @@ -462,6 +389,13 @@ public static boolean isHeaderValid(String tableName, String header,
         return count == columnNames.length;
       }
     
    +  public static boolean isHeaderValid(String tableName, String header,
    +      CarbonDataLoadSchema schema, String delimiter) throws DataLoadingException {
    +    delimiter = CarbonUtil.delimiterConverter(delimiter);
    --- End diff --
   
    Declare a local variable instead of reassigning the `delimiter` parameter.
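
A minimal sketch of what this suggestion could look like, assuming the goal is simply to stop reassigning the method parameter. `convert` is a hypothetical stand-in for `CarbonUtil.delimiterConverter`, whose actual behavior is not shown in this thread, and `splitHeader` is an illustrative helper rather than code from the pull request.

```java
final class LocalVariableSketch {
  // Hypothetical stand-in for CarbonUtil.delimiterConverter: here it only
  // escapes "|" so that it can be used as a regular expression.
  static String convert(String delimiter) {
    return "|".equals(delimiter) ? "\\|" : delimiter;
  }

  static String[] splitHeader(String header, String delimiter) {
    // The converted value gets its own name; the parameter stays untouched.
    final String convertedDelimiter = convert(delimiter);
    return header.toLowerCase().split(convertedDelimiter);
  }

  public static void main(String[] args) {
    for (String column : splitHeader("ID|Name", "|")) {
      System.out.println(column);
    }
  }
}
```

Keeping the parameter unchanged means the original delimiter remains available later in the method, for example in an error message.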



[GitHub] incubator-carbondata pull request #518: [CARBONDATA-622]unify file header re...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user jackylk commented on a diff in the pull request:

    https://github.com/apache/incubator-carbondata/pull/518#discussion_r95507253
 
    --- Diff: processing/src/main/java/org/apache/carbondata/processing/util/CarbonDataProcessorUtil.java ---
    @@ -373,83 +368,15 @@ private static void addAllComplexTypeChildren(CarbonDimension dimension, StringB
         return complexTypesMap;
       }
     
    -  /**
    -   * Get the csv file to read if it the path is file otherwise get the first file of directory.
    -   *
    -   * @param csvFilePath
    -   * @return File
    -   */
    -  public static CarbonFile getCsvFileToRead(String csvFilePath) {
    -    CarbonFile csvFile =
    -        FileFactory.getCarbonFile(csvFilePath, FileFactory.getFileType(csvFilePath));
    -
    -    CarbonFile[] listFiles = null;
    -    if (csvFile.isDirectory()) {
    -      listFiles = csvFile.listFiles(new CarbonFileFilter() {
    -        @Override public boolean accept(CarbonFile pathname) {
    -          if (!pathname.isDirectory()) {
    -            if (pathname.getName().endsWith(CarbonCommonConstants.CSV_FILE_EXTENSION) || pathname
    -                .getName().endsWith(CarbonCommonConstants.CSV_FILE_EXTENSION
    -                    + CarbonCommonConstants.FILE_INPROGRESS_STATUS)) {
    -              return true;
    -            }
    -          }
    -          return false;
    -        }
    -      });
    -    } else {
    -      listFiles = new CarbonFile[1];
    -      listFiles[0] = csvFile;
    -    }
    -    return listFiles[0];
    -  }
    -
    -  /**
    -   * Get the file header from csv file.
    -   */
    -  public static String getFileHeader(CarbonFile csvFile)
    -      throws DataLoadingException {
    -    DataInputStream fileReader = null;
    -    BufferedReader bufferedReader = null;
    -    String readLine = null;
    -
    -    FileType fileType = FileFactory.getFileType(csvFile.getAbsolutePath());
    -
    -    if (!csvFile.exists()) {
    -      csvFile = FileFactory
    -          .getCarbonFile(csvFile.getAbsolutePath() + CarbonCommonConstants.FILE_INPROGRESS_STATUS,
    -              fileType);
    -    }
    -
    -    try {
    -      fileReader = FileFactory.getDataInputStream(csvFile.getAbsolutePath(), fileType);
    -      bufferedReader =
    -          new BufferedReader(new InputStreamReader(fileReader, Charset.defaultCharset()));
    -      readLine = bufferedReader.readLine();
    -    } catch (FileNotFoundException e) {
    -      LOGGER.error(e, "CSV Input File not found  " + e.getMessage());
    -      throw new DataLoadingException("CSV Input File not found ", e);
    -    } catch (IOException e) {
    -      LOGGER.error(e, "Not able to read CSV input File  " + e.getMessage());
    -      throw new DataLoadingException("Not able to read CSV input File ", e);
    -    } finally {
    -      CarbonUtil.closeStreams(fileReader, bufferedReader);
    -    }
    -
    -    return readLine;
    -  }
    -
    -  public static boolean isHeaderValid(String tableName, String header,
    -      CarbonDataLoadSchema schema, String delimiter) throws DataLoadingException {
    -    delimiter = CarbonUtil.delimiterConverter(delimiter);
    +  public static boolean isHeaderValid(String tableName, String[] csvHeader,
    +      CarbonDataLoadSchema schema) throws DataLoadingException {
         String[] columnNames =
             CarbonDataProcessorUtil.getSchemaColumnNames(schema, tableName).toArray(new String[0]);
    -    String[] csvHeader = header.toLowerCase().split(delimiter);
     
    -    List<String> csvColumnsList = new ArrayList<String>(CarbonCommonConstants.CONSTANT_SIZE_TEN);
    +    List<String> csvColumnsList = new ArrayList<String>(csvHeader.length);
     
         for (String column : csvHeader) {
    -      csvColumnsList.add(column.replaceAll("\"", "").trim());
    +      csvColumnsList.add(column);
    --- End diff --
   
    Use `Collections.addAll` instead.
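
As a rough illustration of this suggestion, the loop in the diff could be replaced by a single `Collections.addAll` call. The class and method names below are hypothetical and exist only to make the sketch runnable; the variable names follow the diff.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class HeaderListSketch {
  // Copies the header columns into a list sized up front, as in the diff.
  static List<String> toList(String[] csvHeader) {
    List<String> csvColumnsList = new ArrayList<String>(csvHeader.length);
    // One library call instead of an explicit for-loop over csvHeader.
    Collections.addAll(csvColumnsList, csvHeader);
    return csvColumnsList;
  }

  public static void main(String[] args) {
    System.out.println(toList(new String[] {"id", "name", "city"}));
  }
}
```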



[GitHub] incubator-carbondata pull request #518: [CARBONDATA-622]unify file header re...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user QiangCai commented on a diff in the pull request:

    https://github.com/apache/incubator-carbondata/pull/518#discussion_r95507937
 
    --- Diff: integration/spark-common/src/main/scala/org/apache/carbondata/spark/util/CommonUtil.scala ---
    @@ -301,4 +304,45 @@ object CommonUtil {
           LOGGER.info(s"mapreduce.input.fileinputformat.split.maxsize: ${ newSplitSize.toString }")
         }
       }
    +
    +  def getCsvHeaderColumns(carbonLoadModel: CarbonLoadModel): Array[String] = {
    +    val delimiter = if (StringUtils.isEmpty(carbonLoadModel.getCsvDelimiter)) {
    --- End diff --
   
    I think the delimiter may be a blank " ".
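
The difference under discussion can be sketched as follows. `isEmpty` and `isBlank` below are plain-Java stand-ins written to mirror the documented behavior of the commons-lang `StringUtils` methods, and `delimiterOrComma` is a hypothetical helper that imitates only the fallback branch of `getCsvHeaderColumns` (it leaves out the `CarbonUtil.delimiterConverter` call).

```java
final class DelimiterCheckSketch {
  // Stand-in for StringUtils.isEmpty: true for null or a zero-length string.
  static boolean isEmpty(String s) {
    return s == null || s.length() == 0;
  }

  // Stand-in for StringUtils.isBlank: true for null, empty, or whitespace-only.
  static boolean isBlank(String s) {
    if (s == null) {
      return true;
    }
    for (int i = 0; i < s.length(); i++) {
      if (!Character.isWhitespace(s.charAt(i))) {
        return false;
      }
    }
    return true;
  }

  // Hypothetical helper: falls back to a comma when the delimiter is "missing".
  static String delimiterOrComma(String configured, boolean useBlankCheck) {
    boolean missing = useBlankCheck ? isBlank(configured) : isEmpty(configured);
    return missing ? "," : configured;
  }

  public static void main(String[] args) {
    // With isEmpty, a single-space delimiter is kept as configured.
    System.out.println("[" + delimiterOrComma(" ", false) + "]");
    // With isBlank, the same delimiter is treated as missing and replaced.
    System.out.println("[" + delimiterOrComma(" ", true) + "]");
  }
}
```

Under these assumptions, switching to `isBlank` would silently turn a space-delimited load into a comma-delimited one, which appears to be the concern raised in this reply.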



[GitHub] incubator-carbondata pull request #518: [CARBONDATA-622]unify file header re...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user jackylk commented on a diff in the pull request:

    https://github.com/apache/incubator-carbondata/pull/518#discussion_r95507943
 
    --- Diff: processing/src/main/java/org/apache/carbondata/processing/util/CarbonDataProcessorUtil.java ---
    @@ -373,83 +368,15 @@ private static void addAllComplexTypeChildren(CarbonDimension dimension, StringB
         return complexTypesMap;
       }
     
    -  /**
    -   * Get the csv file to read if it the path is file otherwise get the first file of directory.
    -   *
    -   * @param csvFilePath
    -   * @return File
    -   */
    -  public static CarbonFile getCsvFileToRead(String csvFilePath) {
    -    CarbonFile csvFile =
    -        FileFactory.getCarbonFile(csvFilePath, FileFactory.getFileType(csvFilePath));
    -
    -    CarbonFile[] listFiles = null;
    -    if (csvFile.isDirectory()) {
    -      listFiles = csvFile.listFiles(new CarbonFileFilter() {
    -        @Override public boolean accept(CarbonFile pathname) {
    -          if (!pathname.isDirectory()) {
    -            if (pathname.getName().endsWith(CarbonCommonConstants.CSV_FILE_EXTENSION) || pathname
    -                .getName().endsWith(CarbonCommonConstants.CSV_FILE_EXTENSION
    -                    + CarbonCommonConstants.FILE_INPROGRESS_STATUS)) {
    -              return true;
    -            }
    -          }
    -          return false;
    -        }
    -      });
    -    } else {
    -      listFiles = new CarbonFile[1];
    -      listFiles[0] = csvFile;
    -    }
    -    return listFiles[0];
    -  }
    -
    -  /**
    -   * Get the file header from csv file.
    -   */
    -  public static String getFileHeader(CarbonFile csvFile)
    -      throws DataLoadingException {
    -    DataInputStream fileReader = null;
    -    BufferedReader bufferedReader = null;
    -    String readLine = null;
    -
    -    FileType fileType = FileFactory.getFileType(csvFile.getAbsolutePath());
    -
    -    if (!csvFile.exists()) {
    -      csvFile = FileFactory
    -          .getCarbonFile(csvFile.getAbsolutePath() + CarbonCommonConstants.FILE_INPROGRESS_STATUS,
    -              fileType);
    -    }
    -
    -    try {
    -      fileReader = FileFactory.getDataInputStream(csvFile.getAbsolutePath(), fileType);
    -      bufferedReader =
    -          new BufferedReader(new InputStreamReader(fileReader, Charset.defaultCharset()));
    -      readLine = bufferedReader.readLine();
    -    } catch (FileNotFoundException e) {
    -      LOGGER.error(e, "CSV Input File not found  " + e.getMessage());
    -      throw new DataLoadingException("CSV Input File not found ", e);
    -    } catch (IOException e) {
    -      LOGGER.error(e, "Not able to read CSV input File  " + e.getMessage());
    -      throw new DataLoadingException("Not able to read CSV input File ", e);
    -    } finally {
    -      CarbonUtil.closeStreams(fileReader, bufferedReader);
    -    }
    -
    -    return readLine;
    -  }
    -
    -  public static boolean isHeaderValid(String tableName, String header,
    -      CarbonDataLoadSchema schema, String delimiter) throws DataLoadingException {
    -    delimiter = CarbonUtil.delimiterConverter(delimiter);
    +  public static boolean isHeaderValid(String tableName, String[] csvHeader,
    +      CarbonDataLoadSchema schema) throws DataLoadingException {
    --- End diff --
   
    In this function, you basically want to compare two String arrays to find out whether they are the same, case-insensitively.
    Take a look at http://stackoverflow.com/questions/2419061/compare-string-array-using-collection
    According to this link, using a TreeSet is optimal in this case.
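
One way the TreeSet approach might look is sketched below. It assumes that "the same" means the two arrays hold the same set of names regardless of case and order; the class and method names are hypothetical, and this is not the implementation that was merged.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;

final class HeaderCompareSketch {
  // Returns true when both arrays contain the same names, ignoring case and order.
  static boolean sameColumns(String[] schemaColumns, String[] csvHeader) {
    Set<String> schema = new TreeSet<String>(String.CASE_INSENSITIVE_ORDER);
    schema.addAll(Arrays.asList(schemaColumns));
    Set<String> header = new TreeSet<String>(String.CASE_INSENSITIVE_ORDER);
    header.addAll(Arrays.asList(csvHeader));
    // Set equality: same size, and every header name is found in the schema set.
    return schema.equals(header);
  }

  public static void main(String[] args) {
    System.out.println(sameColumns(new String[] {"ID", "Name"}, new String[] {"name", "id"}));
    System.out.println(sameColumns(new String[] {"id", "name"}, new String[] {"id"}));
  }
}
```

Because sets collapse duplicates, a header that repeats a column name would still compare as equal here, so a real check may need an additional length comparison.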



[GitHub] incubator-carbondata pull request #518: [CARBONDATA-622]unify file header re...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user QiangCai commented on a diff in the pull request:

    https://github.com/apache/incubator-carbondata/pull/518#discussion_r95518309
 
    --- Diff: processing/src/main/java/org/apache/carbondata/processing/util/CarbonDataProcessorUtil.java ---
    @@ -462,6 +389,13 @@ public static boolean isHeaderValid(String tableName, String header,
         return count == columnNames.length;
       }
     
    +  public static boolean isHeaderValid(String tableName, String header,
    +      CarbonDataLoadSchema schema, String delimiter) throws DataLoadingException {
    +    delimiter = CarbonUtil.delimiterConverter(delimiter);
    --- End diff --
   
    fixed



[GitHub] incubator-carbondata pull request #518: [CARBONDATA-622]unify file header re...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user QiangCai commented on a diff in the pull request:

    https://github.com/apache/incubator-carbondata/pull/518#discussion_r95518311
 
    --- Diff: processing/src/main/java/org/apache/carbondata/processing/util/CarbonDataProcessorUtil.java ---
    @@ -373,83 +368,15 @@ private static void addAllComplexTypeChildren(CarbonDimension dimension, StringB
         return complexTypesMap;
       }
     
    -  /**
    -   * Get the csv file to read if it the path is file otherwise get the first file of directory.
    -   *
    -   * @param csvFilePath
    -   * @return File
    -   */
    -  public static CarbonFile getCsvFileToRead(String csvFilePath) {
    -    CarbonFile csvFile =
    -        FileFactory.getCarbonFile(csvFilePath, FileFactory.getFileType(csvFilePath));
    -
    -    CarbonFile[] listFiles = null;
    -    if (csvFile.isDirectory()) {
    -      listFiles = csvFile.listFiles(new CarbonFileFilter() {
    -        @Override public boolean accept(CarbonFile pathname) {
    -          if (!pathname.isDirectory()) {
    -            if (pathname.getName().endsWith(CarbonCommonConstants.CSV_FILE_EXTENSION) || pathname
    -                .getName().endsWith(CarbonCommonConstants.CSV_FILE_EXTENSION
    -                    + CarbonCommonConstants.FILE_INPROGRESS_STATUS)) {
    -              return true;
    -            }
    -          }
    -          return false;
    -        }
    -      });
    -    } else {
    -      listFiles = new CarbonFile[1];
    -      listFiles[0] = csvFile;
    -    }
    -    return listFiles[0];
    -  }
    -
    -  /**
    -   * Get the file header from csv file.
    -   */
    -  public static String getFileHeader(CarbonFile csvFile)
    -      throws DataLoadingException {
    -    DataInputStream fileReader = null;
    -    BufferedReader bufferedReader = null;
    -    String readLine = null;
    -
    -    FileType fileType = FileFactory.getFileType(csvFile.getAbsolutePath());
    -
    -    if (!csvFile.exists()) {
    -      csvFile = FileFactory
    -          .getCarbonFile(csvFile.getAbsolutePath() + CarbonCommonConstants.FILE_INPROGRESS_STATUS,
    -              fileType);
    -    }
    -
    -    try {
    -      fileReader = FileFactory.getDataInputStream(csvFile.getAbsolutePath(), fileType);
    -      bufferedReader =
    -          new BufferedReader(new InputStreamReader(fileReader, Charset.defaultCharset()));
    -      readLine = bufferedReader.readLine();
    -    } catch (FileNotFoundException e) {
    -      LOGGER.error(e, "CSV Input File not found  " + e.getMessage());
    -      throw new DataLoadingException("CSV Input File not found ", e);
    -    } catch (IOException e) {
    -      LOGGER.error(e, "Not able to read CSV input File  " + e.getMessage());
    -      throw new DataLoadingException("Not able to read CSV input File ", e);
    -    } finally {
    -      CarbonUtil.closeStreams(fileReader, bufferedReader);
    -    }
    -
    -    return readLine;
    -  }
    -
    -  public static boolean isHeaderValid(String tableName, String header,
    -      CarbonDataLoadSchema schema, String delimiter) throws DataLoadingException {
    -    delimiter = CarbonUtil.delimiterConverter(delimiter);
    +  public static boolean isHeaderValid(String tableName, String[] csvHeader,
    +      CarbonDataLoadSchema schema) throws DataLoadingException {
    --- End diff --
   
    fixed



[GitHub] incubator-carbondata pull request #518: [CARBONDATA-622]unify file header re...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user QiangCai commented on a diff in the pull request:

    https://github.com/apache/incubator-carbondata/pull/518#discussion_r95518312
 
    --- Diff: integration/spark-common/src/main/scala/org/apache/carbondata/spark/util/CommonUtil.scala ---
    @@ -301,4 +304,45 @@ object CommonUtil {
           LOGGER.info(s"mapreduce.input.fileinputformat.split.maxsize: ${ newSplitSize.toString }")
         }
       }
    +
    +  def getCsvHeaderColumns(carbonLoadModel: CarbonLoadModel): Array[String] = {
    +    val delimiter = if (StringUtils.isEmpty(carbonLoadModel.getCsvDelimiter)) {
    +      CarbonCommonConstants.COMMA
    +    } else {
    +      CarbonUtil.delimiterConverter(carbonLoadModel.getCsvDelimiter)
    +    }
    +    var csvFile: String = null
    +    var csvHeader: String = carbonLoadModel.getCsvHeader
    +    val csvColumns = if (StringUtils.isBlank(csvHeader)) {
    +      // read header from csv file
    +      csvFile = carbonLoadModel.getFactFilePath.split(",")(0)
    +      csvHeader = CarbonUtil.readHeader(csvFile)
    +      if (StringUtils.isBlank(csvHeader)) {
    +        throw new CarbonDataLoadingException("First line of the csv is not valid.")
    +      }
    +      csvHeader.toLowerCase().split(delimiter).map(_.replaceAll("\"", "").trim)
    +    } else {
    +      csvHeader.toLowerCase.split(CarbonCommonConstants.COMMA).map(_.trim)
    +    }
    +
    +    if (!CarbonDataProcessorUtil.isHeaderValid(carbonLoadModel.getTableName, csvColumns,
    +        carbonLoadModel.getCarbonDataLoadSchema)) {
    +      if (csvFile == null) {
    +        LOGGER.error("CSV header provided in DDL is not proper."
    +                     + " Column names in schema and CSV header are not the same.")
    +        throw new CarbonDataLoadingException(
    +          "CSV header provided in DDL is not proper. Column names in schema and CSV header are "
    +          + "not the same.")
    +      } else {
    +        LOGGER.error(
    +          "CSV File provided is not proper. Column names in schema and csv header are not same. "
    --- End diff --
   
    fixed



[GitHub] incubator-carbondata issue #518: [CARBONDATA-622]unify file header reader

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/incubator-carbondata/pull/518
 
    Build Success with Spark 1.5.2, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/547/




[GitHub] incubator-carbondata issue #518: [CARBONDATA-622]unify file header reader

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/incubator-carbondata/pull/518
 
    Build Success with Spark 1.5.2, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/549/




[GitHub] incubator-carbondata pull request #518: [CARBONDATA-622]unify file header re...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user jackylk commented on a diff in the pull request:

    https://github.com/apache/incubator-carbondata/pull/518#discussion_r95521643
 
    --- Diff: processing/src/main/java/org/apache/carbondata/processing/util/CarbonDataProcessorUtil.java ---
    @@ -373,93 +368,25 @@ private static void addAllComplexTypeChildren(CarbonDimension dimension, StringB
         return complexTypesMap;
       }
     
    -  /**
    -   * Get the csv file to read if it the path is file otherwise get the first file of directory.
    -   *
    -   * @param csvFilePath
    -   * @return File
    -   */
    -  public static CarbonFile getCsvFileToRead(String csvFilePath) {
    -    CarbonFile csvFile =
    -        FileFactory.getCarbonFile(csvFilePath, FileFactory.getFileType(csvFilePath));
    -
    -    CarbonFile[] listFiles = null;
    -    if (csvFile.isDirectory()) {
    -      listFiles = csvFile.listFiles(new CarbonFileFilter() {
    -        @Override public boolean accept(CarbonFile pathname) {
    -          if (!pathname.isDirectory()) {
    -            if (pathname.getName().endsWith(CarbonCommonConstants.CSV_FILE_EXTENSION) || pathname
    -                .getName().endsWith(CarbonCommonConstants.CSV_FILE_EXTENSION
    -                    + CarbonCommonConstants.FILE_INPROGRESS_STATUS)) {
    -              return true;
    -            }
    -          }
    -          return false;
    -        }
    -      });
    -    } else {
    -      listFiles = new CarbonFile[1];
    -      listFiles[0] = csvFile;
    -    }
    -    return listFiles[0];
    -  }
    -
    -  /**
    -   * Get the file header from csv file.
    -   */
    -  public static String getFileHeader(CarbonFile csvFile)
    -      throws DataLoadingException {
    -    DataInputStream fileReader = null;
    -    BufferedReader bufferedReader = null;
    -    String readLine = null;
    -
    -    FileType fileType = FileFactory.getFileType(csvFile.getAbsolutePath());
    -
    -    if (!csvFile.exists()) {
    -      csvFile = FileFactory
    -          .getCarbonFile(csvFile.getAbsolutePath() + CarbonCommonConstants.FILE_INPROGRESS_STATUS,
    -              fileType);
    -    }
    +  public static boolean isHeaderValid(String tableName, String[] csvHeader,
    +      CarbonDataLoadSchema schema) {
    +    Iterator<String> columnIterator =
    +        CarbonDataProcessorUtil.getSchemaColumnNames(schema, tableName).iterator();
    +    Set<String> csvColumns = new HashSet<String>(Arrays.asList(csvHeader));
    --- End diff --
   
    You can use `Collections.addAll` instead of converting the array to a list and then adding it to the set.
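
The suggestion above presumably refers to `java.util.Collections.addAll`, which copies array elements straight into an existing collection without the intermediate `List` view that `Arrays.asList` creates. A minimal sketch comparing the two approaches follows; the class and method names (`HeaderSetSketch`, `toSetViaList`, `toSetViaAddAll`) are hypothetical and not part of the pull request.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Hypothetical illustration class; not part of CarbonData.
public class HeaderSetSketch {

  // Approach in the first revision of the diff: wrap the array in a List, then copy it.
  static Set<String> toSetViaList(String[] csvHeader) {
    return new HashSet<String>(Arrays.asList(csvHeader));
  }

  // Approach suggested in the review: add the array elements directly to the set.
  static Set<String> toSetViaAddAll(String[] csvHeader) {
    Set<String> csvColumns = new HashSet<String>(csvHeader.length);
    Collections.addAll(csvColumns, csvHeader);
    return csvColumns;
  }

  public static void main(String[] args) {
    String[] header = {"id", "name", "city"};
    // Both approaches build a set with the same elements.
    System.out.println(toSetViaList(header).equals(toSetViaAddAll(header)));
  }
}
```

Both methods produce equal sets; the difference is only that the second avoids the temporary list wrapper.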


---

[GitHub] incubator-carbondata issue #518: [CARBONDATA-622]unify file header reader

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/incubator-carbondata/pull/518
 
    Build Success with Spark 1.5.2, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/550/



---

[GitHub] incubator-carbondata pull request #518: [CARBONDATA-622]unify file header re...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user jackylk commented on a diff in the pull request:

    https://github.com/apache/incubator-carbondata/pull/518#discussion_r95522233
 
    --- Diff: processing/src/main/java/org/apache/carbondata/processing/util/CarbonDataProcessorUtil.java ---
    @@ -373,93 +369,26 @@ private static void addAllComplexTypeChildren(CarbonDimension dimension, StringB
         return complexTypesMap;
       }
     
    -  /**
    -   * Get the csv file to read if it the path is file otherwise get the first file of directory.
    -   *
    -   * @param csvFilePath
    -   * @return File
    -   */
    -  public static CarbonFile getCsvFileToRead(String csvFilePath) {
    -    CarbonFile csvFile =
    -        FileFactory.getCarbonFile(csvFilePath, FileFactory.getFileType(csvFilePath));
    -
    -    CarbonFile[] listFiles = null;
    -    if (csvFile.isDirectory()) {
    -      listFiles = csvFile.listFiles(new CarbonFileFilter() {
    -        @Override public boolean accept(CarbonFile pathname) {
    -          if (!pathname.isDirectory()) {
    -            if (pathname.getName().endsWith(CarbonCommonConstants.CSV_FILE_EXTENSION) || pathname
    -                .getName().endsWith(CarbonCommonConstants.CSV_FILE_EXTENSION
    -                    + CarbonCommonConstants.FILE_INPROGRESS_STATUS)) {
    -              return true;
    -            }
    -          }
    -          return false;
    -        }
    -      });
    -    } else {
    -      listFiles = new CarbonFile[1];
    -      listFiles[0] = csvFile;
    -    }
    -    return listFiles[0];
    -  }
    -
    -  /**
    -   * Get the file header from csv file.
    -   */
    -  public static String getFileHeader(CarbonFile csvFile)
    -      throws DataLoadingException {
    -    DataInputStream fileReader = null;
    -    BufferedReader bufferedReader = null;
    -    String readLine = null;
    -
    -    FileType fileType = FileFactory.getFileType(csvFile.getAbsolutePath());
    -
    -    if (!csvFile.exists()) {
    -      csvFile = FileFactory
    -          .getCarbonFile(csvFile.getAbsolutePath() + CarbonCommonConstants.FILE_INPROGRESS_STATUS,
    -              fileType);
    -    }
    +  public static boolean isHeaderValid(String tableName, String[] csvHeader,
    +      CarbonDataLoadSchema schema) {
    +    Iterator<String> columnIterator =
    +        CarbonDataProcessorUtil.getSchemaColumnNames(schema, tableName).iterator();
    +    Set<String> csvColumns = new HashSet<String>(csvHeader.length);
    +    Collections.addAll(csvColumns, csvHeader);
     
    -    try {
    -      fileReader = FileFactory.getDataInputStream(csvFile.getAbsolutePath(), fileType);
    -      bufferedReader =
    -          new BufferedReader(new InputStreamReader(fileReader, Charset.defaultCharset()));
    -      readLine = bufferedReader.readLine();
    -    } catch (FileNotFoundException e) {
    -      LOGGER.error(e, "CSV Input File not found  " + e.getMessage());
    -      throw new DataLoadingException("CSV Input File not found ", e);
    -    } catch (IOException e) {
    -      LOGGER.error(e, "Not able to read CSV input File  " + e.getMessage());
    -      throw new DataLoadingException("Not able to read CSV input File ", e);
    -    } finally {
    -      CarbonUtil.closeStreams(fileReader, bufferedReader);
    +    while (columnIterator.hasNext()) {
    --- End diff --
   
    Please add a comment describing this logic: the column definitions in the schema should be a subset of the input CSV header.
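
The rule described in this comment can be sketched as below. This is an illustration under stated assumptions, not the pull request's code: the real `isHeaderValid` takes a `tableName` and a `CarbonDataLoadSchema`, whereas this hypothetical version takes the schema column names as a plain list, and it assumes exact, case-sensitive name matching.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration class; not part of CarbonData.
public class HeaderCheckSketch {

  // Simplified stand-in for isHeaderValid: schemaColumns replaces the
  // schema/tableName lookup used in the actual diff (assumption).
  static boolean isHeaderValid(List<String> schemaColumns, String[] csvHeader) {
    Set<String> csvColumns = new HashSet<String>(csvHeader.length);
    Collections.addAll(csvColumns, csvHeader);

    // The schema columns must be a subset of the CSV header:
    // every schema column has to appear in the header, while
    // extra CSV columns that the schema does not define are allowed.
    for (String column : schemaColumns) {
      if (!csvColumns.contains(column)) {
        return false;
      }
    }
    return true;
  }

  public static void main(String[] args) {
    List<String> schema = Arrays.asList("id", "name");
    System.out.println(isHeaderValid(schema, new String[] {"id", "name", "city"}));
    System.out.println(isHeaderValid(schema, new String[] {"id", "city"}));
  }
}
```

In this sketch a header with additional columns still passes, and a header missing any schema column fails.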


---

[GitHub] incubator-carbondata issue #518: [CARBONDATA-622]unify file header reader

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/incubator-carbondata/pull/518
 
    Build Success with Spark 1.5.2, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/552/



---