[GitHub] carbondata pull request #1825: [CARBONDATA-2032][DataLoad] directly write ca...

[GitHub] carbondata issue #1825: [CARBONDATA-2032][DataLoad] directly write carbon da...

qiuchenjian-2
Github user ravipesala commented on the issue:

    https://github.com/apache/carbondata/pull/1825
 
    SDV Build Fail, Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/3688/



---

[GitHub] carbondata issue #1825: [CARBONDATA-2032][DataLoad] directly write carbon da...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user xuchuanyin commented on the issue:

    https://github.com/apache/carbondata/pull/1825
 
    retest this please


---

[GitHub] carbondata issue #1825: [CARBONDATA-2032][DataLoad] directly write carbon da...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1825
 
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/3914/



---

[GitHub] carbondata issue #1825: [CARBONDATA-2032][DataLoad] directly write carbon da...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1825
 
    Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/2669/



---

[GitHub] carbondata issue #1825: [CARBONDATA-2032][DataLoad] directly write carbon da...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ravipesala commented on the issue:

    https://github.com/apache/carbondata/pull/1825
 
    SDV Build Fail, Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/3689/



---

[GitHub] carbondata issue #1825: [CARBONDATA-2032][DataLoad] directly write carbon da...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user jackylk commented on the issue:

    https://github.com/apache/carbondata/pull/1825
 
    retest this please


---

[GitHub] carbondata pull request #1825: [CARBONDATA-2032][DataLoad] directly write ca...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user jackylk commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1825#discussion_r171454147
 
    --- Diff: processing/src/main/java/org/apache/carbondata/processing/store/writer/AbstractFactDataWriter.java ---
    @@ -239,24 +260,30 @@ private void notifyDataMapBlockEnd() {
         blockletId = 0;
       }
     
    -  private String constructFactFileFullPath() {
    -    String factFilePath =
    -        this.model.getCarbonDataDirectoryPath() + File.separator + this.carbonDataFileName;
    -    return factFilePath;
    -  }
       /**
        * Finish writing current file. It will flush stream, copy and rename temp file to final file
        * @param copyInCurrentThread set to false if want to do data copy in a new thread
        */
       protected void commitCurrentFile(boolean copyInCurrentThread) {
         notifyDataMapBlockEnd();
         CarbonUtil.closeStreams(this.fileOutputStream, this.fileChannel);
    -    if (copyInCurrentThread) {
    -      CarbonUtil.copyCarbonDataFileToCarbonStorePath(
    -          carbonDataFileTempPath, model.getCarbonDataDirectoryPath(),
    -          fileSizeInBytes);
    +    if (enableDirectlyWriteData2Hdfs) {
    +      if (copyInCurrentThread) {
    +        CarbonUtil.completeRemainingHdfsReplicas(carbonDataFileHdfsPath,
    +            FileFactory.FileType.HDFS);
    +      } else {
    +        executorServiceSubmitList.add(executorService.submit(
    +            new CopyThread(carbonDataFileHdfsPath, FileFactory.FileType.HDFS)));
    --- End diff --
   
    Copy again, or just rename?


---

[GitHub] carbondata pull request #1825: [CARBONDATA-2032][DataLoad] directly write ca...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user jackylk commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1825#discussion_r171454525
 
    --- Diff: processing/src/main/java/org/apache/carbondata/processing/store/writer/AbstractFactDataWriter.java ---
    @@ -386,29 +380,42 @@ protected void writeIndexFile() throws IOException, CarbonDataWriterException {
             .getIndexHeader(localCardinality, thriftColumnSchemaList, model.getBucketId());
         // get the block index info thrift
         List<BlockIndex> blockIndexThrift = CarbonMetadataUtil.getBlockIndexInfo(blockIndexInfoList);
    -    // randomly choose a temp location for index file
    -    String[] tempLocations = model.getStoreLocation();
    -    String chosenTempLocation = tempLocations[new Random().nextInt(tempLocations.length)];
    -    LOGGER.info("Randomly choose index file location: " + chosenTempLocation);
    +    String indexFileName;
    +    if (enableDirectlyWriteData2Hdfs) {
    +      String rawFileName = model.getCarbonDataDirectoryPath() + File.separator + CarbonTablePath
    +          .getCarbonIndexFileName(model.getCarbonDataFileAttributes().getTaskId(),
    +              model.getBucketId(), model.getTaskExtension(),
    +              "" + model.getCarbonDataFileAttributes().getFactTimeStamp());
    +      indexFileName = FileFactory.getUpdatedFilePath(rawFileName, FileFactory.FileType.HDFS);
    +    } else {
    +      // randomly choose a temp location for index file
    +      String[] tempLocations = model.getStoreLocation();
    +      String chosenTempLocation = tempLocations[new Random().nextInt(tempLocations.length)];
    +      LOGGER.info("Randomly choose index file location: " + chosenTempLocation);
    +      indexFileName = chosenTempLocation + File.separator + CarbonTablePath
    +          .getCarbonIndexFileName(model.getCarbonDataFileAttributes().getTaskId(),
    +              model.getBucketId(), model.getTaskExtension(),
    +              "" + model.getCarbonDataFileAttributes().getFactTimeStamp());
    +    }
     
    -    String fileName = chosenTempLocation + File.separator + CarbonTablePath
    -        .getCarbonIndexFileName(model.getCarbonDataFileAttributes().getTaskId(),
    -            model.getBucketId(), model.getTaskExtension(),
    -            "" + model.getCarbonDataFileAttributes().getFactTimeStamp());
         CarbonIndexFileWriter writer = new CarbonIndexFileWriter();
         // open file
    -    writer.openThriftWriter(fileName);
    +    writer.openThriftWriter(indexFileName);
         // write the header first
         writer.writeThrift(indexHeader);
         // write the indexes
         for (BlockIndex blockIndex : blockIndexThrift) {
           writer.writeThrift(blockIndex);
         }
         writer.close();
    -    // copy from temp to actual store location
    -    CarbonUtil.copyCarbonDataFileToCarbonStorePath(fileName,
    -            model.getCarbonDataDirectoryPath(),
    -            fileSizeInBytes);
    +    if (enableDirectlyWriteData2Hdfs) {
    +      executorServiceSubmitList.add(executorService.submit(
    +          new CopyThread(indexFileName, FileFactory.FileType.HDFS)));
    --- End diff --
   
    The name CopyThread is confusing: is it a copy or a rename?


---

[GitHub] carbondata pull request #1825: [CARBONDATA-2032][DataLoad] directly write ca...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user xuchuanyin commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1825#discussion_r171456256
 
    --- Diff: processing/src/main/java/org/apache/carbondata/processing/store/writer/AbstractFactDataWriter.java ---
    @@ -386,29 +380,42 @@ protected void writeIndexFile() throws IOException, CarbonDataWriterException {
             .getIndexHeader(localCardinality, thriftColumnSchemaList, model.getBucketId());
         // get the block index info thrift
         List<BlockIndex> blockIndexThrift = CarbonMetadataUtil.getBlockIndexInfo(blockIndexInfoList);
    -    // randomly choose a temp location for index file
    -    String[] tempLocations = model.getStoreLocation();
    -    String chosenTempLocation = tempLocations[new Random().nextInt(tempLocations.length)];
    -    LOGGER.info("Randomly choose index file location: " + chosenTempLocation);
    +    String indexFileName;
    +    if (enableDirectlyWriteData2Hdfs) {
    +      String rawFileName = model.getCarbonDataDirectoryPath() + File.separator + CarbonTablePath
    +          .getCarbonIndexFileName(model.getCarbonDataFileAttributes().getTaskId(),
    +              model.getBucketId(), model.getTaskExtension(),
    +              "" + model.getCarbonDataFileAttributes().getFactTimeStamp());
    +      indexFileName = FileFactory.getUpdatedFilePath(rawFileName, FileFactory.FileType.HDFS);
    +    } else {
    +      // randomly choose a temp location for index file
    +      String[] tempLocations = model.getStoreLocation();
    +      String chosenTempLocation = tempLocations[new Random().nextInt(tempLocations.length)];
    +      LOGGER.info("Randomly choose index file location: " + chosenTempLocation);
    +      indexFileName = chosenTempLocation + File.separator + CarbonTablePath
    +          .getCarbonIndexFileName(model.getCarbonDataFileAttributes().getTaskId(),
    +              model.getBucketId(), model.getTaskExtension(),
    +              "" + model.getCarbonDataFileAttributes().getFactTimeStamp());
    +    }
     
    -    String fileName = chosenTempLocation + File.separator + CarbonTablePath
    -        .getCarbonIndexFileName(model.getCarbonDataFileAttributes().getTaskId(),
    -            model.getBucketId(), model.getTaskExtension(),
    -            "" + model.getCarbonDataFileAttributes().getFactTimeStamp());
         CarbonIndexFileWriter writer = new CarbonIndexFileWriter();
         // open file
    -    writer.openThriftWriter(fileName);
    +    writer.openThriftWriter(indexFileName);
         // write the header first
         writer.writeThrift(indexHeader);
         // write the indexes
         for (BlockIndex blockIndex : blockIndexThrift) {
           writer.writeThrift(blockIndex);
         }
         writer.close();
    -    // copy from temp to actual store location
    -    CarbonUtil.copyCarbonDataFileToCarbonStorePath(fileName,
    -            model.getCarbonDataDirectoryPath(),
    -            fileSizeInBytes);
    +    if (enableDirectlyWriteData2Hdfs) {
    +      executorServiceSubmitList.add(executorService.submit(
    +          new CopyThread(indexFileName, FileFactory.FileType.HDFS)));
    --- End diff --
   
    Sometimes it's a copy, sometimes it's setReplication.
   
    How about using the name ‘CompleteHdfsBackupsThread’?
    Or we can keep the name ‘CopyThread’ and add a comment to explain it, since we may remove ‘write temp fact to local’ in the future, as we discussed before. At that time we can give it a proper name.
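
To illustrate the naming concern, here is a minimal sketch, not the PR's code, of a single task that either copies a local temp file or only completes replicas, depending on how the file was written. Every name in it (`Store`, `RecordingStore`, `FinalizeFileTask`, the file paths) is a hypothetical stand-in rather than a CarbonData or Hadoop API, and the store only records which operation was requested.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class FinalizeFileSketch {

  /** Hypothetical store abstraction; not a CarbonData or Hadoop API. */
  interface Store {
    void copyFromLocal(String localTempPath, String targetPath);
    void completeReplicas(String targetPath);
  }

  /** In-memory store that only records which operation was requested. */
  static final class RecordingStore implements Store {
    final List<String> calls = Collections.synchronizedList(new ArrayList<>());

    @Override
    public void copyFromLocal(String localTempPath, String targetPath) {
      calls.add("copy " + localTempPath + " -> " + targetPath);
    }

    @Override
    public void completeReplicas(String targetPath) {
      calls.add("replicate " + targetPath);
    }
  }

  /** Named after the outcome (the file becomes final), not after one mechanism. */
  static final class FinalizeFileTask implements Callable<Void> {
    private final Store store;
    private final String localTempPath; // null when the file was written directly
    private final String targetPath;

    FinalizeFileTask(Store store, String localTempPath, String targetPath) {
      this.store = store;
      this.localTempPath = localTempPath;
      this.targetPath = targetPath;
    }

    @Override
    public Void call() {
      if (localTempPath == null) {
        // Data is already at the target; only the remaining replicas are missing.
        store.completeReplicas(targetPath);
      } else {
        // Data still sits in a local temp file and has to be copied over.
        store.copyFromLocal(localTempPath, targetPath);
      }
      return null;
    }
  }

  /** Submits one task per file and waits for all of them, like a submit list. */
  static List<String> finalizeAll(boolean writeDirectly) throws Exception {
    RecordingStore store = new RecordingStore();
    // A single worker thread keeps the recorded order deterministic.
    ExecutorService executor = Executors.newSingleThreadExecutor();
    try {
      List<Future<Void>> submitted = new ArrayList<>();
      for (String name : new String[] {"part-0.carbondata", "0.carbonindex"}) {
        String local = writeDirectly ? null : "/tmp/" + name;
        submitted.add(executor.submit(new FinalizeFileTask(store, local, "/store/" + name)));
      }
      for (Future<Void> future : submitted) {
        future.get(); // rethrows any failure from the task
      }
    } finally {
      executor.shutdown();
    }
    return new ArrayList<>(store.calls);
  }

  public static void main(String[] args) throws Exception {
    System.out.println(finalizeAll(true));
    System.out.println(finalizeAll(false));
  }
}
```

With a name tied to the outcome, the call site reads the same in both modes, and the task would not need renaming if the local temp path is removed later.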


---

[GitHub] carbondata issue #1825: [CARBONDATA-2032][DataLoad] directly write carbon da...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1825
 
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/3995/



---

[GitHub] carbondata issue #1825: [CARBONDATA-2032][DataLoad] directly write carbon da...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1825
 
    Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/2751/



---

[GitHub] carbondata issue #1825: [CARBONDATA-2032][DataLoad] directly write carbon da...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1825
 
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/3998/



---

[GitHub] carbondata issue #1825: [CARBONDATA-2032][DataLoad] directly write carbon da...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1825
 
    Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/2754/



---

[GitHub] carbondata pull request #1825: [CARBONDATA-2032][DataLoad] directly write ca...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user xuchuanyin commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1825#discussion_r171470463
 
    --- Diff: processing/src/main/java/org/apache/carbondata/processing/store/writer/AbstractFactDataWriter.java ---
    @@ -386,29 +380,42 @@ protected void writeIndexFile() throws IOException, CarbonDataWriterException {
             .getIndexHeader(localCardinality, thriftColumnSchemaList, model.getBucketId());
         // get the block index info thrift
         List<BlockIndex> blockIndexThrift = CarbonMetadataUtil.getBlockIndexInfo(blockIndexInfoList);
    -    // randomly choose a temp location for index file
    -    String[] tempLocations = model.getStoreLocation();
    -    String chosenTempLocation = tempLocations[new Random().nextInt(tempLocations.length)];
    -    LOGGER.info("Randomly choose index file location: " + chosenTempLocation);
    +    String indexFileName;
    +    if (enableDirectlyWriteData2Hdfs) {
    +      String rawFileName = model.getCarbonDataDirectoryPath() + File.separator + CarbonTablePath
    +          .getCarbonIndexFileName(model.getCarbonDataFileAttributes().getTaskId(),
    +              model.getBucketId(), model.getTaskExtension(),
    +              "" + model.getCarbonDataFileAttributes().getFactTimeStamp());
    +      indexFileName = FileFactory.getUpdatedFilePath(rawFileName, FileFactory.FileType.HDFS);
    +    } else {
    +      // randomly choose a temp location for index file
    +      String[] tempLocations = model.getStoreLocation();
    +      String chosenTempLocation = tempLocations[new Random().nextInt(tempLocations.length)];
    +      LOGGER.info("Randomly choose index file location: " + chosenTempLocation);
    +      indexFileName = chosenTempLocation + File.separator + CarbonTablePath
    +          .getCarbonIndexFileName(model.getCarbonDataFileAttributes().getTaskId(),
    +              model.getBucketId(), model.getTaskExtension(),
    +              "" + model.getCarbonDataFileAttributes().getFactTimeStamp());
    +    }
     
    -    String fileName = chosenTempLocation + File.separator + CarbonTablePath
    -        .getCarbonIndexFileName(model.getCarbonDataFileAttributes().getTaskId(),
    -            model.getBucketId(), model.getTaskExtension(),
    -            "" + model.getCarbonDataFileAttributes().getFactTimeStamp());
         CarbonIndexFileWriter writer = new CarbonIndexFileWriter();
         // open file
    -    writer.openThriftWriter(fileName);
    +    writer.openThriftWriter(indexFileName);
         // write the header first
         writer.writeThrift(indexHeader);
         // write the indexes
         for (BlockIndex blockIndex : blockIndexThrift) {
           writer.writeThrift(blockIndex);
         }
         writer.close();
    -    // copy from temp to actual store location
    -    CarbonUtil.copyCarbonDataFileToCarbonStorePath(fileName,
    -            model.getCarbonDataDirectoryPath(),
    -            fileSizeInBytes);
    +    if (enableDirectlyWriteData2Hdfs) {
    +      executorServiceSubmitList.add(executorService.submit(
    +          new CopyThread(indexFileName, FileFactory.FileType.HDFS)));
    --- End diff --
   
    Use ‘completeHdfsBackendThread’.


---

[GitHub] carbondata issue #1825: [CARBONDATA-2032][DataLoad] directly write carbon da...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1825
 
    Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/2823/



---

[GitHub] carbondata issue #1825: [CARBONDATA-2032][DataLoad] directly write carbon da...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1825
 
    Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/4069/



---

[GitHub] carbondata issue #1825: [CARBONDATA-2032][DataLoad] directly write carbon da...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ravipesala commented on the issue:

    https://github.com/apache/carbondata/pull/1825
 
    SDV Build Fail, Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/3763/



---

[GitHub] carbondata issue #1825: [CARBONDATA-2032][DataLoad] directly write carbon da...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1825
 
    Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/4088/



---

[GitHub] carbondata issue #1825: [CARBONDATA-2032][DataLoad] directly write carbon da...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1825
 
    Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/2843/



---

[GitHub] carbondata issue #1825: [CARBONDATA-2032][DataLoad] directly write carbon da...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1825
 
    Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/2844/



---