[GitHub] [carbondata] niuge01 opened a new pull request #3532: [CARBONDATA-3557] Write flink streaming data to partition table

classic Classic list List threaded Threaded
39 messages Options
12
Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] CarbonDataQA1 commented on issue #3532: [CARBONDATA-3557] Write flink streaming data to partition table

GitBox
CarbonDataQA1 commented on issue #3532: [CARBONDATA-3557] Write flink streaming data to partition table
URL: https://github.com/apache/carbondata/pull/3532#issuecomment-569406360
 
 
   Build Success with Spark 2.2.1, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.2/1327/
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[hidden email]


With regards,
Apache Git Services
Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] CarbonDataQA1 commented on issue #3532: [CARBONDATA-3557] Write flink streaming data to partition table

GitBox
In reply to this post by GitBox
CarbonDataQA1 commented on issue #3532: [CARBONDATA-3557] Write flink streaming data to partition table
URL: https://github.com/apache/carbondata/pull/3532#issuecomment-569407298
 
 
   Build Success with Spark 2.1.0, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.1/1320/
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[hidden email]


With regards,
Apache Git Services
Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] CarbonDataQA1 commented on issue #3532: [CARBONDATA-3557] Write flink streaming data to partition table

GitBox
In reply to this post by GitBox
CarbonDataQA1 commented on issue #3532: [CARBONDATA-3557] Write flink streaming data to partition table
URL: https://github.com/apache/carbondata/pull/3532#issuecomment-569418286
 
 
   Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/1343/
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[hidden email]


With regards,
Apache Git Services
Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] jackylk commented on a change in pull request #3532: [CARBONDATA-3557] Write flink streaming data to partition table

GitBox
In reply to this post by GitBox
jackylk commented on a change in pull request #3532: [CARBONDATA-3557] Write flink streaming data to partition table
URL: https://github.com/apache/carbondata/pull/3532#discussion_r361796705
 
 

 ##########
 File path: integration/flink/src/main/java/org/apache/carbon/flink/CarbonS3Writer.java
 ##########
 @@ -184,29 +204,15 @@ public void close() {
     }
   }
 
-  private Map<String, Long> uploadSegmentDataFiles(
-      final String localPath, final String remotePath) {
-    final File[] files = new File(localPath).listFiles();
-    if (files == null) {
-      return new HashMap<>(0);
+  private void closeWriters() throws IOException {
+    if (this.writerFactory == null) {
+      return;
     }
-    Map<String, Long> fileNameMapLength = new HashMap<>(files.length);
-    for (File file : files) {
-      fileNameMapLength.put(file.getName(), file.length());
-      if (LOGGER.isDebugEnabled()) {
-        LOGGER.debug("Upload file[" + file.getAbsolutePath() + "] to [" + remotePath + "] start.");
-      }
-      try {
-        CarbonUtil.copyCarbonDataFileToCarbonStorePath(file.getAbsolutePath(), remotePath, 1024);
-      } catch (CarbonDataWriterException exception) {
-        LOGGER.error(exception.getMessage(), exception);
-        throw exception;
-      }
-      if (LOGGER.isDebugEnabled()) {
-        LOGGER.debug("Upload file[" + file.getAbsolutePath() + "] to [" + remotePath + "] end.");
-      }
+    final List<org.apache.carbondata.sdk.file.CarbonWriter> writers =
 
 Review comment:
   ```suggestion
       final List<CarbonWriter> writers =
   ```

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[hidden email]


With regards,
Apache Git Services
Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] jackylk commented on a change in pull request #3532: [CARBONDATA-3557] Write flink streaming data to partition table

GitBox
In reply to this post by GitBox
jackylk commented on a change in pull request #3532: [CARBONDATA-3557] Write flink streaming data to partition table
URL: https://github.com/apache/carbondata/pull/3532#discussion_r361796748
 
 

 ##########
 File path: integration/flink/src/main/java/org/apache/carbon/flink/CarbonWriter.java
 ##########
 @@ -17,10 +17,229 @@
 
 package org.apache.carbon.flink;
 
+import java.io.File;
+import java.util.ArrayList;
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+
+import org.apache.carbondata.common.logging.LogServiceFactory;
+import org.apache.carbondata.core.constants.CarbonCommonConstants;
+import org.apache.carbondata.core.datastore.exception.CarbonDataWriterException;
+import org.apache.carbondata.core.metadata.schema.table.CarbonTable;
+import org.apache.carbondata.core.metadata.schema.table.column.ColumnSchema;
+import org.apache.carbondata.core.statusmanager.StageInput;
+import org.apache.carbondata.core.util.CarbonUtil;
+
+import org.apache.log4j.Logger;
+
 /**
  * This class is a wrapper of CarbonWriter in SDK.
  * It not only write data to carbon with CarbonWriter in SDK, also generate segment file.
  */
-public abstract class CarbonWriter extends ProxyFileWriter<String> {
+public abstract class CarbonWriter extends ProxyFileWriter<Object[]> {
+
+  private static final Logger LOGGER =
+      LogServiceFactory.getLogService(CarbonS3Writer.class.getName());
 
 Review comment:
   ```suggestion
         LogServiceFactory.getLogService(CarbonWriter.class.getName());
   ```

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[hidden email]


With regards,
Apache Git Services
Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] jackylk commented on a change in pull request #3532: [CARBONDATA-3557] Write flink streaming data to partition table

GitBox
In reply to this post by GitBox
jackylk commented on a change in pull request #3532: [CARBONDATA-3557] Write flink streaming data to partition table
URL: https://github.com/apache/carbondata/pull/3532#discussion_r361796793
 
 

 ##########
 File path: integration/flink/src/main/java/org/apache/carbon/flink/CarbonWriter.java
 ##########
 @@ -17,10 +17,229 @@
 
 package org.apache.carbon.flink;
 
+import java.io.File;
+import java.util.ArrayList;
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+
+import org.apache.carbondata.common.logging.LogServiceFactory;
+import org.apache.carbondata.core.constants.CarbonCommonConstants;
+import org.apache.carbondata.core.datastore.exception.CarbonDataWriterException;
+import org.apache.carbondata.core.metadata.schema.table.CarbonTable;
+import org.apache.carbondata.core.metadata.schema.table.column.ColumnSchema;
+import org.apache.carbondata.core.statusmanager.StageInput;
+import org.apache.carbondata.core.util.CarbonUtil;
+
+import org.apache.log4j.Logger;
+
 /**
  * This class is a wrapper of CarbonWriter in SDK.
  * It not only write data to carbon with CarbonWriter in SDK, also generate segment file.
  */
-public abstract class CarbonWriter extends ProxyFileWriter<String> {
+public abstract class CarbonWriter extends ProxyFileWriter<Object[]> {
+
+  private static final Logger LOGGER =
+      LogServiceFactory.getLogService(CarbonS3Writer.class.getName());
+
+  public CarbonWriter(final CarbonWriterFactory factory,
+      final String identifier, final CarbonTable table) {
+    ProxyFileWriterFactory.register(factory.getType(), factory.getClass());
+    if (LOGGER.isDebugEnabled()) {
+      LOGGER.debug("Open writer. " + this.toString());
+    }
+    this.factory = factory;
+    this.identifier = identifier;
+    this.table = table;
+  }
+
+  private final CarbonWriterFactory factory;
+
+  private final String identifier;
+
+  protected final CarbonTable table;
+
+  @Override
+  public CarbonWriterFactory getFactory() {
+    return this.factory;
+  }
+
+  @Override
+  public String getIdentifier() {
+    return this.identifier;
+  }
+
+  /**
+   * @return when there is no data file uploaded, then return <code>null</code>.
+   */
+  protected StageInput uploadSegmentDataFiles(final String localPath, final String remotePath) {
+    if (!this.table.isHivePartitionTable()) {
+      final File[] files = new File(localPath).listFiles();
+      if (files == null) {
+        return null;
+      }
+      Map<String, Long> fileNameMapLength = new HashMap<>(files.length);
+      for (File file : files) {
+        fileNameMapLength.put(file.getName(), file.length());
+        if (LOGGER.isDebugEnabled()) {
+          LOGGER.debug(
+              "Upload file[" + file.getAbsolutePath() + "] to [" + remotePath + "] start.");
+        }
+        try {
+          CarbonUtil.copyCarbonDataFileToCarbonStorePath(file.getAbsolutePath(), remotePath, 1024);
+        } catch (CarbonDataWriterException exception) {
+          LOGGER.error(exception.getMessage(), exception);
+          throw exception;
+        }
+        if (LOGGER.isDebugEnabled()) {
+          LOGGER.debug("Upload file[" + file.getAbsolutePath() + "] to [" + remotePath + "] end.");
+        }
+      }
+      return new StageInput(remotePath, fileNameMapLength);
+    } else {
+      final List<StageInput.PartitionLocation> partitionLocationList = new ArrayList<>();
+      final List<String> partitions = new ArrayList<>();
+      uploadSegmentDataFiles(new File(localPath), remotePath, partitionLocationList, partitions);
+      if (partitionLocationList.isEmpty()) {
+        return null;
+      } else {
+        return new StageInput(remotePath, partitionLocationList);
+      }
+    }
+  }
+
+  private static void uploadSegmentDataFiles(
 
 Review comment:
   please add comment to describe this function

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[hidden email]


With regards,
Apache Git Services
Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] jackylk commented on a change in pull request #3532: [CARBONDATA-3557] Write flink streaming data to partition table

GitBox
In reply to this post by GitBox
jackylk commented on a change in pull request #3532: [CARBONDATA-3557] Write flink streaming data to partition table
URL: https://github.com/apache/carbondata/pull/3532#discussion_r361796868
 
 

 ##########
 File path: integration/flink/src/main/java/org/apache/carbon/flink/CarbonWriter.java
 ##########
 @@ -17,10 +17,229 @@
 
 package org.apache.carbon.flink;
 
+import java.io.File;
+import java.util.ArrayList;
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+
+import org.apache.carbondata.common.logging.LogServiceFactory;
+import org.apache.carbondata.core.constants.CarbonCommonConstants;
+import org.apache.carbondata.core.datastore.exception.CarbonDataWriterException;
+import org.apache.carbondata.core.metadata.schema.table.CarbonTable;
+import org.apache.carbondata.core.metadata.schema.table.column.ColumnSchema;
+import org.apache.carbondata.core.statusmanager.StageInput;
+import org.apache.carbondata.core.util.CarbonUtil;
+
+import org.apache.log4j.Logger;
+
 /**
  * This class is a wrapper of CarbonWriter in SDK.
  * It not only write data to carbon with CarbonWriter in SDK, also generate segment file.
  */
-public abstract class CarbonWriter extends ProxyFileWriter<String> {
+public abstract class CarbonWriter extends ProxyFileWriter<Object[]> {
+
+  private static final Logger LOGGER =
+      LogServiceFactory.getLogService(CarbonS3Writer.class.getName());
+
+  public CarbonWriter(final CarbonWriterFactory factory,
+      final String identifier, final CarbonTable table) {
+    ProxyFileWriterFactory.register(factory.getType(), factory.getClass());
+    if (LOGGER.isDebugEnabled()) {
+      LOGGER.debug("Open writer. " + this.toString());
+    }
+    this.factory = factory;
+    this.identifier = identifier;
+    this.table = table;
+  }
+
+  private final CarbonWriterFactory factory;
+
+  private final String identifier;
+
+  protected final CarbonTable table;
+
+  @Override
+  public CarbonWriterFactory getFactory() {
+    return this.factory;
+  }
+
+  @Override
+  public String getIdentifier() {
+    return this.identifier;
+  }
+
+  /**
+   * @return when there is no data file uploaded, then return <code>null</code>.
+   */
+  protected StageInput uploadSegmentDataFiles(final String localPath, final String remotePath) {
+    if (!this.table.isHivePartitionTable()) {
+      final File[] files = new File(localPath).listFiles();
+      if (files == null) {
 
 Review comment:
   ```suggestion
         if (files == null || files.length == 0) {
   ```

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[hidden email]


With regards,
Apache Git Services
Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] jackylk commented on a change in pull request #3532: [CARBONDATA-3557] Write flink streaming data to partition table

GitBox
In reply to this post by GitBox
jackylk commented on a change in pull request #3532: [CARBONDATA-3557] Write flink streaming data to partition table
URL: https://github.com/apache/carbondata/pull/3532#discussion_r361796894
 
 

 ##########
 File path: integration/flink/src/main/java/org/apache/carbon/flink/CarbonWriter.java
 ##########
 @@ -17,10 +17,229 @@
 
 package org.apache.carbon.flink;
 
+import java.io.File;
+import java.util.ArrayList;
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+
+import org.apache.carbondata.common.logging.LogServiceFactory;
+import org.apache.carbondata.core.constants.CarbonCommonConstants;
+import org.apache.carbondata.core.datastore.exception.CarbonDataWriterException;
+import org.apache.carbondata.core.metadata.schema.table.CarbonTable;
+import org.apache.carbondata.core.metadata.schema.table.column.ColumnSchema;
+import org.apache.carbondata.core.statusmanager.StageInput;
+import org.apache.carbondata.core.util.CarbonUtil;
+
+import org.apache.log4j.Logger;
+
 /**
  * This class is a wrapper of CarbonWriter in SDK.
  * It not only write data to carbon with CarbonWriter in SDK, also generate segment file.
  */
-public abstract class CarbonWriter extends ProxyFileWriter<String> {
+public abstract class CarbonWriter extends ProxyFileWriter<Object[]> {
+
+  private static final Logger LOGGER =
+      LogServiceFactory.getLogService(CarbonS3Writer.class.getName());
+
+  public CarbonWriter(final CarbonWriterFactory factory,
+      final String identifier, final CarbonTable table) {
+    ProxyFileWriterFactory.register(factory.getType(), factory.getClass());
+    if (LOGGER.isDebugEnabled()) {
+      LOGGER.debug("Open writer. " + this.toString());
+    }
+    this.factory = factory;
+    this.identifier = identifier;
+    this.table = table;
+  }
+
+  private final CarbonWriterFactory factory;
+
+  private final String identifier;
+
+  protected final CarbonTable table;
+
+  @Override
+  public CarbonWriterFactory getFactory() {
+    return this.factory;
+  }
+
+  @Override
+  public String getIdentifier() {
+    return this.identifier;
+  }
+
+  /**
+   * @return when there is no data file uploaded, then return <code>null</code>.
+   */
+  protected StageInput uploadSegmentDataFiles(final String localPath, final String remotePath) {
+    if (!this.table.isHivePartitionTable()) {
+      final File[] files = new File(localPath).listFiles();
+      if (files == null) {
+        return null;
+      }
+      Map<String, Long> fileNameMapLength = new HashMap<>(files.length);
+      for (File file : files) {
+        fileNameMapLength.put(file.getName(), file.length());
+        if (LOGGER.isDebugEnabled()) {
+          LOGGER.debug(
+              "Upload file[" + file.getAbsolutePath() + "] to [" + remotePath + "] start.");
+        }
+        try {
+          CarbonUtil.copyCarbonDataFileToCarbonStorePath(file.getAbsolutePath(), remotePath, 1024);
 
 Review comment:
   create a constant for 1024

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[hidden email]


With regards,
Apache Git Services
Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] jackylk commented on a change in pull request #3532: [CARBONDATA-3557] Write flink streaming data to partition table

GitBox
In reply to this post by GitBox
jackylk commented on a change in pull request #3532: [CARBONDATA-3557] Write flink streaming data to partition table
URL: https://github.com/apache/carbondata/pull/3532#discussion_r361796904
 
 

 ##########
 File path: integration/flink/src/main/java/org/apache/carbon/flink/CarbonWriter.java
 ##########
 @@ -17,10 +17,229 @@
 
 package org.apache.carbon.flink;
 
+import java.io.File;
+import java.util.ArrayList;
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+
+import org.apache.carbondata.common.logging.LogServiceFactory;
+import org.apache.carbondata.core.constants.CarbonCommonConstants;
+import org.apache.carbondata.core.datastore.exception.CarbonDataWriterException;
+import org.apache.carbondata.core.metadata.schema.table.CarbonTable;
+import org.apache.carbondata.core.metadata.schema.table.column.ColumnSchema;
+import org.apache.carbondata.core.statusmanager.StageInput;
+import org.apache.carbondata.core.util.CarbonUtil;
+
+import org.apache.log4j.Logger;
+
 /**
  * This class is a wrapper of CarbonWriter in SDK.
  * It not only write data to carbon with CarbonWriter in SDK, also generate segment file.
  */
-public abstract class CarbonWriter extends ProxyFileWriter<String> {
+public abstract class CarbonWriter extends ProxyFileWriter<Object[]> {
+
+  private static final Logger LOGGER =
+      LogServiceFactory.getLogService(CarbonS3Writer.class.getName());
+
+  public CarbonWriter(final CarbonWriterFactory factory,
+      final String identifier, final CarbonTable table) {
+    ProxyFileWriterFactory.register(factory.getType(), factory.getClass());
+    if (LOGGER.isDebugEnabled()) {
+      LOGGER.debug("Open writer. " + this.toString());
+    }
+    this.factory = factory;
+    this.identifier = identifier;
+    this.table = table;
+  }
+
+  private final CarbonWriterFactory factory;
+
+  private final String identifier;
+
+  protected final CarbonTable table;
+
+  @Override
+  public CarbonWriterFactory getFactory() {
+    return this.factory;
+  }
+
+  @Override
+  public String getIdentifier() {
+    return this.identifier;
+  }
+
+  /**
+   * @return when there is no data file uploaded, then return <code>null</code>.
+   */
+  protected StageInput uploadSegmentDataFiles(final String localPath, final String remotePath) {
+    if (!this.table.isHivePartitionTable()) {
+      final File[] files = new File(localPath).listFiles();
+      if (files == null) {
+        return null;
+      }
+      Map<String, Long> fileNameMapLength = new HashMap<>(files.length);
+      for (File file : files) {
+        fileNameMapLength.put(file.getName(), file.length());
+        if (LOGGER.isDebugEnabled()) {
+          LOGGER.debug(
+              "Upload file[" + file.getAbsolutePath() + "] to [" + remotePath + "] start.");
+        }
+        try {
+          CarbonUtil.copyCarbonDataFileToCarbonStorePath(file.getAbsolutePath(), remotePath, 1024);
+        } catch (CarbonDataWriterException exception) {
+          LOGGER.error(exception.getMessage(), exception);
+          throw exception;
+        }
+        if (LOGGER.isDebugEnabled()) {
+          LOGGER.debug("Upload file[" + file.getAbsolutePath() + "] to [" + remotePath + "] end.");
 
 Review comment:
   ```suggestion
             LOGGER.debug("Upload file[" + file.getAbsolutePath() + "] to [" + remotePath + "] finished.");
   ```

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[hidden email]


With regards,
Apache Git Services
Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] jackylk commented on a change in pull request #3532: [CARBONDATA-3557] Write flink streaming data to partition table

GitBox
In reply to this post by GitBox
jackylk commented on a change in pull request #3532: [CARBONDATA-3557] Write flink streaming data to partition table
URL: https://github.com/apache/carbondata/pull/3532#discussion_r361796993
 
 

 ##########
 File path: integration/flink/src/main/java/org/apache/carbon/flink/CarbonWriter.java
 ##########
 @@ -17,10 +17,229 @@
 
 package org.apache.carbon.flink;
 
+import java.io.File;
+import java.util.ArrayList;
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+
+import org.apache.carbondata.common.logging.LogServiceFactory;
+import org.apache.carbondata.core.constants.CarbonCommonConstants;
+import org.apache.carbondata.core.datastore.exception.CarbonDataWriterException;
+import org.apache.carbondata.core.metadata.schema.table.CarbonTable;
+import org.apache.carbondata.core.metadata.schema.table.column.ColumnSchema;
+import org.apache.carbondata.core.statusmanager.StageInput;
+import org.apache.carbondata.core.util.CarbonUtil;
+
+import org.apache.log4j.Logger;
+
 /**
  * This class is a wrapper of CarbonWriter in SDK.
  * It not only write data to carbon with CarbonWriter in SDK, also generate segment file.
  */
-public abstract class CarbonWriter extends ProxyFileWriter<String> {
+public abstract class CarbonWriter extends ProxyFileWriter<Object[]> {
+
+  private static final Logger LOGGER =
+      LogServiceFactory.getLogService(CarbonS3Writer.class.getName());
+
+  public CarbonWriter(final CarbonWriterFactory factory,
+      final String identifier, final CarbonTable table) {
+    ProxyFileWriterFactory.register(factory.getType(), factory.getClass());
+    if (LOGGER.isDebugEnabled()) {
+      LOGGER.debug("Open writer. " + this.toString());
+    }
+    this.factory = factory;
+    this.identifier = identifier;
+    this.table = table;
+  }
+
+  private final CarbonWriterFactory factory;
+
+  private final String identifier;
+
+  protected final CarbonTable table;
+
+  @Override
+  public CarbonWriterFactory getFactory() {
+    return this.factory;
+  }
+
+  @Override
+  public String getIdentifier() {
+    return this.identifier;
+  }
+
+  /**
+   * @return when there is no data file uploaded, then return <code>null</code>.
+   */
+  protected StageInput uploadSegmentDataFiles(final String localPath, final String remotePath) {
+    if (!this.table.isHivePartitionTable()) {
+      final File[] files = new File(localPath).listFiles();
+      if (files == null) {
+        return null;
+      }
+      Map<String, Long> fileNameMapLength = new HashMap<>(files.length);
+      for (File file : files) {
+        fileNameMapLength.put(file.getName(), file.length());
+        if (LOGGER.isDebugEnabled()) {
+          LOGGER.debug(
+              "Upload file[" + file.getAbsolutePath() + "] to [" + remotePath + "] start.");
+        }
+        try {
+          CarbonUtil.copyCarbonDataFileToCarbonStorePath(file.getAbsolutePath(), remotePath, 1024);
+        } catch (CarbonDataWriterException exception) {
+          LOGGER.error(exception.getMessage(), exception);
+          throw exception;
+        }
+        if (LOGGER.isDebugEnabled()) {
+          LOGGER.debug("Upload file[" + file.getAbsolutePath() + "] to [" + remotePath + "] end.");
+        }
+      }
+      return new StageInput(remotePath, fileNameMapLength);
+    } else {
+      final List<StageInput.PartitionLocation> partitionLocationList = new ArrayList<>();
+      final List<String> partitions = new ArrayList<>();
+      uploadSegmentDataFiles(new File(localPath), remotePath, partitionLocationList, partitions);
+      if (partitionLocationList.isEmpty()) {
+        return null;
+      } else {
+        return new StageInput(remotePath, partitionLocationList);
+      }
+    }
+  }
+
+  private static void uploadSegmentDataFiles(
+      final File directory, final String remotePath,
 
 Review comment:
   ```suggestion
         final File localDirectory, final String remotePath,
   ```

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[hidden email]


With regards,
Apache Git Services
Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] jackylk commented on a change in pull request #3532: [CARBONDATA-3557] Write flink streaming data to partition table

GitBox
In reply to this post by GitBox
jackylk commented on a change in pull request #3532: [CARBONDATA-3557] Write flink streaming data to partition table
URL: https://github.com/apache/carbondata/pull/3532#discussion_r361796993
 
 

 ##########
 File path: integration/flink/src/main/java/org/apache/carbon/flink/CarbonWriter.java
 ##########
 @@ -17,10 +17,229 @@
 
 package org.apache.carbon.flink;
 
+import java.io.File;
+import java.util.ArrayList;
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+
+import org.apache.carbondata.common.logging.LogServiceFactory;
+import org.apache.carbondata.core.constants.CarbonCommonConstants;
+import org.apache.carbondata.core.datastore.exception.CarbonDataWriterException;
+import org.apache.carbondata.core.metadata.schema.table.CarbonTable;
+import org.apache.carbondata.core.metadata.schema.table.column.ColumnSchema;
+import org.apache.carbondata.core.statusmanager.StageInput;
+import org.apache.carbondata.core.util.CarbonUtil;
+
+import org.apache.log4j.Logger;
+
 /**
  * This class is a wrapper of CarbonWriter in SDK.
  * It not only write data to carbon with CarbonWriter in SDK, also generate segment file.
  */
-public abstract class CarbonWriter extends ProxyFileWriter<String> {
+public abstract class CarbonWriter extends ProxyFileWriter<Object[]> {
+
+  private static final Logger LOGGER =
+      LogServiceFactory.getLogService(CarbonS3Writer.class.getName());
+
+  public CarbonWriter(final CarbonWriterFactory factory,
+      final String identifier, final CarbonTable table) {
+    ProxyFileWriterFactory.register(factory.getType(), factory.getClass());
+    if (LOGGER.isDebugEnabled()) {
+      LOGGER.debug("Open writer. " + this.toString());
+    }
+    this.factory = factory;
+    this.identifier = identifier;
+    this.table = table;
+  }
+
+  private final CarbonWriterFactory factory;
+
+  private final String identifier;
+
+  protected final CarbonTable table;
+
+  @Override
+  public CarbonWriterFactory getFactory() {
+    return this.factory;
+  }
+
+  @Override
+  public String getIdentifier() {
+    return this.identifier;
+  }
+
+  /**
+   * @return when there is no data file uploaded, then return <code>null</code>.
+   */
+  protected StageInput uploadSegmentDataFiles(final String localPath, final String remotePath) {
+    if (!this.table.isHivePartitionTable()) {
+      final File[] files = new File(localPath).listFiles();
+      if (files == null) {
+        return null;
+      }
+      Map<String, Long> fileNameMapLength = new HashMap<>(files.length);
+      for (File file : files) {
+        fileNameMapLength.put(file.getName(), file.length());
+        if (LOGGER.isDebugEnabled()) {
+          LOGGER.debug(
+              "Upload file[" + file.getAbsolutePath() + "] to [" + remotePath + "] start.");
+        }
+        try {
+          CarbonUtil.copyCarbonDataFileToCarbonStorePath(file.getAbsolutePath(), remotePath, 1024);
+        } catch (CarbonDataWriterException exception) {
+          LOGGER.error(exception.getMessage(), exception);
+          throw exception;
+        }
+        if (LOGGER.isDebugEnabled()) {
+          LOGGER.debug("Upload file[" + file.getAbsolutePath() + "] to [" + remotePath + "] end.");
+        }
+      }
+      return new StageInput(remotePath, fileNameMapLength);
+    } else {
+      final List<StageInput.PartitionLocation> partitionLocationList = new ArrayList<>();
+      final List<String> partitions = new ArrayList<>();
+      uploadSegmentDataFiles(new File(localPath), remotePath, partitionLocationList, partitions);
+      if (partitionLocationList.isEmpty()) {
+        return null;
+      } else {
+        return new StageInput(remotePath, partitionLocationList);
+      }
+    }
+  }
+
+  private static void uploadSegmentDataFiles(
+      final File directory, final String remotePath,
 
 Review comment:
   ```suggestion
         final File localPath, final String remotePath,
   ```

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[hidden email]


With regards,
Apache Git Services
Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] jackylk commented on a change in pull request #3532: [CARBONDATA-3557] Write flink streaming data to partition table

GitBox
In reply to this post by GitBox
jackylk commented on a change in pull request #3532: [CARBONDATA-3557] Write flink streaming data to partition table
URL: https://github.com/apache/carbondata/pull/3532#discussion_r361797038
 
 

 ##########
 File path: integration/flink/src/main/java/org/apache/carbon/flink/CarbonWriter.java
 ##########
 @@ -17,10 +17,229 @@
 
 package org.apache.carbon.flink;
 
+import java.io.File;
+import java.util.ArrayList;
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+
+import org.apache.carbondata.common.logging.LogServiceFactory;
+import org.apache.carbondata.core.constants.CarbonCommonConstants;
+import org.apache.carbondata.core.datastore.exception.CarbonDataWriterException;
+import org.apache.carbondata.core.metadata.schema.table.CarbonTable;
+import org.apache.carbondata.core.metadata.schema.table.column.ColumnSchema;
+import org.apache.carbondata.core.statusmanager.StageInput;
+import org.apache.carbondata.core.util.CarbonUtil;
+
+import org.apache.log4j.Logger;
+
 /**
  * This class is a wrapper of CarbonWriter in SDK.
  * It not only write data to carbon with CarbonWriter in SDK, also generate segment file.
  */
-public abstract class CarbonWriter extends ProxyFileWriter<String> {
+public abstract class CarbonWriter extends ProxyFileWriter<Object[]> {
+
+  private static final Logger LOGGER =
+      LogServiceFactory.getLogService(CarbonS3Writer.class.getName());
+
+  public CarbonWriter(final CarbonWriterFactory factory,
+      final String identifier, final CarbonTable table) {
+    ProxyFileWriterFactory.register(factory.getType(), factory.getClass());
+    if (LOGGER.isDebugEnabled()) {
+      LOGGER.debug("Open writer. " + this.toString());
+    }
+    this.factory = factory;
+    this.identifier = identifier;
+    this.table = table;
+  }
+
+  private final CarbonWriterFactory factory;
+
+  private final String identifier;
+
+  protected final CarbonTable table;
+
+  @Override
+  public CarbonWriterFactory getFactory() {
+    return this.factory;
+  }
+
+  @Override
+  public String getIdentifier() {
+    return this.identifier;
+  }
+
+  /**
+   * @return when there is no data file uploaded, then return <code>null</code>.
+   */
+  protected StageInput uploadSegmentDataFiles(final String localPath, final String remotePath) {
+    if (!this.table.isHivePartitionTable()) {
+      final File[] files = new File(localPath).listFiles();
+      if (files == null) {
+        return null;
+      }
+      Map<String, Long> fileNameMapLength = new HashMap<>(files.length);
+      for (File file : files) {
+        fileNameMapLength.put(file.getName(), file.length());
+        if (LOGGER.isDebugEnabled()) {
+          LOGGER.debug(
+              "Upload file[" + file.getAbsolutePath() + "] to [" + remotePath + "] start.");
+        }
+        try {
+          CarbonUtil.copyCarbonDataFileToCarbonStorePath(file.getAbsolutePath(), remotePath, 1024);
+        } catch (CarbonDataWriterException exception) {
+          LOGGER.error(exception.getMessage(), exception);
+          throw exception;
+        }
+        if (LOGGER.isDebugEnabled()) {
+          LOGGER.debug("Upload file[" + file.getAbsolutePath() + "] to [" + remotePath + "] end.");
+        }
+      }
+      return new StageInput(remotePath, fileNameMapLength);
+    } else {
+      final List<StageInput.PartitionLocation> partitionLocationList = new ArrayList<>();
+      final List<String> partitions = new ArrayList<>();
+      uploadSegmentDataFiles(new File(localPath), remotePath, partitionLocationList, partitions);
+      if (partitionLocationList.isEmpty()) {
+        return null;
+      } else {
+        return new StageInput(remotePath, partitionLocationList);
+      }
+    }
+  }
+
+  private static void uploadSegmentDataFiles(
+      final File directory, final String remotePath,
+      final List<StageInput.PartitionLocation> partitionLocationList,
+      final List<String> partitions
+  ) {
+    final File[] files = directory.listFiles();
+    if (files == null) {
 
 Review comment:
   ```suggestion
       if (files == null || files.length == 0) {
   ```

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[hidden email]


With regards,
Apache Git Services
Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] jackylk commented on a change in pull request #3532: [CARBONDATA-3557] Write flink streaming data to partition table

GitBox
In reply to this post by GitBox
jackylk commented on a change in pull request #3532: [CARBONDATA-3557] Write flink streaming data to partition table
URL: https://github.com/apache/carbondata/pull/3532#discussion_r361797118
 
 

 ##########
 File path: integration/flink/src/main/java/org/apache/carbon/flink/CarbonWriter.java
 ##########
 @@ -17,10 +17,229 @@
 
 package org.apache.carbon.flink;
 
+import java.io.File;
+import java.util.ArrayList;
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+
+import org.apache.carbondata.common.logging.LogServiceFactory;
+import org.apache.carbondata.core.constants.CarbonCommonConstants;
+import org.apache.carbondata.core.datastore.exception.CarbonDataWriterException;
+import org.apache.carbondata.core.metadata.schema.table.CarbonTable;
+import org.apache.carbondata.core.metadata.schema.table.column.ColumnSchema;
+import org.apache.carbondata.core.statusmanager.StageInput;
+import org.apache.carbondata.core.util.CarbonUtil;
+
+import org.apache.log4j.Logger;
+
 /**
  * This class is a wrapper of CarbonWriter in SDK.
  * It not only write data to carbon with CarbonWriter in SDK, also generate segment file.
  */
-public abstract class CarbonWriter extends ProxyFileWriter<String> {
+public abstract class CarbonWriter extends ProxyFileWriter<Object[]> {
+
+  private static final Logger LOGGER =
+      LogServiceFactory.getLogService(CarbonS3Writer.class.getName());
+
+  public CarbonWriter(final CarbonWriterFactory factory,
+      final String identifier, final CarbonTable table) {
+    ProxyFileWriterFactory.register(factory.getType(), factory.getClass());
+    if (LOGGER.isDebugEnabled()) {
+      LOGGER.debug("Open writer. " + this.toString());
+    }
+    this.factory = factory;
+    this.identifier = identifier;
+    this.table = table;
+  }
+
+  private final CarbonWriterFactory factory;
+
+  private final String identifier;
+
+  protected final CarbonTable table;
+
+  @Override
+  public CarbonWriterFactory getFactory() {
+    return this.factory;
+  }
+
+  @Override
+  public String getIdentifier() {
+    return this.identifier;
+  }
+
+  /**
+   * @return when there is no data file uploaded, then return <code>null</code>.
+   */
+  protected StageInput uploadSegmentDataFiles(final String localPath, final String remotePath) {
+    if (!this.table.isHivePartitionTable()) {
+      final File[] files = new File(localPath).listFiles();
+      if (files == null) {
+        return null;
+      }
+      Map<String, Long> fileNameMapLength = new HashMap<>(files.length);
+      for (File file : files) {
+        fileNameMapLength.put(file.getName(), file.length());
+        if (LOGGER.isDebugEnabled()) {
+          LOGGER.debug(
+              "Upload file[" + file.getAbsolutePath() + "] to [" + remotePath + "] start.");
+        }
+        try {
+          CarbonUtil.copyCarbonDataFileToCarbonStorePath(file.getAbsolutePath(), remotePath, 1024);
+        } catch (CarbonDataWriterException exception) {
+          LOGGER.error(exception.getMessage(), exception);
+          throw exception;
+        }
+        if (LOGGER.isDebugEnabled()) {
+          LOGGER.debug("Upload file[" + file.getAbsolutePath() + "] to [" + remotePath + "] end.");
+        }
+      }
+      return new StageInput(remotePath, fileNameMapLength);
+    } else {
+      final List<StageInput.PartitionLocation> partitionLocationList = new ArrayList<>();
+      final List<String> partitions = new ArrayList<>();
+      uploadSegmentDataFiles(new File(localPath), remotePath, partitionLocationList, partitions);
+      if (partitionLocationList.isEmpty()) {
+        return null;
+      } else {
+        return new StageInput(remotePath, partitionLocationList);
+      }
+    }
+  }
+
+  private static void uploadSegmentDataFiles(
+      final File directory, final String remotePath,
+      final List<StageInput.PartitionLocation> partitionLocationList,
+      final List<String> partitions
+  ) {
+    final File[] files = directory.listFiles();
+    if (files == null) {
+      return;
+    }
+    Map<String, Long> fileNameMapLength = new HashMap<>();
+    for (File file : files) {
+      if (file.isDirectory()) {
+        partitions.add(file.getName());
+        uploadSegmentDataFiles(file, remotePath, partitionLocationList, partitions);
+        partitions.remove(partitions.size() - 1);
+        continue;
+      }
+      fileNameMapLength.put(file.getName(), file.length());
+      if (LOGGER.isDebugEnabled()) {
+        LOGGER.debug("Upload file[" + file.getAbsolutePath() + "] to [" + remotePath + "] start.");
+      }
+      try {
+        CarbonUtil.copyCarbonDataFileToCarbonStorePath(file.getAbsolutePath(), remotePath, 1024);
+      } catch (CarbonDataWriterException exception) {
+        LOGGER.error(exception.getMessage(), exception);
+        throw exception;
+      }
+      if (LOGGER.isDebugEnabled()) {
+        LOGGER.debug("Upload file[" + file.getAbsolutePath() + "] to [" + remotePath + "] end.");
 
 Review comment:
   ```suggestion
           LOGGER.debug("Upload file[" + file.getAbsolutePath() + "] to [" + remotePath + "] finished.");
   ```

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[hidden email]


With regards,
Apache Git Services
Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] jackylk commented on a change in pull request #3532: [CARBONDATA-3557] Write flink streaming data to partition table

GitBox
In reply to this post by GitBox
jackylk commented on a change in pull request #3532: [CARBONDATA-3557] Write flink streaming data to partition table
URL: https://github.com/apache/carbondata/pull/3532#discussion_r361797118
 
 

 ##########
 File path: integration/flink/src/main/java/org/apache/carbon/flink/CarbonWriter.java
 ##########
 @@ -17,10 +17,229 @@
 
 package org.apache.carbon.flink;
 
+import java.io.File;
+import java.util.ArrayList;
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+
+import org.apache.carbondata.common.logging.LogServiceFactory;
+import org.apache.carbondata.core.constants.CarbonCommonConstants;
+import org.apache.carbondata.core.datastore.exception.CarbonDataWriterException;
+import org.apache.carbondata.core.metadata.schema.table.CarbonTable;
+import org.apache.carbondata.core.metadata.schema.table.column.ColumnSchema;
+import org.apache.carbondata.core.statusmanager.StageInput;
+import org.apache.carbondata.core.util.CarbonUtil;
+
+import org.apache.log4j.Logger;
+
 /**
  * This class is a wrapper of CarbonWriter in SDK.
  * It not only write data to carbon with CarbonWriter in SDK, also generate segment file.
  */
-public abstract class CarbonWriter extends ProxyFileWriter<String> {
+public abstract class CarbonWriter extends ProxyFileWriter<Object[]> {
+
+  private static final Logger LOGGER =
+      LogServiceFactory.getLogService(CarbonS3Writer.class.getName());
+
+  public CarbonWriter(final CarbonWriterFactory factory,
+      final String identifier, final CarbonTable table) {
+    ProxyFileWriterFactory.register(factory.getType(), factory.getClass());
+    if (LOGGER.isDebugEnabled()) {
+      LOGGER.debug("Open writer. " + this.toString());
+    }
+    this.factory = factory;
+    this.identifier = identifier;
+    this.table = table;
+  }
+
+  private final CarbonWriterFactory factory;
+
+  private final String identifier;
+
+  protected final CarbonTable table;
+
+  @Override
+  public CarbonWriterFactory getFactory() {
+    return this.factory;
+  }
+
+  @Override
+  public String getIdentifier() {
+    return this.identifier;
+  }
+
+  /**
+   * @return when there is no data file uploaded, then return <code>null</code>.
+   */
+  protected StageInput uploadSegmentDataFiles(final String localPath, final String remotePath) {
+    if (!this.table.isHivePartitionTable()) {
+      final File[] files = new File(localPath).listFiles();
+      if (files == null) {
+        return null;
+      }
+      Map<String, Long> fileNameMapLength = new HashMap<>(files.length);
+      for (File file : files) {
+        fileNameMapLength.put(file.getName(), file.length());
+        if (LOGGER.isDebugEnabled()) {
+          LOGGER.debug(
+              "Upload file[" + file.getAbsolutePath() + "] to [" + remotePath + "] start.");
+        }
+        try {
+          CarbonUtil.copyCarbonDataFileToCarbonStorePath(file.getAbsolutePath(), remotePath, 1024);
+        } catch (CarbonDataWriterException exception) {
+          LOGGER.error(exception.getMessage(), exception);
+          throw exception;
+        }
+        if (LOGGER.isDebugEnabled()) {
+          LOGGER.debug("Upload file[" + file.getAbsolutePath() + "] to [" + remotePath + "] end.");
+        }
+      }
+      return new StageInput(remotePath, fileNameMapLength);
+    } else {
+      final List<StageInput.PartitionLocation> partitionLocationList = new ArrayList<>();
+      final List<String> partitions = new ArrayList<>();
+      uploadSegmentDataFiles(new File(localPath), remotePath, partitionLocationList, partitions);
+      if (partitionLocationList.isEmpty()) {
+        return null;
+      } else {
+        return new StageInput(remotePath, partitionLocationList);
+      }
+    }
+  }
+
+  private static void uploadSegmentDataFiles(
+      final File directory, final String remotePath,
+      final List<StageInput.PartitionLocation> partitionLocationList,
+      final List<String> partitions
+  ) {
+    final File[] files = directory.listFiles();
+    if (files == null) {
+      return;
+    }
+    Map<String, Long> fileNameMapLength = new HashMap<>();
+    for (File file : files) {
+      if (file.isDirectory()) {
+        partitions.add(file.getName());
+        uploadSegmentDataFiles(file, remotePath, partitionLocationList, partitions);
+        partitions.remove(partitions.size() - 1);
+        continue;
+      }
+      fileNameMapLength.put(file.getName(), file.length());
+      if (LOGGER.isDebugEnabled()) {
+        LOGGER.debug("Upload file[" + file.getAbsolutePath() + "] to [" + remotePath + "] start.");
+      }
+      try {
+        CarbonUtil.copyCarbonDataFileToCarbonStorePath(file.getAbsolutePath(), remotePath, 1024);
+      } catch (CarbonDataWriterException exception) {
+        LOGGER.error(exception.getMessage(), exception);
+        throw exception;
+      }
+      if (LOGGER.isDebugEnabled()) {
+        LOGGER.debug("Upload file[" + file.getAbsolutePath() + "] to [" + remotePath + "] end.");
 
 Review comment:
   ```suggestion
           LOGGER.debug("Upload file[" + file.getAbsolutePath() + "] to [" + remotePath + "] finished.");
   ```

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[hidden email]


With regards,
Apache Git Services
Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] jackylk commented on a change in pull request #3532: [CARBONDATA-3557] Write flink streaming data to partition table

GitBox
In reply to this post by GitBox
jackylk commented on a change in pull request #3532: [CARBONDATA-3557] Write flink streaming data to partition table
URL: https://github.com/apache/carbondata/pull/3532#discussion_r361796904
 
 

 ##########
 File path: integration/flink/src/main/java/org/apache/carbon/flink/CarbonWriter.java
 ##########
 @@ -17,10 +17,229 @@
 
 package org.apache.carbon.flink;
 
+import java.io.File;
+import java.util.ArrayList;
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+
+import org.apache.carbondata.common.logging.LogServiceFactory;
+import org.apache.carbondata.core.constants.CarbonCommonConstants;
+import org.apache.carbondata.core.datastore.exception.CarbonDataWriterException;
+import org.apache.carbondata.core.metadata.schema.table.CarbonTable;
+import org.apache.carbondata.core.metadata.schema.table.column.ColumnSchema;
+import org.apache.carbondata.core.statusmanager.StageInput;
+import org.apache.carbondata.core.util.CarbonUtil;
+
+import org.apache.log4j.Logger;
+
 /**
  * This class is a wrapper of CarbonWriter in SDK.
  * It not only write data to carbon with CarbonWriter in SDK, also generate segment file.
  */
-public abstract class CarbonWriter extends ProxyFileWriter<String> {
+public abstract class CarbonWriter extends ProxyFileWriter<Object[]> {
+
+  private static final Logger LOGGER =
+      LogServiceFactory.getLogService(CarbonS3Writer.class.getName());
+
+  public CarbonWriter(final CarbonWriterFactory factory,
+      final String identifier, final CarbonTable table) {
+    ProxyFileWriterFactory.register(factory.getType(), factory.getClass());
+    if (LOGGER.isDebugEnabled()) {
+      LOGGER.debug("Open writer. " + this.toString());
+    }
+    this.factory = factory;
+    this.identifier = identifier;
+    this.table = table;
+  }
+
+  private final CarbonWriterFactory factory;
+
+  private final String identifier;
+
+  protected final CarbonTable table;
+
+  @Override
+  public CarbonWriterFactory getFactory() {
+    return this.factory;
+  }
+
+  @Override
+  public String getIdentifier() {
+    return this.identifier;
+  }
+
+  /**
+   * @return when there is no data file uploaded, then return <code>null</code>.
+   */
+  protected StageInput uploadSegmentDataFiles(final String localPath, final String remotePath) {
+    if (!this.table.isHivePartitionTable()) {
+      final File[] files = new File(localPath).listFiles();
+      if (files == null) {
+        return null;
+      }
+      Map<String, Long> fileNameMapLength = new HashMap<>(files.length);
+      for (File file : files) {
+        fileNameMapLength.put(file.getName(), file.length());
+        if (LOGGER.isDebugEnabled()) {
+          LOGGER.debug(
+              "Upload file[" + file.getAbsolutePath() + "] to [" + remotePath + "] start.");
+        }
+        try {
+          CarbonUtil.copyCarbonDataFileToCarbonStorePath(file.getAbsolutePath(), remotePath, 1024);
+        } catch (CarbonDataWriterException exception) {
+          LOGGER.error(exception.getMessage(), exception);
+          throw exception;
+        }
+        if (LOGGER.isDebugEnabled()) {
+          LOGGER.debug("Upload file[" + file.getAbsolutePath() + "] to [" + remotePath + "] end.");
 
 Review comment:
   ```suggestion
             LOGGER.debug("Upload file[" + file.getAbsolutePath() + "] to [" + remotePath + "] finished.");
   ```

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[hidden email]


With regards,
Apache Git Services
Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] jackylk commented on a change in pull request #3532: [CARBONDATA-3557] Write flink streaming data to partition table

GitBox
In reply to this post by GitBox
jackylk commented on a change in pull request #3532: [CARBONDATA-3557] Write flink streaming data to partition table
URL: https://github.com/apache/carbondata/pull/3532#discussion_r361796705
 
 

 ##########
 File path: integration/flink/src/main/java/org/apache/carbon/flink/CarbonS3Writer.java
 ##########
 @@ -184,29 +204,15 @@ public void close() {
     }
   }
 
-  private Map<String, Long> uploadSegmentDataFiles(
-      final String localPath, final String remotePath) {
-    final File[] files = new File(localPath).listFiles();
-    if (files == null) {
-      return new HashMap<>(0);
+  private void closeWriters() throws IOException {
+    if (this.writerFactory == null) {
+      return;
     }
-    Map<String, Long> fileNameMapLength = new HashMap<>(files.length);
-    for (File file : files) {
-      fileNameMapLength.put(file.getName(), file.length());
-      if (LOGGER.isDebugEnabled()) {
-        LOGGER.debug("Upload file[" + file.getAbsolutePath() + "] to [" + remotePath + "] start.");
-      }
-      try {
-        CarbonUtil.copyCarbonDataFileToCarbonStorePath(file.getAbsolutePath(), remotePath, 1024);
-      } catch (CarbonDataWriterException exception) {
-        LOGGER.error(exception.getMessage(), exception);
-        throw exception;
-      }
-      if (LOGGER.isDebugEnabled()) {
-        LOGGER.debug("Upload file[" + file.getAbsolutePath() + "] to [" + remotePath + "] end.");
-      }
+    final List<org.apache.carbondata.sdk.file.CarbonWriter> writers =
 
 Review comment:
   ```suggestion
       final List<CarbonWriter> writers =
   ```

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[hidden email]


With regards,
Apache Git Services
Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] jackylk commented on a change in pull request #3532: [CARBONDATA-3557] Write flink streaming data to partition table

GitBox
In reply to this post by GitBox
jackylk commented on a change in pull request #3532: [CARBONDATA-3557] Write flink streaming data to partition table
URL: https://github.com/apache/carbondata/pull/3532#discussion_r361797390
 
 

 ##########
 File path: integration/flink/src/main/java/org/apache/carbon/flink/CarbonWriter.java
 ##########
 @@ -17,10 +17,229 @@
 
 package org.apache.carbon.flink;
 
+import java.io.File;
+import java.util.ArrayList;
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+
+import org.apache.carbondata.common.logging.LogServiceFactory;
+import org.apache.carbondata.core.constants.CarbonCommonConstants;
+import org.apache.carbondata.core.datastore.exception.CarbonDataWriterException;
+import org.apache.carbondata.core.metadata.schema.table.CarbonTable;
+import org.apache.carbondata.core.metadata.schema.table.column.ColumnSchema;
+import org.apache.carbondata.core.statusmanager.StageInput;
+import org.apache.carbondata.core.util.CarbonUtil;
+
+import org.apache.log4j.Logger;
+
 /**
  * This class is a wrapper of CarbonWriter in SDK.
  * It not only write data to carbon with CarbonWriter in SDK, also generate segment file.
  */
-public abstract class CarbonWriter extends ProxyFileWriter<String> {
+public abstract class CarbonWriter extends ProxyFileWriter<Object[]> {
+
+  private static final Logger LOGGER =
+      LogServiceFactory.getLogService(CarbonS3Writer.class.getName());
+
+  public CarbonWriter(final CarbonWriterFactory factory,
+      final String identifier, final CarbonTable table) {
+    ProxyFileWriterFactory.register(factory.getType(), factory.getClass());
+    if (LOGGER.isDebugEnabled()) {
+      LOGGER.debug("Open writer. " + this.toString());
+    }
+    this.factory = factory;
+    this.identifier = identifier;
+    this.table = table;
+  }
+
+  private final CarbonWriterFactory factory;
+
+  private final String identifier;
+
+  protected final CarbonTable table;
+
+  @Override
+  public CarbonWriterFactory getFactory() {
+    return this.factory;
+  }
+
+  @Override
+  public String getIdentifier() {
+    return this.identifier;
+  }
+
+  /**
+   * @return when there is no data file uploaded, then return <code>null</code>.
+   */
+  protected StageInput uploadSegmentDataFiles(final String localPath, final String remotePath) {
+    if (!this.table.isHivePartitionTable()) {
+      final File[] files = new File(localPath).listFiles();
+      if (files == null) {
+        return null;
+      }
+      Map<String, Long> fileNameMapLength = new HashMap<>(files.length);
+      for (File file : files) {
+        fileNameMapLength.put(file.getName(), file.length());
+        if (LOGGER.isDebugEnabled()) {
+          LOGGER.debug(
+              "Upload file[" + file.getAbsolutePath() + "] to [" + remotePath + "] start.");
+        }
+        try {
+          CarbonUtil.copyCarbonDataFileToCarbonStorePath(file.getAbsolutePath(), remotePath, 1024);
+        } catch (CarbonDataWriterException exception) {
+          LOGGER.error(exception.getMessage(), exception);
+          throw exception;
+        }
+        if (LOGGER.isDebugEnabled()) {
+          LOGGER.debug("Upload file[" + file.getAbsolutePath() + "] to [" + remotePath + "] end.");
+        }
+      }
+      return new StageInput(remotePath, fileNameMapLength);
+    } else {
+      final List<StageInput.PartitionLocation> partitionLocationList = new ArrayList<>();
+      final List<String> partitions = new ArrayList<>();
+      uploadSegmentDataFiles(new File(localPath), remotePath, partitionLocationList, partitions);
+      if (partitionLocationList.isEmpty()) {
+        return null;
+      } else {
+        return new StageInput(remotePath, partitionLocationList);
+      }
+    }
+  }
+
+  private static void uploadSegmentDataFiles(
+      final File directory, final String remotePath,
+      final List<StageInput.PartitionLocation> partitionLocationList,
+      final List<String> partitions
+  ) {
+    final File[] files = directory.listFiles();
+    if (files == null) {
+      return;
+    }
+    Map<String, Long> fileNameMapLength = new HashMap<>();
+    for (File file : files) {
+      if (file.isDirectory()) {
+        partitions.add(file.getName());
+        uploadSegmentDataFiles(file, remotePath, partitionLocationList, partitions);
+        partitions.remove(partitions.size() - 1);
+        continue;
+      }
+      fileNameMapLength.put(file.getName(), file.length());
+      if (LOGGER.isDebugEnabled()) {
+        LOGGER.debug("Upload file[" + file.getAbsolutePath() + "] to [" + remotePath + "] start.");
+      }
+      try {
+        CarbonUtil.copyCarbonDataFileToCarbonStorePath(file.getAbsolutePath(), remotePath, 1024);
+      } catch (CarbonDataWriterException exception) {
+        LOGGER.error(exception.getMessage(), exception);
+        throw exception;
+      }
+      if (LOGGER.isDebugEnabled()) {
+        LOGGER.debug("Upload file[" + file.getAbsolutePath() + "] to [" + remotePath + "] end.");
+      }
+    }
+    if (!fileNameMapLength.isEmpty()) {
+      final Map<String, String> partitionMap = new HashMap<>(partitions.size());
+      for (String partition : partitions) {
+        final String[] segments = partition.split("=");
+        partitionMap.put(segments[0].trim(), segments[1].trim());
+      }
+      partitionLocationList.add(
+          new StageInput.PartitionLocation(
+              partitionMap,
+              fileNameMapLength
+          )
+      );
+    }
+  }
+
+  protected abstract static class WriterFactory {
+
+    public WriterFactory(final CarbonTable table, final String writePath) {
+      final List<ColumnSchema> partitionColumns;
+      if (table.getPartitionInfo() == null) {
+        partitionColumns = Collections.emptyList();
+      } else {
+        partitionColumns = table.getPartitionInfo().getColumnSchemaList();
+      }
+      this.table = table;
+      this.partitionColumns = partitionColumns;
+      this.writePath = writePath;
+      this.root = new Node();
+      this.writers = new ArrayList<>();
+    }
+
+    private final CarbonTable table;
+
+    private final List<ColumnSchema> partitionColumns;
+
+    private final String writePath;
+
+    private final Node root;
+
+    private final List<org.apache.carbondata.sdk.file.CarbonWriter> writers;
+
+    public List<org.apache.carbondata.sdk.file.CarbonWriter> getWriters() {
+      return this.writers;
+    }
+
+    public org.apache.carbondata.sdk.file.CarbonWriter getWriter(final Object[] row) {
+      Node node = this.root;
+      for (int index = 0; index < this.partitionColumns.size(); index++) {
+        final Object columnValue = row[this.partitionColumns.get(index).getSchemaOrdinal()];
+        if (columnValue == null) {
+          // TODO
+          throw new UnsupportedOperationException();
+        }
+        Node child = node.children.get(columnValue);
+        if (child == null) {
+          child = new Node();
+          node.children.put(columnValue, child);
+        }
+        node = child;
+      }
+      if (node.writer == null) {
+        node.writer = this.newWriter(row);
+        this.writers.add(node.writer);
+      }
+      return node.writer;
+    }
+
+    protected String getWritePath(final Object[] row) {
+      if (this.partitionColumns.isEmpty()) {
+        return this.writePath;
+      }
+      final StringBuilder stringBuilder = new StringBuilder();
+      stringBuilder.append(this.writePath);
+      for (int index = 0; index < this.partitionColumns.size(); index++) {
+        final ColumnSchema columnSchema = this.partitionColumns.get(index);
+        final Object columnValue = row[columnSchema.getSchemaOrdinal()];
+        stringBuilder.append(columnSchema.getColumnName());
+        stringBuilder.append("=");
+        stringBuilder.append(columnValue.toString());
+        stringBuilder.append(CarbonCommonConstants.FILE_SEPARATOR);
+      }
+      return stringBuilder.toString();
+    }
+
+    protected abstract org.apache.carbondata.sdk.file.CarbonWriter newWriter(final Object[] row);
+
+    public void reset() {
+      this.writers.clear();
+      this.root.children.clear();
+      this.root.writer = null;
+    }
+
+    private static final class Node {
 
 Review comment:
   Please add comment, why is this required?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[hidden email]


With regards,
Apache Git Services
Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] jackylk commented on a change in pull request #3532: [CARBONDATA-3557] Write flink streaming data to partition table

GitBox
In reply to this post by GitBox
jackylk commented on a change in pull request #3532: [CARBONDATA-3557] Write flink streaming data to partition table
URL: https://github.com/apache/carbondata/pull/3532#discussion_r361797527
 
 

 ##########
 File path: integration/flink/src/test/scala/org/apache/carbon/flink/TestCarbonPartitionWriter.scala
 ##########
 @@ -0,0 +1,192 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbon.flink
+
+import java.io.{File, InputStreamReader}
+import java.util
+import java.util.{Collections, Properties}
+
+import com.google.gson.Gson
+import org.apache.carbondata.core.constants.CarbonCommonConstants
+import org.apache.carbondata.core.datastore.impl.FileFactory
+import org.apache.carbondata.core.statusmanager.StageInput
+import org.apache.carbondata.core.util.path.CarbonTablePath
+import org.apache.flink.api.common.restartstrategy.RestartStrategies
+import org.apache.flink.api.java.functions.KeySelector
+import org.apache.flink.core.fs.Path
+import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment
+import org.apache.flink.streaming.api.functions.sink.filesystem.StreamingFileSink
+import org.apache.spark.sql.test.util.QueryTest
+import org.junit.Test
+
+import scala.collection.JavaConverters._
+
+class TestCarbonPartitionWriter extends QueryTest {
+
+  val tableName = "test_flink_partition"
+
+  @Test
+  def testLocal(): Unit = {
+    sql(s"drop table if exists $tableName").collect()
+    sql(
+      s"""
+         | CREATE TABLE $tableName (stringField string, intField int, shortField short)
+         | STORED AS carbondata
+         | PARTITIONED BY (hour_ string, date_ string)
 
 Review comment:
   can you add 2 more testcase:
   1. use Date type as partition column
   2. use Int type as partition column

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[hidden email]


With regards,
Apache Git Services
Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] niuge01 closed pull request #3532: [CARBONDATA-3557] Write flink streaming data to partition table

GitBox
In reply to this post by GitBox
niuge01 closed pull request #3532: [CARBONDATA-3557] Write flink streaming data to partition table
URL: https://github.com/apache/carbondata/pull/3532
 
 
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[hidden email]


With regards,
Apache Git Services
12