[GitHub] [carbondata] kunal642 opened a new pull request #3583: [WIP] Support CarbonOutputFormat in Hive


[GitHub] [carbondata] kunal642 commented on a change in pull request #3583: [CARBONDATA-3687] Support writing non-transactional carbondata files through hive

GitBox
kunal642 commented on a change in pull request #3583: [CARBONDATA-3687] Support writing non-transactional carbondata files through hive
URL: https://github.com/apache/carbondata/pull/3583#discussion_r388206618
 
 

 ##########
 File path: integration/spark/src/test/scala/org/apache/carbondata/spark/testsuite/addsegment/AddSegmentTestCase.scala
 ##########
 @@ -758,7 +758,9 @@ class AddSegmentTestCase extends QueryTest with BeforeAndAfterAll {
     val writer = CarbonWriter.builder
       .outputPath(externalSegmentPath)
       .writtenBy("AddSegmentTestCase")
-      .withCsvInput(new Schema(fields))
+      .withSchemaFile(CarbonTablePath.getSchemaFilePath(CarbonEnv.getCarbonTable(None,
 
 Review comment:
   reverted

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[hidden email]


With regards,
Apache Git Services

[GitHub] [carbondata] kunal642 commented on a change in pull request #3583: [CARBONDATA-3687] Support writing non-transactional carbondata files through hive

GitBox
In reply to this post by GitBox
kunal642 commented on a change in pull request #3583: [CARBONDATA-3687] Support writing non-transactional carbondata files through hive
URL: https://github.com/apache/carbondata/pull/3583#discussion_r388206653
 
 

 ##########
 File path: integration/hive/src/test/java/org/apache/carbondata/hive/HiveCarbonTest.java
 ##########
 @@ -0,0 +1,103 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.carbondata.hive;
+
+import java.sql.ResultSet;
+import java.sql.Statement;
+import java.util.ArrayList;
+import java.util.List;
+import java.util.Map;
+
+import org.apache.carbondata.core.constants.CarbonCommonConstants;
+import org.apache.carbondata.core.datastore.filesystem.CarbonFile;
+import org.apache.carbondata.core.datastore.impl.FileFactory;
+import org.apache.carbondata.core.metadata.schema.SchemaReader;
+import org.apache.carbondata.core.util.CarbonProperties;
+import org.apache.carbondata.core.util.path.CarbonTablePath;
+
+import org.junit.After;
+import org.junit.AfterClass;
+import org.junit.BeforeClass;
+import org.junit.Ignore;
+import org.junit.Test;
+
+public class HiveCarbonTest extends HiveTestUtils {
+
+  private static Statement statement;
+
+  @BeforeClass
+  public static void setup() throws Exception {
+    CarbonProperties.getInstance().addProperty(CarbonCommonConstants.ENABLE_OFFHEAP_SORT_DEFAULT, "false");
+    CarbonProperties.getInstance().addProperty(CarbonCommonConstants.ENABLE_UNSAFE_SORT, "false");
+    CarbonProperties.getInstance().addProperty(CarbonCommonConstants.CARBON_WRITTEN_BY_APPNAME, "hive");
 
 Review comment:
   done


[GitHub] [carbondata] CarbonDataQA1 commented on issue #3583: [CARBONDATA-3687] Support writing non-transactional carbondata files through hive

GitBox
In reply to this post by GitBox
CarbonDataQA1 commented on issue #3583: [CARBONDATA-3687] Support writing non-transactional carbondata files through hive
URL: https://github.com/apache/carbondata/pull/3583#issuecomment-595161020
 
 
   Build Success with Spark 2.4.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.4/633/
   


[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #3583: [CARBONDATA-3687] Support writing non-transactional carbondata files through hive

GitBox
In reply to this post by GitBox
ajantha-bhat commented on a change in pull request #3583: [CARBONDATA-3687] Support writing non-transactional carbondata files through hive
URL: https://github.com/apache/carbondata/pull/3583#discussion_r388213194
 
 

 ##########
 File path: integration/hive/src/main/java/org/apache/carbondata/hive/MapredCarbonOutputFormat.java
 ##########
 @@ -18,43 +18,115 @@
 package org.apache.carbondata.hive;
 
 import java.io.IOException;
+import java.util.Arrays;
+import java.util.Map;
 import java.util.Properties;
 
+import org.apache.carbondata.core.datastore.impl.FileFactory;
+import org.apache.carbondata.core.metadata.schema.PartitionInfo;
+import org.apache.carbondata.core.util.ObjectSerializationUtil;
+import org.apache.carbondata.core.util.ThreadLocalSessionInfo;
 import org.apache.carbondata.hadoop.api.CarbonTableOutputFormat;
+import org.apache.carbondata.hadoop.internal.ObjectArrayWritable;
+import org.apache.carbondata.hive.util.HiveCarbonUtil;
+import org.apache.carbondata.processing.loading.model.CarbonLoadModel;
 
 import org.apache.hadoop.fs.FileSystem;
 import org.apache.hadoop.fs.Path;
 import org.apache.hadoop.hive.ql.exec.FileSinkOperator;
 import org.apache.hadoop.hive.ql.io.HiveOutputFormat;
+import org.apache.hadoop.io.NullWritable;
 import org.apache.hadoop.io.Writable;
 import org.apache.hadoop.mapred.JobConf;
+import org.apache.hadoop.mapred.OutputFormat;
 import org.apache.hadoop.mapred.RecordWriter;
-import org.apache.hadoop.mapreduce.Job;
+import org.apache.hadoop.mapreduce.TaskAttemptID;
+import org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl;
 import org.apache.hadoop.util.Progressable;
 
-/**
- * TODO : To extend CarbonOutputFormat
- */
 public class MapredCarbonOutputFormat<T> extends CarbonTableOutputFormat
-    implements HiveOutputFormat<Void, T> {
+    implements HiveOutputFormat<Void, T>, OutputFormat<Void, T> {
 
   @Override
   public RecordWriter<Void, T> getRecordWriter(FileSystem fileSystem, JobConf jobConf, String s,
-      Progressable progressable) {
-    return null;
+      Progressable progressable) throws IOException {
+    throw new RuntimeException("Should never be used");
   }
 
   @Override
-  public void checkOutputSpecs(FileSystem fileSystem, JobConf jobConf)
-      throws IOException {
-    org.apache.hadoop.mapreduce.JobContext jobContext = Job.getInstance(jobConf);
-    super.checkOutputSpecs(jobContext);
+  public void checkOutputSpecs(FileSystem fileSystem, JobConf jobConf) throws IOException {
   }
 
   @Override
   public FileSinkOperator.RecordWriter getHiveRecordWriter(JobConf jc, Path finalOutPath,
       Class<? extends Writable> valueClass, boolean isCompressed, Properties tableProperties,
-      Progressable progress) {
-    return null;
+      Progressable progress) throws IOException {
+    CarbonLoadModel carbonLoadModel = null;
+    String encodedString = jc.get(LOAD_MODEL);
+    if (encodedString != null) {
+      carbonLoadModel =
+          (CarbonLoadModel) ObjectSerializationUtil.convertStringToObject(encodedString);
+    }
+    if (carbonLoadModel == null) {
+      carbonLoadModel = HiveCarbonUtil.getCarbonLoadModel(tableProperties, jc);
+    } else {
+      for (Map.Entry<Object, Object> entry : tableProperties.entrySet()) {
+        carbonLoadModel.getCarbonDataLoadSchema().getCarbonTable().getTableInfo().getFactTable()
+            .getTableProperties().put(entry.getKey().toString().toLowerCase(),
+            entry.getValue().toString().toLowerCase());
+      }
+    }
+    String tablePath = FileFactory.getCarbonFile(carbonLoadModel.getTablePath()).getAbsolutePath();
+    TaskAttemptID taskAttemptID = TaskAttemptID.forName(jc.get("mapred.task.id"));
+    TaskAttemptContextImpl context = new TaskAttemptContextImpl(jc, taskAttemptID);
+    final boolean isHivePartitionedTable =
+        carbonLoadModel.getCarbonDataLoadSchema().getCarbonTable().isHivePartitionTable();
+    PartitionInfo partitionInfo =
+        carbonLoadModel.getCarbonDataLoadSchema().getCarbonTable().getPartitionInfo();
+    final int partitionColumn =
+        partitionInfo != null ? partitionInfo.getColumnSchemaList().size() : 0;
+    String finalOutputPath = FileFactory.getCarbonFile(finalOutPath.toString()).getAbsolutePath();
+    if (carbonLoadModel.getCarbonDataLoadSchema().getCarbonTable().isHivePartitionTable()) {
+      carbonLoadModel.getOutputFilesInfoHolder().addToPartitionPath(finalOutputPath);
+      context.getConfiguration().set("carbon.outputformat.writepath", finalOutputPath);
+    }
+    CarbonTableOutputFormat.setLoadModel(context.getConfiguration(), carbonLoadModel);
+    org.apache.hadoop.mapreduce.RecordWriter<NullWritable, ObjectArrayWritable> re =
+        super.getRecordWriter(context);
+    return new FileSinkOperator.RecordWriter() {
+      @Override
+      public void write(Writable writable) throws IOException {
+        try {
+          ObjectArrayWritable objectArrayWritable = new ObjectArrayWritable();
+          if (isHivePartitionedTable) {
+            Object[] actualRow = ((CarbonHiveRow) writable).getData();
+            Object[] newData = Arrays.copyOf(actualRow, actualRow.length + partitionColumn);
+            String[] partitionValues = finalOutputPath.substring(tablePath.length()).split("/");
 
 Review comment:
   yeah ok, I remember now
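
The last quoted line of the hunk appears to derive the partition values from the part of the output path that lies below the table path. A minimal sketch of that idea follows, assuming Hive-style `column=value` directory names. The class and method names (`PartitionPathSketch`, `partitionValues`) are hypothetical and are not taken from the PR.

```java
import java.util.ArrayList;
import java.util.List;

public class PartitionPathSketch {

  /**
   * Returns the partition values encoded in outputPath below tablePath.
   * Assumes Hive-style "column=value" directory names (an assumption of this sketch).
   */
  static List<String> partitionValues(String tablePath, String outputPath) {
    if (!outputPath.startsWith(tablePath)) {
      throw new IllegalArgumentException("output path is not under the table path");
    }
    List<String> values = new ArrayList<>();
    // The remainder looks like "/country=in/year=2020"; the leading "/" yields
    // an empty first segment, which is skipped because it contains no '='.
    for (String segment : outputPath.substring(tablePath.length()).split("/")) {
      int eq = segment.indexOf('=');
      if (eq >= 0) {
        values.add(segment.substring(eq + 1));
      }
    }
    return values;
  }

  public static void main(String[] args) {
    // Prints the values found under the table path.
    System.out.println(
        partitionValues("/warehouse/t1", "/warehouse/t1/country=in/year=2020"));
  }
}
```

Skipping segments without `=` keeps the sketch tolerant of the empty leading segment and of an output path equal to the table path.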


[GitHub] [carbondata] CarbonDataQA1 commented on issue #3583: [CARBONDATA-3687] Support writing non-transactional carbondata files through hive

GitBox
In reply to this post by GitBox
CarbonDataQA1 commented on issue #3583: [CARBONDATA-3687] Support writing non-transactional carbondata files through hive
URL: https://github.com/apache/carbondata/pull/3583#issuecomment-595161465
 
 
   Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/2340/
   


[GitHub] [carbondata] ajantha-bhat commented on issue #3583: [CARBONDATA-3687] Support writing non-transactional carbondata files through hive

GitBox
In reply to this post by GitBox
ajantha-bhat commented on issue #3583: [CARBONDATA-3687] Support writing non-transactional carbondata files through hive
URL: https://github.com/apache/carbondata/pull/3583#issuecomment-595163508
 
 
   LGTM


[GitHub] [carbondata] ajantha-bhat edited a comment on issue #3583: [CARBONDATA-3687] Support writing non-transactional carbondata files through hive

GitBox
In reply to this post by GitBox
ajantha-bhat edited a comment on issue #3583: [CARBONDATA-3687] Support writing non-transactional carbondata files through hive
URL: https://github.com/apache/carbondata/pull/3583#issuecomment-595163508
 
 
   LGTM
   
   But all users of the SDK need to change their code (they need to re-import `Field`).
   As it is a major version, I guess it is OK.



[GitHub] [carbondata] jackylk commented on issue #3583: [CARBONDATA-3687] Support writing non-transactional carbondata files through hive

GitBox
In reply to this post by GitBox
jackylk commented on issue #3583: [CARBONDATA-3687] Support writing non-transactional carbondata files through hive
URL: https://github.com/apache/carbondata/pull/3583#issuecomment-595246355
 
 
   Better not to move the `Field` class from sdk to core, since this is an SDK API that many users already use. Otherwise their code will not compile.


[GitHub] [carbondata] jackylk edited a comment on issue #3583: [CARBONDATA-3687] Support writing non-transactional carbondata files through hive

GitBox
In reply to this post by GitBox
jackylk edited a comment on issue #3583: [CARBONDATA-3687] Support writing non-transactional carbondata files through hive
URL: https://github.com/apache/carbondata/pull/3583#issuecomment-595246355
 
 
   If not absolutely necessary, better not to move the `Field` class from sdk to core, since this is an SDK API that many users already use. Otherwise their code will not compile.


[GitHub] [carbondata] kunal642 commented on issue #3583: [CARBONDATA-3687] Support writing non-transactional carbondata files through hive

GitBox
In reply to this post by GitBox
kunal642 commented on issue #3583: [CARBONDATA-3687] Support writing non-transactional carbondata files through hive
URL: https://github.com/apache/carbondata/pull/3583#issuecomment-595627546
 
 
   @ajantha-bhat @jackylk it is for a major version release, so I think it's OK to change some APIs for this version.
   It will be better than duplicating code.



[GitHub] [carbondata] jackylk commented on issue #3583: [CARBONDATA-3687] Support writing non-transactional carbondata files through hive

GitBox
In reply to this post by GitBox
jackylk commented on issue #3583: [CARBONDATA-3687] Support writing non-transactional carbondata files through hive
URL: https://github.com/apache/carbondata/pull/3583#issuecomment-595646137
 
 
   ok, please put a note in this class to notify the user


[GitHub] [carbondata] ajantha-bhat commented on issue #3583: [CARBONDATA-3687] Support writing non-transactional carbondata files through hive

GitBox
In reply to this post by GitBox
ajantha-bhat commented on issue #3583: [CARBONDATA-3687] Support writing non-transactional carbondata files through hive
URL: https://github.com/apache/carbondata/pull/3583#issuecomment-595719080
 
 
   Can we still keep the same package name inside core, so that users need not change their code?
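
For context on the compatibility concern, a different and commonly used technique (not the one proposed in this thread) is to leave a deprecated subclass at the old location so that existing call sites keep compiling. The sketch below simulates the two locations with nested classes; `CoreField`, `SdkField`, and `describe` are hypothetical names and do not reflect the real CarbonData classes or packages.

```java
public class FieldShimSketch {

  /** Stands in for the class after its move to the core module (hypothetical). */
  static class CoreField {
    private final String name;
    private final String type;

    CoreField(String name, String type) {
      this.name = name;
      this.type = type;
    }

    String getName() {
      return name;
    }

    String getType() {
      return type;
    }
  }

  /**
   * Stands in for a thin class kept at the old SDK location (hypothetical).
   * It adds no behavior and exists only so that old imports still resolve.
   */
  @Deprecated
  static class SdkField extends CoreField {
    SdkField(String name, String type) {
      super(name, type);
    }
  }

  /** An API that accepts the core type also accepts the deprecated subclass. */
  static String describe(CoreField field) {
    return field.getName() + ":" + field.getType();
  }

  public static void main(String[] args) {
    CoreField field = new SdkField("id", "int");
    System.out.println(describe(field)); // prints "id:int"
  }
}
```

A shim like this only covers code that constructs or passes the old type. APIs that return the core type would still not be assignable to the old type without a cast, which is one reason a project might prefer a documented breaking change instead.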



[GitHub] [carbondata] kunal642 commented on issue #3583: [CARBONDATA-3687] Support writing non-transactional carbondata files through hive

GitBox
In reply to this post by GitBox
kunal642 commented on issue #3583: [CARBONDATA-3687] Support writing non-transactional carbondata files through hive
URL: https://github.com/apache/carbondata/pull/3583#issuecomment-595769202
 
 
   @ajantha-bhat it doesn't make sense to create a package named 'sdk' in the core module. I think it is okay like this; we can list this as a breaking change.


[GitHub] [carbondata] jackylk commented on issue #3583: [CARBONDATA-3687] Support writing non-transactional carbondata files through hive

GitBox
In reply to this post by GitBox
jackylk commented on issue #3583: [CARBONDATA-3687] Support writing non-transactional carbondata files through hive
URL: https://github.com/apache/carbondata/pull/3583#issuecomment-596199554
 
 
   OK, please remember to mention this in the 2.0 release notes.


[GitHub] [carbondata] jackylk commented on issue #3583: [CARBONDATA-3687] Support writing non-transactional carbondata files through hive

GitBox
In reply to this post by GitBox
jackylk commented on issue #3583: [CARBONDATA-3687] Support writing non-transactional carbondata files through hive
URL: https://github.com/apache/carbondata/pull/3583#issuecomment-596199562
 
 
   LGTM


[GitHub] [carbondata] asfgit closed pull request #3583: [CARBONDATA-3687] Support writing non-transactional carbondata files through hive

GitBox
In reply to this post by GitBox
asfgit closed pull request #3583: [CARBONDATA-3687] Support writing non-transactional carbondata files through hive
URL: https://github.com/apache/carbondata/pull/3583
 
 
   
