[GitHub] [carbondata] akashrn5 opened a new pull request #3875: [WIP]Presto write transactional

GitBox

[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #3875: [CARBONDATA-3934]Support write transactional table with presto.

ajantha-bhat commented on a change in pull request #3875:
URL: https://github.com/apache/carbondata/pull/3875#discussion_r508431395

##########
File path: integration/hive/src/main/java/org/apache/carbondata/hive/MapredCarbonOutputFormat.java
##########
@@ -92,6 +95,14 @@ public void checkOutputSpecs(FileSystem fileSystem, JobConf jobConf) throws IOEx
}
String tablePath = FileFactory.getCarbonFile(carbonLoadModel.getTablePath()).getAbsolutePath();
TaskAttemptID taskAttemptID = TaskAttemptID.forName(jc.get("mapred.task.id"));
+ // taskAttemptID will be null when the insert job is fired from presto. Presto send the JobConf
+ // and since presto does not use the MR framework for execution, the mapred.task.id will be
+ // null, so prepare a new ID.
+ if (taskAttemptID == null) {
+ SimpleDateFormat formatter = new SimpleDateFormat("yyyyMMddHHmm");
+ String jobTrackerId = formatter.format(new Date());
+ taskAttemptID = new TaskAttemptID(jobTrackerId, 0, TaskType.MAP, 0, 0);

Review comment:
Also, while testing, please check that the segment ID and other info are correct in the file names created by Presto.
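
For readers following the diff above, the fallback can be sketched without Hadoop on the classpath. This is only an illustration: `FallbackTaskAttemptId` and `resolve` are hypothetical names, and the output imitates the usual textual form of a Hadoop attempt ID (`attempt_<jobTrackerId>_<jobId>_m_<taskId>_<attempt>`) rather than constructing a real `TaskAttemptID` object.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;

// Hypothetical helper, not part of the PR: mirrors the null check in the diff.
final class FallbackTaskAttemptId {

  // Returns the configured ID when one is present; otherwise builds an ID
  // string from a minute-granularity timestamp, as the diff does.
  static String resolve(String configuredId, Date now) {
    if (configuredId != null && !configuredId.isEmpty()) {
      return configuredId;
    }
    // Same pattern as the diff; Locale.ROOT keeps the digits ASCII.
    String jobTrackerId = new SimpleDateFormat("yyyyMMddHHmm", Locale.ROOT).format(now);
    // Job 0, map task 0, attempt 0: the constants passed in the diff.
    return String.format(Locale.ROOT, "attempt_%s_%04d_m_%06d_%d", jobTrackerId, 0, 0, 0);
  }

  public static void main(String[] args) {
    // No mapred.task.id available (the Presto case): a new ID is generated.
    System.out.println(resolve(null, new Date()));
    // An ID supplied by the MR framework is returned unchanged.
    System.out.println(resolve("attempt_200707121733_0003_m_000005_0", new Date()));
  }
}
```

The timestamp is passed in as a parameter so the generated value can be checked deterministically in a test.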

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]

GitBox

[GitHub] [carbondata] QiangCai commented on pull request #3875: [CARBONDATA-3934]Support write transactional table with presto.

In reply to this post by GitBox

QiangCai commented on pull request #3875:
URL: https://github.com/apache/carbondata/pull/3875#issuecomment-712798753

retest this please

GitBox

[GitHub] [carbondata] akashrn5 commented on a change in pull request #3875: [CARBONDATA-3934]Support write transactional table with presto.

In reply to this post by GitBox

akashrn5 commented on a change in pull request #3875:
URL: https://github.com/apache/carbondata/pull/3875#discussion_r508461938

##########
File path: integration/presto/src/main/prestosql/org/apache/carbondata/presto/CarbonDataFileWriter.java
##########
@@ -0,0 +1,188 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.presto;
+
+import java.io.IOException;
+import java.io.UncheckedIOException;
+import java.util.Arrays;
+import java.util.List;
+import java.util.Properties;
+
+import org.apache.carbondata.common.logging.LogServiceFactory;
+import org.apache.carbondata.core.constants.CarbonCommonConstants;
+import org.apache.carbondata.hadoop.api.CarbonTableOutputFormat;
+import org.apache.carbondata.hive.CarbonHiveSerDe;
+import org.apache.carbondata.hive.MapredCarbonOutputFormat;
+import org.apache.carbondata.presto.impl.CarbonTableConfig;
+
+import com.google.common.collect.ImmutableList;
+import io.prestosql.plugin.hive.HiveFileWriter;
+import io.prestosql.plugin.hive.HiveType;
+import io.prestosql.plugin.hive.HiveWriteUtils;
+import io.prestosql.spi.Page;
+import io.prestosql.spi.PrestoException;
+import io.prestosql.spi.block.Block;
+import io.prestosql.spi.type.Type;
+import io.prestosql.spi.type.TypeManager;
+import org.apache.commons.lang3.StringUtils;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.hive.conf.HiveConf;
+import org.apache.hadoop.hive.ql.exec.FileSinkOperator;
+import org.apache.hadoop.hive.ql.io.HiveOutputFormat;
+import org.apache.hadoop.hive.ql.io.IOConstants;
+import org.apache.hadoop.hive.ql.io.parquet.serde.ArrayWritableObjectInspector;
+import org.apache.hadoop.hive.serde2.SerDeException;
+import org.apache.hadoop.hive.serde2.objectinspector.SettableStructObjectInspector;
+import org.apache.hadoop.hive.serde2.objectinspector.StructField;
+import org.apache.hadoop.io.Text;
+import org.apache.hadoop.mapred.JobConf;
+import org.apache.hadoop.mapred.Reporter;
+import org.apache.log4j.Logger;
+
+import static com.google.common.collect.ImmutableList.toImmutableList;
+import static io.prestosql.plugin.hive.HiveErrorCode.HIVE_WRITER_DATA_ERROR;
+import static java.util.Objects.requireNonNull;
+import static java.util.stream.Collectors.toList;
+import static org.apache.hadoop.hive.conf.HiveConf.ConfVars.COMPRESSRESULT;
+
+/**
+ * This class implements HiveFileWriter and it creates the carbonFileWriter to write the page data
+ * sent from presto.
+ */
+public class CarbonDataFileWriter implements HiveFileWriter {
+
+ private static final Logger LOG =
+ LogServiceFactory.getLogService(CarbonDataFileWriter.class.getName());
+
+ private final JobConf configuration;
+ private final Path outPutPath;
+ private final FileSinkOperator.RecordWriter recordWriter;
+ private final CarbonHiveSerDe serDe;
+ private final int fieldCount;
+ private final Object row;
+ private final SettableStructObjectInspector tableInspector;
+ private final List<StructField> structFields;
+ private final HiveWriteUtils.FieldSetter[] setters;
+
+ private boolean isCommitDone;
+
+ public CarbonDataFileWriter(Path outPutPath, List<String> inputColumnNames, Properties properties,
+ JobConf configuration, TypeManager typeManager) throws SerDeException {
+ requireNonNull(outPutPath, "path is null");
+ // take the outputPath same as location in compliance with the carbon store folder structure.
+ this.outPutPath = new Path(properties.getProperty("location"));
+ this.configuration = requireNonNull(configuration, "conf is null");
+ List<String> columnNames = Arrays
+ .asList(properties.getProperty(IOConstants.COLUMNS, "").split(CarbonCommonConstants.COMMA));
+ List<Type> fileColumnTypes =
+ HiveType.toHiveTypes(properties.getProperty(IOConstants.COLUMNS_TYPES, "")).stream()
+ .map(hiveType -> hiveType.getType(typeManager)).collect(toList());
+ this.fieldCount = columnNames.size();
+ this.serDe = new CarbonHiveSerDe();
+ serDe.initialize(configuration, properties);
+ this.tableInspector = (ArrayWritableObjectInspector) serDe.getObjectInspector();
+
+ this.structFields =
+ ImmutableList.copyOf(inputColumnNames.stream().map(tableInspector::getStructFieldRef)
+ .collect(toImmutableList()));
+
+ this.row = tableInspector.create();
+
+ this.setters = new HiveWriteUtils.FieldSetter[structFields.size()];
+ for (int i = 0; i < setters.length; i++) {
+ setters[i] = HiveWriteUtils.createFieldSetter(tableInspector, row, structFields.get(i),
+ fileColumnTypes.get(structFields.get(i).getFieldID()));
+ }
+ String encodedLoadModel = this.configuration.get(CarbonTableConfig.CARBON_PRESTO_LOAD_MODEL);
+ if (StringUtils.isNotEmpty(encodedLoadModel)) {
+ this.configuration.set(CarbonTableOutputFormat.LOAD_MODEL, encodedLoadModel);
+ }
+ try {
+ boolean compress = HiveConf.getBoolVar(this.configuration, COMPRESSRESULT);
+ Object writer =
+ Class.forName(MapredCarbonOutputFormat.class.getName()).getConstructor().newInstance();
+ this.recordWriter = ((MapredCarbonOutputFormat<?>) writer)
+ .getHiveRecordWriter(this.configuration, this.outPutPath, Text.class, compress,
+ properties, Reporter.NULL);
+ } catch (Exception e) {
+ LOG.error("error while initializing writer", e);
+ throw new RuntimeException("writer class not found");
+ }
+ }
+
+ @Override
+ public long getWrittenBytes() {
+ if (isCommitDone) {
+ try {
+ return outPutPath.getFileSystem(configuration).getFileStatus(outPutPath).getLen();
+ } catch (IOException e) {
+ throw new UncheckedIOException(e);
+ }
+ }
+ return 0;
+ }
+
+ @Override
+ public long getSystemMemoryUsage() {
+ return 0;

Review comment:
Added a TODO and created a Jira issue for tracking: https://issues.apache.org/jira/browse/CARBONDATA-4038

GitBox

[GitHub] [carbondata] akashrn5 commented on a change in pull request #3875: [CARBONDATA-3934]Support write transactional table with presto.

In reply to this post by GitBox

akashrn5 commented on a change in pull request #3875:
URL: https://github.com/apache/carbondata/pull/3875#discussion_r508502780

##########
File path: integration/presto/src/main/prestosql/org/apache/carbondata/presto/CarbondataModule.java
##########
@@ -127,7 +127,8 @@ public void configure(Binder binder) {
.in(Scopes.SINGLETON);
binder.bind(HivePartitionManager.class).in(Scopes.SINGLETON);
binder.bind(LocationService.class).to(HiveLocationService.class).in(Scopes.SINGLETON);
- binder.bind(HiveMetadataFactory.class).in(Scopes.SINGLETON);
+ binder.bind(HiveLocationService.class).to(CarbonDataLocationService.class).in(Scopes.SINGLETON);

Review comment:
Added to the Jira.

GitBox

[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3875: [CARBONDATA-3934]Support write transactional table with presto.

In reply to this post by GitBox

CarbonDataQA1 commented on pull request #3875:
URL: https://github.com/apache/carbondata/pull/3875#issuecomment-712853401

Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2783/

GitBox

[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3875: [CARBONDATA-3934]Support write transactional table with presto.

In reply to this post by GitBox

CarbonDataQA1 commented on pull request #3875:
URL: https://github.com/apache/carbondata/pull/3875#issuecomment-712853970

Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4537/

GitBox

[GitHub] [carbondata] ydvpankaj99 commented on pull request #3875: [CARBONDATA-3934]Support write transactional table with presto.

In reply to this post by GitBox

ydvpankaj99 commented on pull request #3875:
URL: https://github.com/apache/carbondata/pull/3875#issuecomment-713228858

retest this please

GitBox

[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3875: [CARBONDATA-3934]Support write transactional table with presto.

In reply to this post by GitBox

CarbonDataQA1 commented on pull request #3875:
URL: https://github.com/apache/carbondata/pull/3875#issuecomment-713302289

Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2806/

GitBox

[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3875: [CARBONDATA-3934]Support write transactional table with presto.

In reply to this post by GitBox

CarbonDataQA1 commented on pull request #3875:
URL: https://github.com/apache/carbondata/pull/3875#issuecomment-713305605

Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4562/

GitBox

[GitHub] [carbondata] akashrn5 commented on a change in pull request #3875: [CARBONDATA-3934]Support write transactional table with presto.

In reply to this post by GitBox

akashrn5 commented on a change in pull request #3875:
URL: https://github.com/apache/carbondata/pull/3875#discussion_r509014972

##########
File path: integration/presto/src/test/scala/org/apache/carbondata/presto/integrationtest/PrestoInsertIntoTableTestCase.scala
##########
@@ -0,0 +1,207 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.presto.integrationtest
+
+import java.io.File
+import java.util
+import java.util.UUID
+import java.util.concurrent.{Callable, Executor, Executors, Future}
+
+import scala.collection.JavaConverters._
+
+import org.scalatest.{BeforeAndAfterAll, BeforeAndAfterEach, FunSuiteLike}
+
+import org.apache.carbondata.common.logging.LogServiceFactory
+import org.apache.carbondata.core.constants.CarbonCommonConstants
+import org.apache.carbondata.core.datastore.filesystem.{CarbonFile, CarbonFileFilter}
+import org.apache.carbondata.core.datastore.impl.FileFactory
+import org.apache.carbondata.core.metadata.schema.SchemaReader
+import org.apache.carbondata.core.metadata.{AbsoluteTableIdentifier, CarbonTableIdentifier}
+import org.apache.carbondata.core.statusmanager.SegmentStatusManager
+import org.apache.carbondata.core.util.path.CarbonTablePath
+import org.apache.carbondata.core.util.{CarbonProperties, CarbonUtil}
+import org.apache.carbondata.presto.server.PrestoServer
+import org.apache.carbondata.presto.util.CarbonDataStoreCreator
+
+class PrestoInsertIntoTableTestCase extends FunSuiteLike with BeforeAndAfterAll with BeforeAndAfterEach {
+
+ private val logger = LogServiceFactory
+ .getLogService(classOf[PrestoAllDataTypeTest].getCanonicalName)
+
+ private val rootPath = new File(this.getClass.getResource("/").getPath
+ + "../../../..").getCanonicalPath
+ private val storePath = s"$rootPath/integration/presto/target/store"
+ private val prestoServer = new PrestoServer
+ private val executorService = Executors.newFixedThreadPool(1)
+
+ override def beforeAll: Unit = {
+ CarbonProperties.getInstance().addProperty(CarbonCommonConstants.CARBON_WRITTEN_BY_APPNAME,
+ "Presto")
+ val map = new util.HashMap[String, String]()
+ map.put("hive.metastore", "file")
+ map.put("hive.metastore.catalog.dir", s"file://$storePath")
+ map.put("hive.allow-drop-table", "true")
+ prestoServer.startServer("testdb", map)
+ prestoServer.execute("drop schema if exists testdb")
+ prestoServer.execute("create schema testdb")
+ }
+
+ override protected def beforeEach(): Unit = {
+ val query = "create table testdb.testtable(ID int, date date, country varchar, name varchar, phonetype varchar, serialname varchar,salary decimal(6,1), bonus decimal(8,6), monthlyBonus decimal(5,3), dob timestamp, shortField smallint, iscurrentemployee boolean) with(format='CARBONDATA') "
+ createTable(query, "testdb", "testtable")
+ }
+
+ private def createTable(query: String, databaseName: String, tableName: String): Unit = {
+ prestoServer.execute(s"drop table if exists ${databaseName}.${tableName}")
+ prestoServer.execute(query)
+ logger.info("Creating The Carbon Store")
+ val absoluteTableIdentifier: AbsoluteTableIdentifier = getAbsoluteIdentifier(databaseName, tableName)
+ CarbonDataStoreCreator.createTable(absoluteTableIdentifier, true)
+ logger.info(s"\nCarbon store is created at location: $storePath")
+ }
+
+ private def getAbsoluteIdentifier(dbName: String,
+ tableName: String) = {
+ val absoluteTableIdentifier = AbsoluteTableIdentifier.from(
+ storePath + "/" + dbName + "/" + tableName,
+ new CarbonTableIdentifier(dbName,
+ tableName,
+ UUID.randomUUID().toString))
+ absoluteTableIdentifier
+ }
+
+ test("test insert with different storage format names") {
+ val query1 = "create table testdb.testtable(ID int, date date, country varchar, name varchar, phonetype varchar, serialname varchar,salary decimal(6,1), bonus decimal(8,6), monthlyBonus decimal(5,3), dob timestamp, shortField smallint, iscurrentemployee boolean) with(format='CARBONDATA') "
+ val query2 = "create table testdb.testtable(ID int, date date, country varchar, name varchar, phonetype varchar, serialname varchar,salary decimal(6,1), bonus decimal(8,6), monthlyBonus decimal(5,3), dob timestamp, shortField smallint, iscurrentemployee boolean) with(format='CARBON') "
+ val query3 = "create table testdb.testtable(ID int, date date, country varchar, name varchar, phonetype varchar, serialname varchar,salary decimal(6,1), bonus decimal(8,6), monthlyBonus decimal(5,3), dob timestamp, shortField smallint, iscurrentemployee boolean) with(format='ORG.APACHE.CARBONDATA.FORMAT') "
+ createTable(query1, "testdb", "testtable")
+ createTable(query2, "testdb", "testtable")
+ createTable(query3, "testdb", "testtable")
+ prestoServer.execute("insert into testdb.testtable values(10, current_date, 'INDIA', 'Chandler', 'qwerty', 'usn20392',10000.0,16.234567,25.678,timestamp '1994-06-14 05:00:09',smallint '23', true)")
+ val absoluteTableIdentifier: AbsoluteTableIdentifier = getAbsoluteIdentifier("testdb", "testtable")
+ val carbonTable = SchemaReader.readCarbonTableFromStore(absoluteTableIdentifier)
+ val segmentPath = CarbonTablePath.getSegmentPath(carbonTable.getTablePath, "0")
+ assert(FileFactory.getCarbonFile(segmentPath).isFileExist)
+ }
+
+ test("test insert into one segment and check folder structure") {
+ prestoServer.execute("insert into testdb.testtable values(10, current_date, 'INDIA', 'Chandler', 'qwerty', 'usn20392',10000.0,16.234567,25.678,timestamp '1994-06-14 05:00:09',smallint '23', true)")
+ prestoServer.execute("insert into testdb.testtable values(10, current_date, 'INDIA', 'Chandler', 'qwerty', 'usn20392',10000.0,16.234567,25.678,timestamp '1994-06-14 05:00:09',smallint '23', true)")
+ val absoluteTableIdentifier: AbsoluteTableIdentifier = getAbsoluteIdentifier("testdb", "testtable")
+ val carbonTable = SchemaReader.readCarbonTableFromStore(absoluteTableIdentifier)
+ val tablePath = carbonTable.getTablePath
+ val segment0Path = CarbonTablePath.getSegmentPath(tablePath, "0")
+ val segment1Path = CarbonTablePath.getSegmentPath(tablePath, "1")
+ val segment0 = FileFactory.getCarbonFile(segment0Path)
+ assert(segment0.isFileExist)
+ assert(segment0.listFiles(new CarbonFileFilter {
+ override def accept(file: CarbonFile): Boolean = {
+ file.getName.endsWith(CarbonTablePath.CARBON_DATA_EXT) ||
+ file.getName.endsWith(CarbonTablePath.MERGE_INDEX_FILE_EXT)
+ }
+ }).length == 2)
+ val segment1 = FileFactory.getCarbonFile(segment1Path)
+ assert(segment1.isFileExist)
+ assert(segment1.listFiles(new CarbonFileFilter {
+ override def accept(file: CarbonFile): Boolean = {
+ file.getName.endsWith(CarbonTablePath.CARBON_DATA_EXT) ||
+ file.getName.endsWith(CarbonTablePath.MERGE_INDEX_FILE_EXT)
+ }
+ }).length == 2)
+ val segmentsPath = CarbonTablePath.getSegmentFilesLocation(tablePath)
+ assert(FileFactory.getCarbonFile(segmentsPath).isFileExist && FileFactory.getCarbonFile(segmentsPath).listFiles(true).size() == 2)
+ val metadataFolderPath = CarbonTablePath.getMetadataPath(tablePath)
+ FileFactory.getCarbonFile(metadataFolderPath).listFiles(new CarbonFileFilter {
+ override def accept(file: CarbonFile): Boolean = {
+ file.getName.endsWith(CarbonTablePath.TABLE_STATUS_FILE)
+ }
+ })
+ }
+
+ test("test insert into many segments and check segment count and data count") {
+ prestoServer.execute("insert into testdb.testtable values(10, current_date, 'INDIA', 'Chandler', 'qwerty', 'usn20392',10000.0,16.234567,25.678,timestamp '1994-06-14 05:00:09',smallint '23', true)")
+ prestoServer.execute("insert into testdb.testtable values(10, current_date, 'INDIA', 'Chandler', 'qwerty', 'usn20392',10000.0,16.234567,25.678,timestamp '1998-12-16 10:12:09',smallint '23', true)")
+ prestoServer.execute("insert into testdb.testtable values(10, current_date, 'INDIA', 'Chandler', 'qwerty', 'usn20392',10000.0,16.234567,25.678,timestamp '1994-06-14 05:00:09',smallint '23', true)")
+ prestoServer.execute("insert into testdb.testtable values(10, current_date, 'INDIA', 'Chandler', 'qwerty', 'usn20392',10000.0,16.234567,25.678,timestamp '1998-12-16 10:12:09',smallint '23', true)")
+ val absoluteTableIdentifier: AbsoluteTableIdentifier = getAbsoluteIdentifier("testdb", "testtable")
+ val carbonTable = SchemaReader.readCarbonTableFromStore(absoluteTableIdentifier)
+ val segmentFoldersLocation = CarbonTablePath.getPartitionDir(carbonTable.getTablePath)
+ assert(FileFactory.getCarbonFile(segmentFoldersLocation).listFiles(false).size() == 8)
+ val actualResult1: List[Map[String, Any]] = prestoServer
+ .executeQuery("select count(*) AS RESULT from testdb.testtable")
+ val expectedResult1: List[Map[String, Any]] = List(Map("RESULT" -> 4))
+ assert(actualResult1.equals(expectedResult1))
+ // filter query
+ val actualResult2: List[Map[String, Any]] = prestoServer
+ .executeQuery("select count(*) AS RESULT from testdb.testtable WHERE dob = timestamp '1998-12-16 10:12:09'")
+ val expectedResult2: List[Map[String, Any]] = List(Map("RESULT" -> 2))
+ assert(actualResult2.equals(expectedResult2))
+ }
+
+ test("test if the table status contains the segment file name for each load") {
+ prestoServer.execute("insert into testdb.testtable values(10, current_date, 'INDIA', 'Chandler', 'qwerty', 'usn20392',10000.0,16.234567,25.678,timestamp '1994-06-14 05:00:09',smallint '23', true)")
+ val absoluteTableIdentifier: AbsoluteTableIdentifier = getAbsoluteIdentifier("testdb", "testtable")
+ val carbonTable = SchemaReader.readCarbonTableFromStore(absoluteTableIdentifier)
+ val ssm = new SegmentStatusManager(carbonTable.getAbsoluteTableIdentifier)
+ ssm.getValidAndInvalidSegments.getValidSegments.asScala.foreach { segment =>
+ val loadMetadataDetails = segment.getLoadMetadataDetails
+ assert(loadMetadataDetails.getSegmentFile != null)
+ }
+ }
+
+ test("test for query when insert in progress") {
+ prestoServer.execute("insert into testdb.testtable values(10, current_date, 'INDIA', 'Chandler', 'qwerty', 'usn20392',10000.0,16.234567,25.678,timestamp '1994-06-14 05:00:09',smallint '23', true)")
+ val query = "insert into testdb.testtable values(10, current_date, 'INDIA', 'Chandler', 'qwerty', 'usn20392',10000.0,16.234567,25.678,timestamp '1994-06-14 05:00:09',smallint '23', true)"
+ val asyncQuery = runSqlAsync(query)
+ val actualResult1: List[Map[String, Any]] = prestoServer.executeQuery("select count(*) AS RESULT from testdb.testtable WHERE dob = timestamp '1994-06-14 05:00:09'")
+ val expectedResult1: List[Map[String, Any]] = List(Map("RESULT" -> 1))
+ assert(actualResult1.equals(expectedResult1))
+ assert(asyncQuery.get().equalsIgnoreCase("PASS"))
+ val actualResult2: List[Map[String, Any]] = prestoServer.executeQuery("select count(*) AS RESULT from testdb.testtable WHERE dob = timestamp '1994-06-14 05:00:09'")
+ val expectedResult2: List[Map[String, Any]] = List(Map("RESULT" -> 2))
+ assert(actualResult2.equals(expectedResult2))
+ }
+
+ class QueryTask(query: String) extends Callable[String] {
+ override def call(): String = {
+ var result = "PASS"
+ try {
+ prestoServer.execute(query)
+ } catch {
+ case ex: Exception =>
+ println(ex.printStackTrace())
+ result = "FAIL"
+ }
+ result
+ }
+ }
+
+ private def runSqlAsync(sql: String): Future[String] = {
+ val future = executorService.submit(
+ new QueryTask(sql)
+ )
+ Thread.sleep(2)
+ future
+ }
+
+ override def afterAll(): Unit = {

Review comment:
Cluster testing and concurrent testing are both done. Since we don't have clean-files scenarios and the like here, failure cases are not handled as thoroughly as in Spark. We can improve this later.
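
The concurrent scenario in the test above submits the insert to a single-thread executor and reduces the outcome to PASS or FAIL. A minimal Java sketch of that pattern follows. `AsyncQueryCheck`, `asTask` and `runAndWait` are hypothetical names, and a plain `Runnable` stands in for `prestoServer.execute(query)`.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch of the QueryTask / runSqlAsync pattern in the Scala test.
final class AsyncQueryCheck {

  // Wraps a statement so that success or failure becomes a status string.
  static Callable<String> asTask(Runnable statement) {
    return () -> {
      try {
        statement.run();
        return "PASS";
      } catch (RuntimeException e) {
        return "FAIL";
      }
    };
  }

  // Submits the statement on a single worker thread and waits for its status.
  static String runAndWait(Runnable statement) {
    ExecutorService executor = Executors.newFixedThreadPool(1);
    try {
      Future<String> future = executor.submit(asTask(statement));
      // A caller could issue other queries here before blocking on the result.
      return future.get();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return "FAIL";
    } catch (ExecutionException e) {
      return "FAIL";
    } finally {
      executor.shutdown();
    }
  }

  public static void main(String[] args) {
    System.out.println(runAndWait(() -> { }));
    System.out.println(runAndWait(() -> { throw new IllegalStateException("insert failed"); }));
  }
}
```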

GitBox

[GitHub] [carbondata] akashrn5 commented on a change in pull request #3875: [CARBONDATA-3934]Support write transactional table with presto.

In reply to this post by GitBox

akashrn5 commented on a change in pull request #3875:
URL: https://github.com/apache/carbondata/pull/3875#discussion_r509015656

##########
File path: integration/hive/src/main/java/org/apache/carbondata/hive/MapredCarbonOutputFormat.java
##########
@@ -92,6 +95,14 @@ public void checkOutputSpecs(FileSystem fileSystem, JobConf jobConf) throws IOEx
}
String tablePath = FileFactory.getCarbonFile(carbonLoadModel.getTablePath()).getAbsolutePath();
TaskAttemptID taskAttemptID = TaskAttemptID.forName(jc.get("mapred.task.id"));
+ // taskAttemptID will be null when the insert job is fired from presto. Presto send the JobConf
+ // and since presto does not use the MR framework for execution, the mapred.task.id will be
+ // null, so prepare a new ID.
+ if (taskAttemptID == null) {
+ SimpleDateFormat formatter = new SimpleDateFormat("yyyyMMddHHmm");
+ String jobTrackerId = formatter.format(new Date());
+ taskAttemptID = new TaskAttemptID(jobTrackerId, 0, TaskType.MAP, 0, 0);

Review comment:
> Also please check filenames while testing whether segment id and other info is proper in the file name created by presto.

`Fact/Part0/Segment_10/part-0-0_batchno0-0-10-1603260474337.snappy.carbondata`, `Fact/Part0/Segment_10/10_1603260475282.carbonindexmerge`

These are the merge index file and the carbondata file inside the segment folder for segment 10, so the naming is fine.
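
A check like this could also be automated. The Java sketch below is only an illustration: the two file name layouts are inferred from the two paths quoted above and are not taken from CarbonData's own path utilities, and `SegmentFileNameCheck` is a hypothetical name.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical checker: compares the segment ID embedded in a file name with
// the ID of the enclosing Segment_<id> folder.
final class SegmentFileNameCheck {

  // Assumed layout: part-<n>-<n>_batchno<n>-<n>-<segmentId>-<timestamp>.<compressor>.carbondata
  private static final Pattern DATA_FILE =
      Pattern.compile("part-\\d+-\\d+_batchno\\d+-\\d+-(\\d+)-\\d+\\.\\w+\\.carbondata");

  // Assumed layout: <segmentId>_<timestamp>.carbonindexmerge
  private static final Pattern MERGE_INDEX_FILE =
      Pattern.compile("(\\d+)_\\d+\\.carbonindexmerge");

  private static final Pattern SEGMENT_DIR = Pattern.compile("Segment_(\\d+)");

  // Returns true when the file name carries the same segment ID as its folder.
  static boolean segmentIdMatches(String relativePath) {
    String[] parts = relativePath.split("/");
    if (parts.length < 2) {
      return false;
    }
    Matcher dir = SEGMENT_DIR.matcher(parts[parts.length - 2]);
    if (!dir.matches()) {
      return false;
    }
    String fileName = parts[parts.length - 1];
    for (Pattern pattern : new Pattern[] {DATA_FILE, MERGE_INDEX_FILE}) {
      Matcher file = pattern.matcher(fileName);
      if (file.matches()) {
        return file.group(1).equals(dir.group(1));
      }
    }
    // Unknown file type: treated as a mismatch in this sketch.
    return false;
  }

  public static void main(String[] args) {
    System.out.println(segmentIdMatches(
        "Fact/Part0/Segment_10/part-0-0_batchno0-0-10-1603260474337.snappy.carbondata"));
    System.out.println(segmentIdMatches(
        "Fact/Part0/Segment_10/10_1603260475282.carbonindexmerge"));
  }
}
```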

GitBox

[GitHub] [carbondata] akashrn5 commented on a change in pull request #3875: [CARBONDATA-3934]Support write transactional table with presto.

In reply to this post by GitBox

akashrn5 commented on a change in pull request #3875:
URL: https://github.com/apache/carbondata/pull/3875#discussion_r509084981

##########
File path: integration/presto/src/test/scala/org/apache/carbondata/presto/integrationtest/PrestoInsertIntoTableTestCase.scala
##########
@@ -0,0 +1,207 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.presto.integrationtest
+
+import java.io.File
+import java.util
+import java.util.UUID
+import java.util.concurrent.{Callable, Executor, Executors, Future}
+
+import scala.collection.JavaConverters._
+
+import org.scalatest.{BeforeAndAfterAll, BeforeAndAfterEach, FunSuiteLike}
+
+import org.apache.carbondata.common.logging.LogServiceFactory
+import org.apache.carbondata.core.constants.CarbonCommonConstants
+import org.apache.carbondata.core.datastore.filesystem.{CarbonFile, CarbonFileFilter}
+import org.apache.carbondata.core.datastore.impl.FileFactory
+import org.apache.carbondata.core.metadata.schema.SchemaReader
+import org.apache.carbondata.core.metadata.{AbsoluteTableIdentifier, CarbonTableIdentifier}
+import org.apache.carbondata.core.statusmanager.SegmentStatusManager
+import org.apache.carbondata.core.util.path.CarbonTablePath
+import org.apache.carbondata.core.util.{CarbonProperties, CarbonUtil}
+import org.apache.carbondata.presto.server.PrestoServer
+import org.apache.carbondata.presto.util.CarbonDataStoreCreator
+
+class PrestoInsertIntoTableTestCase extends FunSuiteLike with BeforeAndAfterAll with BeforeAndAfterEach {
+
+ private val logger = LogServiceFactory
+ .getLogService(classOf[PrestoAllDataTypeTest].getCanonicalName)
+
+ private val rootPath = new File(this.getClass.getResource("/").getPath
+ + "../../../..").getCanonicalPath
+ private val storePath = s"$rootPath/integration/presto/target/store"
+ private val prestoServer = new PrestoServer
+ private val executorService = Executors.newFixedThreadPool(1)
+
+ override def beforeAll: Unit = {
+ CarbonProperties.getInstance().addProperty(CarbonCommonConstants.CARBON_WRITTEN_BY_APPNAME,
+ "Presto")
+ val map = new util.HashMap[String, String]()
+ map.put("hive.metastore", "file")
+ map.put("hive.metastore.catalog.dir", s"file://$storePath")
+ map.put("hive.allow-drop-table", "true")
+ prestoServer.startServer("testdb", map)
+ prestoServer.execute("drop schema if exists testdb")
+ prestoServer.execute("create schema testdb")
+ }
+
+ override protected def beforeEach(): Unit = {
+ val query = "create table testdb.testtable(ID int, date date, country varchar, name varchar, phonetype varchar, serialname varchar,salary decimal(6,1), bonus decimal(8,6), monthlyBonus decimal(5,3), dob timestamp, shortField smallint, iscurrentemployee boolean) with(format='CARBONDATA') "
+ createTable(query, "testdb", "testtable")
+ }
+
+ private def createTable(query: String, databaseName: String, tableName: String): Unit = {
+ prestoServer.execute(s"drop table if exists ${databaseName}.${tableName}")
+ prestoServer.execute(query)
+ logger.info("Creating The Carbon Store")
+ val absoluteTableIdentifier: AbsoluteTableIdentifier = getAbsoluteIdentifier(databaseName, tableName)
+ CarbonDataStoreCreator.createTable(absoluteTableIdentifier, true)
+ logger.info(s"\nCarbon store is created at location: $storePath")
+ }
+
+ private def getAbsoluteIdentifier(dbName: String,
+ tableName: String) = {
+ val absoluteTableIdentifier = AbsoluteTableIdentifier.from(
+ storePath + "/" + dbName + "/" + tableName,
+ new CarbonTableIdentifier(dbName,
+ tableName,
+ UUID.randomUUID().toString))
+ absoluteTableIdentifier
+ }
+
+ test("test insert with different storage format names") {
+ val query1 = "create table testdb.testtable(ID int, date date, country varchar, name varchar, phonetype varchar, serialname varchar,salary decimal(6,1), bonus decimal(8,6), monthlyBonus decimal(5,3), dob timestamp, shortField smallint, iscurrentemployee boolean) with(format='CARBONDATA') "
+ val query2 = "create table testdb.testtable(ID int, date date, country varchar, name varchar, phonetype varchar, serialname varchar,salary decimal(6,1), bonus decimal(8,6), monthlyBonus decimal(5,3), dob timestamp, shortField smallint, iscurrentemployee boolean) with(format='CARBON') "
+ val query3 = "create table testdb.testtable(ID int, date date, country varchar, name varchar, phonetype varchar, serialname varchar,salary decimal(6,1), bonus decimal(8,6), monthlyBonus decimal(5,3), dob timestamp, shortField smallint, iscurrentemployee boolean) with(format='ORG.APACHE.CARBONDATA.FORMAT') "
+ createTable(query1, "testdb", "testtable")
+ createTable(query2, "testdb", "testtable")
+ createTable(query3, "testdb", "testtable")
+ prestoServer.execute("insert into testdb.testtable values(10, current_date, 'INDIA', 'Chandler', 'qwerty', 'usn20392',10000.0,16.234567,25.678,timestamp '1994-06-14 05:00:09',smallint '23', true)")
+ val absoluteTableIdentifier: AbsoluteTableIdentifier = getAbsoluteIdentifier("testdb", "testtable")
+ val carbonTable = SchemaReader.readCarbonTableFromStore(absoluteTableIdentifier)
+ val segmentPath = CarbonTablePath.getSegmentPath(carbonTable.getTablePath, "0")
+ assert(FileFactory.getCarbonFile(segmentPath).isFileExist)
+ }
+
+ test("test insert into one segment and check folder structure") {
+ prestoServer.execute("insert into testdb.testtable values(10, current_date, 'INDIA', 'Chandler', 'qwerty', 'usn20392',10000.0,16.234567,25.678,timestamp '1994-06-14 05:00:09',smallint '23', true)")
+ prestoServer.execute("insert into testdb.testtable values(10, current_date, 'INDIA', 'Chandler', 'qwerty', 'usn20392',10000.0,16.234567,25.678,timestamp '1994-06-14 05:00:09',smallint '23', true)")
+ val absoluteTableIdentifier: AbsoluteTableIdentifier = getAbsoluteIdentifier("testdb", "testtable")
+ val carbonTable = SchemaReader.readCarbonTableFromStore(absoluteTableIdentifier)
+ val tablePath = carbonTable.getTablePath
+ val segment0Path = CarbonTablePath.getSegmentPath(tablePath, "0")
+ val segment1Path = CarbonTablePath.getSegmentPath(tablePath, "1")
+ val segment0 = FileFactory.getCarbonFile(segment0Path)
+ assert(segment0.isFileExist)
+ assert(segment0.listFiles(new CarbonFileFilter {
+ override def accept(file: CarbonFile): Boolean = {
+ file.getName.endsWith(CarbonTablePath.CARBON_DATA_EXT) ||
+ file.getName.endsWith(CarbonTablePath.MERGE_INDEX_FILE_EXT)
+ }
+ }).length == 2)
+ val segment1 = FileFactory.getCarbonFile(segment1Path)
+ assert(segment1.isFileExist)
+ assert(segment1.listFiles(new CarbonFileFilter {
+ override def accept(file: CarbonFile): Boolean = {
+ file.getName.endsWith(CarbonTablePath.CARBON_DATA_EXT) ||
+ file.getName.endsWith(CarbonTablePath.MERGE_INDEX_FILE_EXT)
+ }
+ }).length == 2)
+ val segmentsPath = CarbonTablePath.getSegmentFilesLocation(tablePath)
+ assert(FileFactory.getCarbonFile(segmentsPath).isFileExist && FileFactory.getCarbonFile(segmentsPath).listFiles(true).size() == 2)
+ val metadataFolderPath = CarbonTablePath.getMetadataPath(tablePath)
+ FileFactory.getCarbonFile(metadataFolderPath).listFiles(new CarbonFileFilter {
+ override def accept(file: CarbonFile): Boolean = {
+ file.getName.endsWith(CarbonTablePath.TABLE_STATUS_FILE)
+ }
+ })
+ }
+
+ test("test insert into many segments and check segment count and data count") {

Review comment:
added

GitBox

[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3875: [CARBONDATA-3934]Support write transactional table with presto.

CarbonDataQA1 commented on pull request #3875:
URL: https://github.com/apache/carbondata/pull/3875#issuecomment-713448928

Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4573/

GitBox

[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3875: [CARBONDATA-3934]Support write transactional table with presto.

CarbonDataQA1 commented on pull request #3875:
URL: https://github.com/apache/carbondata/pull/3875#issuecomment-713449247

Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2824/

GitBox

[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3875: [CARBONDATA-3934]Support write transactional table with presto.

CarbonDataQA1 commented on pull request #3875:
URL: https://github.com/apache/carbondata/pull/3875#issuecomment-713498847

Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4586/

GitBox

[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3875: [CARBONDATA-3934]Support write transactional table with presto.

CarbonDataQA1 commented on pull request #3875:
URL: https://github.com/apache/carbondata/pull/3875#issuecomment-713498928

Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2836/

GitBox

[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3875: [CARBONDATA-3934]Support write transactional table with presto.

CarbonDataQA1 commented on pull request #3875:
URL: https://github.com/apache/carbondata/pull/3875#issuecomment-713820234

Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2842/

GitBox

[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3875: [CARBONDATA-3934]Support write transactional table with presto.

CarbonDataQA1 commented on pull request #3875:
URL: https://github.com/apache/carbondata/pull/3875#issuecomment-713853801

Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4594/

GitBox

[GitHub] [carbondata] akashrn5 commented on a change in pull request #3875: [CARBONDATA-3934]Support write transactional table with presto.

akashrn5 commented on a change in pull request #3875:
URL: https://github.com/apache/carbondata/pull/3875#discussion_r510102591

##########
File path: integration/hive/src/main/java/org/apache/carbondata/hive/MapredCarbonOutputFormat.java
##########
@@ -92,6 +95,14 @@ public void checkOutputSpecs(FileSystem fileSystem, JobConf jobConf) throws IOEx
}
String tablePath = FileFactory.getCarbonFile(carbonLoadModel.getTablePath()).getAbsolutePath();
TaskAttemptID taskAttemptID = TaskAttemptID.forName(jc.get("mapred.task.id"));
+ // taskAttemptID will be null when the insert job is fired from Presto. Presto sends the JobConf
+ // and, since Presto does not use the MR framework for execution, mapred.task.id will be
+ // null, so prepare a new ID.
+ if (taskAttemptID == null) {
+ SimpleDateFormat formatter = new SimpleDateFormat("yyyyMMddHHmm");
+ String jobTrackerId = formatter.format(new Date());
+ taskAttemptID = new TaskAttemptID(jobTrackerId, 0, TaskType.MAP, 0, 0);

Review comment:
> OK. If this task number is used in the file name, then in the case of non-transactional concurrent writes, two files can have the same file name, leading to many issues. So I suggested UUID. You can check again.

I set the task ID on the load model only if mapred.task.id is present and the task attempt ID is not null. If it is null, I don't set the task ID on the load model; when we call super.getRecordWriter, CarbonTableOutputFormat will set up the load model based on DEFAULT_TASK_NO. Please have a look.

GitBox

[GitHub] [carbondata] akashrn5 commented on a change in pull request #3875: [CARBONDATA-3934]Support write transactional table with presto.

akashrn5 commented on a change in pull request #3875:
URL: https://github.com/apache/carbondata/pull/3875#discussion_r510102591

##########
File path: integration/hive/src/main/java/org/apache/carbondata/hive/MapredCarbonOutputFormat.java
##########
@@ -92,6 +95,14 @@ public void checkOutputSpecs(FileSystem fileSystem, JobConf jobConf) throws IOEx
}
String tablePath = FileFactory.getCarbonFile(carbonLoadModel.getTablePath()).getAbsolutePath();
TaskAttemptID taskAttemptID = TaskAttemptID.forName(jc.get("mapred.task.id"));
+ // taskAttemptID will be null when the insert job is fired from Presto. Presto sends the JobConf
+ // and, since Presto does not use the MR framework for execution, mapred.task.id will be
+ // null, so prepare a new ID.
+ if (taskAttemptID == null) {
+ SimpleDateFormat formatter = new SimpleDateFormat("yyyyMMddHHmm");
+ String jobTrackerId = formatter.format(new Date());
+ taskAttemptID = new TaskAttemptID(jobTrackerId, 0, TaskType.MAP, 0, 0);

Review comment:
> OK. If this task number is used in the file name, then in the case of non-transactional concurrent writes, two files can have the same file name, leading to many issues. So I suggested UUID. You can check again.

I set the task ID on the load model only if mapred.task.id is present and the task attempt ID is not null. If it is null, I don't set the task ID on the load model; when we call super.getRecordWriter, CarbonTableOutputFormat will set up the load model based on DEFAULT_TASK_NO. Please have a look; transactional tables shouldn't be a problem either.
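
Editorial sketch (not part of the PR or of the comment above): the collision concern in the quoted remark can be illustrated with plain JDK classes. The class and method names below (TaskIdCollisionSketch, minuteBasedId, uuidBasedId) are hypothetical, and the two start times are made-up values chosen to fall within the same minute. The sketch only shows that an identifier formatted with the yyyyMMddHHmm pattern from the diff is identical for two writers started in the same minute, while a UUID-based identifier differs. Whether such an identifier actually reaches the file name depends on the load model handling described in the reply.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.UUID;

public class TaskIdCollisionSketch {

  // Same minute-resolution pattern as the jobTrackerId in the diff above.
  static String minuteBasedId(Date startTime) {
    return new SimpleDateFormat("yyyyMMddHHmm").format(startTime);
  }

  // Hypothetical alternative along the lines of the UUID suggestion.
  static String uuidBasedId() {
    return UUID.randomUUID().toString();
  }

  public static void main(String[] args) {
    // Two writers starting one second apart within the same minute
    // (illustrative epoch-millisecond values, not taken from the PR).
    Date writerA = new Date(1_600_000_020_000L);
    Date writerB = new Date(1_600_000_021_000L);

    boolean minuteIdsCollide = minuteBasedId(writerA).equals(minuteBasedId(writerB));
    boolean uuidIdsCollide = uuidBasedId().equals(uuidBasedId());

    System.out.println("minute-based IDs collide: " + minuteIdsCollide);
    System.out.println("UUID-based IDs collide: " + uuidIdsCollide);
  }
}
```

The sketch deliberately avoids Hadoop and CarbonData classes so that it runs on a bare JDK; it says nothing about how CarbonTableOutputFormat derives the task number when no ID is set.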
