[GitHub] [carbondata] akkio-97 opened a new pull request #3773: [CARBONDATA-3830]Presto complex columns read support

[GitHub] [carbondata] ajantha-bhat edited a comment on pull request #3773: [CARBONDATA-3830]Presto array columns read support

GitBox

ajantha-bhat edited a comment on pull request #3773:
URL: https://github.com/apache/carbondata/pull/3773#issuecomment-669844507


   @akkio-97 : update the limitations and TODOs clearly:
   a. arrays with local dictionary cannot be read yet
   b. arrays of other complex types are not supported yet
   c. currently arrays are filled row by row, not with real vector processing; an offset vector can be used, as ORC does (see the sketch below):
   https://github.com/prestosql/presto/blob/master/presto-orc/src/main/java/io/prestosql/orc/reader/ListColumnReader.java

   I also feel ArrayStreamReader and some interfaces need to be cleaned up [I will do it with struct support]
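
   A minimal sketch of the offset-vector layout mentioned in (c), as used by ORC's
   ListColumnReader and Presto's ArrayBlock. Names like OffsetVectorSketch and IntArrayVector
   are hypothetical, not the PR's API. Child elements of all rows go into one flat values
   vector, and array i is the slice values(offsets(i)) until offsets(i + 1), so a whole page
   is filled in one pass instead of row by row:

       object OffsetVectorSketch {
         // array i spans values(offsets(i)) until offsets(i + 1)
         final case class IntArrayVector(values: Array[Int], offsets: Array[Int]) {
           require(offsets.head == 0 && offsets.last == values.length)
           def positionCount: Int = offsets.length - 1
           def get(i: Int): Array[Int] =
             java.util.Arrays.copyOfRange(values, offsets(i), offsets(i + 1))
         }
         def main(args: Array[String]): Unit = {
           // three arrays [1,2,3], [4,5], [6] stored as one values vector plus offsets
           val v = IntArrayVector(Array(1, 2, 3, 4, 5, 6), Array(0, 3, 5, 6))
           (0 until v.positionCount).foreach { i =>
             println(s"array $i = ${v.get(i).mkString("[", ",", "]")}")
           }
         }
       }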



[GitHub] [carbondata] akkio-97 commented on a change in pull request #3773: [CARBONDATA-3830]Presto array columns read support

GitBox
In reply to this post by GitBox

akkio-97 commented on a change in pull request #3773:
URL: https://github.com/apache/carbondata/pull/3773#discussion_r466857297



##########
File path: integration/presto/src/test/scala/org/apache/carbondata/presto/integrationtest/GenerateFiles.scala
##########
@@ -0,0 +1,667 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.presto.integrationtest
+
+import java.io.{ByteArrayInputStream, ByteArrayOutputStream, DataInputStream, File, InputStream}
+import java.util
+
+import scala.collection.JavaConverters._
+
+import org.apache.avro
+import org.apache.avro.file.DataFileWriter
+import org.apache.avro.generic.{GenericDatumReader, GenericDatumWriter, GenericRecord}
+import org.apache.avro.io.DecoderFactory
+import org.junit.Assert
+
+import org.apache.carbondata.core.cache.dictionary.DictionaryByteArrayWrapper
+import org.apache.carbondata.core.constants.CarbonCommonConstants
+import org.apache.carbondata.core.datastore.block.TableBlockInfo
+import org.apache.carbondata.core.datastore.chunk.impl.DimensionRawColumnChunk
+import org.apache.carbondata.core.datastore.chunk.reader.CarbonDataReaderFactory
+import org.apache.carbondata.core.datastore.chunk.reader.dimension.v3.DimensionChunkReaderV3
+import org.apache.carbondata.core.datastore.compression.CompressorFactory
+import org.apache.carbondata.core.datastore.filesystem.{CarbonFile, CarbonFileFilter}
+import org.apache.carbondata.core.datastore.impl.FileFactory
+import org.apache.carbondata.core.datastore.page.encoding.DefaultEncodingFactory
+import org.apache.carbondata.core.metadata.ColumnarFormatVersion
+import org.apache.carbondata.core.util.{CarbonMetadataUtil, DataFileFooterConverterV3}
+import org.apache.carbondata.sdk.file.CarbonWriter
+
+class GenerateFiles {
+
+  def singleLevelArrayFile() = {
+    val json1: String =
+      """ {"stringCol": "bob","intCol": 14,"doubleCol": 10.5,"realCol": 12.7,
+        |"boolCol": true,"arrayStringCol1":["Street1"],"arrayStringCol2": ["India", "Egypt"],
+        |"arrayIntCol": [1,2,3],"arrayBigIntCol":[70000,600000000],"arrayRealCol":[1.111,2.2],
+        |"arrayDoubleCol":[1.1,2.2,3.3], "arrayBooleanCol": [true, false, true]} """.stripMargin
+    val json2: String =
+      """ {"stringCol": "Alex","intCol": 15,"doubleCol": 11.5,"realCol": 13.7,
+        |"boolCol": true, "arrayStringCol1": ["Street1", "Street2"],"arrayStringCol2": ["Japan",
+        |"China", "India"],"arrayIntCol": [1,2,3,4],"arrayBigIntCol":[70000,600000000,8000],
+        |"arrayRealCol":[1.1,2.2,3.3],"arrayDoubleCol":[1.1,2.2,4.45,3.3],
+        |"arrayBooleanCol": [true, true, true]} """.stripMargin
+    val json3: String =
+      """ {"stringCol": "Rio","intCol": 16,"doubleCol": 12.5,"realCol": 14.7,
+        |"boolCol": true, "arrayStringCol1": ["Street1", "Street2","Street3"],
+        |"arrayStringCol2": ["China", "Brazil", "Paris", "France"],"arrayIntCol": [1,2,3,4,5],
+        |"arrayBigIntCol":[70000,600000000,8000,9111111111],"arrayRealCol":[1.1,2.2,3.3,4.45],
+        |"arrayDoubleCol":[1.1,2.2,4.45,5.5,3.3], "arrayBooleanCol": [true, false, true]} """
+        .stripMargin
+    val json4: String =
+      """ {"stringCol": "bob","intCol": 14,"doubleCol": 10.5,"realCol": 12.7,
+        |"boolCol": true, "arrayStringCol1":["Street1"],"arrayStringCol2": ["India", "Egypt"],
+        |"arrayIntCol": [1,2,3],"arrayBigIntCol":[70000,600000000],"arrayRealCol":[1.1,2.2],
+        |"arrayDoubleCol":[1.1,2.2,3.3], "arrayBooleanCol": [true, false, true]} """.stripMargin
+    val json5: String =
+      """ {"stringCol": "Alex","intCol": 15,"doubleCol": 11.5,"realCol": 13.7,
+        |"boolCol": true, "arrayStringCol1": ["Street1", "Street2"],"arrayStringCol2": ["Japan",
+        |"China", "India"],"arrayIntCol": [1,2,3,4],"arrayBigIntCol":[70000,600000000,8000],
+        |"arrayRealCol":[1.1,2.2,3.3],"arrayDoubleCol":[4,1,21.222,15.231],
+        |"arrayBooleanCol": [false, false, false]} """.stripMargin
+
+
+    val mySchema =
+      """ {
+        |      "name": "address",
+        |      "type": "record",
+        |      "fields": [
+        |      {
+        |      "name": "stringCol",
+        |      "type": "string"
+        |      },
+        |      {
+        |      "name": "intCol",
+        |      "type": "int"
+        |      },
+        |      {
+        |      "name": "doubleCol",
+        |      "type": "double"
+        |      },
+        |      {
+        |      "name": "realCol",
+        |      "type": "float"
+        |      },
+        |      {
+        |      "name": "boolCol",
+        |      "type": "boolean"
+        |      },
+        |      {
+        |      "name": "arrayStringCol1",
+        |      "type": {
+        |      "type": "array",
+        |      "items": {
+        |      "name": "street",
+        |      "type": "string"
+        |      }
+        |      }
+        |      },
+        |      {
+        |      "name": "arrayStringCol2",
+        |      "type": {
+        |      "type": "array",
+        |      "items": {
+        |      "name": "street",
+        |      "type": "string"
+        |      }
+        |      }
+        |      },
+        |      {
+        |      "name": "arrayIntCol",
+        |      "type": {
+        |      "type": "array",
+        |      "items": {
+        |      "name": "street",
+        |      "type": "int"
+        |      }
+        |      }
+        |      },
+        |      {
+        |      "name": "arrayBigIntCol",
+        |      "type": {
+        |      "type": "array",
+        |      "items": {
+        |      "name": "street",
+        |      "type": "long"
+        |      }
+        |      }
+        |      },
+        |      {
+        |      "name": "arrayRealCol",
+        |      "type": {
+        |      "type": "array",
+        |      "items": {
+        |      "name": "street",
+        |      "type": "float"
+        |      }
+        |      }
+        |      },
+        |      {
+        |      "name": "arrayDoubleCol",
+        |      "type": {
+        |      "type": "array",
+        |      "items": {
+        |      "name": "street",
+        |      "type": "double"
+        |      }
+        |      }
+        |      },
+        |      {
+        |      "name": "arrayBooleanCol",
+        |      "type": {
+        |      "type": "array",
+        |      "items": {
+        |      "name": "street",
+        |      "type": "boolean"
+        |      }
+        |      }
+        |      }
+        |      ]
+        |  }
+                   """.stripMargin
+
+    val nn = new avro.Schema.Parser().parse(mySchema)
+    val record1 = testUtil.jsonToAvro(json1, mySchema)
+    val record2 = testUtil.jsonToAvro(json2, mySchema)
+    val record3 = testUtil.jsonToAvro(json3, mySchema)
+    val record4 = testUtil.jsonToAvro(json4, mySchema)
+    val record5 = testUtil.jsonToAvro(json5, mySchema)
+    var writerPath = new File(this.getClass.getResource("/").getPath
+                              + "../../target/store/sdk_output/files")
+      .getCanonicalPath
+    // getCanonicalPath gives a path with '\' on Windows, but the code expects '/'
+    writerPath = writerPath.replace("\\", "/")
+    try {
+      val writer = CarbonWriter.builder
+        .outputPath(writerPath)
+        .enableLocalDictionary(false)
+        .uniqueIdentifier(System.currentTimeMillis())
+        .withAvroInput(nn)
+        .writtenBy("GenerateFiles")
+        .build()
+      writer.write(record1)
+      writer.write(record2)
+      writer.write(record3)
+      writer.write(record4)
+      writer.write(record5)
+      writer.close()
+    } catch {
+      case e: Exception =>
+        e.printStackTrace()
+        Assert.fail(e.getMessage)
+    }
+  }
+
+  def twoLevelArrayFile() = {
+    val json1 =
+      """   {
+        |         "arrayArrayInt": [[1,2,3], [4,5]],
+        |         "arrayArrayBigInt":[[90000,600000000],[8000],[911111111]],
+        |         "arrayArrayReal":[[1.111,2.2], [9.139,2.98]],
+        |         "arrayArrayDouble":[[1.111,2.2], [9.139,2.98989898]],
+        |         "arrayArrayString":[["Japan", "China"], ["India"]],
+        |         "arrayArrayBoolean":[[false, false], [false]]
+        |        }   """.stripMargin
+    val json2 =
+      """   {
+        |         "arrayArrayInt": [[1,2,3], [0,5], [1,2,3,4,5], [4,5]],
+        |         "arrayArrayBigInt":[[40000, 600000000, 8000],[9111111111]],
+        |         "arrayArrayReal":[[1.111, 2.2], [9.139, 2.98], [9.99]],
+        |         "arrayArrayDouble":[[1.111, 2.2],[9.139777, 2.98],[9.99888]],
+        |         "arrayArrayString":[["China", "Brazil"], ["Paris", "France"]],
+        |         "arrayArrayBoolean":[[false], [true, false]]
+        |        }   """.stripMargin
+    val json3 =
+      """   {
+        |         "arrayArrayInt": [[1], [0], [3], [4,5]],
+        |         "arrayArrayBigInt":[[5000],[600000000],[8000,9111111111],[20000],[600000000,
+        |         8000,9111111111]],
+        |         "arrayArrayReal":[[9.198]],
+        |         "arrayArrayDouble":[[0.1987979]],
+        |         "arrayArrayString":[["Japan", "China", "India"]],
+        |         "arrayArrayBoolean":[[false, true, false]]
+        |        }   """.stripMargin
+    val json4 =
+      """   {
+        |         "arrayArrayInt": [[0,9,0,1,3,2,3,4,7]],
+        |         "arrayArrayBigInt":[[5000, 600087000, 8000, 9111111111, 20000, 600000000, 8000,
+        |          977777]],
+        |         "arrayArrayReal":[[1.111, 2.2], [9.139, 2.98, 4.67], [2.91, 2.2], [9.139, 2.98]],
+        |         "arrayArrayDouble":[[1.111, 2.0, 4.67, 2.91, 2.2, 9.139, 2.98]],
+        |         "arrayArrayString":[["Japan"], ["China"], ["India"]],
+        |         "arrayArrayBoolean":[[false], [true], [false]]
+        |        }   """.stripMargin
+
+    val mySchema =
+      """ {
+        | "name": "address",
+        | "type": "record",
+        | "fields": [
+        |  {
+        | "name": "arrayArrayInt",
+        | "type": {
+        | "type": "array",
+        | "items": {
+        | "name": "FloorNum",
+        | "type": "array",
+        | "items": {
+        | "name": "EachdoorNums",
+        | "type": "int"
+        | }
+        | }
+        | }
+        | },
+        |  {
+        | "name": "arrayArrayBigInt",
+        | "type": {
+        | "type": "array",
+        | "items": {
+        | "name": "FloorNum",
+        | "type": "array",
+        | "items": {
+        | "name": "EachdoorNums",
+        | "type": "long"
+        | }
+        | }
+        | }
+        | },
+        |  {
+        | "name": "arrayArrayReal",
+        | "type": {
+        | "type": "array",
+        | "items": {
+        | "name": "FloorNum",
+        | "type": "array",
+        | "items": {
+        | "name": "EachdoorNums",
+        | "type": "float"
+        | }
+        | }
+        | }
+        | },
+        |  {
+        | "name": "arrayArrayDouble",
+        | "type": {
+        | "type": "array",
+        | "items": {
+        | "name": "FloorNum",
+        | "type": "array",
+        | "items": {
+        | "name": "EachdoorNums",
+        | "type": "double"
+        | }
+        | }
+        | }
+        | },
+        |  {
+        | "name": "arrayArrayString",
+        | "type": {
+        | "type": "array",
+        | "items": {
+        | "name": "FloorNum",
+        | "type": "array",
+        | "items": {
+        | "name": "EachdoorNums",
+        | "type": "string"
+        | }
+        | }
+        | }
+        | },
+        |  {
+        | "name": "arrayArrayBoolean",
+        | "type": {
+        | "type": "array",
+        | "items": {
+        | "name": "FloorNum",
+        | "type": "array",
+        | "items": {
+        | "name": "EachdoorNums",
+        | "type": "boolean"
+        | }
+        | }
+        | }
+        | }
+        | ]
+        |} """.stripMargin
+
+    val nn = new avro.Schema.Parser().parse(mySchema)
+    val record1 = testUtil.jsonToAvro(json1, mySchema)
+    val record2 = testUtil.jsonToAvro(json2, mySchema)
+    val record3 = testUtil.jsonToAvro(json3, mySchema)
+    val record4 = testUtil.jsonToAvro(json4, mySchema)
+    var writerPath = new File(this.getClass.getResource("/").getPath
+                              + "../../target/store/sdk_output/files2")
+      .getCanonicalPath
+    // getCanonicalPath gives a path with '\' on Windows, but the code expects '/'
+    writerPath = writerPath.replace("\\", "/")
+    try {
+      val writer = CarbonWriter.builder
+        .outputPath(writerPath)
+        .enableLocalDictionary(false)
+        .uniqueIdentifier(System.currentTimeMillis())
+        .withAvroInput(nn)
+        .writtenBy("GenerateFiles")
+        .build()
+      writer.write(record1)
+      writer.write(record2)
+      writer.write(record3)
+      writer.write(record4)
+      writer.close()
+    } catch {
+      case e: Exception =>
+        e.printStackTrace()
+        Assert.fail(e.getMessage)
+    }
+  }
+
+  def threeLevelArrayFile() = {
+    val json1 =
+      """ {
+        | "array3_Int": [[[1,2,3], [4,5]], [[6,7,8], [9]], [[1,2], [4,5]]],
+        | "array3_BigInt":[[[90000,600000000],[8000]],[[911111111]]],
+        | "array3_Real":[[[1.111,2.2], [9.139,2.98]]],
+        | "array3_Double":[[[1.111,2.2]], [[9.139,2.98989898]]],
+        | "array3_String":[[["Japan", "China"], ["Brazil", "Paris"]], [["India"]]],
+        | "array3_Boolean":[[[false, false], [false]], [[true]]]
+        | } """.stripMargin
+    val json2 =
+      """ {
+        | "array3_Int": [[[1,2,3], [0,5], [1,2,3,4,5], [4,5]]],
+        | "array3_BigInt":[[[40000,600000000,8000],[9111111111]]],
+        | "array3_Real":[[[1.111,2.2], [9.139,2.98]], [[9.99]]],
+        | "array3_Double":[[[1.111,2.2], [9.139777,2.98]], [[9.99888]]],
+        | "array3_String":[[["China", "Brazil"], ["Paris", "France"]]],
+        | "array3_Boolean":[[[false], [true, false]]]
+        | } """.stripMargin
+    val json3 =
+      """ {
+        | "array3_Int": [[[1],[0],[3]],[[4,5]]],
+        | "array3_BigInt":[[[5000],[600000000],[8000,9111111111],[20000],[600000000,8000,
+        | 9111111111]]],
+        | "array3_Real":[[[9.198]]],
+        | "array3_Double":[[[0.1987979]]],
+        | "array3_String":[[["Japan", "China", "India"]]],
+        | "array3_Boolean":[[[false, true, false]]]
+        | } """.stripMargin
+    val json4 =
+      """ {
+        | "array3_Int": [[[0,9,0,1,3,2,3,4,7]]],
+        | "array3_BigInt":[[[5000,600087000,8000,9111111111,20000,600000000,8000,977777]]],
+        | "array3_Real":[[[1.111,2.2], [9.139,2.98,4.67]], [[2.91,2.2], [9.139,2.98]]],
+        | "array3_Double":[[[1.111,2,4.67, 2.91,2.2, 9.139,2.98]]],
+        | "array3_String":[[["Japan"], ["China"], ["India"]]],
+        | "array3_Boolean":[[[false], [true], [false]]]
+        | } """.stripMargin
+
+    val mySchema =
+      """ {
+        | "name": "address",
+        | "type": "record",
+        | "fields": [
+        |  {
+        | "name": "array3_Int",
+        | "type": {
+        | "type": "array",
+        | "items": {
+        | "type": "array",
+        | "items": {
+        |     "type": "array",
+        |           "items": {
+        | "type": "int"
+        |              }
+        | }
+        | }
+        | }
+        | },
+        |  {
+        | "name": "array3_BigInt",
+        | "type": {
+        | "type": "array",
+        | "items": {
+        | "type": "array",
+        | "items": {
+        |     "type": "array",
+        |           "items": {
+        | "type": "long"
+        |              }
+        | }
+        | }
+        | }
+        | },
+        |  {
+        | "name": "array3_Real",
+        | "type": {
+        | "type": "array",
+        | "items": {
+        | "type": "array",
+        | "items": {
+        |     "type": "array",
+        |           "items": {
+        | "type": "float"
+        |              }
+        | }
+        | }
+        | }
+        | },
+        |  {
+        | "name": "array3_Double",
+        | "type": {
+        | "type": "array",
+        | "items": {
+        | "type": "array",
+        | "items": {
+        |     "type": "array",
+        |           "items": {
+        | "type": "double"
+        |              }
+        | }
+        | }
+        | }
+        | },
+        |  {
+        | "name": "array3_String",
+        | "type": {
+        | "type": "array",
+        | "items": {
+        | "type": "array",
+        | "items": {
+        |     "type": "array",
+        |           "items": {
+        | "type": "string"
+        |              }
+        | }
+        | }
+        | }
+        | },
+        |  {
+        | "name": "array3_Boolean",
+        | "type": {
+        | "type": "array",
+        | "items": {
+        | "type": "array",
+        | "items": {
+        |     "type": "array",
+        |           "items": {
+        | "type": "boolean"
+        |              }
+        | }
+        | }
+        | }
+        | }
+        | ]
+        |} """.stripMargin
+
+    val nn = new avro.Schema.Parser().parse(mySchema)
+    val record1 = testUtil.jsonToAvro(json1, mySchema)
+    val record2 = testUtil.jsonToAvro(json2, mySchema)
+    val record3 = testUtil.jsonToAvro(json3, mySchema)
+    val record4 = testUtil.jsonToAvro(json4, mySchema)
+    var writerPath = new File(this.getClass.getResource("/").getPath
+                              + "../../target/store/sdk_output/files3")
+      .getCanonicalPath
+    // getCanonicalPath gives a path with '\' on Windows, but the code expects '/'
+    writerPath = writerPath.replace("\\", "/")
+    try {
+      val writer = CarbonWriter.builder
+        .outputPath(writerPath)
+        .enableLocalDictionary(false)
+        .uniqueIdentifier(System.currentTimeMillis())
+        .withAvroInput(nn)
+        .writtenBy("GenerateFiles")
+        .build()
+      writer.write(record1)
+      writer.write(record2)
+      writer.write(record3)
+      writer.write(record4)
+      writer.close()
+    } catch {
+      case e: Exception =>
+        e.printStackTrace()
+        Assert.fail(e.getMessage)
+    }
+  }
+
+  object testUtil {
+
+    def jsonToAvro(json: String, avroSchema: String): GenericRecord = {
+      var input: InputStream = null
+      var writer: DataFileWriter[GenericRecord] = null
+      var output: ByteArrayOutputStream = null
+      try {
+        val schema = new org.apache.avro.Schema.Parser().parse(avroSchema)
+        val reader = new GenericDatumReader[GenericRecord](schema)
+        input = new ByteArrayInputStream(json.getBytes())
+        output = new ByteArrayOutputStream()
+        val din = new DataInputStream(input)
+        writer = new DataFileWriter[GenericRecord](new GenericDatumWriter[GenericRecord]())
+        writer.create(schema, output)
+        val decoder = DecoderFactory.get().jsonDecoder(schema, din)
+        reader.read(null, decoder)
+      } finally {
+        // close only what was actually opened; a parse failure leaves these null
+        if (input != null) input.close()
+        if (writer != null) writer.close()
+      }
+    }
+
+    /**
+     * returns the dimension raw column chunks of the given block index,
+     * read from the first carbondata file found under the store path
+     */
+    def getDimRawChunk(blockindex: Int,
+        storePath: String): util.ArrayList[DimensionRawColumnChunk] = {
+      val dataFiles = FileFactory.getCarbonFile(storePath)
+        .listFiles(new CarbonFileFilter() {
+          override def accept(file: CarbonFile): Boolean = {
+            file.getName.endsWith(CarbonCommonConstants.FACT_FILE_EXT)
+          }
+        })
+      read(dataFiles(0).getAbsolutePath, blockindex)
+    }
+
+    def read(filePath: String, blockIndex: Int) = {
+      val carbonDataFiles = new File(filePath)
+      val dimensionRawColumnChunks = new
+          util.ArrayList[DimensionRawColumnChunk]
+      val offset = carbonDataFiles.length
+      val converter = new DataFileFooterConverterV3
+      val fileReader = FileFactory.getFileHolder(FileFactory.getFileType(filePath))
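+      // the footer start offset is stored in the last 8 bytes of the carbondata file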
+      val actualOffset = fileReader.readLong(carbonDataFiles.getAbsolutePath, offset - 8)
+      val blockInfo = new TableBlockInfo(carbonDataFiles.getAbsolutePath,
+        actualOffset,
+        "0",
+        new Array[String](0),
+        carbonDataFiles.length,
+        ColumnarFormatVersion.V3,
+        null)
+      val dataFileFooter = converter.readDataFileFooter(blockInfo)
+      val blockletList = dataFileFooter.getBlockletList.asScala
+      for (blockletInfo <- blockletList) {
+        val dimensionColumnChunkReader =
+          CarbonDataReaderFactory
+            .getInstance
+            .getDimensionColumnChunkReader(ColumnarFormatVersion.V3,
+              blockletInfo,
+              carbonDataFiles.getAbsolutePath,
+              false).asInstanceOf[DimensionChunkReaderV3]
+        dimensionRawColumnChunks
+          .add(dimensionColumnChunkReader.readRawDimensionChunk(fileReader, blockIndex))
+      }
+      dimensionRawColumnChunks
+    }
+
+    def validateDictionary(rawColumnPage: DimensionRawColumnChunk,
+        data: Array[String]): Boolean = {
+      val local_dictionary = rawColumnPage.getDataChunkV3.local_dictionary
+      if (null != local_dictionary) {
+        val compressorName = CarbonMetadataUtil.getCompressorNameFromChunkMeta(
+          rawColumnPage.getDataChunkV3.getData_chunk_list.get(0).getChunk_meta)
+        val encodings = local_dictionary.getDictionary_meta.encoders
+        val encoderMetas = local_dictionary.getDictionary_meta.getEncoder_meta
+        val encodingFactory = DefaultEncodingFactory.getInstance
+        val decoder = encodingFactory.createDecoder(encodings, encoderMetas, compressorName)
+        val dictionaryPage = decoder
+          .decode(local_dictionary.getDictionary_data,
+            0,
+            local_dictionary.getDictionary_data.length)
+        val dictionaryMap = new util.HashMap[DictionaryByteArrayWrapper, Integer]
+        val usedDictionaryValues = util.BitSet
+          .valueOf(CompressorFactory.getInstance.getCompressor(compressorName)
+            .unCompressByte(local_dictionary.getDictionary_values))
+        var index = 0
+        var i = usedDictionaryValues.nextSetBit(0)
+        while (i >= 0) {
+          dictionaryMap
+            .put(new DictionaryByteArrayWrapper(dictionaryPage.getBytes({ index += 1; index - 1 })),
+              i)
+          i = usedDictionaryValues.nextSetBit(i + 1)
+        }
+        for (i <- data.indices) {
+          if (null == dictionaryMap.get(new DictionaryByteArrayWrapper(data(i).getBytes))) {
+            return false
+          }
+        }
+        return true
+      }
+      false
+    }
+
+    def checkForLocalDictionary(dimensionRawColumnChunks: util

Review comment:
       done


[GitHub] [carbondata] akkio-97 commented on a change in pull request #3773: [CARBONDATA-3830]Presto array columns read support

GitBox
In reply to this post by GitBox

akkio-97 commented on a change in pull request #3773:
URL: https://github.com/apache/carbondata/pull/3773#discussion_r466857370



##########
File path: integration/presto/src/test/scala/org/apache/carbondata/presto/integrationtest/GenerateFiles.scala
##########
@@ -0,0 +1,667 @@
+    def validateDictionary(rawColumnPage: DimensionRawColumnChunk,

Review comment:
       done

+
+    val nn = new avro.Schema.Parser().parse(mySchema)
+    val record1 = testUtil.jsonToAvro(json1, mySchema)
+    val record2 = testUtil.jsonToAvro(json2, mySchema)
+    val record3 = testUtil.jsonToAvro(json3, mySchema)
+    val record4 = testUtil.jsonToAvro(json4, mySchema)
+    var writerPath = new File(this.getClass.getResource("/").getPath
+                              + "../../target/store/sdk_output/files2")
+      .getCanonicalPath
+    //getCanonicalPath gives path with \, but the code expects /.
+    writerPath = writerPath.replace("\\", "/")
+    try {
+      val writer = CarbonWriter.builder
+        .outputPath(writerPath)
+        .enableLocalDictionary(false)
+        .uniqueIdentifier(System.currentTimeMillis())
+        .withAvroInput(nn)
+        .writtenBy("GenerateFiles")
+        .build()
+      writer.write(record1)
+      writer.write(record2)
+      writer.write(record3)
+      writer.write(record4)
+      writer.close()
+    } catch {
+      case e: Exception =>
+        e.printStackTrace()
+        Assert.fail(e.getMessage)
+    }
+  }
+
+  def threeLevelArrayFile() = {
+    val json1 =
+      """ {
+        | "array3_Int": [[[1,2,3], [4,5]], [[6,7,8], [9]], [[1,2], [4,5]]],
+        | "array3_BigInt":[[[90000,600000000],[8000]],[[911111111]]],
+        | "array3_Real":[[[1.111,2.2], [9.139,2.98]]],
+        | "array3_Double":[[[1.111,2.2]], [[9.139,2.98989898]]],
+        | "array3_String":[[["Japan", "China"], ["Brazil", "Paris"]], [["India"]]],
+        | "array3_Boolean":[[[false, false], [false]], [[true]]]
+        | } """.stripMargin
+    val json2 =
+      """ {
+        | "array3_Int": [[[1,2,3], [0,5], [1,2,3,4,5], [4,5]]],
+        | "array3_BigInt":[[[40000,600000000,8000],[9111111111]]],
+        | "array3_Real":[[[1.111,2.2], [9.139,2.98]], [[9.99]]],
+        | "array3_Double":[[[1.111,2.2], [9.139777,2.98]], [[9.99888]]],
+        | "array3_String":[[["China", "Brazil"], ["Paris", "France"]]],
+        | "array3_Boolean":[[[false], [true, false]]]
+        | } """.stripMargin
+    val json3 =
+      """ {
+        | "array3_Int": [[[1],[0],[3]],[[4,5]]],
+        | "array3_BigInt":[[[5000],[600000000],[8000,9111111111],[20000],[600000000,8000,
+        | 9111111111]]],
+        | "array3_Real":[[[9.198]]],
+        | "array3_Double":[[[0.1987979]]],
+        | "array3_String":[[["Japan", "China", "India"]]],
+        | "array3_Boolean":[[[false, true, false]]]
+        | } """.stripMargin
+    val json4 =
+      """ {
+        | "array3_Int": [[[0,9,0,1,3,2,3,4,7]]],
+        | "array3_BigInt":[[[5000,600087000,8000,9111111111,20000,600000000,8000,977777]]],
+        | "array3_Real":[[[1.111,2.2], [9.139,2.98,4.67]], [[2.91,2.2], [9.139,2.98]]],
+        | "array3_Double":[[[1.111,2,4.67, 2.91,2.2, 9.139,2.98]]],
+        | "array3_String":[[["Japan"], ["China"], ["India"]]],
+        | "array3_Boolean":[[[false], [true], [false]]]
+        | } """.stripMargin
+
+    val mySchema =
+      """ {
+        | "name": "address",
+        | "type": "record",
+        | "fields": [
+        |  {
+        | "name": "array3_Int",
+        | "type": {
+        | "type": "array",
+        | "items": {
+        | "type": "array",
+        | "items": {
+        |     "type": "array",
+        |           "items": {
+        | "type": "int"
+        |              }
+        | }
+        | }
+        | }
+        | },
+        |  {
+        | "name": "array3_BigInt",
+        | "type": {
+        | "type": "array",
+        | "items": {
+        | "type": "array",
+        | "items": {
+        |     "type": "array",
+        |           "items": {
+        | "type": "long"
+        |              }
+        | }
+        | }
+        | }
+        | },
+        |  {
+        | "name": "array3_Real",
+        | "type": {
+        | "type": "array",
+        | "items": {
+        | "type": "array",
+        | "items": {
+        |     "type": "array",
+        |           "items": {
+        | "type": "float"
+        |              }
+        | }
+        | }
+        | }
+        | },
+        |  {
+        | "name": "array3_Double",
+        | "type": {
+        | "type": "array",
+        | "items": {
+        | "type": "array",
+        | "items": {
+        |     "type": "array",
+        |           "items": {
+        | "type": "double"
+        |              }
+        | }
+        | }
+        | }
+        | },
+        |  {
+        | "name": "array3_String",
+        | "type": {
+        | "type": "array",
+        | "items": {
+        | "type": "array",
+        | "items": {
+        |     "type": "array",
+        |           "items": {
+        | "type": "string"
+        |              }
+        | }
+        | }
+        | }
+        | },
+        |  {
+        | "name": "array3_Boolean",
+        | "type": {
+        | "type": "array",
+        | "items": {
+        | "type": "array",
+        | "items": {
+        |     "type": "array",
+        |           "items": {
+        | "type": "boolean"
+        |              }
+        | }
+        | }
+        | }
+        | }
+        | ]
+        |} """.stripMargin
+
+    val nn = new avro.Schema.Parser().parse(mySchema)
+    val record1 = testUtil.jsonToAvro(json1, mySchema)
+    val record2 = testUtil.jsonToAvro(json2, mySchema)
+    val record3 = testUtil.jsonToAvro(json3, mySchema)
+    val record4 = testUtil.jsonToAvro(json4, mySchema)
+    var writerPath = new File(this.getClass.getResource("/").getPath
+                              + "../../target/store/sdk_output/files3")
+      .getCanonicalPath
+    //getCanonicalPath gives path with \, but the code expects /.
+    writerPath = writerPath.replace("\\", "/")
+    try {
+      val writer = CarbonWriter.builder
+        .outputPath(writerPath)
+        .enableLocalDictionary(false)
+        .uniqueIdentifier(System.currentTimeMillis())
+        .withAvroInput(nn)
+        .writtenBy("GenerateFiles")
+        .build()
+      writer.write(record1)
+      writer.write(record2)
+      writer.write(record3)
+      writer.write(record4)
+      writer.close()
+    } catch {
+      case e: Exception =>
+        e.printStackTrace()
+        Assert.fail(e.getMessage)
+    }
+  }
+
+  object testUtil {
+
+    def jsonToAvro(json: String, avroSchema: String): GenericRecord = {
+      var input: InputStream = null
+      var writer: DataFileWriter[GenericRecord] = null
+      var output: ByteArrayOutputStream = null
+      try {
+        val schema = new org.apache.avro.Schema.Parser().parse(avroSchema)
+        val reader = new GenericDatumReader[GenericRecord](schema)
+        input = new ByteArrayInputStream(json.getBytes())
+        output = new ByteArrayOutputStream()
+        val din = new DataInputStream(input)
+        writer = new DataFileWriter[GenericRecord](new GenericDatumWriter[GenericRecord]())
+        writer.create(schema, output)
+        // decode the JSON text against the Avro schema and read a single record
+        val decoder = DecoderFactory.get().jsonDecoder(schema, din)
+        reader.read(null, decoder)
+      } finally {
+        // guard the closes: a parse failure can leave the streams unassigned
+        if (input != null) input.close()
+        if (writer != null) writer.close()
+      }
+    }
+
+    /**
+     * reads the dimension raw column chunks for the given block index from the
+     * first carbondata file found under the given store path
+     *
+     * @return list of dimension raw column chunks
+     */
+    def getDimRawChunk(blockindex: Int,
+        storePath: String): util.ArrayList[DimensionRawColumnChunk] = {
+      val dataFiles = FileFactory.getCarbonFile(storePath)
+        .listFiles(new CarbonFileFilter() {
+          override def accept(file: CarbonFile): Boolean = {
+            if (file.getName
+              .endsWith(CarbonCommonConstants.FACT_FILE_EXT)) {
+              true
+            } else {
+              false
+            }
+          }
+        })
+      val dimensionRawColumnChunks = read(dataFiles(0).getAbsolutePath,
+        blockindex)
+      dimensionRawColumnChunks
+    }
+
+    def read(filePath: String, blockIndex: Int) = {

Review comment:
       done







[GitHub] [carbondata] akkio-97 commented on a change in pull request #3773: [CARBONDATA-3830]Presto array columns read support

GitBox
In reply to this post by GitBox

akkio-97 commented on a change in pull request #3773:
URL: https://github.com/apache/carbondata/pull/3773#discussion_r466857547



##########
File path: integration/presto/src/test/scala/org/apache/carbondata/presto/integrationtest/GenerateFiles.scala
##########
@@ -0,0 +1,667 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.presto.integrationtest
+
+import java.io.{ByteArrayInputStream, ByteArrayOutputStream, DataInputStream, File, InputStream}
+import java.util
+
+import scala.collection.JavaConverters._
+
+import org.apache.avro
+import org.apache.avro.file.DataFileWriter
+import org.apache.avro.generic.{GenericDatumReader, GenericDatumWriter, GenericRecord}
+import org.apache.avro.io.{DecoderFactory, Encoder}
+import org.junit.Assert
+
+import org.apache.carbondata.core.cache.dictionary.DictionaryByteArrayWrapper
+import org.apache.carbondata.core.constants.CarbonCommonConstants
+import org.apache.carbondata.core.datastore.block.TableBlockInfo
+import org.apache.carbondata.core.datastore.chunk.impl.DimensionRawColumnChunk
+import org.apache.carbondata.core.datastore.chunk.reader.CarbonDataReaderFactory
+import org.apache.carbondata.core.datastore.chunk.reader.dimension.v3.DimensionChunkReaderV3
+import org.apache.carbondata.core.datastore.compression.CompressorFactory
+import org.apache.carbondata.core.datastore.filesystem.{CarbonFile, CarbonFileFilter}
+import org.apache.carbondata.core.datastore.impl.FileFactory
+import org.apache.carbondata.core.datastore.page.encoding.DefaultEncodingFactory
+import org.apache.carbondata.core.metadata.ColumnarFormatVersion
+import org.apache.carbondata.core.util.{CarbonMetadataUtil, DataFileFooterConverterV3}
+import org.apache.carbondata.sdk.file.CarbonWriter
+
+class GenerateFiles {
+
+  def singleLevelArrayFile() = {
+    val json1: String =
+      """ {"stringCol": "bob","intCol": 14,"doubleCol": 10.5,"realCol": 12.7,
+        |"boolCol": true,"arrayStringCol1":["Street1"],"arrayStringCol2": ["India", "Egypt"],
+        |"arrayIntCol": [1,2,3],"arrayBigIntCol":[70000,600000000],"arrayRealCol":[1.111,2.2],
+        |"arrayDoubleCol":[1.1,2.2,3.3], "arrayBooleanCol": [true, false, true]} """.stripMargin
+    val json2: String =
+      """ {"stringCol": "Alex","intCol": 15,"doubleCol": 11.5,"realCol": 13.7,
+        |"boolCol": true, "arrayStringCol1": ["Street1", "Street2"],"arrayStringCol2": ["Japan",
+        |"China", "India"],"arrayIntCol": [1,2,3,4],"arrayBigIntCol":[70000,600000000,8000],
+        |"arrayRealCol":[1.1,2.2,3.3],"arrayDoubleCol":[1.1,2.2,4.45,3.3],
+        |"arrayBooleanCol": [true, true, true]} """.stripMargin
+    val json3: String =
+      """ {"stringCol": "Rio","intCol": 16,"doubleCol": 12.5,"realCol": 14.7,
+        |"boolCol": true, "arrayStringCol1": ["Street1", "Street2","Street3"],
+        |"arrayStringCol2": ["China", "Brazil", "Paris", "France"],"arrayIntCol": [1,2,3,4,5],
+        |"arrayBigIntCol":[70000,600000000,8000,9111111111],"arrayRealCol":[1.1,2.2,3.3,4.45],
+        |"arrayDoubleCol":[1.1,2.2,4.45,5.5,3.3], "arrayBooleanCol": [true, false, true]} """
+        .stripMargin
+    val json4: String =
+      """ {"stringCol": "bob","intCol": 14,"doubleCol": 10.5,"realCol": 12.7,
+        |"boolCol": true, "arrayStringCol1":["Street1"],"arrayStringCol2": ["India", "Egypt"],
+        |"arrayIntCol": [1,2,3],"arrayBigIntCol":[70000,600000000],"arrayRealCol":[1.1,2.2],
+        |"arrayDoubleCol":[1.1,2.2,3.3], "arrayBooleanCol": [true, false, true]} """.stripMargin
+    val json5: String =
+      """ {"stringCol": "Alex","intCol": 15,"doubleCol": 11.5,"realCol": 13.7,
+        |"boolCol": true, "arrayStringCol1": ["Street1", "Street2"],"arrayStringCol2": ["Japan",
+        |"China", "India"],"arrayIntCol": [1,2,3,4],"arrayBigIntCol":[70000,600000000,8000],
+        |"arrayRealCol":[1.1,2.2,3.3],"arrayDoubleCol":[4,1,21.222,15.231],
+        |"arrayBooleanCol": [false, false, false]} """.stripMargin
+
+
+    val mySchema =
+      """ {
+        |      "name": "address",
+        |      "type": "record",
+        |      "fields": [
+        |      {
+        |      "name": "stringCol",
+        |      "type": "string"
+        |      },
+        |      {
+        |      "name": "intCol",
+        |      "type": "int"
+        |      },
+        |      {
+        |      "name": "doubleCol",
+        |      "type": "double"
+        |      },
+        |      {
+        |      "name": "realCol",
+        |      "type": "float"
+        |      },
+        |      {
+        |      "name": "boolCol",
+        |      "type": "boolean"
+        |      },
+        |      {
+        |      "name": "arrayStringCol1",
+        |      "type": {
+        |      "type": "array",
+        |      "items": {
+        |      "name": "street",
+        |      "type": "string"
+        |      }
+        |      }
+        |      },
+        |      {
+        |      "name": "arrayStringCol2",
+        |      "type": {
+        |      "type": "array",
+        |      "items": {
+        |      "name": "street",
+        |      "type": "string"
+        |      }
+        |      }
+        |      },
+        |      {
+        |      "name": "arrayIntCol",
+        |      "type": {
+        |      "type": "array",
+        |      "items": {
+        |      "name": "street",
+        |      "type": "int"
+        |      }
+        |      }
+        |      },
+        |      {
+        |      "name": "arrayBigIntCol",
+        |      "type": {
+        |      "type": "array",
+        |      "items": {
+        |      "name": "street",
+        |      "type": "long"
+        |      }
+        |      }
+        |      },
+        |      {
+        |      "name": "arrayRealCol",
+        |      "type": {
+        |      "type": "array",
+        |      "items": {
+        |      "name": "street",
+        |      "type": "float"
+        |      }
+        |      }
+        |      },
+        |      {
+        |      "name": "arrayDoubleCol",
+        |      "type": {
+        |      "type": "array",
+        |      "items": {
+        |      "name": "street",
+        |      "type": "double"
+        |      }
+        |      }
+        |      },
+        |      {
+        |      "name": "arrayBooleanCol",
+        |      "type": {
+        |      "type": "array",
+        |      "items": {
+        |      "name": "street",
+        |      "type": "boolean"
+        |      }
+        |      }
+        |      }
+        |      ]
+        |  }
+                   """.stripMargin
+
+    val nn = new avro.Schema.Parser().parse(mySchema)
+    val record1 = testUtil.jsonToAvro(json1, mySchema)
+    val record2 = testUtil.jsonToAvro(json2, mySchema)
+    val record3 = testUtil.jsonToAvro(json3, mySchema)
+    val record4 = testUtil.jsonToAvro(json4, mySchema)
+    val record5 = testUtil.jsonToAvro(json5, mySchema)
+    var writerPath = new File(this.getClass.getResource("/").getPath
+                              + "../../target/store/sdk_output/files")
+      .getCanonicalPath
+    //getCanonicalPath gives path with \, but the code expects /.
+    writerPath = writerPath.replace("\\", "/")
+    try {
+      val writer = CarbonWriter.builder
+        .outputPath(writerPath)
+        .enableLocalDictionary(false)
+        .uniqueIdentifier(System.currentTimeMillis())
+        .withAvroInput(nn)
+        .writtenBy("GenerateFiles")
+        .build()
+      writer.write(record1)
+      writer.write(record2)
+      writer.write(record3)
+      writer.write(record4)
+      writer.write(record5)
+      writer.close()
+    } catch {
+      case e: Exception =>
+        e.printStackTrace()
+        Assert.fail(e.getMessage)
+    }
+  }
+
+  def twoLevelArrayFile() = {
+    val json1 =
+      """   {
+        |         "arrayArrayInt": [[1,2,3], [4,5]],
+        |         "arrayArrayBigInt":[[90000,600000000],[8000],[911111111]],
+        |         "arrayArrayReal":[[1.111,2.2], [9.139,2.98]],
+        |         "arrayArrayDouble":[[1.111,2.2], [9.139,2.98989898]],
+        |         "arrayArrayString":[["Japan", "China"], ["India"]],
+        |         "arrayArrayBoolean":[[false, false], [false]]
+        |        }   """.stripMargin
+    val json2 =
+      """   {
+        |         "arrayArrayInt": [[1,2,3], [0,5], [1,2,3,4,5], [4,5]],
+        |         "arrayArrayBigInt":[[40000, 600000000, 8000],[9111111111]],
+        |         "arrayArrayReal":[[1.111, 2.2], [9.139, 2.98], [9.99]],
+        |         "arrayArrayDouble":[[1.111, 2.2],[9.139777, 2.98],[9.99888]],
+        |         "arrayArrayString":[["China", "Brazil"], ["Paris", "France"]],
+        |         "arrayArrayBoolean":[[false], [true, false]]
+        |        }   """.stripMargin
+    val json3 =
+      """   {
+        |         "arrayArrayInt": [[1], [0], [3], [4,5]],
+        |         "arrayArrayBigInt":[[5000],[600000000],[8000,9111111111],[20000],[600000000,
+        |         8000,9111111111]],
+        |         "arrayArrayReal":[[9.198]],
+        |         "arrayArrayDouble":[[0.1987979]],
+        |         "arrayArrayString":[["Japan", "China", "India"]],
+        |         "arrayArrayBoolean":[[false, true, false]]
+        |        }   """.stripMargin
+    val json4 =
+      """   {
+        |         "arrayArrayInt": [[0,9,0,1,3,2,3,4,7]],
+        |         "arrayArrayBigInt":[[5000, 600087000, 8000, 9111111111, 20000, 600000000, 8000,
+        |          977777]],
+        |         "arrayArrayReal":[[1.111, 2.2], [9.139, 2.98, 4.67], [2.91, 2.2], [9.139, 2.98]],
+        |         "arrayArrayDouble":[[1.111, 2.0, 4.67, 2.91, 2.2, 9.139, 2.98]],
+        |         "arrayArrayString":[["Japan"], ["China"], ["India"]],
+        |         "arrayArrayBoolean":[[false], [true], [false]]
+        |        }   """.stripMargin
+
+    val mySchema =
+      """ {
+        | "name": "address",
+        | "type": "record",
+        | "fields": [
+        |  {
+        | "name": "arrayArrayInt",
+        | "type": {
+        | "type": "array",
+        | "items": {
+        | "name": "FloorNum",
+        | "type": "array",
+        | "items": {
+        | "name": "EachdoorNums",
+        | "type": "int"
+        | }
+        | }
+        | }
+        | },
+        |  {
+        | "name": "arrayArrayBigInt",
+        | "type": {
+        | "type": "array",
+        | "items": {
+        | "name": "FloorNum",
+        | "type": "array",
+        | "items": {
+        | "name": "EachdoorNums",
+        | "type": "long"
+        | }
+        | }
+        | }
+        | },
+        |  {
+        | "name": "arrayArrayReal",
+        | "type": {
+        | "type": "array",
+        | "items": {
+        | "name": "FloorNum",
+        | "type": "array",
+        | "items": {
+        | "name": "EachdoorNums",
+        | "type": "float"
+        | }
+        | }
+        | }
+        | },
+        |  {
+        | "name": "arrayArrayDouble",
+        | "type": {
+        | "type": "array",
+        | "items": {
+        | "name": "FloorNum",
+        | "type": "array",
+        | "items": {
+        | "name": "EachdoorNums",
+        | "type": "double"
+        | }
+        | }
+        | }
+        | },
+        |  {
+        | "name": "arrayArrayString",
+        | "type": {
+        | "type": "array",
+        | "items": {
+        | "name": "FloorNum",
+        | "type": "array",
+        | "items": {
+        | "name": "EachdoorNums",
+        | "type": "string"
+        | }
+        | }
+        | }
+        | },
+        |  {
+        | "name": "arrayArrayBoolean",
+        | "type": {
+        | "type": "array",
+        | "items": {
+        | "name": "FloorNum",
+        | "type": "array",
+        | "items": {
+        | "name": "EachdoorNums",
+        | "type": "boolean"
+        | }
+        | }
+        | }
+        | }
+        | ]
+        |} """.stripMargin
+
+    val nn = new avro.Schema.Parser().parse(mySchema)
+    val record1 = testUtil.jsonToAvro(json1, mySchema)
+    val record2 = testUtil.jsonToAvro(json2, mySchema)
+    val record3 = testUtil.jsonToAvro(json3, mySchema)
+    val record4 = testUtil.jsonToAvro(json4, mySchema)
+    var writerPath = new File(this.getClass.getResource("/").getPath
+                              + "../../target/store/sdk_output/files2")
+      .getCanonicalPath
+    //getCanonicalPath gives path with \, but the code expects /.
+    writerPath = writerPath.replace("\\", "/")
+    try {
+      val writer = CarbonWriter.builder
+        .outputPath(writerPath)
+        .enableLocalDictionary(false)
+        .uniqueIdentifier(System.currentTimeMillis())
+        .withAvroInput(nn)
+        .writtenBy("GenerateFiles")
+        .build()
+      writer.write(record1)
+      writer.write(record2)
+      writer.write(record3)
+      writer.write(record4)
+      writer.close()
+    } catch {
+      case e: Exception =>
+        e.printStackTrace()
+        Assert.fail(e.getMessage)
+    }
+  }
+
+  def threeLevelArrayFile() = {
+    val json1 =
+      """ {
+        | "array3_Int": [[[1,2,3], [4,5]], [[6,7,8], [9]], [[1,2], [4,5]]],
+        | "array3_BigInt":[[[90000,600000000],[8000]],[[911111111]]],
+        | "array3_Real":[[[1.111,2.2], [9.139,2.98]]],
+        | "array3_Double":[[[1.111,2.2]], [[9.139,2.98989898]]],
+        | "array3_String":[[["Japan", "China"], ["Brazil", "Paris"]], [["India"]]],
+        | "array3_Boolean":[[[false, false], [false]], [[true]]]
+        | } """.stripMargin
+    val json2 =
+      """ {
+        | "array3_Int": [[[1,2,3], [0,5], [1,2,3,4,5], [4,5]]],
+        | "array3_BigInt":[[[40000,600000000,8000],[9111111111]]],
+        | "array3_Real":[[[1.111,2.2], [9.139,2.98]], [[9.99]]],
+        | "array3_Double":[[[1.111,2.2], [9.139777,2.98]], [[9.99888]]],
+        | "array3_String":[[["China", "Brazil"], ["Paris", "France"]]],
+        | "array3_Boolean":[[[false], [true, false]]]
+        | } """.stripMargin
+    val json3 =
+      """ {
+        | "array3_Int": [[[1],[0],[3]],[[4,5]]],
+        | "array3_BigInt":[[[5000],[600000000],[8000,9111111111],[20000],[600000000,8000,
+        | 9111111111]]],
+        | "array3_Real":[[[9.198]]],
+        | "array3_Double":[[[0.1987979]]],
+        | "array3_String":[[["Japan", "China", "India"]]],
+        | "array3_Boolean":[[[false, true, false]]]
+        | } """.stripMargin
+    val json4 =
+      """ {
+        | "array3_Int": [[[0,9,0,1,3,2,3,4,7]]],
+        | "array3_BigInt":[[[5000,600087000,8000,9111111111,20000,600000000,8000,977777]]],
+        | "array3_Real":[[[1.111,2.2], [9.139,2.98,4.67]], [[2.91,2.2], [9.139,2.98]]],
+        | "array3_Double":[[[1.111,2,4.67, 2.91,2.2, 9.139,2.98]]],
+        | "array3_String":[[["Japan"], ["China"], ["India"]]],
+        | "array3_Boolean":[[[false], [true], [false]]]
+        | } """.stripMargin
+
+    val mySchema =
+      """ {
+        | "name": "address",
+        | "type": "record",
+        | "fields": [
+        |  {
+        | "name": "array3_Int",
+        | "type": {
+        | "type": "array",
+        | "items": {
+        | "type": "array",
+        | "items": {
+        |     "type": "array",
+        |           "items": {
+        | "type": "int"
+        |              }
+        | }
+        | }
+        | }
+        | },
+        |  {
+        | "name": "array3_BigInt",
+        | "type": {
+        | "type": "array",
+        | "items": {
+        | "type": "array",
+        | "items": {
+        |     "type": "array",
+        |           "items": {
+        | "type": "long"
+        |              }
+        | }
+        | }
+        | }
+        | },
+        |  {
+        | "name": "array3_Real",
+        | "type": {
+        | "type": "array",
+        | "items": {
+        | "type": "array",
+        | "items": {
+        |     "type": "array",
+        |           "items": {
+        | "type": "float"
+        |              }
+        | }
+        | }
+        | }
+        | },
+        |  {
+        | "name": "array3_Double",
+        | "type": {
+        | "type": "array",
+        | "items": {
+        | "type": "array",
+        | "items": {
+        |     "type": "array",
+        |           "items": {
+        | "type": "double"
+        |              }
+        | }
+        | }
+        | }
+        | },
+        |  {
+        | "name": "array3_String",
+        | "type": {
+        | "type": "array",
+        | "items": {
+        | "type": "array",
+        | "items": {
+        |     "type": "array",
+        |           "items": {
+        | "type": "string"
+        |              }
+        | }
+        | }
+        | }
+        | },
+        |  {
+        | "name": "array3_Boolean",
+        | "type": {
+        | "type": "array",
+        | "items": {
+        | "type": "array",
+        | "items": {
+        |     "type": "array",
+        |           "items": {
+        | "type": "boolean"
+        |              }
+        | }
+        | }
+        | }
+        | }
+        | ]
+        |} """.stripMargin
+
+    val nn = new avro.Schema.Parser().parse(mySchema)
+    val record1 = testUtil.jsonToAvro(json1, mySchema)
+    val record2 = testUtil.jsonToAvro(json2, mySchema)
+    val record3 = testUtil.jsonToAvro(json3, mySchema)
+    val record4 = testUtil.jsonToAvro(json4, mySchema)
+    var writerPath = new File(this.getClass.getResource("/").getPath
+                              + "../../target/store/sdk_output/files3")
+      .getCanonicalPath
+    //getCanonicalPath gives path with \, but the code expects /.
+    writerPath = writerPath.replace("\\", "/")
+    try {
+      val writer = CarbonWriter.builder
+        .outputPath(writerPath)
+        .enableLocalDictionary(false)
+        .uniqueIdentifier(System.currentTimeMillis())
+        .withAvroInput(nn)
+        .writtenBy("GenerateFiles")
+        .build()
+      writer.write(record1)
+      writer.write(record2)
+      writer.write(record3)
+      writer.write(record4)
+      writer.close()
+    } catch {
+      case e: Exception =>
+        e.printStackTrace()
+        Assert.fail(e.getMessage)
+    }
+  }
+
+  object testUtil {
+
+    def jsonToAvro(json: String, avroSchema: String): GenericRecord = {
+      var input: InputStream = null
+      var writer: DataFileWriter[GenericRecord] = null
+      var output: ByteArrayOutputStream = null
+      try {
+        val schema = new org.apache.avro.Schema.Parser().parse(avroSchema)
+        val reader = new GenericDatumReader[GenericRecord](schema)
+        input = new ByteArrayInputStream(json.getBytes())
+        output = new ByteArrayOutputStream()
+        val din = new DataInputStream(input)
+        writer = new DataFileWriter[GenericRecord](new GenericDatumWriter[GenericRecord]())
+        writer.create(schema, output)
+        // decode the JSON text against the Avro schema and read a single record
+        val decoder = DecoderFactory.get().jsonDecoder(schema, din)
+        reader.read(null, decoder)
+      } finally {
+        // guard the closes: a parse failure can leave the streams unassigned
+        if (input != null) input.close()
+        if (writer != null) writer.close()
+      }
+    }
+
+    /**
+     * reads the dimension raw column chunks for the given block index from the
+     * first carbondata file found under the given store path
+     *
+     * @return list of dimension raw column chunks
+     */
+    def getDimRawChunk(blockindex: Int,

Review comment:
       removed







[GitHub] [carbondata] akkio-97 commented on a change in pull request #3773: [CARBONDATA-3830]Presto array columns read support

GitBox
In reply to this post by GitBox

akkio-97 commented on a change in pull request #3773:
URL: https://github.com/apache/carbondata/pull/3773#discussion_r466857970



##########
File path: integration/presto/src/test/scala/org/apache/carbondata/presto/integrationtest/GenerateFiles.scala
##########
@@ -0,0 +1,667 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.presto.integrationtest
+
+import java.io.{ByteArrayInputStream, ByteArrayOutputStream, DataInputStream, File, InputStream}
+import java.util
+
+import scala.collection.JavaConverters._
+
+import org.apache.avro
+import org.apache.avro.file.DataFileWriter
+import org.apache.avro.generic.{GenericDatumReader, GenericDatumWriter, GenericRecord}
+import org.apache.avro.io.{DecoderFactory, Encoder}
+import org.junit.Assert
+
+import org.apache.carbondata.core.cache.dictionary.DictionaryByteArrayWrapper
+import org.apache.carbondata.core.constants.CarbonCommonConstants
+import org.apache.carbondata.core.datastore.block.TableBlockInfo
+import org.apache.carbondata.core.datastore.chunk.impl.DimensionRawColumnChunk
+import org.apache.carbondata.core.datastore.chunk.reader.CarbonDataReaderFactory
+import org.apache.carbondata.core.datastore.chunk.reader.dimension.v3.DimensionChunkReaderV3
+import org.apache.carbondata.core.datastore.compression.CompressorFactory
+import org.apache.carbondata.core.datastore.filesystem.{CarbonFile, CarbonFileFilter}
+import org.apache.carbondata.core.datastore.impl.FileFactory
+import org.apache.carbondata.core.datastore.page.encoding.DefaultEncodingFactory
+import org.apache.carbondata.core.metadata.ColumnarFormatVersion
+import org.apache.carbondata.core.util.{CarbonMetadataUtil, DataFileFooterConverterV3}
+import org.apache.carbondata.sdk.file.CarbonWriter
+
+class GenerateFiles {
+
+  def singleLevelArrayFile() = {
+    val json1: String =
+      """ {"stringCol": "bob","intCol": 14,"doubleCol": 10.5,"realCol": 12.7,
+        |"boolCol": true,"arrayStringCol1":["Street1"],"arrayStringCol2": ["India", "Egypt"],
+        |"arrayIntCol": [1,2,3],"arrayBigIntCol":[70000,600000000],"arrayRealCol":[1.111,2.2],
+        |"arrayDoubleCol":[1.1,2.2,3.3], "arrayBooleanCol": [true, false, true]} """.stripMargin
+    val json2: String =
+      """ {"stringCol": "Alex","intCol": 15,"doubleCol": 11.5,"realCol": 13.7,
+        |"boolCol": true, "arrayStringCol1": ["Street1", "Street2"],"arrayStringCol2": ["Japan",
+        |"China", "India"],"arrayIntCol": [1,2,3,4],"arrayBigIntCol":[70000,600000000,8000],
+        |"arrayRealCol":[1.1,2.2,3.3],"arrayDoubleCol":[1.1,2.2,4.45,3.3],
+        |"arrayBooleanCol": [true, true, true]} """.stripMargin
+    val json3: String =
+      """ {"stringCol": "Rio","intCol": 16,"doubleCol": 12.5,"realCol": 14.7,
+        |"boolCol": true, "arrayStringCol1": ["Street1", "Street2","Street3"],
+        |"arrayStringCol2": ["China", "Brazil", "Paris", "France"],"arrayIntCol": [1,2,3,4,5],
+        |"arrayBigIntCol":[70000,600000000,8000,9111111111],"arrayRealCol":[1.1,2.2,3.3,4.45],
+        |"arrayDoubleCol":[1.1,2.2,4.45,5.5,3.3], "arrayBooleanCol": [true, false, true]} """
+        .stripMargin
+    val json4: String =
+      """ {"stringCol": "bob","intCol": 14,"doubleCol": 10.5,"realCol": 12.7,
+        |"boolCol": true, "arrayStringCol1":["Street1"],"arrayStringCol2": ["India", "Egypt"],
+        |"arrayIntCol": [1,2,3],"arrayBigIntCol":[70000,600000000],"arrayRealCol":[1.1,2.2],
+        |"arrayDoubleCol":[1.1,2.2,3.3], "arrayBooleanCol": [true, false, true]} """.stripMargin
+    val json5: String =
+      """ {"stringCol": "Alex","intCol": 15,"doubleCol": 11.5,"realCol": 13.7,
+        |"boolCol": true, "arrayStringCol1": ["Street1", "Street2"],"arrayStringCol2": ["Japan",
+        |"China", "India"],"arrayIntCol": [1,2,3,4],"arrayBigIntCol":[70000,600000000,8000],
+        |"arrayRealCol":[1.1,2.2,3.3],"arrayDoubleCol":[4,1,21.222,15.231],
+        |"arrayBooleanCol": [false, false, false]} """.stripMargin
+
+
+    val mySchema =
+      """ {
+        |      "name": "address",
+        |      "type": "record",
+        |      "fields": [
+        |      {
+        |      "name": "stringCol",
+        |      "type": "string"
+        |      },
+        |      {
+        |      "name": "intCol",
+        |      "type": "int"
+        |      },
+        |      {
+        |      "name": "doubleCol",
+        |      "type": "double"
+        |      },
+        |      {
+        |      "name": "realCol",
+        |      "type": "float"
+        |      },
+        |      {
+        |      "name": "boolCol",
+        |      "type": "boolean"
+        |      },
+        |      {
+        |      "name": "arrayStringCol1",
+        |      "type": {
+        |      "type": "array",
+        |      "items": {
+        |      "name": "street",
+        |      "type": "string"
+        |      }
+        |      }
+        |      },
+        |      {
+        |      "name": "arrayStringCol2",
+        |      "type": {
+        |      "type": "array",
+        |      "items": {
+        |      "name": "street",
+        |      "type": "string"
+        |      }
+        |      }
+        |      },
+        |      {
+        |      "name": "arrayIntCol",
+        |      "type": {
+        |      "type": "array",
+        |      "items": {
+        |      "name": "street",
+        |      "type": "int"
+        |      }
+        |      }
+        |      },
+        |      {
+        |      "name": "arrayBigIntCol",
+        |      "type": {
+        |      "type": "array",
+        |      "items": {
+        |      "name": "street",
+        |      "type": "long"
+        |      }
+        |      }
+        |      },
+        |      {
+        |      "name": "arrayRealCol",
+        |      "type": {
+        |      "type": "array",
+        |      "items": {
+        |      "name": "street",
+        |      "type": "float"
+        |      }
+        |      }
+        |      },
+        |      {
+        |      "name": "arrayDoubleCol",
+        |      "type": {
+        |      "type": "array",
+        |      "items": {
+        |      "name": "street",
+        |      "type": "double"
+        |      }
+        |      }
+        |      },
+        |      {
+        |      "name": "arrayBooleanCol",
+        |      "type": {
+        |      "type": "array",
+        |      "items": {
+        |      "name": "street",
+        |      "type": "boolean"
+        |      }
+        |      }
+        |      }
+        |      ]
+        |  }
+                   """.stripMargin
+
+    val nn = new avro.Schema.Parser().parse(mySchema)
+    val record1 = testUtil.jsonToAvro(json1, mySchema)
+    val record2 = testUtil.jsonToAvro(json2, mySchema)
+    val record3 = testUtil.jsonToAvro(json3, mySchema)
+    val record4 = testUtil.jsonToAvro(json4, mySchema)
+    val record5 = testUtil.jsonToAvro(json5, mySchema)
+    var writerPath = new File(this.getClass.getResource("/").getPath
+                              + "../../target/store/sdk_output/files")
+      .getCanonicalPath
+    //getCanonicalPath gives path with \, but the code expects /.
+    writerPath = writerPath.replace("\\", "/")
+    try {
+      val writer = CarbonWriter.builder
+        .outputPath(writerPath)
+        .enableLocalDictionary(false)
+        .uniqueIdentifier(System.currentTimeMillis())
+        .withAvroInput(nn)
+        .writtenBy("GenerateFiles")
+        .build()
+      writer.write(record1)
+      writer.write(record2)
+      writer.write(record3)
+      writer.write(record4)
+      writer.write(record5)
+      writer.close()
+    } catch {
+      case e: Exception =>
+        e.printStackTrace()

Review comment:
       done

##########
File path: integration/presto/src/main/java/org/apache/carbondata/presto/CarbonVectorBatch.java
##########
@@ -102,6 +89,12 @@ public static CarbonColumnVectorImpl createDirectStreamReader(int batchSize, Dat
       } else {
         return null;
       }
+    } else if (DataTypes.isArrayType(field.getDataType())) {
+      if (field.getChildren().size() > 1) {

Review comment:
       done







[GitHub] [carbondata] akkio-97 commented on a change in pull request #3773: [CARBONDATA-3830]Presto array columns read support

GitBox
In reply to this post by GitBox

akkio-97 commented on a change in pull request #3773:
URL: https://github.com/apache/carbondata/pull/3773#discussion_r466858210



##########
File path: core/src/main/java/org/apache/carbondata/core/scan/result/vector/impl/CarbonColumnVectorImpl.java
##########
@@ -102,6 +126,58 @@ public CarbonColumnVectorImpl(int batchSize, DataType dataType) {
 
   }
 
+  @Override
+  public CarbonColumnVector getColumnVector() {
+    return null;
+  }
+
+  @Override
+  public List<CarbonColumnVectorImpl> getChildrenVector() {
+    return childrenVector;
+  }
+
+  @Override
+  public void putArrayObject() {
+    return;
+  }
+
+  public void setChildrenVector(ArrayList<CarbonColumnVectorImpl> childrenVector) {
+    this.childrenVector = childrenVector;
+  }
+
+  public ArrayList<Integer> getChildrenElements() {
+    return childrenElements;
+  }
+
+  public void setChildrenElements(ArrayList<Integer> childrenElements) {
+    this.childrenElements = childrenElements;
+  }
+
+  public ArrayList<Integer> getChildrenOffset() {
+    return childrenOffset;
+  }
+
+  public void setChildrenOffset(ArrayList<Integer> childrenOffset) {
+    this.childrenOffset = childrenOffset;
+  }
+
+  public void setChildrenElementsAndOffset(byte[] childPageData) {
+    ByteBuffer childInfoBuffer = ByteBuffer.wrap(childPageData);
+    ArrayList<Integer> childElements = new ArrayList<>();
+    ArrayList<Integer> childOffset = new ArrayList<>();

Review comment:
       okay, removed
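
For context, a minimal sketch of what such a decode step can look like. The byte layout is an assumption (one int per parent row holding that row's child-element count), not something the diff above confirms, and the class and method names are hypothetical; the offsets then fall out as a running prefix sum:

    import java.nio.ByteBuffer;
    import java.util.ArrayList;

    class ChildOffsetSketch {
      static void decode(byte[] childPageData, int rowCount,
          ArrayList<Integer> childElements, ArrayList<Integer> childOffset) {
        ByteBuffer buffer = ByteBuffer.wrap(childPageData);
        int offset = 0;
        for (int row = 0; row < rowCount; row++) {
          // assumed layout: one int per parent row = child-entry count of that row
          int count = buffer.getInt();
          childElements.add(count);
          // this row's children start where the previous row's ended
          childOffset.add(offset);
          offset += count;
        }
      }
    }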







[GitHub] [carbondata] akkio-97 commented on a change in pull request #3773: [CARBONDATA-3830]Presto array columns read support

GitBox
In reply to this post by GitBox

akkio-97 commented on a change in pull request #3773:
URL: https://github.com/apache/carbondata/pull/3773#discussion_r466927654



##########
File path: integration/presto/src/main/prestosql/org/apache/carbondata/presto/readers/ArrayStreamReader.java
##########
@@ -0,0 +1,163 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.presto.readers;
+
+import java.util.ArrayList;
+import java.util.List;
+
+import io.prestosql.spi.type.*;
+
+import org.apache.carbondata.core.metadata.datatype.DataType;
+import org.apache.carbondata.core.metadata.datatype.DataTypes;
+import org.apache.carbondata.core.metadata.datatype.StructField;
+import org.apache.carbondata.core.scan.result.vector.impl.CarbonColumnVectorImpl;
+
+import io.prestosql.spi.block.Block;
+import io.prestosql.spi.block.BlockBuilder;
+
+import org.apache.carbondata.presto.CarbonVectorBatch;
+
+/**
+ * Class to read the Array Stream
+ */
+
+public class ArrayStreamReader extends CarbonColumnVectorImpl implements PrestoVectorBlockBuilder {
+
+  protected int batchSize;
+
+  protected Type type;
+  protected BlockBuilder builder;
+  Block childBlock = null;
+  private int index = 0;
+
+  public ArrayStreamReader(int batchSize, DataType dataType, StructField field) {
+    super(batchSize, dataType);
+    this.batchSize = batchSize;
+    this.type = getArrayOfType(field, dataType);
+    ArrayList<CarbonColumnVectorImpl> childrenList = new ArrayList<>();
+    childrenList.add(CarbonVectorBatch.createDirectStreamReader(this.batchSize, field.getDataType(), field));
+    setChildrenVector(childrenList);
+    this.builder = type.createBlockBuilder(null, batchSize);
+  }
+
+  public int getIndex() {
+    return index;
+  }
+
+  public void setIndex(int index) {
+    this.index = index;
+  }
+
+  public String getDataTypeName() {
+    return "ARRAY";
+  }
+
+  Type getArrayOfType(StructField field, DataType dataType) {
+    if (dataType == DataTypes.STRING) {
+      return new ArrayType(VarcharType.VARCHAR);
+    } else if (dataType == DataTypes.BYTE) {
+      return new ArrayType(TinyintType.TINYINT);
+    } else if (dataType == DataTypes.SHORT) {
+      return new ArrayType(SmallintType.SMALLINT);
+    } else if (dataType == DataTypes.INT) {
+      return new ArrayType(IntegerType.INTEGER);
+    } else if (dataType == DataTypes.LONG) {
+      return new ArrayType(BigintType.BIGINT);
+    } else if (dataType == DataTypes.DOUBLE) {
+      return new ArrayType(DoubleType.DOUBLE);
+    } else if (dataType == DataTypes.FLOAT) {
+      return new ArrayType(RealType.REAL);
+    } else if (dataType == DataTypes.BOOLEAN) {
+      return new ArrayType(BooleanType.BOOLEAN);
+    } else if (dataType == DataTypes.TIMESTAMP) {
+      return new ArrayType(TimestampType.TIMESTAMP);
+    } else if (DataTypes.isArrayType(dataType)) {
+      StructField childField = field.getChildren().get(0);
+      return new ArrayType(getArrayOfType(childField, childField.getDataType()));
+    } else {
+      throw new UnsupportedOperationException("Unsupported type: " + dataType);
+    }
+  }
+
+  @Override
+  public Block buildBlock() {
+    return builder.build();
+  }
+
+  public boolean isComplex() {
+    return true;
+  }
+
+  @Override
+  public void setBatchSize(int batchSize) {
+    this.batchSize = batchSize;
+  }
+
+  @Override
+  public void putObject(int rowId, Object value) {
+    if (value == null) {

Review comment:
       putObject is used only by the primitive child readers. Once an entire row has been filled, putArrayObject() moves that row into the array block as a single entry. See the sketch below.
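
To illustrate that flow, a minimal sketch assuming an array(int) column; fillOneRow is a hypothetical driver that only shows the call order, not real page-decoding logic:

    import io.prestosql.spi.block.Block;
    import org.apache.carbondata.presto.readers.ArrayStreamReader;

    class ArrayFillSketch {
      // fill one ARRAY row, e.g. [1, 2, 3], then materialize the Presto block
      static Block fillOneRow(ArrayStreamReader reader) {
        reader.putObject(0, 1);   // primitive values land in the child vector...
        reader.putObject(1, 2);
        reader.putObject(2, 3);
        reader.putArrayObject();  // ...then the whole row becomes one array entry
        return reader.buildBlock();
      }
    }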







[GitHub] [carbondata] akkio-97 commented on a change in pull request #3773: [CARBONDATA-3830]Presto array columns read support

GitBox
In reply to this post by GitBox

akkio-97 commented on a change in pull request #3773:
URL: https://github.com/apache/carbondata/pull/3773#discussion_r466927976



##########
File path: integration/presto/src/test/scala/org/apache/carbondata/presto/integrationtest/PrestoReadTableFilesTest.scala
##########
@@ -0,0 +1,443 @@
+package org.apache.carbondata.presto.integrationtest
+
+import java.io.File
+import java.sql.{SQLException, Timestamp}
+import java.util
+import java.util.Arrays.asList
+
+import io.prestosql.jdbc.PrestoArray
+import org.apache.carbondata.common.logging.LogServiceFactory
+import org.apache.carbondata.core.constants.CarbonCommonConstants
+import org.apache.carbondata.core.datastore.filesystem.CarbonFile
+import org.apache.carbondata.core.datastore.impl.FileFactory
+import org.apache.carbondata.core.metadata.datatype.{DataTypes, Field}
+import org.apache.carbondata.core.util.{CarbonProperties, CarbonUtil}
+import org.apache.carbondata.presto.server.PrestoServer
+import org.apache.carbondata.sdk.file.{CarbonWriter, Schema}
+import org.apache.commons.io.FileUtils
+import org.apache.commons.lang.RandomStringUtils
+import org.apache.spark.sql.Row
+import org.scalatest.{BeforeAndAfterAll, FunSuiteLike, BeforeAndAfterEach}
+
+import scala.collection.mutable
+import scala.collection.JavaConverters._
+class PrestoReadTableFilesTest extends FunSuiteLike with BeforeAndAfterAll with BeforeAndAfterEach {
+  private val logger = LogServiceFactory
+    .getLogService(classOf[PrestoReadTableFilesTest].getCanonicalName)
+
+  private val rootPath = new File(this.getClass.getResource("/").getPath
+    + "../../../..").getCanonicalPath
+  private val storePath = s"$rootPath/integration/presto/target/store"
+  private val systemPath = s"$rootPath/integration/presto/target/system"
+  private var writerPath = storePath + "/sdk_output/files"
+  private val prestoServer = new PrestoServer
+  private var varcharString = new String
+
+  override def beforeAll: Unit = {
+    CarbonProperties.getInstance().addProperty(CarbonCommonConstants.CARBON_WRITTEN_BY_APPNAME,
+      "Presto")
+    val map = new util.HashMap[String, String]()
+    map.put("hive.metastore", "file")
+    map.put("hive.metastore.catalog.dir", s"file://$storePath")
+
+    prestoServer.startServer("sdk_output", map)
+  }
+
+  override def afterAll(): Unit = {
+    prestoServer.stopServer()
+    CarbonUtil.deleteFoldersAndFiles(FileFactory.getCarbonFile(storePath))
+  }
+
+  private def createComplexTableForSingleLevelArray = {
+    prestoServer.execute("drop table if exists sdk_output.files")
+    prestoServer.execute("drop schema if exists sdk_output")
+    prestoServer.execute("create schema sdk_output")
+    prestoServer
+      .execute(
+        "create table sdk_output.files(stringCol varchar, intCol int, doubleCol double, realCol real, boolCol boolean, arrayStringCol1 array(varchar), arrayStringcol2 array(varchar), arrayIntCol array(int), arrayBigIntCol array(bigint), arrayRealCol array(real), arrayDoubleCol array(double), arrayBooleanCol array(boolean)) with(format='CARBON') ")
+  }
+
+  private def createComplexTableFor2LevelArray = {
+    prestoServer.execute("drop table if exists sdk_output.files2")
+    prestoServer.execute("drop schema if exists sdk_output")
+    prestoServer.execute("create schema sdk_output")
+        prestoServer
+      .execute(
+        "create table sdk_output.files2(arrayArrayInt array(array(int)), arrayArrayBigInt array(array(bigint)), arrayArrayReal array(array(real)), arrayArrayDouble array(array(double)), arrayArrayString array(array(varchar)), arrayArrayBoolean array(array(boolean))) with(format='CARBON') ")
+  }
+
+  private def createComplexTableFor3LevelArray = {
+    prestoServer.execute("drop table if exists sdk_output.files3")
+    prestoServer.execute("drop schema if exists sdk_output")
+    prestoServer.execute("create schema sdk_output")
+    prestoServer
+        .execute(
+          "create table sdk_output.files3(array3_Int array(array(array(int))), array3_BigInt array(array(array(bigint))), array3_Real array(array(array(real))), array3_Double array(array(array(double))), array3_String array(array(array(varchar))), array3_Boolean array(array(array(boolean))) ) with(format='CARBON') ")
+    }
+
+  def buildComplexTestForSingleLevelArray(): Any = {
+    FileUtils.deleteDirectory(new File(writerPath))
+    createComplexTableForSingleLevelArray
+    import java.io.IOException
+    val source = new File(this.getClass.getResource("/").getPath + "../../" + "/temp/table1").getCanonicalPath
+    val srcDir = new File(source)
+    val destination = new File(this.getClass.getResource("/").getPath + "../../" + "/target/store/sdk_output/files/").getCanonicalPath
+    val destDir = new File(destination)
+    try FileUtils.copyDirectory(srcDir, destDir)
+    catch {
+      case e: IOException =>
+        e.printStackTrace()
+    }
+  }
+
+  def buildComplexTestFor2LevelArray(): Any = {
+    writerPath = storePath + "/sdk_output/files2"

Review comment:
       done







[GitHub] [carbondata] akkio-97 commented on a change in pull request #3773: [CARBONDATA-3830]Presto array columns read support

GitBox
In reply to this post by GitBox

akkio-97 commented on a change in pull request #3773:
URL: https://github.com/apache/carbondata/pull/3773#discussion_r466964195



##########
File path: integration/presto/src/main/prestosql/org/apache/carbondata/presto/readers/ArrayStreamReader.java
##########
@@ -0,0 +1,163 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.presto.readers;
+
+import java.util.ArrayList;
+import java.util.List;
+
+import io.prestosql.spi.type.*;
+
+import org.apache.carbondata.core.metadata.datatype.DataType;
+import org.apache.carbondata.core.metadata.datatype.DataTypes;
+import org.apache.carbondata.core.metadata.datatype.StructField;
+import org.apache.carbondata.core.scan.result.vector.impl.CarbonColumnVectorImpl;
+
+import io.prestosql.spi.block.Block;
+import io.prestosql.spi.block.BlockBuilder;
+
+import org.apache.carbondata.presto.CarbonVectorBatch;
+
+/**
+ * Class to read the Array Stream
+ */
+
+public class ArrayStreamReader extends CarbonColumnVectorImpl implements PrestoVectorBlockBuilder {
+
+  protected int batchSize;
+
+  protected Type type;
+  protected BlockBuilder builder;
+  Block childBlock = null;
+  private int index = 0;
+
+  public ArrayStreamReader(int batchSize, DataType dataType, StructField field) {
+    super(batchSize, dataType);
+    this.batchSize = batchSize;
+    this.type = getArrayOfType(field, dataType);
+    ArrayList<CarbonColumnVectorImpl> childrenList = new ArrayList<>();
+    childrenList.add(CarbonVectorBatch.createDirectStreamReader(this.batchSize, field.getDataType(), field));
+    setChildrenVector(childrenList);
+    this.builder = type.createBlockBuilder(null, batchSize);
+  }
+
+  public int getIndex() {
+    return index;
+  }
+
+  public void setIndex(int index) {
+    this.index = index;
+  }
+
+  public String getDataTypeName() {
+    return "ARRAY";
+  }
+
+  Type getArrayOfType(StructField field, DataType dataType) {
+    if (dataType == DataTypes.STRING) {
+      return new ArrayType(VarcharType.VARCHAR);
+    } else if (dataType == DataTypes.BYTE) {
+      return new ArrayType(TinyintType.TINYINT);
+    } else if (dataType == DataTypes.SHORT) {
+      return new ArrayType(SmallintType.SMALLINT);
+    } else if (dataType == DataTypes.INT) {
+      return new ArrayType(IntegerType.INTEGER);
+    } else if (dataType == DataTypes.LONG) {
+      return new ArrayType(BigintType.BIGINT);
+    } else if (dataType == DataTypes.DOUBLE) {
+      return new ArrayType(DoubleType.DOUBLE);
+    } else if (dataType == DataTypes.FLOAT) {
+      return new ArrayType(RealType.REAL);
+    } else if (dataType == DataTypes.BOOLEAN) {
+      return new ArrayType(BooleanType.BOOLEAN);
+    } else if (dataType == DataTypes.TIMESTAMP) {
+      return new ArrayType(TimestampType.TIMESTAMP);
+    } else if (DataTypes.isArrayType(dataType)) {
+      StructField childField = field.getChildren().get(0);
+      return new ArrayType(getArrayOfType(childField, childField.getDataType()));
+    } else {
+      throw new UnsupportedOperationException("Unsupported type: " + dataType);
+    }
+  }
+
+  @Override
+  public Block buildBlock() {
+    return builder.build();
+  }
+
+  public boolean isComplex() {
+    return true;
+  }
+
+  @Override
+  public void setBatchSize(int batchSize) {
+    this.batchSize = batchSize;
+  }
+
+  @Override
+  public void putObject(int rowId, Object value) {
+    if (value == null) {
+      putNull(rowId);
+    } else {
+      getChildrenVector().get(0).putObject(rowId, value);
+    }
+  }
+
+  public void putArrayObject() {
+    if (DataTypes.isArrayType(this.getType())) {
+      childBlock = ((ArrayStreamReader) getChildrenVector().get(0)).buildBlock();
+    } else if (this.getType() == DataTypes.STRING) {
+      childBlock = ((SliceStreamReader) getChildrenVector().get(0)).buildBlock();
+    } else if (this.getType() == DataTypes.INT) {
+      childBlock = ((IntegerStreamReader) getChildrenVector().get(0)).buildBlock();
+    } else if (this.getType() == DataTypes.LONG) {
+      childBlock = ((LongStreamReader) getChildrenVector().get(0)).buildBlock();
+    } else if (this.getType() == DataTypes.DOUBLE) {
+      childBlock = ((DoubleStreamReader) getChildrenVector().get(0)).buildBlock();
+    } else if (this.getType() == DataTypes.FLOAT) {
+      childBlock = ((FloatStreamReader) getChildrenVector().get(0)).buildBlock();
+    } else if (this.getType() == DataTypes.BOOLEAN) {
+      childBlock = ((BooleanStreamReader) getChildrenVector().get(0)).buildBlock();
+    } else if (this.getType() == DataTypes.BYTE) {
+      childBlock = ((ByteStreamReader) getChildrenVector().get(0)).buildBlock();
+    } else if (this.getType() == DataTypes.TIMESTAMP) {

Review comment:
       Date has no specific stream reader.
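
A small aside for readers following the hunk above: getArrayOfType recurses once per nesting level, wrapping the leaf type in one ArrayType per level. A minimal illustration (the column shape is hypothetical; ArrayType/IntegerType are the prestosql SPI types already imported by the diff):

import io.prestosql.spi.type.ArrayType;
import io.prestosql.spi.type.IntegerType;
import io.prestosql.spi.type.Type;

public class ArrayTypeMappingExample {
  public static void main(String[] args) {
    // a Carbon column declared as array(array(int)) ends at the INT leaf,
    // so the mapped Presto type is equivalent to:
    Type mapped = new ArrayType(new ArrayType(IntegerType.INTEGER));
    System.out.println(mapped.getDisplayName()); // prints array(array(integer))
  }
}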




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3773: [CARBONDATA-3830]Presto array columns read support

GitBox
In reply to this post by GitBox

CarbonDataQA1 commented on pull request #3773:
URL: https://github.com/apache/carbondata/pull/3773#issuecomment-670494726






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #3773: [CARBONDATA-3830]Presto array columns read support

GitBox
In reply to this post by GitBox

ajantha-bhat commented on a change in pull request #3773:
URL: https://github.com/apache/carbondata/pull/3773#discussion_r467090464



##########
File path: core/src/main/java/org/apache/carbondata/core/datastore/page/encoding/FillVector.java
##########
@@ -0,0 +1,346 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.core.datastore.page.encoding;
+
+import java.nio.ByteBuffer;
+import java.util.ArrayList;
+import java.util.BitSet;
+
+import org.apache.carbondata.core.metadata.datatype.DataType;
+import org.apache.carbondata.core.metadata.datatype.DataTypes;
+import org.apache.carbondata.core.metadata.datatype.DecimalConverterFactory;
+import org.apache.carbondata.core.scan.result.vector.CarbonColumnVector;
+import org.apache.carbondata.core.scan.result.vector.ColumnVectorInfo;
+import org.apache.carbondata.core.scan.result.vector.impl.CarbonColumnVectorImpl;
+import org.apache.carbondata.core.util.ByteUtil;
+
+public class FillVector {
+  private byte[] pageData;
+  private float floatFactor = 0;
+  private double factor = 0;
+  private ColumnVectorInfo vectorInfo;
+  private BitSet nullBits;
+
+  public FillVector(byte[] pageData, ColumnVectorInfo vectorInfo, BitSet nullBits) {
+    this.pageData = pageData;
+    this.vectorInfo = vectorInfo;
+    this.nullBits = nullBits;
+  }
+
+  public void setFactor(double factor) {
+    this.factor = factor;
+  }
+
+  public void setFloatFactor(float floatFactor) {
+    this.floatFactor = floatFactor;
+  }
+
+  public void basedOnType(CarbonColumnVector vector, DataType vectorDataType, int pageSize,
+      DataType pageDataType) {
+    if (vectorInfo.vector.getColumnVector() != null && ((CarbonColumnVectorImpl) vectorInfo.vector
+        .getColumnVector()).isComplex()) {
+      fillComplexType(vector.getColumnVector(), pageDataType);
+    } else {
+      fillPrimitiveType(vector, vectorDataType, pageSize, pageDataType);
+      vector.setIndex(0);
+    }
+  }
+
+  private void fillComplexType(CarbonColumnVector vector, DataType pageDataType) {
+    CarbonColumnVectorImpl vectorImpl = (CarbonColumnVectorImpl) vector;
+    if (vector != null && vector.getChildrenVector() != null) {
+      ArrayList<Integer> childElements = ((CarbonColumnVectorImpl) vector).getChildrenElements();
+      for (int i = 0; i < childElements.size(); i++) {
+        int count = childElements.get(i);
+        typeComplexObject(vectorImpl.getChildrenVector().get(0), count, pageDataType);
+        vector.putArrayObject();
+      }
+    }

Review comment:
       Reset the index of the child vector here, since this page has been fully processed.
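
A minimal sketch of the suggested reset, assuming the child vector exposes the same setIndex(int) used elsewhere in this diff (the merged fix may differ):

private void fillComplexType(CarbonColumnVector vector, DataType pageDataType) {
  CarbonColumnVectorImpl vectorImpl = (CarbonColumnVectorImpl) vector;
  if (vector != null && vector.getChildrenVector() != null) {
    ArrayList<Integer> childElements = vectorImpl.getChildrenElements();
    for (int i = 0; i < childElements.size(); i++) {
      int count = childElements.get(i);
      typeComplexObject(vectorImpl.getChildrenVector().get(0), count, pageDataType);
      vector.putArrayObject();
    }
    // the page has been fully consumed, so rewind the child's read
    // position before the next page is decoded
    vectorImpl.getChildrenVector().get(0).setIndex(0);
  }
}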




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #3773: [CARBONDATA-3830]Presto array columns read support

GitBox
In reply to this post by GitBox

ajantha-bhat commented on a change in pull request #3773:
URL: https://github.com/apache/carbondata/pull/3773#discussion_r467095038



##########
File path: core/src/main/java/org/apache/carbondata/core/datastore/page/encoding/compress/DirectCompressCodec.java
##########
@@ -246,7 +239,29 @@ public void decodeAndFillVector(byte[] pageData, ColumnVectorInfo vectorInfo, Bi
       vector = ColumnarVectorWrapperDirectFactory
           .getDirectVectorWrapperFactory(vector, vectorInfo.invertedIndex, nullBits, deletedRows,
               true, false);
-      fillVector(pageData, vector, vectorDataType, pageDataType, pageSize, vectorInfo, nullBits);
+      Deque<CarbonColumnVectorImpl> vectorStack = vectorInfo.getVectorStack();
+      // initialize vectorStack with the parent vector only when it is null
+      if (vectorStack == null && vectorInfo.vector.getColumnVector() != null) {
+        vectorStack = new ArrayDeque<>();
+        // pushing the parent vector
+        vectorStack.push((CarbonColumnVectorImpl) vectorInfo.vector.getColumnVector());
+        vectorInfo.setVectorStack(vectorStack);
+      }
+      /*
+       * If the top of the vector stack is a complex vector, push its
+       * children onto the stack and load them too.
+       * TODO: if there are multiple children, push them all and load them iteratively
+       */
+      if (vectorStack != null && vectorStack.peek().isComplex()) {
+        vectorStack.peek().setChildrenElements(pageData);
+        vectorStack.push(vectorStack.peek().getChildrenVector().get(0));
+        vectorStack.peek().loadPage();
+        return;
+      }
+
+      FillVector fill = new FillVector(pageData, vectorInfo, nullBits);
+      fill.basedOnType(vector, vectorDataType, pageSize, pageDataType);
+

Review comment:
       Pop from the stack once the child has been processed.
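
One plausible reading of this suggestion, sketched against the hunk above (all names are from the PR; the exact placement of the pop is an assumption, not the merged fix):

if (vectorStack != null && vectorStack.peek().isComplex()) {
  vectorStack.peek().setChildrenElements(pageData);
  vectorStack.push(vectorStack.peek().getChildrenVector().get(0));
  vectorStack.peek().loadPage();
  // the child's page has been processed, so remove it from the stack
  vectorStack.pop();
  return;
}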




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #3773: [CARBONDATA-3830]Presto array columns read support

GitBox
In reply to this post by GitBox

ajantha-bhat commented on a change in pull request #3773:
URL: https://github.com/apache/carbondata/pull/3773#discussion_r467095602



##########
File path: core/src/main/java/org/apache/carbondata/core/datastore/page/encoding/compress/DirectCompressCodec.java
##########
@@ -246,7 +239,29 @@ public void decodeAndFillVector(byte[] pageData, ColumnVectorInfo vectorInfo, Bi
       vector = ColumnarVectorWrapperDirectFactory
           .getDirectVectorWrapperFactory(vector, vectorInfo.invertedIndex, nullBits, deletedRows,
               true, false);
-      fillVector(pageData, vector, vectorDataType, pageDataType, pageSize, vectorInfo, nullBits);
+      Deque<CarbonColumnVectorImpl> vectorStack = vectorInfo.getVectorStack();
+      // initialize vectorStack with the parent vector only when it is null
+      if (vectorStack == null && vectorInfo.vector.getColumnVector() != null) {
+        vectorStack = new ArrayDeque<>();
+        // pushing the parent vector
+        vectorStack.push((CarbonColumnVectorImpl) vectorInfo.vector.getColumnVector());
+        vectorInfo.setVectorStack(vectorStack);
+      }
+      /*
+       * If the top of the vector stack is a complex vector, push its
+       * children onto the stack and load them too.
+       * TODO: if there are multiple children, push them all and load them iteratively
+       */
+      if (vectorStack != null && vectorStack.peek().isComplex()) {
+        vectorStack.peek().setChildrenElements(pageData);

Review comment:
       Here, please take pageSize as an argument and break out once the number of elements equals pageSize, because this buffer is reused and can be much larger than the actual page.
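
A hedged sketch of the requested signature change. The byte layout of pageData (one int per array row) is assumed here; the pageSize cutoff is the point:

// requires java.nio.ByteBuffer and java.util.ArrayList
public void setChildrenElements(byte[] pageData, int pageSize) {
  ArrayList<Integer> childElements = new ArrayList<>();
  ByteBuffer buffer = ByteBuffer.wrap(pageData);
  while (buffer.hasRemaining()) {
    if (childElements.size() == pageSize) {
      // pageData is a reused buffer that can be larger than this page,
      // so stop at the real row count instead of draining the buffer
      break;
    }
    childElements.add(buffer.getInt());
  }
  // store the per-row counts just as the one-argument setter does
  // (the backing field is omitted in this sketch)
}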




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] akkio-97 commented on a change in pull request #3773: [CARBONDATA-3830]Presto array columns read support

GitBox
In reply to this post by GitBox

akkio-97 commented on a change in pull request #3773:
URL: https://github.com/apache/carbondata/pull/3773#discussion_r467639051



##########
File path: core/src/main/java/org/apache/carbondata/core/datastore/page/encoding/compress/DirectCompressCodec.java
##########
@@ -246,7 +239,29 @@ public void decodeAndFillVector(byte[] pageData, ColumnVectorInfo vectorInfo, Bi
       vector = ColumnarVectorWrapperDirectFactory
           .getDirectVectorWrapperFactory(vector, vectorInfo.invertedIndex, nullBits, deletedRows,
               true, false);
-      fillVector(pageData, vector, vectorDataType, pageDataType, pageSize, vectorInfo, nullBits);
+      Deque<CarbonColumnVectorImpl> vectorStack = vectorInfo.getVectorStack();
+      // initialize vectorStack with the parent vector only when it is null
+      if (vectorStack == null && vectorInfo.vector.getColumnVector() != null) {
+        vectorStack = new ArrayDeque<>();
+        // pushing the parent vector
+        vectorStack.push((CarbonColumnVectorImpl) vectorInfo.vector.getColumnVector());
+        vectorInfo.setVectorStack(vectorStack);
+      }
+      /*
+       * If the top of the vector stack is a complex vector, push its
+       * children onto the stack and load them too.
+       * TODO: if there are multiple children, push them all and load them iteratively
+       */
+      if (vectorStack != null && vectorStack.peek().isComplex()) {
+        vectorStack.peek().setChildrenElements(pageData);

Review comment:
       done

##########
File path: core/src/main/java/org/apache/carbondata/core/datastore/page/encoding/compress/DirectCompressCodec.java
##########
@@ -246,7 +239,29 @@ public void decodeAndFillVector(byte[] pageData, ColumnVectorInfo vectorInfo, Bi
       vector = ColumnarVectorWrapperDirectFactory
           .getDirectVectorWrapperFactory(vector, vectorInfo.invertedIndex, nullBits, deletedRows,
               true, false);
-      fillVector(pageData, vector, vectorDataType, pageDataType, pageSize, vectorInfo, nullBits);
+      Deque<CarbonColumnVectorImpl> vectorStack = vectorInfo.getVectorStack();
+      // initialize vectorStack with the parent vector only when it is null
+      if (vectorStack == null && vectorInfo.vector.getColumnVector() != null) {
+        vectorStack = new ArrayDeque<>();
+        // pushing the parent vector
+        vectorStack.push((CarbonColumnVectorImpl) vectorInfo.vector.getColumnVector());
+        vectorInfo.setVectorStack(vectorStack);
+      }
+      /*
+       * If the top of the vector stack is a complex vector, push its
+       * children onto the stack and load them too.
+       * TODO: if there are multiple children, push them all and load them iteratively
+       */
+      if (vectorStack != null && vectorStack.peek().isComplex()) {
+        vectorStack.peek().setChildrenElements(pageData);
+        vectorStack.push(vectorStack.peek().getChildrenVector().get(0));
+        vectorStack.peek().loadPage();
+        return;
+      }
+
+      FillVector fill = new FillVector(pageData, vectorInfo, nullBits);
+      fill.basedOnType(vector, vectorDataType, pageSize, pageDataType);
+

Review comment:
       done

##########
File path: core/src/main/java/org/apache/carbondata/core/datastore/page/encoding/FillVector.java
##########
@@ -0,0 +1,346 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.core.datastore.page.encoding;
+
+import java.nio.ByteBuffer;
+import java.util.ArrayList;
+import java.util.BitSet;
+
+import org.apache.carbondata.core.metadata.datatype.DataType;
+import org.apache.carbondata.core.metadata.datatype.DataTypes;
+import org.apache.carbondata.core.metadata.datatype.DecimalConverterFactory;
+import org.apache.carbondata.core.scan.result.vector.CarbonColumnVector;
+import org.apache.carbondata.core.scan.result.vector.ColumnVectorInfo;
+import org.apache.carbondata.core.scan.result.vector.impl.CarbonColumnVectorImpl;
+import org.apache.carbondata.core.util.ByteUtil;
+
+public class FillVector {
+  private byte[] pageData;
+  private float floatFactor = 0;
+  private double factor = 0;
+  private ColumnVectorInfo vectorInfo;
+  private BitSet nullBits;
+
+  public FillVector(byte[] pageData, ColumnVectorInfo vectorInfo, BitSet nullBits) {
+    this.pageData = pageData;
+    this.vectorInfo = vectorInfo;
+    this.nullBits = nullBits;
+  }
+
+  public void setFactor(double factor) {
+    this.factor = factor;
+  }
+
+  public void setFloatFactor(float floatFactor) {
+    this.floatFactor = floatFactor;
+  }
+
+  public void basedOnType(CarbonColumnVector vector, DataType vectorDataType, int pageSize,
+      DataType pageDataType) {
+    if (vectorInfo.vector.getColumnVector() != null && ((CarbonColumnVectorImpl) vectorInfo.vector
+        .getColumnVector()).isComplex()) {
+      fillComplexType(vector.getColumnVector(), pageDataType);
+    } else {
+      fillPrimitiveType(vector, vectorDataType, pageSize, pageDataType);
+      vector.setIndex(0);
+    }
+  }
+
+  private void fillComplexType(CarbonColumnVector vector, DataType pageDataType) {
+    CarbonColumnVectorImpl vectorImpl = (CarbonColumnVectorImpl) vector;
+    if (vector != null && vector.getChildrenVector() != null) {
+      ArrayList<Integer> childElements = ((CarbonColumnVectorImpl) vector).getChildrenElements();
+      for (int i = 0; i < childElements.size(); i++) {
+        int count = childElements.get(i);
+        typeComplexObject(vectorImpl.getChildrenVector().get(0), count, pageDataType);
+        vector.putArrayObject();
+      }
+    }

Review comment:
       done




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3773: [CARBONDATA-3830]Presto array columns read support

GitBox
In reply to this post by GitBox

CarbonDataQA1 commented on pull request #3773:
URL: https://github.com/apache/carbondata/pull/3773#issuecomment-671129181


   Build Failed with Spark 2.3.4. Please check CI: http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3669/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3773: [CARBONDATA-3830]Presto array columns read support

GitBox
In reply to this post by GitBox

CarbonDataQA1 commented on pull request #3773:
URL: https://github.com/apache/carbondata/pull/3773#issuecomment-671129567


   Build Failed with Spark 2.4.5. Please check CI: http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1930/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3773: [CARBONDATA-3830]Presto array columns read support

GitBox
In reply to this post by GitBox

CarbonDataQA1 commented on pull request #3773:
URL: https://github.com/apache/carbondata/pull/3773#issuecomment-671270077


   Build Success with Spark 2.3.4. Please check CI: http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3673/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3773: [CARBONDATA-3830]Presto array columns read support

GitBox
In reply to this post by GitBox

CarbonDataQA1 commented on pull request #3773:
URL: https://github.com/apache/carbondata/pull/3773#issuecomment-671273851


   Build Failed with Spark 2.4.5. Please check CI: http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1934/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #3773: [CARBONDATA-3830]Presto array columns read support

GitBox
In reply to this post by GitBox

ajantha-bhat commented on a change in pull request #3773:
URL: https://github.com/apache/carbondata/pull/3773#discussion_r469026786



##########
File path: integration/presto/src/main/prestosql/org/apache/carbondata/presto/readers/ArrayStreamReader.java
##########
@@ -0,0 +1,163 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.presto.readers;
+
+import java.util.ArrayList;
+import java.util.List;
+
+import io.prestosql.spi.type.*;
+
+import org.apache.carbondata.core.metadata.datatype.DataType;
+import org.apache.carbondata.core.metadata.datatype.DataTypes;
+import org.apache.carbondata.core.metadata.datatype.StructField;
+import org.apache.carbondata.core.scan.result.vector.impl.CarbonColumnVectorImpl;
+
+import io.prestosql.spi.block.Block;
+import io.prestosql.spi.block.BlockBuilder;
+
+import org.apache.carbondata.presto.CarbonVectorBatch;
+
+/**
+ * Class to read the Array Stream
+ */
+
+public class ArrayStreamReader extends CarbonColumnVectorImpl implements PrestoVectorBlockBuilder {
+
+  protected int batchSize;
+
+  protected Type type;
+  protected BlockBuilder builder;
+  Block childBlock = null;
+  private int index = 0;
+
+  public ArrayStreamReader(int batchSize, DataType dataType, StructField field) {
+    super(batchSize, dataType);
+    this.batchSize = batchSize;
+    this.type = getArrayOfType(field, dataType);
+    ArrayList<CarbonColumnVectorImpl> childrenList = new ArrayList<>();
+    childrenList.add(CarbonVectorBatch.createDirectStreamReader(this.batchSize, field.getDataType(), field));
+    setChildrenVector(childrenList);
+    this.builder = type.createBlockBuilder(null, batchSize);
+  }
+
+  public int getIndex() {
+    return index;
+  }
+
+  public void setIndex(int index) {
+    this.index = index;
+  }
+
+  public String getDataTypeName() {
+    return "ARRAY";
+  }
+
+  Type getArrayOfType(StructField field, DataType dataType) {
+    if (dataType == DataTypes.STRING) {
+      return new ArrayType(VarcharType.VARCHAR);
+    } else if (dataType == DataTypes.BYTE) {
+      return new ArrayType(TinyintType.TINYINT);
+    } else if (dataType == DataTypes.SHORT) {
+      return new ArrayType(SmallintType.SMALLINT);
+    } else if (dataType == DataTypes.INT) {

Review comment:
       Decimal datatype handling is also missing.
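
A possible shape for the missing branch, assuming Carbon's DecimalType carries precision and scale as it does in the other Presto readers (hypothetical wiring, not the merged fix; fully qualified names avoid the clash between the Carbon and Presto DecimalType classes):

} else if (DataTypes.isDecimal(dataType)) {
  org.apache.carbondata.core.metadata.datatype.DecimalType carbonDecimal =
      (org.apache.carbondata.core.metadata.datatype.DecimalType) dataType;
  // map Carbon decimal(p, s) onto the equivalent Presto decimal(p, s)
  return new ArrayType(io.prestosql.spi.type.DecimalType
      .createDecimalType(carbonDecimal.getPrecision(), carbonDecimal.getScale()));
}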




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #3773: [CARBONDATA-3830]Presto array columns read support

GitBox
In reply to this post by GitBox

ajantha-bhat commented on a change in pull request #3773:
URL: https://github.com/apache/carbondata/pull/3773#discussion_r469161474



##########
File path: integration/presto/src/main/prestosql/org/apache/carbondata/presto/readers/ArrayStreamReader.java
##########
@@ -0,0 +1,163 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.presto.readers;
+
+import java.util.ArrayList;
+import java.util.List;
+
+import io.prestosql.spi.type.*;
+
+import org.apache.carbondata.core.metadata.datatype.DataType;
+import org.apache.carbondata.core.metadata.datatype.DataTypes;
+import org.apache.carbondata.core.metadata.datatype.StructField;
+import org.apache.carbondata.core.scan.result.vector.impl.CarbonColumnVectorImpl;
+
+import io.prestosql.spi.block.Block;
+import io.prestosql.spi.block.BlockBuilder;
+
+import org.apache.carbondata.presto.CarbonVectorBatch;
+
+/**
+ * Class to read the Array Stream
+ */
+
+public class ArrayStreamReader extends CarbonColumnVectorImpl implements PrestoVectorBlockBuilder {
+
+  protected int batchSize;
+
+  protected Type type;
+  protected BlockBuilder builder;
+  Block childBlock = null;
+  private int index = 0;
+
+  public ArrayStreamReader(int batchSize, DataType dataType, StructField field) {
+    super(batchSize, dataType);
+    this.batchSize = batchSize;
+    this.type = getArrayOfType(field, dataType);
+    ArrayList<CarbonColumnVectorImpl> childrenList = new ArrayList<>();
+    childrenList.add(CarbonVectorBatch.createDirectStreamReader(this.batchSize, field.getDataType(), field));
+    setChildrenVector(childrenList);
+    this.builder = type.createBlockBuilder(null, batchSize);
+  }
+
+  public int getIndex() {
+    return index;
+  }
+
+  public void setIndex(int index) {
+    this.index = index;
+  }
+
+  public String getDataTypeName() {
+    return "ARRAY";
+  }
+
+  Type getArrayOfType(StructField field, DataType dataType) {
+    if (dataType == DataTypes.STRING) {
+      return new ArrayType(VarcharType.VARCHAR);
+    } else if (dataType == DataTypes.BYTE) {
+      return new ArrayType(TinyintType.TINYINT);
+    } else if (dataType == DataTypes.SHORT) {
+      return new ArrayType(SmallintType.SMALLINT);
+    } else if (dataType == DataTypes.INT) {

Review comment:
       VARCHAR handling is also missing; please rebase the PR to handle BINARY as well.
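
Sketches for the two gaps called out here, following the shape of the existing branches (VarbinaryType is the standard prestosql SPI type; the DataTypes.BINARY branch assumes the rebase mentioned above):

} else if (dataType == DataTypes.VARCHAR) {
  return new ArrayType(VarcharType.VARCHAR);
} else if (dataType == DataTypes.BINARY) {
  return new ArrayType(VarbinaryType.VARBINARY);
}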




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]

