Apache CarbonData Dev Mailing List archive › Apache CarbonData JIRA issues

[GitHub] carbondata pull request #1581: [CARBONDATA-1779] GenericVectorizedReader

Classic

List

40 messages Options

Options

12

[GitHub] carbondata pull request #1581: [CARBONDATA-1779] GenericVectorizedReader

GitHub user bhavya411 opened a pull request:

https://github.com/apache/carbondata/pull/1581

[CARBONDATA-1779] GenericVectorizedReader

This PR removes the Spark Dependency from Presto Integration Module for using the CarbonVectorizedRecordreader, This PR consolidate CarbonVectorizedRecordReader into one,to make it shared for all integration modules

In the earlier version of Presto Integration we were using ColumnarBatch of Spark, which is not a good practice, here we provided our own implementation of the ColumnVector and the VectorBatch to eliminate the Spark all together. This generic ColumnVector can now be used for all the integration module wherever we want to have a VectorizedReader to speed up the processing.

There are some core module classes changed to ensure that we are using Java data types instead of Spark datatypes, Decimal being one of them.

This PR tries to limit the changes to Core module .

Newly Added Classes
1.CarbonColumnVectorImpl:This Class Implements the Interface CarbonColumnVector and provides the methods to store the data in a Vector and to retrieved the data from it as well

2.CarbonVectorBatch: This Class Creates A VectorizedRowBatch which is a set of rows, organized with each column as a CarbonVector. It is the unit of query execution, organized to minimize the cost per row and achieve high cycles-per-instruction. The major fields are public by design to allow fast and convenient access by the vectorized query execution code.

3.StructField:This class is used to pass the Schema Information to the Carbon Columnar Batch

No interfaces changed.

No backward compatibility impacted.

No Document update required.

[ Yes] Testing done
- All Unit test cases are passing, no new unit test cases were needed as this PR implements a Generic Vectorized Reader.
- Manual Testing completed for the same .

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/bhavya411/incubator-carbondata GenericPrestoVectorizedReader

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/carbondata/pull/1581.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1581

----
commit 943dd6d01063409788d988bbd766c8597dc7de21
Author: Bhavya <[hidden email]>
Date: 2017-11-14T10:05:44Z

Added Generic vectorized Reader

----

---

[GitHub] carbondata issue #1581: [CARBONDATA-1779] GenericVectorizedReader

Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/1581

Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/1530/

---

[GitHub] carbondata issue #1581: [CARBONDATA-1779] GenericVectorizedReader

In reply to this post by qiuchenjian-2

Github user ravipesala commented on the issue:

https://github.com/apache/carbondata/pull/1581

SDV Build Fail , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/1929/

---

[GitHub] carbondata issue #1581: [CARBONDATA-1779] GenericVectorizedReader

In reply to this post by qiuchenjian-2

Github user bhavya411 commented on the issue:

https://github.com/apache/carbondata/pull/1581

retest this please.

---

[GitHub] carbondata issue #1581: [CARBONDATA-1779] GenericVectorizedReader

In reply to this post by qiuchenjian-2

Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/1581

Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/1570/

---

[GitHub] carbondata issue #1581: [CARBONDATA-1779] GenericVectorizedReader

In reply to this post by qiuchenjian-2

Github user chenliang613 commented on the issue:

https://github.com/apache/carbondata/pull/1581

retest this please

---

[GitHub] carbondata issue #1581: [CARBONDATA-1779] GenericVectorizedReader

In reply to this post by qiuchenjian-2

Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/1581

Build Success with Spark 2.2.0, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/426/

---

[GitHub] carbondata pull request #1581: [CARBONDATA-1779] GenericVectorizedReader

In reply to this post by qiuchenjian-2

Github user chenliang613 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/1581#discussion_r154812647

--- Diff: integration/presto/pom.xml ---
@@ -31,7 +31,7 @@
<packaging>presto-plugin</packaging>

<properties>
- <presto.version>0.186</presto.version>
+ <presto.version>0.187</presto.version>
--- End diff --

two question :1. as per mailing list discussed, the version of presto be suggested with 0.186, here changes to 0.187 , i think we should have a stable and fixed version number,it might not be good if carbon1.3 optionally upgrade version number for presto integration module. 2. if users still use the old presto version, whether it works , or not ?

---

[GitHub] carbondata issue #1581: [CARBONDATA-1779] GenericVectorizedReader

In reply to this post by qiuchenjian-2

Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/1581

Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/1697/

---

[GitHub] carbondata pull request #1581: [CARBONDATA-1779] GenericVectorizedReader

In reply to this post by qiuchenjian-2

Github user ravipesala commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/1581#discussion_r154836902

--- Diff: core/src/main/java/org/apache/carbondata/core/datastore/chunk/store/impl/safe/SafeVariableLengthDimensionDataChunkStore.java ---
@@ -141,24 +135,25 @@ public SafeVariableLengthDimensionDataChunkStore(boolean isInvertedIndex, int nu
// for last record
length = (short) (this.data.length - currentDataOffset);
}
- DataType dt = vector.getType();
- if ((!(dt instanceof StringType) && length == 0) || ByteUtil.UnsafeComparer.INSTANCE
+ org.apache.carbondata.core.metadata.datatype.DataType dt = vector.getType();
--- End diff --

Import this class

---

[GitHub] carbondata pull request #1581: [CARBONDATA-1779] GenericVectorizedReader

In reply to this post by qiuchenjian-2

Github user ravipesala commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/1581#discussion_r154838240

--- Diff: integration/presto/src/main/java/org/apache/carbondata/presto/ColumnarVectorWrapper.java ---
@@ -17,23 +17,24 @@

package org.apache.carbondata.presto;

+import java.math.BigDecimal;
--- End diff --

Better give some other name to this class, same name classes confuses

---

[GitHub] carbondata pull request #1581: [CARBONDATA-1779] GenericVectorizedReader

In reply to this post by qiuchenjian-2

Github user ravipesala commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/1581#discussion_r154838347

--- Diff: integration/presto/src/main/java/org/apache/carbondata/presto/StructField.java ---
@@ -0,0 +1,51 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.presto;
+
+import org.apache.carbondata.core.metadata.datatype.DataType;
+
+/**
+ * This class is to pass the Schema Information to the Carbon Columnar Batch
+ */
+public class StructField {
--- End diff --

Carbon core already has this class right? why new class

---

[GitHub] carbondata pull request #1581: [CARBONDATA-1779] GenericVectorizedReader

In reply to this post by qiuchenjian-2

Github user ravipesala commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/1581#discussion_r154838503

--- Diff: integration/spark2/src/main/java/org/apache/carbondata/spark/vectorreader/ColumnarVectorWrapper.java ---
@@ -200,7 +205,7 @@ public ColumnarVectorWrapper(ColumnVector columnVector, boolean[] filteredRows)
}

@Override public DataType getType() {
- return columnVector.dataType();
+ return CarbonScalaUtil.convertSparkToCarbonDataType(columnVector.dataType());
--- End diff --

store the dataType as class variable, don't convert each time.

---

[GitHub] carbondata pull request #1581: [CARBONDATA-1779] GenericVectorizedReader

In reply to this post by qiuchenjian-2

Github user ravipesala commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/1581#discussion_r154838692

--- Diff: integration/presto/src/main/java/org/apache/carbondata/presto/CarbonColumnVectorImpl.java ---
@@ -0,0 +1,249 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied. See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.carbondata.presto;
+
+import java.math.BigDecimal;
+import java.util.Arrays;
+import java.util.BitSet;
+
+import org.apache.carbondata.core.metadata.datatype.DataType;
+import org.apache.carbondata.core.metadata.datatype.DataTypes;
+import org.apache.carbondata.core.metadata.datatype.DecimalType;
+import org.apache.carbondata.core.scan.result.vector.CarbonColumnVector;
+
+
+
+public class CarbonColumnVectorImpl implements CarbonColumnVector {
--- End diff --

I think you can keep this class in core package, hive also can use it in future

---

[GitHub] carbondata issue #1581: [CARBONDATA-1779] GenericVectorizedReader

In reply to this post by qiuchenjian-2

Github user ravipesala commented on the issue:

https://github.com/apache/carbondata/pull/1581

@bhavya411 Overall changes look fine for me apart from few minor comments.

---

[GitHub] carbondata pull request #1581: [CARBONDATA-1779] GenericVectorizedReader

In reply to this post by qiuchenjian-2

Github user bhavya411 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/1581#discussion_r154860147

--- Diff: integration/presto/pom.xml ---
@@ -31,7 +31,7 @@
<packaging>presto-plugin</packaging>

<properties>
- <presto.version>0.186</presto.version>
+ <presto.version>0.187</presto.version>
--- End diff --

Yes I think we need to specify 0.187 version upgrade in another mailing list, with 0.187 in carbon it works with 0.186 but not with 0.166 as the interfaces for Connector is changed in 0.186

---

[GitHub] carbondata pull request #1581: [CARBONDATA-1779] GenericVectorizedReader

In reply to this post by qiuchenjian-2

Github user chenliang613 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/1581#discussion_r155176489

--- Diff: core/src/main/java/org/apache/carbondata/core/scan/collector/impl/RestructureBasedVectorResultCollector.java ---
@@ -238,7 +238,7 @@ private void fillDataForNonExistingMeasures() {
(long) defaultValue);
} else if (DataTypes.isDecimal(dataType)) {
vector.putDecimals(columnVectorInfo.vectorOffset, columnVectorInfo.size,
- (Decimal) defaultValue, measure.getPrecision());
+ ((Decimal) defaultValue).toJavaBigDecimal(), measure.getPrecision());
--- End diff --

please remove these import :
import org.apache.spark.sql.types.Decimal;
import org.apache.spark.unsafe.types.UTF8String;

---

[GitHub] carbondata pull request #1581: [CARBONDATA-1779] GenericVectorizedReader

In reply to this post by qiuchenjian-2

Github user chenliang613 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/1581#discussion_r155178064

--- Diff: integration/presto/pom.xml ---
@@ -31,7 +31,7 @@
<packaging>presto-plugin</packaging>

<properties>
- <presto.version>0.186</presto.version>
+ <presto.version>0.187</presto.version>
--- End diff --

for 1.3.0, we have to finalize the presto version.

---

[GitHub] carbondata pull request #1581: [CARBONDATA-1779] GenericVectorizedReader

In reply to this post by qiuchenjian-2

Github user chenliang613 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/1581#discussion_r155179202

--- Diff: integration/presto/pom.xml ---
@@ -431,14 +435,27 @@
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
- <artifactId>spark-sql_2.11</artifactId>
+ <artifactId>spark-network-common_2.11</artifactId>
--- End diff --

Even though we remove all spark dependency in presto integration module, why here still need spark dependency in pom.xml of presto?

---

[GitHub] carbondata pull request #1581: [CARBONDATA-1779] GenericVectorizedReader

In reply to this post by qiuchenjian-2

Github user bhavya411 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/1581#discussion_r155185364

--- Diff: integration/presto/pom.xml ---
@@ -431,14 +435,27 @@
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
- <artifactId>spark-sql_2.11</artifactId>
+ <artifactId>spark-network-common_2.11</artifactId>
--- End diff --

For Running the Integration Tests as when we create a CarbonStore we need the Spark Dependency, so what we have done is that we have kept the scope of those dependency as Test , no spark Jar will be part of the final Jar that we copy to presto plugin

---

12