GitHub user ravipesala opened a pull request:
https://github.com/apache/carbondata/pull/2823 [CARBONDATA-3015] Support Lazy load in carbon vector

This PR depends on PR https://github.com/apache/carbondata/pull/2822

Even though we prune pages using min/max metadata, there is a high chance of false positives when filtering on high-cardinality columns. To avoid the wasted work, this PR adds a lazy loading design: data is not read, decompressed, and filled into the vector immediately when the fill call comes from Spark/Presto. Instead, only the required filter columns are read first and handed back to the execution engine. The engine filters on those column vectors, and only if some rows survive the filter does it read the projection columns and fill their vectors on demand. Presto already works this way, and the same approach is integrated here for Spark 2.3; older Spark versions cannot take advantage of it because their ColumnVector interfaces are not extendable.

For this purpose, new classes `LazyBlockletLoad` and `LazyPageLoad` were added and the carbon vector interfaces were changed.

Be sure to do all of the following checklist to help us incorporate your contribution quickly and easily:
- [ ] Any interfaces changed?
- [ ] Any backward compatibility impacted?
- [ ] Document update required?
- [ ] Testing done. Please provide details on:
  - Whether new unit test cases have been added, or why no new tests are required.
  - How it was tested. Please attach the test report.
  - Is it a performance-related change? Please attach the performance test report.
  - Any additional information to help reviewers in testing this change.
- [ ] For large changes, please consider breaking them into sub-tasks under an umbrella JIRA.
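The lazy-load flow described above (fill filter columns first, defer projection columns until a row survives the filter) can be sketched roughly as follows. This is an illustrative sketch only, not the actual `LazyBlockletLoad`/`LazyPageLoad` API; all class and method names here are invented for the example:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

// Illustrative sketch of lazy vector loading: the expensive read/decompress
// step runs only on first access, so a projection column that no surviving
// row touches is never decoded at all.
class LazyColumn {
    private final Supplier<int[]> loader; // stands in for read + decompress + fill
    private int[] page;                   // decoded lazily on first access

    LazyColumn(Supplier<int[]> loader) {
        this.loader = loader;
    }

    int get(int row) {
        if (page == null) {
            page = loader.get();          // load on demand
        }
        return page[row];
    }
}

public class LazyLoadSketch {

    static String runQuery() {
        final int[] projectionLoads = {0};
        LazyColumn filterCol = new LazyColumn(() -> new int[] {5, 42, 7});
        LazyColumn projectionCol = new LazyColumn(() -> {
            projectionLoads[0]++;         // count how often the projection page is decoded
            return new int[] {100, 200, 300};
        });

        // Filter first; the projection column is decoded only once a row survives.
        List<Integer> result = new ArrayList<>();
        for (int row = 0; row < 3; row++) {
            if (filterCol.get(row) > 40) {
                result.add(projectionCol.get(row));
            }
        }
        return result + " projectionLoads=" + projectionLoads[0];
    }

    public static void main(String[] args) {
        System.out.println(runQuery()); // [200] projectionLoads=1
    }
}
```

If the filter matched no rows, `projectionLoads` would stay at 0, which is exactly the saving the PR description claims for false-positive-prone high-cardinality filters.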
You can merge this pull request into a Git repository by running: $ git pull https://github.com/ravipesala/incubator-carbondata perf-lazy-load Alternatively you can review and apply these changes as the patch at: https://github.com/apache/carbondata/pull/2823.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2823 ---- commit 5443ce85a91c4ce28454d73699d62ef47255a845 Author: ravipesala <ravi.pesala@...> Date: 2018-10-16T05:02:18Z Add carbon property to configure vector based row pruning push down commit d024ccac82979a908a643b7198ba1f80d9c08e91 Author: ravipesala <ravi.pesala@...> Date: 2018-10-16T06:00:43Z Added support for full scan queries for vector direct fill. commit 77203f344dbe9dca8ce2697a0dbc364bb8a01cdc Author: ravipesala <ravi.pesala@...> Date: 2018-10-16T09:23:14Z Added support for pruning pages for vector direct fill. commit 46578850c5c02c4927da32efe7a1df27cfe2ebd1 Author: ravipesala <ravi.pesala@...> Date: 2018-10-16T11:07:18Z Added support for inverted index and delete delta for direct scan queries commit b08c254280198fbbcb41361c7b32a9eb51cf8473 Author: ravipesala <ravi.pesala@...> Date: 2018-10-16T13:09:16Z Support Lazy load in carbon vector ---- --- |
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2823 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/814/ --- |
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2823 Build Failed with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1011/ --- |
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2823 Build Failed with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9079/ --- |
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2823 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/826/ --- |
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2823 Build Failed with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9091/ --- |
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2823 Build Failed with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1023/ --- |
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2823 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/828/ --- |
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2823 Build Failed with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9093/ --- |
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2823 Build Failed with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1025/ --- |
Github user xuchuanyin commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2823#discussion_r226547892 --- Diff: core/src/main/java/org/apache/carbondata/core/datastore/chunk/store/impl/safe/AbstractNonDictionaryVectorFiller.java --- @@ -0,0 +1,274 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.carbondata.core.datastore.chunk.store.impl.safe; + +import java.nio.ByteBuffer; + +import org.apache.carbondata.core.constants.CarbonCommonConstants; +import org.apache.carbondata.core.metadata.datatype.DataType; +import org.apache.carbondata.core.metadata.datatype.DataTypes; +import org.apache.carbondata.core.scan.result.vector.CarbonColumnVector; +import org.apache.carbondata.core.util.ByteUtil; +import org.apache.carbondata.core.util.DataTypeUtil; + +public abstract class AbstractNonDictionaryVectorFiller { + + protected int lengthSize; + protected int numberOfRows; + + public AbstractNonDictionaryVectorFiller(int lengthSize, int numberOfRows) { + this.lengthSize = lengthSize; + this.numberOfRows = numberOfRows; + } + + public abstract void fillVector(byte[] data, CarbonColumnVector vector, ByteBuffer buffer); + + public int getLengthFromBuffer(ByteBuffer buffer) { + return buffer.getShort(); + } +} + +class NonDictionaryVectorFillerFactory { + + public static AbstractNonDictionaryVectorFiller getVectorFiller(DataType type, int lengthSize, + int numberOfRows) { + if (type == DataTypes.STRING || type == DataTypes.VARCHAR) { + if (lengthSize == 2) { --- End diff -- Instead of branching on the `lengthSize`, I think you can dispatch the branch using the `type`. ---
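The reviewer's suggestion, dispatching the factory on the column's data type rather than on the length-prefix size, could be sketched as below. The filler names echo the quoted diff, but the enum and the exact mapping (in particular giving VARCHAR and LONG their own branches) are only illustrative, not the project's actual API:

```java
// Illustrative sketch of type-based dispatch for the vector-filler factory,
// as suggested in the review: branch on the column's data type instead of
// inferring long-string handling from a 4-byte length prefix.
enum ColumnType { STRING, VARCHAR, TIMESTAMP, BOOLEAN, SHORT, INT, LONG }

public class FillerDispatchSketch {

    static String fillerFor(ColumnType type) {
        switch (type) {
            case STRING:    return "StringVectorFiller";
            case VARCHAR:   return "LongStringVectorFiller"; // VARCHAR stores 4-byte lengths
            case TIMESTAMP: return "TimeStampVectorFiller";
            case BOOLEAN:   return "BooleanVectorFiller";
            case SHORT:     return "ShortVectorFiller";
            case INT:       return "IntVectorFiller";
            case LONG:      return "LongVectorFiller";
            default:        // fail loudly instead of silently falling back to a default filler
                throw new UnsupportedOperationException("unsupported type: " + type);
        }
    }

    public static void main(String[] args) {
        System.out.println(fillerFor(ColumnType.VARCHAR)); // LongStringVectorFiller
    }
}
```

Throwing in the default branch also addresses the later review question about whether silently falling back to `StringVectorFiller` is intended.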
Github user xuchuanyin commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2823#discussion_r226547597 --- Diff: core/src/main/java/org/apache/carbondata/core/datastore/chunk/store/impl/safe/SafeVariableLengthDimensionDataChunkStore.java --- @@ -91,6 +95,31 @@ public void putArray(final int[] invertedIndex, final int[] invertedIndexReverse } } + @Override + public void fillVector(int[] invertedIndex, int[] invertedIndexReverse, byte[] data, + ColumnVectorInfo vectorInfo) { + this.invertedIndexReverse = invertedIndex; + + // as first position will be start from 2 byte as data is stored first in the memory block --- End diff -- please fix the comment here since it is not always 2 bytes. --- |
Github user xuchuanyin commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2823#discussion_r226549110 --- Diff: core/src/main/java/org/apache/carbondata/core/datastore/chunk/store/impl/safe/AbstractNonDictionaryVectorFiller.java --- @@ -0,0 +1,274 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.carbondata.core.datastore.chunk.store.impl.safe; + +import java.nio.ByteBuffer; + +import org.apache.carbondata.core.constants.CarbonCommonConstants; +import org.apache.carbondata.core.metadata.datatype.DataType; +import org.apache.carbondata.core.metadata.datatype.DataTypes; +import org.apache.carbondata.core.scan.result.vector.CarbonColumnVector; +import org.apache.carbondata.core.util.ByteUtil; +import org.apache.carbondata.core.util.DataTypeUtil; + +public abstract class AbstractNonDictionaryVectorFiller { + + protected int lengthSize; + protected int numberOfRows; + + public AbstractNonDictionaryVectorFiller(int lengthSize, int numberOfRows) { + this.lengthSize = lengthSize; + this.numberOfRows = numberOfRows; + } + + public abstract void fillVector(byte[] data, CarbonColumnVector vector, ByteBuffer buffer); + + public int getLengthFromBuffer(ByteBuffer buffer) { + return buffer.getShort(); + } +} + +class NonDictionaryVectorFillerFactory { + + public static AbstractNonDictionaryVectorFiller getVectorFiller(DataType type, int lengthSize, + int numberOfRows) { + if (type == DataTypes.STRING || type == DataTypes.VARCHAR) { + if (lengthSize == 2) { + return new StringVectorFiller(lengthSize, numberOfRows); + } else { + return new LongStringVectorFiller(lengthSize, numberOfRows); + } + } else if (type == DataTypes.TIMESTAMP) { + return new TimeStampVectorFiller(lengthSize, numberOfRows); + } else if (type == DataTypes.BOOLEAN) { + return new BooleanVectorFiller(lengthSize, numberOfRows); + } else if (type == DataTypes.SHORT) { + return new ShortVectorFiller(lengthSize, numberOfRows); + } else if (type == DataTypes.INT) { + return new IntVectorFiller(lengthSize, numberOfRows); + } else if (type == DataTypes.LONG) { + return new LongStringVectorFiller(lengthSize, numberOfRows); + } + return new StringVectorFiller(lengthSize, numberOfRows); + } + +} + +class StringVectorFiller extends AbstractNonDictionaryVectorFiller { + + public 
StringVectorFiller(int lengthSize, int numberOfRows) { + super(lengthSize, numberOfRows); + } + + @Override + public void fillVector(byte[] data, CarbonColumnVector vector, ByteBuffer buffer) { + // start position will be used to store the current data position + int startOffset = 0; + int currentOffset = lengthSize; + ByteUtil.UnsafeComparer comparer = ByteUtil.UnsafeComparer.INSTANCE; + for (int i = 0; i < numberOfRows - 1; i++) { --- End diff -- Can you add a description of how the vector is filled? Taking this line as an example, it seems the last row is reserved for something else, so a description is needed. ---
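For context on what the loop in the quoted diff walks: the page stores each value as a length prefix followed by its bytes. A minimal sketch of decoding that layout is below; it is illustrative only (the real filler also handles nulls and fills the final row outside the loop, which is why the quoted loop runs to `numberOfRows - 1`), and the class name is invented:

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Sketch: decode a page laid out as [2-byte length][bytes][2-byte length][bytes]...
public class LengthPrefixedDecodeSketch {

    static List<String> decode(byte[] data, int numberOfRows) {
        ByteBuffer buffer = ByteBuffer.wrap(data);
        List<String> rows = new ArrayList<>();
        for (int i = 0; i < numberOfRows; i++) {
            int length = buffer.getShort();      // 2-byte length prefix per value
            byte[] value = new byte[length];
            buffer.get(value);                   // the value bytes follow the prefix
            rows.add(new String(value, StandardCharsets.UTF_8));
        }
        return rows;
    }

    public static void main(String[] args) {
        ByteBuffer page = ByteBuffer.allocate(9);
        page.putShort((short) 2).put("ab".getBytes(StandardCharsets.UTF_8));
        page.putShort((short) 3).put("cde".getBytes(StandardCharsets.UTF_8));
        System.out.println(decode(page.array(), 2)); // [ab, cde]
    }
}
```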
Github user xuchuanyin commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2823#discussion_r226550593 --- Diff: core/src/main/java/org/apache/carbondata/core/datastore/page/encoding/adaptive/AdaptiveDeltaFloatingCodec.java --- @@ -225,6 +238,71 @@ public double decodeDouble(long value) { return (max - value) / factor; } + @Override + public void decodeAndFillVector(ColumnPage columnPage, ColumnVectorInfo vectorInfo) { + CarbonColumnVector vector = vectorInfo.vector; + BitSet nullBits = columnPage.getNullBits(); + DataType type = columnPage.getDataType(); + int pageSize = columnPage.getPageSize(); + BitSet deletedRows = vectorInfo.deletedRows; + DataType dataType = vector.getType(); --- End diff -- What's the relationship between `columnPage.getDataType()` and `vector.getType()`? --- |
Github user xuchuanyin commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2823#discussion_r226548594 --- Diff: core/src/main/java/org/apache/carbondata/core/datastore/chunk/store/impl/safe/AbstractNonDictionaryVectorFiller.java --- @@ -0,0 +1,274 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.carbondata.core.datastore.chunk.store.impl.safe; + +import java.nio.ByteBuffer; + +import org.apache.carbondata.core.constants.CarbonCommonConstants; +import org.apache.carbondata.core.metadata.datatype.DataType; +import org.apache.carbondata.core.metadata.datatype.DataTypes; +import org.apache.carbondata.core.scan.result.vector.CarbonColumnVector; +import org.apache.carbondata.core.util.ByteUtil; +import org.apache.carbondata.core.util.DataTypeUtil; + +public abstract class AbstractNonDictionaryVectorFiller { + + protected int lengthSize; + protected int numberOfRows; + + public AbstractNonDictionaryVectorFiller(int lengthSize, int numberOfRows) { + this.lengthSize = lengthSize; + this.numberOfRows = numberOfRows; + } + + public abstract void fillVector(byte[] data, CarbonColumnVector vector, ByteBuffer buffer); + + public int getLengthFromBuffer(ByteBuffer buffer) { + return buffer.getShort(); + } +} + +class NonDictionaryVectorFillerFactory { + + public static AbstractNonDictionaryVectorFiller getVectorFiller(DataType type, int lengthSize, + int numberOfRows) { + if (type == DataTypes.STRING || type == DataTypes.VARCHAR) { + if (lengthSize == 2) { + return new StringVectorFiller(lengthSize, numberOfRows); + } else { + return new LongStringVectorFiller(lengthSize, numberOfRows); + } + } else if (type == DataTypes.TIMESTAMP) { + return new TimeStampVectorFiller(lengthSize, numberOfRows); + } else if (type == DataTypes.BOOLEAN) { + return new BooleanVectorFiller(lengthSize, numberOfRows); + } else if (type == DataTypes.SHORT) { + return new ShortVectorFiller(lengthSize, numberOfRows); + } else if (type == DataTypes.INT) { + return new IntVectorFiller(lengthSize, numberOfRows); + } else if (type == DataTypes.LONG) { + return new LongStringVectorFiller(lengthSize, numberOfRows); + } + return new StringVectorFiller(lengthSize, numberOfRows); + } + +} + +class StringVectorFiller extends AbstractNonDictionaryVectorFiller { + + public 
StringVectorFiller(int lengthSize, int numberOfRows) { + super(lengthSize, numberOfRows); + } + + @Override + public void fillVector(byte[] data, CarbonColumnVector vector, ByteBuffer buffer) { + // start position will be used to store the current data position + int startOffset = 0; + int currentOffset = lengthSize; + ByteUtil.UnsafeComparer comparer = ByteUtil.UnsafeComparer.INSTANCE; --- End diff -- `Comparer`? That is French, not English. Do you mean `Comparator`? If so, please fix the class name. ---
Github user xuchuanyin commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2823#discussion_r226549503 --- Diff: core/src/main/java/org/apache/carbondata/core/datastore/chunk/store/impl/safe/AbstractNonDictionaryVectorFiller.java --- @@ -0,0 +1,274 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.carbondata.core.datastore.chunk.store.impl.safe; + +import java.nio.ByteBuffer; + +import org.apache.carbondata.core.constants.CarbonCommonConstants; +import org.apache.carbondata.core.metadata.datatype.DataType; +import org.apache.carbondata.core.metadata.datatype.DataTypes; +import org.apache.carbondata.core.scan.result.vector.CarbonColumnVector; +import org.apache.carbondata.core.util.ByteUtil; +import org.apache.carbondata.core.util.DataTypeUtil; + +public abstract class AbstractNonDictionaryVectorFiller { + + protected int lengthSize; + protected int numberOfRows; + + public AbstractNonDictionaryVectorFiller(int lengthSize, int numberOfRows) { + this.lengthSize = lengthSize; + this.numberOfRows = numberOfRows; + } + + public abstract void fillVector(byte[] data, CarbonColumnVector vector, ByteBuffer buffer); + + public int getLengthFromBuffer(ByteBuffer buffer) { + return buffer.getShort(); + } +} + +class NonDictionaryVectorFillerFactory { + + public static AbstractNonDictionaryVectorFiller getVectorFiller(DataType type, int lengthSize, + int numberOfRows) { + if (type == DataTypes.STRING || type == DataTypes.VARCHAR) { + if (lengthSize == 2) { + return new StringVectorFiller(lengthSize, numberOfRows); + } else { + return new LongStringVectorFiller(lengthSize, numberOfRows); + } + } else if (type == DataTypes.TIMESTAMP) { + return new TimeStampVectorFiller(lengthSize, numberOfRows); + } else if (type == DataTypes.BOOLEAN) { + return new BooleanVectorFiller(lengthSize, numberOfRows); + } else if (type == DataTypes.SHORT) { + return new ShortVectorFiller(lengthSize, numberOfRows); + } else if (type == DataTypes.INT) { + return new IntVectorFiller(lengthSize, numberOfRows); + } else if (type == DataTypes.LONG) { + return new LongStringVectorFiller(lengthSize, numberOfRows); + } + return new StringVectorFiller(lengthSize, numberOfRows); + } + +} + +class StringVectorFiller extends AbstractNonDictionaryVectorFiller { + + public 
StringVectorFiller(int lengthSize, int numberOfRows) { + super(lengthSize, numberOfRows); + } + + @Override + public void fillVector(byte[] data, CarbonColumnVector vector, ByteBuffer buffer) { + // start position will be used to store the current data position + int startOffset = 0; --- End diff -- If this variable is used as the comment above describes, why not rename it to `currentOffset`? ---
Github user xuchuanyin commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2823#discussion_r226548215 --- Diff: core/src/main/java/org/apache/carbondata/core/datastore/chunk/store/impl/safe/AbstractNonDictionaryVectorFiller.java --- @@ -0,0 +1,274 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.carbondata.core.datastore.chunk.store.impl.safe; + +import java.nio.ByteBuffer; + +import org.apache.carbondata.core.constants.CarbonCommonConstants; +import org.apache.carbondata.core.metadata.datatype.DataType; +import org.apache.carbondata.core.metadata.datatype.DataTypes; +import org.apache.carbondata.core.scan.result.vector.CarbonColumnVector; +import org.apache.carbondata.core.util.ByteUtil; +import org.apache.carbondata.core.util.DataTypeUtil; + +public abstract class AbstractNonDictionaryVectorFiller { + + protected int lengthSize; + protected int numberOfRows; + + public AbstractNonDictionaryVectorFiller(int lengthSize, int numberOfRows) { + this.lengthSize = lengthSize; + this.numberOfRows = numberOfRows; + } + + public abstract void fillVector(byte[] data, CarbonColumnVector vector, ByteBuffer buffer); + + public int getLengthFromBuffer(ByteBuffer buffer) { + return buffer.getShort(); + } +} + +class NonDictionaryVectorFillerFactory { + + public static AbstractNonDictionaryVectorFiller getVectorFiller(DataType type, int lengthSize, + int numberOfRows) { + if (type == DataTypes.STRING || type == DataTypes.VARCHAR) { + if (lengthSize == 2) { + return new StringVectorFiller(lengthSize, numberOfRows); + } else { + return new LongStringVectorFiller(lengthSize, numberOfRows); + } + } else if (type == DataTypes.TIMESTAMP) { + return new TimeStampVectorFiller(lengthSize, numberOfRows); + } else if (type == DataTypes.BOOLEAN) { + return new BooleanVectorFiller(lengthSize, numberOfRows); + } else if (type == DataTypes.SHORT) { + return new ShortVectorFiller(lengthSize, numberOfRows); + } else if (type == DataTypes.INT) { + return new IntVectorFiller(lengthSize, numberOfRows); + } else if (type == DataTypes.LONG) { + return new LongStringVectorFiller(lengthSize, numberOfRows); + } + return new StringVectorFiller(lengthSize, numberOfRows); --- End diff -- Will it by default use `StringVectorFiller` or throw an exception? Is this intended? 
--- |
Github user xuchuanyin commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2823#discussion_r226550873 --- Diff: core/src/main/java/org/apache/carbondata/core/datastore/page/encoding/adaptive/AdaptiveDeltaIntegralCodec.java --- @@ -272,5 +295,169 @@ public double decodeDouble(double value) { // this codec is for integer type only throw new RuntimeException("internal error"); } + + @Override + public void decodeAndFillVector(ColumnPage columnPage, ColumnVectorInfo vectorInfo) { + CarbonColumnVector vector = vectorInfo.vector; + BitSet nullBits = columnPage.getNullBits(); + DataType dataType = vector.getType(); + DataType type = columnPage.getDataType(); + int pageSize = columnPage.getPageSize(); + BitSet deletedRows = vectorInfo.deletedRows; + vector = ColumnarVectorWrapperDirectFactory + .getDirectVectorWrapperFactory(vector, vectorInfo.invertedIndex, nullBits, deletedRows); + fillVector(columnPage, vector, dataType, type, pageSize, vectorInfo); + if (deletedRows == null || deletedRows.isEmpty()) { + for (int i = nullBits.nextSetBit(0); i >= 0; i = nullBits.nextSetBit(i + 1)) { + vector.putNull(i); + } + } + if (vector instanceof ConvertableVector) { + ((ConvertableVector) vector).convert(); + } + } + + private void fillVector(ColumnPage columnPage, CarbonColumnVector vector, DataType dataType, + DataType type, int pageSize, ColumnVectorInfo vectorInfo) { + if (type == DataTypes.BOOLEAN || type == DataTypes.BYTE) { + byte[] byteData = columnPage.getByteData(); + if (dataType == DataTypes.SHORT) { + for (int i = 0; i < pageSize; i++) { + vector.putShort(i, (short) (max - byteData[i])); + } + } else if (dataType == DataTypes.INT) { + for (int i = 0; i < pageSize; i++) { + vector.putInt(i, (int) (max - byteData[i])); + } + } else if (dataType == DataTypes.LONG) { + for (int i = 0; i < pageSize; i++) { + vector.putLong(i, (max - byteData[i])); + } + } else if (dataType == DataTypes.TIMESTAMP) { + for (int i = 0; i < pageSize; i++) { + vector.putLong(i, (max - byteData[i]) * 1000); + } + } else if 
(dataType == DataTypes.BOOLEAN) { + for (int i = 0; i < pageSize; i++) { + vector.putByte(i, (byte) (max - byteData[i])); + } + } else if (DataTypes.isDecimal(dataType)) { + DecimalConverterFactory.DecimalConverter decimalConverter = vectorInfo.decimalConverter; + int precision = vectorInfo.measure.getMeasure().getPrecision(); + for (int i = 0; i < pageSize; i++) { + BigDecimal decimal = decimalConverter.getDecimal(max - byteData[i]); + vector.putDecimal(i, decimal, precision); + } + } else { + for (int i = 0; i < pageSize; i++) { + vector.putDouble(i, (max - byteData[i])); + } + } + } else if (type == DataTypes.SHORT) { + short[] shortData = columnPage.getShortData(); + if (dataType == DataTypes.SHORT) { + for (int i = 0; i < pageSize; i++) { + vector.putShort(i, (short) (max - shortData[i])); + } + } else if (dataType == DataTypes.INT) { + for (int i = 0; i < pageSize; i++) { + vector.putInt(i, (int) (max - shortData[i])); + } + } else if (dataType == DataTypes.LONG) { + for (int i = 0; i < pageSize; i++) { + vector.putLong(i, (max - shortData[i])); + } + } else if (dataType == DataTypes.TIMESTAMP) { + for (int i = 0; i < pageSize; i++) { + vector.putLong(i, (max - shortData[i]) * 1000); + } + } else if (DataTypes.isDecimal(dataType)) { + DecimalConverterFactory.DecimalConverter decimalConverter = vectorInfo.decimalConverter; + int precision = vectorInfo.measure.getMeasure().getPrecision(); + for (int i = 0; i < pageSize; i++) { + BigDecimal decimal = decimalConverter.getDecimal(max - shortData[i]); + vector.putDecimal(i, decimal, precision); + } + } else { + for (int i = 0; i < pageSize; i++) { + vector.putDouble(i, (max - shortData[i])); + } + } + + } else if (type == DataTypes.SHORT_INT) { + int[] shortIntData = columnPage.getShortIntData(); + if (dataType == DataTypes.INT) { + for (int i = 0; i < pageSize; i++) { + vector.putInt(i, (int) (max - shortIntData[i])); + } + } else if (dataType == DataTypes.LONG) { + for (int i = 0; i < pageSize; i++) { + 
vector.putLong(i, (max - shortIntData[i])); + } + } else if (dataType == DataTypes.TIMESTAMP) { + for (int i = 0; i < pageSize; i++) { + vector.putLong(i, (max - shortIntData[i]) * 1000); + } + } else if (DataTypes.isDecimal(dataType)) { + DecimalConverterFactory.DecimalConverter decimalConverter = vectorInfo.decimalConverter; + int precision = vectorInfo.measure.getMeasure().getPrecision(); + for (int i = 0; i < pageSize; i++) { + BigDecimal decimal = decimalConverter.getDecimal(max - shortIntData[i]); + vector.putDecimal(i, decimal, precision); + } + } else { + for (int i = 0; i < pageSize; i++) { + vector.putDouble(i, (max - shortIntData[i])); + } + } + } else if (type == DataTypes.INT) { + int[] intData = columnPage.getIntData(); + if (dataType == DataTypes.INT) { + for (int i = 0; i < pageSize; i++) { + vector.putInt(i, (int) (max - intData[i])); + } + } else if (dataType == DataTypes.LONG) { + for (int i = 0; i < pageSize; i++) { + vector.putLong(i, (max - intData[i])); + } + } else if (dataType == DataTypes.TIMESTAMP) { + for (int i = 0; i < pageSize; i++) { + vector.putLong(i, (max - intData[i]) * 1000); + } + } else if (DataTypes.isDecimal(dataType)) { + DecimalConverterFactory.DecimalConverter decimalConverter = vectorInfo.decimalConverter; + int precision = vectorInfo.measure.getMeasure().getPrecision(); + for (int i = 0; i < pageSize; i++) { + BigDecimal decimal = decimalConverter.getDecimal(max - intData[i]); + vector.putDecimal(i, decimal, precision); + } + } else { + for (int i = 0; i < pageSize; i++) { + vector.putDouble(i, (max - intData[i])); + } + } + } else if (type == DataTypes.LONG) { + long[] longData = columnPage.getLongData(); + if (dataType == DataTypes.LONG) { + for (int i = 0; i < pageSize; i++) { + vector.putLong(i, (max - longData[i])); + } + } else if (dataType == DataTypes.TIMESTAMP) { + for (int i = 0; i < pageSize; i++) { + vector.putLong(i, (max - longData[i]) * 1000); + } + } else if (DataTypes.isDecimal(dataType)) { + 
DecimalConverterFactory.DecimalConverter decimalConverter = vectorInfo.decimalConverter; + int precision = vectorInfo.measure.getMeasure().getPrecision(); + for (int i = 0; i < pageSize; i++) { + BigDecimal decimal = decimalConverter.getDecimal(max - longData[i]); + vector.putDecimal(i, decimal, precision); + } + } + } else { + throw new RuntimeException("internal error: " + this.toString()); --- End diff -- Please improve this exception message so the failure is easier to understand. ---
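For readers following the quoted codec: adaptive delta encoding stores `max - value` in a narrower type, and every branch above inverts that transform for a different source/target type pair. The core decode step can be sketched as below; the values and class name are illustrative, not the actual codec API:

```java
// Sketch of delta-from-max decoding: the page stores (max - value) in a
// narrow type (byte here); decoding subtracts the stored delta from max.
public class DeltaDecodeSketch {

    static long[] decode(byte[] encoded, long max) {
        long[] out = new long[encoded.length];
        for (int i = 0; i < encoded.length; i++) {
            out[i] = max - encoded[i];   // invert the encoding step: value -> max - value
        }
        return out;
    }

    public static void main(String[] args) {
        long[] decoded = decode(new byte[] {0, 5, 100}, 1000L);
        System.out.println(decoded[0] + "," + decoded[1] + "," + decoded[2]); // 1000,995,900
    }
}
```

Storing the delta lets a page of large, close-together longs fit in bytes or shorts, which is why the quoted method has one branch per narrow storage type.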
Github user ravipesala commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2823#discussion_r226867258 --- Diff: core/src/main/java/org/apache/carbondata/core/datastore/chunk/store/impl/safe/SafeVariableLengthDimensionDataChunkStore.java --- @@ -91,6 +95,31 @@ public void putArray(final int[] invertedIndex, final int[] invertedIndexReverse } } + @Override + public void fillVector(int[] invertedIndex, int[] invertedIndexReverse, byte[] data, + ColumnVectorInfo vectorInfo) { + this.invertedIndexReverse = invertedIndex; + + // as first position will be start from 2 byte as data is stored first in the memory block --- End diff -- ok --- |
Github user ravipesala commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2823#discussion_r226867290 --- Diff: core/src/main/java/org/apache/carbondata/core/datastore/chunk/store/impl/safe/AbstractNonDictionaryVectorFiller.java --- @@ -0,0 +1,274 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.carbondata.core.datastore.chunk.store.impl.safe; + +import java.nio.ByteBuffer; + +import org.apache.carbondata.core.constants.CarbonCommonConstants; +import org.apache.carbondata.core.metadata.datatype.DataType; +import org.apache.carbondata.core.metadata.datatype.DataTypes; +import org.apache.carbondata.core.scan.result.vector.CarbonColumnVector; +import org.apache.carbondata.core.util.ByteUtil; +import org.apache.carbondata.core.util.DataTypeUtil; + +public abstract class AbstractNonDictionaryVectorFiller { + + protected int lengthSize; + protected int numberOfRows; + + public AbstractNonDictionaryVectorFiller(int lengthSize, int numberOfRows) { + this.lengthSize = lengthSize; + this.numberOfRows = numberOfRows; + } + + public abstract void fillVector(byte[] data, CarbonColumnVector vector, ByteBuffer buffer); + + public int getLengthFromBuffer(ByteBuffer buffer) { + return buffer.getShort(); + } +} + +class NonDictionaryVectorFillerFactory { + + public static AbstractNonDictionaryVectorFiller getVectorFiller(DataType type, int lengthSize, + int numberOfRows) { + if (type == DataTypes.STRING || type == DataTypes.VARCHAR) { + if (lengthSize == 2) { --- End diff -- ok --- |