Github user jackylk commented on a diff in the pull request:
https://github.com/apache/incubator-carbondata/pull/584#discussion_r102617041

--- Diff: core/src/main/java/org/apache/carbondata/core/datastore/chunk/reader/DimensionColumnChunkReader.java ---
@@ -35,7 +36,7 @@
    * @param blockIndexes blocks to be read
    * @return dimension column chunks
    */
-  DimensionColumnDataChunk[] readDimensionChunks(FileHolder fileReader, int[][] blockIndexes)
+  DimensionRawColumnChunk[] readRawDimensionChunks(FileHolder fileReader, int[][] blockIndexes)
--- End diff --

Is this `blockIndexes` or `blockletIndexes`? I think it is the blocklet index within each block, right? Can you add a description to the function header?
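For illustration, a possible header description along the lines requested — hedged: the exact wording is an assumption, and the blocklet-index semantics follow the author's confirmation later in this thread:

    /**
     * Reads raw dimension column chunks for the given index ranges.
     * Each entry of blockIndexes is an inclusive [start, end] pair of
     * blocklet-level chunk indexes to read from the current block.
     *
     * @param fileReader   file reader to read the chunks from file
     * @param blockIndexes inclusive [start, end] index ranges to be read
     * @return raw dimension column chunks
     */
    DimensionRawColumnChunk[] readRawDimensionChunks(FileHolder fileReader, int[][] blockIndexes)
        throws IOException;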
Github user jackylk commented on a diff in the pull request:
https://github.com/apache/incubator-carbondata/pull/584#discussion_r102623985

--- Diff: core/src/main/java/org/apache/carbondata/core/scan/filter/executer/RowLevelFilterExecuterImpl.java ---
@@ -343,24 +339,39 @@ private int readSurrogatesFromColumnGroupBlock(BlocksChunkHolder blockChunkHolde
     return 0;
   }

-  /**
-   * Reading the blocks for no dictionary data, in no dictionary case
-   * directly the filter data will read, no need to scan the dictionary
-   * or read the dictionary value.
-   *
-   * @param dimensionColumnDataChunk
-   * @param index
-   * @return
-   */
-  private String readMemberBasedOnNoDictionaryVal(
-      VariableLengthDimensionDataChunk dimensionColumnDataChunk, int index) {
-    return new String(dimensionColumnDataChunk.getChunkData(index),
-        Charset.forName(CarbonCommonConstants.DEFAULT_CHARSET));
-  }

   @Override public BitSet isScanRequired(byte[][] blockMaxValue, byte[][] blockMinValue) {
     BitSet bitSet = new BitSet(1);
     bitSet.set(0);
     return bitSet;
   }
+
+  @Override public void readBlocks(BlocksChunkHolder blockChunkHolder) throws IOException {
+    for (int i = 0; i < dimColEvaluatorInfoList.size(); i++) {
+      DimColumnResolvedFilterInfo dimColumnEvaluatorInfo = dimColEvaluatorInfoList.get(i);
+      if (dimColumnEvaluatorInfo.getDimension().getDataType() != DataType.ARRAY
+          && dimColumnEvaluatorInfo.getDimension().getDataType() != DataType.STRUCT) {
+        if (null == blockChunkHolder.getDimensionRawDataChunk()[blocksIndex[i]]) {
+          blockChunkHolder.getDimensionRawDataChunk()[blocksIndex[i]] =
+              blockChunkHolder.getDataBlock()
+                  .getDimensionChunk(blockChunkHolder.getFileReader(), blocksIndex[i]);
+        }
+      } else {
+        GenericQueryType complexType = complexDimensionInfoMap.get(blocksIndex[i]);
+        complexType.fillRequiredBlockData(blockChunkHolder);
+      }
+    }
+
+    // CHECKSTYLE:OFF Approval No:Approval-V1R2C10_001
--- End diff --

Remove this.
Github user jackylk commented on a diff in the pull request:
https://github.com/apache/incubator-carbondata/pull/584#discussion_r102614757

--- Diff: core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java ---
@@ -589,7 +591,7 @@
    * INMEMORY_REOCRD_SIZE
    */
   public static final String DETAIL_QUERY_BATCH_SIZE = "carbon.detail.batch.size";
--- End diff --

Can you add a comment for this parameter to describe what it controls?
Github user jackylk commented on a diff in the pull request:
https://github.com/apache/incubator-carbondata/pull/584#discussion_r102623479

--- Diff: core/src/main/java/org/apache/carbondata/core/datastore/impl/btree/BTreeNonLeafNode.java ---
@@ -24,8 +24,8 @@
 import org.apache.carbondata.core.datastore.DataRefNode;
 import org.apache.carbondata.core.datastore.FileHolder;
 import org.apache.carbondata.core.datastore.IndexKey;
-import org.apache.carbondata.core.datastore.chunk.DimensionColumnDataChunk;
-import org.apache.carbondata.core.datastore.chunk.MeasureColumnDataChunk;
+import org.apache.carbondata.core.datastore.chunk.impl.DimensionRawColumnChunk;
--- End diff --

Why put `DimensionRawColumnChunk` under the `impl` package? It is exposed like an interface, right?
Github user jackylk commented on a diff in the pull request:
https://github.com/apache/incubator-carbondata/pull/584#discussion_r102622345

--- Diff: core/src/main/java/org/apache/carbondata/core/datastore/chunk/reader/measure/v1/CompressedMeasureChunkFileBasedReaderV1.java ---
@@ -77,20 +79,38 @@ public CompressedMeasureChunkFileBasedReaderV1(final BlockletInfo blockletInfo,
    * @param blockIndex block to be read
    * @return measure data chunk
    */
-  @Override public MeasureColumnDataChunk readMeasureChunk(final FileHolder fileReader,
-      final int blockIndex) throws IOException {
+  @Override public MeasureRawColumnChunk readRawMeasureChunk(FileHolder fileReader, int blockIndex)
+      throws IOException {
+    ByteBuffer buffer =
+        ByteBuffer.allocateDirect(measureColumnChunks.get(blockIndex).getDataPageLength());
--- End diff --

Assign `measureColumnChunks.get(blockIndex)` to a local variable and use it in this function.
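A minimal sketch of the suggested refactor — hedged: the element type of `measureColumnChunks` is assumed here to be the chunk-metadata `DataChunk`:

    // Fetch the chunk metadata once and reuse the local variable.
    DataChunk measureChunk = measureColumnChunks.get(blockIndex);
    ByteBuffer buffer = ByteBuffer.allocateDirect(measureChunk.getDataPageLength());
    synchronized (fileReader) {
      fileReader.readByteBuffer(filePath, buffer,
          measureChunk.getDataPageOffset(), measureChunk.getDataPageLength());
    }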
Github user jackylk commented on a diff in the pull request:
https://github.com/apache/incubator-carbondata/pull/584#discussion_r102624423

--- Diff: core/src/main/java/org/apache/carbondata/core/scan/filter/executer/RowLevelRangeLessThanFiterExecuterImpl.java ---
@@ -53,11 +55,19 @@ public RowLevelRangeLessThanFiterExecuterImpl(
     BitSet bitSet = new BitSet(1);
     byte[][] filterValues = this.filterRangeValues;
     int columnIndex = this.dimColEvaluatorInfoList.get(0).getColumnIndex();
+    boolean isScanRequired = isScanRequired(blockMinValue[columnIndex], filterValues);
+    if (isScanRequired) {
+      bitSet.set(0);
+    }
+    return bitSet;
+
--- End diff --

Remove the empty line.
Github user jackylk commented on a diff in the pull request:
https://github.com/apache/incubator-carbondata/pull/584#discussion_r102619669

--- Diff: core/src/main/java/org/apache/carbondata/core/datastore/chunk/reader/dimension/v1/CompressedDimensionChunkFileBasedReaderV1.java ---
@@ -56,51 +58,74 @@ public CompressedDimensionChunkFileBasedReaderV1(final BlockletInfo blockletInfo
   }

   /**
-   * Below method will be used to read the chunk based on block indexes
+   * Below method will be used to read the raw chunk based on block indexes
    *
    * @param fileReader file reader to read the blocks from file
    * @param blockIndexes blocks to be read
    * @return dimension column chunks
    */
-  @Override public DimensionColumnDataChunk[] readDimensionChunks(FileHolder fileReader,
+  @Override public DimensionRawColumnChunk[] readRawDimensionChunks(FileHolder fileReader,
       int[][] blockIndexes) throws IOException {
-    // read the column chunk based on block index and add
-    DimensionColumnDataChunk[] dataChunks =
-        new DimensionColumnDataChunk[dimensionColumnChunk.size()];
+    DimensionRawColumnChunk[] dataChunks = new DimensionRawColumnChunk[dimensionColumnChunk.size()];
     for (int i = 0; i < blockIndexes.length; i++) {
       for (int j = blockIndexes[i][0]; j <= blockIndexes[i][1]; j++) {
-        dataChunks[j] = readDimensionChunk(fileReader, j);
+        dataChunks[j] = readRawDimensionChunk(fileReader, j);
       }
     }
     return dataChunks;
   }

   /**
-   * Below method will be used to read the chunk based on block index
+   * Below method will be used to read the raw chunk based on block index
    *
    * @param fileReader file reader to read the blocks from file
    * @param blockIndex block to be read
    * @return dimension column chunk
    */
-  @Override public DimensionColumnDataChunk readDimensionChunk(FileHolder fileReader,
+  @Override public DimensionRawColumnChunk readRawDimensionChunk(FileHolder fileReader,
       int blockIndex) throws IOException {
+    ByteBuffer buffer =
+        ByteBuffer.allocateDirect(dimensionColumnChunk.get(blockIndex).getDataPageLength());
+    synchronized (fileReader) {
+      fileReader.readByteBuffer(filePath, buffer,
+          dimensionColumnChunk.get(blockIndex).getDataPageOffset(),
+          dimensionColumnChunk.get(blockIndex).getDataPageLength());
+    }
+    DimensionRawColumnChunk rawColumnChunk = new DimensionRawColumnChunk(blockIndex, buffer, 0,
+        dimensionColumnChunk.get(blockIndex).getDataPageLength(), this);
+    rawColumnChunk.setFileHolder(fileReader);
+    rawColumnChunk.setPagesCount(1);
+    rawColumnChunk.setRowCount(new int[] { numberOfRows });
+    return rawColumnChunk;
+  }
+
+  @Override public DimensionColumnDataChunk convertToDimensionChunk(
+      DimensionRawColumnChunk dimensionRawColumnChunk, int pageNumber) throws IOException {
+    int blockIndex = dimensionRawColumnChunk.getBlockId();
     byte[] dataPage = null;
     int[] invertedIndexes = null;
     int[] invertedIndexesReverse = null;
     int[] rlePage = null;
+    FileHolder fileReader = dimensionRawColumnChunk.getFileReader();
+
+    ByteBuffer rawData = dimensionRawColumnChunk.getRawData();
+    rawData.position(dimensionRawColumnChunk.getOffSet());
+    byte[] data = new byte[dimensionRawColumnChunk.getLength()];
+    rawData.get(data);
+    dataPage = COMPRESSOR.unCompressByte(data);

-    // first read the data and uncompressed it
-    dataPage = COMPRESSOR.unCompressByte(fileReader
-        .readByteArray(filePath, dimensionColumnChunk.get(blockIndex).getDataPageOffset(),
-            dimensionColumnChunk.get(blockIndex).getDataPageLength()));
     // if row id block is present then read the row id chunk and uncompress it
     if (CarbonUtil.hasEncoding(dimensionColumnChunk.get(blockIndex).getEncodingList(),
--- End diff --

In this function, multiple places invoke `dimensionColumnChunk.get(blockIndex)`; can you get it once and reuse it?
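A minimal sketch of the requested change in `readRawDimensionChunk` — hedged: the element type of `dimensionColumnChunk` is assumed to be the chunk-metadata `DataChunk`:

    DataChunk chunk = dimensionColumnChunk.get(blockIndex);  // fetched once
    ByteBuffer buffer = ByteBuffer.allocateDirect(chunk.getDataPageLength());
    synchronized (fileReader) {
      fileReader.readByteBuffer(filePath, buffer,
          chunk.getDataPageOffset(), chunk.getDataPageLength());
    }
    DimensionRawColumnChunk rawColumnChunk =
        new DimensionRawColumnChunk(blockIndex, buffer, 0, chunk.getDataPageLength(), this);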
Github user jackylk commented on a diff in the pull request:
https://github.com/apache/incubator-carbondata/pull/584#discussion_r102624587

--- Diff: core/src/main/java/org/apache/carbondata/core/scan/result/iterator/AbstractDetailQueryResultIterator.java ---
@@ -191,8 +192,13 @@ public void processNextBatch(CarbonColumnarBatch columnarBatch) {
   }

   @Override public void close() {
-    CarbonUtil.freeMemory(blocksChunkHolder.getDimensionDataChunk(),
-        blocksChunkHolder.getMeasureDataChunk());
+    try {
+      fileReader.finish();
+    } catch (IOException e) {
+      LOGGER.error(e);
+    }
+    CarbonUtil.freeMemory(blocksChunkHolder.getDimensionRawDataChunk(),
--- End diff --

Put it in a finally block.
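A sketch of what the reviewer is asking for — hedged: the second argument to `freeMemory` is truncated in the quoted diff, so `getMeasureRawDataChunk()` is a hypothetical accessor name:

    @Override public void close() {
      try {
        fileReader.finish();
      } catch (IOException e) {
        LOGGER.error(e);
      } finally {
        // free the raw chunk memory even if closing the reader fails
        CarbonUtil.freeMemory(blocksChunkHolder.getDimensionRawDataChunk(),
            blocksChunkHolder.getMeasureRawDataChunk());  // hypothetical accessor
      }
    }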
Github user jackylk commented on a diff in the pull request:
https://github.com/apache/incubator-carbondata/pull/584#discussion_r102622167

--- Diff: core/src/main/java/org/apache/carbondata/core/datastore/chunk/reader/dimension/v2/CompressedDimensionChunkFileBasedReaderV2.java ---
@@ -118,54 +122,109 @@ public CompressedDimensionChunkFileBasedReaderV2(final BlockletInfo blockletInfo
    * @param blockIndex block to be read
    * @return dimension column chunk
    */
-  @Override public DimensionColumnDataChunk readDimensionChunk(FileHolder fileReader,
+  public DimensionRawColumnChunk readRawDimensionChunk(FileHolder fileReader,
       int blockIndex) throws IOException {
+    int length = 0;
+    if (dimensionChunksOffset.size() - 1 == blockIndex) {
+      // Incase of last block read only for datachunk and read remaining while converting it.
+      length = dimensionChunksLength.get(blockIndex);
+    } else {
+      long currentDimensionOffset = dimensionChunksOffset.get(blockIndex);
+      length = (int) (dimensionChunksOffset.get(blockIndex + 1) - currentDimensionOffset);
+    }
+    ByteBuffer buffer = ByteBuffer.allocateDirect(length);
+    synchronized (fileReader) {
+      fileReader.readByteBuffer(filePath, buffer, dimensionChunksOffset.get(blockIndex), length);
+    }
+    DimensionRawColumnChunk rawColumnChunk =
+        new DimensionRawColumnChunk(blockIndex, buffer, 0, length, this);
+    rawColumnChunk.setFileHolder(fileReader);
+    rawColumnChunk.setPagesCount(1);
+    rawColumnChunk.setRowCount(new int[]{numberOfRows});
+    return rawColumnChunk;
+  }
+
+  private DimensionRawColumnChunk[] readRawDimensionChunksInGroup(FileHolder fileReader,
+      int startBlockIndex, int endBlockIndex) throws IOException {
+    long currentDimensionOffset = dimensionChunksOffset.get(startBlockIndex);
+    ByteBuffer buffer = ByteBuffer.allocateDirect(
+        (int) (dimensionChunksOffset.get(endBlockIndex + 1) - currentDimensionOffset));
+    synchronized (fileReader) {
+      fileReader.readByteBuffer(filePath, buffer, currentDimensionOffset,
+          (int) (dimensionChunksOffset.get(endBlockIndex + 1) - currentDimensionOffset));
+    }
+    DimensionRawColumnChunk[] dataChunks =
+        new DimensionRawColumnChunk[endBlockIndex - startBlockIndex + 1];
+    int index = 0;
+    int runningLength = 0;
+    for (int i = startBlockIndex; i <= endBlockIndex; i++) {
+      int currentLength = (int) (dimensionChunksOffset.get(i + 1) - dimensionChunksOffset.get(i));
+      dataChunks[index] =
+          new DimensionRawColumnChunk(i, buffer, runningLength, currentLength, this);
+      dataChunks[index].setFileHolder(fileReader);
+      dataChunks[index].setPagesCount(1);
+      dataChunks[index].setRowCount(new int[] { numberOfRows });
+      runningLength += currentLength;
+      index++;
+    }
+    return dataChunks;
+  }
+
+  public DimensionColumnDataChunk convertToDimensionChunk(
+      DimensionRawColumnChunk dimensionRawColumnChunk, int pageNumber) throws IOException {
     byte[] dataPage = null;
     int[] invertedIndexes = null;
     int[] invertedIndexesReverse = null;
     int[] rlePage = null;
     DataChunk2 dimensionColumnChunk = null;
-    byte[] data = null;
-    int copySourcePoint = 0;
-    byte[] dimensionChunk = null;
+    int copySourcePoint = dimensionRawColumnChunk.getOffSet();
+    int blockIndex = dimensionRawColumnChunk.getBlockId();
+    ByteBuffer rawData = dimensionRawColumnChunk.getRawData();
     if (dimensionChunksOffset.size() - 1 == blockIndex) {
-      dimensionChunk = fileReader.readByteArray(filePath, dimensionChunksOffset.get(blockIndex),
-          dimensionChunksLength.get(blockIndex));
       dimensionColumnChunk = CarbonUtil
-          .readDataChunk(dimensionChunk, copySourcePoint, dimensionChunksLength.get(blockIndex));
+          .readDataChunk(rawData, copySourcePoint, dimensionRawColumnChunk.getLength());
       int totalDimensionDataLength =
           dimensionColumnChunk.data_page_length + dimensionColumnChunk.rle_page_length
               + dimensionColumnChunk.rowid_page_length;
-      data = fileReader.readByteArray(filePath,
-          dimensionChunksOffset.get(blockIndex) + dimensionChunksLength.get(blockIndex),
-          totalDimensionDataLength);
+      synchronized (dimensionRawColumnChunk.getFileReader()) {
+        rawData = ByteBuffer.allocateDirect(totalDimensionDataLength);
+        dimensionRawColumnChunk.getFileReader().readByteBuffer(filePath, rawData,
+            dimensionChunksOffset.get(blockIndex) + dimensionChunksLength.get(blockIndex),
+            totalDimensionDataLength);
+      }
     } else {
-      long currentDimensionOffset = dimensionChunksOffset.get(blockIndex);
-      data = fileReader.readByteArray(filePath, currentDimensionOffset,
-          (int) (dimensionChunksOffset.get(blockIndex + 1) - currentDimensionOffset));
       dimensionColumnChunk =
-          CarbonUtil.readDataChunk(data, copySourcePoint, dimensionChunksLength.get(blockIndex));
+          CarbonUtil.readDataChunk(rawData, copySourcePoint, dimensionChunksLength.get(blockIndex));
       copySourcePoint += dimensionChunksLength.get(blockIndex);
     }
+    byte[] data = new byte[dimensionColumnChunk.data_page_length];
+    rawData.position(copySourcePoint);
+    rawData.get(data);
     // first read the data and uncompressed it
     dataPage =
-        COMPRESSOR.unCompressByte(data, copySourcePoint, dimensionColumnChunk.data_page_length);
+        COMPRESSOR.unCompressByte(data, 0, dimensionColumnChunk.data_page_length);
     copySourcePoint += dimensionColumnChunk.data_page_length;
     // if row id block is present then read the row id chunk and uncompress it
     if (hasEncoding(dimensionColumnChunk.encoders, Encoding.INVERTED_INDEX)) {
+      byte[] dataInv = new byte[dimensionColumnChunk.rowid_page_length];
+      rawData.position(copySourcePoint);
+      rawData.get(dataInv);
       invertedIndexes = CarbonUtil
-          .getUnCompressColumnIndex(dimensionColumnChunk.rowid_page_length, data, numberComressor,
-              copySourcePoint);
+          .getUnCompressColumnIndex(dimensionColumnChunk.rowid_page_length, dataInv,
+              numberComressor, 0);
       copySourcePoint += dimensionColumnChunk.rowid_page_length;
       // get the reverse index
       invertedIndexesReverse = getInvertedReverseIndex(invertedIndexes);
     }
     // if rle is applied then read the rle block chunk and then uncompress
     //then actual data based on rle block
     if (hasEncoding(dimensionColumnChunk.encoders, Encoding.RLE)) {
+      byte[] dataRle = new byte[dimensionColumnChunk.rle_page_length];
+      rawData.position(copySourcePoint);
+      rawData.get(dataRle);
       rlePage =
-          numberComressor.unCompress(data, copySourcePoint, dimensionColumnChunk.rle_page_length);
+          numberComressor.unCompress(dataRle, 0, dimensionColumnChunk.rle_page_length);
       // uncompress the data with rle indexes
       dataPage = UnBlockIndexer.uncompressData(dataPage, rlePage, eachColumnValueSize[blockIndex]);
       rlePage = null;
--- End diff --

How about `dataRle`? Does it need to be assigned null, like `rlePage`?
Github user ravipesala commented on a diff in the pull request:
https://github.com/apache/incubator-carbondata/pull/584#discussion_r102640748

--- Diff: core/src/main/java/org/apache/carbondata/core/cache/dictionary/Dictionary.java ---
@@ -59,6 +59,17 @@
   String getDictionaryValueForKey(int surrogateKey);

   /**
+   * This method will find and return the dictionary value for a given surrogate key in bytes.
+   * Applicable scenarios:
+   * 1. Query final result preparation : While convert the final result which will
+   * be surrogate key back to original dictionary values this method will be used
--- End diff --

It is added because `getDictionaryValueForKey` gets the bytes from the dictionary and converts them to a String, but in our decoder we convert them back to bytes again. So I added a new method that returns the bytes directly, without converting to a String, to avoid the multiple conversions.
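For illustration, the shape of such a byte-returning lookup — hedged: the quoted diff only shows the doc comment, so the method name here is an assumption:

    /**
     * Returns the dictionary value for the given surrogate key as raw bytes,
     * so callers that need bytes (e.g. the result decoder) can skip the
     * bytes -> String -> bytes round trip.
     */
    byte[] getDictionaryValueForKeyInBytes(int surrogateKey);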
Github user ravipesala commented on a diff in the pull request:
https://github.com/apache/incubator-carbondata/pull/584#discussion_r102641037

--- Diff: core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java ---
@@ -589,7 +591,7 @@
    * INMEMORY_REOCRD_SIZE
    */
   public static final String DETAIL_QUERY_BATCH_SIZE = "carbon.detail.batch.size";
--- End diff --

OK.
Github user ravipesala commented on a diff in the pull request:
https://github.com/apache/incubator-carbondata/pull/584#discussion_r102641238

--- Diff: core/src/main/java/org/apache/carbondata/core/datastore/FileHolder.java ---
@@ -18,8 +18,12 @@
 package org.apache.carbondata.core.datastore;

 import java.io.IOException;
+import java.nio.ByteBuffer;

 public interface FileHolder {
+
+  void readByteBuffer(String filePath, ByteBuffer byteBuffer, long offset, int length)
--- End diff --

OK.
Github user ravipesala commented on a diff in the pull request:
https://github.com/apache/incubator-carbondata/pull/584#discussion_r102641961

--- Diff: core/src/main/java/org/apache/carbondata/core/datastore/chunk/AbstractRawColumnChunk.java ---
@@ -0,0 +1,124 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.carbondata.core.datastore.chunk;
+
+import java.nio.ByteBuffer;
+
+
+/**
+ * It contains group of uncompressed blocklets on one column.
--- End diff --

The main benefit of V3 is avoiding multiple IO requests. In the V3 format each column chunk holds a group of pages of 32000 rows each. This group (the group size can be configured) is stored contiguously, so one IO request fetches the complete column chunk with all of its pages. Yes, this description will be added to the V3 reader/writer; that will be another PR, added by Vishal.
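To illustrate the layout described above, a hypothetical sketch (not the actual V3 reader — the helper name and the offsets-array convention are assumptions; only `FileHolder.readByteBuffer` is taken from this PR) of reading a whole page group in one IO request and slicing out the pages without further reads:

    import java.io.IOException;
    import java.nio.ByteBuffer;

    class PageGroupReadSketch {
      /** pageOffsets has pageCount + 1 entries; entry i is the file offset of page i. */
      static ByteBuffer[] readPageGroup(FileHolder fileReader, String filePath,
          long[] pageOffsets) throws IOException {
        int pageCount = pageOffsets.length - 1;
        int groupLength = (int) (pageOffsets[pageCount] - pageOffsets[0]);
        // one IO request for the whole contiguous group of pages
        ByteBuffer group = ByteBuffer.allocateDirect(groupLength);
        fileReader.readByteBuffer(filePath, group, pageOffsets[0], groupLength);
        // each page becomes a zero-copy slice of the shared buffer
        ByteBuffer[] pages = new ByteBuffer[pageCount];
        for (int i = 0; i < pageCount; i++) {
          group.position((int) (pageOffsets[i] - pageOffsets[0]));
          ByteBuffer page = group.slice();
          page.limit((int) (pageOffsets[i + 1] - pageOffsets[i]));
          pages[i] = page;
        }
        return pages;
      }
    }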
Github user ravipesala commented on a diff in the pull request:
https://github.com/apache/incubator-carbondata/pull/584#discussion_r102642027

--- Diff: core/src/main/java/org/apache/carbondata/core/datastore/chunk/AbstractRawColumnChunk.java ---
+/**
+ * It contains group of uncompressed blocklets on one column.
--- End diff --

And the V1 and V2 formats always have exactly one page, for backward compatibility.
Github user ravipesala commented on a diff in the pull request:
https://github.com/apache/incubator-carbondata/pull/584#discussion_r102642118

--- Diff: core/src/main/java/org/apache/carbondata/core/datastore/chunk/reader/DimensionColumnChunkReader.java ---
-  DimensionColumnDataChunk[] readDimensionChunks(FileHolder fileReader, int[][] blockIndexes)
+  DimensionRawColumnChunk[] readRawDimensionChunks(FileHolder fileReader, int[][] blockIndexes)
--- End diff --

Yes, it is the blocklet index, not the block index. I will update it.
Github user ravipesala commented on a diff in the pull request:
https://github.com/apache/incubator-carbondata/pull/584#discussion_r102645351

--- Diff: core/src/main/java/org/apache/carbondata/core/datastore/chunk/reader/dimension/v1/CompressedDimensionChunkFileBasedReaderV1.java ---
     // if row id block is present then read the row id chunk and uncompress it
     if (CarbonUtil.hasEncoding(dimensionColumnChunk.get(blockIndex).getEncodingList(),
--- End diff --

OK.
Github user ravipesala commented on a diff in the pull request:
https://github.com/apache/incubator-carbondata/pull/584#discussion_r102646213

--- Diff: core/src/main/java/org/apache/carbondata/core/datastore/chunk/reader/dimension/v1/CompressedDimensionChunkFileBasedReaderV1.java ---
     if (CarbonUtil.hasEncoding(dimensionColumnChunk.get(blockIndex).getEncodingList(),
         Encoding.INVERTED_INDEX)) {
+      byte[] columnIndexData;
+      synchronized (fileReader) {
--- End diff --

It is required because IO is handled in one thread and blocklet processing is handled in another thread. But in some cases, for example after filter processing, the required projection blocklets can also be read in the processing thread. That is why this synchronization is required.
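In other words, the `FileHolder` can be touched concurrently by the IO thread and the processing thread, so reads through it are serialized. A minimal sketch of the pattern — hedged: the offset/length accessors are hypothetical; `readByteArray` appears in the old code quoted above:

    // the FileHolder may be shared between the IO thread and the
    // processing thread, so only one thread reads through it at a time
    byte[] columnIndexData;
    DataChunk chunk = dimensionColumnChunk.get(blockIndex);  // chunk metadata
    synchronized (fileReader) {
      columnIndexData = fileReader.readByteArray(filePath,
          chunk.getRowIdPageOffset(), chunk.getRowIdPageLength());  // hypothetical accessors
    }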
Github user ravipesala commented on a diff in the pull request:
https://github.com/apache/incubator-carbondata/pull/584#discussion_r102646332

--- Diff: core/src/main/java/org/apache/carbondata/core/datastore/chunk/reader/dimension/v2/CompressedDimensionChunkFileBasedReaderV2.java ---
     if (hasEncoding(dimensionColumnChunk.encoders, Encoding.RLE)) {
+      byte[] dataRle = new byte[dimensionColumnChunk.rle_page_length];
+      rawData.position(copySourcePoint);
+      rawData.get(dataRle);
       rlePage =
-          numberComressor.unCompress(data, copySourcePoint, dimensionColumnChunk.rle_page_length);
+          numberComressor.unCompress(dataRle, 0, dimensionColumnChunk.rle_page_length);
       // uncompress the data with rle indexes
       dataPage = UnBlockIndexer.uncompressData(dataPage, rlePage, eachColumnValueSize[blockIndex]);
       rlePage = null;
--- End diff --

It is from the old code; it is not required and has been removed now.
Github user ravipesala commented on a diff in the pull request:
https://github.com/apache/incubator-carbondata/pull/584#discussion_r102646347

--- Diff: core/src/main/java/org/apache/carbondata/core/datastore/chunk/reader/measure/v1/CompressedMeasureChunkFileBasedReaderV1.java ---
+  @Override public MeasureRawColumnChunk readRawMeasureChunk(FileHolder fileReader, int blockIndex)
+      throws IOException {
+    ByteBuffer buffer =
+        ByteBuffer.allocateDirect(measureColumnChunks.get(blockIndex).getDataPageLength());
--- End diff --

OK.
Github user ravipesala commented on a diff in the pull request:
https://github.com/apache/incubator-carbondata/pull/584#discussion_r102646478

--- Diff: core/src/main/java/org/apache/carbondata/core/datastore/impl/btree/BTreeNonLeafNode.java ---
-import org.apache.carbondata.core.datastore.chunk.DimensionColumnDataChunk;
-import org.apache.carbondata.core.datastore.chunk.MeasureColumnDataChunk;
+import org.apache.carbondata.core.datastore.chunk.impl.DimensionRawColumnChunk;
--- End diff --

`DimensionRawColumnChunk` is not an interface; it is a concrete class.