GitHub user ravipesala opened a pull request:
https://github.com/apache/incubator-carbondata/pull/584

[WIP] Added code for new V3 format to optimize scan

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/ravipesala/incubator-carbondata blocklet-group

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-carbondata/pull/584.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #584

----
commit 7fd8775c7d1244c475e980711a179b90a9abcbca
Author: ravipesala <[hidden email]>
Date:   2017-02-03T10:41:06Z

    WIP Added code for new V3 format to optimize scan
----

---
If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at [hidden email] or file a JIRA ticket with INFRA.
---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/incubator-carbondata/pull/584

Build Failed with Spark 1.6.2, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/811/
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:
https://github.com/apache/incubator-carbondata/pull/584

Build Success with Spark 1.6.2, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/816/
Github user CarbonDataQA commented on the issue:
https://github.com/apache/incubator-carbondata/pull/584

Build Success with Spark 1.6.2, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/817/
Github user CarbonDataQA commented on the issue:
https://github.com/apache/incubator-carbondata/pull/584

Build Failed with Spark 1.6.2, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/818/
Github user CarbonDataQA commented on the issue:
https://github.com/apache/incubator-carbondata/pull/584

Build Success with Spark 1.6.2, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/819/
Github user CarbonDataQA commented on the issue:
https://github.com/apache/incubator-carbondata/pull/584

Build Success with Spark 1.6.2, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/820/
Github user CarbonDataQA commented on the issue:
https://github.com/apache/incubator-carbondata/pull/584

Build Success with Spark 1.6.2, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/821/
Github user CarbonDataQA commented on the issue:
https://github.com/apache/incubator-carbondata/pull/584

Build Success with Spark 1.6.2, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/827/
Github user CarbonDataQA commented on the issue:
https://github.com/apache/incubator-carbondata/pull/584

Build Success with Spark 1.6.2, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/829/
Github user CarbonDataQA commented on the issue:
https://github.com/apache/incubator-carbondata/pull/584

Build Failed with Spark 1.6.2, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/932/
Github user CarbonDataQA commented on the issue:
https://github.com/apache/incubator-carbondata/pull/584

Build Success with Spark 1.6.2, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/933/
Github user CarbonDataQA commented on the issue:
https://github.com/apache/incubator-carbondata/pull/584

Build Success with Spark 1.6.2, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/934/
Github user jackylk commented on a diff in the pull request:
https://github.com/apache/incubator-carbondata/pull/584#discussion_r102624180

--- Diff: core/src/main/java/org/apache/carbondata/core/scan/filter/executer/RestructureFilterExecuterImpl.java ---
@@ -35,18 +37,18 @@ public RestructureFilterExecuterImpl(DimColumnResolvedFilterInfo dimColumnResolv
         dimColumnResolvedFilterInfo.getDimension(), dimColumnExecuterInfo);
   }

-  @Override public BitSet applyFilter(BlocksChunkHolder blocksChunkHolder) {
-    BitSet bitSet = new BitSet(blocksChunkHolder.getDataBlock().nodeSize());
-    byte[][] filterValues = dimColumnExecuterInfo.getFilterKeys();
-    if (null != filterValues && filterValues.length > 0) {
-      bitSet.set(0, blocksChunkHolder.getDataBlock().nodeSize());
-    }
-    return bitSet;
+  @Override public BitSetGroup applyFilter(BlocksChunkHolder blocksChunkHolder) {
+    // TODO find out what is this for?
+    return new BitSetGroup(0);
   }

   @Override public BitSet isScanRequired(byte[][] blockMaxValue, byte[][] blockMinValue) {
     BitSet bitSet = new BitSet(1);
     bitSet.set(0);
     return bitSet;
   }
+
+  @Override public void readBlocks(BlocksChunkHolder blockChunkHolder) throws IOException {
+    // TODO

--- End diff --

Throw UnsupportedOperationException instead of this?
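The reviewer's suggestion can be sketched as follows. This is a simplified stand-in, not the actual CarbonData class: a restructure filter answers `isScanRequired` from metadata alone, so if `readBlocks` is never expected to be called, failing fast signals misuse more clearly than an empty body.

```java
import java.util.BitSet;

// Simplified sketch (names assumed): a filter executer that never reads blocks.
class RestructureFilterSketch {
    // A restructure filter can answer from block metadata without any IO.
    BitSet isScanRequired() {
        BitSet bitSet = new BitSet(1);
        bitSet.set(0);
        return bitSet;
    }

    // Instead of a silent TODO body, fail loudly if a caller ever asks this
    // executer to read blocks, as the reviewer suggests.
    void readBlocks(Object blockChunkHolder) {
        throw new UnsupportedOperationException(
            "RestructureFilterSketch does not read blocks");
    }
}
```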
Github user jackylk commented on a diff in the pull request:
https://github.com/apache/incubator-carbondata/pull/584#discussion_r102624845

--- Diff: core/src/main/java/org/apache/carbondata/core/util/BitSetGroup.java ---
@@ -0,0 +1,81 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.carbondata.core.util;
+
+import java.util.BitSet;
+
+/**
+ * Maintains the group of bitsets.

--- End diff --

Can you add something like "It is used for .... while decoding data chunk"
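For readers following the review, a minimal sketch of what a bitset-group holder like this could look like. The shape is inferred from the review context, not taken from the actual `BitSetGroup` source: one `BitSet` per page of a blocklet, so a filter can record matches page by page while pages are decoded lazily.

```java
import java.util.BitSet;

// Hypothetical sketch of a per-page bitset holder (assumed shape, not the
// actual CarbonData BitSetGroup).
class BitSetGroupSketch {
    private final BitSet[] bitSets;   // one BitSet per page

    BitSetGroupSketch(int pageCount) {
        this.bitSets = new BitSet[pageCount];
    }

    void setBitSet(BitSet bitSet, int pageNumber) {
        bitSets[pageNumber] = bitSet;
    }

    BitSet getBitSet(int pageNumber) {
        return bitSets[pageNumber];
    }

    // True when no page has any bit set, i.e. the whole blocklet can be skipped.
    boolean isEmpty() {
        for (BitSet b : bitSets) {
            if (b != null && !b.isEmpty()) {
                return false;
            }
        }
        return true;
    }
}
```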
Github user jackylk commented on a diff in the pull request:
https://github.com/apache/incubator-carbondata/pull/584#discussion_r102621173

--- Diff: core/src/main/java/org/apache/carbondata/core/datastore/chunk/reader/dimension/v1/CompressedDimensionChunkFileBasedReaderV1.java ---
@@ -56,51 +58,74 @@ public CompressedDimensionChunkFileBasedReaderV1(final BlockletInfo blockletInfo
   }

   /**
-   * Below method will be used to read the chunk based on block indexes
+   * Below method will be used to read the raw chunk based on block indexes
    *
    * @param fileReader file reader to read the blocks from file
    * @param blockIndexes blocks to be read
    * @return dimension column chunks
    */
-  @Override public DimensionColumnDataChunk[] readDimensionChunks(FileHolder fileReader,
+  @Override public DimensionRawColumnChunk[] readRawDimensionChunks(FileHolder fileReader,
       int[][] blockIndexes) throws IOException {
-    // read the column chunk based on block index and add
-    DimensionColumnDataChunk[] dataChunks =
-        new DimensionColumnDataChunk[dimensionColumnChunk.size()];
+    DimensionRawColumnChunk[] dataChunks = new DimensionRawColumnChunk[dimensionColumnChunk.size()];
     for (int i = 0; i < blockIndexes.length; i++) {
       for (int j = blockIndexes[i][0]; j <= blockIndexes[i][1]; j++) {
-        dataChunks[j] = readDimensionChunk(fileReader, j);
+        dataChunks[j] = readRawDimensionChunk(fileReader, j);
       }
     }
     return dataChunks;
   }

   /**
-   * Below method will be used to read the chunk based on block index
+   * Below method will be used to read the raw chunk based on block index
    *
    * @param fileReader file reader to read the blocks from file
    * @param blockIndex block to be read
    * @return dimension column chunk
    */
-  @Override public DimensionColumnDataChunk readDimensionChunk(FileHolder fileReader,
+  @Override public DimensionRawColumnChunk readRawDimensionChunk(FileHolder fileReader,
       int blockIndex) throws IOException {
+    ByteBuffer buffer =
+        ByteBuffer.allocateDirect(dimensionColumnChunk.get(blockIndex).getDataPageLength());
+    synchronized (fileReader) {
+      fileReader.readByteBuffer(filePath, buffer,
+          dimensionColumnChunk.get(blockIndex).getDataPageOffset(),
+          dimensionColumnChunk.get(blockIndex).getDataPageLength());
+    }
+    DimensionRawColumnChunk rawColumnChunk = new DimensionRawColumnChunk(blockIndex, buffer, 0,
+        dimensionColumnChunk.get(blockIndex).getDataPageLength(), this);
+    rawColumnChunk.setFileHolder(fileReader);
+    rawColumnChunk.setPagesCount(1);
+    rawColumnChunk.setRowCount(new int[] { numberOfRows });
+    return rawColumnChunk;
+  }
+
+  @Override public DimensionColumnDataChunk convertToDimensionChunk(
+      DimensionRawColumnChunk dimensionRawColumnChunk, int pageNumber) throws IOException {
+    int blockIndex = dimensionRawColumnChunk.getBlockId();
     byte[] dataPage = null;
     int[] invertedIndexes = null;
     int[] invertedIndexesReverse = null;
     int[] rlePage = null;
+    FileHolder fileReader = dimensionRawColumnChunk.getFileReader();
+
+    ByteBuffer rawData = dimensionRawColumnChunk.getRawData();
+    rawData.position(dimensionRawColumnChunk.getOffSet());
+    byte[] data = new byte[dimensionRawColumnChunk.getLength()];
+    rawData.get(data);
+    dataPage = COMPRESSOR.unCompressByte(data);
-    // first read the data and uncompressed it
-    dataPage = COMPRESSOR.unCompressByte(fileReader
-        .readByteArray(filePath, dimensionColumnChunk.get(blockIndex).getDataPageOffset(),
-            dimensionColumnChunk.get(blockIndex).getDataPageLength()));
     // if row id block is present then read the row id chunk and uncompress it
     if (CarbonUtil.hasEncoding(dimensionColumnChunk.get(blockIndex).getEncodingList(),
         Encoding.INVERTED_INDEX)) {
+      byte[] columnIndexData;
+      synchronized (fileReader) {

--- End diff --

why this is needed?
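One plausible answer to the reviewer's question: if a single `FileHolder` is shared by several decoder threads, a seek followed by a read is not atomic on a shared file handle, so another thread can move the file pointer between the two calls. A simplified sketch under that assumption (class and method names here are illustrative, not the CarbonData implementation):

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;

// Hypothetical sketch: a reader whose underlying file handle is shared across
// threads must make the seek-then-read pair atomic.
class SharedFileReaderSketch {
    private final RandomAccessFile file;

    SharedFileReaderSketch(RandomAccessFile file) {
        this.file = file;
    }

    void readByteBuffer(ByteBuffer buffer, long offset, int length) throws IOException {
        byte[] data = new byte[length];
        synchronized (this) {        // guard the seek + read pair as one unit
            file.seek(offset);
            file.readFully(data);
        }
        buffer.put(data);
        buffer.flip();               // prepare the buffer for the caller to read
    }
}
```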
Github user jackylk commented on a diff in the pull request:
https://github.com/apache/incubator-carbondata/pull/584#discussion_r102624364

--- Diff: core/src/main/java/org/apache/carbondata/core/scan/filter/executer/RowLevelRangeLessThanEqualFilterExecuterImpl.java ---
@@ -53,11 +55,19 @@ public RowLevelRangeLessThanEqualFilterExecuterImpl(
     BitSet bitSet = new BitSet(1);
     byte[][] filterValues = this.filterRangeValues;
     int columnIndex = this.dimColEvaluatorInfoList.get(0).getColumnIndex();
+    boolean isScanRequired = isScanRequired(blockMinValue[columnIndex], filterValues);
+    if (isScanRequired) {
+      bitSet.set(0);
+    }
+    return bitSet;
+

--- End diff --

remove empty line
Github user jackylk commented on a diff in the pull request:
https://github.com/apache/incubator-carbondata/pull/584#discussion_r102614395

--- Diff: core/src/main/java/org/apache/carbondata/core/cache/dictionary/Dictionary.java ---
@@ -59,6 +59,17 @@
   String getDictionaryValueForKey(int surrogateKey);

   /**
+   * This method will find and return the dictionary value for a given surrogate key in bytes.
+   * Applicable scenarios:
+   * 1. Query final result preparation : While convert the final result which will
+   * be surrogate key back to original dictionary values this method will be used

--- End diff --

Can you add description why this function will be called instead of `getDictionaryValueForKey` to depict why this is needed?
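The motivation the reviewer asks for can be illustrated with a toy dictionary. The names and storage shape below are assumptions for illustration, not the CarbonData implementation: returning raw bytes lets result preparation copy the value straight into an output buffer, skipping the byte-to-String decode that `getDictionaryValueForKey` would force for every surrogate key.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Toy dictionary sketch (hypothetical shape) contrasting the two lookups.
class DictionarySketch {
    private final Map<Integer, byte[]> surrogateToValue = new HashMap<>();

    void put(int surrogateKey, String value) {
        surrogateToValue.put(surrogateKey, value.getBytes(StandardCharsets.UTF_8));
    }

    // String-returning lookup: decodes and allocates a new String per call.
    String getDictionaryValueForKey(int surrogateKey) {
        byte[] value = surrogateToValue.get(surrogateKey);
        return value == null ? null : new String(value, StandardCharsets.UTF_8);
    }

    // Byte-returning lookup: hands back the stored bytes without decoding,
    // which is cheaper when the caller only needs to copy them onward.
    byte[] getDictionaryValueForKeyInBytes(int surrogateKey) {
        return surrogateToValue.get(surrogateKey);
    }
}
```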
Github user jackylk commented on a diff in the pull request:
https://github.com/apache/incubator-carbondata/pull/584#discussion_r102615117

--- Diff: core/src/main/java/org/apache/carbondata/core/datastore/FileHolder.java ---
@@ -18,8 +18,12 @@
 package org.apache.carbondata.core.datastore;

 import java.io.IOException;
+import java.nio.ByteBuffer;

 public interface FileHolder {
+
+  void readByteBuffer(String filePath, ByteBuffer byteBuffer, long offset, int length)

--- End diff --

add function description
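For context, a hypothetical implementation of such a `readByteBuffer` method (purely illustrative, not the CarbonData code) could use `FileChannel`'s positional read, which fills the caller-supplied buffer at an absolute offset without disturbing any shared channel position:

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

// Hypothetical FileHolder-style reader (names assumed).
class FileHolderSketch {
    // Reads `length` bytes starting at `offset` into `byteBuffer` and flips
    // the buffer so the caller can read it. The buffer must have capacity
    // of at least `length`.
    void readByteBuffer(String filePath, ByteBuffer byteBuffer, long offset, int length)
            throws IOException {
        try (FileChannel channel = FileChannel.open(Paths.get(filePath),
                StandardOpenOption.READ)) {
            byteBuffer.limit(length);
            while (byteBuffer.hasRemaining()) {
                // Positional read: does not move the channel's own position.
                int read = channel.read(byteBuffer, offset + byteBuffer.position());
                if (read < 0) {
                    throw new IOException("unexpected end of file: " + filePath);
                }
            }
            byteBuffer.flip();
        }
    }
}
```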
Github user jackylk commented on a diff in the pull request:
https://github.com/apache/incubator-carbondata/pull/584#discussion_r102615724

--- Diff: core/src/main/java/org/apache/carbondata/core/datastore/chunk/AbstractRawColumnChunk.java ---
@@ -0,0 +1,124 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.carbondata.core.datastore.chunk;
+
+import java.nio.ByteBuffer;
+
+
+/**
+ * It contains group of uncompressed blocklets on one column.

--- End diff --

So it means the reader will read one column across multiple blocklets? I have two doubts, please clarify:

1. This involves multiple IO requests, since the data is not continuous.
2. There will be backward and forward disk seeks if multiple columns need to be read by the reader.

If these are not the case, can you add some description somewhere (maybe in the reader) to describe the reader behaviour?