Apache CarbonData Dev Mailing List archive › Apache CarbonData JIRA issues

[GitHub] carbondata pull request #2725: [WIP] Added code to support storing min max f...

Classic

List

50 messages Options

Options

123

[GitHub] carbondata pull request #2725: [CARBONDATA-2942] Add read and write support ...

Github user manishgupta88 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2725#discussion_r218138883

--- Diff: core/src/main/java/org/apache/carbondata/core/util/CarbonMetadataUtil.java ---
@@ -250,6 +275,42 @@ public static BlockletIndex getBlockletIndex(EncodedBlocklet encodedBlocklet,
}

/**
+ * This method will combine the writeMinMax flag from all the pages. If any page for a given
+ * dimension has writeMinMax flag set to false then min max for that dimension will nto be
+ * written in any of the page and metadata
+ *
+ * @param blockletMinMaxIndex
+ * @param encodedBlocklet
+ */
+ private static List<Boolean> mergeWriteMinMaxFlagForAllPages(
+ BlockletMinMaxIndex blockletMinMaxIndex, EncodedBlocklet encodedBlocklet) {
+ Boolean[] mergedWriteMinMaxFlag =
+ new Boolean[encodedBlocklet.getNumberOfDimension() + encodedBlocklet.getNumberOfMeasure()];
+ // set writeMinMax flag to true for all the columns by default and then update if stats object
+ // has the this flag set to false
+ Arrays.fill(mergedWriteMinMaxFlag, true);
+ for (int i = 0; i < encodedBlocklet.getNumberOfDimension(); i++) {
+ for (int pageIndex = 0; pageIndex < encodedBlocklet.getNumberOfPages(); pageIndex++) {
+ EncodedColumnPage encodedColumnPage =
+ encodedBlocklet.getEncodedDimensionColumnPages().get(i).getEncodedColumnPageList()
+ .get(pageIndex);
+ SimpleStatsResult stats = encodedColumnPage.getStats();
+ if (!stats.writeMinMax()) {
+ mergedWriteMinMaxFlag[i] = stats.writeMinMax();
+ String columnName = encodedColumnPage.getActualPage().getColumnSpec().getFieldName();
+ LOGGER.info("Min Max writing ignored for column " + columnName + " from page 0 to "
--- End diff --

ok

---

[GitHub] carbondata pull request #2725: [CARBONDATA-2942] Add read and write support ...

In reply to this post by qiuchenjian-2

Github user manishgupta88 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2725#discussion_r218139378

--- Diff: core/src/main/java/org/apache/carbondata/core/scan/scanner/impl/BlockletFilterScanner.java ---
@@ -122,11 +122,11 @@ public boolean isScanRequired(DataRefNode dataBlock) {
bitSet = ((ImplicitColumnFilterExecutor) filterExecuter)
.isFilterValuesPresentInBlockOrBlocklet(
dataBlock.getColumnsMaxValue(),
- dataBlock.getColumnsMinValue(), blockletId);
+ dataBlock.getColumnsMinValue(), blockletId, dataBlock.isMinMaxSet());
} else {
bitSet = this.filterExecuter
.isScanRequired(dataBlock.getColumnsMaxValue(),
- dataBlock.getColumnsMinValue());
+ dataBlock.getColumnsMinValue(), dataBlock.isMinMaxSet());
--- End diff --

For blocklet min max comparison the footer is getting read in executor and the same min max flag information is getting used. The information serialized from driver is not getting used for blocklet min max comparison in executor

---

[GitHub] carbondata pull request #2725: [CARBONDATA-2942] Add read and write support ...

In reply to this post by qiuchenjian-2

Github user manishgupta88 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2725#discussion_r218139409

--- Diff: core/src/main/java/org/apache/carbondata/core/scan/filter/executer/RowLevelRangeGrtThanFiterExecuterImpl.java ---
@@ -194,9 +198,12 @@ public BitSetGroup applyFilter(RawBlockletColumnChunks rawBlockletColumnChunks,
BitSetGroup bitSetGroup = new BitSetGroup(rawColumnChunk.getPagesCount());
FilterExecuter filterExecuter = null;
boolean isExclude = false;
+ boolean isMinMaxSetForFilterDimension =
+ rawBlockletColumnChunks.getDataBlock().isMinMaxSet()[chunkIndex];
for (int i = 0; i < rawColumnChunk.getPagesCount(); i++) {
if (rawColumnChunk.getMaxValues() != null) {
- if (isScanRequired(rawColumnChunk.getMaxValues()[i], this.filterRangeValues)) {
+ if (!isMinMaxSetForFilterDimension || isScanRequired(rawColumnChunk.getMaxValues()[i],
--- End diff --

ok

---

[GitHub] carbondata pull request #2725: [CARBONDATA-2942] Add read and write support ...

In reply to this post by qiuchenjian-2

Github user manishgupta88 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2725#discussion_r218139444

--- Diff: core/src/main/java/org/apache/carbondata/core/scan/filter/executer/IncludeFilterExecuterImpl.java ---
@@ -108,10 +108,13 @@ public BitSetGroup applyFilter(RawBlockletColumnChunks rawBlockletColumnChunks,
BitSetGroup bitSetGroup = new BitSetGroup(dimensionRawColumnChunk.getPagesCount());
filterValues = dimColumnExecuterInfo.getFilterKeys();
boolean isDecoded = false;
+ boolean isMinMaxSetForFilterDimension =
+ rawBlockletColumnChunks.getDataBlock().isMinMaxSet()[chunkIndex];
for (int i = 0; i < dimensionRawColumnChunk.getPagesCount(); i++) {
if (dimensionRawColumnChunk.getMaxValues() != null) {
- if (isScanRequired(dimensionRawColumnChunk.getMaxValues()[i],
- dimensionRawColumnChunk.getMinValues()[i], dimColumnExecuterInfo.getFilterKeys())) {
+ if (!isMinMaxSetForFilterDimension || isScanRequired(
--- End diff --

ok

---

[GitHub] carbondata pull request #2725: [CARBONDATA-2942] Add read and write support ...

In reply to this post by qiuchenjian-2

Github user manishgupta88 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2725#discussion_r218139467

--- Diff: core/src/main/java/org/apache/carbondata/core/metadata/blocklet/index/BlockletIndex.java ---
@@ -75,4 +80,12 @@ public void setMinMaxIndex(BlockletMinMaxIndex minMaxIndex) {
this.minMaxIndex = minMaxIndex;
}

+ @Override public void write(DataOutput out) throws IOException {
--- End diff --

ok

---

[GitHub] carbondata pull request #2725: [CARBONDATA-2942] Add read and write support ...

In reply to this post by qiuchenjian-2

Github user manishgupta88 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2725#discussion_r218139494

--- Diff: core/src/main/java/org/apache/carbondata/core/indexstore/blockletindex/BlockletDataRefNode.java ---
@@ -133,6 +134,21 @@
return null;
}

+ @Override
+ public boolean[] isMinMaxSet() {
+ BlockletIndex blockletIndex =
+ blockInfos.get(index).getDetailInfo().getBlockletInfo().getBlockletIndex();
--- End diff --

ok

---

[GitHub] carbondata pull request #2725: [CARBONDATA-2942] Add read and write support ...

In reply to this post by qiuchenjian-2

Github user manishgupta88 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2725#discussion_r218139529

--- Diff: core/src/main/java/org/apache/carbondata/core/datastore/DataRefNode.java ---
@@ -125,4 +125,12 @@ DimensionRawColumnChunk readDimensionChunk(FileReader fileReader, int columnInde
* @return
*/
BitSetGroup getIndexedData();
+
+ /**
+ * Return the array which contains the flag for each column whether the min max for that column
+ * is written in metadata or not
+ *
+ * @return
+ */
+ boolean[] isMinMaxSet();
--- End diff --

ok

---

[GitHub] carbondata issue #2725: [CARBONDATA-2942] Add read and write support for wri...

In reply to this post by qiuchenjian-2

Github user manishgupta88 commented on the issue:

https://github.com/apache/carbondata/pull/2725

@ravipesala ..Fixed review comments. Kindly review

---

[GitHub] carbondata issue #2725: [CARBONDATA-2942] Add read and write support for wri...

In reply to this post by qiuchenjian-2

Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2725

Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/317/

---

[GitHub] carbondata issue #2725: [CARBONDATA-2942] Add read and write support for wri...

In reply to this post by qiuchenjian-2

Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2725

Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/320/

---

[GitHub] carbondata issue #2725: [CARBONDATA-2942] Add read and write support for wri...

In reply to this post by qiuchenjian-2

Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2725

Build Failed with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/8563/

---

[GitHub] carbondata issue #2725: [CARBONDATA-2942] Add read and write support for wri...

In reply to this post by qiuchenjian-2

Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2725

Build Failed with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/493/

---

[GitHub] carbondata issue #2725: [CARBONDATA-2942] Add read and write support for wri...

In reply to this post by qiuchenjian-2

Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2725

Build Failed with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/496/

---

[GitHub] carbondata issue #2725: [CARBONDATA-2942] Add read and write support for wri...

In reply to this post by qiuchenjian-2

Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2725

Build Failed with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/8566/

---

[GitHub] carbondata pull request #2725: [CARBONDATA-2942] Add read and write support ...

In reply to this post by qiuchenjian-2

Github user xuchuanyin commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2725#discussion_r218274542

--- Diff: core/src/main/java/org/apache/carbondata/core/indexstore/blockletindex/BlockletDataMapRowIndexes.java ---
@@ -40,12 +40,14 @@

int BLOCK_LENGTH = 8;

+ int BLOCK_MIN_MAX_FLAG = 9;
+
// below variables are specific for blockletDataMap
- int BLOCKLET_INFO_INDEX = 9;
+ int BLOCKLET_INFO_INDEX = 10;
--- End diff --

will this modification affect legacy store?

---

[GitHub] carbondata pull request #2725: [CARBONDATA-2942] Add read and write support ...

In reply to this post by qiuchenjian-2

Github user xuchuanyin commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2725#discussion_r218273696

--- Diff: core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java ---
@@ -1831,6 +1831,16 @@

public static final short LOCAL_DICT_ENCODED_BYTEARRAY_SIZE = 3;

+ /**
+ * property to be used for specifying the max character limit for string/varchar data type till
--- End diff --

âspecifying the max character limitâ ->'specifying the max byte limit'

---

[GitHub] carbondata pull request #2725: [CARBONDATA-2942] Add read and write support ...

In reply to this post by qiuchenjian-2

Github user xuchuanyin commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2725#discussion_r218273764

--- Diff: core/src/main/java/org/apache/carbondata/core/datastore/DataRefNode.java ---
@@ -125,4 +125,12 @@ DimensionRawColumnChunk readDimensionChunk(FileReader fileReader, int columnInde
* @return
*/
BitSetGroup getIndexedData();
+
+ /**
+ * Return the array which contains the flag for each column whether the min max for that column
+ * is written in metadata or not
+ *
+ * @return
--- End diff --

unnecessary âreturnâ

---

[GitHub] carbondata issue #2725: [CARBONDATA-2942] Add read and write support for wri...

In reply to this post by qiuchenjian-2

Github user xuchuanyin commented on the issue:

https://github.com/apache/carbondata/pull/2725

@manishgupta88 Hi, actually I have another idea on the implementation for this requirement. I'd like to use an unique fake value to store the minmax for those columns.
While judging the filter value with the fake minmax, we know that is a fake minmax, so the filter procedure just returns true.
In this way, we do not need to modify the metadata and can reduce many code changes.

---

[GitHub] carbondata pull request #2725: [CARBONDATA-2942] Add read and write support ...

In reply to this post by qiuchenjian-2

Github user manishgupta88 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2725#discussion_r218298749

--- Diff: core/src/main/java/org/apache/carbondata/core/indexstore/blockletindex/BlockletDataMapRowIndexes.java ---
@@ -40,12 +40,14 @@

int BLOCK_LENGTH = 8;

+ int BLOCK_MIN_MAX_FLAG = 9;
+
// below variables are specific for blockletDataMap
- int BLOCKLET_INFO_INDEX = 9;
+ int BLOCKLET_INFO_INDEX = 10;
--- End diff --

No it will not have any impact as the change this code is in query part

---

[GitHub] carbondata pull request #2725: [CARBONDATA-2942] Add read and write support ...

In reply to this post by qiuchenjian-2

Github user manishgupta88 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2725#discussion_r218298762

--- Diff: core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java ---
@@ -1831,6 +1831,16 @@

public static final short LOCAL_DICT_ENCODED_BYTEARRAY_SIZE = 3;

+ /**
+ * property to be used for specifying the max character limit for string/varchar data type till
--- End diff --

ok

---

123