GitHub user sounakr opened a pull request:
https://github.com/apache/carbondata/pull/1359

[CARBONDATA-1480] Min Max DataMap Example. Implementation of a Min Max index through the DataMap interface, and use of the index while pruning.

---

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/sounakr/incubator-carbondata minmax

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/carbondata/pull/1359.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #1359

----

commit a46e3b7c609e070f052017edabef9355668cf00a
Author: sounakr <[hidden email]>
Date:   2017-09-13T11:57:23Z

    Min Max DataMap

----

---
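For readers new to the feature, the following minimal sketch illustrates the pruning idea behind a min/max index: a block is scanned only if the filter value can fall inside that block's recorded [min, max] range. The class and method names are illustrative simplifications, not code from this PR.

import java.util.ArrayList;
import java.util.List;

// Minimal sketch of min/max pruning; BlockMinMax and the long-typed column
// values are hypothetical simplifications, not classes from this PR.
public class MinMaxPruneSketch {

  static class BlockMinMax {
    final String blockId;
    final long min;
    final long max;

    BlockMinMax(String blockId, long min, long max) {
      this.blockId = blockId;
      this.min = min;
      this.max = max;
    }
  }

  // Keep only the blocks whose [min, max] range can contain the filter value;
  // every other block is skipped (pruned) without being read.
  static List<String> prune(List<BlockMinMax> index, long filterValue) {
    List<String> hits = new ArrayList<>();
    for (BlockMinMax entry : index) {
      if (filterValue >= entry.min && filterValue <= entry.max) {
        hits.add(entry.blockId);
      }
    }
    return hits;
  }
}

---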
Github user QACarbonData commented on the issue:
https://github.com/apache/carbondata/pull/1359 Build Success with Spark 1.6, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/32/ ---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/1359 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/153/ ---
Github user ravipesala commented on the issue:
https://github.com/apache/carbondata/pull/1359 SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/782/ ---
Github user ravipesala commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/1359#discussion_r139058897

--- Diff: core/src/main/java/org/apache/carbondata/core/datamap/dev/DataMapWriter.java ---

@@ -32,7 +32,12 @@
   /**
    * End of block notification
    */
-  void onBlockEnd(String blockId);
+  void onBlockEnd(String blockId, String directoryPath);
+
+  /**
+   * End of block notification when index got created.
+   */
+  void onBlockEndWithIndex(String blockId, String directoryPath);
--- End diff --

Why is this method required? Isn't `onBlockEnd` enough?

---
Github user ravipesala commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/1359#discussion_r139059342

--- Diff: core/src/main/java/org/apache/carbondata/core/datamap/dev/DataMap.java ---

@@ -31,7 +31,8 @@
   /**
    * It is called to load the data map to memory or to initialize it.
    */
-  void init(String filePath) throws MemoryException, IOException;
+  void init(String blockletIndexPath, String customIndexPath, String segmentId)
--- End diff --

The `filePath` is supposed to be either the index folder name or the index file name, so I don't think this extra information is required here. Also, `blockletIndexPath` should not be passed, since the carbonindex already exists in the other datamap and we are supposed to use that.

---
Github user ravipesala commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/1359#discussion_r139059518

--- Diff: examples/spark2/src/main/scala/org/apache/carbondata/examples/MinMaxDataMapFactory.java ---

@@ -0,0 +1,141 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.examples;
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+
+import org.apache.carbondata.core.cache.Cache;
+import org.apache.carbondata.core.cache.CacheProvider;
+import org.apache.carbondata.core.cache.CacheType;
+import org.apache.carbondata.core.datamap.DataMapDistributable;
+import org.apache.carbondata.core.datamap.DataMapMeta;
+import org.apache.carbondata.core.datamap.TableDataMap;
+import org.apache.carbondata.core.datamap.dev.DataMap;
+import org.apache.carbondata.core.datamap.dev.DataMapFactory;
+import org.apache.carbondata.core.datamap.dev.DataMapWriter;
+import org.apache.carbondata.core.datastore.filesystem.CarbonFile;
+import org.apache.carbondata.core.datastore.filesystem.CarbonFileFilter;
+import org.apache.carbondata.core.datastore.impl.FileFactory;
+import org.apache.carbondata.core.events.ChangeEvent;
+import org.apache.carbondata.core.indexstore.TableBlockIndexUniqueIdentifier;
+import org.apache.carbondata.core.indexstore.blockletindex.BlockletDataMap;
+import org.apache.carbondata.core.indexstore.schema.FilterType;
+import org.apache.carbondata.core.memory.MemoryException;
+import org.apache.carbondata.core.metadata.AbsoluteTableIdentifier;
+
+
+/**
+ * Table map for blocklet
+ */
+public class MinMaxDataMapFactory implements DataMapFactory {
+
+  private AbsoluteTableIdentifier identifier;
+
+  // segmentId -> list of index file
+  private Map<String, List<TableBlockIndexUniqueIdentifier>> segmentMap = new HashMap<>();
+
+  private Cache<TableBlockIndexUniqueIdentifier, DataMap> cache;
+
+  @Override
+  public void init(AbsoluteTableIdentifier identifier, String dataMapName) {
+    this.identifier = identifier;
+    cache = CacheProvider.getInstance()
--- End diff --

What is the use of this cache when it is not used anywhere?

---
Github user sounakr commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/1359#discussion_r139068734

--- Diff: core/src/main/java/org/apache/carbondata/core/datamap/dev/DataMapWriter.java ---

@@ -32,7 +32,12 @@
   /**
    * End of block notification
    */
-  void onBlockEnd(String blockId);
+  void onBlockEnd(String blockId, String directoryPath);
+
+  /**
+   * End of block notification when index got created.
+   */
+  void onBlockEndWithIndex(String blockId, String directoryPath);
--- End diff --

The onBlockEnd method is called once the block is written. onBlockEndWithIndex is called once the index has also been written, after the carbondata file is written out.

---
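If the interface stays as proposed, a writer might split its work across the two hooks roughly as in the sketch below. This is only a sketch of the flow described above: writeMinMaxFile is a hypothetical helper, and the remaining DataMapWriter callbacks are omitted.

// Rough sketch of the two hooks discussed above (hypothetical helper names;
// other DataMapWriter callbacks omitted).
public class MinMaxWriterSketch {

  // Called once the carbondata block is written; the carbonindex file for
  // this block does not exist yet, so only in-memory state is updated here.
  public void onBlockEnd(String blockId, String directoryPath) {
    // e.g. close the per-block min/max accumulators
  }

  // Called after the carbonindex file has also been written, so the writer
  // can read it here and then persist the custom min/max index next to it.
  public void onBlockEndWithIndex(String blockId, String directoryPath) {
    writeMinMaxFile(blockId, directoryPath);
  }

  private void writeMinMaxFile(String blockId, String directoryPath) {
    // Hypothetical: serialize the accumulated min/max values for this block
    // into a custom index file under directoryPath.
  }
}

---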
Github user sounakr commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/1359#discussion_r139092331

--- Diff: core/src/main/java/org/apache/carbondata/core/datamap/dev/DataMap.java ---

@@ -31,7 +31,8 @@
   /**
    * It is called to load the data map to memory or to initialize it.
    */
-  void init(String filePath) throws MemoryException, IOException;
+  void init(String blockletIndexPath, String customIndexPath, String segmentId)
--- End diff --

For Min Max index creation I also take input, such as segment properties and other information, from the regular carbonindex file. So, by design, one parameter can be the primitive index path and the other can be the path of the new custom index file.

---
Github user ravipesala commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/1359#discussion_r139122030

--- Diff: core/src/main/java/org/apache/carbondata/core/datamap/dev/DataMapWriter.java ---

@@ -32,7 +32,12 @@
   /**
    * End of block notification
    */
-  void onBlockEnd(String blockId);
+  void onBlockEnd(String blockId, String directoryPath);
+
+  /**
+   * End of block notification when index got created.
+   */
+  void onBlockEndWithIndex(String blockId, String directoryPath);
--- End diff --

I did not get the meaning of "index" here; it is supposed to be independent of other indexes. I think the onBlockEnd event is enough for writing the index file.

---
Github user ravipesala commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/1359#discussion_r139122092

--- Diff: core/src/main/java/org/apache/carbondata/core/datamap/dev/DataMap.java ---

@@ -31,7 +31,8 @@
   /**
    * It is called to load the data map to memory or to initialize it.
    */
-  void init(String filePath) throws MemoryException, IOException;
+  void init(String blockletIndexPath, String customIndexPath, String segmentId)
--- End diff --

It should be independent of other indexes.

---
Github user sounakr commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/1359#discussion_r139123481

--- Diff: examples/spark2/src/main/scala/org/apache/carbondata/examples/MinMaxDataMapFactory.java ---

@@ -0,0 +1,141 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.examples;
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+
+import org.apache.carbondata.core.cache.Cache;
+import org.apache.carbondata.core.cache.CacheProvider;
+import org.apache.carbondata.core.cache.CacheType;
+import org.apache.carbondata.core.datamap.DataMapDistributable;
+import org.apache.carbondata.core.datamap.DataMapMeta;
+import org.apache.carbondata.core.datamap.TableDataMap;
+import org.apache.carbondata.core.datamap.dev.DataMap;
+import org.apache.carbondata.core.datamap.dev.DataMapFactory;
+import org.apache.carbondata.core.datamap.dev.DataMapWriter;
+import org.apache.carbondata.core.datastore.filesystem.CarbonFile;
+import org.apache.carbondata.core.datastore.filesystem.CarbonFileFilter;
+import org.apache.carbondata.core.datastore.impl.FileFactory;
+import org.apache.carbondata.core.events.ChangeEvent;
+import org.apache.carbondata.core.indexstore.TableBlockIndexUniqueIdentifier;
+import org.apache.carbondata.core.indexstore.blockletindex.BlockletDataMap;
+import org.apache.carbondata.core.indexstore.schema.FilterType;
+import org.apache.carbondata.core.memory.MemoryException;
+import org.apache.carbondata.core.metadata.AbsoluteTableIdentifier;
+
+
+/**
+ * Table map for blocklet
+ */
+public class MinMaxDataMapFactory implements DataMapFactory {
+
+  private AbsoluteTableIdentifier identifier;
+
+  // segmentId -> list of index file
+  private Map<String, List<TableBlockIndexUniqueIdentifier>> segmentMap = new HashMap<>();
+
+  private Cache<TableBlockIndexUniqueIdentifier, DataMap> cache;
+
+  @Override
+  public void init(AbsoluteTableIdentifier identifier, String dataMapName) {
+    this.identifier = identifier;
+    cache = CacheProvider.getInstance()
--- End diff --

Removed.

---
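With the unused cache removed, the factory's init presumably reduces to something like the sketch below. The field names follow the quoted example, but this is not the final code from the PR, and the other DataMapFactory methods are omitted.

import java.util.HashMap;
import java.util.List;
import java.util.Map;

import org.apache.carbondata.core.indexstore.TableBlockIndexUniqueIdentifier;
import org.apache.carbondata.core.metadata.AbsoluteTableIdentifier;

// Sketch only: DataMapFactory methods other than init are omitted.
public class MinMaxDataMapFactorySketch {

  private AbsoluteTableIdentifier identifier;

  // segmentId -> list of index files for that segment
  private Map<String, List<TableBlockIndexUniqueIdentifier>> segmentMap = new HashMap<>();

  public void init(AbsoluteTableIdentifier identifier, String dataMapName) {
    // Only the table identifier is kept; the cache is no longer created here.
    this.identifier = identifier;
  }
}

---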
Github user sounakr commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/1359#discussion_r139123880

--- Diff: core/src/main/java/org/apache/carbondata/core/datamap/dev/DataMapWriter.java ---

@@ -32,7 +32,12 @@
   /**
    * End of block notification
    */
-  void onBlockEnd(String blockId);
+  void onBlockEnd(String blockId, String directoryPath);
+
+  /**
+   * End of block notification when index got created.
+   */
+  void onBlockEndWithIndex(String blockId, String directoryPath);
--- End diff --

But during onBlockEnd the carbonindex is not yet written, so we won't be able to access the carbonindex files. In the example I gather information from the carbonindex files too, so it is better to also keep a hook after the index files are written. In future we may need more hooks at different points.

---
Github user sounakr commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/1359#discussion_r139124564

--- Diff: core/src/main/java/org/apache/carbondata/core/datamap/dev/DataMap.java ---

@@ -31,7 +31,8 @@
   /**
    * It is called to load the data map to memory or to initialize it.
   */
-  void init(String filePath) throws MemoryException, IOException;
+  void init(String blockletIndexPath, String customIndexPath, String segmentId)
--- End diff --

In this example, along with the min and max information, I keep some additional information for building the blocklet. Both indexes are independent, but in the current example implementation I read the min/max index and then also read the carbonindex in order to get the column cardinality and segment properties. These values are used to form the blocklet used for pruning.

---
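To make the described flow concrete, here is a rough sketch of what such an init could look like under the proposed three-argument signature. The helper methods loadMinMaxIndex and loadSegmentProperties are hypothetical, not the PR's actual code.

import java.io.IOException;

import org.apache.carbondata.core.memory.MemoryException;

// Sketch only: the other DataMap methods (e.g. prune, clear) are omitted.
public class MinMaxDataMapSketch {

  public void init(String blockletIndexPath, String customIndexPath, String segmentId)
      throws MemoryException, IOException {
    // 1. Load the custom min/max entries written by the min/max DataMapWriter.
    loadMinMaxIndex(customIndexPath);
    // 2. Read the regular carbonindex file to obtain column cardinality and
    //    segment properties, which are needed to build the blocklets used for pruning.
    loadSegmentProperties(blockletIndexPath, segmentId);
  }

  private void loadMinMaxIndex(String customIndexPath) throws IOException {
    // Hypothetical: deserialize per-blocklet min/max values from the custom index file.
  }

  private void loadSegmentProperties(String blockletIndexPath, String segmentId)
      throws IOException {
    // Hypothetical: read the carbonindex file to derive column cardinality
    // and segment properties for this segment.
  }
}

---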
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/1359 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/171/ ---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/1359 Build Success with Spark 1.6, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/47/ ---
Github user ravipesala commented on the issue:
https://github.com/apache/carbondata/pull/1359 SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/801/ ---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/1359 Build Success with Spark 1.6, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/105/ ---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/1359 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/229/ ---
Github user ravipesala commented on the issue:
https://github.com/apache/carbondata/pull/1359 SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/860/ ---