[GitHub] carbondata pull request #958: [WIP] Added interfaces for index frame work.

qiuchenjian-2
GitHub user ravipesala opened a pull request:

    https://github.com/apache/carbondata/pull/958

    [WIP] Added interfaces for index frame work.

    All block and blocklet indexes will be moved to this interface. Future secondary indexes can use the same interfaces as well.

    Interfaces for index storage and retrieval implementations:
    1. The IndexStore class maintains all index tables that exist in CarbonData, so it can also work as an independent service.
    2. The IndexTable interface provides the facility to add and retrieve index data.
    3. All data added through the above interface is stored in unsafe off-heap or on-heap memory.
   
   


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/ravipesala/incubator-carbondata index-interface

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/carbondata/pull/958.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #958
   
----
commit 04253e5740edd7cfc051b0c1355232b6f2f43514
Author: ravipesala <[hidden email]>
Date:   2017-05-26T16:19:25Z

    Added interfaces for index.

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [hidden email] or file a JIRA ticket
with INFRA.
---

[GitHub] carbondata issue #958: [WIP] Added interfaces for index frame work.

qiuchenjian-2
Github user ravipesala commented on the issue:

    https://github.com/apache/carbondata/pull/958
 
    retest this please


---

[GitHub] carbondata pull request #958: [CARBONDATA-1088] Added interfaces for Data Ma...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user jackylk commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/958#discussion_r119279365
 
    --- Diff: core/src/main/java/org/apache/carbondata/core/datastore/block/TableBlockInfo.java ---
    @@ -307,4 +316,45 @@ public void setVersion(ColumnarFormatVersion version) {
       public void setBlockStorageIdMap(Map<String, String> blockStorageIdMap) {
         this.blockStorageIdMap = blockStorageIdMap;
       }
    +
    +  public byte[] getSerializedData() throws IOException {
    --- End diff --
   
    Why not implement `Serializable` instead?


---

[GitHub] carbondata pull request #958: [CARBONDATA-1088] Added interfaces for Data Ma...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user jackylk commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/958#discussion_r119282074
 
    --- Diff: core/src/main/java/org/apache/carbondata/core/indexstore/schema/DataMapSchemaType.java ---
    @@ -0,0 +1,24 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.carbondata.core.indexstore.schema;
    +
    +/**
    + * Index schema type.
    + */
    +public enum DataMapSchemaType {
    +  FIXED, VARIABLE;
    --- End diff --
   
    What do you mean by fixed and variable?


---

[GitHub] carbondata pull request #958: [CARBONDATA-1088] Added interfaces for Data Ma...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user jackylk commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/958#discussion_r119282180
 
    --- Diff: core/src/main/java/org/apache/carbondata/core/indexstore/DataMap.java ---
    @@ -0,0 +1,105 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.carbondata.core.indexstore;
    +
    +import java.util.Comparator;
    +import java.util.Iterator;
    +
    +import org.apache.carbondata.core.indexstore.row.DataMapKey;
    +import org.apache.carbondata.core.indexstore.row.DataMapRow;
    +import org.apache.carbondata.core.indexstore.schema.DataMapRowSchema;
    +
    +/**
    + * Interface for adding and retrieving index data.
    + */
    +public interface DataMap {
    --- End diff --
   
    Please remove the implementation from this PR and keep only the necessary interfaces, to make it easier to review and to refer to later. Thanks.


---

[GitHub] carbondata pull request #958: [CARBONDATA-1088] Added interfaces for Data Ma...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user jackylk commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/958#discussion_r119282288
 
    --- Diff: core/src/main/java/org/apache/carbondata/core/indexstore/DataMap.java ---
    @@ -0,0 +1,105 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.carbondata.core.indexstore;
    +
    +import java.util.Comparator;
    +import java.util.Iterator;
    +
    +import org.apache.carbondata.core.indexstore.row.DataMapKey;
    +import org.apache.carbondata.core.indexstore.row.DataMapRow;
    +import org.apache.carbondata.core.indexstore.schema.DataMapRowSchema;
    +
    +/**
    + * Interface for adding and retrieving index data.
    + */
    +public interface DataMap {
    +
    +  /**
    +   * Schema of the index key and its attributes. It should be called only once after creating
    +   * the instance.
    +   *
    +   * @param schema
    +   */
    +  void init(DataMapRowSchema schema);
    +
    +  /**
    +   * Add the index row to the in-memory store.
    +   *
    +   * @param row
    +   */
    +  void addIndex(DataMapRow row);
    +
    +  /**
    +   * Finish writing of index table, otherwise it will not be allowed to read.
    +   */
    +  void finishWriting();
    +
    +  /**
    +   * Retrieve the index row by using index key.
    +   *
    +   * @param key
    +   * @return IndexRow
    +   */
    +  DataMapRow getIndex(DataMapKey key, Comparator<DataMapKey> indexComparator);
    +
    +  /**
    +   * Retrieve the index row by using index number.
    +   *
    +   * @param index
    +   * @return IndexRow
    +   */
    +  DataMapRow getIndex(int index);
    +
    +  /**
    +   * Get the scan using start and end index key.
    +   * 1. if start key is null then scan starts from starting and end till endKey(including)
    +   * 2. if end key is null then scan starts at start key and end till last key.
    +   * 3. if both are null then scan starts at begining and end till last key.
    +   *
    +   * @param startKey
    +   * @param endKey
    +   * @return Iterator<IndexRow>
    +   */
    +  Iterator<DataMapRow> getIndexScan(DataMapKey startKey, DataMapKey endKey,
    +      Comparator<DataMapKey> indexComparator);
    +
    +  /**
    +   * Gets the total index row count in the store.
    +   *
    +   * @return
    +   */
    +  int getTotalCount();
    +
    +  /**
    +   * Clear complete index table and release memory.
    +   */
    +  void clear();
    +
    +  /**
    +   * Get the total size used by index table.
    +   *
    +   * @return
    +   */
    +  long getTotalSizeInBytesUsed();
    --- End diff --
   
    It seems this is not used.


---

[GitHub] carbondata pull request #958: [CARBONDATA-1088] Added interfaces for Data Ma...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user jackylk commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/958#discussion_r119282300
 
    --- Diff: core/src/main/java/org/apache/carbondata/core/indexstore/DataMap.java ---
    @@ -0,0 +1,105 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.carbondata.core.indexstore;
    +
    +import java.util.Comparator;
    +import java.util.Iterator;
    +
    +import org.apache.carbondata.core.indexstore.row.DataMapKey;
    +import org.apache.carbondata.core.indexstore.row.DataMapRow;
    +import org.apache.carbondata.core.indexstore.schema.DataMapRowSchema;
    +
    +/**
    + * Interface for adding and retrieving index data.
    + */
    +public interface DataMap {
    +
    +  /**
    +   * Schema of the index key and its attributes. It should be called only once after creating
    +   * the instance.
    +   *
    +   * @param schema
    +   */
    +  void init(DataMapRowSchema schema);
    +
    +  /**
    +   * Add the index row to the in-memory store.
    +   *
    +   * @param row
    +   */
    +  void addIndex(DataMapRow row);
    +
    +  /**
    +   * Finish writing of index table, otherwise it will not be allowed to read.
    +   */
    +  void finishWriting();
    +
    +  /**
    +   * Retrieve the index row by using index key.
    +   *
    +   * @param key
    +   * @return IndexRow
    +   */
    +  DataMapRow getIndex(DataMapKey key, Comparator<DataMapKey> indexComparator);
    +
    +  /**
    +   * Retrieve the index row by using index number.
    +   *
    +   * @param index
    +   * @return IndexRow
    +   */
    +  DataMapRow getIndex(int index);
    +
    +  /**
    +   * Get the scan using start and end index key.
    +   * 1. if start key is null then scan starts from starting and end till endKey(including)
    +   * 2. if end key is null then scan starts at start key and end till last key.
    +   * 3. if both are null then scan starts at begining and end till last key.
    +   *
    +   * @param startKey
    +   * @param endKey
    +   * @return Iterator<IndexRow>
    +   */
    +  Iterator<DataMapRow> getIndexScan(DataMapKey startKey, DataMapKey endKey,
    +      Comparator<DataMapKey> indexComparator);
    +
    +  /**
    +   * Gets the total index row count in the store.
    +   *
    +   * @return
    +   */
    +  int getTotalCount();
    +
    +  /**
    +   * Clear complete index table and release memory.
    +   */
    +  void clear();
    +
    +  /**
    +   * Get the total size used by index table.
    +   *
    +   * @return
    +   */
    +  long getTotalSizeInBytesUsed();
    +
    +  /**
    +   * Get the row schema which is used in index table
    +   * @return
    +   */
    +  DataMapRowSchema getIndexRowSchema();
    --- End diff --
   
    It seems this is not used.


---

[GitHub] carbondata pull request #958: [CARBONDATA-1088] Added interfaces for Data Ma...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user jackylk commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/958#discussion_r119288412
 
    --- Diff: core/src/main/java/org/apache/carbondata/core/indexstore/schema/IndexSchema.java ---
    @@ -0,0 +1,37 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.carbondata.core.indexstore.schema;
    +
    +/**
    + * It just have 2 types right now, either fixed or variable.
    + */
    +public interface IndexSchema {
    --- End diff --
   
    This seems unnecessary; keeping IndexSchemaType is enough.


---

[GitHub] carbondata pull request #958: [CARBONDATA-1088] Added interfaces for Data Ma...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user jackylk commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/958#discussion_r119289157
 
    --- Diff: core/src/main/java/org/apache/carbondata/core/datastore/impl/array/DataMapStoreFactory.java ---
    @@ -0,0 +1,93 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.carbondata.core.datastore.impl.array;
    +
    +import java.util.List;
    +
    +import org.apache.carbondata.core.constants.CarbonCommonConstants;
    +import org.apache.carbondata.core.datastore.BTreeBuilderInfo;
    +import org.apache.carbondata.core.datastore.DataRefNode;
    +import org.apache.carbondata.core.datastore.DataRefNodeFinder;
    +import org.apache.carbondata.core.datastore.impl.btree.BTreeDataRefNodeFinder;
    +import org.apache.carbondata.core.datastore.impl.btree.BlockBTreeBuilder;
    +import org.apache.carbondata.core.datastore.impl.btree.BlockletBTreeBuilder;
    +import org.apache.carbondata.core.indexstore.DataMap;
    +import org.apache.carbondata.core.indexstore.builder.BlockDMBuilder;
    +import org.apache.carbondata.core.metadata.blocklet.DataFileFooter;
    +import org.apache.carbondata.core.util.CarbonProperties;
    +
    +/**
    + * Factory for index builder and finder
    + */
    +public class DataMapStoreFactory {
    +
    +  public static DataRefNode buildDriverIndex(List<DataFileFooter> footerList, int[] minMaxLen) {
    --- End diff --
   
    Why not return `DataMap` directly? `DataMap` should be a driver-side entity used by `Pruner`.


---

[GitHub] carbondata pull request #958: [CARBONDATA-1088] Added interfaces for Data Ma...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user jackylk commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/958#discussion_r119290612
 
    --- Diff: core/src/main/java/org/apache/carbondata/core/indexstore/DataMap.java ---
    @@ -0,0 +1,105 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.carbondata.core.indexstore;
    +
    +import java.util.Comparator;
    +import java.util.Iterator;
    +
    +import org.apache.carbondata.core.indexstore.row.DataMapKey;
    +import org.apache.carbondata.core.indexstore.row.DataMapRow;
    +import org.apache.carbondata.core.indexstore.schema.DataMapRowSchema;
    +
    +/**
    + * Interface for adding and retrieving index data.
    + */
    +public interface DataMap {
    +
    +  /**
    +   * Schema of the index key and its attributes. It should be called only once after creating
    +   * the instance.
    +   *
    +   * @param schema
    +   */
    +  void init(DataMapRowSchema schema);
    +
    +  /**
    +   * Add the index row to the in-memory store.
    +   *
    +   * @param row
    +   */
    +  void addIndex(DataMapRow row);
    --- End diff --
   
    I feel it is better to separate the DataMap building interface from the other functional interfaces.
    You can refer to `IndexLoader` in master (not used), and I think that once we have a `DataMapBuilder` interface, `IndexLoader` can be removed.
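
One way to read the suggestion above is that writing and reading live on different types, so a reader can never observe a half-built index. The following is only a sketch under that assumption: `DataMapBuilderSketch`, `ReadableDataMapSketch`, and `ListDataMapBuilderSketch` are hypothetical names, not types from the PR, and a plain list stands in for the unsafe memory store.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical read-side interface: no mutating methods at all.
interface ReadableDataMapSketch<R> {
  R getIndex(int index);
  int getTotalCount();
}

// Hypothetical build-side interface: build() replaces finishWriting()
// and is the only way to obtain a readable map.
interface DataMapBuilderSketch<R> {
  DataMapBuilderSketch<R> addIndex(R row);
  ReadableDataMapSketch<R> build();
}

// Illustrative in-memory builder backed by a list.
final class ListDataMapBuilderSketch<R> implements DataMapBuilderSketch<R> {
  private final List<R> rows = new ArrayList<>();
  private boolean built;

  @Override
  public DataMapBuilderSketch<R> addIndex(R row) {
    if (built) {
      throw new IllegalStateException("builder already finished");
    }
    rows.add(row);
    return this;
  }

  @Override
  public ReadableDataMapSketch<R> build() {
    built = true;
    // Copy and freeze the rows so later builder calls cannot affect readers.
    final List<R> frozen = Collections.unmodifiableList(new ArrayList<>(rows));
    return new ReadableDataMapSketch<R>() {
      @Override
      public R getIndex(int index) {
        return frozen.get(index);
      }

      @Override
      public int getTotalCount() {
        return frozen.size();
      }
    };
  }
}
```

With this split, the "finish writing before reading" rule from the `DataMap` Javadoc is enforced by the types rather than by a runtime check on every read.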



---

[GitHub] carbondata pull request #958: [CARBONDATA-1088] Added interfaces for Data Ma...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user jackylk commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/958#discussion_r119291017
 
    --- Diff: core/src/main/java/org/apache/carbondata/core/indexstore/DataMap.java ---
    @@ -0,0 +1,105 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.carbondata.core.indexstore;
    +
    +import java.util.Comparator;
    +import java.util.Iterator;
    +
    +import org.apache.carbondata.core.indexstore.row.DataMapKey;
    +import org.apache.carbondata.core.indexstore.row.DataMapRow;
    +import org.apache.carbondata.core.indexstore.schema.DataMapRowSchema;
    +
    +/**
    + * Interface for adding and retrieving index data.
    + */
    +public interface DataMap {
    --- End diff --
   
    Let's try to keep this API minimal, and try to identify APIs that can have a default implementation in a class or abstract class. This will encourage more people to contribute DataMap implementations.
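
As an illustration of a default implementation, the range scan described in the `getIndexScan` Javadoc can be derived from positional access alone. This is a sketch, not the PR's code: `AbstractDataMapSketch`, `keyOf`, and `SortedIntDataMapSketch` are hypothetical, rows are assumed to be stored sorted by key, the start key is assumed inclusive, and results are collected eagerly for brevity.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;

// Hypothetical base class: implementors supply three small methods,
// and the scan logic comes for free.
abstract class AbstractDataMapSketch<K, R> {
  abstract R getIndex(int index);

  abstract int getTotalCount();

  // Hypothetical helper that extracts the key from a row.
  abstract K keyOf(R row);

  // Default scan, assuming rows are sorted by key:
  // a null startKey scans from the first row, a null endKey scans to the
  // last row, and endKey itself is included in the result.
  Iterator<R> getIndexScan(K startKey, K endKey, Comparator<K> cmp) {
    int from = (startKey == null) ? 0 : lowerBound(startKey, cmp);
    List<R> result = new ArrayList<>();
    for (int i = from; i < getTotalCount(); i++) {
      R row = getIndex(i);
      if (endKey != null && cmp.compare(keyOf(row), endKey) > 0) {
        break;
      }
      result.add(row);
    }
    return result.iterator();
  }

  // Binary search for the first position whose key is >= the given key.
  private int lowerBound(K key, Comparator<K> cmp) {
    int lo = 0;
    int hi = getTotalCount();
    while (lo < hi) {
      int mid = (lo + hi) >>> 1;
      if (cmp.compare(keyOf(getIndex(mid)), key) < 0) {
        lo = mid + 1;
      } else {
        hi = mid;
      }
    }
    return lo;
  }
}

// Minimal implementation: a sorted int array where each row is its own key.
final class SortedIntDataMapSketch extends AbstractDataMapSketch<Integer, Integer> {
  private final int[] sorted;

  SortedIntDataMapSketch(int... values) {
    sorted = values.clone();
    Arrays.sort(sorted);
  }

  @Override
  Integer getIndex(int index) {
    return sorted[index];
  }

  @Override
  int getTotalCount() {
    return sorted.length;
  }

  @Override
  Integer keyOf(Integer row) {
    return row;
  }
}
```

A contributor writing a new implementation would then only override the three abstract methods, which is the kind of lower entry barrier the comment asks for.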


---

[GitHub] carbondata pull request #958: [CARBONDATA-1088] Added interfaces for Data Ma...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user jackylk commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/958#discussion_r119293599
 
    --- Diff: core/src/main/java/org/apache/carbondata/core/indexstore/row/DMAttribute.java ---
    @@ -0,0 +1,30 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.carbondata.core.indexstore.row;
    +
    +/**
    + * Attribute of index key.
    + */
    +public interface DMAttribute<T> {
    --- End diff --
   
    It seems this represents the positionId that is indexed? Can we reuse `Distributable` for the same purpose? I guess we need some abstraction over blockId/blockletId/rowId.


---

[GitHub] carbondata pull request #958: [CARBONDATA-1088] Added interfaces for Data Ma...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ravipesala commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/958#discussion_r119527699
 
    --- Diff: core/src/main/java/org/apache/carbondata/core/datastore/block/TableBlockInfo.java ---
    @@ -307,4 +316,45 @@ public void setVersion(ColumnarFormatVersion version) {
       public void setBlockStorageIdMap(Map<String, String> blockStorageIdMap) {
         this.blockStorageIdMap = blockStorageIdMap;
       }
    +
    +  public byte[] getSerializedData() throws IOException {
    --- End diff --
   
    If we implement Serializable, we cannot control what is written and what is not; here we have complete control over how the data is written. I will create a new interface, CarbonWritable, similar to Hadoop's Writable, and move these methods there.
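
A rough sketch of what such an interface could look like follows. It is modelled on the two-method shape of Hadoop's `Writable` (`write(DataOutput)` and `readFields(DataInput)`); the `CarbonWritable` signature shown here is an assumption, and `BlockInfoSketch` with its fields is a hypothetical stand-in rather than the real `TableBlockInfo`.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Assumed shape of the proposed interface; not the PR's final API.
interface CarbonWritable {
  void write(DataOutput out) throws IOException;

  void readFields(DataInput in) throws IOException;
}

// Hypothetical block descriptor; field names are illustrative only.
final class BlockInfoSketch implements CarbonWritable {
  String filePath = "";
  long blockOffset;
  long blockLength;
  // Derived state that is deliberately left out of the serialized form.
  String cachedDisplayName;

  @Override
  public void write(DataOutput out) throws IOException {
    // The class decides exactly which fields are written, and in what order.
    out.writeUTF(filePath);
    out.writeLong(blockOffset);
    out.writeLong(blockLength);
  }

  @Override
  public void readFields(DataInput in) throws IOException {
    // Fields must be read back in the same order they were written.
    filePath = in.readUTF();
    blockOffset = in.readLong();
    blockLength = in.readLong();
    cachedDisplayName = null;
  }

  byte[] getSerializedData() {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (DataOutputStream out = new DataOutputStream(bytes)) {
      write(out);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return bytes.toByteArray();
  }

  static BlockInfoSketch fromSerializedData(byte[] data) {
    BlockInfoSketch info = new BlockInfoSketch();
    try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
      info.readFields(in);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return info;
  }
}
```

Compared with default Java serialization, this layout carries no class metadata and skips any field the class chooses not to write, which is the control the comment above refers to.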


---

[GitHub] carbondata pull request #958: [CARBONDATA-1088] Added interfaces for Data Ma...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ravipesala commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/958#discussion_r119527857
 
    --- Diff: core/src/main/java/org/apache/carbondata/core/indexstore/schema/DataMapSchemaType.java ---
    @@ -0,0 +1,24 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.carbondata.core.indexstore.schema;
    +
    +/**
    + * Index schema type.
    + */
    +public enum DataMapSchemaType {
    +  FIXED, VARIABLE;
    --- End diff --
   
    It depends on the data: if the data is of a primitive or dictionary type, the schema is fixed length; if the data is a string or any no-dictionary type, the schema is variable length.
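
The practical difference can be sketched as follows. Only the constant names `FIXED` and `VARIABLE` come from the PR; `SchemaTypeSketch`, `FieldLayoutSketch`, and the byte layouts (4-byte ints, a 2-byte length prefix per string) are assumptions chosen for illustration and say nothing about how the PR actually lays out memory.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical mirror of the PR's enum, for illustration only.
enum SchemaTypeSketch { FIXED, VARIABLE }

final class FieldLayoutSketch {
  // Rule from the comment above: primitive or dictionary-encoded data is
  // fixed length, everything else (strings, no-dictionary data) is variable.
  static SchemaTypeSketch typeFor(boolean primitive, boolean dictionaryEncoded) {
    return (primitive || dictionaryEncoded) ? SchemaTypeSketch.FIXED : SchemaTypeSketch.VARIABLE;
  }

  // FIXED: every value takes 4 bytes, so value i lives at offset i * 4.
  static byte[] encodeFixed(int[] values) {
    ByteBuffer buf = ByteBuffer.allocate(values.length * Integer.BYTES);
    for (int v : values) {
      buf.putInt(v);
    }
    return buf.array();
  }

  static int readFixed(byte[] data, int index) {
    return ByteBuffer.wrap(data).getInt(index * Integer.BYTES);
  }

  // VARIABLE: each value is stored as a 2-byte length followed by its bytes.
  // This sketch assumes every encoded value is shorter than 65536 bytes.
  static byte[] encodeVariable(String[] values) {
    byte[][] encoded = new byte[values.length][];
    int size = 0;
    for (int i = 0; i < values.length; i++) {
      encoded[i] = values[i].getBytes(StandardCharsets.UTF_8);
      size += Short.BYTES + encoded[i].length;
    }
    ByteBuffer buf = ByteBuffer.allocate(size);
    for (byte[] e : encoded) {
      buf.putShort((short) e.length);
      buf.put(e);
    }
    return buf.array();
  }

  // Without a separate offset table, reaching value i means skipping
  // over the i values stored before it.
  static String readVariable(byte[] data, int index) {
    ByteBuffer buf = ByteBuffer.wrap(data);
    for (int i = 0; i < index; i++) {
      int skip = buf.getShort() & 0xFFFF;
      buf.position(buf.position() + skip);
    }
    int len = buf.getShort() & 0xFFFF;
    byte[] out = new byte[len];
    buf.get(out);
    return new String(out, StandardCharsets.UTF_8);
  }
}
```

Fixed-length fields allow direct addressing by position, while variable-length fields need a length prefix or offset table, which is presumably why the schema has to record the distinction.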


---

[GitHub] carbondata pull request #958: [CARBONDATA-1088] Added interfaces for Data Ma...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ravipesala commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/958#discussion_r119527882
 
    --- Diff: core/src/main/java/org/apache/carbondata/core/indexstore/schema/DataMapSchemaType.java ---
    @@ -0,0 +1,24 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.carbondata.core.indexstore.schema;
    +
    +/**
    + * Index schema type.
    + */
    +public enum DataMapSchemaType {
    +  FIXED, VARIABLE;
    --- End diff --
   
    I will add this as a comment in the code.



[GitHub] carbondata pull request #958: [CARBONDATA-1088] Added interfaces for Data Ma...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ravipesala commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/958#discussion_r119527950
 
    --- Diff: core/src/main/java/org/apache/carbondata/core/indexstore/DataMap.java ---
    @@ -0,0 +1,105 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.carbondata.core.indexstore;
    +
    +import java.util.Comparator;
    +import java.util.Iterator;
    +
    +import org.apache.carbondata.core.indexstore.row.DataMapKey;
    +import org.apache.carbondata.core.indexstore.row.DataMapRow;
    +import org.apache.carbondata.core.indexstore.schema.DataMapRowSchema;
    +
    +/**
    + * Interface for adding and retrieving index data.
    + */
    +public interface DataMap {
    --- End diff --
   
    OK, I will remove the implementation from this PR and move it to a separate PR after this one is merged.
   




[GitHub] carbondata pull request #958: [CARBONDATA-1088] Added interfaces for Data Ma...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ravipesala commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/958#discussion_r119527990
 
    --- Diff: core/src/main/java/org/apache/carbondata/core/indexstore/DataMap.java ---
    @@ -0,0 +1,105 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.carbondata.core.indexstore;
    +
    +import java.util.Comparator;
    +import java.util.Iterator;
    +
    +import org.apache.carbondata.core.indexstore.row.DataMapKey;
    +import org.apache.carbondata.core.indexstore.row.DataMapRow;
    +import org.apache.carbondata.core.indexstore.schema.DataMapRowSchema;
    +
    +/**
    + * Interface for adding and retrieving index data.
    + */
    +public interface DataMap {
    --- End diff --
   
    And yes, I will try to add an abstract class and minimize the API.



[GitHub] carbondata pull request #958: [CARBONDATA-1088] Added interfaces for Data Ma...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ravipesala commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/958#discussion_r119528141
 
    --- Diff: core/src/main/java/org/apache/carbondata/core/indexstore/DataMap.java ---
    @@ -0,0 +1,105 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.carbondata.core.indexstore;
    +
    +import java.util.Comparator;
    +import java.util.Iterator;
    +
    +import org.apache.carbondata.core.indexstore.row.DataMapKey;
    +import org.apache.carbondata.core.indexstore.row.DataMapRow;
    +import org.apache.carbondata.core.indexstore.schema.DataMapRowSchema;
    +
    +/**
    + * Interface for adding and retrieving index data.
    + */
    +public interface DataMap {
    +
    +  /**
    +   * Schema of the index key and its attributes. It should be called only once after creating
    +   * the instance.
    +   *
    +   * @param schema
    +   */
    +  void init(DataMapRowSchema schema);
    +
    +  /**
    +   * Add the index row to the in-memory store.
    +   *
    +   * @param row
    +   */
    +  void addIndex(DataMapRow row);
    +
    +  /**
    +   * Finish writing of index table, otherwise it will not be allowed to read.
    +   */
    +  void finishWriting();
    +
    +  /**
    +   * Retrieve the index row by using index key.
    +   *
    +   * @param key
    +   * @return IndexRow
    +   */
    +  DataMapRow getIndex(DataMapKey key, Comparator<DataMapKey> indexComparator);
    +
    +  /**
    +   * Retrieve the index row by using index number.
    +   *
    +   * @param index
    +   * @return IndexRow
    +   */
    +  DataMapRow getIndex(int index);
    +
    +  /**
    +   * Get the scan using start and end index key.
    +   * 1. if start key is null then scan starts from starting and end till endKey(including)
    +   * 2. if end key is null then scan starts at start key and end till last key.
    +   * 3. if both are null then scan starts at begining and end till last key.
    +   *
    +   * @param startKey
    +   * @param endKey
    +   * @return Iterator<IndexRow>
    +   */
    +  Iterator<DataMapRow> getIndexScan(DataMapKey startKey, DataMapKey endKey,
    +      Comparator<DataMapKey> indexComparator);
    +
    +  /**
    +   * Gets the total index row count in the store.
    +   *
    +   * @return
    +   */
    +  int getTotalCount();
    +
    +  /**
    +   * Clear complete index table and release memory.
    +   */
    +  void clear();
    +
    +  /**
    +   * Get the total size used by index table.
    +   *
    +   * @return
    +   */
    +  long getTotalSizeInBytesUsed();
    --- End diff --
   
    The LRU cache requires it in order to know how much memory this DataMap occupies. I think it is better to move the LRU cache inside DataMap, so I will remove it.
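
    To illustrate why an LRU cache would need a size accessor such as `getTotalSizeInBytesUsed()`, here is a minimal sketch of a byte-bounded LRU cache. It is an assumption-laden illustration, not the CarbonData cache: `SizeBoundedLruSketch` and `Sized` are hypothetical names, and the eviction policy (drop least recently used entries until the byte budget is met, never dropping the entry just inserted) is chosen only for the example.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

public class SizeBoundedLruSketch {

  /** Hypothetical stand-in for a DataMap that can report its memory footprint. */
  interface Sized {
    long getTotalSizeInBytesUsed();
  }

  private final long maxBytes;
  private long usedBytes;
  // accessOrder = true makes iteration run from least to most recently used.
  private final LinkedHashMap<String, Sized> entries = new LinkedHashMap<>(16, 0.75f, true);

  SizeBoundedLruSketch(long maxBytes) {
    this.maxBytes = maxBytes;
  }

  void put(String name, Sized value) {
    Sized old = entries.remove(name);
    if (old != null) {
      usedBytes -= old.getTotalSizeInBytesUsed();
    }
    entries.put(name, value);
    usedBytes += value.getTotalSizeInBytesUsed();
    // Evict least recently used entries until the budget is met,
    // always keeping the entry that was just inserted.
    Iterator<Map.Entry<String, Sized>> it = entries.entrySet().iterator();
    while (usedBytes > maxBytes && entries.size() > 1) {
      Map.Entry<String, Sized> eldest = it.next();
      usedBytes -= eldest.getValue().getTotalSizeInBytesUsed();
      it.remove();
    }
  }

  // A lookup counts as a use and moves the entry to the most recent position.
  Sized get(String name) {
    return entries.get(name);
  }

  // Membership check that does not affect the usage order.
  boolean contains(String name) {
    return entries.containsKey(name);
  }

  long usedBytes() {
    return usedBytes;
  }
}
```

    Without a per-entry size, the cache could only bound the number of entries, which says little about memory when index tables vary widely in size.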



[GitHub] carbondata pull request #958: [CARBONDATA-1088] Added interfaces for Data Ma...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ravipesala commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/958#discussion_r119528159
 
    --- Diff: core/src/main/java/org/apache/carbondata/core/indexstore/DataMap.java ---
    @@ -0,0 +1,105 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.carbondata.core.indexstore;
    +
    +import java.util.Comparator;
    +import java.util.Iterator;
    +
    +import org.apache.carbondata.core.indexstore.row.DataMapKey;
    +import org.apache.carbondata.core.indexstore.row.DataMapRow;
    +import org.apache.carbondata.core.indexstore.schema.DataMapRowSchema;
    +
    +/**
    + * Interface for adding and retrieving index data.
    + */
    +public interface DataMap {
    +
    +  /**
    +   * Schema of the index key and its attributes. It should be called only once after creating
    +   * the instance.
    +   *
    +   * @param schema
    +   */
    +  void init(DataMapRowSchema schema);
    +
    +  /**
    +   * Add the index row to the in-memory store.
    +   *
    +   * @param row
    +   */
    +  void addIndex(DataMapRow row);
    +
    +  /**
    +   * Finish writing of index table, otherwise it will not be allowed to read.
    +   */
    +  void finishWriting();
    +
    +  /**
    +   * Retrieve the index row by using index key.
    +   *
    +   * @param key
    +   * @return IndexRow
    +   */
    +  DataMapRow getIndex(DataMapKey key, Comparator<DataMapKey> indexComparator);
    +
    +  /**
    +   * Retrieve the index row by using index number.
    +   *
    +   * @param index
    +   * @return IndexRow
    +   */
    +  DataMapRow getIndex(int index);
    +
    +  /**
    +   * Get the scan using start and end index key.
    +   * 1. if start key is null then scan starts from starting and end till endKey(including)
    +   * 2. if end key is null then scan starts at start key and end till last key.
    +   * 3. if both are null then scan starts at begining and end till last key.
    +   *
    +   * @param startKey
    +   * @param endKey
    +   * @return Iterator<IndexRow>
    +   */
    +  Iterator<DataMapRow> getIndexScan(DataMapKey startKey, DataMapKey endKey,
    +      Comparator<DataMapKey> indexComparator);
    +
    +  /**
    +   * Gets the total index row count in the store.
    +   *
    +   * @return
    +   */
    +  int getTotalCount();
    +
    +  /**
    +   * Clear complete index table and release memory.
    +   */
    +  void clear();
    +
    +  /**
    +   * Get the total size used by index table.
    +   *
    +   * @return
    +   */
    +  long getTotalSizeInBytesUsed();
    +
    +  /**
    +   * Get the row schema which is used in index table
    +   * @return
    +   */
    +  DataMapRowSchema getIndexRowSchema();
    --- End diff --
   
    OK, I will remove it.
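
    For reference, the null-key rules documented on `getIndexScan` in the diff quoted above can be sketched as below. This is a simplified illustration over a sorted list of keys rather than the PR's row store; `IndexScanSketch`, `scan` and `scanToList` are hypothetical names, and treating the start key as inclusive is an assumption (the Javadoc only states that the end key is included).

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;

public class IndexScanSketch {

  // Collects the keys in [startKey, endKey] from a list that is already
  // sorted according to cmp. A null bound means "unbounded on that side".
  static <K> List<K> scanToList(List<K> sortedKeys, K startKey, K endKey, Comparator<K> cmp) {
    List<K> result = new ArrayList<>();
    for (K key : sortedKeys) {
      if (startKey != null && cmp.compare(key, startKey) < 0) {
        continue; // still before the start of the range
      }
      if (endKey != null && cmp.compare(key, endKey) > 0) {
        break; // past the end of the range; the list is sorted, so stop
      }
      result.add(key);
    }
    return result;
  }

  // Iterator-returning variant, matching the shape of getIndexScan.
  static <K> Iterator<K> scan(List<K> sortedKeys, K startKey, K endKey, Comparator<K> cmp) {
    return scanToList(sortedKeys, startKey, endKey, cmp).iterator();
  }
}
```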



[GitHub] carbondata pull request #958: [CARBONDATA-1088] Added interfaces for Data Ma...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ravipesala commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/958#discussion_r119528455
 
    --- Diff: core/src/main/java/org/apache/carbondata/core/indexstore/schema/IndexSchema.java ---
    @@ -0,0 +1,37 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.carbondata.core.indexstore.schema;
    +
    +/**
    + * It just have 2 types right now, either fixed or variable.
    + */
    +public interface IndexSchema {
    --- End diff --
   
    Actually, it is needed: for the FixedSchema type we require the length from the user.
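
    The point about the fixed schema needing a length can be illustrated with the sketch below. The class names, the `getLength()` accessor and the 2-byte length prefix for variable-length values are all assumptions made for the example; the PR's actual `IndexSchema` implementations are not shown in this thread.

```java
public class IndexSchemaSketch {

  enum DataMapSchemaType { FIXED, VARIABLE }

  // Hypothetical shape of the schema interface under discussion.
  interface IndexSchema {
    DataMapSchemaType getSchemaType();
  }

  // A fixed schema must carry the user-supplied length, which is why a
  // plain enum value alone is not enough to describe it.
  static final class FixedIndexSchema implements IndexSchema {
    private final int length;

    FixedIndexSchema(int length) {
      if (length <= 0) {
        throw new IllegalArgumentException("length must be positive");
      }
      this.length = length;
    }

    int getLength() {
      return length;
    }

    @Override
    public DataMapSchemaType getSchemaType() {
      return DataMapSchemaType.FIXED;
    }
  }

  static final class VariableIndexSchema implements IndexSchema {
    @Override
    public DataMapSchemaType getSchemaType() {
      return DataMapSchemaType.VARIABLE;
    }
  }

  // Bytes needed to store one value: the declared length for FIXED, or an
  // assumed 2-byte length prefix plus the payload for VARIABLE.
  static int storedSize(IndexSchema schema, byte[] value) {
    if (schema instanceof FixedIndexSchema) {
      return ((FixedIndexSchema) schema).getLength();
    }
    return 2 + value.length;
  }
}
```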


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [hidden email] or file a JIRA ticket
with INFRA.
---
123