[GitHub] carbondata pull request #1238: [CARBONDATA-1363] Add DataMapWriter interface

classic Classic list List threaded Threaded
32 messages Options
12
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #1238: [CARBONDATA-1363] Add DataMapWriter interface

qiuchenjian-2
GitHub user jackylk opened a pull request:

    https://github.com/apache/carbondata/pull/1238

    [CARBONDATA-1363] Add DataMapWriter interface

   

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/jackylk/incubator-carbondata datamap

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/carbondata/pull/1238.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1238
   
----
commit e59f29cf82f4aca518ba468d80eaa5788fbfc3a0
Author: Jacky Li <[hidden email]>
Date:   2017-08-05T16:38:45Z

    add DataMapWriter interface

commit 3ca14b0099b2285512653d7489aee07a7b76f1d3
Author: Jacky Li <[hidden email]>
Date:   2017-08-05T17:19:18Z

    fix style

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [hidden email] or file a JIRA ticket
with INFRA.
---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #1238: [CARBONDATA-1363] Add DataMapWriter interface

qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1238
 
    Build Success with Spark 1.6, Please check CI http://144.76.159.231:8080/job/ApacheCarbonPRBuilder/791/



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [hidden email] or file a JIRA ticket
with INFRA.
---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #1238: [CARBONDATA-1363] Add DataMapWriter interface

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1238
 
    SDV Build Success with Spark 2.1, Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/118/



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [hidden email] or file a JIRA ticket
with INFRA.
---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #1238: [CARBONDATA-1363] Add DataMapWriter interface

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1238
 
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/3388/



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [hidden email] or file a JIRA ticket
with INFRA.
---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #1238: [CARBONDATA-1363] Add DataMapWriter interface

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1238
 
    SDV Build Success with Spark 2.1, Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/119/



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [hidden email] or file a JIRA ticket
with INFRA.
---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #1238: [CARBONDATA-1363] Add DataMapWriter interface

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1238
 
    Build Failed  with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/3389/



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [hidden email] or file a JIRA ticket
with INFRA.
---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #1238: [CARBONDATA-1363] Add DataMapWriter interface

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1238
 
    Build Success with Spark 1.6, Please check CI http://144.76.159.231:8080/job/ApacheCarbonPRBuilder/792/



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [hidden email] or file a JIRA ticket
with INFRA.
---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #1238: [CARBONDATA-1363] Add DataMapWriter interface

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1238
 
    SDV Build Success with Spark 2.1, Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/120/



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [hidden email] or file a JIRA ticket
with INFRA.
---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #1238: [CARBONDATA-1363] Add DataMapWriter interface

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1238
 
    Build Success with Spark 1.6, Please check CI http://144.76.159.231:8080/job/ApacheCarbonPRBuilder/793/



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [hidden email] or file a JIRA ticket
with INFRA.
---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #1238: [CARBONDATA-1363] Add DataMapWriter interface

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1238
 
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/3390/



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [hidden email] or file a JIRA ticket
with INFRA.
---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #1238: [CARBONDATA-1363] Add DataMapWriter interface

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ravipesala commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1238#discussion_r131574410
 
    --- Diff: core/src/main/java/org/apache/carbondata/core/datamap/OperationType.java ---
    @@ -0,0 +1,25 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.carbondata.core.datamap;
    +
    +public enum OperationType {
    --- End diff --
   
    Already a class `FilterType` why can't it is used?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [hidden email] or file a JIRA ticket
with INFRA.
---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #1238: [CARBONDATA-1363] Add DataMapWriter interface

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ravipesala commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1238#discussion_r131574534
 
    --- Diff: core/src/main/java/org/apache/carbondata/core/datamap/DataMapMeta.java ---
    @@ -14,37 +14,25 @@
      * See the License for the specific language governing permissions and
      * limitations under the License.
      */
    -package org.apache.carbondata.core.indexstore;
     
    -import java.io.DataOutput;
    +package org.apache.carbondata.core.datamap;
     
    -/**
    - * Data Map writer
    - */
    -public interface DataMapWriter<T> {
    +public class DataMapMeta {
     
    -  /**
    -   * Initialize the data map writer with output stream
    -   *
    -   * @param outStream
    -   */
    -  void init(DataOutput outStream);
    +  public DataMapMeta(String indexedColumn, OperationType optimizedOperation) {
    --- End diff --
   
    I think it should be array of columns, user can create composite index as well


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [hidden email] or file a JIRA ticket
with INFRA.
---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #1238: [CARBONDATA-1363] Add DataMapWriter interface

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ravipesala commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1238#discussion_r131575067
 
    --- Diff: core/src/main/java/org/apache/carbondata/core/datamap/dev/DataMap.java ---
    @@ -26,13 +28,6 @@
     public interface DataMap {
     
       /**
    --- End diff --
   
    I think DataMapWriter should belong to DataMap, user should get writer from here. No need of seperate register for DataMapWriter as already DataMapStoreManager manages it


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [hidden email] or file a JIRA ticket
with INFRA.
---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #1238: [CARBONDATA-1363] Add DataMapWriter interface

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ravipesala commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1238#discussion_r131575443
 
    --- Diff: processing/src/main/java/org/apache/carbondata/processing/datamap/DataMapWriterListener.java ---
    @@ -0,0 +1,109 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.carbondata.processing.datamap;
    +
    +import java.util.ArrayList;
    +import java.util.List;
    +import java.util.Map;
    +import java.util.Set;
    +import java.util.concurrent.ConcurrentHashMap;
    +
    +import org.apache.carbondata.common.logging.LogService;
    +import org.apache.carbondata.common.logging.LogServiceFactory;
    +import org.apache.carbondata.core.datamap.DataMapMeta;
    +import org.apache.carbondata.core.datamap.dev.DataMapWriter;
    +import org.apache.carbondata.core.datastore.page.ColumnPage;
    +import org.apache.carbondata.processing.store.TablePage;
    +
    +/**
    + * It is for writing DataMap for one table
    + */
    +public class DataMapWriterListener {
    +
    +  private static final LogService LOG = LogServiceFactory.getLogService(
    +      DataMapWriterListener.class.getCanonicalName());
    +
    +  // indexed column name -> list of data map writer
    +  private static Map<String, List<DataMapWriter>> registry = new ConcurrentHashMap<>();
    +
    +  /**
    +   * Register a DataMapWriter
    +   */
    +  public void register(DataMapWriter writer) {
    +    DataMapMeta meta = writer.getMeta();
    +    List<DataMapWriter> writers = registry.get(meta.getIndexedColumn());
    +    if (writers != null) {
    +      writers.add(writer);
    +    } else {
    +      writers = new ArrayList<>();
    +      writers.add(writer);
    +      registry.put(meta.getIndexedColumn(), writers);
    +    }
    +    LOG.info("DataMapWriter " + writer + " added");
    +  }
    +
    +  public void onBlockStart(String blockId) {
    +    for (List<DataMapWriter> writers : registry.values()) {
    +      for (DataMapWriter writer : writers) {
    +        writer.onBlockStart(blockId);
    +      }
    +    }
    +  }
    +
    +  public void onBlockEnd(String blockId) {
    +    for (List<DataMapWriter> writers : registry.values()) {
    +      for (DataMapWriter writer : writers) {
    +        writer.onBlockEnd(blockId);
    +      }
    +    }
    +  }
    +
    +  public void onBlockletStart(int blockletId) {
    +    for (List<DataMapWriter> writers : registry.values()) {
    +      for (DataMapWriter writer : writers) {
    +        writer.onBlockletStart(blockletId);
    +      }
    +    }
    +  }
    +
    +  public void onBlockletEnd(int blockletId) {
    +    for (List<DataMapWriter> writers : registry.values()) {
    +      for (DataMapWriter writer : writers) {
    +        writer.onBlockletEnd(blockletId);
    +      }
    +    }
    +  }
    +
    +  /**
    +   * Pick corresponding column page and add to all registered datamap
    +   *
    +   * @param pageId     sequence number of page, start from 0
    +   * @param tablePage  page data
    +   */
    +  public void add(int blockletId, int pageId, TablePage tablePage) {
    --- End diff --
   
    I think listener should be only for sending events, not for adding rows. So better remove from here


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [hidden email] or file a JIRA ticket
with INFRA.
---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #1238: [CARBONDATA-1363] Add DataMapWriter interface

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ravipesala commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1238#discussion_r131575505
 
    --- Diff: processing/src/main/java/org/apache/carbondata/processing/datamap/DummyDataMapWriter.java ---
    @@ -0,0 +1,67 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.carbondata.processing.datamap;
    +
    +import org.apache.carbondata.common.logging.LogService;
    +import org.apache.carbondata.common.logging.LogServiceFactory;
    +import org.apache.carbondata.core.datamap.DataMapMeta;
    +import org.apache.carbondata.core.datamap.OperationType;
    +import org.apache.carbondata.core.datamap.dev.DataMapWriter;
    +import org.apache.carbondata.core.datastore.page.ColumnPage;
    +
    +public class DummyDataMapWriter implements DataMapWriter {
    --- End diff --
   
    Please remove if not used


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [hidden email] or file a JIRA ticket
with INFRA.
---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #1238: [CARBONDATA-1363] Add DataMapWriter interface

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ravipesala commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1238#discussion_r131575663
 
    --- Diff: core/src/main/java/org/apache/carbondata/core/datamap/dev/DataMapWriter.java ---
    @@ -0,0 +1,58 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.carbondata.core.datamap.dev;
    +
    +import org.apache.carbondata.core.datamap.DataMapMeta;
    +import org.apache.carbondata.core.datastore.page.ColumnPage;
    +
    +/**
    + * Data Map writer
    + */
    +public interface DataMapWriter {
    +
    +  /**
    +   *  Start of new block notification.
    +   *  @param blockId file name of the carbondata file
    +   */
    +  void onBlockStart(String blockId);
    +
    +  /**
    +   * End of block notification
    +   */
    +  void onBlockEnd(String blockId);
    +
    +  /**
    +   * Start of new blocklet notification.
    +   * @param blockletId sequence number of blocklet in the block
    +   */
    +  void onBlockletStart(int blockletId);
    +
    +  /**
    +   * End of blocklet notification
    +   * @param blockletId sequence number of blocklet in the block
    +   */
    +  void onBlockletEnd(int blockletId);
    +
    +  /**
    +   * Add the column page row to the datamap
    +   * The order of pageId is not guranteed
    +   */
    +  void add(int blockletId, int pageId, ColumnPage page);
    +
    +  DataMapMeta getMeta();
    --- End diff --
   
    I think DataMapMeta should have belonged to DataMap/DataMapFactory not writer


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [hidden email] or file a JIRA ticket
with INFRA.
---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #1238: [CARBONDATA-1363] Add DataMapWriter interface

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ravipesala commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1238#discussion_r131575712
 
    --- Diff: processing/src/main/java/org/apache/carbondata/processing/datamap/TableDataMapWriterRegistry.java ---
    @@ -0,0 +1,63 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.carbondata.processing.datamap;
    +
    +import java.util.Map;
    +import java.util.concurrent.ConcurrentHashMap;
    +
    +import org.apache.carbondata.core.datamap.dev.DataMapWriter;
    +import org.apache.carbondata.core.metadata.AbsoluteTableIdentifier;
    +import org.apache.carbondata.core.util.CarbonUtil;
    +
    +/**
    + * Registry of DataMapWriter in all tables
    + */
    +public class TableDataMapWriterRegistry {
    --- End diff --
   
    It should not be necessary as DataMapStoreManager should have all information


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [hidden email] or file a JIRA ticket
with INFRA.
---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #1238: [CARBONDATA-1363] Add DataMapWriter interface

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ravipesala commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1238#discussion_r131579700
 
    --- Diff: processing/src/main/java/org/apache/carbondata/processing/store/writer/v3/CarbonFactDataWriterImplV3.java ---
    @@ -100,88 +104,117 @@ public CarbonFactDataWriterImplV3(CarbonDataWriterVo dataWriterVo) {
       }
     
       /**
    -   * Below method will be used to write one table page data
    +   * Below method will be used to write one table page data, invoked by Consumer
    +   * @param tablePage
        */
    -  @Override public void writeTablePage(EncodedTablePage encodedTablePage)
    +  @Override public void writeTablePage(TablePage tablePage)
           throws CarbonDataWriterException {
         // condition for writting all the pages
    -    if (!encodedTablePage.isLastPage()) {
    +    if (!tablePage.isLastPage()) {
           boolean isAdded = false;
           // check if size more than blocklet size then write the page to file
    -      if (dataWriterHolder.getSize() + encodedTablePage.getEncodedSize() >= blockletSize) {
    -        // if one page size is more than blocklet size
    -        if (dataWriterHolder.getEncodedTablePages().size() == 0) {
    +      if (blockletDataHolder.getSize() + tablePage.getEncodedTablePage().getEncodedSize() >=
    +          blockletSizeThreshold) {
    +        // if blocklet size exceeds threshold, write blocklet data
    +        if (blockletDataHolder.getEncodedTablePages().size() == 0) {
               isAdded = true;
    -          dataWriterHolder.addPage(encodedTablePage);
    +          blockletDataHolder.addPage(tablePage);
             }
     
    -        LOGGER.info("Number of Pages for blocklet is: " + dataWriterHolder.getNumberOfPagesAdded()
    -            + " :Rows Added: " + dataWriterHolder.getTotalRows());
    +        LOGGER.info("Number of Pages for blocklet is: " + blockletDataHolder.getNumberOfPagesAdded()
    +            + " :Rows Added: " + blockletDataHolder.getTotalRows());
    +
             // write the data
             writeBlockletToFile();
    +
           }
           if (!isAdded) {
    -        dataWriterHolder.addPage(encodedTablePage);
    +        blockletDataHolder.addPage(tablePage);
           }
         } else {
           //for last blocklet check if the last page will exceed the blocklet size then write
           // existing pages and then last page
    -      if (encodedTablePage.getPageSize() > 0) {
    -        dataWriterHolder.addPage(encodedTablePage);
    +
    +      if (tablePage.getPageSize() > 0) {
    +        blockletDataHolder.addPage(tablePage);
           }
    -      if (dataWriterHolder.getNumberOfPagesAdded() > 0) {
    -        LOGGER.info("Number of Pages for blocklet is: " + dataWriterHolder.getNumberOfPagesAdded()
    -            + " :Rows Added: " + dataWriterHolder.getTotalRows());
    +      if (blockletDataHolder.getNumberOfPagesAdded() > 0) {
    +        LOGGER.info("Number of Pages for blocklet is: " + blockletDataHolder.getNumberOfPagesAdded()
    +            + " :Rows Added: " + blockletDataHolder.getTotalRows());
             writeBlockletToFile();
           }
         }
       }
     
       /**
    -   * Write one blocklet data to file
    +   * Write the collect blocklet data (blockletDataHolder) to file
        */
       private void writeBlockletToFile() {
         // get the list of all encoded table page
    -    List<EncodedTablePage> encodedTablePageList = dataWriterHolder.getEncodedTablePages();
    +    List<EncodedTablePage> encodedTablePageList = blockletDataHolder.getEncodedTablePages();
         int numDimensions = encodedTablePageList.get(0).getNumDimensions();
         int numMeasures = encodedTablePageList.get(0).getNumMeasures();
    -    long blockletDataSize = 0;
         // get data chunks for all the column
         byte[][] dataChunkBytes = new byte[numDimensions + numMeasures][];
    +    long metadataSize = fillDataChunk(encodedTablePageList, dataChunkBytes);
    +    // calculate the total size of data to be written
    +    long blockletSize = blockletDataHolder.getSize() + metadataSize;
    +    // to check if data size will exceed the block size then create a new file
    +    createNewFileIfReachThreshold(blockletSize);
    +    notifyDataMapBlockletCommit();
    +
    +    // write data to file
    +    try {
    +      if (fileChannel.size() == 0) {
    +        // write the header if file is empty
    +        writeHeaderToFile(fileChannel);
    +      }
    +      writeBlockletToFile(fileChannel, dataChunkBytes);
    +    } catch (IOException e) {
    +      throw new CarbonDataWriterException("Problem when writing file", e);
    +    }
    +    // clear the data holder
    +    blockletDataHolder.clear();
    +  }
    +
    +  private void notifyDataMapBlockletCommit() {
    --- End diff --
   
    I think we should not wait till all pages are collected as it will increase the memory and also slows down. Better add tablepage to datamapwriter as soon as Consumer calls `writeTablePage` method.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [hidden email] or file a JIRA ticket
with INFRA.
---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #1238: [CARBONDATA-1363] Add DataMapWriter interface

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user jackylk commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1238#discussion_r131607087
 
    --- Diff: core/src/main/java/org/apache/carbondata/core/datamap/OperationType.java ---
    @@ -0,0 +1,25 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.carbondata.core.datamap;
    +
    +public enum OperationType {
    --- End diff --
   
    ok, I will remove


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [hidden email] or file a JIRA ticket
with INFRA.
---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #1238: [CARBONDATA-1363] Add DataMapWriter interface

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user jackylk commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1238#discussion_r131607244
 
    --- Diff: core/src/main/java/org/apache/carbondata/core/datamap/DataMapMeta.java ---
    @@ -14,37 +14,25 @@
      * See the License for the specific language governing permissions and
      * limitations under the License.
      */
    -package org.apache.carbondata.core.indexstore;
     
    -import java.io.DataOutput;
    +package org.apache.carbondata.core.datamap;
     
    -/**
    - * Data Map writer
    - */
    -public interface DataMapWriter<T> {
    +public class DataMapMeta {
     
    -  /**
    -   * Initialize the data map writer with output stream
    -   *
    -   * @param outStream
    -   */
    -  void init(DataOutput outStream);
    +  public DataMapMeta(String indexedColumn, OperationType optimizedOperation) {
    --- End diff --
   
    I prefer to keep it simple in this PR. We can do it later


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [hidden email] or file a JIRA ticket
with INFRA.
---
12