[GitHub] carbondata pull request #2402: [CARBONDATA-2587][CARBONDATA-2588] Local Dict...

[GitHub] carbondata issue #2402: [CARBONDATA-2587][CARBONDATA-2588] Local Dictionary ...

qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2402
 
    Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/5325/



---

[GitHub] carbondata issue #2402: [CARBONDATA-2587][CARBONDATA-2588] Local Dictionary ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2402
 
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/6494/



---

[GitHub] carbondata issue #2402: [CARBONDATA-2587][CARBONDATA-2588] Local Dictionary ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ravipesala commented on the issue:

    https://github.com/apache/carbondata/pull/2402
 
    SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/5415/



---

[GitHub] carbondata issue #2402: [CARBONDATA-2587][CARBONDATA-2588] Local Dictionary ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user kumarvishal09 commented on the issue:

    https://github.com/apache/carbondata/pull/2402
 
    retest this please


---

[GitHub] carbondata issue #2402: [CARBONDATA-2587][CARBONDATA-2588] Local Dictionary ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ravipesala commented on the issue:

    https://github.com/apache/carbondata/pull/2402
 
    SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/5416/



---

[GitHub] carbondata issue #2402: [CARBONDATA-2587][CARBONDATA-2588] Local Dictionary ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2402
 
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/6497/



---

[GitHub] carbondata issue #2402: [CARBONDATA-2587][CARBONDATA-2588] Local Dictionary ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2402
 
    Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/5328/



---

[GitHub] carbondata pull request #2402: [CARBONDATA-2587][CARBONDATA-2588] Local Dict...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ravipesala commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2402#discussion_r197803602
 
    --- Diff: core/src/main/java/org/apache/carbondata/core/datastore/blocklet/BlockletEncodedColumnPage.java ---
    @@ -0,0 +1,187 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.carbondata.core.datastore.blocklet;
    +
    +import java.io.IOException;
    +import java.util.ArrayDeque;
    +import java.util.ArrayList;
    +import java.util.List;
    +import java.util.concurrent.ExecutionException;
    +import java.util.concurrent.ExecutorService;
    +import java.util.concurrent.Future;
    +
    +import org.apache.carbondata.common.logging.LogService;
    +import org.apache.carbondata.common.logging.LogServiceFactory;
    +import org.apache.carbondata.core.datastore.page.FallbackColumnPageEncoder;
    +import org.apache.carbondata.core.datastore.page.FallbackEncodedColumnPage;
    +import org.apache.carbondata.core.datastore.page.encoding.EncodedColumnPage;
    +import org.apache.carbondata.core.localdictionary.PageLevelDictionary;
    +import org.apache.carbondata.core.memory.MemoryException;
    +import org.apache.carbondata.format.LocalDictionaryChunk;
    +
    +/**
    + * Maintains the list of encoded page of a column in a blocklet
    + * and encoded dictionary values only if column is encoded using local
    + * dictionary
    + * Handle the fallback if all the pages in blocklet are not
    + * encoded with local dictionary
    + */
    +public class BlockletEncodedColumnPage {
    +
    +  /**
    +   * LOGGER
    +   */
    +  private static final LogService LOGGER =
    +      LogServiceFactory.getLogService(BlockletEncodedColumnPage.class.getName());
    +
    +  /**
    +   * list of encoded page of a column in a blocklet
    +   */
    +  private List<EncodedColumnPage> encodedColumnPageList;
    +
    +  /**
    +   * fallback executor service
    +   */
    +  private ExecutorService fallbackExecutorService;
    +
    +  /**
    +   * to check whether pages are local dictionary encoded or not
    +   */
    +  private boolean isLocalDictEncoded;
    +
    +  /**
    +   * page level dictionary only when column is encoded with local dictionary
    +   */
    +  private PageLevelDictionary pageLevelDictionary;
    +
    +  /**
    +   * fallback future task queue;
    +   */
    +  private ArrayDeque<Future<FallbackEncodedColumnPage>> fallbackFutureQueue;
    +
    +  BlockletEncodedColumnPage(ExecutorService fallbackExecutorService,
    +      EncodedColumnPage encodedColumnPage) {
    +    this.encodedColumnPageList = new ArrayList<>();
    +    this.fallbackExecutorService = fallbackExecutorService;
    +    this.encodedColumnPageList.add(encodedColumnPage);
    +    // if dimension page is local dictionary enabled and encoded with local dictionary
    +    if (encodedColumnPage.isLocalDictionaryEnabled() && encodedColumnPage
    --- End diff --
   
    Just keeping `this.isLocalDictEncoded = encodedColumnPage.isLocalDictGeneratedPage()` should be enough.


---

[GitHub] carbondata pull request #2402: [CARBONDATA-2587][CARBONDATA-2588] Local Dict...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ravipesala commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2402#discussion_r197804388
 
    --- Diff: core/src/main/java/org/apache/carbondata/core/datastore/blocklet/BlockletEncodedColumnPage.java ---
    @@ -0,0 +1,187 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.carbondata.core.datastore.blocklet;
    +
    +import java.io.IOException;
    +import java.util.ArrayDeque;
    +import java.util.ArrayList;
    +import java.util.List;
    +import java.util.concurrent.ExecutionException;
    +import java.util.concurrent.ExecutorService;
    +import java.util.concurrent.Future;
    +
    +import org.apache.carbondata.common.logging.LogService;
    +import org.apache.carbondata.common.logging.LogServiceFactory;
    +import org.apache.carbondata.core.datastore.page.FallbackColumnPageEncoder;
    +import org.apache.carbondata.core.datastore.page.FallbackEncodedColumnPage;
    +import org.apache.carbondata.core.datastore.page.encoding.EncodedColumnPage;
    +import org.apache.carbondata.core.localdictionary.PageLevelDictionary;
    +import org.apache.carbondata.core.memory.MemoryException;
    +import org.apache.carbondata.format.LocalDictionaryChunk;
    +
    +/**
    + * Maintains the list of encoded page of a column in a blocklet
    + * and encoded dictionary values only if column is encoded using local
    + * dictionary
    + * Handle the fallback if all the pages in blocklet are not
    + * encoded with local dictionary
    + */
    +public class BlockletEncodedColumnPage {
    +
    +  /**
    +   * LOGGER
    +   */
    +  private static final LogService LOGGER =
    +      LogServiceFactory.getLogService(BlockletEncodedColumnPage.class.getName());
    +
    +  /**
    +   * list of encoded page of a column in a blocklet
    +   */
    +  private List<EncodedColumnPage> encodedColumnPageList;
    +
    +  /**
    +   * fallback executor service
    +   */
    +  private ExecutorService fallbackExecutorService;
    +
    +  /**
    +   * to check whether pages are local dictionary encoded or not
    +   */
    +  private boolean isLocalDictEncoded;
    +
    +  /**
    +   * page level dictionary only when column is encoded with local dictionary
    +   */
    +  private PageLevelDictionary pageLevelDictionary;
    +
    +  /**
    +   * fallback future task queue;
    +   */
    +  private ArrayDeque<Future<FallbackEncodedColumnPage>> fallbackFutureQueue;
    +
    +  BlockletEncodedColumnPage(ExecutorService fallbackExecutorService,
    +      EncodedColumnPage encodedColumnPage) {
    --- End diff --
   
    Don't add `encodedColumnPage` in the constructor; use `addEncodedColumnColumnPage` instead.
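
A minimal sketch of what this suggestion could look like, assuming a simplified holder: the constructor only sets up state, and the add method derives the flag from the first page it receives. `PageStub` and `ColumnPageHolder` are hypothetical stand-ins, not the CarbonData classes, and the fallback submission from the diff is left out.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for EncodedColumnPage; only the flag used here is modeled.
final class PageStub {
  final boolean localDictGenerated;

  PageStub(boolean localDictGenerated) {
    this.localDictGenerated = localDictGenerated;
  }
}

// Hypothetical, simplified holder: pages are added only through addPage.
final class ColumnPageHolder {
  private final List<PageStub> pages = new ArrayList<>();
  private boolean localDictEncoded;

  // The constructor takes no page; the first call to addPage sets the flag.
  ColumnPageHolder() {
  }

  void addPage(PageStub page) {
    if (pages.isEmpty()) {
      // first page decides the initial state directly from the page flag
      localDictEncoded = page.localDictGenerated;
    } else if (localDictEncoded && !page.localDictGenerated) {
      // a later page without a dictionary ends local dictionary encoding
      // (the real class would submit the older pages for fallback here)
      localDictEncoded = false;
    }
    pages.add(page);
  }

  boolean isLocalDictEncoded() {
    return localDictEncoded;
  }

  int pageCount() {
    return pages.size();
  }

  public static void main(String[] args) {
    ColumnPageHolder holder = new ColumnPageHolder();
    holder.addPage(new PageStub(true));
    holder.addPage(new PageStub(false));
    System.out.println(holder.isLocalDictEncoded() + " " + holder.pageCount());
  }
}
```

With a single entry point for pages, the first-page logic and the later-page logic live in one method instead of being split between the constructor and the add method.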


---

[GitHub] carbondata pull request #2402: [CARBONDATA-2587][CARBONDATA-2588] Local Dict...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ravipesala commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2402#discussion_r197807784
 
    --- Diff: core/src/main/java/org/apache/carbondata/core/datastore/blocklet/BlockletEncodedColumnPage.java ---
    @@ -0,0 +1,187 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.carbondata.core.datastore.blocklet;
    +
    +import java.io.IOException;
    +import java.util.ArrayDeque;
    +import java.util.ArrayList;
    +import java.util.List;
    +import java.util.concurrent.ExecutionException;
    +import java.util.concurrent.ExecutorService;
    +import java.util.concurrent.Future;
    +
    +import org.apache.carbondata.common.logging.LogService;
    +import org.apache.carbondata.common.logging.LogServiceFactory;
    +import org.apache.carbondata.core.datastore.page.FallbackColumnPageEncoder;
    +import org.apache.carbondata.core.datastore.page.FallbackEncodedColumnPage;
    +import org.apache.carbondata.core.datastore.page.encoding.EncodedColumnPage;
    +import org.apache.carbondata.core.localdictionary.PageLevelDictionary;
    +import org.apache.carbondata.core.memory.MemoryException;
    +import org.apache.carbondata.format.LocalDictionaryChunk;
    +
    +/**
    + * Maintains the list of encoded page of a column in a blocklet
    + * and encoded dictionary values only if column is encoded using local
    + * dictionary
    + * Handle the fallback if all the pages in blocklet are not
    + * encoded with local dictionary
    + */
    +public class BlockletEncodedColumnPage {
    +
    +  /**
    +   * LOGGER
    +   */
    +  private static final LogService LOGGER =
    +      LogServiceFactory.getLogService(BlockletEncodedColumnPage.class.getName());
    +
    +  /**
    +   * list of encoded page of a column in a blocklet
    +   */
    +  private List<EncodedColumnPage> encodedColumnPageList;
    +
    +  /**
    +   * fallback executor service
    +   */
    +  private ExecutorService fallbackExecutorService;
    +
    +  /**
    +   * to check whether pages are local dictionary encoded or not
    +   */
    +  private boolean isLocalDictEncoded;
    +
    +  /**
    +   * page level dictionary only when column is encoded with local dictionary
    +   */
    +  private PageLevelDictionary pageLevelDictionary;
    +
    +  /**
    +   * fallback future task queue;
    +   */
    +  private ArrayDeque<Future<FallbackEncodedColumnPage>> fallbackFutureQueue;
    +
    +  BlockletEncodedColumnPage(ExecutorService fallbackExecutorService,
    +      EncodedColumnPage encodedColumnPage) {
    +    this.encodedColumnPageList = new ArrayList<>();
    +    this.fallbackExecutorService = fallbackExecutorService;
    +    this.encodedColumnPageList.add(encodedColumnPage);
    +    // if dimension page is local dictionary enabled and encoded with local dictionary
    +    if (encodedColumnPage.isLocalDictionaryEnabled() && encodedColumnPage
    +        .isLocalDictGeneratedPage()) {
    +      this.isLocalDictEncoded = true;
    +      // get first page dictionary
    +      this.pageLevelDictionary = encodedColumnPage.getPageDictionary();
    +    }
    +  }
    +
    +  /**
    +   * Below method will be used to add column page of a column
    +   *
    +   * @param encodedColumnPage
    +   * encoded column page
    +   * @throws ExecutionException
    +   * failure in fallback
    +   * @throws InterruptedException
    +   * failure during fallback
    +   */
    +  void addEncodedColumnColumnPage(EncodedColumnPage encodedColumnPage)
    +      throws ExecutionException, InterruptedException {
    +    // if local dictionary is false or column is encoded with local dictionary then
    +    // add a page
    +    if (!isLocalDictEncoded || encodedColumnPage.isLocalDictGeneratedPage()) {
    +      this.encodedColumnPageList.add(encodedColumnPage);
    +      // merge page level dictionary values
    +      if (null != this.pageLevelDictionary) {
    +        pageLevelDictionary.mergerDictionaryValues(encodedColumnPage.getPageDictionary());
    +      }
    +    } else {
    +      // if older pages were encoded with dictionary and new pages are without dictionary
    +      isLocalDictEncoded = false;
    +      pageLevelDictionary = null;
    +      this.fallbackFutureQueue = new ArrayDeque<>();
    +      LOGGER.info(
    +          "Local dictionary Fallback is initiated for column: " + encodedColumnPageList.get(0)
    +              .getActualPage().getColumnSpec().getFieldName());
    +      // submit all the older pages encoded with dictionary for fallback
    +      for (int pageIndex = 0; pageIndex < encodedColumnPageList.size(); pageIndex++) {
    +        fallbackFutureQueue.add(fallbackExecutorService.submit(
    +            new FallbackColumnPageEncoder(encodedColumnPageList.get(pageIndex), pageIndex)));
    +      }
    +      //add to page list
    +      this.encodedColumnPageList.add(encodedColumnPage);
    +    }
    +  }
    +
    +  /**
    +   * Return the list of encoded page list for a column in a blocklet
    +   *
    +   * @return list of encoded page list
    +   */
    +  public List<EncodedColumnPage> getEncodedColumnPageList() {
    +    // if fallback queue is null then for some pages fallback was initiated
    +    if (null != this.fallbackFutureQueue) {
    +      try {
    +        // check if queue is not empty
    +        while (!fallbackFutureQueue.isEmpty()) {
    +          // get the head element of queue
    +          Future<FallbackEncodedColumnPage> fallbackTask = fallbackFutureQueue.getFirst();
    --- End diff --
   
    I think you can just use `poll` and `future.get` to fill the pages instead of using `sleep`.
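
A minimal sketch of the `poll` plus `future.get` approach, assuming the tasks only need to be consumed in submission order. `FutureQueueDrainer` and `drain` are hypothetical names; only `ArrayDeque.poll()` and `Future.get()` are standard JDK calls.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical helper, not part of CarbonData.
final class FutureQueueDrainer {

  // Takes tasks from the head of the queue and blocks on each one.
  // Future.get() already waits for completion, so no sleep-and-recheck loop is needed.
  static <T> List<T> drain(ArrayDeque<Future<T>> queue)
      throws ExecutionException, InterruptedException {
    List<T> results = new ArrayList<>();
    Future<T> task;
    // poll() removes and returns the head, or returns null once the queue is empty
    while ((task = queue.poll()) != null) {
      results.add(task.get());
    }
    return results;
  }

  public static void main(String[] args) throws Exception {
    ExecutorService pool = Executors.newFixedThreadPool(2);
    try {
      ArrayDeque<Future<Integer>> queue = new ArrayDeque<>();
      for (int pageIndex = 0; pageIndex < 3; pageIndex++) {
        final int index = pageIndex;
        queue.add(pool.submit(() -> index * 10));
      }
      System.out.println(drain(queue));
    } finally {
      pool.shutdown();
    }
  }
}
```

Because the queue is drained from the head, results come back in the order the tasks were submitted, even if a later task finishes first.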


---

[GitHub] carbondata pull request #2402: [CARBONDATA-2587][CARBONDATA-2588] Local Dict...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ravipesala commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2402#discussion_r197813606
 
    --- Diff: core/src/test/java/org/apache/carbondata/core/util/CarbonMetadataUtilTest.java ---
    @@ -172,71 +172,71 @@
         IndexHeader indexheaderResult = getIndexHeader(columnCardinality, columnSchemaList, 0, 0L);
         assertEquals(indexHeader, indexheaderResult);
       }
    -
    -  @Test public void testConvertFileFooter() throws Exception {
    -    int[] cardinality = { 1, 2, 3, 4, 5 };
    -
    -    org.apache.carbondata.core.metadata.schema.table.column.ColumnSchema colSchema =
    -        new org.apache.carbondata.core.metadata.schema.table.column.ColumnSchema();
    -    org.apache.carbondata.core.metadata.schema.table.column.ColumnSchema colSchema1 =
    -        new org.apache.carbondata.core.metadata.schema.table.column.ColumnSchema();
    -    List<org.apache.carbondata.core.metadata.schema.table.column.ColumnSchema>
    -        columnSchemaList = new ArrayList<>();
    -    columnSchemaList.add(colSchema);
    -    columnSchemaList.add(colSchema1);
    -
    -    SegmentProperties segmentProperties = new SegmentProperties(columnSchemaList, cardinality);
    -
    -    final EncodedColumnPage measure = new EncodedColumnPage(new DataChunk2(), new byte[]{0,1},
    -        PrimitivePageStatsCollector.newInstance(
    -        org.apache.carbondata.core.metadata.datatype.DataTypes.BYTE));
    -    new MockUp<EncodedTablePage>() {
    -      @SuppressWarnings("unused") @Mock
    -      public EncodedColumnPage getMeasure(int measureIndex) {
    -        return measure;
    -      }
    -    };
    -
    -    new MockUp<TablePageKey>() {
    -      @SuppressWarnings("unused") @Mock
    -      public byte[] serializeStartKey() {
    -        return new byte[]{1, 2};
    -      }
    -
    -      @SuppressWarnings("unused") @Mock
    -      public byte[] serializeEndKey() {
    -        return new byte[]{1, 2};
    -      }
    -    };
    -
    -    TablePageKey key = new TablePageKey(3, segmentProperties, false);
    -    EncodedTablePage encodedTablePage = EncodedTablePage.newInstance(3, new EncodedColumnPage[0], new EncodedColumnPage[0],
    -        key);
    -
    -    List<EncodedTablePage> encodedTablePageList = new ArrayList<>();
    -    encodedTablePageList.add(encodedTablePage);
    -
    -    BlockletInfo3 blockletInfoColumnar1 = new BlockletInfo3();
    -
    -    List<BlockletInfo3> blockletInfoColumnarList = new ArrayList<>();
    -    blockletInfoColumnarList.add(blockletInfoColumnar1);
    -
    -    byte[] byteMaxArr = "1".getBytes();
    -    byte[] byteMinArr = "2".getBytes();
    -
    -    BlockletIndex index = getBlockletIndex(encodedTablePageList, segmentProperties.getMeasures());
    -    List<BlockletIndex> indexList = new ArrayList<>();
    -    indexList.add(index);
    -
    -    BlockletMinMaxIndex blockletMinMaxIndex = new BlockletMinMaxIndex();
    -    blockletMinMaxIndex.addToMax_values(ByteBuffer.wrap(byteMaxArr));
    -    blockletMinMaxIndex.addToMin_values(ByteBuffer.wrap(byteMinArr));
    -    FileFooter3 footer = convertFileFooterVersion3(blockletInfoColumnarList,
    -        indexList,
    -        cardinality, 2);
    -    assertEquals(footer.getBlocklet_index_list(), indexList);
    -
    -  }
    +//
    --- End diff --
   
    Remove this if it is not required.


---

[GitHub] carbondata pull request #2402: [CARBONDATA-2587][CARBONDATA-2588] Local Dict...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ravipesala commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2402#discussion_r197813935
 
    --- Diff: processing/src/main/java/org/apache/carbondata/processing/store/CarbonFactDataHandlerModel.java ---
    @@ -623,5 +640,83 @@ public DataMapWriterListener getDataMapWriterlistener() {
         return dataMapWriterlistener;
       }
     
    +  public Map<String, LocalDictionaryGenerator> getColumnLocalDictGenMap() {
    +    return columnLocalDictGenMap;
    +  }
    +
    +  /**
    +   * This method prepares a map which will have column and local dictionary generator mapping for
    +   * all the local dictionary columns.
    +   * @param carbonTable
    +   * @param wrapperColumnSchema
    +   * @param carbonFactDataHandlerModel
    +   */
    +  public static void setLocalDictToModel(CarbonTable carbonTable,
    --- End diff --
   
    Keep this as `private`.


---

[GitHub] carbondata pull request #2402: [CARBONDATA-2587][CARBONDATA-2588] Local Dict...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ravipesala commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2402#discussion_r197817666
 
    --- Diff: core/src/main/java/org/apache/carbondata/core/localdictionary/dictionaryholder/MapBasedDictionaryStore.java ---
    @@ -0,0 +1,135 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.carbondata.core.localdictionary.dictionaryholder;
    +
    +import java.util.Map;
    +import java.util.concurrent.ConcurrentHashMap;
    +
    +import org.apache.carbondata.core.cache.dictionary.DictionaryByteArrayWrapper;
    +import org.apache.carbondata.core.localdictionary.exception.DictionaryThresholdReachedException;
    +
    +/**
    + * Map based dictionary holder class, it will use map to hold
    + * the dictionary key and its value
    + */
    +public class MapBasedDictionaryStore implements DictionaryStore {
    +
    +  /**
    +   * use to assign dictionary value to new key
    +   */
    +  private int lastAssignValue;
    +
    +  /**
    +   * to maintain dictionary key value
    +   */
    +  private final Map<DictionaryByteArrayWrapper, Integer> dictionary;
    +
    +  /**
    +   * maintaining array for reverse lookup
    +   * otherwise iterating everytime in map for reverse lookup will be slowdown the performance
    +   * It will only maintain the reference
    +   */
    +  private byte[][] referenceDictionaryArray;
    --- End diff --
   
    Better to use a `DictionaryByteArrayWrapper` array directly here.
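
A minimal sketch of a reverse-lookup array that stores the wrapper objects themselves, assuming a single-threaded store with no threshold check. `BytesKey` is a hypothetical stand-in for `DictionaryByteArrayWrapper`, and `SimpleDictionaryStore` is a simplification of the class in the diff (a plain `HashMap` instead of `ConcurrentHashMap`).

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for DictionaryByteArrayWrapper:
// gives a byte[] value-based equals and hashCode so it can be a map key.
final class BytesKey {
  final byte[] data;

  BytesKey(byte[] data) {
    this.data = data;
  }

  @Override public boolean equals(Object other) {
    return other instanceof BytesKey && Arrays.equals(data, ((BytesKey) other).data);
  }

  @Override public int hashCode() {
    return Arrays.hashCode(data);
  }
}

// Hypothetical, simplified dictionary store.
final class SimpleDictionaryStore {
  private final Map<BytesKey, Integer> dictionary = new HashMap<>();
  // reverse lookup: index is the dictionary value, element is the wrapper used as map key
  private BytesKey[] reverse = new BytesKey[4];
  // dictionary values start from 1, so index 0 of the array stays unused
  private int lastAssignedValue;

  int putIfAbsent(byte[] data) {
    BytesKey key = new BytesKey(data);
    Integer existing = dictionary.get(key);
    if (existing != null) {
      return existing;
    }
    int value = ++lastAssignedValue;
    dictionary.put(key, value);
    if (value >= reverse.length) {
      reverse = Arrays.copyOf(reverse, reverse.length * 2);
    }
    // store the same wrapper instance; no second reference type is needed
    reverse[value] = key;
    return value;
  }

  byte[] keyFor(int value) {
    return reverse[value].data;
  }

  public static void main(String[] args) {
    SimpleDictionaryStore store = new SimpleDictionaryStore();
    int first = store.putIfAbsent("a".getBytes());
    int second = store.putIfAbsent("b".getBytes());
    System.out.println(first + " " + second + " " + new String(store.keyFor(second)));
  }
}
```

Storing the wrapper keeps the map and the reverse array pointing at the same object, so the array holds one reference per entry without unwrapping on insert.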


---

[GitHub] carbondata pull request #2402: [CARBONDATA-2587][CARBONDATA-2588] Local Dict...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ravipesala commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2402#discussion_r197820697
 
    --- Diff: core/src/main/java/org/apache/carbondata/core/localdictionary/PageLevelDictionary.java ---
    @@ -0,0 +1,118 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.carbondata.core.localdictionary;
    +
    +import java.io.IOException;
    +import java.util.BitSet;
    +
    +import org.apache.carbondata.core.constants.CarbonCommonConstants;
    +import org.apache.carbondata.core.datastore.ColumnType;
    +import org.apache.carbondata.core.datastore.TableSpec;
    +import org.apache.carbondata.core.datastore.page.ColumnPage;
    +import org.apache.carbondata.core.datastore.page.encoding.ColumnPageEncoder;
    +import org.apache.carbondata.core.datastore.page.encoding.compress.DirectCompressCodec;
    +import org.apache.carbondata.core.datastore.page.statistics.DummyStatsCollector;
    +import org.apache.carbondata.core.localdictionary.exception.DictionaryThresholdReachedException;
    +import org.apache.carbondata.core.localdictionary.generator.LocalDictionaryGenerator;
    +import org.apache.carbondata.core.memory.MemoryException;
    +import org.apache.carbondata.core.metadata.datatype.DataTypes;
    +import org.apache.carbondata.format.LocalDictionaryChunk;
    +
    +/**
    + * Class to maintain page level dictionary. It will store all unique dictionary values
    + * used in a page. This is required while writing blocklet level dictionary in carbondata
    + * file
    + */
    +public class PageLevelDictionary {
    +
    +  /**
    +   * dictionary generator to generate dictionary values for page data
    +   */
    +  private LocalDictionaryGenerator localDictionaryGenerator;
    +
    +  /**
    +   * set of dictionary surrogate key in this page
    +   */
    +  private BitSet usedDictionaryValues;
    +
    +  private int maxDictValue;
    +
    +  private String columnName;
    +
    +  public PageLevelDictionary(LocalDictionaryGenerator localDictionaryGenerator,String columnName) {
    +    this.localDictionaryGenerator = localDictionaryGenerator;
    +    this.usedDictionaryValues = new BitSet();
    +    this.columnName = columnName;
    +  }
    +
    +  /**
    +   * Below method will be used to get the dictionary value
    +   *
    +   * @param data column data
    +   * @return dictionary value
    +   * @throws DictionaryThresholdReachedException when threshold crossed for column
    +   */
    +  public int getDictionaryValue(byte[] data) throws DictionaryThresholdReachedException {
    +    int dictionaryValue = localDictionaryGenerator.generateDictionary(data);
    +    this.usedDictionaryValues.set(dictionaryValue);
    +    if (maxDictValue < dictionaryValue) {
    +      maxDictValue = dictionaryValue;
    +    }
    +    return dictionaryValue;
    +  }
    +
    +  /**
    +   * Method to merge the dictionary value across pages
    +   *
    +   * @param pageLevelDictionary other page level dictionary
    +   */
    +  public void mergerDictionaryValues(PageLevelDictionary pageLevelDictionary) {
    +    usedDictionaryValues.and(pageLevelDictionary.usedDictionaryValues);
    +  }
    +
    +  /**
    +   * Below method will be used to get the local dictionary chunk for writing
    +   * @TODO Support for numeric data type dictionary exclude columns
    +   * @return encoded local dictionary chunk
    +   * @throws MemoryException
    +   * in case of problem in encoding
    +   * @throws IOException
    +   * in case of problem in encoding
    +   */
    +  public LocalDictionaryChunk getLocalDictionaryChunkForBlocklet()
    +      throws MemoryException, IOException {
    +    // TODO support for actual data type dictionary ColumnSPEC
    +    TableSpec.ColumnSpec spec = TableSpec.ColumnSpec
    +        .newInstance(columnName, DataTypes.BYTE_ARRAY, ColumnType.PLAIN_VALUE);
    +    ColumnPage dictionaryColumnPage = ColumnPage.newPage(spec, DataTypes.BYTE_ARRAY, maxDictValue);
    +    // TODO support data type specific stats collector for numeric data types
    +    dictionaryColumnPage.setStatsCollector(new DummyStatsCollector());
    +    int rowId = 0;
    +    //starting index is 1 as dictionary value starts from 1
    +    for (int i = 1; i <= maxDictValue; i++) {
    +      if (usedDictionaryValues.get(i)) {
    +        dictionaryColumnPage
    +            .putData(rowId++, localDictionaryGenerator.getDictionaryKeyBasedOnValue(i));
    +      } else {
    +        dictionaryColumnPage
    +            .putData(rowId++, CarbonCommonConstants.EMPTY_BYTE_ARRAY);
    --- End diff --
   
    Check whether there are any other cases in which the data itself yields an empty binary value.
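
One way to read this concern, sketched under assumptions: if unused dictionary slots are filled with an empty byte array, a genuine empty value in the data looks identical to the placeholder once the chunk is written. `PlaceholderDemo` and `buildChunk` are hypothetical names; the loop mirrors the one in the diff but works on plain arrays.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

// Hypothetical demo, not CarbonData code.
final class PlaceholderDemo {
  static final byte[] EMPTY = new byte[0];

  // keys[i] is the dictionary key for value i; index 0 is unused because values start from 1.
  // Slots whose bit is not set in 'used' are filled with the empty placeholder.
  static List<byte[]> buildChunk(byte[][] keys, BitSet used, int maxValue) {
    List<byte[]> chunk = new ArrayList<>();
    for (int i = 1; i <= maxValue; i++) {
      chunk.add(used.get(i) ? keys[i] : EMPTY);
    }
    return chunk;
  }

  public static void main(String[] args) {
    // value 2 is a real empty string in the data, value 3 is not used by this blocklet
    byte[][] keys = { null, "a".getBytes(), "".getBytes(), "c".getBytes() };
    BitSet used = new BitSet();
    used.set(1);
    used.set(2);
    List<byte[]> chunk = buildChunk(keys, used, 3);
    for (byte[] entry : chunk) {
      System.out.println(entry.length);
    }
  }
}
```

In this sketch the second and third entries both have length 0, so a reader of the chunk alone cannot tell the real empty value from the placeholder; whether that matters depends on how the reader uses the unused slots.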


---

[GitHub] carbondata pull request #2402: [CARBONDATA-2587][CARBONDATA-2588] Local Dict...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ravipesala commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2402#discussion_r197821658
 
    --- Diff: processing/src/main/java/org/apache/carbondata/processing/store/writer/v3/CarbonFactDataWriterImplV3.java ---
    @@ -76,7 +79,29 @@ public CarbonFactDataWriterImplV3(CarbonFactDataHandlerModel model) {
           blockletSizeThreshold = fileSizeInBytes;
           LOGGER.info("Blocklet size configure for table is: " + blockletSizeThreshold);
         }
    -    blockletDataHolder = new BlockletDataHolder();
    +    int numberOfCores = CarbonProperties.getInstance().getNumberOfCores();
    --- End diff --
   
    Please remove the unused code.


---

[GitHub] carbondata pull request #2402: [CARBONDATA-2587][CARBONDATA-2588] Local Dict...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ravipesala commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2402#discussion_r197821950
 
    --- Diff: processing/src/main/java/org/apache/carbondata/processing/store/writer/v3/CarbonFactDataWriterImplV3.java ---
    @@ -110,44 +135,51 @@ public CarbonFactDataWriterImplV3(CarbonFactDataHandlerModel model) {
        */
       @Override public void writeTablePage(TablePage tablePage)
           throws CarbonDataWriterException,IOException {
    -    // condition for writting all the pages
    -    if (!tablePage.isLastPage()) {
    -      boolean isAdded = false;
    -      // check if size more than blocklet size then write the page to file
    -      if (blockletDataHolder.getSize() + tablePage.getEncodedTablePage().getEncodedSize() >=
    -          blockletSizeThreshold) {
    -        // if blocklet size exceeds threshold, write blocklet data
    -        if (blockletDataHolder.getEncodedTablePages().size() == 0) {
    -          isAdded = true;
    -          addPageData(tablePage);
    -        }
    +    try {
    --- End diff --
   
    Don't reformat code that has not changed.


---

[GitHub] carbondata issue #2402: [CARBONDATA-2587][CARBONDATA-2588] Local Dictionary ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2402
 
    Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/6536/



---

[GitHub] carbondata issue #2402: [CARBONDATA-2587][CARBONDATA-2588] Local Dictionary ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2402
 
    Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/5365/



---

[GitHub] carbondata issue #2402: [CARBONDATA-2587][CARBONDATA-2588] Local Dictionary ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ravipesala commented on the issue:

    https://github.com/apache/carbondata/pull/2402
 
    SDV Build Fail , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/5440/



---

[GitHub] carbondata issue #2402: [CARBONDATA-2587][CARBONDATA-2588] Local Dictionary ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2402
 
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/6537/



---