[GitHub] carbondata pull request #2823: [CARBONDATA-3015] Support Lazy load in carbon...


[GitHub] carbondata pull request #2823: [CARBONDATA-3015] Support Lazy load in carbon...

qiuchenjian-2
GitHub user ravipesala opened a pull request:

    https://github.com/apache/carbondata/pull/2823

    [CARBONDATA-3015]  Support Lazy load in carbon vector

    This PR depends on PR https://github.com/apache/carbondata/pull/2822
   
    Even though pages are pruned using min/max values, there is a high chance of false positives for filters on high-cardinality columns. To avoid this, this PR introduces a lazy loading design: data is not read, decompressed, and filled into the vector as soon as the data-fill call comes from Spark/Presto.
    Instead, only the required filter columns are read first and handed back to the execution engine. The engine starts filtering on those column vectors, and only if it finds rows that require data from the projection columns does it read the projection columns and fill their vectors on demand. This is the approach used by Presto, and the same is integrated here with Spark 2.3. Older Spark versions cannot take advantage of it because their ColumnVector interfaces are not extendable.
    For this purpose, the new classes 'LazyBlockletLoad' and 'LazyPageLoad' are added and the carbon vector interfaces are changed.
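
    The deferred-materialization idea described above can be sketched as follows. This is an illustrative Java sketch only, assuming a `Supplier`-based loader; it is not CarbonData's actual `LazyPageLoad` implementation, and all names here are hypothetical.

    ```java
    import java.util.function.Supplier;

    // Hypothetical sketch of the lazy-load idea: the page is decoded only on
    // first access, not when the vector is handed to the execution engine.
    class LazyPage {
      private final Supplier<int[]> loader; // decodes/decompresses on demand
      private int[] data;                   // null until first access

      LazyPage(Supplier<int[]> loader) {
        this.loader = loader;
      }

      int get(int row) {
        if (data == null) {
          data = loader.get(); // materialize only when a projection row is needed
        }
        return data[row];
      }

      boolean isLoaded() {
        return data != null;
      }
    }
    ```

    Under this scheme, a page whose rows are all filtered out is never decompressed at all, which is where the saving on high-cardinality false positives comes from.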
   
   
    Be sure to complete the following checklist to help us incorporate
    your contribution quickly and easily:
   
     - [ ] Any interfaces changed?
     
     - [ ] Any backward compatibility impacted?
     
     - [ ] Document update required?
   
     - [ ] Testing done
            Please provide details on
            - Whether new unit test cases have been added or why no new tests are required?
            - How it is tested? Please attach test report.
            - Is it a performance related change? Please attach the performance test report.
            - Any additional information to help reviewers in testing this change.
           
     - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.
   


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/ravipesala/incubator-carbondata perf-lazy-load

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/carbondata/pull/2823.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #2823
   
----
commit 5443ce85a91c4ce28454d73699d62ef47255a845
Author: ravipesala <ravi.pesala@...>
Date:   2018-10-16T05:02:18Z

    Add carbon property to configure vector based row pruning push down

commit d024ccac82979a908a643b7198ba1f80d9c08e91
Author: ravipesala <ravi.pesala@...>
Date:   2018-10-16T06:00:43Z

    Added support for full scan queries for vector direct fill.

commit 77203f344dbe9dca8ce2697a0dbc364bb8a01cdc
Author: ravipesala <ravi.pesala@...>
Date:   2018-10-16T09:23:14Z

    Added support for pruning pages for vector direct fill.

commit 46578850c5c02c4927da32efe7a1df27cfe2ebd1
Author: ravipesala <ravi.pesala@...>
Date:   2018-10-16T11:07:18Z

    Added support for inverted index and delete delta for direct scan queries

commit b08c254280198fbbcb41361c7b32a9eb51cf8473
Author: ravipesala <ravi.pesala@...>
Date:   2018-10-16T13:09:16Z

    Support Lazy load in carbon vector

----


---

[GitHub] carbondata issue #2823: [CARBONDATA-3015] Support Lazy load in carbon vector

qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2823
 
    Build Failed  with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/814/



---

[GitHub] carbondata issue #2823: [CARBONDATA-3015] Support Lazy load in carbon vector

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2823
 
    Build Failed with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1011/



---

[GitHub] carbondata issue #2823: [CARBONDATA-3015] Support Lazy load in carbon vector

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2823
 
    Build Failed  with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9079/



---

[GitHub] carbondata issue #2823: [CARBONDATA-3015] Support Lazy load in carbon vector

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2823
 
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/826/



---

[GitHub] carbondata issue #2823: [CARBONDATA-3015] Support Lazy load in carbon vector

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2823
 
    Build Failed  with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9091/



---

[GitHub] carbondata issue #2823: [CARBONDATA-3015] Support Lazy load in carbon vector

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2823
 
    Build Failed with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1023/



---

[GitHub] carbondata issue #2823: [CARBONDATA-3015] Support Lazy load in carbon vector

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2823
 
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/828/



---

[GitHub] carbondata issue #2823: [CARBONDATA-3015] Support Lazy load in carbon vector

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2823
 
    Build Failed  with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9093/



---

[GitHub] carbondata issue #2823: [CARBONDATA-3015] Support Lazy load in carbon vector

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2823
 
    Build Failed with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1025/



---

[GitHub] carbondata pull request #2823: [CARBONDATA-3015] Support Lazy load in carbon...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user xuchuanyin commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2823#discussion_r226547892
 
    --- Diff: core/src/main/java/org/apache/carbondata/core/datastore/chunk/store/impl/safe/AbstractNonDictionaryVectorFiller.java ---
    @@ -0,0 +1,274 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.carbondata.core.datastore.chunk.store.impl.safe;
    +
    +import java.nio.ByteBuffer;
    +
    +import org.apache.carbondata.core.constants.CarbonCommonConstants;
    +import org.apache.carbondata.core.metadata.datatype.DataType;
    +import org.apache.carbondata.core.metadata.datatype.DataTypes;
    +import org.apache.carbondata.core.scan.result.vector.CarbonColumnVector;
    +import org.apache.carbondata.core.util.ByteUtil;
    +import org.apache.carbondata.core.util.DataTypeUtil;
    +
    +public abstract class AbstractNonDictionaryVectorFiller {
    +
    +  protected int lengthSize;
    +  protected int numberOfRows;
    +
    +  public AbstractNonDictionaryVectorFiller(int lengthSize, int numberOfRows) {
    +    this.lengthSize = lengthSize;
    +    this.numberOfRows = numberOfRows;
    +  }
    +
    +  public abstract void fillVector(byte[] data, CarbonColumnVector vector, ByteBuffer buffer);
    +
    +  public int getLengthFromBuffer(ByteBuffer buffer) {
    +    return buffer.getShort();
    +  }
    +}
    +
    +class NonDictionaryVectorFillerFactory {
    +
    +  public static AbstractNonDictionaryVectorFiller getVectorFiller(DataType type, int lengthSize,
    +      int numberOfRows) {
    +    if (type == DataTypes.STRING || type == DataTypes.VARCHAR) {
    +      if (lengthSize == 2) {
    --- End diff --
   
    Instead of judging by the `lengthSize`, I think you can dispatch the branch using the `type`.
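
    The reviewer's suggestion, dispatching on the data type alone, might look like the following sketch. The enum, class names, and the assumption that STRING always uses short length prefixes while VARCHAR uses long ones are illustrative guesses, not CarbonData's actual API.

    ```java
    // Hypothetical type-based dispatch, as an alternative to branching on lengthSize.
    enum DataType { STRING, VARCHAR, TIMESTAMP }

    interface VectorFiller { }
    class StringFiller implements VectorFiller { }      // assumed: 2-byte length prefixes
    class LongStringFiller implements VectorFiller { }  // assumed: 4-byte length prefixes
    class TimestampFiller implements VectorFiller { }

    class FillerFactory {
      // Each DataType maps to exactly one filler, so lengthSize never drives the branch.
      static VectorFiller forType(DataType type) {
        switch (type) {
          case STRING:    return new StringFiller();
          case VARCHAR:   return new LongStringFiller();
          case TIMESTAMP: return new TimestampFiller();
          default:        throw new IllegalArgumentException("Unsupported type: " + type);
        }
      }
    }
    ```

    This only works if each type implies a fixed length encoding; if STRING can legitimately carry either prefix width, the lengthSize check would still be needed.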


---

[GitHub] carbondata pull request #2823: [CARBONDATA-3015] Support Lazy load in carbon...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user xuchuanyin commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2823#discussion_r226547597
 
    --- Diff: core/src/main/java/org/apache/carbondata/core/datastore/chunk/store/impl/safe/SafeVariableLengthDimensionDataChunkStore.java ---
    @@ -91,6 +95,31 @@ public void putArray(final int[] invertedIndex, final int[] invertedIndexReverse
         }
       }
     
    +  @Override
    +  public void fillVector(int[] invertedIndex, int[] invertedIndexReverse, byte[] data,
    +      ColumnVectorInfo vectorInfo) {
    +    this.invertedIndexReverse = invertedIndex;
    +
    +    // as first position will be start from 2 byte as data is stored first in the memory block
    --- End diff --
   
    Please fix the comment here since it is not always 2 bytes.


---

[GitHub] carbondata pull request #2823: [CARBONDATA-3015] Support Lazy load in carbon...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user xuchuanyin commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2823#discussion_r226549110
 
    --- Diff: core/src/main/java/org/apache/carbondata/core/datastore/chunk/store/impl/safe/AbstractNonDictionaryVectorFiller.java ---
    @@ -0,0 +1,274 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.carbondata.core.datastore.chunk.store.impl.safe;
    +
    +import java.nio.ByteBuffer;
    +
    +import org.apache.carbondata.core.constants.CarbonCommonConstants;
    +import org.apache.carbondata.core.metadata.datatype.DataType;
    +import org.apache.carbondata.core.metadata.datatype.DataTypes;
    +import org.apache.carbondata.core.scan.result.vector.CarbonColumnVector;
    +import org.apache.carbondata.core.util.ByteUtil;
    +import org.apache.carbondata.core.util.DataTypeUtil;
    +
    +public abstract class AbstractNonDictionaryVectorFiller {
    +
    +  protected int lengthSize;
    +  protected int numberOfRows;
    +
    +  public AbstractNonDictionaryVectorFiller(int lengthSize, int numberOfRows) {
    +    this.lengthSize = lengthSize;
    +    this.numberOfRows = numberOfRows;
    +  }
    +
    +  public abstract void fillVector(byte[] data, CarbonColumnVector vector, ByteBuffer buffer);
    +
    +  public int getLengthFromBuffer(ByteBuffer buffer) {
    +    return buffer.getShort();
    +  }
    +}
    +
    +class NonDictionaryVectorFillerFactory {
    +
    +  public static AbstractNonDictionaryVectorFiller getVectorFiller(DataType type, int lengthSize,
    +      int numberOfRows) {
    +    if (type == DataTypes.STRING || type == DataTypes.VARCHAR) {
    +      if (lengthSize == 2) {
    +        return new StringVectorFiller(lengthSize, numberOfRows);
    +      } else {
    +        return new LongStringVectorFiller(lengthSize, numberOfRows);
    +      }
    +    } else if (type == DataTypes.TIMESTAMP) {
    +      return new TimeStampVectorFiller(lengthSize, numberOfRows);
    +    } else if (type == DataTypes.BOOLEAN) {
    +      return new BooleanVectorFiller(lengthSize, numberOfRows);
    +    } else if (type == DataTypes.SHORT) {
    +      return new ShortVectorFiller(lengthSize, numberOfRows);
    +    } else if (type == DataTypes.INT) {
    +      return new IntVectorFiller(lengthSize, numberOfRows);
    +    } else if (type == DataTypes.LONG) {
    +      return new LongStringVectorFiller(lengthSize, numberOfRows);
    +    }
    +    return new StringVectorFiller(lengthSize, numberOfRows);
    +  }
    +
    +}
    +
    +class StringVectorFiller extends AbstractNonDictionaryVectorFiller {
    +
    +  public StringVectorFiller(int lengthSize, int numberOfRows) {
    +    super(lengthSize, numberOfRows);
    +  }
    +
    +  @Override
    +  public void fillVector(byte[] data, CarbonColumnVector vector, ByteBuffer buffer) {
    +    // start position will be used to store the current data position
    +    int startOffset = 0;
    +    int currentOffset = lengthSize;
    +    ByteUtil.UnsafeComparer comparer = ByteUtil.UnsafeComparer.INSTANCE;
    +    for (int i = 0; i < numberOfRows - 1; i++) {
    --- End diff --
   
     Can you add some description of how the vector is filled?
    
     Take this line for example: it seems the last row is reserved for something else, so a description is needed.


---

[GitHub] carbondata pull request #2823: [CARBONDATA-3015] Support Lazy load in carbon...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user xuchuanyin commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2823#discussion_r226550593
 
    --- Diff: core/src/main/java/org/apache/carbondata/core/datastore/page/encoding/adaptive/AdaptiveDeltaFloatingCodec.java ---
    @@ -225,6 +238,71 @@ public double decodeDouble(long value) {
           return (max - value) / factor;
         }
     
    +    @Override
    +    public void decodeAndFillVector(ColumnPage columnPage, ColumnVectorInfo vectorInfo) {
    +      CarbonColumnVector vector = vectorInfo.vector;
    +      BitSet nullBits = columnPage.getNullBits();
    +      DataType type = columnPage.getDataType();
    +      int pageSize = columnPage.getPageSize();
    +      BitSet deletedRows = vectorInfo.deletedRows;
    +      DataType dataType = vector.getType();
    --- End diff --
   
    What's the relationship between `columnPage.getDataType()` and `vector.getType()`?


---

[GitHub] carbondata pull request #2823: [CARBONDATA-3015] Support Lazy load in carbon...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user xuchuanyin commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2823#discussion_r226548594
 
    --- Diff: core/src/main/java/org/apache/carbondata/core/datastore/chunk/store/impl/safe/AbstractNonDictionaryVectorFiller.java ---
    @@ -0,0 +1,274 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.carbondata.core.datastore.chunk.store.impl.safe;
    +
    +import java.nio.ByteBuffer;
    +
    +import org.apache.carbondata.core.constants.CarbonCommonConstants;
    +import org.apache.carbondata.core.metadata.datatype.DataType;
    +import org.apache.carbondata.core.metadata.datatype.DataTypes;
    +import org.apache.carbondata.core.scan.result.vector.CarbonColumnVector;
    +import org.apache.carbondata.core.util.ByteUtil;
    +import org.apache.carbondata.core.util.DataTypeUtil;
    +
    +public abstract class AbstractNonDictionaryVectorFiller {
    +
    +  protected int lengthSize;
    +  protected int numberOfRows;
    +
    +  public AbstractNonDictionaryVectorFiller(int lengthSize, int numberOfRows) {
    +    this.lengthSize = lengthSize;
    +    this.numberOfRows = numberOfRows;
    +  }
    +
    +  public abstract void fillVector(byte[] data, CarbonColumnVector vector, ByteBuffer buffer);
    +
    +  public int getLengthFromBuffer(ByteBuffer buffer) {
    +    return buffer.getShort();
    +  }
    +}
    +
    +class NonDictionaryVectorFillerFactory {
    +
    +  public static AbstractNonDictionaryVectorFiller getVectorFiller(DataType type, int lengthSize,
    +      int numberOfRows) {
    +    if (type == DataTypes.STRING || type == DataTypes.VARCHAR) {
    +      if (lengthSize == 2) {
    +        return new StringVectorFiller(lengthSize, numberOfRows);
    +      } else {
    +        return new LongStringVectorFiller(lengthSize, numberOfRows);
    +      }
    +    } else if (type == DataTypes.TIMESTAMP) {
    +      return new TimeStampVectorFiller(lengthSize, numberOfRows);
    +    } else if (type == DataTypes.BOOLEAN) {
    +      return new BooleanVectorFiller(lengthSize, numberOfRows);
    +    } else if (type == DataTypes.SHORT) {
    +      return new ShortVectorFiller(lengthSize, numberOfRows);
    +    } else if (type == DataTypes.INT) {
    +      return new IntVectorFiller(lengthSize, numberOfRows);
    +    } else if (type == DataTypes.LONG) {
    +      return new LongStringVectorFiller(lengthSize, numberOfRows);
    +    }
    +    return new StringVectorFiller(lengthSize, numberOfRows);
    +  }
    +
    +}
    +
    +class StringVectorFiller extends AbstractNonDictionaryVectorFiller {
    +
    +  public StringVectorFiller(int lengthSize, int numberOfRows) {
    +    super(lengthSize, numberOfRows);
    +  }
    +
    +  @Override
    +  public void fillVector(byte[] data, CarbonColumnVector vector, ByteBuffer buffer) {
    +    // start position will be used to store the current data position
    +    int startOffset = 0;
    +    int currentOffset = lengthSize;
    +    ByteUtil.UnsafeComparer comparer = ByteUtil.UnsafeComparer.INSTANCE;
    --- End diff --
   
     `Comparer`? That is French, not English.
     Do you mean `Comparator`? If so, please fix the name of this class.


---

[GitHub] carbondata pull request #2823: [CARBONDATA-3015] Support Lazy load in carbon...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user xuchuanyin commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2823#discussion_r226549503
 
    --- Diff: core/src/main/java/org/apache/carbondata/core/datastore/chunk/store/impl/safe/AbstractNonDictionaryVectorFiller.java ---
    @@ -0,0 +1,274 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.carbondata.core.datastore.chunk.store.impl.safe;
    +
    +import java.nio.ByteBuffer;
    +
    +import org.apache.carbondata.core.constants.CarbonCommonConstants;
    +import org.apache.carbondata.core.metadata.datatype.DataType;
    +import org.apache.carbondata.core.metadata.datatype.DataTypes;
    +import org.apache.carbondata.core.scan.result.vector.CarbonColumnVector;
    +import org.apache.carbondata.core.util.ByteUtil;
    +import org.apache.carbondata.core.util.DataTypeUtil;
    +
    +public abstract class AbstractNonDictionaryVectorFiller {
    +
    +  protected int lengthSize;
    +  protected int numberOfRows;
    +
    +  public AbstractNonDictionaryVectorFiller(int lengthSize, int numberOfRows) {
    +    this.lengthSize = lengthSize;
    +    this.numberOfRows = numberOfRows;
    +  }
    +
    +  public abstract void fillVector(byte[] data, CarbonColumnVector vector, ByteBuffer buffer);
    +
    +  public int getLengthFromBuffer(ByteBuffer buffer) {
    +    return buffer.getShort();
    +  }
    +}
    +
    +class NonDictionaryVectorFillerFactory {
    +
    +  public static AbstractNonDictionaryVectorFiller getVectorFiller(DataType type, int lengthSize,
    +      int numberOfRows) {
    +    if (type == DataTypes.STRING || type == DataTypes.VARCHAR) {
    +      if (lengthSize == 2) {
    +        return new StringVectorFiller(lengthSize, numberOfRows);
    +      } else {
    +        return new LongStringVectorFiller(lengthSize, numberOfRows);
    +      }
    +    } else if (type == DataTypes.TIMESTAMP) {
    +      return new TimeStampVectorFiller(lengthSize, numberOfRows);
    +    } else if (type == DataTypes.BOOLEAN) {
    +      return new BooleanVectorFiller(lengthSize, numberOfRows);
    +    } else if (type == DataTypes.SHORT) {
    +      return new ShortVectorFiller(lengthSize, numberOfRows);
    +    } else if (type == DataTypes.INT) {
    +      return new IntVectorFiller(lengthSize, numberOfRows);
    +    } else if (type == DataTypes.LONG) {
    +      return new LongStringVectorFiller(lengthSize, numberOfRows);
    +    }
    +    return new StringVectorFiller(lengthSize, numberOfRows);
    +  }
    +
    +}
    +
    +class StringVectorFiller extends AbstractNonDictionaryVectorFiller {
    +
    +  public StringVectorFiller(int lengthSize, int numberOfRows) {
    +    super(lengthSize, numberOfRows);
    +  }
    +
    +  @Override
    +  public void fillVector(byte[] data, CarbonColumnVector vector, ByteBuffer buffer) {
    +    // start position will be used to store the current data position
    +    int startOffset = 0;
    --- End diff --
   
     If this variable is used as the above comment describes, why not rename it to `currentOffset`?


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #2823: [CARBONDATA-3015] Support Lazy load in carbon...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user xuchuanyin commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2823#discussion_r226548215
 
    --- Diff: core/src/main/java/org/apache/carbondata/core/datastore/chunk/store/impl/safe/AbstractNonDictionaryVectorFiller.java ---
    @@ -0,0 +1,274 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.carbondata.core.datastore.chunk.store.impl.safe;
    +
    +import java.nio.ByteBuffer;
    +
    +import org.apache.carbondata.core.constants.CarbonCommonConstants;
    +import org.apache.carbondata.core.metadata.datatype.DataType;
    +import org.apache.carbondata.core.metadata.datatype.DataTypes;
    +import org.apache.carbondata.core.scan.result.vector.CarbonColumnVector;
    +import org.apache.carbondata.core.util.ByteUtil;
    +import org.apache.carbondata.core.util.DataTypeUtil;
    +
    +public abstract class AbstractNonDictionaryVectorFiller {
    +
    +  protected int lengthSize;
    +  protected int numberOfRows;
    +
    +  public AbstractNonDictionaryVectorFiller(int lengthSize, int numberOfRows) {
    +    this.lengthSize = lengthSize;
    +    this.numberOfRows = numberOfRows;
    +  }
    +
    +  public abstract void fillVector(byte[] data, CarbonColumnVector vector, ByteBuffer buffer);
    +
    +  public int getLengthFromBuffer(ByteBuffer buffer) {
    +    return buffer.getShort();
    +  }
    +}
    +
    +class NonDictionaryVectorFillerFactory {
    +
    +  public static AbstractNonDictionaryVectorFiller getVectorFiller(DataType type, int lengthSize,
    +      int numberOfRows) {
    +    if (type == DataTypes.STRING || type == DataTypes.VARCHAR) {
    +      if (lengthSize == 2) {
    +        return new StringVectorFiller(lengthSize, numberOfRows);
    +      } else {
    +        return new LongStringVectorFiller(lengthSize, numberOfRows);
    +      }
    +    } else if (type == DataTypes.TIMESTAMP) {
    +      return new TimeStampVectorFiller(lengthSize, numberOfRows);
    +    } else if (type == DataTypes.BOOLEAN) {
    +      return new BooleanVectorFiller(lengthSize, numberOfRows);
    +    } else if (type == DataTypes.SHORT) {
    +      return new ShortVectorFiller(lengthSize, numberOfRows);
    +    } else if (type == DataTypes.INT) {
    +      return new IntVectorFiller(lengthSize, numberOfRows);
    +    } else if (type == DataTypes.LONG) {
    +      return new LongStringVectorFiller(lengthSize, numberOfRows);
    +    }
    +    return new StringVectorFiller(lengthSize, numberOfRows);
    --- End diff --
   
     Will it use `StringVectorFiller` by default rather than throwing an exception? Is this intended?
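
    The fail-fast alternative the reviewer hints at could be sketched as below. All names are illustrative, not CarbonData's; the point is only that throwing on an unhandled type surfaces a missing branch immediately, while silently defaulting to a string filler could produce wrongly decoded data at read time.

    ```java
    // Hypothetical fail-fast dispatch: unknown types raise instead of
    // silently falling back to the string filler.
    class FillerDispatch {
      static String fillerFor(String type) {
        switch (type) {
          case "STRING": return "StringVectorFiller";
          case "INT":    return "IntVectorFiller";
          case "LONG":   return "LongVectorFiller";
          default:
            // A missing case fails loudly here rather than corrupting results later.
            throw new UnsupportedOperationException("No vector filler for type: " + type);
        }
      }
    }
    ```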


---

[GitHub] carbondata pull request #2823: [CARBONDATA-3015] Support Lazy load in carbon...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user xuchuanyin commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2823#discussion_r226550873
 
    --- Diff: core/src/main/java/org/apache/carbondata/core/datastore/page/encoding/adaptive/AdaptiveDeltaIntegralCodec.java ---
    @@ -272,5 +295,169 @@ public double decodeDouble(double value) {
           // this codec is for integer type only
           throw new RuntimeException("internal error");
         }
    +
    +    @Override
    +    public void decodeAndFillVector(ColumnPage columnPage, ColumnVectorInfo vectorInfo) {
    +      CarbonColumnVector vector = vectorInfo.vector;
    +      BitSet nullBits = columnPage.getNullBits();
    +      DataType dataType = vector.getType();
    +      DataType type = columnPage.getDataType();
    +      int pageSize = columnPage.getPageSize();
    +      BitSet deletedRows = vectorInfo.deletedRows;
    +      vector = ColumnarVectorWrapperDirectFactory
    +          .getDirectVectorWrapperFactory(vector, vectorInfo.invertedIndex, nullBits, deletedRows);
    +      fillVector(columnPage, vector, dataType, type, pageSize, vectorInfo);
    +      if (deletedRows == null || deletedRows.isEmpty()) {
    +        for (int i = nullBits.nextSetBit(0); i >= 0; i = nullBits.nextSetBit(i + 1)) {
    +          vector.putNull(i);
    +        }
    +      }
    +      if (vector instanceof ConvertableVector) {
    +        ((ConvertableVector) vector).convert();
    +      }
    +    }
    +
    +    private void fillVector(ColumnPage columnPage, CarbonColumnVector vector, DataType dataType,
    +        DataType type, int pageSize, ColumnVectorInfo vectorInfo) {
    +      if (type == DataTypes.BOOLEAN || type == DataTypes.BYTE) {
    +        byte[] byteData = columnPage.getByteData();
    +        if (dataType == DataTypes.SHORT) {
    +          for (int i = 0; i < pageSize; i++) {
    +            vector.putShort(i, (short) (max - byteData[i]));
    +          }
    +        } else if (dataType == DataTypes.INT) {
    +          for (int i = 0; i < pageSize; i++) {
    +            vector.putInt(i, (int) (max - byteData[i]));
    +          }
    +        } else if (dataType == DataTypes.LONG) {
    +          for (int i = 0; i < pageSize; i++) {
    +            vector.putLong(i, (max - byteData[i]));
    +          }
    +        } else if (dataType == DataTypes.TIMESTAMP) {
    +          for (int i = 0; i < pageSize; i++) {
    +            vector.putLong(i, (max - byteData[i]) * 1000);
    +          }
    +        } else if (dataType == DataTypes.BOOLEAN) {
    +          for (int i = 0; i < pageSize; i++) {
    +            vector.putByte(i, (byte) (max - byteData[i]));
    +          }
    +        } else if (DataTypes.isDecimal(dataType)) {
    +          DecimalConverterFactory.DecimalConverter decimalConverter = vectorInfo.decimalConverter;
    +          int precision = vectorInfo.measure.getMeasure().getPrecision();
    +          for (int i = 0; i < pageSize; i++) {
    +            BigDecimal decimal = decimalConverter.getDecimal(max - byteData[i]);
    +            vector.putDecimal(i, decimal, precision);
    +          }
    +        } else {
    +          for (int i = 0; i < pageSize; i++) {
    +            vector.putDouble(i, (max - byteData[i]));
    +          }
    +        }
    +      } else if (type == DataTypes.SHORT) {
    +        short[] shortData = columnPage.getShortData();
    +        if (dataType == DataTypes.SHORT) {
    +          for (int i = 0; i < pageSize; i++) {
    +            vector.putShort(i, (short) (max - shortData[i]));
    +          }
    +        } else if (dataType == DataTypes.INT) {
    +          for (int i = 0; i < pageSize; i++) {
    +            vector.putInt(i, (int) (max - shortData[i]));
    +          }
    +        } else if (dataType == DataTypes.LONG) {
    +          for (int i = 0; i < pageSize; i++) {
    +            vector.putLong(i, (max - shortData[i]));
    +          }
    +        } else if (dataType == DataTypes.TIMESTAMP) {
    +          for (int i = 0; i < pageSize; i++) {
    +            vector.putLong(i, (max - shortData[i]) * 1000);
    +          }
    +        } else if (DataTypes.isDecimal(dataType)) {
    +          DecimalConverterFactory.DecimalConverter decimalConverter = vectorInfo.decimalConverter;
    +          int precision = vectorInfo.measure.getMeasure().getPrecision();
    +          for (int i = 0; i < pageSize; i++) {
    +            BigDecimal decimal = decimalConverter.getDecimal(max - shortData[i]);
    +            vector.putDecimal(i, decimal, precision);
    +          }
    +        } else {
    +          for (int i = 0; i < pageSize; i++) {
    +            vector.putDouble(i, (max - shortData[i]));
    +          }
    +        }
    +
    +      } else if (type == DataTypes.SHORT_INT) {
    +        int[] shortIntData = columnPage.getShortIntData();
    +        if (dataType == DataTypes.INT) {
    +          for (int i = 0; i < pageSize; i++) {
    +            vector.putInt(i, (int) (max - shortIntData[i]));
    +          }
    +        } else if (dataType == DataTypes.LONG) {
    +          for (int i = 0; i < pageSize; i++) {
    +            vector.putLong(i, (max - shortIntData[i]));
    +          }
    +        } else if (dataType == DataTypes.TIMESTAMP) {
    +          for (int i = 0; i < pageSize; i++) {
    +            vector.putLong(i, (max - shortIntData[i]) * 1000);
    +          }
    +        } else if (DataTypes.isDecimal(dataType)) {
    +          DecimalConverterFactory.DecimalConverter decimalConverter = vectorInfo.decimalConverter;
    +          int precision = vectorInfo.measure.getMeasure().getPrecision();
    +          for (int i = 0; i < pageSize; i++) {
    +            BigDecimal decimal = decimalConverter.getDecimal(max - shortIntData[i]);
    +            vector.putDecimal(i, decimal, precision);
    +          }
    +        } else {
    +          for (int i = 0; i < pageSize; i++) {
    +            vector.putDouble(i, (max - shortIntData[i]));
    +          }
    +        }
    +      } else if (type == DataTypes.INT) {
    +        int[] intData = columnPage.getIntData();
    +        if (dataType == DataTypes.INT) {
    +          for (int i = 0; i < pageSize; i++) {
    +            vector.putInt(i, (int) (max - intData[i]));
    +          }
    +        } else if (dataType == DataTypes.LONG) {
    +          for (int i = 0; i < pageSize; i++) {
    +            vector.putLong(i, (max - intData[i]));
    +          }
    +        } else if (dataType == DataTypes.TIMESTAMP) {
    +          for (int i = 0; i < pageSize; i++) {
    +            vector.putLong(i, (max - intData[i]) * 1000);
    +          }
    +        } else if (DataTypes.isDecimal(dataType)) {
    +          DecimalConverterFactory.DecimalConverter decimalConverter = vectorInfo.decimalConverter;
    +          int precision = vectorInfo.measure.getMeasure().getPrecision();
    +          for (int i = 0; i < pageSize; i++) {
    +            BigDecimal decimal = decimalConverter.getDecimal(max - intData[i]);
    +            vector.putDecimal(i, decimal, precision);
    +          }
    +        } else {
    +          for (int i = 0; i < pageSize; i++) {
    +            vector.putDouble(i, (max - intData[i]));
    +          }
    +        }
    +      } else if (type == DataTypes.LONG) {
    +        long[] longData = columnPage.getLongData();
    +        if (dataType == DataTypes.LONG) {
    +          for (int i = 0; i < pageSize; i++) {
    +            vector.putLong(i, (max - longData[i]));
    +          }
    +        } else if (dataType == DataTypes.TIMESTAMP) {
    +          for (int i = 0; i < pageSize; i++) {
    +            vector.putLong(i, (max - longData[i]) * 1000);
    +          }
    +        } else if (DataTypes.isDecimal(dataType)) {
    +          DecimalConverterFactory.DecimalConverter decimalConverter = vectorInfo.decimalConverter;
    +          int precision = vectorInfo.measure.getMeasure().getPrecision();
    +          for (int i = 0; i < pageSize; i++) {
    +            BigDecimal decimal = decimalConverter.getDecimal(max - longData[i]);
    +            vector.putDecimal(i, decimal, precision);
    +          }
    +        }
    +      } else {
    +        throw new RuntimeException("internal error: " + this.toString());
    --- End diff --
   
    The exception message should be made more descriptive so the failure is easier to understand.
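
The decode loops above all follow one pattern: the page stores each value as a delta from the page maximum in the smallest type that fits (byte, short, short-int, int, or long), so filling the vector reconstructs `max - stored`, with timestamps additionally scaled from millis to micros. A minimal standalone sketch of that idea, with illustrative names that are not CarbonData APIs:

```java
// Hypothetical sketch of the inverse-delta decode used in fillVector above:
// values are persisted as (max - value) in a narrow type, so reading back
// computes max - stored for each row.
public class InverseDeltaDecode {

  // decode a page of byte-encoded values back to long
  static long[] decodeToLong(long max, byte[] stored) {
    long[] out = new long[stored.length];
    for (int i = 0; i < stored.length; i++) {
      out[i] = max - stored[i];
    }
    return out;
  }

  // timestamps additionally scale the stored millis to micros, as in the diff
  static long decodeTimestamp(long max, byte stored) {
    return (max - stored) * 1000L;
  }

  public static void main(String[] args) {
    long[] decoded = decodeToLong(100, new byte[] {0, 1, 2});
    System.out.println(decoded[0] + "," + decoded[1] + "," + decoded[2]); // 100,99,98
    System.out.println(decodeTimestamp(1000, (byte) 0)); // 1000000
  }
}
```

Storing deltas from the max is what lets a page of longs fit in a byte array when its value range is small; the per-type branches exist only to widen back to the target vector type.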


---

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ravipesala commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2823#discussion_r226867258
 
    --- Diff: core/src/main/java/org/apache/carbondata/core/datastore/chunk/store/impl/safe/SafeVariableLengthDimensionDataChunkStore.java ---
    @@ -91,6 +95,31 @@ public void putArray(final int[] invertedIndex, final int[] invertedIndexReverse
         }
       }
     
    +  @Override
    +  public void fillVector(int[] invertedIndex, int[] invertedIndexReverse, byte[] data,
    +      ColumnVectorInfo vectorInfo) {
    +    this.invertedIndexReverse = invertedIndex;
    +
    +    // the first position starts from byte 2, as the length is stored first in the memory block
    --- End diff --
   
    ok


---

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ravipesala commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2823#discussion_r226867290
 
    --- Diff: core/src/main/java/org/apache/carbondata/core/datastore/chunk/store/impl/safe/AbstractNonDictionaryVectorFiller.java ---
    @@ -0,0 +1,274 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.carbondata.core.datastore.chunk.store.impl.safe;
    +
    +import java.nio.ByteBuffer;
    +
    +import org.apache.carbondata.core.constants.CarbonCommonConstants;
    +import org.apache.carbondata.core.metadata.datatype.DataType;
    +import org.apache.carbondata.core.metadata.datatype.DataTypes;
    +import org.apache.carbondata.core.scan.result.vector.CarbonColumnVector;
    +import org.apache.carbondata.core.util.ByteUtil;
    +import org.apache.carbondata.core.util.DataTypeUtil;
    +
    +public abstract class AbstractNonDictionaryVectorFiller {
    +
    +  protected int lengthSize;
    +  protected int numberOfRows;
    +
    +  public AbstractNonDictionaryVectorFiller(int lengthSize, int numberOfRows) {
    +    this.lengthSize = lengthSize;
    +    this.numberOfRows = numberOfRows;
    +  }
    +
    +  public abstract void fillVector(byte[] data, CarbonColumnVector vector, ByteBuffer buffer);
    +
    +  public int getLengthFromBuffer(ByteBuffer buffer) {
    +    return buffer.getShort();
    +  }
    +}
    +
    +class NonDictionaryVectorFillerFactory {
    +
    +  public static AbstractNonDictionaryVectorFiller getVectorFiller(DataType type, int lengthSize,
    +      int numberOfRows) {
    +    if (type == DataTypes.STRING || type == DataTypes.VARCHAR) {
    +      if (lengthSize == 2) {
    --- End diff --
   
    ok
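
For context on the factory above: variable-length (non-dictionary) values are laid out as a length prefix followed by the value bytes, and `getLengthFromBuffer` reads a 2-byte prefix, which is why the factory branches on `lengthSize`. A minimal sketch of walking that layout, with illustrative names that are not the CarbonData implementation:

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: values are stored as [short length][bytes]..., so a
// filler reads a 2-byte length and then that many data bytes per row.
public class LengthPrefixedReader {

  static List<String> readAll(ByteBuffer buffer, int numberOfRows) {
    List<String> rows = new ArrayList<>(numberOfRows);
    for (int i = 0; i < numberOfRows; i++) {
      int length = buffer.getShort();    // 2-byte length prefix
      byte[] value = new byte[length];
      buffer.get(value);                 // value bytes follow the prefix
      rows.add(new String(value, StandardCharsets.UTF_8));
    }
    return rows;
  }

  public static void main(String[] args) {
    // build a buffer holding "ab" and "xyz" in length-prefixed form
    ByteBuffer buffer = ByteBuffer.allocate(16);
    buffer.putShort((short) 2).put("ab".getBytes(StandardCharsets.UTF_8));
    buffer.putShort((short) 3).put("xyz".getBytes(StandardCharsets.UTF_8));
    buffer.flip();
    System.out.println(readAll(buffer, 2)); // [ab, xyz]
  }
}
```

A factory keyed on `lengthSize` (as in the diff) would swap the `getShort` call for a wider read when values can exceed the 2-byte length range, e.g. for VARCHAR.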


---