[GitHub] carbondata pull request #2706: [WIP] multiple issue fixes for varchar column...

classic Classic list List threaded Threaded
57 messages Options
123
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #2706: [CARBONDATA-2927] multiple issue fixes for va...

qiuchenjian-2
Github user ravipesala commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2706#discussion_r217048144
 
    --- Diff: processing/src/main/java/org/apache/carbondata/processing/loading/sort/SortStepRowHandler.java ---
    @@ -570,23 +589,31 @@ public int writeRawRowAsIntermediateSortTempRowToUnsafeMemory(Object[] row,
       private void packNoSortFieldsToBytes(Object[] row, ByteBuffer rowBuffer) {
         // convert dict & no-sort
         for (int idx = 0; idx < this.dictNoSortDimCnt; idx++) {
    +      // cannot exceed default 2MB, hence no need to call ensureArraySize
           rowBuffer.putInt((int) row[this.dictNoSortDimIdx[idx]]);
         }
         // convert no-dict & no-sort
         for (int idx = 0; idx < this.noDictNoSortDimCnt; idx++) {
           byte[] bytes = (byte[]) row[this.noDictNoSortDimIdx[idx]];
    +      // cannot exceed default 2MB, hence no need to call ensureArraySize
    --- End diff --
   
    2 MB is not enough for varchar and complex complex columns.


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #2706: [CARBONDATA-2927] multiple issue fixes for va...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ravipesala commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2706#discussion_r217048602
 
    --- Diff: processing/src/main/java/org/apache/carbondata/processing/loading/sort/unsafe/UnsafeCarbonRowPage.java ---
    @@ -59,12 +60,11 @@ public UnsafeCarbonRowPage(TableFieldStat tableFieldStat, MemoryBlock memoryBloc
         this.taskId = taskId;
         buffer = new IntPointerBuffer(this.taskId);
         this.dataBlock = memoryBlock;
    -    // TODO Only using 98% of space for safe side.May be we can have different logic.
    -    sizeToBeUsed = dataBlock.size() - (dataBlock.size() * 5) / 100;
    +    sizeToBeUsed = dataBlock.size();
    --- End diff --
   
    Please keep back this check, it is for reserving memory for a row in case it exceeds


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #2706: [CARBONDATA-2927] multiple issue fixes for va...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ravipesala commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2706#discussion_r217049111
 
    --- Diff: processing/src/main/java/org/apache/carbondata/processing/loading/sort/unsafe/UnsafeSortDataRows.java ---
    @@ -94,21 +94,30 @@
     
       private final long taskId;
     
    -  public UnsafeSortDataRows(SortParameters parameters,
    -      UnsafeIntermediateMerger unsafeInMemoryIntermediateFileMerger, int inMemoryChunkSize) {
    -    this.parameters = parameters;
    -    this.tableFieldStat = new TableFieldStat(parameters);
    -    this.rowBuffer = new ThreadLocal<ByteBuffer>() {
    +  public static ThreadLocal<ByteBuffer> getRowBuffer() {
    +    return rowBuffer;
    +  }
    +
    +  static  {
    +    rowBuffer = new ThreadLocal<ByteBuffer>() {
    --- End diff --
   
    Please try using DataOutSteam backed by ByteOutPutStream, it can expand dynamically.


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2706: [CARBONDATA-2927] multiple issue fixes for varchar c...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2706
 
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/307/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2706: [CARBONDATA-2927] multiple issue fixes for varchar c...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2706
 
    Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/483/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2706: [CARBONDATA-2927] multiple issue fixes for varchar c...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2706
 
    Build Failed  with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/8553/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #2706: [CARBONDATA-2927] multiple issue fixes for va...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user xuchuanyin commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2706#discussion_r217987255
 
    --- Diff: processing/src/main/java/org/apache/carbondata/processing/store/CarbonFactDataHandlerColumnar.java ---
    @@ -239,6 +239,7 @@ public void addDataToStore(CarbonRow row) throws CarbonDataWriterException {
        * @return false if any varchar column page cannot add one more value(2MB)
        */
       private boolean isVarcharColumnFull(CarbonRow row) {
    +    //TODO: test and remove this as now  UnsafeSortDataRows can exceed 2MB
    --- End diff --
   
    This is for a column page cannot exceed 2GB (actually it is 1.67GB since snappy cannot compress a bigger size in one run), so there is no need to add a comment here


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2706: [CARBONDATA-2927] multiple issue fixes for varchar c...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2706
 
    Build Failed  with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/308/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2706: [CARBONDATA-2927] multiple issue fixes for varchar c...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2706
 
    Build Failed with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/484/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2706: [CARBONDATA-2927] multiple issue fixes for varchar c...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2706
 
    Build Failed  with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/8554/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2706: [CARBONDATA-2927] multiple issue fixes for varchar c...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2706
 
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/309/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #2706: [CARBONDATA-2927] multiple issue fixes for va...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ravipesala commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2706#discussion_r218038298
 
    --- Diff: core/src/main/java/org/apache/carbondata/core/util/ReUsableByteArrayDataOutputStream.java ---
    @@ -0,0 +1,47 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.carbondata.core.util;
    +
    +import java.io.ByteArrayOutputStream;
    +import java.io.DataOutputStream;
    +
    +/**
    + * wrapper around DataOutputStream. Which clears the buffer for reuse
    + */
    +public final class ReUsableByteArrayDataOutputStream extends DataOutputStream {
    +
    +  private ByteArrayOutputStream outputStream;
    +
    +  public ReUsableByteArrayDataOutputStream(ByteArrayOutputStream outputStream) {
    +    super(outputStream);
    +    this.outputStream = outputStream;
    +  }
    +
    +  public void resetByteArrayOutputStream() {
    --- End diff --
   
    Just change name to reset


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #2706: [CARBONDATA-2927] multiple issue fixes for va...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ravipesala commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2706#discussion_r218038401
 
    --- Diff: core/src/main/java/org/apache/carbondata/core/util/ReUsableByteArrayDataOutputStream.java ---
    @@ -0,0 +1,47 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.carbondata.core.util;
    +
    +import java.io.ByteArrayOutputStream;
    +import java.io.DataOutputStream;
    +
    +/**
    + * wrapper around DataOutputStream. Which clears the buffer for reuse
    + */
    +public final class ReUsableByteArrayDataOutputStream extends DataOutputStream {
    +
    +  private ByteArrayOutputStream outputStream;
    +
    +  public ReUsableByteArrayDataOutputStream(ByteArrayOutputStream outputStream) {
    +    super(outputStream);
    +    this.outputStream = outputStream;
    +  }
    +
    +  public void resetByteArrayOutputStream() {
    +    outputStream.reset();
    +  }
    +
    +  public int getByteArrayOutputStreamSize() {
    --- End diff --
   
    Just change name to getSize


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #2706: [CARBONDATA-2927] multiple issue fixes for va...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ajantha-bhat commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2706#discussion_r218047291
 
    --- Diff: core/src/main/java/org/apache/carbondata/core/util/ReUsableByteArrayDataOutputStream.java ---
    @@ -0,0 +1,47 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.carbondata.core.util;
    +
    +import java.io.ByteArrayOutputStream;
    +import java.io.DataOutputStream;
    +
    +/**
    + * wrapper around DataOutputStream. Which clears the buffer for reuse
    + */
    +public final class ReUsableByteArrayDataOutputStream extends DataOutputStream {
    +
    +  private ByteArrayOutputStream outputStream;
    +
    +  public ReUsableByteArrayDataOutputStream(ByteArrayOutputStream outputStream) {
    +    super(outputStream);
    +    this.outputStream = outputStream;
    +  }
    +
    +  public void resetByteArrayOutputStream() {
    --- End diff --
   
    done


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #2706: [CARBONDATA-2927] multiple issue fixes for va...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ajantha-bhat commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2706#discussion_r218047331
 
    --- Diff: core/src/main/java/org/apache/carbondata/core/util/ReUsableByteArrayDataOutputStream.java ---
    @@ -0,0 +1,47 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.carbondata.core.util;
    +
    +import java.io.ByteArrayOutputStream;
    +import java.io.DataOutputStream;
    +
    +/**
    + * wrapper around DataOutputStream. Which clears the buffer for reuse
    + */
    +public final class ReUsableByteArrayDataOutputStream extends DataOutputStream {
    +
    +  private ByteArrayOutputStream outputStream;
    +
    +  public ReUsableByteArrayDataOutputStream(ByteArrayOutputStream outputStream) {
    +    super(outputStream);
    +    this.outputStream = outputStream;
    +  }
    +
    +  public void resetByteArrayOutputStream() {
    +    outputStream.reset();
    +  }
    +
    +  public int getByteArrayOutputStreamSize() {
    --- End diff --
   
    done


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2706: [CARBONDATA-2927] multiple issue fixes for varchar c...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2706
 
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/312/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2706: [CARBONDATA-2927] multiple issue fixes for varchar c...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2706
 
    Build Success with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/8558/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2706: [CARBONDATA-2927] multiple issue fixes for varchar c...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2706
 
    Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/488/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2706: [CARBONDATA-2927] multiple issue fixes for varchar c...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ajantha-bhat commented on the issue:

    https://github.com/apache/carbondata/pull/2706
 
    @ravipesala : PR is ready. Please check.


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #2706: [CARBONDATA-2927] multiple issue fixes for va...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ajantha-bhat commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2706#discussion_r218403232
 
    --- Diff: processing/src/main/java/org/apache/carbondata/processing/store/CarbonFactDataHandlerColumnar.java ---
    @@ -239,6 +239,7 @@ public void addDataToStore(CarbonRow row) throws CarbonDataWriterException {
        * @return false if any varchar column page cannot add one more value(2MB)
        */
       private boolean isVarcharColumnFull(CarbonRow row) {
    +    //TODO: test and remove this as now  UnsafeSortDataRows can exceed 2MB
    --- End diff --
   
    complex column also can grow very big (so checking only for var char is not good)
    Also now columns grow more than 2MB. so, we need to modify this check.
    This can be handled in separate PR.
   
    now no impact from this method as "if 2MB itself space not there, more than 2MB space will never be there". so functionality remains same.



---
123