[GitHub] carbondata pull request #2454: [WIP] [CARBONDATA-2701] Refactor code to stor...


qiuchenjian-2
GitHub user manishgupta88 opened a pull request:

    https://github.com/apache/carbondata/pull/2454

    [WIP] [CARBONDATA-2701] Refactor code to store minimal required info in Block and Blocklet Cache

    Changes in this PR:
    1. Refactored the code to keep only the minimal required information in the block and blocklet cache.
    2. Introduced a segment properties holder at JVM level to hold the segment properties. Because a segment properties object is heavy, a new one is created only when the schema or cardinality of a table changes.
    This PR depends on PR #2437.
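
A minimal sketch of the idea in point 2, assuming the holder is keyed by table, column schema, and cardinality, and that cache entries keep only a small integer index (as the second commit message suggests). Every name here (`SegmentPropertiesHolderSketch`, `Key`, `HeavySegmentProperties`, `register`) is hypothetical and not taken from the PR; column schemas are reduced to plain strings to keep the example self-contained.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: class and method names are illustrative, not taken from the PR.
final class SegmentPropertiesHolderSketch {

  /** Identity of a segment properties object: table, column schema, and cardinality. */
  static final class Key {
    private final String tableName;
    private final List<String> columnNames; // simplified stand-in for List<ColumnSchema>
    private final int[] cardinality;

    Key(String tableName, List<String> columnNames, int[] cardinality) {
      this.tableName = tableName;
      this.columnNames = Collections.unmodifiableList(new ArrayList<>(columnNames));
      this.cardinality = cardinality.clone();
    }

    @Override
    public boolean equals(Object other) {
      if (this == other) {
        return true;
      }
      if (!(other instanceof Key)) {
        return false;
      }
      Key that = (Key) other;
      return tableName.equals(that.tableName)
          && columnNames.equals(that.columnNames)
          && Arrays.equals(cardinality, that.cardinality);
    }

    @Override
    public int hashCode() {
      return 31 * (31 * tableName.hashCode() + columnNames.hashCode())
          + Arrays.hashCode(cardinality);
    }
  }

  /** Stand-in for the heavy object that should exist once per distinct key. */
  static final class HeavySegmentProperties {
    final Key key;

    HeavySegmentProperties(Key key) {
      this.key = key;
    }
  }

  private final ConcurrentHashMap<Key, Integer> indexByKey = new ConcurrentHashMap<>();
  private final ConcurrentHashMap<Integer, HeavySegmentProperties> propertiesByIndex =
      new ConcurrentHashMap<>();
  private final AtomicInteger nextIndex = new AtomicInteger();

  /** Returns the index a cache entry would store; builds the heavy object only for a new key. */
  int register(String tableName, List<String> columnNames, int[] cardinality) {
    Key key = new Key(tableName, columnNames, cardinality);
    return indexByKey.computeIfAbsent(key, k -> {
      int index = nextIndex.getAndIncrement();
      propertiesByIndex.put(index, new HeavySegmentProperties(k));
      return index;
    });
  }

  /** Resolves a stored index back to the shared heavy object. */
  HeavySegmentProperties get(int index) {
    return propertiesByIndex.get(index);
  }

  /** Number of distinct heavy objects currently held. */
  int size() {
    return propertiesByIndex.size();
  }
}
```

Under these assumptions, two segments with the same schema and cardinality receive the same index and share one heavy object, while a change in either produces a new entry.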
   
     - [ ] Any interfaces changed?
     No
     - [ ] Any backward compatibility impacted?
     NA
     - [ ] Document update required?
     No
     - [ ] Testing done
     Yes
     - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.
     NA


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/manishgupta88/carbondata refactor_segmentproperties

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/carbondata/pull/2454.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #2454
   
----
commit c06de06046da4efe6dc606f410686dcea256d46f
Author: manishgupta88 <tomanishgupta18@...>
Date:   2018-06-25T06:43:00Z

    segregate block and blocklet cache

commit a5017751f45a43ce75a98610214049e1c894e1e7
Author: manishgupta88 <tomanishgupta18@...>
Date:   2018-07-04T15:30:54Z

    Refactor Block and Blocklet DataMap to store only segmentProperties Index instead of segmentProperties

----


---

[GitHub] carbondata issue #2454: [WIP] [CARBONDATA-2701] Refactor code to store minim...

qiuchenjian-2
Github user ravipesala commented on the issue:

    https://github.com/apache/carbondata/pull/2454
 
    SDV Build Fail, Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/5637/



---

[GitHub] carbondata issue #2454: [WIP] [CARBONDATA-2701] Refactor code to store minim...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ravipesala commented on the issue:

    https://github.com/apache/carbondata/pull/2454
 
    SDV Build Fail, Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/5638/



---

[GitHub] carbondata issue #2454: [WIP] [CARBONDATA-2701] Refactor code to store minim...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2454
 
    Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/6843/



---

[GitHub] carbondata issue #2454: [WIP] [CARBONDATA-2701] Refactor code to store minim...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2454
 
    Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/6845/



---

[GitHub] carbondata issue #2454: [WIP] [CARBONDATA-2701] Refactor code to store minim...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2454
 
    Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/5636/



---

[GitHub] carbondata issue #2454: [CARBONDATA-2701] Refactor code to store minimal req...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2454
 
    Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/6890/



---

[GitHub] carbondata issue #2454: [CARBONDATA-2701] Refactor code to store minimal req...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2454
 
    Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/5670/



---

[GitHub] carbondata issue #2454: [CARBONDATA-2701] Refactor code to store minimal req...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2454
 
    Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/6927/



---

[GitHub] carbondata issue #2454: [CARBONDATA-2701] Refactor code to store minimal req...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2454
 
    Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/5712/



---

[GitHub] carbondata issue #2454: [CARBONDATA-2701] Refactor code to store minimal req...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ravipesala commented on the issue:

    https://github.com/apache/carbondata/pull/2454
 
    SDV Build Fail, Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/5700/



---

[GitHub] carbondata issue #2454: [CARBONDATA-2701] Refactor code to store minimal req...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ravipesala commented on the issue:

    https://github.com/apache/carbondata/pull/2454
 
    SDV Build Fail, Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/5701/



---

[GitHub] carbondata issue #2454: [CARBONDATA-2701] Refactor code to store minimal req...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2454
 
    Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/6928/



---

[GitHub] carbondata issue #2454: [CARBONDATA-2701] Refactor code to store minimal req...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2454
 
    Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/5713/



---

[GitHub] carbondata pull request #2454: [CARBONDATA-2701] Refactor code to store mini...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user kumarvishal09 commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2454#discussion_r200870385
 
    --- Diff: core/src/main/java/org/apache/carbondata/core/util/BlockletDataMapUtil.java ---
    @@ -321,4 +328,43 @@ private static boolean isSameColumnSchemaList(List<ColumnSchema> indexFileColumn
         }
         return updatedValues;
       }
    +
    +  /**
    +   * Convert schema to binary
    +   */
    +  public static byte[] convertSchemaToBinary(List<ColumnSchema> columnSchemas) throws IOException {
    +    ByteArrayOutputStream stream = new ByteArrayOutputStream();
    +    DataOutput dataOutput = new DataOutputStream(stream);
    +    dataOutput.writeShort(columnSchemas.size());
    +    for (ColumnSchema columnSchema : columnSchemas) {
    +      if (columnSchema.getColumnReferenceId() == null) {
    +        columnSchema.setColumnReferenceId(columnSchema.getColumnUniqueId());
    +      }
    +      columnSchema.write(dataOutput);
    +    }
    +    byte[] byteArray = stream.toByteArray();
    +    // Compress with snappy to reduce the size of schema
    +    return Snappy.rawCompress(byteArray, byteArray.length);
    --- End diff --
   
    Use the compressor factory here instead of calling Snappy directly.


---

[GitHub] carbondata pull request #2454: [CARBONDATA-2701] Refactor code to store mini...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user kumarvishal09 commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2454#discussion_r200870442
 
    --- Diff: core/src/main/java/org/apache/carbondata/core/util/BlockletDataMapUtil.java ---
    @@ -321,4 +328,43 @@ private static boolean isSameColumnSchemaList(List<ColumnSchema> indexFileColumn
         }
         return updatedValues;
       }
    +
    +  /**
    +   * Convert schema to binary
    +   */
    +  public static byte[] convertSchemaToBinary(List<ColumnSchema> columnSchemas) throws IOException {
    +    ByteArrayOutputStream stream = new ByteArrayOutputStream();
    +    DataOutput dataOutput = new DataOutputStream(stream);
    +    dataOutput.writeShort(columnSchemas.size());
    +    for (ColumnSchema columnSchema : columnSchemas) {
    +      if (columnSchema.getColumnReferenceId() == null) {
    +        columnSchema.setColumnReferenceId(columnSchema.getColumnUniqueId());
    +      }
    +      columnSchema.write(dataOutput);
    +    }
    +    byte[] byteArray = stream.toByteArray();
    +    // Compress with snappy to reduce the size of schema
    +    return Snappy.rawCompress(byteArray, byteArray.length);
    +  }
    +
    +  /**
    +   * Read column schema from binary
    +   *
    +   * @param schemaArray
    +   * @throws IOException
    +   */
    +  public static List<ColumnSchema> readColumnSchema(byte[] schemaArray) throws IOException {
    +    // uncompress it.
    +    schemaArray = Snappy.uncompress(schemaArray);
    --- End diff --
   
    Same as above: use the compressor factory.
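
The two review comments on `BlockletDataMapUtil` ask for compression to go through a factory rather than through direct `Snappy` calls. The thread does not show the API of CarbonData's compressor factory, so the following is only a sketch under stated assumptions: the `Compressor` interface, the `CompressorFactory` class, and their methods are hypothetical, DEFLATE from `java.util.zip` stands in for Snappy so the example needs no external library, and column schemas are reduced to plain strings.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.InflaterInputStream;

// Hypothetical sketch: the interface and factory below are NOT CarbonData's real API.
final class SchemaCodecSketch {

  /** Assumed abstraction that hides the concrete compression library from callers. */
  interface Compressor {
    byte[] compress(byte[] input) throws IOException;

    byte[] uncompress(byte[] input) throws IOException;
  }

  /** Stand-in implementation; a real project would plug Snappy in here instead. */
  static final class DeflateCompressor implements Compressor {
    @Override
    public byte[] compress(byte[] input) throws IOException {
      ByteArrayOutputStream out = new ByteArrayOutputStream();
      try (DeflaterOutputStream deflate = new DeflaterOutputStream(out)) {
        deflate.write(input);
      }
      return out.toByteArray();
    }

    @Override
    public byte[] uncompress(byte[] input) throws IOException {
      ByteArrayOutputStream out = new ByteArrayOutputStream();
      try (InflaterInputStream inflate =
          new InflaterInputStream(new ByteArrayInputStream(input))) {
        byte[] buffer = new byte[4096];
        int read;
        while ((read = inflate.read(buffer)) != -1) {
          out.write(buffer, 0, read);
        }
      }
      return out.toByteArray();
    }
  }

  /** Single place that decides which compressor the whole code base uses. */
  static final class CompressorFactory {
    private static final Compressor DEFAULT = new DeflateCompressor();

    static Compressor getCompressor() {
      return DEFAULT;
    }
  }

  /** Writes a short count followed by each column, then compresses via the factory. */
  static byte[] convertSchemaToBinary(List<String> columnNames) throws IOException {
    ByteArrayOutputStream stream = new ByteArrayOutputStream();
    DataOutputStream dataOutput = new DataOutputStream(stream);
    dataOutput.writeShort(columnNames.size());
    for (String name : columnNames) {
      dataOutput.writeUTF(name);
    }
    dataOutput.flush();
    return CompressorFactory.getCompressor().compress(stream.toByteArray());
  }

  /** Uncompresses via the same factory and reads the columns back in order. */
  static List<String> readColumnSchema(byte[] schemaArray) throws IOException {
    byte[] raw = CompressorFactory.getCompressor().uncompress(schemaArray);
    DataInputStream dataInput = new DataInputStream(new ByteArrayInputStream(raw));
    int count = dataInput.readShort();
    List<String> columnNames = new ArrayList<>(count);
    for (int i = 0; i < count; i++) {
      columnNames.add(dataInput.readUTF());
    }
    return columnNames;
  }
}
```

With this shape, the write and read paths cannot drift apart on the compression codec, and switching codecs would touch only the factory.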


---

[GitHub] carbondata pull request #2454: [CARBONDATA-2701] Refactor code to store mini...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user kumarvishal09 commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2454#discussion_r200870697
 
    --- Diff: core/src/main/java/org/apache/carbondata/core/indexstore/blockletindex/BlockletDataMapFactory.java ---
    @@ -17,7 +17,11 @@
     package org.apache.carbondata.core.indexstore.blockletindex;
     
     import java.io.IOException;
    -import java.util.*;
    +import java.util.ArrayList;
    --- End diff --
   
    Remove unnecessary changes


---

[GitHub] carbondata issue #2454: [CARBONDATA-2701] Refactor code to store minimal req...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2454
 
    Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/5718/



---

[GitHub] carbondata issue #2454: [CARBONDATA-2701] Refactor code to store minimal req...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2454
 
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/6934/



---

[GitHub] carbondata issue #2454: [CARBONDATA-2701] Refactor code to store minimal req...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ravipesala commented on the issue:

    https://github.com/apache/carbondata/pull/2454
 
    SDV Build Fail, Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/5706/



---