[GitHub] carbondata pull request #1377: [WIP] Fixed refresh of segments in datamap fo...

classic Classic list List threaded Threaded
21 messages Options
12
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #1377: [WIP] Fixed refresh of segments in datamap fo...

qiuchenjian-2
GitHub user ravipesala opened a pull request:

    https://github.com/apache/carbondata/pull/1377

    [WIP] Fixed refresh of segments in datamap for update and partition

    Currently datamap scans complete segment every time for query execution to get all the carbonindex files. It will be slow when data/number of files are big.
    So this PR caches the content of segment and refreshes when any updation or store changes.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/ravipesala/incubator-carbondata datamap-refresh

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/carbondata/pull/1377.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1377
   
----
commit 1f0e834353946dc53ed2f5686e223951fd8c9707
Author: Ravindra Pesala <[hidden email]>
Date:   2017-09-21T13:33:06Z

    Fixed refersh of segments in datamap for update and partition

----


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #1377: [WIP] Fixed refresh of segments in datamap for updat...

qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1377
 
    Build Success with Spark 1.6, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/139/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #1377: [WIP] Fixed refresh of segments in datamap for updat...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1377
 
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/263/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #1377: [WIP] Fixed refresh of segments in datamap for updat...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ravipesala commented on the issue:

    https://github.com/apache/carbondata/pull/1377
 
    SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/894/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #1377: [CARBONDATA-1504] Fixed refresh of segments i...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user jackylk commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1377#discussion_r140658131
 
    --- Diff: integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/datamap/DataMapWriterSuite.scala ---
    @@ -41,9 +41,9 @@ class C2DataMapFactory() extends DataMapFactory {
     
       override def fireEvent(event: ChangeEvent[_]): Unit = ???
     
    -  override def clear(segmentId: String): Unit = ???
    +  override def clear(segmentId: String): Unit = {}
    --- End diff --
   
    why change this?


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #1377: [CARBONDATA-1504] Fixed refresh of segments i...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user jackylk commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1377#discussion_r141355777
 
    --- Diff: core/src/main/java/org/apache/carbondata/core/datamap/DataMapStoreManager.java ---
    @@ -87,6 +93,7 @@ public TableDataMap getDataMap(AbsoluteTableIdentifier identifier,
       public TableDataMap createAndRegisterDataMap(AbsoluteTableIdentifier identifier,
           String factoryClassName, String dataMapName) {
         String table = identifier.uniqueName();
    +    getTableSegmentRefresher(identifier);
    --- End diff --
   
    ignoring the return object?


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #1377: [CARBONDATA-1504] Fixed refresh of segments i...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user jackylk commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1377#discussion_r141356113
 
    --- Diff: core/src/main/java/org/apache/carbondata/core/datamap/DataMapStoreManager.java ---
    @@ -126,20 +133,20 @@ private TableDataMap getTableDataMap(String dataMapName,
       /**
        * Clear the datamap/datamaps of a mentioned datamap name and table from memory
    --- End diff --
   
    please modify the comment also, datamap name is removed


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #1377: [CARBONDATA-1504] Fixed refresh of segments i...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user jackylk commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1377#discussion_r141356751
 
    --- Diff: core/src/main/java/org/apache/carbondata/core/datamap/DataMapStoreManager.java ---
    @@ -151,4 +158,61 @@ public static DataMapStoreManager getInstance() {
         return instance;
       }
     
    +  public TableSegmentRefresher getTableSegmentRefresher(AbsoluteTableIdentifier identifier) {
    +    String uniqueName = identifier.uniqueName();
    +    if (segmentRefreshMap.get(uniqueName) == null) {
    +      segmentRefreshMap.put(uniqueName, new TableSegmentRefresher(identifier));
    +    }
    +    return segmentRefreshMap.get(uniqueName);
    +  }
    +
    +  /**
    +   * Keep track of the segment refresh time.
    +   */
    +  public static class TableSegmentRefresher {
    +
    +    private Map<String, Long> segmentRefreshTime = new HashMap<>();
    --- End diff --
   
    add comment to describe the content, like map segmentId to last updated timestamp


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #1377: [CARBONDATA-1504] Fixed refresh of segments i...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user jackylk commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1377#discussion_r141357648
 
    --- Diff: core/src/main/java/org/apache/carbondata/core/indexstore/blockletindex/BlockletDataMapFactory.java ---
    @@ -120,15 +121,17 @@ public void clear(String segmentId) {
         if (blockIndexes != null) {
           for (TableBlockIndexUniqueIdentifier blockIndex : blockIndexes) {
             DataMap dataMap = cache.getIfPresent(blockIndex);
    -        dataMap.clear();
    -        cache.invalidate(blockIndex);
    +        if (dataMap != null) {
    +          cache.invalidate(blockIndex);
    +          dataMap.clear();
    +        }
           }
         }
       }
     
       @Override
       public void clear() {
    -    for (String segmentId: segmentMap.keySet()) {
    +    for (String segmentId: segmentMap.keySet().toArray(new String[segmentMap.size()])) {
    --- End diff --
   
    Why this is needed?


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #1377: [CARBONDATA-1504] Fixed refresh of segments i...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user jackylk commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1377#discussion_r141359606
 
    --- Diff: hadoop/src/main/java/org/apache/carbondata/hadoop/api/CarbonTableInputFormat.java ---
    @@ -292,6 +291,26 @@ private AbsoluteTableIdentifier getAbsoluteTableIdentifier(Configuration configu
           }
         }
     
    +    // Clean the updated segments from memory if the update happens on segments
    --- End diff --
   
    Suggest to extract these logic as a method to refresh the datamap, and put this method in TableDataMap


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #1377: [CARBONDATA-1504] Fixed refresh of segments i...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ravipesala commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1377#discussion_r141526946
 
    --- Diff: integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/datamap/DataMapWriterSuite.scala ---
    @@ -41,9 +41,9 @@ class C2DataMapFactory() extends DataMapFactory {
     
       override def fireEvent(event: ChangeEvent[_]): Unit = ???
     
    -  override def clear(segmentId: String): Unit = ???
    +  override def clear(segmentId: String): Unit = {}
    --- End diff --
   
    Earlier clear was not calling, but not it is called so implementation has to be provided otherwise testcase fails.


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #1377: [CARBONDATA-1504] Fixed refresh of segments i...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ravipesala commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1377#discussion_r141527158
 
    --- Diff: core/src/main/java/org/apache/carbondata/core/datamap/DataMapStoreManager.java ---
    @@ -87,6 +93,7 @@ public TableDataMap getDataMap(AbsoluteTableIdentifier identifier,
       public TableDataMap createAndRegisterDataMap(AbsoluteTableIdentifier identifier,
           String factoryClassName, String dataMapName) {
         String table = identifier.uniqueName();
    +    getTableSegmentRefresher(identifier);
    --- End diff --
   
    Yes, not required as we want only update segmentRefreshMap here at first time.


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #1377: [CARBONDATA-1504] Fixed refresh of segments i...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ravipesala commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1377#discussion_r141527297
 
    --- Diff: core/src/main/java/org/apache/carbondata/core/datamap/DataMapStoreManager.java ---
    @@ -126,20 +133,20 @@ private TableDataMap getTableDataMap(String dataMapName,
       /**
        * Clear the datamap/datamaps of a mentioned datamap name and table from memory
    --- End diff --
   
    ok


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #1377: [CARBONDATA-1504] Fixed refresh of segments i...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ravipesala commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1377#discussion_r141528126
 
    --- Diff: core/src/main/java/org/apache/carbondata/core/datamap/DataMapStoreManager.java ---
    @@ -151,4 +158,61 @@ public static DataMapStoreManager getInstance() {
         return instance;
       }
     
    +  public TableSegmentRefresher getTableSegmentRefresher(AbsoluteTableIdentifier identifier) {
    +    String uniqueName = identifier.uniqueName();
    +    if (segmentRefreshMap.get(uniqueName) == null) {
    +      segmentRefreshMap.put(uniqueName, new TableSegmentRefresher(identifier));
    +    }
    +    return segmentRefreshMap.get(uniqueName);
    +  }
    +
    +  /**
    +   * Keep track of the segment refresh time.
    +   */
    +  public static class TableSegmentRefresher {
    +
    +    private Map<String, Long> segmentRefreshTime = new HashMap<>();
    --- End diff --
   
    ok


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #1377: [CARBONDATA-1504] Fixed refresh of segments i...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ravipesala commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1377#discussion_r141528239
 
    --- Diff: core/src/main/java/org/apache/carbondata/core/indexstore/blockletindex/BlockletDataMapFactory.java ---
    @@ -120,15 +121,17 @@ public void clear(String segmentId) {
         if (blockIndexes != null) {
           for (TableBlockIndexUniqueIdentifier blockIndex : blockIndexes) {
             DataMap dataMap = cache.getIfPresent(blockIndex);
    -        dataMap.clear();
    -        cache.invalidate(blockIndex);
    +        if (dataMap != null) {
    +          cache.invalidate(blockIndex);
    +          dataMap.clear();
    +        }
           }
         }
       }
     
       @Override
       public void clear() {
    -    for (String segmentId: segmentMap.keySet()) {
    +    for (String segmentId: segmentMap.keySet().toArray(new String[segmentMap.size()])) {
    --- End diff --
   
    It is needed as we cannot we cannot iterate on set and remove an entry at the same time. So here it is converted to array so that we can remove an entry from map in clear method


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #1377: [CARBONDATA-1504] Fixed refresh of segments i...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ravipesala commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1377#discussion_r141528559
 
    --- Diff: hadoop/src/main/java/org/apache/carbondata/hadoop/api/CarbonTableInputFormat.java ---
    @@ -292,6 +291,26 @@ private AbsoluteTableIdentifier getAbsoluteTableIdentifier(Configuration configu
           }
         }
     
    +    // Clean the updated segments from memory if the update happens on segments
    --- End diff --
   
    Do we need send the `SegmentUpdateDetails` detail information to `TableDataMap` ? I think it is better finding out the refreshed segments can be kept here.


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #1377: [CARBONDATA-1504] Fixed refresh of segments in datam...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1377
 
    Build Success with Spark 1.6, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/188/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #1377: [CARBONDATA-1504] Fixed refresh of segments in datam...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1377
 
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/313/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #1377: [CARBONDATA-1504] Fixed refresh of segments in datam...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user jackylk commented on the issue:

    https://github.com/apache/carbondata/pull/1377
 
    LGTM


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #1377: [CARBONDATA-1504] Fixed refresh of segments i...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user asfgit closed the pull request at:

    https://github.com/apache/carbondata/pull/1377


---
12