GitHub user ravipesala opened a pull request:
https://github.com/apache/carbondata/pull/1377

[WIP] Fixed refresh of segments in datamap for update and partition

Currently the datamap scans the complete segment on every query execution to get all the carbonindex files. This is slow when the data size or the number of files is large. So this PR caches the content of each segment and refreshes it when any update or store change happens.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/ravipesala/incubator-carbondata datamap-refresh

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/carbondata/pull/1377.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #1377

----
commit 1f0e834353946dc53ed2f5686e223951fd8c9707
Author: Ravindra Pesala <[hidden email]>
Date: 2017-09-21T13:33:06Z

    Fixed refersh of segments in datamap for update and partition

----
---
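The caching idea described above can be sketched as follows. This is a minimal illustration only, not the PR's actual implementation: the class name `SegmentIndexCache` and methods `isRefreshNeeded`, `getIndexFiles`, and `loadIndexFiles` are hypothetical stand-ins for the CarbonData code. The point is to cache index-file lists per segment and reload only the segments whose update timestamp has advanced, instead of re-scanning every segment on every query.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch only: names here are hypothetical, not CarbonData APIs.
public class SegmentIndexCache {

  // Cached index-file lists, keyed by segment id.
  private final Map<String, List<String>> indexCache = new HashMap<>();
  // Last known update timestamp per segment id.
  private final Map<String, Long> segmentRefreshTime = new HashMap<>();

  // Returns true (and remembers the new timestamp) when the segment was
  // updated after the cached copy was loaded.
  public boolean isRefreshNeeded(String segmentId, long updateTimestamp) {
    Long cached = segmentRefreshTime.get(segmentId);
    if (cached == null || cached < updateTimestamp) {
      segmentRefreshTime.put(segmentId, updateTimestamp);
      return cached != null; // first access is an initial load, not a refresh
    }
    return false;
  }

  public List<String> getIndexFiles(String segmentId, long updateTimestamp) {
    if (isRefreshNeeded(segmentId, updateTimestamp)) {
      indexCache.remove(segmentId); // drop the stale entry before reloading
    }
    return indexCache.computeIfAbsent(segmentId, this::loadIndexFiles);
  }

  // Stand-in for listing carbonindex files from the segment directory.
  private List<String> loadIndexFiles(String segmentId) {
    List<String> files = new ArrayList<>();
    files.add("Segment_" + segmentId + "/part-0.carbonindex");
    return files;
  }
}
```

As long as a segment's timestamp is unchanged, repeated queries hit the cache; an update to the segment bumps the timestamp and forces a single reload.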
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/1377 Build Success with Spark 1.6, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/139/ ---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/1377 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/263/ ---
Github user ravipesala commented on the issue:
https://github.com/apache/carbondata/pull/1377 SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/894/ ---
Github user jackylk commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/1377#discussion_r140658131

--- Diff: integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/datamap/DataMapWriterSuite.scala ---
@@ -41,9 +41,9 @@ class C2DataMapFactory() extends DataMapFactory {
   override def fireEvent(event: ChangeEvent[_]): Unit = ???

-  override def clear(segmentId: String): Unit = ???
+  override def clear(segmentId: String): Unit = {}
--- End diff --

why change this?

---
Github user jackylk commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/1377#discussion_r141355777

--- Diff: core/src/main/java/org/apache/carbondata/core/datamap/DataMapStoreManager.java ---
@@ -87,6 +93,7 @@ public TableDataMap getDataMap(AbsoluteTableIdentifier identifier,
   public TableDataMap createAndRegisterDataMap(AbsoluteTableIdentifier identifier,
       String factoryClassName, String dataMapName) {
     String table = identifier.uniqueName();
+    getTableSegmentRefresher(identifier);
--- End diff --

ignoring the return object?

---
Github user jackylk commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/1377#discussion_r141356113

--- Diff: core/src/main/java/org/apache/carbondata/core/datamap/DataMapStoreManager.java ---
@@ -126,20 +133,20 @@ private TableDataMap getTableDataMap(String dataMapName,
   /**
    * Clear the datamap/datamaps of a mentioned datamap name and table from memory
--- End diff --

please modify the comment also, datamap name is removed

---
Github user jackylk commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/1377#discussion_r141356751

--- Diff: core/src/main/java/org/apache/carbondata/core/datamap/DataMapStoreManager.java ---
@@ -151,4 +158,61 @@ public static DataMapStoreManager getInstance() {
     return instance;
   }

+  public TableSegmentRefresher getTableSegmentRefresher(AbsoluteTableIdentifier identifier) {
+    String uniqueName = identifier.uniqueName();
+    if (segmentRefreshMap.get(uniqueName) == null) {
+      segmentRefreshMap.put(uniqueName, new TableSegmentRefresher(identifier));
+    }
+    return segmentRefreshMap.get(uniqueName);
+  }
+
+  /**
+   * Keep track of the segment refresh time.
+   */
+  public static class TableSegmentRefresher {
+
+    private Map<String, Long> segmentRefreshTime = new HashMap<>();
--- End diff --

add comment to describe the content, like map segmentId to last updated timestamp

---
Github user jackylk commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/1377#discussion_r141357648

--- Diff: core/src/main/java/org/apache/carbondata/core/indexstore/blockletindex/BlockletDataMapFactory.java ---
@@ -120,15 +121,17 @@ public void clear(String segmentId) {
     if (blockIndexes != null) {
       for (TableBlockIndexUniqueIdentifier blockIndex : blockIndexes) {
         DataMap dataMap = cache.getIfPresent(blockIndex);
-        dataMap.clear();
-        cache.invalidate(blockIndex);
+        if (dataMap != null) {
+          cache.invalidate(blockIndex);
+          dataMap.clear();
+        }
       }
     }
   }

   @Override public void clear() {
-    for (String segmentId: segmentMap.keySet()) {
+    for (String segmentId: segmentMap.keySet().toArray(new String[segmentMap.size()])) {
--- End diff --

Why is this needed?

---
Github user jackylk commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/1377#discussion_r141359606

--- Diff: hadoop/src/main/java/org/apache/carbondata/hadoop/api/CarbonTableInputFormat.java ---
@@ -292,6 +291,26 @@ private AbsoluteTableIdentifier getAbsoluteTableIdentifier(Configuration configu
     }
   }

+    // Clean the updated segments from memory if the update happens on segments
--- End diff --

Suggest extracting this logic into a method that refreshes the datamap, and putting that method in TableDataMap

---
Github user ravipesala commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/1377#discussion_r141526946

--- Diff: integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/datamap/DataMapWriterSuite.scala ---
@@ -41,9 +41,9 @@ class C2DataMapFactory() extends DataMapFactory {
   override def fireEvent(event: ChangeEvent[_]): Unit = ???

-  override def clear(segmentId: String): Unit = ???
+  override def clear(segmentId: String): Unit = {}
--- End diff --

Earlier `clear` was not being called, but now it is, so an implementation has to be provided; otherwise the test case fails.

---
Github user ravipesala commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/1377#discussion_r141527158

--- Diff: core/src/main/java/org/apache/carbondata/core/datamap/DataMapStoreManager.java ---
@@ -87,6 +93,7 @@ public TableDataMap getDataMap(AbsoluteTableIdentifier identifier,
   public TableDataMap createAndRegisterDataMap(AbsoluteTableIdentifier identifier,
       String factoryClassName, String dataMapName) {
     String table = identifier.uniqueName();
+    getTableSegmentRefresher(identifier);
--- End diff --

Yes, the return value is not required; we only want to populate segmentRefreshMap here on first access.

---
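The lazy-init pattern behind `getTableSegmentRefresher` (called for its side effect of populating the map on first access) can also be expressed with `Map.computeIfAbsent`, which does one lookup instead of two and is atomic on a `ConcurrentHashMap`. A minimal sketch, with `TableSegmentRefresher` reduced to a stub rather than the real CarbonData class:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch of the lazy-init idiom; TableSegmentRefresher here is a stub.
public class RefresherRegistry {

  static class TableSegmentRefresher {
    final String tableUniqueName;

    TableSegmentRefresher(String tableUniqueName) {
      this.tableUniqueName = tableUniqueName;
    }
  }

  private final Map<String, TableSegmentRefresher> segmentRefreshMap =
      new ConcurrentHashMap<>();

  // computeIfAbsent creates the entry only on first access, atomically,
  // so calling this just for its side effect is safe under concurrency.
  public TableSegmentRefresher getTableSegmentRefresher(String tableUniqueName) {
    return segmentRefreshMap.computeIfAbsent(tableUniqueName, TableSegmentRefresher::new);
  }
}
```

With the get/null-check/put version in the diff, two threads registering the same table concurrently could each construct a `TableSegmentRefresher`; `computeIfAbsent` guarantees a single shared instance.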
Github user ravipesala commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/1377#discussion_r141527297

--- Diff: core/src/main/java/org/apache/carbondata/core/datamap/DataMapStoreManager.java ---
@@ -126,20 +133,20 @@ private TableDataMap getTableDataMap(String dataMapName,
   /**
    * Clear the datamap/datamaps of a mentioned datamap name and table from memory
--- End diff --

ok

---
Github user ravipesala commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/1377#discussion_r141528126

--- Diff: core/src/main/java/org/apache/carbondata/core/datamap/DataMapStoreManager.java ---
@@ -151,4 +158,61 @@ public static DataMapStoreManager getInstance() {
     return instance;
   }

+  public TableSegmentRefresher getTableSegmentRefresher(AbsoluteTableIdentifier identifier) {
+    String uniqueName = identifier.uniqueName();
+    if (segmentRefreshMap.get(uniqueName) == null) {
+      segmentRefreshMap.put(uniqueName, new TableSegmentRefresher(identifier));
+    }
+    return segmentRefreshMap.get(uniqueName);
+  }
+
+  /**
+   * Keep track of the segment refresh time.
+   */
+  public static class TableSegmentRefresher {
+
+    private Map<String, Long> segmentRefreshTime = new HashMap<>();
--- End diff --

ok

---
Github user ravipesala commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/1377#discussion_r141528239

--- Diff: core/src/main/java/org/apache/carbondata/core/indexstore/blockletindex/BlockletDataMapFactory.java ---
@@ -120,15 +121,17 @@ public void clear(String segmentId) {
     if (blockIndexes != null) {
       for (TableBlockIndexUniqueIdentifier blockIndex : blockIndexes) {
         DataMap dataMap = cache.getIfPresent(blockIndex);
-        dataMap.clear();
-        cache.invalidate(blockIndex);
+        if (dataMap != null) {
+          cache.invalidate(blockIndex);
+          dataMap.clear();
+        }
       }
     }
   }

   @Override public void clear() {
-    for (String segmentId: segmentMap.keySet()) {
+    for (String segmentId: segmentMap.keySet().toArray(new String[segmentMap.size()])) {
--- End diff --

It is needed because we cannot iterate over the set and remove an entry from the map at the same time. So the key set is converted to an array here, which lets the clear method remove entries from the map.

---
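The point being discussed is a classic Java pitfall: `keySet()` is a live view of the map, so removing entries from the map while iterating that view throws `ConcurrentModificationException`. A small self-contained sketch (class and method names are illustrative, not CarbonData code) showing the snapshot approach from the diff and the equivalent `Iterator.remove` alternative:

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

// Illustrative demo of removing map entries safely during iteration.
public class ClearAllDemo {

  private final Map<String, String> segmentMap = new HashMap<>();

  public ClearAllDemo() {
    segmentMap.put("0", "index-0");
    segmentMap.put("1", "index-1");
  }

  // clear(segmentId) removes the entry, so iterating segmentMap.keySet()
  // directly while calling it would throw ConcurrentModificationException.
  public void clear(String segmentId) {
    segmentMap.remove(segmentId);
  }

  // Approach used in the diff: snapshot the keys into an array first,
  // then iterate the snapshot while mutating the map.
  public void clearAllViaSnapshot() {
    for (String segmentId : segmentMap.keySet().toArray(new String[0])) {
      clear(segmentId);
    }
  }

  // Equivalent alternative: remove through the iterator itself,
  // which is the one mutation the live view permits.
  public void clearAllViaIterator() {
    for (Iterator<String> it = segmentMap.keySet().iterator(); it.hasNext(); ) {
      it.next();
      it.remove();
    }
  }

  public int size() {
    return segmentMap.size();
  }
}
```

The snapshot costs one array allocation but keeps the per-segment `clear(segmentId)` method reusable on its own, which is presumably why the PR chose it.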
Github user ravipesala commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/1377#discussion_r141528559

--- Diff: hadoop/src/main/java/org/apache/carbondata/hadoop/api/CarbonTableInputFormat.java ---
@@ -292,6 +291,26 @@ private AbsoluteTableIdentifier getAbsoluteTableIdentifier(Configuration configu
     }
   }

+    // Clean the updated segments from memory if the update happens on segments
--- End diff --

Do we need to send the `SegmentUpdateDetails` information to `TableDataMap`? I think the logic for finding the refreshed segments is better kept here.

---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/1377 Build Success with Spark 1.6, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/188/ ---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/1377 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/313/ ---