In every query, CarbonData has to scan all the segment files. When there are too many segments, it takes too much time to get all the file info. Our customer hopes the community can solve this: when no segment has changed, CarbonData should not scan all the segment files. This is the call stack:

at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1641)
at org.apache.carbondata.core.datastore.filesystem.AbstractDFSCarbonFile.<init>(AbstractDFSCarbonFile.java:77)
at org.apache.carbondata.core.datastore.filesystem.HDFSCarbonFile.<init>(HDFSCarbonFile.java:44)
at org.apache.carbondata.spark.acl.filesystem.HDFSACLCarbonFile.<init>(HDFSACLCarbonFile.java:46)
at org.apache.carbondata.spark.acl.ACLFileFactory.getCarbonFile(ACLFileFactory.java:48)
at org.apache.carbondata.core.datastore.impl.FileFactory.getCarbonFile(FileFactory.java:167)
at org.apache.carbondata.core.readcommitter.TableStatusReadCommittedScope.getCommittedSegmentRefreshInfo(TableStatusReadCommittedScope.java:97)
at org.apache.carbondata.core.datamap.Segment.getSegmentRefreshInfo(Segment.java:177)
at org.apache.carbondata.core.datamap.DataMapStoreManager$TableSegmentRefresher.isRefreshNeeded(DataMapStoreManager.java:772)
at org.apache.carbondata.core.datamap.DataMapStoreManager.getSegmentsToBeRefreshed(DataMapStoreManager.java:505)
at org.apache.carbondata.core.datamap.DataMapStoreManager.refreshSegmentCacheIfRequired(DataMapStoreManager.java:519)
at org.apache.carbondata.hadoop.api.CarbonTableInputFormat.getSplits(CarbonTableInputFormat.java:465)
at org.apache.carbondata.hadoop.api.CarbonTableInputFormat.getSplits(CarbonTableInputFormat.java:199)
at org.apache.carbondata.spark.rdd.CarbonScanRDD.internalGetPartitions(CarbonScanRDD.scala:170)
at org.apache.carbondata.spark.rdd.CarbonRDD.getPartitions(CarbonRDD.scala:68)
I think there could be a file named LAST_MODIFY that contains the last update time of the segment files. When Carbon tries to refresh the segment cache, and the update time in LAST_MODIFY is the same as the one in the cache, then there is no need to rescan all the segment files.
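The idea above could be sketched roughly like this. Note this is just an illustration of the timestamp-comparison check, not actual CarbonData code; the class and method names here are hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical sketch of the LAST_MODIFY idea: before rescanning every
// segment file, compare the modification time of a single marker file
// against the value remembered from the last refresh.
public class LastModifyCheck {
    private long cachedTimestamp = -1L;

    // Returns true only when the LAST_MODIFY marker has changed since the
    // last refresh, i.e. a full segment scan is actually needed.
    public boolean isRefreshNeeded(Path lastModifyFile) throws IOException {
        long current = Files.getLastModifiedTime(lastModifyFile).toMillis();
        if (current == cachedTimestamp) {
            return false;           // nothing changed; skip the full scan
        }
        cachedTimestamp = current;  // remember the new timestamp
        return true;
    }
}
```

With this, a query that finds an unchanged timestamp only issues one getFileStatus call instead of one per segment.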
Hi,
This issue has already been fixed: the segments are not refreshed from the cache if the segment file name has not been updated. Please find the solution in the following PR: https://github.com/apache/carbondata/pull/3988. Kindly check the changes in the TableStatusReadCommittedScope.java class.
Thanks,
Vikram Ahuja
Hi Vikram, I want to contribute to the community by working on Spark 3.1.1 support in CarbonData. First of all, I would like to get in touch with you.
I've sent you several emails but have received no response yet. How can I communicate with you? Maybe a Slack link?