[GitHub] [carbondata] Karan980 opened a new pull request #4099: [CARBONDATA-4141] Index Server is not caching indexes for external tables with sdk segments.

classic Classic list List threaded Threaded
19 messages Options
Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] Karan980 opened a new pull request #4099: [CARBONDATA-4141] Index Server is not caching indexes for external tables with sdk segments.

GitBox

Karan980 opened a new pull request #4099:
URL: https://github.com/apache/carbondata/pull/4099


    ### Why is this PR needed?
   Indexes cached in Executor cache are not dropped when drop table is called for external table with SDK segments. Because, external tables with sdk segments will not have metadata like table status file. So in drop table command we send zero segments to indexServer clearIndexes job, which clears nothing from executor side.
   
    ### What changes were proposed in this PR?
   Prepared the validSegments from indexFiles present at external table location and send it to IndexServer clearIndexes job through IndexInputFormat.
       
    ### Does this PR introduce any user interface change?
    - No
   
    ### Is any new testcase added?
    - No
   
       
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4099: [CARBONDATA-4141] Index Server is not caching indexes for external tables with sdk segments.

GitBox

CarbonDataQA2 commented on pull request #4099:
URL: https://github.com/apache/carbondata/pull/4099#issuecomment-788903537


   Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/5106/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4099: [CARBONDATA-4141] Index Server is not caching indexes for external tables with sdk segments.

GitBox
In reply to this post by GitBox

CarbonDataQA2 commented on pull request #4099:
URL: https://github.com/apache/carbondata/pull/4099#issuecomment-788908019


   Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/3347/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] kunal642 commented on a change in pull request #4099: [CARBONDATA-4141] Index Server is not caching indexes for external tables with sdk segments.

GitBox
In reply to this post by GitBox

kunal642 commented on a change in pull request #4099:
URL: https://github.com/apache/carbondata/pull/4099#discussion_r589176617



##########
File path: core/src/main/java/org/apache/carbondata/core/index/IndexUtil.java
##########
@@ -112,15 +116,31 @@ public static IndexJob getIndexJob(Configuration configuration) throws IOExcepti
    */
   private static void executeClearIndexJob(IndexJob indexJob,
       CarbonTable carbonTable, String indexToClear) throws IOException {
-    SegmentStatusManager.ValidAndInvalidSegmentsInfo validAndInvalidSegmentsInfo =
-            getValidAndInvalidSegments(carbonTable, FileFactory.getConfiguration());
-    List<String> invalidSegment = new ArrayList<>();
-    for (Segment segment : validAndInvalidSegmentsInfo.getInvalidSegments()) {
-      invalidSegment.add(segment.getSegmentNo());
+    IndexInputFormat indexInputFormat;
+    if (!carbonTable.isTransactionalTable()) {

Review comment:
       Please reuse the LatestFilesReadCommittedScope class to take a snapshot of the non-transactional table.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] kunal642 commented on a change in pull request #4099: [CARBONDATA-4141] Index Server is not caching indexes for external tables with sdk segments.

GitBox
In reply to this post by GitBox

kunal642 commented on a change in pull request #4099:
URL: https://github.com/apache/carbondata/pull/4099#discussion_r589176617



##########
File path: core/src/main/java/org/apache/carbondata/core/index/IndexUtil.java
##########
@@ -112,15 +116,31 @@ public static IndexJob getIndexJob(Configuration configuration) throws IOExcepti
    */
   private static void executeClearIndexJob(IndexJob indexJob,
       CarbonTable carbonTable, String indexToClear) throws IOException {
-    SegmentStatusManager.ValidAndInvalidSegmentsInfo validAndInvalidSegmentsInfo =
-            getValidAndInvalidSegments(carbonTable, FileFactory.getConfiguration());
-    List<String> invalidSegment = new ArrayList<>();
-    for (Segment segment : validAndInvalidSegmentsInfo.getInvalidSegments()) {
-      invalidSegment.add(segment.getSegmentNo());
+    IndexInputFormat indexInputFormat;
+    if (!carbonTable.isTransactionalTable()) {

Review comment:
       Please reuse the LatestFilesReadCommittedScope class to take a snapshot of the non-transactional table. No need to create a separate method




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] Karan980 commented on a change in pull request #4099: [CARBONDATA-4141] Index Server is not caching indexes for external tables with sdk segments.

GitBox
In reply to this post by GitBox

Karan980 commented on a change in pull request #4099:
URL: https://github.com/apache/carbondata/pull/4099#discussion_r589323912



##########
File path: core/src/main/java/org/apache/carbondata/core/index/IndexUtil.java
##########
@@ -112,15 +116,31 @@ public static IndexJob getIndexJob(Configuration configuration) throws IOExcepti
    */
   private static void executeClearIndexJob(IndexJob indexJob,
       CarbonTable carbonTable, String indexToClear) throws IOException {
-    SegmentStatusManager.ValidAndInvalidSegmentsInfo validAndInvalidSegmentsInfo =
-            getValidAndInvalidSegments(carbonTable, FileFactory.getConfiguration());
-    List<String> invalidSegment = new ArrayList<>();
-    for (Segment segment : validAndInvalidSegmentsInfo.getInvalidSegments()) {
-      invalidSegment.add(segment.getSegmentNo());
+    IndexInputFormat indexInputFormat;
+    if (!carbonTable.isTransactionalTable()) {

Review comment:
       Done




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4099: [CARBONDATA-4141] Index Server is not caching indexes for external tables with sdk segments.

GitBox
In reply to this post by GitBox

CarbonDataQA2 commented on pull request #4099:
URL: https://github.com/apache/carbondata/pull/4099#issuecomment-792733994


   Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/5113/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4099: [CARBONDATA-4141] Index Server is not caching indexes for external tables with sdk segments.

GitBox
In reply to this post by GitBox

CarbonDataQA2 commented on pull request #4099:
URL: https://github.com/apache/carbondata/pull/4099#issuecomment-792735786


   Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/3354/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] vikramahuja1001 commented on pull request #4099: [CARBONDATA-4141] Index Server is not caching indexes for external tables with sdk segments.

GitBox
In reply to this post by GitBox

vikramahuja1001 commented on pull request #4099:
URL: https://github.com/apache/carbondata/pull/4099#issuecomment-793477512


   please correct the splits path of getSplits in case of External Tables in DistributedPruneRDD as well


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] Karan980 commented on pull request #4099: [CARBONDATA-4141] Index Server is not caching indexes for external tables with sdk segments.

GitBox
In reply to this post by GitBox

Karan980 commented on pull request #4099:
URL: https://github.com/apache/carbondata/pull/4099#issuecomment-793527054


   > please correct the splits path of getSplits in case of External Tables in DistributedPruneRDD as well
   
   Done


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] vikramahuja1001 commented on a change in pull request #4099: [CARBONDATA-4141] Index Server is not caching indexes for external tables with sdk segments.

GitBox
In reply to this post by GitBox

vikramahuja1001 commented on a change in pull request #4099:
URL: https://github.com/apache/carbondata/pull/4099#discussion_r590253814



##########
File path: core/src/main/java/org/apache/carbondata/core/index/IndexUtil.java
##########
@@ -112,15 +115,32 @@ public static IndexJob getIndexJob(Configuration configuration) throws IOExcepti
    */
   private static void executeClearIndexJob(IndexJob indexJob,
       CarbonTable carbonTable, String indexToClear) throws IOException {
-    SegmentStatusManager.ValidAndInvalidSegmentsInfo validAndInvalidSegmentsInfo =
-            getValidAndInvalidSegments(carbonTable, FileFactory.getConfiguration());
-    List<String> invalidSegment = new ArrayList<>();
-    for (Segment segment : validAndInvalidSegmentsInfo.getInvalidSegments()) {
-      invalidSegment.add(segment.getSegmentNo());
+    IndexInputFormat indexInputFormat;
+    if (!carbonTable.isTransactionalTable()) {
+      ReadCommittedScope readCommittedScope =
+          new LatestFilesReadCommittedScope(carbonTable.getTablePath(),
+              FileFactory.getConfiguration());
+      LoadMetadataDetails[] loadMetadataDetails = readCommittedScope.getSegmentList();
+      List<Segment> listOfValidSegments = new ArrayList<>();
+      for (LoadMetadataDetails segment : loadMetadataDetails) {
+        Segment seg = new Segment(segment.getLoadName(), segment.getSegmentFile());
+        seg.setLoadMetadataDetails(segment);
+        listOfValidSegments.add(seg);
+      }
+      indexInputFormat =
+          new IndexInputFormat(carbonTable, listOfValidSegments, new ArrayList<>(0), true,
+              indexToClear);
+    } else {
+      SegmentStatusManager.ValidAndInvalidSegmentsInfo validAndInvalidSegmentsInfo =
+          getValidAndInvalidSegments(carbonTable, FileFactory.getConfiguration());
+      List<String> invalidSegment = new ArrayList<>();
+      for (Segment segment : validAndInvalidSegmentsInfo.getInvalidSegments()) {

Review comment:
       Use Java stream here instead of a for loop




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] vikramahuja1001 commented on a change in pull request #4099: [CARBONDATA-4141] Index Server is not caching indexes for external tables with sdk segments.

GitBox
In reply to this post by GitBox

vikramahuja1001 commented on a change in pull request #4099:
URL: https://github.com/apache/carbondata/pull/4099#discussion_r590255909



##########
File path: core/src/main/java/org/apache/carbondata/core/index/IndexUtil.java
##########
@@ -112,15 +115,32 @@ public static IndexJob getIndexJob(Configuration configuration) throws IOExcepti
    */
   private static void executeClearIndexJob(IndexJob indexJob,
       CarbonTable carbonTable, String indexToClear) throws IOException {
-    SegmentStatusManager.ValidAndInvalidSegmentsInfo validAndInvalidSegmentsInfo =
-            getValidAndInvalidSegments(carbonTable, FileFactory.getConfiguration());
-    List<String> invalidSegment = new ArrayList<>();
-    for (Segment segment : validAndInvalidSegmentsInfo.getInvalidSegments()) {
-      invalidSegment.add(segment.getSegmentNo());
+    IndexInputFormat indexInputFormat;
+    if (!carbonTable.isTransactionalTable()) {
+      ReadCommittedScope readCommittedScope =
+          new LatestFilesReadCommittedScope(carbonTable.getTablePath(),
+              FileFactory.getConfiguration());
+      LoadMetadataDetails[] loadMetadataDetails = readCommittedScope.getSegmentList();
+      List<Segment> listOfValidSegments = new ArrayList<>();

Review comment:
       give arrayList limit equals to loadMetadataDetails size, because it will never increase that size




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] vikramahuja1001 commented on a change in pull request #4099: [CARBONDATA-4141] Index Server is not caching indexes for external tables with sdk segments.

GitBox
In reply to this post by GitBox

vikramahuja1001 commented on a change in pull request #4099:
URL: https://github.com/apache/carbondata/pull/4099#discussion_r590256333



##########
File path: core/src/main/java/org/apache/carbondata/core/index/IndexUtil.java
##########
@@ -112,15 +115,32 @@ public static IndexJob getIndexJob(Configuration configuration) throws IOExcepti
    */
   private static void executeClearIndexJob(IndexJob indexJob,
       CarbonTable carbonTable, String indexToClear) throws IOException {
-    SegmentStatusManager.ValidAndInvalidSegmentsInfo validAndInvalidSegmentsInfo =
-            getValidAndInvalidSegments(carbonTable, FileFactory.getConfiguration());
-    List<String> invalidSegment = new ArrayList<>();
-    for (Segment segment : validAndInvalidSegmentsInfo.getInvalidSegments()) {
-      invalidSegment.add(segment.getSegmentNo());
+    IndexInputFormat indexInputFormat;
+    if (!carbonTable.isTransactionalTable()) {
+      ReadCommittedScope readCommittedScope =
+          new LatestFilesReadCommittedScope(carbonTable.getTablePath(),
+              FileFactory.getConfiguration());
+      LoadMetadataDetails[] loadMetadataDetails = readCommittedScope.getSegmentList();
+      List<Segment> listOfValidSegments = new ArrayList<>();
+      for (LoadMetadataDetails segment : loadMetadataDetails) {

Review comment:
       try using java stream api here instead of a for loop wherever possible




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] Karan980 commented on a change in pull request #4099: [CARBONDATA-4141] Index Server is not caching indexes for external tables with sdk segments.

GitBox
In reply to this post by GitBox

Karan980 commented on a change in pull request #4099:
URL: https://github.com/apache/carbondata/pull/4099#discussion_r590318352



##########
File path: core/src/main/java/org/apache/carbondata/core/index/IndexUtil.java
##########
@@ -112,15 +115,32 @@ public static IndexJob getIndexJob(Configuration configuration) throws IOExcepti
    */
   private static void executeClearIndexJob(IndexJob indexJob,
       CarbonTable carbonTable, String indexToClear) throws IOException {
-    SegmentStatusManager.ValidAndInvalidSegmentsInfo validAndInvalidSegmentsInfo =
-            getValidAndInvalidSegments(carbonTable, FileFactory.getConfiguration());
-    List<String> invalidSegment = new ArrayList<>();
-    for (Segment segment : validAndInvalidSegmentsInfo.getInvalidSegments()) {
-      invalidSegment.add(segment.getSegmentNo());
+    IndexInputFormat indexInputFormat;
+    if (!carbonTable.isTransactionalTable()) {
+      ReadCommittedScope readCommittedScope =
+          new LatestFilesReadCommittedScope(carbonTable.getTablePath(),
+              FileFactory.getConfiguration());
+      LoadMetadataDetails[] loadMetadataDetails = readCommittedScope.getSegmentList();
+      List<Segment> listOfValidSegments = new ArrayList<>();
+      for (LoadMetadataDetails segment : loadMetadataDetails) {
+        Segment seg = new Segment(segment.getLoadName(), segment.getSegmentFile());
+        seg.setLoadMetadataDetails(segment);
+        listOfValidSegments.add(seg);
+      }
+      indexInputFormat =
+          new IndexInputFormat(carbonTable, listOfValidSegments, new ArrayList<>(0), true,
+              indexToClear);
+    } else {
+      SegmentStatusManager.ValidAndInvalidSegmentsInfo validAndInvalidSegmentsInfo =
+          getValidAndInvalidSegments(carbonTable, FileFactory.getConfiguration());
+      List<String> invalidSegment = new ArrayList<>();
+      for (Segment segment : validAndInvalidSegmentsInfo.getInvalidSegments()) {

Review comment:
       Done




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] Karan980 commented on a change in pull request #4099: [CARBONDATA-4141] Index Server is not caching indexes for external tables with sdk segments.

GitBox
In reply to this post by GitBox

Karan980 commented on a change in pull request #4099:
URL: https://github.com/apache/carbondata/pull/4099#discussion_r590318643



##########
File path: core/src/main/java/org/apache/carbondata/core/index/IndexUtil.java
##########
@@ -112,15 +115,32 @@ public static IndexJob getIndexJob(Configuration configuration) throws IOExcepti
    */
   private static void executeClearIndexJob(IndexJob indexJob,
       CarbonTable carbonTable, String indexToClear) throws IOException {
-    SegmentStatusManager.ValidAndInvalidSegmentsInfo validAndInvalidSegmentsInfo =
-            getValidAndInvalidSegments(carbonTable, FileFactory.getConfiguration());
-    List<String> invalidSegment = new ArrayList<>();
-    for (Segment segment : validAndInvalidSegmentsInfo.getInvalidSegments()) {
-      invalidSegment.add(segment.getSegmentNo());
+    IndexInputFormat indexInputFormat;
+    if (!carbonTable.isTransactionalTable()) {
+      ReadCommittedScope readCommittedScope =
+          new LatestFilesReadCommittedScope(carbonTable.getTablePath(),
+              FileFactory.getConfiguration());
+      LoadMetadataDetails[] loadMetadataDetails = readCommittedScope.getSegmentList();
+      List<Segment> listOfValidSegments = new ArrayList<>();

Review comment:
       Done

##########
File path: core/src/main/java/org/apache/carbondata/core/index/IndexUtil.java
##########
@@ -112,15 +115,32 @@ public static IndexJob getIndexJob(Configuration configuration) throws IOExcepti
    */
   private static void executeClearIndexJob(IndexJob indexJob,
       CarbonTable carbonTable, String indexToClear) throws IOException {
-    SegmentStatusManager.ValidAndInvalidSegmentsInfo validAndInvalidSegmentsInfo =
-            getValidAndInvalidSegments(carbonTable, FileFactory.getConfiguration());
-    List<String> invalidSegment = new ArrayList<>();
-    for (Segment segment : validAndInvalidSegmentsInfo.getInvalidSegments()) {
-      invalidSegment.add(segment.getSegmentNo());
+    IndexInputFormat indexInputFormat;
+    if (!carbonTable.isTransactionalTable()) {
+      ReadCommittedScope readCommittedScope =
+          new LatestFilesReadCommittedScope(carbonTable.getTablePath(),
+              FileFactory.getConfiguration());
+      LoadMetadataDetails[] loadMetadataDetails = readCommittedScope.getSegmentList();
+      List<Segment> listOfValidSegments = new ArrayList<>();
+      for (LoadMetadataDetails segment : loadMetadataDetails) {

Review comment:
       Done




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] kunal642 commented on pull request #4099: [CARBONDATA-4141] Index Server is not caching indexes for external tables with sdk segments.

GitBox
In reply to this post by GitBox

kunal642 commented on pull request #4099:
URL: https://github.com/apache/carbondata/pull/4099#issuecomment-793873234


   LGTM


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4099: [CARBONDATA-4141] Index Server is not caching indexes for external tables with sdk segments.

GitBox
In reply to this post by GitBox

CarbonDataQA2 commented on pull request #4099:
URL: https://github.com/apache/carbondata/pull/4099#issuecomment-794000275


   Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/3363/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4099: [CARBONDATA-4141] Index Server is not caching indexes for external tables with sdk segments.

GitBox
In reply to this post by GitBox

CarbonDataQA2 commented on pull request #4099:
URL: https://github.com/apache/carbondata/pull/4099#issuecomment-794004052


   Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/5123/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] asfgit closed pull request #4099: [CARBONDATA-4141] Index Server is not caching indexes for external tables with sdk segments.

GitBox
In reply to this post by GitBox

asfgit closed pull request #4099:
URL: https://github.com/apache/carbondata/pull/4099


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]