[GitHub] [carbondata] ShreelekhyaG opened a new pull request #4080: [WIP] Filter query having invalid results when add segment to SI with Indexserver

classic Classic list List threaded Threaded
51 messages Options
123
Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] ShreelekhyaG opened a new pull request #4080: [WIP] Filter query having invalid results when add segment to SI with Indexserver

GitBox

ShreelekhyaG opened a new pull request #4080:
URL: https://github.com/apache/carbondata/pull/4080


    ### Why is this PR needed?
    The rows added by the external segment are not visible on filter queries with the index server.
   
    ### What changes were proposed in this PR?
   Added segment path to the segment to identify as an external segment in filter resolver step.
       
    ### Does this PR introduce any user interface change?
    - No
   
    ### Is any new testcase added?
    - No, tested in cluster.
   
       
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4080: [WIP] Filter query having invalid results when add segment to SI with Indexserver

GitBox

CarbonDataQA2 commented on pull request #4080:
URL: https://github.com/apache/carbondata/pull/4080#issuecomment-766782521


   Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12444/job/ApacheCarbonPRBuilder2.3/5352/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4080: [WIP] Filter query having invalid results when add segment to SI with Indexserver

GitBox
In reply to this post by GitBox

CarbonDataQA2 commented on pull request #4080:
URL: https://github.com/apache/carbondata/pull/4080#issuecomment-766783288


   Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12444/job/ApacheCarbon_PR_Builder_2.4.5/3592/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4080: [WIP] Filter query having invalid results when add segment to SI with Indexserver

GitBox
In reply to this post by GitBox

CarbonDataQA2 commented on pull request #4080:
URL: https://github.com/apache/carbondata/pull/4080#issuecomment-766782521






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4080: [CARBONDATA-4111] Filter query having invalid results after add segment to table having SI with Indexserver

GitBox
In reply to this post by GitBox

CarbonDataQA2 commented on pull request #4080:
URL: https://github.com/apache/carbondata/pull/4080#issuecomment-768159789


   Build Failed  with Spark 2.4.5, Please check CI http://121.244.95.60:12444/job/ApacheCarbon_PR_Builder_2.4.5/3598/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4080: [CARBONDATA-4111] Filter query having invalid results after add segment to table having SI with Indexserver

GitBox
In reply to this post by GitBox

CarbonDataQA2 commented on pull request #4080:
URL: https://github.com/apache/carbondata/pull/4080#issuecomment-768161298


   Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12444/job/ApacheCarbonPRBuilder2.3/5358/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] ydvpankaj99 commented on pull request #4080: [CARBONDATA-4111] Filter query having invalid results after add segment to table having SI with Indexserver

GitBox
In reply to this post by GitBox

ydvpankaj99 commented on pull request #4080:
URL: https://github.com/apache/carbondata/pull/4080#issuecomment-768880913


   retest this please


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] vikramahuja1001 commented on a change in pull request #4080: [CARBONDATA-4111] Filter query having invalid results after add segment to table having SI with Indexserver

GitBox
In reply to this post by GitBox

vikramahuja1001 commented on a change in pull request #4080:
URL: https://github.com/apache/carbondata/pull/4080#discussion_r565949370



##########
File path: core/src/main/java/org/apache/carbondata/core/indexstore/SegmentWrapperContainer.java
##########
@@ -31,6 +31,9 @@
 
   private SegmentWrapper[] segmentWrappers;
 
+  public SegmentWrapperContainer() {

Review comment:
       why is this change required?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4080: [CARBONDATA-4111] Filter query having invalid results after add segment to table having SI with Indexserver

GitBox
In reply to this post by GitBox

CarbonDataQA2 commented on pull request #4080:
URL: https://github.com/apache/carbondata/pull/4080#issuecomment-768935266


   Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12444/job/ApacheCarbonPRBuilder2.3/5369/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4080: [CARBONDATA-4111] Filter query having invalid results after add segment to table having SI with Indexserver

GitBox
In reply to this post by GitBox

CarbonDataQA2 commented on pull request #4080:
URL: https://github.com/apache/carbondata/pull/4080#issuecomment-768940103


   Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12444/job/ApacheCarbon_PR_Builder_2.4.5/3611/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] ShreelekhyaG commented on a change in pull request #4080: [CARBONDATA-4111] Filter query having invalid results after add segment to table having SI with Indexserver

GitBox
In reply to this post by GitBox

ShreelekhyaG commented on a change in pull request #4080:
URL: https://github.com/apache/carbondata/pull/4080#discussion_r566032652



##########
File path: core/src/main/java/org/apache/carbondata/core/indexstore/SegmentWrapperContainer.java
##########
@@ -31,6 +31,9 @@
 
   private SegmentWrapper[] segmentWrappers;
 
+  public SegmentWrapperContainer() {

Review comment:
       `NoSuchMethodException` is thrown as the default instance is called from reflectionUtils.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] Indhumathi27 commented on a change in pull request #4080: [CARBONDATA-4111] Filter query having invalid results after add segment to table having SI with Indexserver

GitBox
In reply to this post by GitBox

Indhumathi27 commented on a change in pull request #4080:
URL: https://github.com/apache/carbondata/pull/4080#discussion_r567582402



##########
File path: core/src/main/java/org/apache/carbondata/core/indexstore/ExtendedBlocklet.java
##########
@@ -221,7 +221,7 @@ public void deserializeFields(DataInput in, String[] locations, String tablePath
       indexUniqueId = in.readUTF();
     }
     String filePath = getPath();
-    if (filePath.startsWith(File.separator)) {
+    if (!FileFactory.isFileExist(filePath)) {

Review comment:
       Please revert this change to do a rpc call. Use FileFactory.getupdatedFilePath instead as discussed




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] Indhumathi27 commented on a change in pull request #4080: [CARBONDATA-4111] Filter query having invalid results after add segment to table having SI with Indexserver

GitBox
In reply to this post by GitBox

Indhumathi27 commented on a change in pull request #4080:
URL: https://github.com/apache/carbondata/pull/4080#discussion_r567594628



##########
File path: core/src/main/java/org/apache/carbondata/core/index/IndexInputFormat.java
##########
@@ -159,6 +162,19 @@ public void initialize(InputSplit inputSplit, TaskAttemptContext taskAttemptCont
           if (filterResolverIntf != null) {
             filter.setExpression(filterResolverIntf.getFilterExpression());
           }
+          for (Segment segment : segmentsToLoad) {

Review comment:
       I think, after [PR-3656](https://github.com/apache/carbondata/pull/3656), we are not loading external segment to SI. External segment will be queried from main table only. But currently, i think SILoadEventListenerForFailedSegments listener is loading external segment while, si repair is enabled. Please check and handle it




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] ShreelekhyaG commented on a change in pull request #4080: [CARBONDATA-4111] Filter query having invalid results after add segment to table having SI with Indexserver

GitBox
In reply to this post by GitBox

ShreelekhyaG commented on a change in pull request #4080:
URL: https://github.com/apache/carbondata/pull/4080#discussion_r567812890



##########
File path: core/src/main/java/org/apache/carbondata/core/index/IndexInputFormat.java
##########
@@ -159,6 +162,19 @@ public void initialize(InputSplit inputSplit, TaskAttemptContext taskAttemptCont
           if (filterResolverIntf != null) {
             filter.setExpression(filterResolverIntf.getFilterExpression());
           }
+          for (Segment segment : segmentsToLoad) {

Review comment:
       Done. Added check to skip for external segments.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] Indhumathi27 commented on a change in pull request #4080: [CARBONDATA-4111] Filter query having invalid results after add segment to table having SI with Indexserver

GitBox
In reply to this post by GitBox

Indhumathi27 commented on a change in pull request #4080:
URL: https://github.com/apache/carbondata/pull/4080#discussion_r567844159



##########
File path: integration/spark/src/main/java/org/apache/spark/sql/secondaryindex/load/CarbonInternalLoaderUtil.java
##########
@@ -51,9 +51,10 @@
   public static List<String> getListOfValidSlices(LoadMetadataDetails[] details) {
     List<String> activeSlices = new ArrayList<>(CarbonCommonConstants.DEFAULT_COLLECTION_SIZE);
     for (LoadMetadataDetails oneLoad : details) {
-      if (SegmentStatus.SUCCESS.equals(oneLoad.getSegmentStatus())
+      // No need to consider external segments for SI load
+      if (oneLoad.getPath() == null && (SegmentStatus.SUCCESS.equals(oneLoad.getSegmentStatus())

Review comment:
       Please add code to not repair si, when add load is executed




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] Indhumathi27 commented on a change in pull request #4080: [CARBONDATA-4111] Filter query having invalid results after add segment to table having SI with Indexserver

GitBox
In reply to this post by GitBox

Indhumathi27 commented on a change in pull request #4080:
URL: https://github.com/apache/carbondata/pull/4080#discussion_r567844159



##########
File path: integration/spark/src/main/java/org/apache/spark/sql/secondaryindex/load/CarbonInternalLoaderUtil.java
##########
@@ -51,9 +51,10 @@
   public static List<String> getListOfValidSlices(LoadMetadataDetails[] details) {
     List<String> activeSlices = new ArrayList<>(CarbonCommonConstants.DEFAULT_COLLECTION_SIZE);
     for (LoadMetadataDetails oneLoad : details) {
-      if (SegmentStatus.SUCCESS.equals(oneLoad.getSegmentStatus())
+      // No need to consider external segments for SI load
+      if (oneLoad.getPath() == null && (SegmentStatus.SUCCESS.equals(oneLoad.getSegmentStatus())

Review comment:
       Please add code to not repair si, when add external load is executed




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4080: [CARBONDATA-4111] Filter query having invalid results after add segment to table having SI with Indexserver

GitBox
In reply to this post by GitBox

CarbonDataQA2 commented on pull request #4080:
URL: https://github.com/apache/carbondata/pull/4080#issuecomment-770909939


   Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12444/job/ApacheCarbonPRBuilder2.3/5395/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4080: [CARBONDATA-4111] Filter query having invalid results after add segment to table having SI with Indexserver

GitBox
In reply to this post by GitBox

CarbonDataQA2 commented on pull request #4080:
URL: https://github.com/apache/carbondata/pull/4080#issuecomment-770913921


   Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12444/job/ApacheCarbon_PR_Builder_2.4.5/3635/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] ShreelekhyaG commented on a change in pull request #4080: [CARBONDATA-4111] Filter query having invalid results after add segment to table having SI with Indexserver

GitBox
In reply to this post by GitBox

ShreelekhyaG commented on a change in pull request #4080:
URL: https://github.com/apache/carbondata/pull/4080#discussion_r567903474



##########
File path: integration/spark/src/main/java/org/apache/spark/sql/secondaryindex/load/CarbonInternalLoaderUtil.java
##########
@@ -51,9 +51,10 @@
   public static List<String> getListOfValidSlices(LoadMetadataDetails[] details) {
     List<String> activeSlices = new ArrayList<>(CarbonCommonConstants.DEFAULT_COLLECTION_SIZE);
     for (LoadMetadataDetails oneLoad : details) {
-      if (SegmentStatus.SUCCESS.equals(oneLoad.getSegmentStatus())
+      // No need to consider external segments for SI load
+      if (oneLoad.getPath() == null && (SegmentStatus.SUCCESS.equals(oneLoad.getSegmentStatus())

Review comment:
       Done




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] ShreelekhyaG commented on a change in pull request #4080: [CARBONDATA-4111] Filter query having invalid results after add segment to table having SI with Indexserver

GitBox
In reply to this post by GitBox

ShreelekhyaG commented on a change in pull request #4080:
URL: https://github.com/apache/carbondata/pull/4080#discussion_r567904349



##########
File path: core/src/main/java/org/apache/carbondata/core/indexstore/ExtendedBlocklet.java
##########
@@ -221,7 +221,7 @@ public void deserializeFields(DataInput in, String[] locations, String tablePath
       indexUniqueId = in.readUTF();
     }
     String filePath = getPath();
-    if (filePath.startsWith(File.separator)) {
+    if (!FileFactory.isFileExist(filePath)) {

Review comment:
       As discussed, using FileExists only for local file path.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


123