[GitHub] [carbondata] ShreelekhyaG opened a new pull request #4047: [CARBONDATA-4078] Add external segment and query with index server fails


GitBox

ShreelekhyaG opened a new pull request #4047:
URL: https://github.com/apache/carbondata/pull/4047


    ### Why is this PR needed?
   After an external segment is added to a carbon table, a query tries to `getSplits` from the index server, which throws an exception because it cannot read the external (orc/parquet) file format.
   When fallback mode is disabled, the query fails with that exception.
   
   ### What changes were proposed in this PR?
   To avoid the exception, the segment list is filtered so that only valid carbon segments are cached in the index server.
       
    ### Does this PR introduce any user interface change?
    - No
   
    ### Is any new testcase added?
    - No
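
The filtering described under "What changes were proposed" can be sketched roughly as below. This is a minimal illustration, not the code from the PR: `SegmentInfo`, `isCarbonFormat`, `keepCarbonSegments`, and the format strings `"carbon"`, `"orc"`, and `"parquet"` are hypothetical names chosen for the example and are not CarbonData APIs. The idea is simply to drop every segment whose files are not in carbon format before the list is handed to the index server.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative sketch only: SegmentInfo and the format strings below are
// hypothetical and are not CarbonData classes or constants.
final class CarbonSegmentFilterSketch {

  /** Hypothetical segment entry: a segment number plus the format of its files. */
  static final class SegmentInfo {
    final String segmentNo;
    final String fileFormat;

    SegmentInfo(String segmentNo, String fileFormat) {
      this.segmentNo = segmentNo;
      this.fileFormat = fileFormat;
    }
  }

  /** Assumption: native segments are tagged "carbon", external ones "orc" or "parquet". */
  static boolean isCarbonFormat(SegmentInfo segment) {
    return "carbon".equalsIgnoreCase(segment.fileFormat);
  }

  /** Returns only the carbon-format segments, preserving the input order. */
  static List<SegmentInfo> keepCarbonSegments(List<SegmentInfo> validSegments) {
    List<SegmentInfo> carbonOnly = new ArrayList<>();
    for (SegmentInfo segment : validSegments) {
      if (isCarbonFormat(segment)) {
        carbonOnly.add(segment);
      }
    }
    return carbonOnly;
  }

  public static void main(String[] args) {
    List<SegmentInfo> validSegments = Arrays.asList(
        new SegmentInfo("0", "carbon"),
        new SegmentInfo("1", "parquet"),
        new SegmentInfo("2", "carbon"),
        new SegmentInfo("3", "orc"));
    // Only the carbon-format segments remain after filtering.
    for (SegmentInfo segment : keepCarbonSegments(validSegments)) {
      System.out.println("segment sent to index server: " + segment.segmentNo);
    }
  }
}
```

In this sketch the external segments are only excluded from the index-server request; how they are read afterwards is outside the scope of the example.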
   
       
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]



[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4047: [CARBONDATA-4078] Add external segment and query with index server fails

GitBox

CarbonDataQA2 commented on pull request #4047:
URL: https://github.com/apache/carbondata/pull/4047#issuecomment-740028766


   Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12444/job/ApacheCarbonPRBuilder2.3/5101/
   



[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4047: [CARBONDATA-4078] Add external segment and query with index server fails

GitBox
In reply to this post by GitBox

CarbonDataQA2 commented on pull request #4047:
URL: https://github.com/apache/carbondata/pull/4047#issuecomment-740036591


   Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12444/job/ApacheCarbon_PR_Builder_2.4.5/3340/
   



[GitHub] [carbondata] vikramahuja1001 commented on a change in pull request #4047: [CARBONDATA-4078] Add external segment and query with index server fails

GitBox
In reply to this post by GitBox

vikramahuja1001 commented on a change in pull request #4047:
URL: https://github.com/apache/carbondata/pull/4047#discussion_r539988131



##########
File path: core/src/main/java/org/apache/carbondata/core/index/IndexUtil.java
##########
@@ -290,6 +291,9 @@ private static FileInputFormat createIndexJob(CarbonTable carbonTable,
       Boolean isFallbackJob, List<String> segmentsToBeRefreshed, boolean isCountJob,
       Configuration configuration) {
     List<String> invalidSegmentNo = new ArrayList<>();

Review comment:
       This is not the only place from which we call IndexJobs; please check all the call sites and add the filter there as well. Also check the pre-priming case; a non-carbon segment can be ignored there.





[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4047: [CARBONDATA-4078] Add external segment and query with index server fails

GitBox
In reply to this post by GitBox

CarbonDataQA2 commented on pull request #4047:
URL: https://github.com/apache/carbondata/pull/4047#issuecomment-743167499


   Build Failed  with Spark 2.3.4, Please check CI http://121.244.95.60:12444/job/ApacheCarbonPRBuilder2.3/5145/
   



[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4047: [CARBONDATA-4078] Add external segment and query with index server fails

GitBox
In reply to this post by GitBox

CarbonDataQA2 commented on pull request #4047:
URL: https://github.com/apache/carbondata/pull/4047#issuecomment-743172316


   Build Failed  with Spark 2.4.5, Please check CI http://121.244.95.60:12444/job/ApacheCarbon_PR_Builder_2.4.5/3383/
   



[GitHub] [carbondata] Indhumathi27 commented on a change in pull request #4047: [CARBONDATA-4078] Add external segment and query with index server fails

GitBox
In reply to this post by GitBox

Indhumathi27 commented on a change in pull request #4047:
URL: https://github.com/apache/carbondata/pull/4047#discussion_r542244695



##########
File path: integration/spark/src/main/scala/org/apache/carbondata/indexserver/IndexJobs.scala
##########
@@ -47,6 +47,9 @@ class DistributedIndexJob extends AbstractIndexJob {
 
   override def execute(indexFormat: IndexInputFormat,
       configuration: Configuration): util.List[ExtendedBlocklet] = {
+    // get only carbon segments.
+    indexFormat.setValidSegments(indexFormat.getValidSegments.asScala

Review comment:
       Is this scenario handled in fallback mode?





[GitHub] [carbondata] Indhumathi27 commented on pull request #4047: [CARBONDATA-4078] Add external segment and query with index server fails

GitBox
In reply to this post by GitBox

Indhumathi27 commented on pull request #4047:
URL: https://github.com/apache/carbondata/pull/4047#issuecomment-744319236


   @ShreelekhyaG Check and handle in BroadCastSIFilterPushJoin



[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4047: [CARBONDATA-4078] Add external segment and query with index server fails

GitBox
In reply to this post by GitBox

CarbonDataQA2 commented on pull request #4047:
URL: https://github.com/apache/carbondata/pull/4047#issuecomment-744597792


   Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12444/job/ApacheCarbonPRBuilder2.3/5160/
   



[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4047: [CARBONDATA-4078] Add external segment and query with index server fails

GitBox
In reply to this post by GitBox

CarbonDataQA2 commented on pull request #4047:
URL: https://github.com/apache/carbondata/pull/4047#issuecomment-744599007


   Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12444/job/ApacheCarbon_PR_Builder_2.4.5/3398/
   



[GitHub] [carbondata] ShreelekhyaG commented on a change in pull request #4047: [CARBONDATA-4078] Add external segment and query with index server fails

GitBox
In reply to this post by GitBox

ShreelekhyaG commented on a change in pull request #4047:
URL: https://github.com/apache/carbondata/pull/4047#discussion_r543052039



##########
File path: core/src/main/java/org/apache/carbondata/core/index/IndexUtil.java
##########
@@ -290,6 +291,9 @@ private static FileInputFormat createIndexJob(CarbonTable carbonTable,
       Boolean isFallbackJob, List<String> segmentsToBeRefreshed, boolean isCountJob,
       Configuration configuration) {
     List<String> invalidSegmentNo = new ArrayList<>();

Review comment:
       Done. Added in IndexJobs and in pre-priming.





[GitHub] [carbondata] ShreelekhyaG commented on a change in pull request #4047: [CARBONDATA-4078] Add external segment and query with index server fails

GitBox
In reply to this post by GitBox

ShreelekhyaG commented on a change in pull request #4047:
URL: https://github.com/apache/carbondata/pull/4047#discussion_r543052297



##########
File path: integration/spark/src/main/scala/org/apache/carbondata/indexserver/IndexJobs.scala
##########
@@ -47,6 +47,9 @@ class DistributedIndexJob extends AbstractIndexJob {
 
   override def execute(indexFormat: IndexInputFormat,
       configuration: Configuration): util.List[ExtendedBlocklet] = {
+    // get only carbon segments.
+    indexFormat.setValidSegments(indexFormat.getValidSegments.asScala

Review comment:
       Yes, in fallback mode it uses the `getCarbonSegments` method.





[GitHub] [carbondata] Indhumathi27 commented on a change in pull request #4047: [CARBONDATA-4078] Add external segment and query with index server fails

GitBox
In reply to this post by GitBox

Indhumathi27 commented on a change in pull request #4047:
URL: https://github.com/apache/carbondata/pull/4047#discussion_r543056033



##########
File path: integration/spark/src/main/scala/org/apache/carbondata/indexserver/IndexJobs.scala
##########
@@ -47,6 +47,9 @@ class DistributedIndexJob extends AbstractIndexJob {
 
   override def execute(indexFormat: IndexInputFormat,
       configuration: Configuration): util.List[ExtendedBlocklet] = {
+    // get only carbon segments.
+    indexFormat.setValidSegments(indexFormat.getValidSegments.asScala

Review comment:
       I think you also have to handle this in the EmbeddedIndexJob fallback mode.





[GitHub] [carbondata] ShreelekhyaG commented on a change in pull request #4047: [CARBONDATA-4078] Add external segment and query with index server fails

GitBox
In reply to this post by GitBox

ShreelekhyaG commented on a change in pull request #4047:
URL: https://github.com/apache/carbondata/pull/4047#discussion_r543263594



##########
File path: integration/spark/src/main/scala/org/apache/carbondata/indexserver/IndexJobs.scala
##########
@@ -47,6 +47,9 @@ class DistributedIndexJob extends AbstractIndexJob {
 
   override def execute(indexFormat: IndexInputFormat,
       configuration: Configuration): util.List[ExtendedBlocklet] = {
+    // get only carbon segments.
+    indexFormat.setValidSegments(indexFormat.getValidSegments.asScala

Review comment:
       In EmbeddedIndexJob fallback mode, we have the `toDistributable` method; there, the `loadmetadatadetails` for the segments are present and the carbon segments are filtered.





[GitHub] [carbondata] ShreelekhyaG commented on pull request #4047: [CARBONDATA-4078] Add external segment and query with index server fails

GitBox
In reply to this post by GitBox

ShreelekhyaG commented on pull request #4047:
URL: https://github.com/apache/carbondata/pull/4047#issuecomment-745296247


   > @ShreelekhyaG Check and handle in BroadCastSIFilterPushJoin
   
   Tested queries on a cluster with SI; no issues found.



[GitHub] [carbondata] Indhumathi27 commented on pull request #4047: [CARBONDATA-4078] Add external segment and query with index server fails

GitBox
In reply to this post by GitBox

Indhumathi27 commented on pull request #4047:
URL: https://github.com/apache/carbondata/pull/4047#issuecomment-745813548


   retest this please



[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4047: [CARBONDATA-4078] Add external segment and query with index server fails

GitBox
In reply to this post by GitBox

CarbonDataQA2 commented on pull request #4047:
URL: https://github.com/apache/carbondata/pull/4047#issuecomment-745933441


   Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12444/job/ApacheCarbonPRBuilder2.3/5177/
   



[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4047: [CARBONDATA-4078] Add external segment and query with index server fails

GitBox
In reply to this post by GitBox

CarbonDataQA2 commented on pull request #4047:
URL: https://github.com/apache/carbondata/pull/4047#issuecomment-745935299


   Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12444/job/ApacheCarbon_PR_Builder_2.4.5/3415/
   



[GitHub] [carbondata] Indhumathi27 commented on pull request #4047: [CARBONDATA-4078] Add external segment and query with index server fails

GitBox
In reply to this post by GitBox

Indhumathi27 commented on pull request #4047:
URL: https://github.com/apache/carbondata/pull/4047#issuecomment-747490611


   LGTM



[GitHub] [carbondata] asfgit closed pull request #4047: [CARBONDATA-4078] Add external segment and query with index server fails

GitBox
In reply to this post by GitBox

asfgit closed pull request #4047:
URL: https://github.com/apache/carbondata/pull/4047


   

