[GitHub] [carbondata] ShreelekhyaG opened a new pull request #4124: [WIP] Fix inverted index query issue and handle exception for desc column

classic Classic list List threaded Threaded
21 messages Options
12
Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] ShreelekhyaG opened a new pull request #4124: [WIP] Fix inverted index query issue and handle exception for desc column

GitBox

ShreelekhyaG opened a new pull request #4124:
URL: https://github.com/apache/carbondata/pull/4124


    ### Why is this PR needed?
   1. After creating an Inverted index on the dimension column, some of the filter queries give incorrect results.
   2. handle exception for higher level non-existing children column in desc column.
   
    ### What changes were proposed in this PR?
   1. While sorting byte arrays with inverted index, we use `compareTo `method of `ByteArrayColumnWithRowId`.  Here, it was sorting based on the last byte only. Made changes to sort properly based on the entire byte length.
   2. handled exception and added in testcase.
       
    ### Does this PR introduce any user interface change?
    - No
   
    ### Is any new testcase added?
    - Yes
   
       
   


--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4124: [WIP] Fix inverted index query issue and handle exception for desc column

GitBox

CarbonDataQA2 commented on pull request #4124:
URL: https://github.com/apache/carbondata/pull/4124#issuecomment-825001904


   Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12602/job/ApacheCarbonPRBuilder2.3/5235/
   


--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4124: [WIP] Fix inverted index query issue and handle exception for desc column

GitBox
In reply to this post by GitBox

CarbonDataQA2 commented on pull request #4124:
URL: https://github.com/apache/carbondata/pull/4124#issuecomment-825009182


   Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12602/job/ApacheCarbon_PR_Builder_2.4.5/3487/
   


--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #4124: [CARBONDATA-4173][CARBONDATA-4174] Fix inverted index query issue and handle exception for desc column

GitBox
In reply to this post by GitBox

ajantha-bhat commented on a change in pull request #4124:
URL: https://github.com/apache/carbondata/pull/4124#discussion_r618988679



##########
File path: core/src/main/java/org/apache/carbondata/core/datastore/columnar/ByteArrayColumnWithRowId.java
##########
@@ -42,7 +42,7 @@ public short getRowId() {
   @Override
   public int compareTo(ByteArrayColumnWithRowId o) {
     return UnsafeComparer.INSTANCE
-        .compareTo(column, 2, column.length - 2, o.column, 2, o.column.length - 2);
+        .compareTo(column, 0, column.length, o.column, 0, o.column.length);

Review comment:
       I can see that they were ignoring first two bytes before as it was LV and L is short length field.
   So, now you have changed it to compare whole array. that means it is not coming as LV now ?
   which PR has induced it ?




--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #4124: [CARBONDATA-4173][CARBONDATA-4174] Fix inverted index query issue and handle exception for desc column

GitBox
In reply to this post by GitBox

ajantha-bhat commented on a change in pull request #4124:
URL: https://github.com/apache/carbondata/pull/4124#discussion_r618988679



##########
File path: core/src/main/java/org/apache/carbondata/core/datastore/columnar/ByteArrayColumnWithRowId.java
##########
@@ -42,7 +42,7 @@ public short getRowId() {
   @Override
   public int compareTo(ByteArrayColumnWithRowId o) {
     return UnsafeComparer.INSTANCE
-        .compareTo(column, 2, column.length - 2, o.column, 2, o.column.length - 2);
+        .compareTo(column, 0, column.length, o.column, 0, o.column.length);

Review comment:
       I can see that they were ignoring first two bytes before as it was LV and L is short length field.
   So, now you have changed it to compare whole array because it is not coming as LV, only V is present in byte Array?
   which PR has changed this behavior ?




--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #4124: [CARBONDATA-4173][CARBONDATA-4174] Fix inverted index query issue and handle exception for desc column

GitBox
In reply to this post by GitBox

ajantha-bhat commented on a change in pull request #4124:
URL: https://github.com/apache/carbondata/pull/4124#discussion_r619010376



##########
File path: core/src/main/java/org/apache/carbondata/core/datastore/columnar/ByteArrayColumnWithRowId.java
##########
@@ -42,7 +42,7 @@ public short getRowId() {
   @Override
   public int compareTo(ByteArrayColumnWithRowId o) {
     return UnsafeComparer.INSTANCE
-        .compareTo(column, 2, column.length - 2, o.column, 2, o.column.length - 2);
+        .compareTo(column, 0, column.length, o.column, 0, o.column.length);

Review comment:
       This ByteArrayColumnWithRowId is common for other flows also. please check the impact. test cases may not fail as it is internal logic.




--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #4124: [CARBONDATA-4173][CARBONDATA-4174] Fix inverted index query issue and handle exception for desc column

GitBox
In reply to this post by GitBox

ajantha-bhat commented on a change in pull request #4124:
URL: https://github.com/apache/carbondata/pull/4124#discussion_r619026501



##########
File path: core/src/main/java/org/apache/carbondata/core/datastore/columnar/ByteArrayColumnWithRowId.java
##########
@@ -42,7 +42,7 @@ public short getRowId() {
   @Override
   public int compareTo(ByteArrayColumnWithRowId o) {
     return UnsafeComparer.INSTANCE
-        .compareTo(column, 2, column.length - 2, o.column, 2, o.column.length - 2);
+        .compareTo(column, 0, column.length, o.column, 0, o.column.length);

Review comment:
       As discussed, I think no dictionary string column comparison will go wrong after this change. no dictionary string to be alphabetically sorted we need to ignore L in LV. so please test and handle compare logic based on the flag.




--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4124: [CARBONDATA-4173][CARBONDATA-4174] Fix inverted index query issue and handle exception for desc column

GitBox
In reply to this post by GitBox

CarbonDataQA2 commented on pull request #4124:
URL: https://github.com/apache/carbondata/pull/4124#issuecomment-825694242


   Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12602/job/ApacheCarbonPRBuilder2.3/5246/
   


--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4124: [CARBONDATA-4173][CARBONDATA-4174] Fix inverted index query issue and handle exception for desc column

GitBox
In reply to this post by GitBox

CarbonDataQA2 commented on pull request #4124:
URL: https://github.com/apache/carbondata/pull/4124#issuecomment-825696505


   Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12602/job/ApacheCarbon_PR_Builder_2.4.5/3499/
   


--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] ShreelekhyaG commented on a change in pull request #4124: [CARBONDATA-4173][CARBONDATA-4174] Fix inverted index query issue and handle exception for desc column

GitBox
In reply to this post by GitBox

ShreelekhyaG commented on a change in pull request #4124:
URL: https://github.com/apache/carbondata/pull/4124#discussion_r619294183



##########
File path: core/src/main/java/org/apache/carbondata/core/datastore/columnar/ByteArrayColumnWithRowId.java
##########
@@ -42,7 +42,7 @@ public short getRowId() {
   @Override
   public int compareTo(ByteArrayColumnWithRowId o) {
     return UnsafeComparer.INSTANCE
-        .compareTo(column, 2, column.length - 2, o.column, 2, o.column.length - 2);
+        .compareTo(column, 0, column.length, o.column, 0, o.column.length);

Review comment:
       yes, this flow will be used for string types only. I have added a check for the dictionary and handled the compare logic.




--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #4124: [CARBONDATA-4173][CARBONDATA-4174] Fix inverted index query issue and handle exception for desc column

GitBox
In reply to this post by GitBox

ajantha-bhat commented on a change in pull request #4124:
URL: https://github.com/apache/carbondata/pull/4124#discussion_r619395321



##########
File path: core/src/main/java/org/apache/carbondata/core/datastore/columnar/ByteArrayColumnWithRowId.java
##########
@@ -51,11 +59,12 @@ public boolean equals(Object obj) {
       return false;
     }
     ByteArrayColumnWithRowId o = (ByteArrayColumnWithRowId)obj;
-    return Arrays.equals(column, o.column) && rowId == o.rowId;
+    return Arrays.equals(column, o.column) && rowId == o.rowId
+        && isNoDictionary == o.isNoDictionary;

Review comment:
       Do we really need to consider `isNoDictionary` here ? when it will be not equal? I think as it is a comparison of data within a column, this condition might always be true.
   
   If not required, remove from equals and hashcode




--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #4124: [CARBONDATA-4173][CARBONDATA-4174] Fix inverted index query issue and handle exception for desc column

GitBox
In reply to this post by GitBox

ajantha-bhat commented on a change in pull request #4124:
URL: https://github.com/apache/carbondata/pull/4124#discussion_r619395321



##########
File path: core/src/main/java/org/apache/carbondata/core/datastore/columnar/ByteArrayColumnWithRowId.java
##########
@@ -51,11 +59,12 @@ public boolean equals(Object obj) {
       return false;
     }
     ByteArrayColumnWithRowId o = (ByteArrayColumnWithRowId)obj;
-    return Arrays.equals(column, o.column) && rowId == o.rowId;
+    return Arrays.equals(column, o.column) && rowId == o.rowId
+        && isNoDictionary == o.isNoDictionary;

Review comment:
       Do we really need to consider `isNoDictionary` here ? when it will be not equal? I think as it is a comparison of data within a column (that too within a column page), this condition might always be true.
   
   If not required, remove from equals and hashcode




--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] ShreelekhyaG commented on a change in pull request #4124: [CARBONDATA-4173][CARBONDATA-4174] Fix inverted index query issue and handle exception for desc column

GitBox
In reply to this post by GitBox

ShreelekhyaG commented on a change in pull request #4124:
URL: https://github.com/apache/carbondata/pull/4124#discussion_r619991184



##########
File path: core/src/main/java/org/apache/carbondata/core/datastore/columnar/ByteArrayColumnWithRowId.java
##########
@@ -51,11 +59,12 @@ public boolean equals(Object obj) {
       return false;
     }
     ByteArrayColumnWithRowId o = (ByteArrayColumnWithRowId)obj;
-    return Arrays.equals(column, o.column) && rowId == o.rowId;
+    return Arrays.equals(column, o.column) && rowId == o.rowId
+        && isNoDictionary == o.isNoDictionary;

Review comment:
       Done. Removed.




--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] ShreelekhyaG commented on a change in pull request #4124: [CARBONDATA-4173][CARBONDATA-4174] Fix inverted index query issue and handle exception for desc column

GitBox
In reply to this post by GitBox

ShreelekhyaG commented on a change in pull request #4124:
URL: https://github.com/apache/carbondata/pull/4124#discussion_r619991184



##########
File path: core/src/main/java/org/apache/carbondata/core/datastore/columnar/ByteArrayColumnWithRowId.java
##########
@@ -51,11 +59,12 @@ public boolean equals(Object obj) {
       return false;
     }
     ByteArrayColumnWithRowId o = (ByteArrayColumnWithRowId)obj;
-    return Arrays.equals(column, o.column) && rowId == o.rowId;
+    return Arrays.equals(column, o.column) && rowId == o.rowId
+        && isNoDictionary == o.isNoDictionary;

Review comment:
       Done. Removed.




--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4124: [CARBONDATA-4173][CARBONDATA-4174] Fix inverted index query issue and handle exception for desc column

GitBox
In reply to this post by GitBox

CarbonDataQA2 commented on pull request #4124:
URL: https://github.com/apache/carbondata/pull/4124#issuecomment-826577218


   Build Failed  with Spark 2.3.4, Please check CI http://121.244.95.60:12602/job/ApacheCarbonPRBuilder2.3/5251/
   


--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4124: [CARBONDATA-4173][CARBONDATA-4174] Fix inverted index query issue and handle exception for desc column

GitBox
In reply to this post by GitBox

CarbonDataQA2 commented on pull request #4124:
URL: https://github.com/apache/carbondata/pull/4124#issuecomment-826583885


   Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12602/job/ApacheCarbon_PR_Builder_2.4.5/3504/
   


--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] ShreelekhyaG commented on pull request #4124: [CARBONDATA-4173][CARBONDATA-4174] Fix inverted index query issue and handle exception for desc column

GitBox
In reply to this post by GitBox

ShreelekhyaG commented on pull request #4124:
URL: https://github.com/apache/carbondata/pull/4124#issuecomment-826606820


   retest this please


--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4124: [CARBONDATA-4173][CARBONDATA-4174] Fix inverted index query issue and handle exception for desc column

GitBox
In reply to this post by GitBox

CarbonDataQA2 commented on pull request #4124:
URL: https://github.com/apache/carbondata/pull/4124#issuecomment-826681254


   Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12602/job/ApacheCarbonPRBuilder2.3/5256/
   


--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4124: [CARBONDATA-4173][CARBONDATA-4174] Fix inverted index query issue and handle exception for desc column

GitBox
In reply to this post by GitBox

CarbonDataQA2 commented on pull request #4124:
URL: https://github.com/apache/carbondata/pull/4124#issuecomment-826691148


   Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12602/job/ApacheCarbon_PR_Builder_2.4.5/3509/
   


--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


Reply | Threaded
Open this post in threaded view
|

[GitHub] [carbondata] ajantha-bhat commented on pull request #4124: [CARBONDATA-4173][CARBONDATA-4174] Fix inverted index query issue and handle exception for desc column

GitBox
In reply to this post by GitBox

ajantha-bhat commented on pull request #4124:
URL: https://github.com/apache/carbondata/pull/4124#issuecomment-826723024


   LGTM


--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]


12