[GitHub] dhatchayani opened a new pull request #3126: [WIP][CARBONDATA-3293] Prune datamaps improvement

GitBox

dhatchayani opened a new pull request #3126: [WIP][CARBONDATA-3293] Prune datamaps improvement
URL: https://github.com/apache/carbondata/pull/3126

**Problem:**

(1) Currently, pruning for a count(*) query is the same as for a select * query. Blocklet and ExtendedBlocklet objects are built from the DataMapRow, which is unnecessary and time consuming.

(2) Pruning in a select * query spends time in convertToSafeRow(), which converts the DataMapRow to a safe row. The conversion is done because, in an unsafe row, we need to traverse the whole row to reach the position of a given piece of data.

(3) For filter queries, we convert the DataMapRow to a safe row whether the blocklet is valid or invalid. This conversion is time consuming, and its cost grows as the number of blocklets increases.

**Solution:**

(1) The blocklet row count is already stored in the DataMapRow, so it is enough to read just that count. This can improve count(*) query performance.

(2) Also store the data length in the DataMapRow, so that traversing the whole row can be avoided. With the length, we can jump directly to the data position.

(3) Read only the MinMax from the DataMapRow and decide whether a scan is required on that blocklet. Convert the row to a safe row only if the scan is required.
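The three points above can be illustrated with a minimal sketch. It is not the code in this PR: `PruneSketch`, `IndexRow`, the header layout, and the `extra` field are hypothetical stand-ins for a serialized DataMapRow, and a single int min/max pair is assumed for simplicity.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class PruneSketch {

    /** Hypothetical serialized index row; NOT CarbonData's DataMapRow. */
    static final class IndexRow {
        // Assumed layout: rowCount | min | max | pathLen | extraLen | path | extra
        static final int ROW_COUNT = 0, MIN = 4, MAX = 8;
        static final int PATH_LEN = 12, EXTRA_LEN = 16, HEADER = 20;
        private final ByteBuffer buf;

        IndexRow(int rowCount, int min, int max, String path, String extra) {
            byte[] p = path.getBytes(StandardCharsets.UTF_8);
            byte[] e = extra.getBytes(StandardCharsets.UTF_8);
            buf = ByteBuffer.allocate(HEADER + p.length + e.length);
            buf.putInt(rowCount).putInt(min).putInt(max)
               .putInt(p.length).putInt(e.length).put(p).put(e);
        }

        // Fixed-offset reads: no other field is decoded.
        int rowCount() { return buf.getInt(ROW_COUNT); }
        int min() { return buf.getInt(MIN); }
        int max() { return buf.getInt(MAX); }

        String path() {
            return new String(buf.array(), HEADER, buf.getInt(PATH_LEN),
                    StandardCharsets.UTF_8);
        }

        // Idea (2): the stored length of the preceding field gives the
        // offset directly, so the row is not traversed field by field.
        String extra() {
            int offset = HEADER + buf.getInt(PATH_LEN);
            return new String(buf.array(), offset, buf.getInt(EXTRA_LEN),
                    StandardCharsets.UTF_8);
        }
    }

    // Idea (1): count(*) only sums the per-blocklet row counts.
    static long countStar(List<IndexRow> rows) {
        long total = 0;
        for (IndexRow r : rows) {
            total += r.rowCount();
        }
        return total;
    }

    // Idea (3): check min/max first; materialize the variable-length
    // field only for rows that may contain the filter value.
    static List<String> prune(List<IndexRow> rows, int value) {
        List<String> hits = new ArrayList<>();
        for (IndexRow r : rows) {
            if (value >= r.min() && value <= r.max()) {
                hits.add(r.path());
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<IndexRow> rows = Arrays.asList(
                new IndexRow(100, 0, 9, "part-0", "a"),
                new IndexRow(200, 10, 19, "part-1", "bb"));
        System.out.println(countStar(rows));
        System.out.println(prune(rows, 15));
    }
}
```

In this sketch the cheap fixed-offset reads play the role of reading from the unsafe row, and building the `String` plays the role of the safe-row conversion that is deferred until the min/max check passes.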

- [ ] Any interfaces changed?

- [ ] Any backward compatibility impacted?

- [ ] Document update required?

- [x] Testing done
Existing unit tests (UT)

- [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.

With regards,
Apache Git Services