[GitHub] [carbondata] ajantha-bhat opened a new pull request #3901: [CARBONDATA-3820] CDC update as new segment should use target table sort_scope

GitBox

[GitHub] [carbondata] ajantha-bhat opened a new pull request #3901: [CARBONDATA-3820] CDC update as new segment should use target table sort_scope

ajantha-bhat opened a new pull request #3901:
URL: https://github.com/apache/carbondata/pull/3901

### Why is this PR needed?
#3764 added no_sort (that code was wrong, but it had no functional impact because it did not actually change the new segment load to no_sort).
#3856 changed it to no_sort (this has a functional impact: new segments of the target table are now loaded with no_sort).

### What changes were proposed in this PR?
CDC update as new segment should use target table sort_scope
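
The intended behavior can be illustrated with a minimal sketch. It is a simplified Python model, not CarbonData code: the function names, the dictionary of table properties, and the `fallback` parameter are all hypothetical and exist only for illustration. The only detail taken from the PR is that the new CDC segment should be loaded with the target table's `sort_scope` instead of a hard-coded `no_sort`.

```python
# Hypothetical model of the behavior this PR asks for.
# None of these names are CarbonData APIs; they are illustrative only.

def resolve_cdc_sort_scope(target_table_properties, fallback="no_sort"):
    """Return the sort scope to use when writing a new CDC segment.

    `fallback` is an assumption of this sketch: it is used only when the
    target table does not declare a sort_scope at all.
    """
    scope = target_table_properties.get("sort_scope", fallback)
    return scope.lower()


def build_cdc_load_options(target_table_properties):
    """Build the load options for the new segment written by a CDC merge."""
    # The behavior being fixed is the equivalent of hard-coding
    # {"sort_scope": "no_sort"} here, regardless of the target table.
    return {"sort_scope": resolve_cdc_sort_scope(target_table_properties)}


if __name__ == "__main__":
    print(build_cdc_load_options({"sort_scope": "GLOBAL_SORT"}))
    print(build_cdc_load_options({"sort_scope": "no_sort"}))
```

In this model a table declared as global_sort gets global_sort CDC segments, and a table declared as no_sort keeps the faster no_sort load path.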

### Does this PR introduce any user interface change?
- No

### Is any new testcase added?
- No (the flows were verified manually)

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]

GitBox

[GitHub] [carbondata] ajantha-bhat commented on pull request #3901: [CARBONDATA-3820] CDC update as new segment should use target table sort_scope

ajantha-bhat commented on pull request #3901:
URL: https://github.com/apache/carbondata/pull/3901#issuecomment-680714175

@QiangCai , @ravipesala @marchpure @akashrn5 : please check this


GitBox

[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3901: [CARBONDATA-3820] CDC update as new segment should use target table sort_scope

In reply to this post by GitBox

CarbonDataQA1 commented on pull request #3901:
URL: https://github.com/apache/carbondata/pull/3901#issuecomment-680771095

Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3875/


GitBox

[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3901: [CARBONDATA-3820] CDC update as new segment should use target table sort_scope

In reply to this post by GitBox

CarbonDataQA1 commented on pull request #3901:
URL: https://github.com/apache/carbondata/pull/3901#issuecomment-680773026

Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2134/


GitBox

[GitHub] [carbondata] Zhangshunyu commented on pull request #3901: [CARBONDATA-3820] CDC update as new segment should use target table sort_scope

In reply to this post by GitBox

Zhangshunyu commented on pull request #3901:
URL: https://github.com/apache/carbondata/pull/3901#issuecomment-682379495

Setting 'no_sort' for CDC gives better load performance during merge, but I think we should keep it the same as the target table.


GitBox

[GitHub] [carbondata] ajantha-bhat commented on pull request #3901: [CARBONDATA-3820] CDC update as new segment should use target table sort_scope

In reply to this post by GitBox

ajantha-bhat commented on pull request #3901:
URL: https://github.com/apache/carbondata/pull/3901#issuecomment-682380195

> Setting 'no_sort' for CDC gives better load performance during merge, but I think we should keep it the same as the target table.

@Zhangshunyu : But the target table itself can be created with no_sort. As it stands, some segments are sorted and some are not, so I fixed it.


GitBox

[GitHub] [carbondata] ajantha-bhat edited a comment on pull request #3901: [CARBONDATA-3820] CDC update as new segment should use target table sort_scope

In reply to this post by GitBox

ajantha-bhat edited a comment on pull request #3901:
URL: https://github.com/apache/carbondata/pull/3901#issuecomment-682380195

> Setting 'no_sort' for CDC gives better load performance during merge, but I think we should keep it the same as the target table.

@Zhangshunyu : But the target table itself can be created with no_sort. As it stands, if the target table is global_sort, the old segments are sorted and the new CDC segments are not, so I fixed it.


GitBox

[GitHub] [carbondata] ajantha-bhat edited a comment on pull request #3901: [CARBONDATA-3820] CDC update as new segment should use target table sort_scope

In reply to this post by GitBox

ajantha-bhat edited a comment on pull request #3901:
URL: https://github.com/apache/carbondata/pull/3901#issuecomment-682380195

> Setting 'no_sort' for CDC gives better load performance during merge, but I think we should keep it the same as the target table.

@Zhangshunyu : For a faster CDC merge, the target table itself can be created with no_sort. As it stands, if the target table is global_sort, the old segments are sorted and the new CDC segments are not, so I fixed it.
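
The mixed-segment situation described in this comment can be shown with a small sketch. It is a simplified Python model rather than CarbonData code: the segment list, the `force_no_sort` flag, and both function names are hypothetical. It only illustrates that forcing no_sort on the CDC segment leaves a global_sort table with segments in different sort scopes, while following the table keeps them uniform.

```python
# Hypothetical model of segment sort scopes after a CDC merge.
# The names below are illustrative only, not CarbonData APIs.

def segments_after_cdc_merge(existing_scopes, table_scope, force_no_sort):
    """Return the per-segment sort scopes after one CDC merge adds a segment."""
    new_scope = "no_sort" if force_no_sort else table_scope
    return existing_scopes + [new_scope]


def is_uniform(scopes):
    """True when every segment was written with the same sort scope."""
    return len(set(scopes)) <= 1


if __name__ == "__main__":
    old_segments = ["global_sort", "global_sort"]

    # CDC segment forced to no_sort: sorted and unsorted segments are mixed.
    mixed = segments_after_cdc_merge(old_segments, "global_sort", force_no_sort=True)
    print(mixed, is_uniform(mixed))

    # CDC segment follows the target table: all segments stay consistent.
    consistent = segments_after_cdc_merge(old_segments, "global_sort", force_no_sort=False)
    print(consistent, is_uniform(consistent))
```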


GitBox

[GitHub] [carbondata] Zhangshunyu commented on pull request #3901: [CARBONDATA-3820] CDC update as new segment should use target table sort_scope

In reply to this post by GitBox

Zhangshunyu commented on pull request #3901:
URL: https://github.com/apache/carbondata/pull/3901#issuecomment-682381033

@ajantha-bhat Yes, I agree with this PR's change to keep it the same as the target table.


GitBox

[GitHub] [carbondata] akashrn5 commented on pull request #3901: [CARBONDATA-3820] CDC update as new segment should use target table sort_scope

In reply to this post by GitBox

akashrn5 commented on pull request #3901:
URL: https://github.com/apache/carbondata/pull/3901#issuecomment-682408178

@ajantha-bhat If the target table is no_sort, then since the merge writes the new data as a separate segment, could we sort that segment before writing it, which would help queries, instead of blindly following the target table's sort_scope?


GitBox

[GitHub] [carbondata] ajantha-bhat commented on pull request #3901: [CARBONDATA-3820] CDC update as new segment should use target table sort_scope

In reply to this post by GitBox

ajantha-bhat commented on pull request #3901:
URL: https://github.com/apache/carbondata/pull/3901#issuecomment-682419143

> @ajantha-bhat If the target table is no_sort, then since the merge writes the new data as a separate segment, could we sort that segment before writing it, which would help queries, instead of blindly following the target table's sort_scope?

It is not blind. The user has already decided whether the table should be sorted, based on their requirements (no_sort for good load speed, global_sort for good query speed), so it is better for all segments to follow the user's decision.


GitBox

[GitHub] [carbondata] asfgit closed pull request #3901: [CARBONDATA-3820] CDC update as new segment should use target table sort_scope

In reply to this post by GitBox

asfgit closed pull request #3901:
URL: https://github.com/apache/carbondata/pull/3901
