[DISCUSSION]Merge index property and operations improvement.

classic Classic list List threaded Threaded
8 messages Options
Reply | Threaded
Open this post in threaded view
|

[DISCUSSION]Merge index property and operations improvement.

akashnilugal@gmail.com
Hi All,

Currently, we have the merge index feature which can be enabled or disabled
and by default it's enabled.
Now during load or compaction, we first create index files and then create
merge index,
if merge index generation fails we don't fail load, we have the alter
compact command to do for unmerged
index files.

here are few things I want to suggest.

1. Deprecate the merge index property and keep for only for the developer
purpose.
2. do not allow the alter compact merge index command for new table as
already merge index is created and allow for only legacy tables.
       Alter merge index can be allowed only in the below conditions.
       a) when the update has happened on segment.
       b) when merge index creation failed during load or compaction.
3. Also no need to fail the load if the merge index fails(same as exiting
behavior)

Please suggest any modifications or any additions to this.

Thanks

Regards,
Akash R Nilugal
Reply | Threaded
Open this post in threaded view
|

Re: [DISCUSSION]Merge index property and operations improvement.

Ajantha Bhat
Hi Akash,
In point 3, you have mentioned no need to fail load if merge index fails,
So, how to create merge index again (as first-time query is slow without
merge index) If you block for new tables (as per point 2)? It is
contradicting I guess.

Here are my inputs for this,
*For Transactional tables*, As the merge index immediately deletes the
index files, concurrent queries can fail. So,

a) we can avoid exposing index files to the query (user), by making load
status success only after merge index created.
Also, update the table status file and segment file once after merge index
is created. no need to update with index file info before.
Also here keep maximum retry for table status file as this is the last
operation of load, failing here is costly to retry the whole load again.

b) After ensuring point a), If the merge index creation fails (which cannot
happen in most of the case), we can fail the load

c)  We still need to support the Alter table merge index command (mainly
required for old table upgrade scenario), no need to block for new tables.
when the user runs it, if index file doesn't exist (can know by reading
segment file), command can finish immediately and print a warning log that
no index files present to merge.

d) merge index carbon property (carbon.merge.index.in.segment), we can
directly remove it.


Thanks,
Ajantha

On Mon, Nov 9, 2020 at 1:56 PM Akash Nilugal <[hidden email]> wrote:

> Hi All,
>
> Currently, we have the merge index feature which can be enabled or disabled
> and by default it's enabled.
> Now during load or compaction, we first create index files and then create
> merge index,
> if merge index generation fails we don't fail load, we have the alter
> compact command to do for unmerged
> index files.
>
> here are few things I want to suggest.
>
> 1. Deprecate the merge index property and keep for only for the developer
> purpose.
> 2. do not allow the alter compact merge index command for new table as
> already merge index is created and allow for only legacy tables.
>        Alter merge index can be allowed only in the below conditions.
>        a) when the update has happened on segment.
>        b) when merge index creation failed during load or compaction.
> 3. Also no need to fail the load if the merge index fails(same as exiting
> behavior)
>
> Please suggest any modifications or any additions to this.
>
> Thanks
>
> Regards,
> Akash R Nilugal
>
Reply | Threaded
Open this post in threaded view
|

Re: [DISCUSSION]Merge index property and operations improvement.

akashrn5
Hi Ajantha,

Thanks for the reply, please find my comments

*a) and b)* agree to the point that, no need to make load success if the
merge index fails, we can fail the load
and update the status and segment file  only after merge index to avoid many
reliability and concurrent and cache issues.

*c) for your point C*, please check my point 2, i have already mentioned
that the alter can be allowed on legacy stores, for new tables i was
planning to keep in case of update, but now since updated data is written to
new segment, that scenario will also be not required.

So we can agree on point that, if user triggers for new tables, we can check
and give warn message and exit gracefully.

*d) for your point D*, as i said in my point 1, we can deprecate that
property and hide from user and keep only for developer purpose for any
issue analysis or debug.

I think these above conclusions can be finalized.

Thanks for the suggestions

Regards,
Akash R





--
Sent from: http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/
Reply | Threaded
Open this post in threaded view
|

Re: [DISCUSSION]Merge index property and operations improvement.

David CaiQiang
In reply to this post by Ajantha Bhat
   a) remove mergeIndex property and event listener, add mergeIndex as a part
of loading/compaction transaction.
   b) if the merging index failed, loading/compaction should fail directly.
   c) keep merge_index command and mark it deprecated.
      for a new table, maybe it will do nothing.
      for an old table, maybe we need to tolerate the probable query issue
(not found index files).
      It could be a deprecated feature in the future.
   b). At the end of loading,  retrying to finish index merging or
tablestatus update is a good suggestion.



-----
Best Regards
David Cai
--
Sent from: http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/
Best Regards
David Cai
Reply | Threaded
Open this post in threaded view
|

Re: [DISCUSSION]Merge index property and operations improvement.

akashrn5
Hi david,

Thanks for reply

 a) remove mergeIndex property and event listener, add mergeIndex as a part
of loading/compaction transaction.
==> yes, this can be done, as already discussed.

b) if the merging index failed, loading/compaction should fail directly.
==> Agree to this, same as replied to ajantha

c) keep merge_index command and mark it deprecated.
      for a new table, maybe it will do nothing.
      for an old table, maybe we need to tolerate the probable query issue
(not found index files).
      It could be a deprecated feature in the future.
===> for new table we can follow similar to what i replied to ajantha, and
old tables we still need
and since we are avoiding to delete index files immediately for old tables,
we can avoid issue i think.
As you said, we can complete deprecate after sometime,


b). At the end of loading,  retrying to finish index merging or
tablestatus update is a good suggestion.
===> Do you mean, retry count for table status updation and merge index
should be more?

Regards,
Akash



--
Sent from: http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/
Reply | Threaded
Open this post in threaded view
|

Re: [DISCUSSION]Merge index property and operations improvement.

kunalkapoor
+1

On Mon, Nov 23, 2020 at 3:12 PM akashrn5 <[hidden email]> wrote:

> Hi david,
>
> Thanks for reply
>
>  a) remove mergeIndex property and event listener, add mergeIndex as a part
> of loading/compaction transaction.
> ==> yes, this can be done, as already discussed.
>
> b) if the merging index failed, loading/compaction should fail directly.
> ==> Agree to this, same as replied to ajantha
>
> c) keep merge_index command and mark it deprecated.
>       for a new table, maybe it will do nothing.
>       for an old table, maybe we need to tolerate the probable query issue
> (not found index files).
>       It could be a deprecated feature in the future.
> ===> for new table we can follow similar to what i replied to ajantha, and
> old tables we still need
> and since we are avoiding to delete index files immediately for old tables,
> we can avoid issue i think.
> As you said, we can complete deprecate after sometime,
>
>
> b). At the end of loading,  retrying to finish index merging or
> tablestatus update is a good suggestion.
> ===> Do you mean, retry count for table status updation and merge index
> should be more?
>
> Regards,
> Akash
>
>
>
> --
> Sent from:
> http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/
>
Reply | Threaded
Open this post in threaded view
|

Re: [DISCUSSION]Merge index property and operations improvement.

akashrn5
In reply to this post by akashnilugal@gmail.com
Hi,

final points to be considered are:

1. make merge index by default enable and fail the compaction, load if the
merge index operation fails.
2. Merge index property can be removed completely, if any developer wants to
check something code will be simpler to add some check to skip the merge
index.
3. update table status file and the segment file only if the merge index
operation succeeds.
4. Keep retry count max for table status updation of success and have retry
count logic for merge index also.
5. As david suggested, merge index listener will be removed and cab be added
as a logic in load and compaction transactions
6. Alter compact command for merge index will not do anything for new
tables. It will merge for old tables only and do not delete the old index
files immediately

These will be considered in PRs, please see if i had missed something.

Thanks

Regards
Akash R



--
Sent from: http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/
Reply | Threaded
Open this post in threaded view
|

Re: [DISCUSSION]Merge index property and operations improvement.

Shreelekhya
Hi,

Based on the final points considered, I have included the following 3 points
details of implementation and test cases in the document attached to JIRA.
https://issues.apache.org/jira/browse/CARBONDATA-4037
PR: https://github.com/apache/carbondata/pull/3988

1. make merge index by default enable and fail the compaction, load if the
merge index operation fails.
3. update table status file and the segment file only if the merge index
operation succeeds.
6. Alter compact command for merge index will not do anything for new
tables. It will merge for old tables only and do not delete the old index
files immediately

Inputs and suggestions for any changes are most welcomed.




--
Sent from: http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/