Dear Community.!
Have recently encountered a problem of Segment directory and segment file in metadata directory are not created for partitioned table when 'carbon.merge.index.in.segment' property is set to 'false'. And actual index files which were present in respective partition's '.tmp' directory are also deleted without moving them out to respective partition directory where its '.carbondata' file exist. Thus queries throw exception while reading index files. Please refer jira issue - https://issues.apache.org/jira/browse/CARBONDATA-3834 <https://issues.apache.org/jira/browse/CARBONDATA-3834> To address this issue, we have 2 options to go with - 1. Either fix it to work for 'carbon.merge.index.in.segment' set to false case. There is an open PR https://github.com/apache/carbondata/pull/3776 <https://github.com/apache/carbondata/pull/3776> for it. 2. Or Deprecate the 'carbon.merge.index.in.segment' property itself. As the query performance is better when merge index files are in use when compared to normal index files, and default behavior is to generate merge index files, probably it is not necessary to support 'carbon.merge.index.in.segment' anymore. What do you think about it ? Please let me know your opinion. Thanks, Venu -- Sent from: http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/ |
Better to always merge index.
-1 for 1, +1 for 2, ----- Best Regards David Cai -- Sent from: http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/
Best Regards
David Cai |
Hi, What if too many index files in a segment and user want to finish load
fast and don't want to wait for merge index? That time setting merge index = false can help to save load time and in off peak time user can create merge index. So I still feel we need to fix issue exist when merge index = false. Thanks, Ajantha On Thu, 9 Jul, 2020, 3:05 pm David CaiQiang, <[hidden email]> wrote: > Better to always merge index. > > -1 for 1, > > +1 for 2, > > > > ----- > Best Regards > David Cai > -- > Sent from: > http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/ > |
Agree with david on always using merge index.
+1 for deprecating the property. On Thu, Jul 9, 2020 at 3:32 PM Ajantha Bhat <[hidden email]> wrote: > Hi, What if too many index files in a segment and user want to finish load > fast and don't want to wait for merge index? > > That time setting merge index = false can help to save load time and in off > peak time user can create merge index. > > So I still feel we need to fix issue exist when merge index = false. > > Thanks, > Ajantha > > On Thu, 9 Jul, 2020, 3:05 pm David CaiQiang, <[hidden email]> wrote: > > > Better to always merge index. > > > > -1 for 1, > > > > +1 for 2, > > > > > > > > ----- > > Best Regards > > David Cai > > -- > > Sent from: > > http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/ > > > |
+1 for deprecating this property. We can have it as an internal property
for dev, not expose to user. +1 for fixing the issue. If merge index creation is failing, then also query should work fine with index file. -Regards Kumar Vishal On Thu, 9 Jul 2020 at 10:58 PM, Kunal Kapoor <[hidden email]> wrote: > Agree with david on always using merge index. > > +1 for deprecating the property. > > On Thu, Jul 9, 2020 at 3:32 PM Ajantha Bhat <[hidden email]> wrote: > > > Hi, What if too many index files in a segment and user want to finish > load > > fast and don't want to wait for merge index? > > > > That time setting merge index = false can help to save load time and in > off > > peak time user can create merge index. > > > > So I still feel we need to fix issue exist when merge index = false. > > > > Thanks, > > Ajantha > > > > On Thu, 9 Jul, 2020, 3:05 pm David CaiQiang, <[hidden email]> > wrote: > > > > > Better to always merge index. > > > > > > -1 for 1, > > > > > > +1 for 2, > > > > > > > > > > > > ----- > > > Best Regards > > > David Cai > > > -- > > > Sent from: > > > > http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/ > > > > > >
kumar vishal
|
In reply to this post by VenuReddy
Hi,
+1, we can deprecate it and as Vishal suggested we can keep as internal property for developer purpose. Regards, Akash R Nilugal On Thu, Jul 9, 2020, 2:46 PM VenuReddy <[hidden email]> wrote: > Dear Community.! > > Have recently encountered a problem of Segment directory and segment file > in > metadata directory are not created for partitioned table when > 'carbon.merge.index.in.segment' property is set to 'false'. And actual > index > files which were present in respective partition's '.tmp' directory are > also > deleted without moving them out to respective partition directory where its > '.carbondata' file exist. Thus queries throw exception while reading index > files. Please refer jira issue - > https://issues.apache.org/jira/browse/CARBONDATA-3834 > <https://issues.apache.org/jira/browse/CARBONDATA-3834> > > To address this issue, we have 2 options to go with - > 1. Either fix it to work for 'carbon.merge.index.in.segment' set to false > case. There is an open PR https://github.com/apache/carbondata/pull/3776 > <https://github.com/apache/carbondata/pull/3776> for it. > > 2. Or Deprecate the 'carbon.merge.index.in.segment' property itself. As > the > query performance is better when merge index files are in use when compared > to normal index files, and default behavior is to generate merge index > files, probably it is not necessary to support > 'carbon.merge.index.in.segment' anymore. > > What do you think about it ? Please let me know your opinion. > > Thanks, > Venu > > > > -- > Sent from: > http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/ > |
Hi,
I didn't reply to deprecation. *+1 for deprecating it*. *And +1 for issue fix also.* Issue fix, I didn't mean when *carbon.merge.index.in <http://carbon.merge.index.in/>.segment = false.* but when when *carbon.merge.index.in <http://carbon.merge.index.in/>.segment = true and merge index creation failed for some reason.* code needs to take care of a. Moving index files from temp folder to final folder in case of partition table load. b. Not failing the current partition load. (same as normal load behavior) I think these two are not handled after partition optimization, you can check and handle it. Thanks, Ajantha On Thu, 9 Jul, 2020, 9:29 pm Akash r, <[hidden email]> wrote: > Hi, > > +1, we can deprecate it and as Vishal suggested we can keep as internal > property for developer purpose. > > Regards, > Akash R Nilugal > > On Thu, Jul 9, 2020, 2:46 PM VenuReddy <[hidden email]> wrote: > > > Dear Community.! > > > > Have recently encountered a problem of Segment directory and segment file > > in > > metadata directory are not created for partitioned table when > > 'carbon.merge.index.in.segment' property is set to 'false'. And actual > > index > > files which were present in respective partition's '.tmp' directory are > > also > > deleted without moving them out to respective partition directory where > its > > '.carbondata' file exist. Thus queries throw exception while reading > index > > files. Please refer jira issue - > > https://issues.apache.org/jira/browse/CARBONDATA-3834 > > <https://issues.apache.org/jira/browse/CARBONDATA-3834> > > > > To address this issue, we have 2 options to go with - > > 1. Either fix it to work for 'carbon.merge.index.in.segment' set to > false > > case. There is an open PR > https://github.com/apache/carbondata/pull/3776 > > <https://github.com/apache/carbondata/pull/3776> for it. > > > > 2. Or Deprecate the 'carbon.merge.index.in.segment' property itself. As > > the > > query performance is better when merge index files are in use when > compared > > to normal index files, and default behavior is to generate merge index > > files, probably it is not necessary to support > > 'carbon.merge.index.in.segment' anymore. > > > > What do you think about it ? Please let me know your opinion. > > > > Thanks, > > Venu > > > > > > > > -- > > Sent from: > > http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/ > > > > |
Hi,
+1 I agree with Vishal, let's deprecate the configuration and keep it as internal. Regards, Ravindra. On Fri, 10 Jul 2020 at 01:54, Ajantha Bhat <[hidden email]> wrote: > Hi, > I didn't reply to deprecation. *+1 for deprecating it*. > > *And +1 for issue fix also.* > Issue fix, I didn't mean when *carbon.merge.index.in > <http://carbon.merge.index.in/>.segment = false.* > > but when when *carbon.merge.index.in > <http://carbon.merge.index.in/>.segment = true and merge index creation > failed for some reason.* > code needs to take care of > a. Moving index files from temp folder to final folder in case of partition > table load. > b. Not failing the current partition load. (same as normal load behavior) > I think these two are not handled after partition optimization, you can > check and handle it. > > > Thanks, > Ajantha > > On Thu, 9 Jul, 2020, 9:29 pm Akash r, <[hidden email]> wrote: > > > Hi, > > > > +1, we can deprecate it and as Vishal suggested we can keep as internal > > property for developer purpose. > > > > Regards, > > Akash R Nilugal > > > > On Thu, Jul 9, 2020, 2:46 PM VenuReddy <[hidden email]> > wrote: > > > > > Dear Community.! > > > > > > Have recently encountered a problem of Segment directory and segment > file > > > in > > > metadata directory are not created for partitioned table when > > > 'carbon.merge.index.in.segment' property is set to 'false'. And actual > > > index > > > files which were present in respective partition's '.tmp' directory are > > > also > > > deleted without moving them out to respective partition directory where > > its > > > '.carbondata' file exist. Thus queries throw exception while reading > > index > > > files. Please refer jira issue - > > > https://issues.apache.org/jira/browse/CARBONDATA-3834 > > > <https://issues.apache.org/jira/browse/CARBONDATA-3834> > > > > > > To address this issue, we have 2 options to go with - > > > 1. Either fix it to work for 'carbon.merge.index.in.segment' set to > > false > > > case. There is an open PR > > https://github.com/apache/carbondata/pull/3776 > > > <https://github.com/apache/carbondata/pull/3776> for it. > > > > > > 2. Or Deprecate the 'carbon.merge.index.in.segment' property itself. > As > > > the > > > query performance is better when merge index files are in use when > > compared > > > to normal index files, and default behavior is to generate merge index > > > files, probably it is not necessary to support > > > 'carbon.merge.index.in.segment' anymore. > > > > > > What do you think about it ? Please let me know your opinion. > > > > > > Thanks, > > > Venu > > > > > > > > > > > > -- > > > Sent from: > > > > http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/ > > > > > > > > -- Thanks & Regards, Ravi |
In reply to this post by kumarvishal09
The merging index should be a part of loading. It is not good to extract the
merging index to an independent process, it brought the query issue (the system can't find the index files when/after merging). In my opinion, during loading, new .carbonindex files should be temporary, we should merge them to a .carbonindexmerge file in a segment before updating the segment status to success in tablestatus file. When the merging index failed, loading should be failed. for query: 1. support reading .carbonindex files and .carbonindexmerge files for loading: (also include the loading part of compaction/create index/create mv/merge operations) better to do like this. step 1. update tablestatus file to add an in-progress segment step 2. generate carbondata file and temporary .carbonindex files, for a partitioned table, it also generates a temporary segment file for each related partition. step 3. merge .carbonindex files to a .carbonindexmerge file. step 4. write a segment file. for a partitioned table, merge all temporary segment files to one segment file. step 5. update tablestatus file with final status, segment file name and some statistics. So in total, update tablestatus file twice, write segment file once, write .carbonindex files once, write and delete .carbonindex files once. for updating: 1. Now only updating operation can keep .carbonindex file in the future, maybe we can change updating operations to the same with merge operation to generate new files into a new segment. ----- Best Regards David Cai -- Sent from: http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/
Best Regards
David Cai |
update reply:
The merging index should be a part of loading. It is not good to extract the merging index to an independent process, it brought the query issue (the system can't find the index files when/after merging). In my opinion, during loading, new .carbonindex files should be temporary, we should merge them to a .carbonindexmerge file in a segment before updating the segment status to success in tablestatus file. When the merging index failed, loading should be failed. for query: 1. support reading .carbonindex files and .carbonindexmerge files for loading: (also include the loading part of compaction/create index/create mv/merge operations) better to do like this. step 1. update tablestatus file to add an in-progress segment step 2. generate carbondata files and temporary .carbonindex files. step 3. merge .carbonindex files to a .carbonindexmerge file. step 4. write a segment file. step 5. update tablestatus file with final status, segment file name and some statistics. So in total, update tablestatus file twice, write segment file once, write .carbonindexmerge files once, write and delete .carbonindex files once. for updating: 1. Now only updating operation can keep .carbonindex file in the future, maybe we can change updating operations to the same with merge operations to generate new files into a new segment. ----- Best Regards David Cai -- Sent from: http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/
Best Regards
David Cai |
Free forum by Nabble | Edit this page |