[GitHub] [carbondata] maheshrajus opened a new pull request #4134: [CARBONDATA-4185] Doc Changes for Heterogeneous format segments in carbondata

GitBox

maheshrajus opened a new pull request #4134:
URL: https://github.com/apache/carbondata/pull/4134


   
    ### Why is this PR needed?
    Documentation is needed for heterogeneous format segments in CarbonData.
   
    ### What changes were proposed in this PR?
    Add the segment feature background and its impact on existing CarbonData features.
   
    ### Does this PR introduce any user interface change?
    - No
    ### Is any new testcase added?
    - No
   
       
   


--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]



[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4134: [CARBONDATA-4185] Doc Changes for Heterogeneous format segments in carbondata

GitBox

CarbonDataQA2 commented on pull request #4134:
URL: https://github.com/apache/carbondata/pull/4134#issuecomment-840037269


   Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12602/job/ApacheCarbon_PR_Builder_2.4.5/3609/
   



[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4134: [CARBONDATA-4185] Doc Changes for Heterogeneous format segments in carbondata

GitBox
In reply to this post by GitBox

CarbonDataQA2 commented on pull request #4134:
URL: https://github.com/apache/carbondata/pull/4134#issuecomment-840038514


   Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12602/job/ApacheCarbonPRBuilder2.3/5354/
   



[GitHub] [carbondata] kunal642 commented on a change in pull request #4134: [Doc][CARBONDATA-4185] Doc Changes for Heterogeneous format segments in carbondata

GitBox
In reply to this post by GitBox

kunal642 commented on a change in pull request #4134:
URL: https://github.com/apache/carbondata/pull/4134#discussion_r635788818



##########
File path: docs/addsegment-guide.md
##########
@@ -0,0 +1,78 @@
+<!--
+    Licensed to the Apache Software Foundation (ASF) under one or more
+    contributor license agreements.  See the NOTICE file distributed with
+    this work for additional information regarding copyright ownership.
+    The ASF licenses this file to you under the Apache License, Version 2.0
+    (the "License"); you may not use this file except in compliance with
+    the License.  You may obtain a copy of the License at
+
+      http://www.apache.org/licenses/LICENSE-2.0
+    
+    Unless required by applicable law or agreed to in writing, software
+    distributed under the License is distributed on an "AS IS" BASIS,
+    WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+    See the License for the specific language governing permissions and
+    limitations under the License.
+-->
+
+# Heterogeneous format segments in carbondata
+
+###Background
+In the industry, many users already adopted to data with different formats like ORC, Parquet, JSON, CSV etc.,  
+If users want to migrate to Carbondata for better performance or for better features then there is no direct way.
+All the existing data needs to be converted to Carbondata to migrate.  
+This solution works out if the existing data is less, what if the existing data is more?  
+This process of converting is very time consuming and error-prone.

Review comment:
       Line 25 can be changed to "Heterogeneous format segments aim to solve this problem by avoiding data conversion."





[GitHub] [carbondata] kunal642 commented on a change in pull request #4134: [Doc][CARBONDATA-4185] Doc Changes for Heterogeneous format segments in carbondata

GitBox
In reply to this post by GitBox

kunal642 commented on a change in pull request #4134:
URL: https://github.com/apache/carbondata/pull/4134#discussion_r635789250



##########
File path: docs/addsegment-guide.md
##########
@@ -0,0 +1,78 @@
+# Heterogeneous format segments in carbondata
+
+###Background
+In the industry, many users already adopted to data with different formats like ORC, Parquet, JSON, CSV etc.,  
+If users want to migrate to Carbondata for better performance or for better features then there is no direct way.
+All the existing data needs to be converted to Carbondata to migrate.  
+This solution works out if the existing data is less, what if the existing data is more?  
+This process of converting is very time consuming and error-prone.
+
+###Add segment with path and format
+To solve the above problem carbon introduces the heterogeneous format segments which can read the data as per the mentioned format of the segment.

Review comment:
       After line 25 is changed, "To solve the above problem carbon introduces the" can be removed from line 28.





[GitHub] [carbondata] kunal642 commented on a change in pull request #4134: [Doc][CARBONDATA-4185] Doc Changes for Heterogeneous format segments in carbondata

GitBox
In reply to this post by GitBox

kunal642 commented on a change in pull request #4134:
URL: https://github.com/apache/carbondata/pull/4134#discussion_r635801606



##########
File path: docs/addsegment-guide.md
##########
@@ -0,0 +1,78 @@
+# Heterogeneous format segments in carbondata
+
+###Background
+In the industry, many users already adopted to data with different formats like ORC, Parquet, JSON, CSV etc.,  
+If users want to migrate to Carbondata for better performance or for better features then there is no direct way.
+All the existing data needs to be converted to Carbondata to migrate.  
+This solution works out if the existing data is less, what if the existing data is more?  
+This process of converting is very time consuming and error-prone.
+
+###Add segment with path and format
+To solve the above problem carbon introduces the heterogeneous format segments which can read the data as per the mentioned format of the segment.
+So users can add the existing data as a segment to the carbon table provided the schema of the data and the carbon table should be the same.
+
+```
+Alter table table_name add segment options (‘path’= 'hdfs://usr/oldtable,'format'=parquet)
+```
+In the above command user can add the existing data to the carbon table as a new segment and also
+ can provide the data format.
+
+During add segment, it will infer the schema from data and validates the schema against the carbon table.
+If the schema doesn’t match it throws an exception.
+
+###Changes to tablestatus file
+Carbon adds the new segment by adding segment information to tablestatus file. In order to add the path and format information to tablestatus, we are going to add `segmentPath`  and ‘format’  to the tablestatus file.
+And any extra `options` will be added to the segment file.
+
+
+###Changes to Spark Integration
+During select query carbon reads data through RDD which is created by
+  CarbonDatasourceHadoopRelation.buildScan, This RDD reads data from physical carbondata files and provides data to spark query plan.
+To support multiple formats per segment basis we can create multiple RDD using the existing Spark
+ file format scan class FileSourceScanExec . This class can generate scan RDD for all spark supported formats. We can union all these multi-format RDD and create a single RDD and provide it to spark query plan.
+
+Note: This integration will be clean as we use the sparks optimized reading, pruning and it
+ involves whole codegen and vector processing with unsafe support.
+
+###Changes to Presto Integration
+CarbondataSplitManager can create the splits for carbon and as well as for other formats and
+ choose the page source as per the split.  
+
+### Impact on existed feature
+**Count(\*) query:**  In case if the segments are mixed with different formats then driver side
+ optimization for count(*) query will not work so it will be executed on executor side.
+
+**INdex DataMaps:** Datamaps like block/blocklet datamap will only work for carbondata format

Review comment:
       change "INdex" to "Index"





[GitHub] [carbondata] kunal642 commented on a change in pull request #4134: [Doc][CARBONDATA-4185] Doc Changes for Heterogeneous format segments in carbondata

GitBox
In reply to this post by GitBox

kunal642 commented on a change in pull request #4134:
URL: https://github.com/apache/carbondata/pull/4134#discussion_r635801967



##########
File path: docs/addsegment-guide.md
##########
@@ -0,0 +1,78 @@
+# Heterogeneous format segments in carbondata
+
+###Background
+In the industry, many users already adopted to data with different formats like ORC, Parquet, JSON, CSV etc.,  
+If users want to migrate to Carbondata for better performance or for better features then there is no direct way.
+All the existing data needs to be converted to Carbondata to migrate.  
+This solution works out if the existing data is less, what if the existing data is more?  
+This process of converting is very time consuming and error-prone.
+
+###Add segment with path and format
+To solve the above problem carbon introduces the heterogeneous format segments which can read the data as per the mentioned format of the segment.
+So users can add the existing data as a segment to the carbon table provided the schema of the data and the carbon table should be the same.
+
+```
+Alter table table_name add segment options (‘path’= 'hdfs://usr/oldtable,'format'=parquet)
+```
+In the above command user can add the existing data to the carbon table as a new segment and also
+ can provide the data format.
+
+During add segment, it will infer the schema from data and validates the schema against the carbon table.
+If the schema doesn’t match it throws an exception.
+
+###Changes to tablestatus file
+Carbon adds the new segment by adding segment information to tablestatus file. In order to add the path and format information to tablestatus, we are going to add `segmentPath`  and ‘format’  to the tablestatus file.
+And any extra `options` will be added to the segment file.
+
+
+###Changes to Spark Integration
+During select query carbon reads data through RDD which is created by
+  CarbonDatasourceHadoopRelation.buildScan, This RDD reads data from physical carbondata files and provides data to spark query plan.
+To support multiple formats per segment basis we can create multiple RDD using the existing Spark
+ file format scan class FileSourceScanExec . This class can generate scan RDD for all spark supported formats. We can union all these multi-format RDD and create a single RDD and provide it to spark query plan.
+
+Note: This integration will be clean as we use the sparks optimized reading, pruning and it
+ involves whole codegen and vector processing with unsafe support.
+
+###Changes to Presto Integration
+CarbondataSplitManager can create the splits for carbon and as well as for other formats and
+ choose the page source as per the split.  
+
+### Impact on existed feature
+**Count(\*) query:**  In case if the segments are mixed with different formats then driver side
+ optimization for count(*) query will not work so it will be executed on executor side.
+
+**INdex DataMaps:** Datamaps like block/blocklet datamap will only work for carbondata format
+ segments so there would not be any driver side pruning for other formats.
+
+**Update/Delete:** Update & Delete operations cannot be allowed on the table which has mixed formats
+But it can be allowed if the external segments are added with carbondata format.
+
+**Compaction:** The other format segments cannot be compacted but carbondata segments inside that
+ table will be compacted.
+
+**Show Segments:** Now it shows the format and path of the segment along with current information.
+
+**Delete Segments & Clean Files:**  If the segment to be deleted is external then it will not be
+ deleted physically. If the segment is present internally only will be deleted physically.
+
+**MV/Preaggregate DataMap:** These datamaps can be created on the mixed format table without any

Review comment:
       No need to mention "Preaggregate" now





[GitHub] [carbondata] maheshrajus commented on a change in pull request #4134: [Doc][CARBONDATA-4185] Doc Changes for Heterogeneous format segments in carbondata

GitBox
In reply to this post by GitBox

maheshrajus commented on a change in pull request #4134:
URL: https://github.com/apache/carbondata/pull/4134#discussion_r635874835



##########
File path: docs/addsegment-guide.md
##########
@@ -0,0 +1,78 @@
+# Heterogeneous format segments in carbondata
+
+###Background
+In the industry, many users already adopted to data with different formats like ORC, Parquet, JSON, CSV etc.,  
+If users want to migrate to Carbondata for better performance or for better features then there is no direct way.
+All the existing data needs to be converted to Carbondata to migrate.  
+This solution works out if the existing data is less, what if the existing data is more?  
+This process of converting is very time consuming and error-prone.

Review comment:
       done





[GitHub] [carbondata] maheshrajus commented on a change in pull request #4134: [Doc][CARBONDATA-4185] Doc Changes for Heterogeneous format segments in carbondata

GitBox
In reply to this post by GitBox

maheshrajus commented on a change in pull request #4134:
URL: https://github.com/apache/carbondata/pull/4134#discussion_r635874971



##########
File path: docs/addsegment-guide.md
##########
@@ -0,0 +1,78 @@
+# Heterogeneous format segments in carbondata
+
+###Background
+In the industry, many users already adopted to data with different formats like ORC, Parquet, JSON, CSV etc.,  
+If users want to migrate to Carbondata for better performance or for better features then there is no direct way.
+All the existing data needs to be converted to Carbondata to migrate.  
+This solution works out if the existing data is less, what if the existing data is more?  
+This process of converting is very time consuming and error-prone.
+
+###Add segment with path and format
+To solve the above problem carbon introduces the heterogeneous format segments which can read the data as per the mentioned format of the segment.

Review comment:
       done





[GitHub] [carbondata] maheshrajus commented on a change in pull request #4134: [Doc][CARBONDATA-4185] Doc Changes for Heterogeneous format segments in carbondata

GitBox
In reply to this post by GitBox

maheshrajus commented on a change in pull request #4134:
URL: https://github.com/apache/carbondata/pull/4134#discussion_r635876067



##########
File path: docs/addsegment-guide.md
##########
@@ -0,0 +1,78 @@
+# Heterogeneous format segments in carbondata
+
+###Background
+In the industry, many users already adopted to data with different formats like ORC, Parquet, JSON, CSV etc.,  
+If users want to migrate to Carbondata for better performance or for better features then there is no direct way.
+All the existing data needs to be converted to Carbondata to migrate.  
+This solution works out if the existing data is less, what if the existing data is more?  
+This process of converting is very time consuming and error-prone.
+
+###Add segment with path and format
+To solve the above problem carbon introduces the heterogeneous format segments which can read the data as per the mentioned format of the segment.
+So users can add the existing data as a segment to the carbon table provided the schema of the data and the carbon table should be the same.
+
+```
+Alter table table_name add segment options (‘path’= 'hdfs://usr/oldtable,'format'=parquet)
+```
+In the above command user can add the existing data to the carbon table as a new segment and also
+ can provide the data format.
+
+During add segment, it will infer the schema from data and validates the schema against the carbon table.
+If the schema doesn’t match it throws an exception.
+
+###Changes to tablestatus file
+Carbon adds the new segment by adding segment information to tablestatus file. In order to add the path and format information to tablestatus, we are going to add `segmentPath`  and ‘format’  to the tablestatus file.
+And any extra `options` will be added to the segment file.
+
+
+###Changes to Spark Integration
+During select query carbon reads data through RDD which is created by
+  CarbonDatasourceHadoopRelation.buildScan, This RDD reads data from physical carbondata files and provides data to spark query plan.
+To support multiple formats per segment basis we can create multiple RDD using the existing Spark
+ file format scan class FileSourceScanExec . This class can generate scan RDD for all spark supported formats. We can union all these multi-format RDD and create a single RDD and provide it to spark query plan.
+
+Note: This integration will be clean as we use the sparks optimized reading, pruning and it
+ involves whole codegen and vector processing with unsafe support.
+
+###Changes to Presto Integration
+CarbondataSplitManager can create the splits for carbon and as well as for other formats and
+ choose the page source as per the split.  
+
+### Impact on existed feature
+**Count(\*) query:**  In case if the segments are mixed with different formats then driver side
+ optimization for count(*) query will not work so it will be executed on executor side.
+
+**INdex DataMaps:** Datamaps like block/blocklet datamap will only work for carbondata format

Review comment:
       done





[GitHub] [carbondata] maheshrajus commented on a change in pull request #4134: [Doc][CARBONDATA-4185] Doc Changes for Heterogeneous format segments in carbondata

GitBox
In reply to this post by GitBox

maheshrajus commented on a change in pull request #4134:
URL: https://github.com/apache/carbondata/pull/4134#discussion_r635876163



##########
File path: docs/addsegment-guide.md
##########
@@ -0,0 +1,78 @@
+# Heterogeneous format segments in carbondata
+
+###Background
+In the industry, many users already adopted to data with different formats like ORC, Parquet, JSON, CSV etc.,  
+If users want to migrate to Carbondata for better performance or for better features then there is no direct way.
+All the existing data needs to be converted to Carbondata to migrate.  
+This solution works out if the existing data is less, what if the existing data is more?  
+This process of converting is very time consuming and error-prone.
+
+###Add segment with path and format
+To solve the above problem carbon introduces the heterogeneous format segments which can read the data as per the mentioned format of the segment.
+So users can add the existing data as a segment to the carbon table provided the schema of the data and the carbon table should be the same.
+
+```
+Alter table table_name add segment options (‘path’= 'hdfs://usr/oldtable,'format'=parquet)
+```
+In the above command user can add the existing data to the carbon table as a new segment and also
+ can provide the data format.
+
+During add segment, it will infer the schema from data and validates the schema against the carbon table.
+If the schema doesn’t match it throws an exception.
+
+###Changes to tablestatus file
+Carbon adds the new segment by adding segment information to tablestatus file. In order to add the path and format information to tablestatus, we are going to add `segmentPath`  and ‘format’  to the tablestatus file.
+And any extra `options` will be added to the segment file.
+
+
+###Changes to Spark Integration
+During select query carbon reads data through RDD which is created by
+  CarbonDatasourceHadoopRelation.buildScan, This RDD reads data from physical carbondata files and provides data to spark query plan.
+To support multiple formats per segment basis we can create multiple RDD using the existing Spark
+ file format scan class FileSourceScanExec . This class can generate scan RDD for all spark supported formats. We can union all these multi-format RDD and create a single RDD and provide it to spark query plan.
+
+Note: This integration will be clean as we use the sparks optimized reading, pruning and it
+ involves whole codegen and vector processing with unsafe support.
+
+###Changes to Presto Integration
+CarbondataSplitManager can create the splits for carbon and as well as for other formats and
+ choose the page source as per the split.  
+
+### Impact on existed feature
+**Count(\*) query:**  In case if the segments are mixed with different formats then driver side
+ optimization for count(*) query will not work so it will be executed on executor side.
+
+**INdex DataMaps:** Datamaps like block/blocklet datamap will only work for carbondata format
+ segments so there would not be any driver side pruning for other formats.
+
+**Update/Delete:** Update & Delete operations cannot be allowed on the table which has mixed formats
+But it can be allowed if the external segments are added with carbondata format.
+
+**Compaction:** The other format segments cannot be compacted but carbondata segments inside that
+ table will be compacted.
+
+**Show Segments:** Now it shows the format and path of the segment along with current information.
+
+**Delete Segments & Clean Files:**  If the segment to be deleted is external then it will not be
+ deleted physically. If the segment is present internally only will be deleted physically.
+
+**MV/Preaggregate DataMap:** These datamaps can be created on the mixed format table without any

Review comment:
       done





[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4134: [Doc][CARBONDATA-4185] Doc Changes for Heterogeneous format segments in carbondata

GitBox
In reply to this post by GitBox

CarbonDataQA2 commented on pull request #4134:
URL: https://github.com/apache/carbondata/pull/4134#issuecomment-844931633


   Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12602/job/ApacheCarbon_PR_Builder_2.4.5/3663/
   



[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4134: [Doc][CARBONDATA-4185] Doc Changes for Heterogeneous format segments in carbondata

GitBox
In reply to this post by GitBox

CarbonDataQA2 commented on pull request #4134:
URL: https://github.com/apache/carbondata/pull/4134#issuecomment-844934031


   Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12602/job/ApacheCarbonPRBuilder2.3/5407/
   



[GitHub] [carbondata] asfgit closed pull request #4134: [Doc][CARBONDATA-4185] Doc Changes for Heterogeneous format segments in carbondata

GitBox
In reply to this post by GitBox

asfgit closed pull request #4134:
URL: https://github.com/apache/carbondata/pull/4134


   



[GitHub] [carbondata] kunal642 commented on pull request #4134: [Doc][CARBONDATA-4185] Doc Changes for Heterogeneous format segments in carbondata

GitBox
In reply to this post by GitBox

kunal642 commented on pull request #4134:
URL: https://github.com/apache/carbondata/pull/4134#issuecomment-845190741


   LGTM

