marchpure opened a new pull request #3792: URL: https://github.com/apache/carbondata/pull/3792 ### Why is this PR needed? When the number segments is large, "SHOW SEGMENTS" time overhead is too high. The LIMIT operator shall be supported in the SHOW SEGMENTS command. ### What changes were proposed in this PR? Add LIMIT operator in the grammar parsing and read segments function. ### Does this PR introduce any user interface change? Yes ### Is any new testcase added? Yes ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] |
CarbonDataQA1 commented on pull request #3792: URL: https://github.com/apache/carbondata/pull/3792#issuecomment-645594860 Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1443/ ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] |
In reply to this post by GitBox
CarbonDataQA1 commented on pull request #3792: URL: https://github.com/apache/carbondata/pull/3792#issuecomment-645595925 Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3168/ ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] |
In reply to this post by GitBox
Indhumathi27 commented on a change in pull request #3792: URL: https://github.com/apache/carbondata/pull/3792#discussion_r442128902 ########## File path: docs/segment-management-on-carbondata.md ########## @@ -54,6 +54,12 @@ concept which helps to maintain consistency of data and easy transaction managem SHOW SEGMENTS ON CarbonDatabase.CarbonTable ``` + Show lastest 10 visible segments Review comment: After compaction also, it should show latest segments right? For example, there are 6 segments, after major compaction, show segments with limit 2 will display 4.1 and 5th segment only, whereas latest segments are 0.2 and 0.3. I think we should get latest segments based on timestamp. ########## File path: integration/spark/src/test/scala/org/apache/carbondata/spark/testsuite/segment/ShowSegmentTestCase.scala ########## @@ -191,7 +214,11 @@ class ShowSegmentTestCase extends QueryTest with BeforeAndAfterAll { sql("drop table if exists a") sql("create table a(a string) stored as carbondata") sql("insert into a select 'k'") + sql("insert into a select 'j'") + sql("insert into a select 'k'") val rows = sql("show segments for table a").collect() + assert(sql(s"show segments for table a").collect().length == 3) Review comment: ```suggestion assert(rows.length == 3) ``` ########## File path: integration/spark/src/test/scala/org/apache/carbondata/spark/testsuite/segmentreading/TestSegmentReading.scala ########## @@ -251,13 +251,16 @@ class TestSegmentReading extends QueryTest with BeforeAndAfterAll { s"""LOAD DATA local inpath '$resourcesPath/data.csv' INTO TABLE carbon_table_show_seg OPTIONS |('DELIMITER'= ',', 'QUOTECHAR'= '\"')""".stripMargin) val df = sql("SHOW SEGMENTS for table carbon_table_show_seg as select * from carbon_table_show_seg_segments") + sql("SHOW SEGMENTS for table carbon_table_show_seg as select * from carbon_table_show_seg_segments").show() Review comment: Can remove this line ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] |
In reply to this post by GitBox
marchpure commented on a change in pull request #3792: URL: https://github.com/apache/carbondata/pull/3792#discussion_r442151742 ########## File path: integration/spark/src/test/scala/org/apache/carbondata/spark/testsuite/segment/ShowSegmentTestCase.scala ########## @@ -191,7 +214,11 @@ class ShowSegmentTestCase extends QueryTest with BeforeAndAfterAll { sql("drop table if exists a") sql("create table a(a string) stored as carbondata") sql("insert into a select 'k'") + sql("insert into a select 'j'") + sql("insert into a select 'k'") val rows = sql("show segments for table a").collect() + assert(sql(s"show segments for table a").collect().length == 3) Review comment: modified ########## File path: integration/spark/src/test/scala/org/apache/carbondata/spark/testsuite/segmentreading/TestSegmentReading.scala ########## @@ -251,13 +251,16 @@ class TestSegmentReading extends QueryTest with BeforeAndAfterAll { s"""LOAD DATA local inpath '$resourcesPath/data.csv' INTO TABLE carbon_table_show_seg OPTIONS |('DELIMITER'= ',', 'QUOTECHAR'= '\"')""".stripMargin) val df = sql("SHOW SEGMENTS for table carbon_table_show_seg as select * from carbon_table_show_seg_segments") + sql("SHOW SEGMENTS for table carbon_table_show_seg as select * from carbon_table_show_seg_segments").show() Review comment: modified ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] |
In reply to this post by GitBox
marchpure commented on a change in pull request #3792: URL: https://github.com/apache/carbondata/pull/3792#discussion_r442154411 ########## File path: docs/segment-management-on-carbondata.md ########## @@ -54,6 +54,12 @@ concept which helps to maintain consistency of data and easy transaction managem SHOW SEGMENTS ON CarbonDatabase.CarbonTable ``` + Show lastest 10 visible segments Review comment: Yeah. currently, the showed segments are ordered by loadname desc. in the type of: +---+---------+-----------------------+ |ID |Status |Load Start Time | +---+---------+-----------------------+ |5 |Compacted|2020-06-18 04:20:09.041| |4.1|Success |2020-06-18 04:20:09.041| |4 |Compacted|2020-06-18 04:20:08.69 | |3 |Compacted|2020-06-18 04:20:07.622| |2.1|Compacted|2020-06-18 04:20:07.622| |2 |Compacted|2020-06-18 04:20:07.226| ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] |
In reply to this post by GitBox
Indhumathi27 commented on a change in pull request #3792: URL: https://github.com/apache/carbondata/pull/3792#discussion_r442174299 ########## File path: docs/segment-management-on-carbondata.md ########## @@ -54,6 +54,12 @@ concept which helps to maintain consistency of data and easy transaction managem SHOW SEGMENTS ON CarbonDatabase.CarbonTable ``` + Show lastest 10 visible segments Review comment: yes. i think it is better to display latest segments based on timestamp on limit ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] |
In reply to this post by GitBox
Indhumathi27 commented on a change in pull request #3792: URL: https://github.com/apache/carbondata/pull/3792#discussion_r442174299 ########## File path: docs/segment-management-on-carbondata.md ########## @@ -54,6 +54,12 @@ concept which helps to maintain consistency of data and easy transaction managem SHOW SEGMENTS ON CarbonDatabase.CarbonTable ``` + Show lastest 10 visible segments Review comment: yes. i think it is better to display latest segments based on timestamp on query with limit ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] |
In reply to this post by GitBox
CarbonDataQA1 commented on pull request #3792: URL: https://github.com/apache/carbondata/pull/3792#issuecomment-646019412 Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3169/ ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] |
In reply to this post by GitBox
CarbonDataQA1 commented on pull request #3792: URL: https://github.com/apache/carbondata/pull/3792#issuecomment-646020176 Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1444/ ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] |
In reply to this post by GitBox
Indhumathi27 commented on pull request #3792: URL: https://github.com/apache/carbondata/pull/3792#issuecomment-646429823 LGTM ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] |
In reply to this post by GitBox
asfgit closed pull request #3792: URL: https://github.com/apache/carbondata/pull/3792 ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] |
Free forum by Nabble | Edit this page |