[GitHub] carbondata pull request #3047: [CARBONDATA-3223] Fixed Wrong Datasize and In...

[GitHub] carbondata pull request #3047: [CARBONDATA-3223] Fixed Wrong Datasize and In...

qiuchenjian-2
GitHub user manishnalla1994 opened a pull request:

    https://github.com/apache/carbondata/pull/3047

    [CARBONDATA-3223] Fixed Wrong Datasize and Indexsize calculation for old store using Show Segments

    Problem: A table created and loaded on an older version (1.1) showed a data-size and index-size of 0B after being refreshed on the new version. This was because, when the data-size came back as "null", we did not compute it and instead directly assigned it a value of 0.
   
    Solution: Computed the correct data-size and index-size using CarbonTable.
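
    The change can be sketched as a minimal, self-contained model. Note the names here are hypothetical: `LoadMetadata` and `recompute` merely stand in for CarbonData's load metadata and `CarbonUtil.calculateDataIndexSize`; this is not the actual patch.

```scala
// Toy stand-ins, not the real CarbonData API.
final case class LoadMetadata(dataSize: String, indexSize: String)

def segmentSizes(load: LoadMetadata, recompute: () => (Long, Long)): (Long, Long) =
  if (load.dataSize == null || load.indexSize == null) {
    // Old-store (1.1) segment: sizes are missing from table status, so compute them
    recompute()
  } else {
    // Segment written by a newer version: sizes are already recorded
    (load.dataSize.toLong, load.indexSize.toLong)
  }
```

    The key point is that a missing size now triggers a real computation instead of being reported as 0.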
     
    Be sure to do all of the following checklist to help us incorporate
    your contribution quickly and easily:
   
     - [ ] Any interfaces changed?
     
     - [ ] Any backward compatibility impacted?
     
     - [ ] Document update required?
   
     - [x] Testing done
            Please provide details on
            - Whether new unit test cases have been added or why no new tests are required?
            - How it is tested? Please attach test report.
            - Is it a performance related change? Please attach the performance test report.
            - Any additional information to help reviewers in testing this change.
           
     - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.
   


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/manishnalla1994/carbondata Datasize0Issue

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/carbondata/pull/3047.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #3047
   
----
commit 6bf65d7a0b42e8d9a822fd234a510550bd8d2f17
Author: manishnalla1994 <manish.nalla1994@...>
Date:   2019-01-02T12:30:36Z

    Fixed Wrong Datasize and Indexsize calculation for old store

----


---

[GitHub] carbondata issue #3047: [CARBONDATA-3223] Fixed Wrong Datasize and Indexsize...

qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/3047
 
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/2124/



---

[GitHub] carbondata issue #3047: [CARBONDATA-3223] Fixed Wrong Datasize and Indexsize...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/3047
 
    Build Failed with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/2330/



---

[GitHub] carbondata issue #3047: [CARBONDATA-3223] Fixed Wrong Datasize and Indexsize...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/3047
 
    Build Failed  with Spark 2.3.2, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/10378/



---

[GitHub] carbondata pull request #3047: [CARBONDATA-3223] Fixed Wrong Datasize and In...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user qiuchenjian commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/3047#discussion_r244895354
 
    --- Diff: integration/spark-common/src/main/scala/org/apache/carbondata/api/CarbonStore.scala ---
    @@ -101,14 +102,21 @@ object CarbonStore {
               val (dataSize, indexSize) = if (load.getFileFormat == FileFormat.ROW_V1) {
                 // for streaming segment, we should get the actual size from the index file
                 // since it is continuously inserting data
    -            val segmentDir = CarbonTablePath.getSegmentPath(tablePath, load.getLoadName)
    +            val segmentDir = CarbonTablePath
    +              .getSegmentPath(carbonTable.getTablePath, load.getLoadName)
                 val indexPath = CarbonTablePath.getCarbonStreamIndexFilePath(segmentDir)
                 val indices = StreamSegment.readIndexFile(indexPath, FileFactory.getFileType(indexPath))
                 (indices.asScala.map(_.getFile_size).sum, FileFactory.getCarbonFile(indexPath).getSize)
               } else {
                 // for batch segment, we can get the data size from table status file directly
    -            (if (load.getDataSize == null) 0L else load.getDataSize.toLong,
    -              if (load.getIndexSize == null) 0L else load.getIndexSize.toLong)
    +            if (null == load.getDataSize && null == load.getIndexSize) {
    +              val dataIndexSize = CarbonUtil.calculateDataIndexSize(carbonTable, false)
    +              (dataIndexSize.get(CarbonCommonConstants.CARBON_TOTAL_DATA_SIZE).toLong,
    +                dataIndexSize.get(CarbonCommonConstants.CARBON_TOTAL_INDEX_SIZE).toLong)
    +            } else {
    +              (load.getDataSize.toLong,
    --- End diff --
   
    If only one of load.getDataSize and load.getIndexSize is null, this will throw an exception; I think that scenario should be considered.
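
    The concern can be reproduced with a minimal model (a hypothetical helper, not the CarbonData code): with the `&&` guard from this revision, a segment where exactly one of the two sizes is null falls through to the `.toLong` calls and throws.

```scala
// Models the `&&` guard of this revision of the patch.
def sizesWithAndGuard(dataSize: String, indexSize: String): (Long, Long) =
  if (dataSize == null && indexSize == null) {
    (0L, 0L) // pretend both sizes were recomputed here
  } else {
    // Reached when exactly one value is null; calling .toLong on the
    // null String then throws (a NumberFormatException)
    (dataSize.toLong, indexSize.toLong)
  }
```

    Guarding with `||` instead routes any segment with a missing size to the recompute path.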


---

[GitHub] carbondata pull request #3047: [CARBONDATA-3223] Fixed Wrong Datasize and In...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user manishnalla1994 commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/3047#discussion_r244911752
 
    --- Diff: integration/spark-common/src/main/scala/org/apache/carbondata/api/CarbonStore.scala ---
    @@ -101,14 +102,21 @@ object CarbonStore {
               val (dataSize, indexSize) = if (load.getFileFormat == FileFormat.ROW_V1) {
                 // for streaming segment, we should get the actual size from the index file
                 // since it is continuously inserting data
    -            val segmentDir = CarbonTablePath.getSegmentPath(tablePath, load.getLoadName)
    +            val segmentDir = CarbonTablePath
    +              .getSegmentPath(carbonTable.getTablePath, load.getLoadName)
                 val indexPath = CarbonTablePath.getCarbonStreamIndexFilePath(segmentDir)
                 val indices = StreamSegment.readIndexFile(indexPath, FileFactory.getFileType(indexPath))
                 (indices.asScala.map(_.getFile_size).sum, FileFactory.getCarbonFile(indexPath).getSize)
               } else {
                 // for batch segment, we can get the data size from table status file directly
    -            (if (load.getDataSize == null) 0L else load.getDataSize.toLong,
    -              if (load.getIndexSize == null) 0L else load.getIndexSize.toLong)
    +            if (null == load.getDataSize && null == load.getIndexSize) {
    +              val dataIndexSize = CarbonUtil.calculateDataIndexSize(carbonTable, false)
    +              (dataIndexSize.get(CarbonCommonConstants.CARBON_TOTAL_DATA_SIZE).toLong,
    +                dataIndexSize.get(CarbonCommonConstants.CARBON_TOTAL_INDEX_SIZE).toLong)
    +            } else {
    +              (load.getDataSize.toLong,
    --- End diff --
   
    Yes, fixed it now.


---

[GitHub] carbondata issue #3047: [CARBONDATA-3223] Fixed Wrong Datasize and Indexsize...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/3047
 
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/2135/



---

[GitHub] carbondata issue #3047: [CARBONDATA-3223] Fixed Wrong Datasize and Indexsize...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/3047
 
    Build Success with Spark 2.3.2, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/10389/



---

[GitHub] carbondata issue #3047: [CARBONDATA-3223] Fixed Wrong Datasize and Indexsize...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/3047
 
    Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/2341/



---

[GitHub] carbondata pull request #3047: [CARBONDATA-3223] Fixed Wrong Datasize and In...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user manishgupta88 commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/3047#discussion_r244922117
 
    --- Diff: integration/spark-common/src/main/scala/org/apache/carbondata/api/CarbonStore.scala ---
    @@ -101,14 +102,23 @@ object CarbonStore {
               val (dataSize, indexSize) = if (load.getFileFormat == FileFormat.ROW_V1) {
                 // for streaming segment, we should get the actual size from the index file
                 // since it is continuously inserting data
    -            val segmentDir = CarbonTablePath.getSegmentPath(tablePath, load.getLoadName)
    +            val segmentDir = CarbonTablePath
    +              .getSegmentPath(carbonTable.getTablePath, load.getLoadName)
                 val indexPath = CarbonTablePath.getCarbonStreamIndexFilePath(segmentDir)
                 val indices = StreamSegment.readIndexFile(indexPath, FileFactory.getFileType(indexPath))
                 (indices.asScala.map(_.getFile_size).sum, FileFactory.getCarbonFile(indexPath).getSize)
               } else {
                 // for batch segment, we can get the data size from table status file directly
    -            (if (load.getDataSize == null) 0L else load.getDataSize.toLong,
    -              if (load.getIndexSize == null) 0L else load.getIndexSize.toLong)
    +            if (null == load.getDataSize || null == load.getIndexSize) {
    +              // If either of datasize or indexsize comes to be null the we calculate the correct
    +              // size and assign
    +              val dataIndexSize = CarbonUtil.calculateDataIndexSize(carbonTable, false)
    --- End diff --
   
    The boolean flag in the method call controls whether the data and index size are updated in the table status file. Pass the flag as true so that it computes the sizes and updates the table status file. This will avoid the calculation on every Show Segments call.
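
    The effect of the flag can be sketched with a toy model (hypothetical: `TableStatus` is just an in-memory map standing in for the table status file, and `compute` stands in for the expensive size calculation):

```scala
import scala.collection.mutable

// Toy stand-in for the table status file and the size calculation.
final class TableStatus {
  private val entries = mutable.Map.empty[String, Long]
  var computations = 0 // counts how often the expensive path runs

  private def compute(): (Long, Long) = { computations += 1; (100L, 10L) }

  // Mirrors the update flag: with update = true the computed sizes are
  // written back, so later calls read them instead of recomputing.
  def sizes(update: Boolean): (Long, Long) =
    (entries.get("dataSize"), entries.get("indexSize")) match {
      case (Some(d), Some(i)) => (d, i)
      case _ =>
        val (d, i) = compute()
        if (update) { entries("dataSize") = d; entries("indexSize") = i }
        (d, i)
    }
}
```

    With update = true the expensive path runs once; with update = false it runs on every call.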


---

[GitHub] carbondata pull request #3047: [CARBONDATA-3223] Fixed Wrong Datasize and In...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user manishgupta88 commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/3047#discussion_r244920921
 
    --- Diff: integration/spark-common/src/main/scala/org/apache/carbondata/api/CarbonStore.scala ---
    @@ -46,9 +47,9 @@ object CarbonStore {
     
       def showSegments(
           limit: Option[String],
    -      tablePath: String,
    +      carbonTable: CarbonTable,
    --- End diff --
   
    Move `carbonTable` as the first argument of method


---

[GitHub] carbondata issue #3047: [CARBONDATA-3223] Fixed Wrong Datasize and Indexsize...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/3047
 
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/2143/



---

[GitHub] carbondata issue #3047: [CARBONDATA-3223] Fixed Wrong Datasize and Indexsize...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user manishgupta88 commented on the issue:

    https://github.com/apache/carbondata/pull/3047
 
    LGTM...can be merged once build passes


---

[GitHub] carbondata issue #3047: [CARBONDATA-3223] Fixed Wrong Datasize and Indexsize...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/3047
 
    Build Failed  with Spark 2.3.2, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/10397/



---

[GitHub] carbondata pull request #3047: [CARBONDATA-3223] Fixed Wrong Datasize and In...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user manishnalla1994 commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/3047#discussion_r244957693
 
    --- Diff: integration/spark-common/src/main/scala/org/apache/carbondata/api/CarbonStore.scala ---
    @@ -46,9 +47,9 @@ object CarbonStore {
     
       def showSegments(
           limit: Option[String],
    -      tablePath: String,
    +      carbonTable: CarbonTable,
    --- End diff --
   
    Done.


---

[GitHub] carbondata pull request #3047: [CARBONDATA-3223] Fixed Wrong Datasize and In...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user manishnalla1994 commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/3047#discussion_r244957746
 
    --- Diff: integration/spark-common/src/main/scala/org/apache/carbondata/api/CarbonStore.scala ---
    @@ -101,14 +102,23 @@ object CarbonStore {
               val (dataSize, indexSize) = if (load.getFileFormat == FileFormat.ROW_V1) {
                 // for streaming segment, we should get the actual size from the index file
                 // since it is continuously inserting data
    -            val segmentDir = CarbonTablePath.getSegmentPath(tablePath, load.getLoadName)
    +            val segmentDir = CarbonTablePath
    +              .getSegmentPath(carbonTable.getTablePath, load.getLoadName)
                 val indexPath = CarbonTablePath.getCarbonStreamIndexFilePath(segmentDir)
                 val indices = StreamSegment.readIndexFile(indexPath, FileFactory.getFileType(indexPath))
                 (indices.asScala.map(_.getFile_size).sum, FileFactory.getCarbonFile(indexPath).getSize)
               } else {
                 // for batch segment, we can get the data size from table status file directly
    -            (if (load.getDataSize == null) 0L else load.getDataSize.toLong,
    -              if (load.getIndexSize == null) 0L else load.getIndexSize.toLong)
    +            if (null == load.getDataSize || null == load.getIndexSize) {
    +              // If either of datasize or indexsize comes to be null the we calculate the correct
    +              // size and assign
    +              val dataIndexSize = CarbonUtil.calculateDataIndexSize(carbonTable, false)
    --- End diff --
   
    Fixed.



---

[GitHub] carbondata issue #3047: [CARBONDATA-3223] Fixed Wrong Datasize and Indexsize...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/3047
 
    Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/2349/



---

[GitHub] carbondata issue #3047: [CARBONDATA-3223] Fixed Wrong Datasize and Indexsize...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user manishnalla1994 commented on the issue:

    https://github.com/apache/carbondata/pull/3047
 
    retest this please


---

[GitHub] carbondata issue #3047: [CARBONDATA-3223] Fixed Wrong Datasize and Indexsize...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/3047
 
    Build Failed with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/2361/



---

[GitHub] carbondata pull request #3047: [CARBONDATA-3223] Fixed Wrong Datasize and In...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user KanakaKumar commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/3047#discussion_r244980360
 
    --- Diff: integration/spark-common/src/main/scala/org/apache/carbondata/api/CarbonStore.scala ---
    @@ -101,14 +102,23 @@ object CarbonStore {
               val (dataSize, indexSize) = if (load.getFileFormat == FileFormat.ROW_V1) {
                 // for streaming segment, we should get the actual size from the index file
                 // since it is continuously inserting data
    -            val segmentDir = CarbonTablePath.getSegmentPath(tablePath, load.getLoadName)
    +            val segmentDir = CarbonTablePath
    +              .getSegmentPath(carbonTable.getTablePath, load.getLoadName)
                 val indexPath = CarbonTablePath.getCarbonStreamIndexFilePath(segmentDir)
                 val indices = StreamSegment.readIndexFile(indexPath, FileFactory.getFileType(indexPath))
                 (indices.asScala.map(_.getFile_size).sum, FileFactory.getCarbonFile(indexPath).getSize)
               } else {
                 // for batch segment, we can get the data size from table status file directly
    -            (if (load.getDataSize == null) 0L else load.getDataSize.toLong,
    -              if (load.getIndexSize == null) 0L else load.getIndexSize.toLong)
    +            if (null == load.getDataSize || null == load.getIndexSize) {
    +              // If either of datasize or indexsize comes to be null the we calculate the correct
    +              // size and assign
    +              val dataIndexSize = CarbonUtil.calculateDataIndexSize(carbonTable, true)
    --- End diff --
   
    Show Segments is a read-only query, and I think we should not perform a write operation inside a query.
    So I feel it is better either to calculate the sizes every time and show them, OR to just display them as not available.


---