[GitHub] carbondata pull request #1660: [CARBONDATA-1731] [BugFix] Update fails incor...

[GitHub] carbondata issue #1660: [CARBONDATA-1731,CARBONDATA-1728] [BugFix] Update fa...

qiuchenjian-2
Github user chenliang613 commented on the issue:

    https://github.com/apache/carbondata/pull/1660
 
    @anubhav100 @sounakr you can also use my example script to reproduce this. The example simulates 7,500,000 rows and reproduces CARBONDATA-1728, and this PR fixes that issue as well. @sounakr, please double-check it again.
   
    @anubhav100 I still have a question: why do we need to add "return true" after "blockletDetails.get(index).addDeletedRows(blocklet.getDeletedRows());"?
    ---------------------------------------------------------------------------------------
    package org.apache.carbondata.examples
   
    import java.io.File
    import java.text.SimpleDateFormat
   
    import org.apache.spark.sql.SaveMode
    import org.apache.spark.sql.SparkSession
   
    import org.apache.carbondata.core.constants.CarbonCommonConstants
    import org.apache.carbondata.core.util.CarbonProperties
   
    object DataUpdateDeleteExample {
   
      def main(args: Array[String]): Unit = {
   
        // for local files
        val rootPath = new File(this.getClass.getResource("/").getPath
          + "../../../..").getCanonicalPath
        // for hdfs files
        // var rootPath = "hdfs://hdfs-host/carbon"
   
        val storeLocation = s"$rootPath/examples/spark2/target/store"
        val warehouse = s"$rootPath/examples/spark2/target/warehouse"
        val metastoredb = s"$rootPath/examples/spark2/target"
   
        import org.apache.spark.sql.CarbonSession._
        val spark = SparkSession
          .builder()
          .master("local")
          .appName("DataUpdateDeleteExample")
          .config("spark.sql.warehouse.dir", warehouse)
          .config("spark.driver.host", "localhost")
          .config("spark.sql.crossJoin.enabled", "true")
          .getOrCreateCarbonSession(storeLocation)
        spark.sparkContext.setLogLevel("WARN")
   
        // Specify date format based on raw data
        CarbonProperties.getInstance()
          .addProperty(CarbonCommonConstants.CARBON_DATE_FORMAT, "yyyy-MM-dd")
   
        import spark.implicits._
        // Drop table
        spark.sql("DROP TABLE IF EXISTS t3")
   
   
        // Simulate data and write to table t3
        val sdf = new SimpleDateFormat("yyyy-MM-dd")
        val df = spark.sparkContext.parallelize(1 to 7500000)
          .map(x => (x, new java.sql.Date(sdf.parse("2015-07-" + (x % 10 + 10)).getTime),
            "china", "aaa" + x, "phone" + 555 * x, "ASD" + (60000 + x), 14999 + x))
          .toDF("t3_id", "t3_date", "t3_country", "t3_name",
              "t3_phonetype", "t3_serialname", "t3_salary")
        df.write
          .format("carbondata")
          .option("tableName", "t3")
          .option("tempCSV", "true")
          .option("compress", "true")
          .mode(SaveMode.Overwrite)
          .save()
   
        // Query the freshly loaded data
        spark.sql("""
               SELECT * FROM t3 ORDER BY t3_id
               """).show()
   
   
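        // Delete every row: the EXISTS subquery is true whenever t3 is non-empty
        spark.sql("delete from t3 where exists (select 1 from t3)").show()
        // Count the rows that remain after the delete
        spark.sql("""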
               SELECT count(*) FROM t3
               """).show()
   
        // Drop table
        spark.sql("DROP TABLE IF EXISTS t3")
   
        spark.stop()
      }
    }


---

[GitHub] carbondata issue #1660: [CARBONDATA-1731,CARBONDATA-1728] [BugFix] Update fa...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user anubhav100 commented on the issue:

    https://github.com/apache/carbondata/pull/1660
 
    @chenliang613 the reason I return true is this: the call blockletDetails.get(index).addDeletedRows(blocklet.getDeletedRows()) was adding the same rows again to the deleted-rows TreeSet in the DeleteDeltaBlockletDetails class, and we validate the result of that call. If the rows are duplicates, the TreeSet does not add them and returns false. My analysis is that it is not required to check whether the TreeSet can add the rows again,
   
    because the following check already validates whether a row was added to the deleted-rows TreeSet:
    blocklet.addDeletedRow(CarbonUpdateUtil.getIntegerValue(offset));
   
    So what I do is: if IsRowAddedForDeletion is true, the deletion was successful, and if that blocklet is already present in blockletDetails, I simply add its deleted rows to the deleted-rows TreeSet in DeleteDeltaBlockletDetails. Even if a row is duplicated, the TreeSet will not add it again.
   
    However, what @sounakr said is correct: if the same row is being added again, the root cause is that the splits have somehow chosen duplicate blocks. On debugging further, I found that in my table of 15 lakh (1,500,000) rows, the same block is picked up again by the splits after 1408000 rows, which is not correct. I am debugging further to see why this happens.
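
The TreeSet behaviour described in this comment can be illustrated with a minimal Java sketch. The class and method names below are hypothetical stand-ins chosen for illustration; they are not the actual CarbonData implementation. The sketch only shows the standard java.util.TreeSet contract: addAll returns false when every element is already present, which is why a block delivered twice by duplicate splits would look like a failed delete if that return value is treated as the success flag.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical stand-in for a holder of deleted row ids in one blocklet.
// This is NOT the CarbonData class; all names here are assumptions.
class DeletedRowsSketch {
    private final Set<Integer> deletedRows = new TreeSet<>();

    // Returns false when the row id is already present.
    public boolean addDeletedRow(Integer row) {
        return deletedRows.add(row);
    }

    // Returns false when every row id is already present (set unchanged).
    public boolean addDeletedRows(Set<Integer> rows) {
        return deletedRows.addAll(rows);
    }

    public int size() {
        return deletedRows.size();
    }

    public static void main(String[] args) {
        DeletedRowsSketch details = new DeletedRowsSketch();
        Set<Integer> fromSplit = new TreeSet<>(Arrays.asList(1, 2, 3));

        // First time the block is seen: the set changes, prints "true".
        System.out.println(details.addDeletedRows(fromSplit));
        // Same block delivered again by a duplicate split: prints "false".
        System.out.println(details.addDeletedRows(fromSplit));
        // The duplicates were ignored, so this prints "3".
        System.out.println(details.size());
    }
}
```

Ignoring the return value hides the symptom, while the duplicate splits themselves remain the underlying problem, as noted above.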


---

[GitHub] carbondata issue #1660: [CARBONDATA-1731,CARBONDATA-1728] [BugFix] Update fa...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1660
 
    Build Failed with Spark 2.2.0, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/943/



---

[GitHub] carbondata issue #1660: [CARBONDATA-1731,CARBONDATA-1728] [BugFix] Update fa...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1660
 
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/2172/



---

[GitHub] carbondata issue #1660: [CARBONDATA-1731,CARBONDATA-1728] [BugFix] Update fa...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user chenliang613 commented on the issue:

    https://github.com/apache/carbondata/pull/1660
 
    retest this please


---

[GitHub] carbondata issue #1660: [CARBONDATA-1731,CARBONDATA-1728] [BugFix] Update fa...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1660
 
    Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/2203/



---

[GitHub] carbondata issue #1660: [CARBONDATA-1731,CARBONDATA-1728] [BugFix] Update fa...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1660
 
    Build Failed with Spark 2.2.0, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/980/



---

[GitHub] carbondata issue #1660: [CARBONDATA-1731,CARBONDATA-1728] [BugFix] Update fa...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user chenliang613 commented on the issue:

    https://github.com/apache/carbondata/pull/1660
 
    retest this please


---

[GitHub] carbondata issue #1660: [CARBONDATA-1731,CARBONDATA-1728] [BugFix] Update fa...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1660
 
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/2214/



---

[GitHub] carbondata issue #1660: [CARBONDATA-1731,CARBONDATA-1728] [BugFix] Update fa...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1660
 
    Build Success with Spark 2.2.0, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/992/



---

[GitHub] carbondata issue #1660: [CARBONDATA-1731,CARBONDATA-1728] [BugFix] Update fa...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user anubhav100 commented on the issue:

    https://github.com/apache/carbondata/pull/1660
 
    We have found the root cause and raised another PR for this issue: https://github.com/apache/carbondata/pull/1719


---

[GitHub] carbondata pull request #1660: [CARBONDATA-1731,CARBONDATA-1728] [BugFix] Up...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user anubhav100 closed the pull request at:

    https://github.com/apache/carbondata/pull/1660


---