Github user chenliang613 commented on the issue:
https://github.com/apache/carbondata/pull/1660
@anubhav100 @sounakr you can also use my example script below to reproduce the issue. The example simulates 7500000 rows and can reproduce issue 1728, and this PR also fixes it. Please @sounakr double check it again. @anubhav100 I still have one question: why do we need to append "return true" after "blockletDetails.get(index).addDeletedRows(blocklet.getDeletedRows());"?
---------------------------------------------------------------------------------------
package org.apache.carbondata.examples

import java.io.File
import java.text.SimpleDateFormat

import org.apache.spark.sql.SaveMode
import org.apache.spark.sql.SparkSession

import org.apache.carbondata.core.constants.CarbonCommonConstants
import org.apache.carbondata.core.util.CarbonProperties

object DataUpdateDeleteExample {

  def main(args: Array[String]) {
    // for local files
    val rootPath = new File(this.getClass.getResource("/").getPath
      + "../../../..").getCanonicalPath
    // for hdfs files
    // var rootPath = "hdfs://hdfs-host/carbon"
    var storeLocation = s"$rootPath/examples/spark2/target/store"
    var warehouse = s"$rootPath/examples/spark2/target/warehouse"
    var metastoredb = s"$rootPath/examples/spark2/target"

    import org.apache.spark.sql.CarbonSession._
    val spark = SparkSession
      .builder()
      .master("local")
      .appName("DataUpdateDeleteExample")
      .config("spark.sql.warehouse.dir", warehouse)
      .config("spark.driver.host", "localhost")
      .config("spark.sql.crossJoin.enabled", "true")
      .getOrCreateCarbonSession(storeLocation)
    spark.sparkContext.setLogLevel("WARN")

    // Specify date format based on raw data
    CarbonProperties.getInstance()
      .addProperty(CarbonCommonConstants.CARBON_DATE_FORMAT, "yyyy-MM-dd")

    import spark.implicits._

    // Drop table
    spark.sql("DROP TABLE IF EXISTS t3")

    // Simulate data and write to table t3
    var sdf = new SimpleDateFormat("yyyy-MM-dd")
    var df = spark.sparkContext.parallelize(1 to 7500000)
      .map(x => (x, new java.sql.Date(sdf.parse("2015-07-" + (x % 10 + 10)).getTime),
        "china", "aaa" + x, "phone" + 555 * x, "ASD" + (60000 + x), 14999 + x))
      .toDF("t3_id", "t3_date", "t3_country", "t3_name",
        "t3_phonetype", "t3_serialname", "t3_salary")
    df.write
      .format("carbondata")
      .option("tableName", "t3")
      .option("tempCSV", "true")
      .option("compress", "true")
      .mode(SaveMode.Overwrite)
      .save()

    // Query data
    spark.sql("""
           SELECT * FROM t3 ORDER BY t3_id
           """).show()

    spark.sql("delete from t3 where exists (select 1 from t3)").show()

    spark.sql("""
           SELECT count(*) FROM t3
           """).show()

    // Drop table
    spark.sql("DROP TABLE IF EXISTS t3")

    spark.stop()
  }
}
---
In reply to this post by qiuchenjian-2
Github user anubhav100 commented on the issue:
https://github.com/apache/carbondata/pull/1660
@chenliang613 The reason I return true is this: when blockletDetails.get(index).addDeletedRows(blocklet.getDeletedRows()) was called, it was adding the same rows again to the deleted-rows TreeSet in the DeleteDeltaBlockletDetails class, and we were validating the result of that add, i.e. whether the TreeSet could add the rows or not. If the rows are duplicates the TreeSet will not add them and will return false. My analysis is that this check is not required here, because whether a row is added to the deleted-rows TreeSet is already validated earlier by blocklet.addDeletedRow(CarbonUpdateUtil.getIntegerValue(offset)). So what I do is: if IsRowAddedForDeletion is true, the deletion is successful, and if that blocklet is present in blockletDetails I simply add the deleted rows to the deleted-rows TreeSet in the DeleteDeltaBlockletDetails class; even if a row is duplicated, the TreeSet will not add it again. But what @sounakr said is correct: if the same row is being added again, the root cause is that somehow the splits have chosen duplicate blocks. When I debugged further, I found that in my table of 15 lakh (1,500,000) rows, the same block is picked up again by the splits after 1408000 rows, which is not correct. I am debugging more to find out why this happened.
---
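For context, the behavior discussed above follows directly from java.util.TreeSet semantics: add returns false for a duplicate element and leaves the set unchanged, so a second insertion of the same row offset is silently deduplicated. A minimal standalone sketch of that check (the addDeletedRow helper and the plain Integer offsets here are simplified stand-ins for illustration, not the actual CarbonData DeleteDeltaBlockletDetails class):

```java
import java.util.TreeSet;

public class DeletedRowsDemo {
    // Simplified stand-in for the per-blocklet deleted-rows set.
    static TreeSet<Integer> deletedRows = new TreeSet<>();

    // Mirrors the "can the TreeSet add the row or not" validation:
    // returns true on first insertion, false for a duplicate offset.
    static boolean addDeletedRow(int offset) {
        return deletedRows.add(offset);
    }

    public static void main(String[] args) {
        System.out.println(addDeletedRow(1408000)); // true: first insertion
        System.out.println(addDeletedRow(1408000)); // false: duplicate, set unchanged
        System.out.println(deletedRows.size());     // 1
    }
}
```

This is why the duplicate add itself is harmless (the set stays correct) and why a false return is a symptom rather than the bug: it only signals that the same block was handed out twice by the splits.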
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/1660
Build Failed with Spark 2.2.0, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/943/
---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/1660
Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/2172/
---
Github user chenliang613 commented on the issue:
https://github.com/apache/carbondata/pull/1660
retest this please
---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/1660
Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/2203/
---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/1660
Build Failed with Spark 2.2.0, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/980/
---
Github user chenliang613 commented on the issue:
https://github.com/apache/carbondata/pull/1660
retest this please
---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/1660
Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/2214/
---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/1660
Build Success with Spark 2.2.0, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/992/
---
Github user anubhav100 commented on the issue:
https://github.com/apache/carbondata/pull/1660
We have found the root cause and raised another PR for this issue: https://github.com/apache/carbondata/pull/1719
---
Github user anubhav100 closed the pull request at:
https://github.com/apache/carbondata/pull/1660
---