[GitHub] incubator-carbondata pull request #614: [CarbonData-714]Documented how to ha...

GitHub user PallaviSingh1992 opened a pull request:

https://github.com/apache/incubator-carbondata/pull/614

[CarbonData-714]Documented how to handle bad records

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/PallaviSingh1992/incubator-carbondata feature/CarbonData-714

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-carbondata/pull/614.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #614

---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [hidden email] or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-carbondata issue #614: [CARBONDATA-714]Documented how to handle ba...

Github user CarbonDataQA commented on the issue:

https://github.com/apache/incubator-carbondata/pull/614

Build Failed with Spark 1.6.2, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/973/

[GitHub] incubator-carbondata issue #614: [CARBONDATA-714]Documented how to handle ba...

Github user chenliang613 commented on the issue:

https://github.com/apache/incubator-carbondata/pull/614

retest this please

[GitHub] incubator-carbondata issue #614: [CARBONDATA-714]Documented how to handle ba...

Github user CarbonDataQA commented on the issue:

https://github.com/apache/incubator-carbondata/pull/614

Build Success with Spark 1.6.2, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/983/

[GitHub] incubator-carbondata issue #614: [CARBONDATA-714]Documented how to handle ba...

Github user PallaviSingh1992 commented on the issue:

https://github.com/apache/incubator-carbondata/pull/614

@chenliang613 please review

[GitHub] incubator-carbondata pull request #614: [CARBONDATA-714]Documented how to ha...

Github user chenliang613 commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/614#discussion_r104547671

--- Diff: docs/faq.md ---
@@ -18,30 +18,57 @@
-->

# FAQs
-* **Auto Compaction not Working**

- The Property carbon.enable.auto.load.merge in carbon.properties need to be set to true.
+* [What are Bad Records?](#what-are-bad-records)
+* [Where are Bad Records Stored in CarbonData?](#where-are-bad-records-stored-in-carbondata)
+* [How to handle Bad Records?](#how-to-handle-bad-records)
+* [How to resolve store location can't be found?](#how-to-resolve-store-location-can-not-be-found)
+* [What is Carbon Lock Type?](#what-is-carbon-lock-type)
+* [How to resolve Abstract Method Error?](#how-to-resolve-abstract-method-error)

-* **Getting Abstract method error**
+## What are Bad Records?
+Records that fail to get loaded into the CarbonData due to data type incompatibility or are empty or have incompatible format are classified as Bad Records.

- You need to specify the spark version while using Maven to build project.
+## Where are Bad Records Stored in CarbonData?
+The bad records are stored at the location set in carbon.badRecords.location in carbon.properties file.
+By default **carbon.badRecords.location** specifies the following location ``/opt/Carbon/Spark/badrecords``.

-* **Getting NotImplementedException for subquery using IN and EXISTS**
+## How to handle Bad Records?
+While loading data we can specify the approach to handle Bad Records. In order to analyse the cause of the Bad Records the parameter ``BAD_RECORDS_LOGGER_ENABLE`` must be set to value ``TRUE``. There are three approaches to handle Bad Records which can be specified by the parameter ``BAD_RECORDS_ACTION``.

- Subquery with in and exists not supported in CarbonData.
-
-* **Getting Exceptions on creating a view**
-
- View not supported in CarbonData.
-
-* **How to verify if ColumnGroups have been created as desired.**
+- To pad the incorrect values of the csv rows with NULL value and load the data in CarbonData, set the following in the query :
+```
+'BAD_RECORDS_ACTION'='FORCE'
+```
+
+- To write the Bad Records without padding incorrect values with NULL in the raw csv (set in the parameter **carbon.badRecords.location**), set the following in the query :
+```
+'BAD_RECORDS_ACTION'='REDIRECT'
+```
+
+- To ignore the Bad Records from getting stored in the raw csv, we need to set the following in the query :
+```
+'BAD_RECORDS_ACTION'='INDIRECT'
+```
+
+## How to resolve store location can not be found?
+The store location specified while creating carbon session is used by the CarbonData to store the meta data like the schema, dictionary files, dictionary meta data and sort indexes.
+
+Try creating ``carbonsession`` with ``storepath`` specified in the following manner :
+```
+val carbon = SparkSession.builder().config(sc.getConf).getOrCreateCarbonSession(<store_path>)
+```
+Example:
+```
+val carbon = SparkSession.builder().config(sc.getConf).getOrCreateCarbonSession("hdfs://localhost:9000/carbon/store ")
+```
+
+## What is Carbon Lock Type?
--- End diff --

For users, in which scenarios does this lock parameter need to be set?

[GitHub] incubator-carbondata pull request #614: [CARBONDATA-714]Documented how to ha...

Github user chenliang613 commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/614#discussion_r104548399

--- Diff: docs/faq.md ---
@@ -18,30 +18,57 @@
-->

# FAQs
-* **Auto Compaction not Working**

- The Property carbon.enable.auto.load.merge in carbon.properties need to be set to true.
+* [What are Bad Records?](#what-are-bad-records)
+* [Where are Bad Records Stored in CarbonData?](#where-are-bad-records-stored-in-carbondata)
+* [How to handle Bad Records?](#how-to-handle-bad-records)
+* [How to resolve store location can't be found?](#how-to-resolve-store-location-can-not-be-found)
+* [What is Carbon Lock Type?](#what-is-carbon-lock-type)
+* [How to resolve Abstract Method Error?](#how-to-resolve-abstract-method-error)

-* **Getting Abstract method error**
+## What are Bad Records?
+Records that fail to get loaded into the CarbonData due to data type incompatibility or are empty or have incompatible format are classified as Bad Records.

- You need to specify the spark version while using Maven to build project.
+## Where are Bad Records Stored in CarbonData?
+The bad records are stored at the location set in carbon.badRecords.location in carbon.properties file.
+By default **carbon.badRecords.location** specifies the following location ``/opt/Carbon/Spark/badrecords``.

-* **Getting NotImplementedException for subquery using IN and EXISTS**
+## How to handle Bad Records?
+While loading data we can specify the approach to handle Bad Records. In order to analyse the cause of the Bad Records the parameter ``BAD_RECORDS_LOGGER_ENABLE`` must be set to value ``TRUE``. There are three approaches to handle Bad Records which can be specified by the parameter ``BAD_RECORDS_ACTION``.

- Subquery with in and exists not supported in CarbonData.
-
-* **Getting Exceptions on creating a view**
-
- View not supported in CarbonData.
-
-* **How to verify if ColumnGroups have been created as desired.**
+- To pad the incorrect values of the csv rows with NULL value and load the data in CarbonData, set the following in the query :
+```
+'BAD_RECORDS_ACTION'='FORCE'
+```
+
+- To write the Bad Records without padding incorrect values with NULL in the raw csv (set in the parameter **carbon.badRecords.location**), set the following in the query :
+```
+'BAD_RECORDS_ACTION'='REDIRECT'
+```
+
+- To ignore the Bad Records from getting stored in the raw csv, we need to set the following in the query :
+```
+'BAD_RECORDS_ACTION'='INDIRECT'
+```
+
+## How to resolve store location can not be found?
--- End diff --

It seems the title should be: How to specify the store location while creating a carbonsession
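
For reference, a minimal sketch of what that title describes. It assumes a Spark 2.x build of CarbonData in which `getOrCreateCarbonSession` is brought into scope by `import org.apache.spark.sql.CarbonSession._`. The helper `cleanStorePath` is hypothetical and not part of CarbonData; it is included only because the example path in the diff carries a trailing space inside the string. The session call itself is left as a comment, since it needs a running Spark context (`sc`).

```scala
import java.net.URI

// Hypothetical helper (not part of CarbonData): trims stray whitespace and
// checks that the store path parses as a URI before it is handed to the session.
def cleanStorePath(raw: String): String = {
  val trimmed = raw.trim
  require(trimmed.nonEmpty, "store path must not be empty")
  new URI(trimmed) // throws URISyntaxException if the path is malformed
  trimmed
}

// The example path from the diff, including its trailing space.
val storePath = cleanStorePath("hdfs://localhost:9000/carbon/store ")

// With Spark and CarbonData on the classpath, the session from the diff
// would then be created along these lines:
//   import org.apache.spark.sql.SparkSession
//   import org.apache.spark.sql.CarbonSession._
//   val carbon = SparkSession.builder().config(sc.getConf).getOrCreateCarbonSession(storePath)
```

Validating the path up front is only a convenience; whether CarbonData itself tolerates surrounding whitespace in the store path is not stated in this thread.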

[GitHub] incubator-carbondata pull request #614: [CARBONDATA-714]Documented how to ha...

Github user chenliang613 commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/614#discussion_r104550787

--- Diff: docs/faq.md ---
@@ -18,30 +18,57 @@
-->

# FAQs
-* **Auto Compaction not Working**

- The Property carbon.enable.auto.load.merge in carbon.properties need to be set to true.
+* [What are Bad Records?](#what-are-bad-records)
+* [Where are Bad Records Stored in CarbonData?](#where-are-bad-records-stored-in-carbondata)
+* [How to handle Bad Records?](#how-to-handle-bad-records)
+* [How to resolve store location can't be found?](#how-to-resolve-store-location-can-not-be-found)
+* [What is Carbon Lock Type?](#what-is-carbon-lock-type)
+* [How to resolve Abstract Method Error?](#how-to-resolve-abstract-method-error)

-* **Getting Abstract method error**
+## What are Bad Records?
+Records that fail to get loaded into the CarbonData due to data type incompatibility or are empty or have incompatible format are classified as Bad Records.

- You need to specify the spark version while using Maven to build project.
+## Where are Bad Records Stored in CarbonData?
+The bad records are stored at the location set in carbon.badRecords.location in carbon.properties file.
+By default **carbon.badRecords.location** specifies the following location ``/opt/Carbon/Spark/badrecords``.

-* **Getting NotImplementedException for subquery using IN and EXISTS**
+## How to handle Bad Records?
--- End diff --

This is "how to enable bad record logging".

[GitHub] incubator-carbondata pull request #614: [CARBONDATA-714]Documented how to ha...

Github user chenliang613 commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/614#discussion_r104550978

--- Diff: docs/faq.md ---
@@ -18,30 +18,57 @@
-->

# FAQs
-* **Auto Compaction not Working**

- The Property carbon.enable.auto.load.merge in carbon.properties need to be set to true.
+* [What are Bad Records?](#what-are-bad-records)
+* [Where are Bad Records Stored in CarbonData?](#where-are-bad-records-stored-in-carbondata)
+* [How to handle Bad Records?](#how-to-handle-bad-records)
+* [How to resolve store location can't be found?](#how-to-resolve-store-location-can-not-be-found)
+* [What is Carbon Lock Type?](#what-is-carbon-lock-type)
+* [How to resolve Abstract Method Error?](#how-to-resolve-abstract-method-error)

-* **Getting Abstract method error**
+## What are Bad Records?
+Records that fail to get loaded into the CarbonData due to data type incompatibility or are empty or have incompatible format are classified as Bad Records.

- You need to specify the spark version while using Maven to build project.
+## Where are Bad Records Stored in CarbonData?
+The bad records are stored at the location set in carbon.badRecords.location in carbon.properties file.
+By default **carbon.badRecords.location** specifies the following location ``/opt/Carbon/Spark/badrecords``.

-* **Getting NotImplementedException for subquery using IN and EXISTS**
+## How to handle Bad Records?
+While loading data we can specify the approach to handle Bad Records. In order to analyse the cause of the Bad Records the parameter ``BAD_RECORDS_LOGGER_ENABLE`` must be set to value ``TRUE``. There are three approaches to handle Bad Records which can be specified by the parameter ``BAD_RECORDS_ACTION``.

- Subquery with in and exists not supported in CarbonData.
-
-* **Getting Exceptions on creating a view**
-
- View not supported in CarbonData.
-
-* **How to verify if ColumnGroups have been created as desired.**
+- To pad the incorrect values of the csv rows with NULL value and load the data in CarbonData, set the following in the query :
+```
+'BAD_RECORDS_ACTION'='FORCE'
+```
--- End diff --

Please add "How to ignore the bad records".
The detailed discussion is here: http://apache-carbondata-mailing-list-archive.1130556.n5.nabble.com/data-lost-when-loading-data-from-csv-file-to-carbon-table-td7554.html
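
To make the options in the diff concrete, here is a hedged sketch of how they might appear in a full load statement. It assumes the `LOAD DATA INPATH ... INTO TABLE ... OPTIONS(...)` form used by CarbonData; the helper `loadDataSql`, the CSV path, and the table name are all hypothetical. Only `BAD_RECORDS_LOGGER_ENABLE` and `BAD_RECORDS_ACTION` with the value `REDIRECT` come from the diff itself.

```scala
// Hypothetical helper: renders a LOAD DATA statement with an OPTIONS clause
// from an ordered list of option name/value pairs.
def loadDataSql(csvPath: String, table: String, options: Seq[(String, String)]): String = {
  val rendered = options.map { case (k, v) => s"'$k'='$v'" }.mkString(", ")
  s"LOAD DATA INPATH '$csvPath' INTO TABLE $table OPTIONS($rendered)"
}

val sql = loadDataSql(
  "hdfs://localhost:9000/data/sample.csv", // assumed input file
  "sample_table",                          // assumed table name
  Seq(
    "BAD_RECORDS_LOGGER_ENABLE" -> "TRUE",  // needed to analyse the cause of bad records
    "BAD_RECORDS_ACTION" -> "REDIRECT"      // write bad records to the raw csv
  )
)
println(sql)

// With a CarbonSession named `carbon`, the statement would be run as:
//   carbon.sql(sql)
```

The diff uses `INDIRECT` as the value for skipping bad records, while later CarbonData documentation lists `IGNORE` for that case, so the exact value is worth checking against the release in use.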

[GitHub] incubator-carbondata issue #614: [CARBONDATA-714]Documented how to handle ba...

Github user chenliang613 commented on the issue:

https://github.com/apache/incubator-carbondata/pull/614

LGTM

[GitHub] incubator-carbondata pull request #614: [CARBONDATA-714]Documented how to ha...

Github user asfgit closed the pull request at:

https://github.com/apache/incubator-carbondata/pull/614
