Apache CarbonData Dev Mailing List archive › Apache CarbonData JIRA issues

[GitHub] carbondata pull request #1097: [WIP] [CARBONDATA-1229] Skip dictionary and d...

Github user manishgupta88 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/1097#discussion_r124212817

--- Diff: integration/spark-common/src/main/scala/org/apache/carbondata/spark/rdd/CarbonGlobalDictionaryRDD.scala ---
@@ -442,7 +456,15 @@ class CarbonGlobalDictionaryGenerateRDD(
model.hdfsLocation,
dictionaryForDistinctValueLookUp,
distinctValues)
- sortIndexWriteTask.execute()
+ breakable {
+ if (!FileUtils
+ .validateTableExists(model.dictfolderPath + CarbonCommonConstants.FILE_SEPARATOR +
+ model.table.getTableId)) {
+ LOGGER.error(s"Table does not exists: ${ model.table.getTableUniqueName }")
+ break()
+ }
+ sortIndexWriteTask.execute()
+ }
--- End diff --

Remove the file-existence check from the executor code. Each check is an RPC call to the namenode, and making that call several times for every column is costly. Instead, add the check in the driver after global dictionary generation, and delete the dictionary files that were created if the table has been deleted in the meantime.
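To make the suggestion concrete, here is a minimal sketch of a driver-side check that runs once after all dictionary tasks have finished. Everything in it is an assumption for illustration: the class `DriverSideDictionaryCheck`, the method `filesToDelete`, and the `pathExists` function (which stands in for one file-system lookup) are hypothetical and are not CarbonData APIs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical sketch; none of these names come from CarbonData.
final class DriverSideDictionaryCheck {
  private DriverSideDictionaryCheck() {}

  /**
   * Intended to run once on the driver after every dictionary task has finished.
   * pathExists stands in for a single file-system lookup (one namenode RPC).
   * Returns the dictionary files that should be deleted: an empty list if the
   * table marker is still present, all generated files if the table was dropped.
   */
  static List<String> filesToDelete(String tableMarkerPath,
                                    List<String> generatedDictionaryFiles,
                                    Predicate<String> pathExists) {
    if (pathExists.test(tableMarkerPath)) {
      return new ArrayList<>(); // table still exists: keep the dictionaries
    }
    return new ArrayList<>(generatedDictionaryFiles); // table dropped: clean up
  }
}
```

With this shape the existence lookup happens once per load on the driver, instead of once or more per column on each executor.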

---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [hidden email] or file a JIRA ticket
with INFRA.
---

Github user manishgupta88 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/1097#discussion_r124213032

--- Diff: integration/spark-common/src/main/scala/org/apache/carbondata/spark/util/GlobalDictionaryUtil.scala ---
@@ -867,6 +868,11 @@ object GlobalDictionaryUtil {
columnSchema,
false
)
+ if (!FileUtils
+ .validateTableExists(carbonTablePath.getMetadataDirectoryPath + CarbonCommonConstants.FILE_SEPARATOR +
+ tableIdentifier.getTableId)) {
+ throw new Exception(s"Table does not exists: ${ tableIdentifier.getTableUniqueName }")
+ }
--- End diff --

Remove this check from the executor as well, and handle it in the driver as described in the comment above.

Github user manishgupta88 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/1097#discussion_r124213243

--- Diff: integration/spark/src/main/scala/org/apache/carbondata/spark/rdd/CarbonDataRDDFactory.scala ---
@@ -808,10 +808,10 @@ object CarbonDataRDDFactory {
}
else {
val newStatusMap = scala.collection.mutable.Map.empty[String, String]
- if (status.nonEmpty) {
- status.foreach { eachLoadStatus =>
- val state = newStatusMap.get(eachLoadStatus._1)
- state match {
+ if (status.nonEmpty) {
+ status.foreach { eachLoadStatus =>
+ val state = newStatusMap.get(eachLoadStatus._1)
+ state match {
--- End diff --

This is only an indentation change and is not part of this PR. Please revert it to the original formatting.

Github user manishgupta88 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/1097#discussion_r124213355

--- Diff: integration/spark/src/main/scala/org/apache/carbondata/spark/rdd/CarbonDataRDDFactory.scala ---
@@ -1099,15 +1099,18 @@ object CarbonDataRDDFactory {
result: Option[DictionaryServer], writeAll: Boolean) = {
// write dictionary file and shutdown dictionary server
val uniqueTableName: String = s"${ carbonLoadModel.getDatabaseName }_${
- carbonLoadModel.getTableName }"
+ carbonLoadModel.getTableName}"
+ val tableId = carbonLoadModel.getCarbonDataLoadSchema.getCarbonTable.getCarbonTableIdentifier
+ .getTableId
result match {
case Some(server) =>
try {
if (writeAll) {
- server.writeDictionary()
+ server.writeDictionary(carbonLoadModel.getCarbonDataLoadSchema.getCarbonTable)
}
else {
- server.writeTableDictionary(uniqueTableName)
+// server.writeTableDictionary(carbonLoadModel.getCarbonDataLoadSchema.getCarbonTable)
--- End diff --

Delete the commented-out line.

Github user manishgupta88 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/1097#discussion_r124214779

--- Diff: integration/spark2/src/main/scala/org/apache/carbondata/spark/rdd/CarbonDataRDDFactory.scala ---
@@ -640,7 +640,16 @@ object CarbonDataRDDFactory {
val nodes = DistributionUtil.ensureExecutorsByNumberAndGetNodeList(nodeNumOfData,
sqlContext.sparkContext)
val newRdd = new DataLoadCoalescedRDD[Row](rdd, nodes.toArray.distinct)
-
+ breakable{
+ if (!FileUtils
+ .validateTableExists(carbonTable.getAbsoluteTableIdentifier.getStorePath +
+ CarbonCommonConstants.FILE_SEPARATOR +
+ "MetaData" + CarbonCommonConstants.FILE_SEPARATOR +
+ carbonTable.getCarbonTableIdentifier.getTableId)) {
+ LOGGER.error(s"Table does not exists: ${carbonTable.getTableUniqueName}")
+ break()
+ }
+ }
--- End diff --

This check should be done only at the final commit point of an operation. For data load, for example, that means after global dictionary generation and just before writing the table status file.
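As a rough illustration of that ordering, the sketch below runs the load steps first and performs a single existence check immediately before the commit. It is an assumption-laden outline, not the project's load flow: `LoadCommitSketch`, `Outcome`, and the four callbacks are hypothetical names introduced only for this example.

```java
import java.util.function.BooleanSupplier;

// Hypothetical sketch of "check only at the commit point"; not CarbonData code.
final class LoadCommitSketch {
  private LoadCommitSketch() {}

  enum Outcome { COMMITTED, ABORTED_TABLE_DROPPED }

  static Outcome run(Runnable generateGlobalDictionary,
                     Runnable loadData,
                     BooleanSupplier tableStillExists,
                     Runnable writeTableStatus) {
    generateGlobalDictionary.run();
    loadData.run();
    // Single existence check, placed just before the commit step.
    if (!tableStillExists.getAsBoolean()) {
      return Outcome.ABORTED_TABLE_DROPPED; // skip the commit; caller cleans up
    }
    writeTableStatus.run(); // commit point: the load becomes visible here
    return Outcome.COMMITTED;
  }
}
```

The earlier steps do no existence checks at all, so a table dropped mid-load costs some wasted work but never results in a committed status file.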

Github user manishgupta88 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/1097#discussion_r124215370

--- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/carbonTableSchema.scala ---
@@ -358,7 +358,17 @@ case class LoadTable(
LOGGER.audit(s"Data loading failed. table not found: $dbName.$tableName")
sys.error(s"Data loading failed. table not found: $dbName.$tableName")
}
-
+ LOGGER.info("----------"+relation.metaData.carbonTable.getAbsoluteTableIdentifier.getTablePath +
+ CarbonCommonConstants.FILE_SEPARATOR + "/Metadata/" +
+ relation.metaData.carbonTable.getCarbonTableIdentifier.getTableId)
+ FileFactory.createNewFile(
+ relation.metaData.carbonTable.getAbsoluteTableIdentifier.getTablePath +
+ CarbonCommonConstants.FILE_SEPARATOR + "/Metadata/" +
--- End diff --

Get "/Metadata/" from a constants class instead of hardcoding the string here.
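One possible shape for this, sketched with made-up names: `TablePathSketch`, `METADATA_DIR`, and `tableIdFilePath` are hypothetical, and the separator value "/" is an assumption. Building the path in one place also avoids the spelling drift visible in the hunks of this thread, which use both "MetaData" and "/Metadata/".

```java
// Hypothetical constants holder; the names and values are assumptions for illustration.
final class TablePathSketch {
  private TablePathSketch() {}

  static final String FILE_SEPARATOR = "/";
  static final String METADATA_DIR = "Metadata";

  /** Builds <tablePath>/Metadata/<tableId> with exactly one separator between parts. */
  static String tableIdFilePath(String tablePath, String tableId) {
    return tablePath + FILE_SEPARATOR + METADATA_DIR + FILE_SEPARATOR + tableId;
  }
}
```

Callers then write `TablePathSketch.tableIdFilePath(tablePath, tableId)` rather than concatenating the literal at each call site.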

Github user manishgupta88 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/1097#discussion_r124216604

--- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/carbonTableSchema.scala ---
@@ -358,7 +358,17 @@ case class LoadTable(
LOGGER.audit(s"Data loading failed. table not found: $dbName.$tableName")
sys.error(s"Data loading failed. table not found: $dbName.$tableName")
}
-
+ LOGGER.info("----------"+relation.metaData.carbonTable.getAbsoluteTableIdentifier.getTablePath +
+ CarbonCommonConstants.FILE_SEPARATOR + "/Metadata/" +
+ relation.metaData.carbonTable.getCarbonTableIdentifier.getTableId)
+ FileFactory.createNewFile(
--- End diff --

Move this code, which creates the table id file, to table creation time.

Github user manishgupta88 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/1097#discussion_r124216697

--- Diff: processing/src/main/java/org/apache/carbondata/processing/datatypes/PrimitiveDataType.java ---
@@ -141,11 +147,12 @@ public PrimitiveDataType(String name, String parentname, String columnId,
dictionaryMessage.setColumnName(carbonDimension.getColName());
dictionaryMessage.setTableUniqueName(carbonTableIdentifier.getTableUniqueName());
// for table initialization
- dictionaryMessage.setType(DictionaryMessageType.TABLE_INTIALIZATION);
+// dictionaryMessage.setType(DictionaryMessageType.COLUMN_INITIALIZATION);
+ dictionaryMessage.setTableUniqueId(carbonTableIdentifier.getTableId());
dictionaryMessage.setData("0");
- if (tableInitialize) {
- client.getDictionary(dictionaryMessage);
- }
+// if (tableInitialize) {
+// client.getDictionary(dictionaryMessage);
+// }
--- End diff --

Delete the commented-out lines.

Github user manishgupta88 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/1097#discussion_r124217513

--- Diff: processing/src/main/java/org/apache/carbondata/processing/newflow/AbstractDataLoadProcessorStep.java ---
@@ -95,6 +96,18 @@ public AbstractDataLoadProcessorStep(CarbonDataLoadConfiguration configuration,
* @throws CarbonDataLoadingException
*/
public Iterator<CarbonRowBatch>[] execute() throws CarbonDataLoadingException {
+ try {
+ if (!FileFactory.isFileExist(
+ configuration.getTableIdentifier().getTablePath() + "/Metadata/" + configuration.getTableIdentifier().getCarbonTableIdentifier().getTableId(), FileFactory.getFileType(
+ configuration.getTableIdentifier().getTablePath() + "/Metadata/" + configuration.getTableIdentifier().getCarbonTableIdentifier().getTableId()))) {
+ LOGGER.error("Table does not exists: " + configuration.getTableIdentifier().getCarbonTableIdentifier().getTableUniqueName());
+ throw new CarbonDataLoadingException("Table does not exists: " + configuration.getTableIdentifier().getCarbonTableIdentifier().getTableUniqueName());
+ }
+ } catch (CarbonDataLoadingException e) {
+ throw e;
+ } catch (IOException e) {
+
+ }
--- End diff --

This code is not required in the executor. Handle the check in the driver.

Github user manishgupta88 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/1097#discussion_r124217603

--- Diff: processing/src/main/java/org/apache/carbondata/processing/newflow/converter/impl/DictionaryFieldConverterImpl.java ---
@@ -80,11 +86,12 @@ public DictionaryFieldConverterImpl(DataField dataField,
dictionaryMessage.setColumnName(dataField.getColumn().getColName());
dictionaryMessage.setTableUniqueName(carbonTableIdentifier.getTableUniqueName());
// for table initialization
- dictionaryMessage.setType(DictionaryMessageType.TABLE_INTIALIZATION);
+// dictionaryMessage.setType(DictionaryMessageType.COLUMN_INITIALIZATION);
+ dictionaryMessage.setTableUniqueId(carbonTableIdentifier.getTableId());
dictionaryMessage.setData("0");
- if (tableInitialize) {
- client.getDictionary(dictionaryMessage);
- }
+// if (tableInitialize) {
+// client.getDictionary(dictionaryMessage);
+// }
--- End diff --

Remove the commented-out lines.

Github user manishgupta88 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/1097#discussion_r124217843

--- Diff: processing/src/main/java/org/apache/carbondata/processing/newflow/steps/DataWriterProcessorStepImpl.java ---
@@ -83,10 +87,20 @@ public CarbonFactDataHandlerModel getDataHandlerModel(int partitionId) {
return model;
}

- @Override public Iterator<CarbonRowBatch>[] execute() throws CarbonDataLoadingException {
- Iterator<CarbonRowBatch>[] iterators = child.execute();
+ @Override public Iterator<CarbonRowBatch>[] execute() throws CarbonDataLoadingException{
CarbonTableIdentifier tableIdentifier =
configuration.getTableIdentifier().getCarbonTableIdentifier();
+ try {
+ if (!FileFactory.isFileExist(
+ configuration.getTableIdentifier().getTablePath() + "/Metadata/" + tableIdentifier.getTableId(), FileFactory.getFileType(
+ configuration.getTableIdentifier().getTablePath() + "/Metadata/" + tableIdentifier.getTableId()))) {
+ LOGGER.error("Table does not exists: " + tableIdentifier.getTableUniqueName());
+ return null;
+ }
+ } catch (IOException e) {
+
+ }
+ Iterator<CarbonRowBatch>[] iterators = child.execute();
--- End diff --

Remove this check; it is not required in the executor.

[GitHub] carbondata issue #1097: [WIP] [CARBONDATA-1229] Skip dictionary and data wri...

Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/1097

Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/2733/

Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/1097

Build Failed with Spark 1.6, Please check CI http://144.76.159.231:8080/job/ApacheCarbonPRBuilder/158/

Github user asfgit commented on the issue:

https://github.com/apache/carbondata/pull/1097

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/carbondata-pr-spark-1.6/661/

Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/1097

Build Failed with Spark 1.6, Please check CI http://144.76.159.231:8080/job/ApacheCarbonPRBuilder/165/

Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/1097

Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/2741/

Github user asfgit commented on the issue:

https://github.com/apache/carbondata/pull/1097

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/carbondata-pr-spark-1.6/670/

Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/1097

Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/2747/

Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/1097

Build Failed with Spark 1.6, Please check CI http://144.76.159.231:8080/job/ApacheCarbonPRBuilder/169/

Github user asfgit commented on the issue:

https://github.com/apache/carbondata/pull/1097

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/carbondata-pr-spark-1.6/676/
