GitHub user manishnalla1994 opened a pull request:
https://github.com/apache/carbondata/pull/2980 [CARBONDATA-3017] Map DDL Support Support Create DDL for Map type. This PR is dependent on PR#2979 for the change of delimiters. Be sure to do all of the following checklist to help us incorporate your contribution quickly and easily: - [ ] Any interfaces changed? - [ ] Any backward compatibility impacted? - [ ] Document update required? - [x] Testing done Please provide details on - Whether new unit test cases have been added or why no new tests are required? - How it is tested? Please attach test report. - Is it a performance related change? Please attach the performance test report. - Any additional information to help reviewers in testing this change. - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA. You can merge this pull request into a Git repository by running: $ git pull https://github.com/manishnalla1994/carbondata MapDDL5Dec Alternatively you can review and apply these changes as the patch at: https://github.com/apache/carbondata/pull/2980.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2980 ---- commit 322b52a64c840317a6905664e8de16327e3635e0 Author: Manish Nalla <manishnalla1994@...> Date: 2018-10-16T09:48:08Z MapDDLSupport commit 3d119888a80e7d8f9cab59e477984b56af1309f6 Author: manishnalla1994 <manish.nalla1994@...> Date: 2018-12-07T08:18:31Z Added Testcases and Local Dict Support commit 5fe06801360fc04bab9c1239ea8d007f37bc69d4 Author: manishnalla1994 <manish.nalla1994@...> Date: 2018-12-07T13:28:54Z Test Files for Map commit 4cc8ba13b234a13b9a3cef541e37f492153e7d1b Author: manishnalla1994 <manish.nalla1994@...> Date: 2018-12-07T14:44:12Z Changed TestCases and Supported 2 new delimiters ---- --- |
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2980 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1657/ --- |
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2980 Build Failed with Spark 2.3.2, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9917/ --- |
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2980 Build Failed with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1869/ --- |
Github user qiuchenjian commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2980#discussion_r239991055 --- Diff: processing/src/main/java/org/apache/carbondata/processing/loading/parser/impl/RowParserImpl.java --- @@ -34,8 +37,12 @@ private int numberOfColumns; public RowParserImpl(DataField[] output, CarbonDataLoadConfiguration configuration) { - String[] complexDelimiters = + String[] tempComplexDelimiters = (String[]) configuration.getDataLoadProperty(DataLoadProcessorConstants.COMPLEX_DELIMITERS); + Queue<String> complexDelimiters = new LinkedList<>(); + for (int i = 0; i < 4; i++) { --- End diff -- The "4" in "i < 4" is not clear; it's better to replace it with a named constant --- |
Github user manishgupta88 commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2980#discussion_r240147303 --- Diff: integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/createTable/TestCreateDDLForComplexMapType.scala --- @@ -0,0 +1,452 @@ +/* + + Licensed to the Apache Software Foundation (ASF) under one or more + contributor license agreements. See the NOTICE file distributed with + this work for additional information regarding copyright ownership. + The ASF licenses this file to You under the Apache License, Version 2.0 + (the "License"); you may not use this file except in compliance with + the License. You may obtain a copy of the License at + * + http://www.apache.org/licenses/LICENSE-2.0 + * + Unless required by applicable law or agreed to in writing, software + distributed under the License is distributed on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + See the License for the specific language governing permissions and + limitations under the License. 
+ */ +package org.apache.carbondata.spark.testsuite.createTable.TestCreateDDLForComplexMapType + +import java.io.File +import java.util + +import org.apache.hadoop.conf.Configuration +import org.apache.spark.sql.{AnalysisException, Row} +import org.apache.spark.sql.test.util.QueryTest +import org.scalatest.BeforeAndAfterAll + +import org.apache.carbondata.core.datastore.chunk.impl.DimensionRawColumnChunk + +class TestCreateDDLForComplexMapType extends QueryTest with BeforeAndAfterAll { + private val conf: Configuration = new Configuration(false) + + val rootPath = new File(this.getClass.getResource("/").getPath + + "../../../..").getCanonicalPath + + val path = s"$rootPath/examples/spark2/src/main/resources/maptest2.csv" + + private def checkForLocalDictionary(dimensionRawColumnChunks: util + .List[DimensionRawColumnChunk]): Boolean = { + var isLocalDictionaryGenerated = false + import scala.collection.JavaConversions._ + for (dimensionRawColumnChunk <- dimensionRawColumnChunks) { + if (dimensionRawColumnChunk.getDataChunkV3 + .isSetLocal_dictionary) { + isLocalDictionaryGenerated = true + } --- End diff -- You can directly use scala filter operation to check for local dictionary --- |
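The existence check the reviewer points at can indeed be collapsed into a single filter/exists-style call instead of a mutable flag and a loop. Since the diffs in this thread are mostly Java, here is a hedged Java-streams sketch of the same idea; `DataChunk` and `isSetLocalDictionary` are stand-ins for the real `DimensionRawColumnChunk.getDataChunkV3().isSetLocal_dictionary()` chain, not CarbonData's actual API:

```java
import java.util.List;

// Sketch only: DataChunk stands in for DimensionRawColumnChunk, and
// isSetLocalDictionary() for getDataChunkV3().isSetLocal_dictionary().
public class LocalDictCheckSketch {
  static class DataChunk {
    private final boolean localDictionary;
    DataChunk(boolean localDictionary) { this.localDictionary = localDictionary; }
    boolean isSetLocalDictionary() { return localDictionary; }
  }

  // Equivalent of Scala's chunks.exists(_.isSetLocalDictionary):
  // short-circuits as soon as one chunk has a local dictionary.
  static boolean hasLocalDictionary(List<DataChunk> chunks) {
    return chunks.stream().anyMatch(DataChunk::isSetLocalDictionary);
  }
}
```

In the Scala test file itself this would be `dimensionRawColumnChunks.exists(...)` (or `filter(...).nonEmpty`), which removes both the `var` and the explicit loop.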
Github user manishgupta88 commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2980#discussion_r240151744 --- Diff: processing/src/main/java/org/apache/carbondata/processing/loading/model/LoadOption.java --- @@ -119,6 +119,10 @@ "complex_delimiter_level_2", Maps.getOrDefault(options, "complex_delimiter_level_2", ":")); + optionsFinal.put( + "complex_delimiter_level_3", + Maps.getOrDefault(options, "complex_delimiter_level_3", "\\\\003")); + --- End diff -- Remove hardcoding --- |
Github user manishgupta88 commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2980#discussion_r240150644 --- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/management/CarbonLoadDataCommand.scala --- @@ -188,11 +188,13 @@ case class CarbonLoadDataCommand( val carbonLoadModel = new CarbonLoadModel() val tableProperties = table.getTableInfo.getFactTable.getTableProperties val optionsFinal = LoadOption.fillOptionWithDefaultValue(options.asJava) + // These two delimiters are non configurable and hardcoded for map type + optionsFinal.put("complex_delimiter_level_3", "\003") + optionsFinal.put("complex_delimiter_level_4", "\004") --- End diff -- Remove hardcoded values. Create one `ComplexDelimiterEnum` and use the values from there --- |
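The suggested `ComplexDelimiterEnum` might look like the following sketch. The enum name comes from the comment above; the constant names are assumptions, and the values ("$", ":", "\u0003", "\u0004") are taken from the delimiter defaults visible in the diffs elsewhere in this thread:

```java
// Hypothetical sketch of the suggested enum; constant names and values
// are assumptions based on the delimiter defaults seen in this PR.
public enum ComplexDelimiterEnum {
  COMPLEX_DELIMITER_LEVEL_1("$"),
  COMPLEX_DELIMITER_LEVEL_2(":"),
  COMPLEX_DELIMITER_LEVEL_3("\u0003"),
  COMPLEX_DELIMITER_LEVEL_4("\u0004");

  private final String value;

  ComplexDelimiterEnum(String value) {
    this.value = value;
  }

  public String value() {
    return value;
  }
}
```

Call sites such as `CarbonLoadDataCommand` could then write `optionsFinal.put("complex_delimiter_level_3", ComplexDelimiterEnum.COMPLEX_DELIMITER_LEVEL_3.value())` instead of repeating the literal.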
Github user manishgupta88 commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2980#discussion_r240152242 --- Diff: processing/src/main/java/org/apache/carbondata/processing/loading/parser/CarbonParserFactory.java --- @@ -51,23 +54,37 @@ public static GenericParser createParser(CarbonColumn carbonColumn, String[] com * delimiters * @return GenericParser */ - private static GenericParser createParser(CarbonColumn carbonColumn, String[] complexDelimiters, + private static GenericParser createParser(CarbonColumn carbonColumn, + Queue<String> complexDelimiters, String nullFormat, int depth) { + if (depth > 2) { + return null; --- End diff -- I think we should throw an exception with a proper error message if depth is more than 2 --- |
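The point of the comment is to fail fast with a descriptive message rather than return `null` and surface a `NullPointerException` later. A minimal sketch of that validation, with a made-up method name and a generic exception type standing in for whatever loading exception CarbonData actually uses:

```java
// Sketch only: validateDepth and the exception choice are assumptions,
// not CarbonData's real API.
public class ComplexTypeDepthCheck {
  static final int MAX_COMPLEX_NESTING_DEPTH = 2;

  // Called before recursing into a child parser; rejects over-deep nesting
  // with an actionable message instead of returning null.
  static void validateDepth(int depth, String columnName) {
    if (depth > MAX_COMPLEX_NESTING_DEPTH) {
      throw new IllegalArgumentException(
          "Complex type nesting deeper than " + MAX_COMPLEX_NESTING_DEPTH
              + " levels is not supported for column: " + columnName);
    }
  }
}
```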
Github user manishgupta88 commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2980#discussion_r240151175 --- Diff: processing/src/main/java/org/apache/carbondata/processing/loading/DataLoadProcessBuilder.java --- @@ -222,8 +222,8 @@ public static CarbonDataLoadConfiguration createConfiguration(CarbonLoadModel lo configuration.setSegmentId(loadModel.getSegmentId()); configuration.setTaskNo(loadModel.getTaskNo()); configuration.setDataLoadProperty(DataLoadProcessorConstants.COMPLEX_DELIMITERS, - new String[] { loadModel.getComplexDelimiterLevel1(), - loadModel.getComplexDelimiterLevel2() }); + new String[] { loadModel.getComplexDelimiterLevel1(), loadModel.getComplexDelimiterLevel2(), + loadModel.getComplexDelimiterLevel3(), loadModel.getComplexDelimiterLevel4() }); --- End diff -- Instead of adding so many delimiter methods in the load model, create a list or array of complex delimiters in the load model and then use it here --- |
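The shape the reviewer asks for, one growable collection instead of four `getComplexDelimiterLevelN` accessors, could look roughly like this; the class and method names are illustrative, not the real `CarbonLoadModel` API:

```java
import java.util.ArrayList;
import java.util.List;

// Hedged sketch of keeping complex delimiters in one list on the load model
// instead of four separate level-N fields. Names are assumptions.
public class LoadModelDelimitersSketch {
  private final List<String> complexDelimiters = new ArrayList<>();

  public void addComplexDelimiter(String delimiter) {
    complexDelimiters.add(delimiter);
  }

  // Callers such as DataLoadProcessBuilder can consume the whole array
  // without knowing how many delimiter levels were configured.
  public String[] getComplexDelimiters() {
    return complexDelimiters.toArray(new String[0]);
  }
}
```

`DataLoadProcessBuilder` would then pass `loadModel.getComplexDelimiters()` straight through, and adding a fifth level later would not change any signatures.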
Github user manishgupta88 commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2980#discussion_r240145684 --- Diff: hadoop/src/main/java/org/apache/carbondata/hadoop/api/CarbonTableOutputFormat.java --- @@ -338,12 +338,15 @@ public static CarbonLoadModel getLoadModel(Configuration conf) throws IOExceptio SKIP_EMPTY_LINE, carbonProperty.getProperty(CarbonLoadOptionConstants.CARBON_OPTIONS_SKIP_EMPTY_LINE))); - String complexDelim = conf.get(COMPLEX_DELIMITERS, "$" + "," + ":"); + String complexDelim = conf.get(COMPLEX_DELIMITERS, "$" + "," + ":" + "," + "\\\\003"); String[] split = complexDelim.split(","); model.setComplexDelimiterLevel1(split[0]); if (split.length > 1) { model.setComplexDelimiterLevel2(split[1]); } + if (split.length > 2) { + model.setComplexDelimiterLevel3(split[2]); + } --- End diff -- Modify the above code as below `if (split.length > 2) { model.setComplexDelimiterLevel2(split[1]); model.setComplexDelimiterLevel3(split[2]); } else if (split.length > 1) { model.setComplexDelimiterLevel2(split[1]); }` --- |
Github user manishgupta88 commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2980#discussion_r240126484 --- Diff: examples/spark2/src/main/resources/maptest2.csv --- @@ -0,0 +1,2 @@ +1\002Nalla\0012\002Singh\0011\002Gupta\0014\002Kumar +10\002Nallaa\00120\002Sissngh\001100\002Gusspta\00140\002Kumar --- End diff -- Move the above newly added files to spark-common-test --- |
Github user manishgupta88 commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2980#discussion_r240153125 --- Diff: processing/src/main/java/org/apache/carbondata/processing/loading/parser/impl/RowParserImpl.java --- @@ -34,8 +37,12 @@ private int numberOfColumns; public RowParserImpl(DataField[] output, CarbonDataLoadConfiguration configuration) { - String[] complexDelimiters = + String[] tempComplexDelimiters = (String[]) configuration.getDataLoadProperty(DataLoadProcessorConstants.COMPLEX_DELIMITERS); + Queue<String> complexDelimiters = new LinkedList<>(); + for (int i = 0; i < 4; i++) { --- End diff -- Avoid using hardcoded values like `4` in the loops while coding....instead iterate over tempComplexDelimiters length --- |
Github user manishgupta88 commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2980#discussion_r240126217 --- Diff: hadoop/src/main/java/org/apache/carbondata/hadoop/api/CarbonTableOutputFormat.java --- @@ -338,12 +338,15 @@ public static CarbonLoadModel getLoadModel(Configuration conf) throws IOExceptio SKIP_EMPTY_LINE, carbonProperty.getProperty(CarbonLoadOptionConstants.CARBON_OPTIONS_SKIP_EMPTY_LINE))); - String complexDelim = conf.get(COMPLEX_DELIMITERS, "$" + "," + ":"); + String complexDelim = conf.get(COMPLEX_DELIMITERS, "$" + "," + ":" + "," + "\\\\003"); --- End diff -- Move all the default delimiters to constants and use them everywhere --- |
Github user manishgupta88 commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2980#discussion_r240151368 --- Diff: processing/src/main/java/org/apache/carbondata/processing/loading/model/CarbonLoadModel.java --- @@ -631,7 +651,7 @@ public void setFactTimeStamp(long factTimeStamp) { } public String[] getDelimiters() { - return new String[] { complexDelimiterLevel1, complexDelimiterLevel2 }; + return new String[] { complexDelimiterLevel1, complexDelimiterLevel2, complexDelimiterLevel3 }; --- End diff -- Modify this class and related changes in other classes as per the above comment --- |
Github user manishnalla1994 commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2980#discussion_r240591596 --- Diff: processing/src/main/java/org/apache/carbondata/processing/loading/parser/impl/RowParserImpl.java --- @@ -34,8 +37,12 @@ private int numberOfColumns; public RowParserImpl(DataField[] output, CarbonDataLoadConfiguration configuration) { - String[] complexDelimiters = + String[] tempComplexDelimiters = (String[]) configuration.getDataLoadProperty(DataLoadProcessorConstants.COMPLEX_DELIMITERS); + Queue<String> complexDelimiters = new LinkedList<>(); + for (int i = 0; i < 4; i++) { --- End diff -- Done. --- |
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2980 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1705/ --- |
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2980 Build Failed with Spark 2.3.2, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9965/ --- |
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2980 Build Failed with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1915/ --- |
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2980 Build Failed with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1916/ --- |