[GitHub] carbondata pull request #2882: [CARBONDATA-3060]Improve the command for cli ...

qiuchenjian-2

[GitHub] carbondata issue #2882: [CARBONDATA-3060]Improve the command for cli and fix...

Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2882

Build Success with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9508/

---

qiuchenjian-2

[GitHub] carbondata pull request #2882: [CARBONDATA-3060]Improve the command for cli ...

In reply to this post by qiuchenjian-2

Github user akashrn5 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2882#discussion_r230340585

--- Diff: tools/cli/src/main/java/org/apache/carbondata/tool/DataFile.java ---
@@ -453,7 +455,16 @@ private double computePercentage(byte[] data, byte[] min, byte[] max, ColumnSche
return dataValue.divide(factorValue).doubleValue();
}
double dataValue, minValue, factorValue;
- if (column.getDataType() == DataTypes.SHORT) {
+ if (columnChunk.column.isDimensionColumn() && DataTypeUtil
--- End diff --

done

---

qiuchenjian-2

[GitHub] carbondata pull request #2882: [CARBONDATA-3060]Improve the command for cli ...

In reply to this post by qiuchenjian-2

Github user jackylk commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2882#discussion_r230615332

--- Diff: tools/cli/src/main/java/org/apache/carbondata/tool/DataSummary.java ---
@@ -311,26 +309,39 @@ private void printColumnStats(String columnName) throws IOException, MemoryExcep
max = new String(blockletMax, Charset.forName(DEFAULT_CHARSET));
}
} else {
- minPercent = String.format("%.1f", blocklet.getColumnChunk().getMinPercentage() * 100);
- maxPercent = String.format("%.1f", blocklet.getColumnChunk().getMaxPercentage() * 100);
+ // for complex columns min and max and percentage
+ if (blocklet.getColumnChunk().column.getColumnName().contains(".val") ||
+ blocklet.getColumnChunk().column.getColumnName().contains(".")) {
+ minPercent = "NA";
+ maxPercent = "NA";
+ } else {
+ minPercent =
+ String.format("%.1f", Math.abs(blocklet.getColumnChunk().getMinPercentage() * 100));
+ maxPercent =
+ String.format("%.1f", Math.abs(blocklet.getColumnChunk().getMaxPercentage() * 100));
+ }
DataFile.ColumnChunk columnChunk = blocklet.columnChunk;
- if (columnChunk.column.isDimensionColumn() && DataTypeUtil
+ // need to consider no dictionary complex column
+ if (columnChunk.column.hasEncoding(Encoding.DICTIONARY) || blocklet
+ .getColumnChunk().column.getColumnName().contains(".val") || blocklet
--- End diff --

Can you add a function in ColumnSchema that returns whether it is a complex column encoded in global dictionary, instead of hard-coding it here?
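
A minimal sketch of what such a helper could look like, pulled out of the call site. The class `ComplexColumnNames` and the method `isComplexChildByName` are hypothetical names and are not part of the actual ColumnSchema API. The sketch only assumes what the diff implies: that child columns of complex types carry a `.` in their name. Since any name containing `.val` also contains `.`, a single check covers both conditions in the diff.

```java
// Hypothetical stand-in for a name-based helper; NOT the actual ColumnSchema API.
final class ComplexColumnNames {
  private ComplexColumnNames() { }

  // Assumption (inferred from the diff): child columns of complex types carry a
  // '.' in their name, for example "arr.val" or "person.age".
  static boolean isComplexChildByName(String columnName) {
    return columnName != null && columnName.contains(".");
  }
}
```

With a helper like this, the call sites in DataSummary would read as a single named predicate rather than two repeated string checks.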

---

qiuchenjian-2

[GitHub] carbondata pull request #2882: [CARBONDATA-3060]Improve the command for cli ...

In reply to this post by qiuchenjian-2

Github user jackylk commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2882#discussion_r230615363

--- Diff: examples/spark2/src/main/java/org/apache/carbondata/examples/sdk/CarbonReaderExample.java ---
@@ -61,7 +61,8 @@ public static void main(String[] args) {
CarbonWriter writer = CarbonWriter.builder()
.outputPath(path)
.withLoadOptions(map)
- .withCsvInput(new Schema(fields)).build();
+ .withCsvInput(new Schema(fields))
+ .writtenBy("CarbonReaderExample").build();
--- End diff --

`.build()` should also be moved to the next line. Please follow the coding style.
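
The requested layout, one chained call per line with the terminal call on its own line, can be sketched as follows. `DemoWriterBuilder` is a hypothetical stand-in used only to keep the example self-contained; it is not CarbonWriter, and the path value is made up.

```java
// Hypothetical stand-in builder, used only to illustrate the chaining style.
final class DemoWriterBuilder {
  private String path;
  private String appName;

  DemoWriterBuilder outputPath(String path) {
    this.path = path;
    return this;
  }

  DemoWriterBuilder writtenBy(String appName) {
    this.appName = appName;
    return this;
  }

  String build() {
    return path + " written by " + appName;
  }
}

final class BuilderStyleDemo {
  private BuilderStyleDemo() { }

  static String describe() {
    // One chained call per line, including the terminal build() call.
    return new DemoWriterBuilder()
        .outputPath("./demoOutput")
        .writtenBy("CarbonReaderExample")
        .build();
  }
}
```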

---

qiuchenjian-2

[GitHub] carbondata pull request #2882: [CARBONDATA-3060]Improve the command for cli ...

In reply to this post by qiuchenjian-2

Github user akashrn5 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2882#discussion_r230637383

--- Diff: examples/spark2/src/main/java/org/apache/carbondata/examples/sdk/CarbonReaderExample.java ---
@@ -61,7 +61,8 @@ public static void main(String[] args) {
CarbonWriter writer = CarbonWriter.builder()
.outputPath(path)
.withLoadOptions(map)
- .withCsvInput(new Schema(fields)).build();
+ .withCsvInput(new Schema(fields))
+ .writtenBy("CarbonReaderExample").build();
--- End diff --

Done. It was actually like that initially, and I'm using only the Carbon formatter, but I don't know why it formats the code this way. I need to check the formatter XML.

---

qiuchenjian-2

[GitHub] carbondata pull request #2882: [CARBONDATA-3060]Improve the command for cli ...

In reply to this post by qiuchenjian-2

Github user akashrn5 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2882#discussion_r230637603

--- Diff: tools/cli/src/main/java/org/apache/carbondata/tool/DataSummary.java ---
@@ -311,26 +309,39 @@ private void printColumnStats(String columnName) throws IOException, MemoryExcep
max = new String(blockletMax, Charset.forName(DEFAULT_CHARSET));
}
} else {
- minPercent = String.format("%.1f", blocklet.getColumnChunk().getMinPercentage() * 100);
- maxPercent = String.format("%.1f", blocklet.getColumnChunk().getMaxPercentage() * 100);
+ // for complex columns min and max and percentage
+ if (blocklet.getColumnChunk().column.getColumnName().contains(".val") ||
+ blocklet.getColumnChunk().column.getColumnName().contains(".")) {
+ minPercent = "NA";
+ maxPercent = "NA";
+ } else {
+ minPercent =
+ String.format("%.1f", Math.abs(blocklet.getColumnChunk().getMinPercentage() * 100));
+ maxPercent =
+ String.format("%.1f", Math.abs(blocklet.getColumnChunk().getMaxPercentage() * 100));
+ }
DataFile.ColumnChunk columnChunk = blocklet.columnChunk;
- if (columnChunk.column.isDimensionColumn() && DataTypeUtil
+ // need to consider no dictionary complex column
+ if (columnChunk.column.hasEncoding(Encoding.DICTIONARY) || blocklet
+ .getColumnChunk().column.getColumnName().contains(".val") || blocklet
--- End diff --

We can have a method that tells whether a column is a complex column based on its name; we already have a method that identifies a complex column by data type. There is no need to check dictionary include and complex type together, because child columns would again need another method, since we can't specify child columns in dictionary include.
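
For reference, the min/max percentage handling in the diff above can be sketched in isolation as follows. `PercentCell` and its `format` method are hypothetical names. The name check assumes, as the diff does, that complex child columns contain a `.` in their name. `Locale.ROOT` is an addition here to keep the decimal separator stable; the diff itself uses the default locale.

```java
import java.util.Locale;

// Hypothetical helper mirroring the diff's logic; not part of the CLI tool.
final class PercentCell {
  private PercentCell() { }

  // Returns "NA" for complex child columns (detected by name), otherwise the
  // absolute value of the fraction as a percentage with one decimal place.
  static String format(String columnName, double fraction) {
    if (columnName.contains(".")) {
      return "NA";
    }
    return String.format(Locale.ROOT, "%.1f", Math.abs(fraction * 100));
  }
}
```

Here `PercentCell.format("arr.val", 0.5)` yields `NA`, while `PercentCell.format("age", -0.25)` yields `25.0`.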

---

qiuchenjian-2

[GitHub] carbondata pull request #2882: [CARBONDATA-3060]Improve the command for cli ...

In reply to this post by qiuchenjian-2

Github user akashrn5 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2882#discussion_r230637639

--- Diff: tools/cli/src/main/java/org/apache/carbondata/tool/DataSummary.java ---
@@ -311,26 +309,39 @@ private void printColumnStats(String columnName) throws IOException, MemoryExcep
max = new String(blockletMax, Charset.forName(DEFAULT_CHARSET));
}
} else {
- minPercent = String.format("%.1f", blocklet.getColumnChunk().getMinPercentage() * 100);
- maxPercent = String.format("%.1f", blocklet.getColumnChunk().getMaxPercentage() * 100);
+ // for complex columns min and max and percentage
+ if (blocklet.getColumnChunk().column.getColumnName().contains(".val") ||
+ blocklet.getColumnChunk().column.getColumnName().contains(".")) {
+ minPercent = "NA";
+ maxPercent = "NA";
+ } else {
+ minPercent =
+ String.format("%.1f", Math.abs(blocklet.getColumnChunk().getMinPercentage() * 100));
+ maxPercent =
+ String.format("%.1f", Math.abs(blocklet.getColumnChunk().getMaxPercentage() * 100));
+ }
DataFile.ColumnChunk columnChunk = blocklet.columnChunk;
- if (columnChunk.column.isDimensionColumn() && DataTypeUtil
+ // need to consider no dictionary complex column
+ if (columnChunk.column.hasEncoding(Encoding.DICTIONARY) || blocklet
+ .getColumnChunk().column.getColumnName().contains(".val") || blocklet
--- End diff --

handled, please review

---

qiuchenjian-2

[GitHub] carbondata issue #2882: [CARBONDATA-3060]Improve the command for cli and fix...

In reply to this post by qiuchenjian-2

Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2882

Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1261/

---

qiuchenjian-2

[GitHub] carbondata issue #2882: [CARBONDATA-3060]Improve the command for cli and fix...

In reply to this post by qiuchenjian-2

Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2882

Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1476/

---

qiuchenjian-2

[GitHub] carbondata issue #2882: [CARBONDATA-3060]Improve the command for cli and fix...

In reply to this post by qiuchenjian-2

Github user jackylk commented on the issue:

https://github.com/apache/carbondata/pull/2882

LGTM

---

qiuchenjian-2

[GitHub] carbondata pull request #2882: [CARBONDATA-3060]Improve the command for cli ...

In reply to this post by qiuchenjian-2

Github user asfgit closed the pull request at:

https://github.com/apache/carbondata/pull/2882

---