GitHub user BJangir opened a pull request:
https://github.com/apache/carbondata/pull/2433 [CARBONDATA-2676]Support local Dictionary for SDK Writer Currently local dictionary is supported for managed table which is created using sql . We it should be supported for SDK Writer also. This PR contains a. Interface to supply dictionary threshold & Boolean to enable dictionary. b. DataLoading flow **Yet to support** a. Query Flow of Local dictionary (it will be done when normal table start supports for same) b. External Table loading and Query . Be sure to do all of the following checklist to help us incorporate your contribution quickly and easily: - [ ] Any interfaces changed? Existing Interface is not changed added new method - [ ] Any backward compatibility impacted? NA - [ ] Document update required? NO - [ ] Testing done Please provide details on - Whether new unit test cases have been added or why no new tests are required? Added UT - How it is tested? Please attach test report. UT is added - Is it a performance related change? Please attach the performance test report. NO - Any additional information to help reviewers in testing this change. NO - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA. NO You can merge this pull request into a Git repository by running: $ git pull https://github.com/BJangir/incubator-carbondata sdk_local_dic Alternatively you can review and apply these changes as the patch at: https://github.com/apache/carbondata/pull/2433.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2433 ---- commit 299db85d9d1ddac7425e98075cdd30eac0413773 Author: BJangir <babulaljangir111@...> Date: 2018-06-30T21:31:05Z [CARBONDATA-2676]Support local Dictionary for SDK Writer ---- --- |
Github user ravipesala commented on the issue:
https://github.com/apache/carbondata/pull/2433 SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/5544/ --- |
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2433 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/6691/ --- |
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2433 Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/5518/ --- |
In reply to this post by qiuchenjian-2
Github user ajantha-bhat commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2433#discussion_r199370193 --- Diff: store/sdk/src/main/java/org/apache/carbondata/sdk/file/CarbonWriterBuilder.java --- @@ -284,6 +286,28 @@ public CarbonWriterBuilder withBlockSize(int blockSize) { return this; } + /** + * @param localDictionaryThreshold is localDictionaryThreshold,default is 1000 + * @return + */ + public CarbonWriterBuilder localDictionaryThreshold(int localDictionaryThreshold) { + if (localDictionaryThreshold <= 0) { + throw new IllegalArgumentException("blockSize should be between greater than 0"); --- End diff -- please change the error message accordingly. It is not a block size. --- |
In reply to this post by qiuchenjian-2
Github user ajantha-bhat commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2433#discussion_r199370319 --- Diff: core/src/main/java/org/apache/carbondata/core/metadata/schema/table/TableSchemaBuilder.java --- @@ -108,6 +122,20 @@ public TableSchema build() { schema.setTableProperties(property); } + if (isLocalDictionaryEnabled) { + property.put(CarbonCommonConstants.LOCAL_DICTIONARY_ENABLE, + String.valueOf(isLocalDictionaryEnabled)); + String localdictionaryThreshold = + localDictionaryThreshold.equalsIgnoreCase("0") ? null : localDictionaryThreshold; --- End diff -- Need to set default value instead of null ? --- |
In reply to this post by qiuchenjian-2
Github user ajantha-bhat commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2433#discussion_r199370410 --- Diff: core/src/main/java/org/apache/carbondata/core/metadata/schema/table/TableSchemaBuilder.java --- @@ -108,6 +122,20 @@ public TableSchema build() { schema.setTableProperties(property); } + if (isLocalDictionaryEnabled) { + property.put(CarbonCommonConstants.LOCAL_DICTIONARY_ENABLE, + String.valueOf(isLocalDictionaryEnabled)); + String localdictionaryThreshold = + localDictionaryThreshold.equalsIgnoreCase("0") ? null : localDictionaryThreshold; + property.put(CarbonCommonConstants.LOCAL_DICTIONARY_THRESHOLD, localdictionaryThreshold); + for (int index = 0; index < allColumns.size(); index++) { + ColumnSchema colSchema = allColumns.get(index); + if (colSchema.getDataType() == DataTypes.STRING) { + colSchema.setLocalDictColumn(true); --- End diff -- Can you add a comment that why only string columns we enable this ? Why not required for other columns? --- |
In reply to this post by qiuchenjian-2
Github user ajantha-bhat commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2433#discussion_r199370539 --- Diff: core/src/main/java/org/apache/carbondata/core/metadata/schema/table/TableSchemaBuilder.java --- @@ -76,6 +78,18 @@ public TableSchemaBuilder blockletSize(int blockletSize) { return this; } + public TableSchemaBuilder localDictionaryThreshold(int localDictionaryThreshold) { --- End diff -- please change the method name to "setLocalDictionaryThreshold" --- |
In reply to this post by qiuchenjian-2
Github user ajantha-bhat commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2433#discussion_r199370611 --- Diff: core/src/main/java/org/apache/carbondata/core/metadata/schema/table/TableSchemaBuilder.java --- @@ -76,6 +78,18 @@ public TableSchemaBuilder blockletSize(int blockletSize) { return this; } + public TableSchemaBuilder localDictionaryThreshold(int localDictionaryThreshold) { + this.localDictionaryThreshold = String.valueOf(localDictionaryThreshold); + return this; + } + + + public TableSchemaBuilder isLocalDictionaryEnabled(boolean isLocalDictionaryEnabled) { + this.isLocalDictionaryEnabled = isLocalDictionaryEnabled; --- End diff -- please change the method name to "setLocalDictionaryEnabled" --- |
In reply to this post by qiuchenjian-2
Github user ajantha-bhat commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2433#discussion_r199371244 --- Diff: store/sdk/src/test/java/org/apache/carbondata/sdk/file/CSVNonTransactionalCarbonWriterTest.java --- @@ -284,4 +284,137 @@ public void testSchemaPersistence() throws IOException { FileUtils.deleteDirectory(new File(path)); } + @Test + public void testLocalDictionarywithTrue() throws Exception { + String path = "./testWriteFiles"; + FileUtils.deleteDirectory(new File(path)); + + Field[] fields = new Field[3]; + fields[0] = new Field("name", DataTypes.STRING); + fields[1] = new Field("surname", DataTypes.STRING); + fields[2] = new Field("age", DataTypes.INT); + + CarbonWriterBuilder builder = CarbonWriter.builder().isTransactionalTable(false).sortBy(new String[]{"name"}).withBlockSize(12).isLocalDictionaryEnabled(true) + .uniqueIdentifier(System.currentTimeMillis()).taskNo(System.nanoTime()).outputPath(path); + CarbonWriter carbonWriter = builder.buildWriterForCSVInput(new Schema(fields)); + for (int i = 0; i < 100; i++) { + carbonWriter.write(new String[]{"robot" + (i % 10),"robot_surname" + (i % 10), String.valueOf(i)}); + } + carbonWriter.close(); + + File segmentFolder = new File(path); + Assert.assertTrue(segmentFolder.exists()); + + File[] dataFiles = segmentFolder.listFiles(new FileFilter() { + @Override public boolean accept(File pathname) { + return pathname.getName().endsWith(CarbonCommonConstants.FACT_FILE_EXT); + } + }); + Assert.assertNotNull(dataFiles); + Assert.assertTrue(dataFiles.length > 0); + + + FileUtils.deleteDirectory(new File(path)); + } + + @Test + public void testLocalDictionarywithFalseOption() throws Exception { + String path = "./testWriteFiles"; + FileUtils.deleteDirectory(new File(path)); + + Field[] fields = new Field[3]; + fields[0] = new Field("name", DataTypes.STRING); + fields[1] = new Field("surname", DataTypes.STRING); + fields[2] = new Field("age", DataTypes.INT); + + CarbonWriterBuilder builder = CarbonWriter.builder().isTransactionalTable(false).sortBy(new String[]{"name"}).withBlockSize(12).isLocalDictionaryEnabled(false) --- End diff -- Default is false. So, other test case in this file test this. This test case is redundant ? --- |
In reply to this post by qiuchenjian-2
Github user ajantha-bhat commented on the issue:
https://github.com/apache/carbondata/pull/2433 Also for the new interface methods added for the local dictionary. Need to update this in SDK guide. https://github.com/apache/carbondata/blob/master/docs/sdk-guide.md --- |
In reply to this post by qiuchenjian-2
Github user kunal642 commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2433#discussion_r199382572 --- Diff: core/src/main/java/org/apache/carbondata/core/metadata/schema/table/TableSchemaBuilder.java --- @@ -76,6 +78,18 @@ public TableSchemaBuilder blockletSize(int blockletSize) { return this; } + public TableSchemaBuilder localDictionaryThreshold(int localDictionaryThreshold) { + this.localDictionaryThreshold = String.valueOf(localDictionaryThreshold); + return this; + } + + + public TableSchemaBuilder isLocalDictionaryEnabled(boolean isLocalDictionaryEnabled) { --- End diff -- change the method name to enableLocalDictionary --- |
In reply to this post by qiuchenjian-2
Github user kunal642 commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2433#discussion_r199382674 --- Diff: store/sdk/src/main/java/org/apache/carbondata/sdk/file/CarbonWriterBuilder.java --- @@ -284,6 +286,28 @@ public CarbonWriterBuilder withBlockSize(int blockSize) { return this; } + /** + * @param localDictionaryThreshold is localDictionaryThreshold,default is 1000 + * @return + */ + public CarbonWriterBuilder localDictionaryThreshold(int localDictionaryThreshold) { + if (localDictionaryThreshold <= 0) { + throw new IllegalArgumentException("blockSize should be between greater than 0"); + } + this.localDictionaryThreshold = localDictionaryThreshold; + return this; + } + + /** + * @param isLocalDictionaryEnabled enable local dictionary , default is false + * @return updated CarbonWriterBuilder + */ + public CarbonWriterBuilder isLocalDictionaryEnabled(boolean isLocalDictionaryEnabled) { --- End diff -- change the method name to enableLocalDictionary --- |
In reply to this post by qiuchenjian-2
Github user kunal642 commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2433#discussion_r199384025 --- Diff: store/sdk/src/test/java/org/apache/carbondata/sdk/file/CSVNonTransactionalCarbonWriterTest.java --- @@ -284,4 +284,137 @@ public void testSchemaPersistence() throws IOException { FileUtils.deleteDirectory(new File(path)); } + @Test + public void testLocalDictionarywithTrue() throws Exception { + String path = "./testWriteFiles"; + FileUtils.deleteDirectory(new File(path)); + + Field[] fields = new Field[3]; + fields[0] = new Field("name", DataTypes.STRING); + fields[1] = new Field("surname", DataTypes.STRING); + fields[2] = new Field("age", DataTypes.INT); + + CarbonWriterBuilder builder = CarbonWriter.builder().isTransactionalTable(false).sortBy(new String[]{"name"}).withBlockSize(12).isLocalDictionaryEnabled(true) + .uniqueIdentifier(System.currentTimeMillis()).taskNo(System.nanoTime()).outputPath(path); + CarbonWriter carbonWriter = builder.buildWriterForCSVInput(new Schema(fields)); + for (int i = 0; i < 100; i++) { + carbonWriter.write(new String[]{"robot" + (i % 10),"robot_surname" + (i % 10), String.valueOf(i)}); + } + carbonWriter.close(); + + File segmentFolder = new File(path); + Assert.assertTrue(segmentFolder.exists()); + + File[] dataFiles = segmentFolder.listFiles(new FileFilter() { + @Override public boolean accept(File pathname) { + return pathname.getName().endsWith(CarbonCommonConstants.FACT_FILE_EXT); + } + }); + Assert.assertNotNull(dataFiles); + Assert.assertTrue(dataFiles.length > 0); + + + FileUtils.deleteDirectory(new File(path)); + } + + @Test + public void testLocalDictionarywithFalseOption() throws Exception { + String path = "./testWriteFiles"; + FileUtils.deleteDirectory(new File(path)); + + Field[] fields = new Field[3]; + fields[0] = new Field("name", DataTypes.STRING); + fields[1] = new Field("surname", DataTypes.STRING); + fields[2] = new Field("age", DataTypes.INT); + + CarbonWriterBuilder builder = CarbonWriter.builder().isTransactionalTable(false).sortBy(new String[]{"name"}).withBlockSize(12).isLocalDictionaryEnabled(false) + .uniqueIdentifier(System.currentTimeMillis()).taskNo(System.nanoTime()).outputPath(path); + CarbonWriter carbonWriter = builder.buildWriterForCSVInput(new Schema(fields)); + for (int i = 0; i < 100; i++) { + carbonWriter.write(new String[]{"robot" + (i % 10),"robot_surname" + (i % 10), String.valueOf(i)}); + } + carbonWriter.close(); + + File segmentFolder = new File(path); + Assert.assertTrue(segmentFolder.exists()); + + File[] dataFiles = segmentFolder.listFiles(new FileFilter() { + @Override public boolean accept(File pathname) { + return pathname.getName().endsWith(CarbonCommonConstants.FACT_FILE_EXT); + } + }); + Assert.assertNotNull(dataFiles); + Assert.assertTrue(dataFiles.length > 0); + + + FileUtils.deleteDirectory(new File(path)); + } + + @Test + public void testLocalDictionarywithThreshold() throws Exception { --- End diff -- can we add some validation to check whether fall back happened or not.? --- |
In reply to this post by qiuchenjian-2
Github user BJangir commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2433#discussion_r200820897 --- Diff: core/src/main/java/org/apache/carbondata/core/metadata/schema/table/TableSchemaBuilder.java --- @@ -76,6 +78,18 @@ public TableSchemaBuilder blockletSize(int blockletSize) { return this; } + public TableSchemaBuilder localDictionaryThreshold(int localDictionaryThreshold) { + this.localDictionaryThreshold = String.valueOf(localDictionaryThreshold); + return this; + } + + + public TableSchemaBuilder isLocalDictionaryEnabled(boolean isLocalDictionaryEnabled) { --- End diff -- Done --- |
In reply to this post by qiuchenjian-2
Github user BJangir commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2433#discussion_r200820900 --- Diff: core/src/main/java/org/apache/carbondata/core/metadata/schema/table/TableSchemaBuilder.java --- @@ -108,6 +122,20 @@ public TableSchema build() { schema.setTableProperties(property); } + if (isLocalDictionaryEnabled) { + property.put(CarbonCommonConstants.LOCAL_DICTIONARY_ENABLE, + String.valueOf(isLocalDictionaryEnabled)); + String localdictionaryThreshold = + localDictionaryThreshold.equalsIgnoreCase("0") ? null : localDictionaryThreshold; --- End diff -- Done --- |
In reply to this post by qiuchenjian-2
Github user BJangir commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2433#discussion_r200820906 --- Diff: core/src/main/java/org/apache/carbondata/core/metadata/schema/table/TableSchemaBuilder.java --- @@ -108,6 +122,20 @@ public TableSchema build() { schema.setTableProperties(property); } + if (isLocalDictionaryEnabled) { + property.put(CarbonCommonConstants.LOCAL_DICTIONARY_ENABLE, + String.valueOf(isLocalDictionaryEnabled)); + String localdictionaryThreshold = + localDictionaryThreshold.equalsIgnoreCase("0") ? null : localDictionaryThreshold; + property.put(CarbonCommonConstants.LOCAL_DICTIONARY_THRESHOLD, localdictionaryThreshold); + for (int index = 0; index < allColumns.size(); index++) { + ColumnSchema colSchema = allColumns.get(index); + if (colSchema.getDataType() == DataTypes.STRING) { + colSchema.setLocalDictColumn(true); --- End diff -- added comment..Done --- |
In reply to this post by qiuchenjian-2
Github user BJangir commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2433#discussion_r200820912 --- Diff: store/sdk/src/main/java/org/apache/carbondata/sdk/file/CarbonWriterBuilder.java --- @@ -284,6 +286,28 @@ public CarbonWriterBuilder withBlockSize(int blockSize) { return this; } + /** + * @param localDictionaryThreshold is localDictionaryThreshold,default is 1000 + * @return + */ + public CarbonWriterBuilder localDictionaryThreshold(int localDictionaryThreshold) { + if (localDictionaryThreshold <= 0) { + throw new IllegalArgumentException("blockSize should be between greater than 0"); --- End diff -- Corrected . --- |
In reply to this post by qiuchenjian-2
Github user BJangir commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2433#discussion_r200820916 --- Diff: store/sdk/src/main/java/org/apache/carbondata/sdk/file/CarbonWriterBuilder.java --- @@ -284,6 +286,28 @@ public CarbonWriterBuilder withBlockSize(int blockSize) { return this; } + /** + * @param localDictionaryThreshold is localDictionaryThreshold,default is 1000 + * @return + */ + public CarbonWriterBuilder localDictionaryThreshold(int localDictionaryThreshold) { + if (localDictionaryThreshold <= 0) { + throw new IllegalArgumentException("blockSize should be between greater than 0"); + } + this.localDictionaryThreshold = localDictionaryThreshold; + return this; + } + + /** + * @param isLocalDictionaryEnabled enable local dictionary , default is false + * @return updated CarbonWriterBuilder + */ + public CarbonWriterBuilder isLocalDictionaryEnabled(boolean isLocalDictionaryEnabled) { --- End diff -- Done --- |
In reply to this post by qiuchenjian-2
Github user BJangir commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2433#discussion_r200820921 --- Diff: store/sdk/src/test/java/org/apache/carbondata/sdk/file/CSVNonTransactionalCarbonWriterTest.java --- @@ -284,4 +284,137 @@ public void testSchemaPersistence() throws IOException { FileUtils.deleteDirectory(new File(path)); } + @Test + public void testLocalDictionarywithTrue() throws Exception { + String path = "./testWriteFiles"; + FileUtils.deleteDirectory(new File(path)); + + Field[] fields = new Field[3]; + fields[0] = new Field("name", DataTypes.STRING); + fields[1] = new Field("surname", DataTypes.STRING); + fields[2] = new Field("age", DataTypes.INT); + + CarbonWriterBuilder builder = CarbonWriter.builder().isTransactionalTable(false).sortBy(new String[]{"name"}).withBlockSize(12).isLocalDictionaryEnabled(true) + .uniqueIdentifier(System.currentTimeMillis()).taskNo(System.nanoTime()).outputPath(path); + CarbonWriter carbonWriter = builder.buildWriterForCSVInput(new Schema(fields)); + for (int i = 0; i < 100; i++) { + carbonWriter.write(new String[]{"robot" + (i % 10),"robot_surname" + (i % 10), String.valueOf(i)}); + } + carbonWriter.close(); + + File segmentFolder = new File(path); + Assert.assertTrue(segmentFolder.exists()); + + File[] dataFiles = segmentFolder.listFiles(new FileFilter() { + @Override public boolean accept(File pathname) { + return pathname.getName().endsWith(CarbonCommonConstants.FACT_FILE_EXT); + } + }); + Assert.assertNotNull(dataFiles); + Assert.assertTrue(dataFiles.length > 0); + + + FileUtils.deleteDirectory(new File(path)); + } + + @Test + public void testLocalDictionarywithFalseOption() throws Exception { + String path = "./testWriteFiles"; + FileUtils.deleteDirectory(new File(path)); + + Field[] fields = new Field[3]; + fields[0] = new Field("name", DataTypes.STRING); + fields[1] = new Field("surname", DataTypes.STRING); + fields[2] = new Field("age", DataTypes.INT); + + CarbonWriterBuilder builder = CarbonWriter.builder().isTransactionalTable(false).sortBy(new String[]{"name"}).withBlockSize(12).isLocalDictionaryEnabled(false) --- End diff -- Ok..Removed --- |
Free forum by Nabble | Edit this page |