ajantha-bhat opened a new pull request #3645: [WIP] Fix insert failure on partition table with local sort
URL: https://github.com/apache/carbondata/pull/3645

### Why is this PR needed?

### What changes were proposed in this PR?

### Does this PR introduce any user interface change?
- No
- Yes. (please explain the change and update document)

### Is any new testcase added?
- No
- Yes

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]

With regards,
Apache Git Services
CarbonDataQA1 commented on issue #3645: [WIP] Fix insert failure on partition table with local sort
URL: https://github.com/apache/carbondata/pull/3645#issuecomment-592953640
Build Success with Spark 2.4.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.4/542/
CarbonDataQA1 commented on issue #3645: [WIP] Fix insert failure on partition table with local sort
URL: https://github.com/apache/carbondata/pull/3645#issuecomment-592956497
Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/2242/
CarbonDataQA1 commented on issue #3645: [WIP] Fix insert failure on partition table with local sort
URL: https://github.com/apache/carbondata/pull/3645#issuecomment-592972578
Build Success with Spark 2.4.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.4/544/
CarbonDataQA1 commented on issue #3645: [WIP] Fix insert failure on partition table with local sort
URL: https://github.com/apache/carbondata/pull/3645#issuecomment-592982114
Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/2244/
CarbonDataQA1 commented on issue #3645: [CARBONDATA-3728] Fix insert failure on partition table with local sort
URL: https://github.com/apache/carbondata/pull/3645#issuecomment-593045702
Build Success with Spark 2.4.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.4/546/
CarbonDataQA1 commented on issue #3645: [CARBONDATA-3728] Fix insert failure on partition table with local sort
URL: https://github.com/apache/carbondata/pull/3645#issuecomment-593050797
Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/2246/
akashrn5 commented on a change in pull request #3645: [CARBONDATA-3728] Fix insert failure on partition table with local sort
URL: https://github.com/apache/carbondata/pull/3645#discussion_r386083228

##########
File path: processing/src/main/java/org/apache/carbondata/processing/util/CarbonDataProcessorUtil.java
##########
@@ -401,6 +420,33 @@ public static boolean isHeaderValid(String tableName, String[] csvHeader,
     return noDicSortColMapping;
   }

+  /**
+   * get mapping based on data fields order
+   *
+   * @param dataFields
+   * @return
+   */
+  public static boolean[] getNoDictSortColMappingAsDataFieldOrder(DataField[] dataFields) {
+    List<Boolean> noDicSortColMap = new ArrayList<>();
+    for (DataField dataField : dataFields) {

Review comment:
same as above
akashrn5 commented on a change in pull request #3645: [CARBONDATA-3728] Fix insert failure on partition table with local sort
URL: https://github.com/apache/carbondata/pull/3645#discussion_r386082988

##########
File path: processing/src/main/java/org/apache/carbondata/processing/loading/CarbonDataLoadConfiguration.java
##########
@@ -248,6 +248,23 @@ public void setSchemaUpdatedTimeStamp(long schemaUpdatedTimeStamp) {
     return type;
   }

+  public DataType[] getMeasureDataTypeAsDataFieldOrder() {
+    // same as data fields order
+    List<Integer> measureIndexes = new ArrayList<>(dataFields.length);
+    int measureCount = 0;
+    for (int i = 0; i < dataFields.length; i++) {

Review comment:
Instead of using two for loops, I think you can use a stream.
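Editorial sketch of the stream suggestion in the review comment above. The quoted hunk is truncated, so this is not the PR's actual method: `FieldSketch` and `MeasureTypesSketch` are hypothetical, simplified stand-ins (not CarbonData's `DataField` or `CarbonDataLoadConfiguration`), and data types are plain strings. It only illustrates how a filter-and-map pipeline could collect measure types in data-field order in a single pass.

```java
import java.util.Arrays;

// Hypothetical, simplified stand-in for a data field; not the real CarbonData class.
final class FieldSketch {
  final String name;
  final String dataType;
  final boolean measure;

  FieldSketch(String name, String dataType, boolean measure) {
    this.name = name;
    this.dataType = dataType;
    this.measure = measure;
  }
}

final class MeasureTypesSketch {
  // Keeps only the measure fields and returns their types.
  // Arrays.stream is sequential and ordered, so the result follows
  // the order of the input array (the "data field order").
  static String[] measureTypesInFieldOrder(FieldSketch[] fields) {
    return Arrays.stream(fields)
        .filter(f -> f.measure)
        .map(f -> f.dataType)
        .toArray(String[]::new);
  }
}
```

Whether this is preferable to the explicit loops depends on what the rest of the original method does with `measureIndexes`, which the hunk does not show.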
akashrn5 commented on a change in pull request #3645: [CARBONDATA-3728] Fix insert failure on partition table with local sort
URL: https://github.com/apache/carbondata/pull/3645#discussion_r386082610

##########
File path: processing/src/main/java/org/apache/carbondata/processing/util/CarbonDataProcessorUtil.java
##########
@@ -457,6 +534,34 @@ public static boolean isHeaderValid(String tableName, String[] csvHeader,
     return noDictSortAndNoSortTypes;
   }

+  /**
+   * Get the data types of the no dictionary sort columns as per dataFields order

Review comment:
I think that, since we removed dictionary, we should avoid mentioning noDict in all places.
akashrn5 commented on a change in pull request #3645: [CARBONDATA-3728] Fix insert failure on partition table with local sort
URL: https://github.com/apache/carbondata/pull/3645#discussion_r386082916

##########
File path: integration/spark/src/test/scala/org/apache/carbondata/spark/testsuite/standardpartition/StandardPartitionTableLoadingTestCase.scala
##########
@@ -502,6 +502,51 @@ class StandardPartitionTableLoadingTestCase extends QueryTest with BeforeAndAfte
     assert(result.get(0).get(7).equals(dataAndIndexSize._2))
   }

+  test("test partition with all sort scope") {
+    sql("drop table if exists origin_csv")
+    sql(
+      s"""
+         | create table origin_csv(col1 int, col2 string, col3 date)
+         | using csv
+         | options('dateFormat'='yyyy-MM-dd', 'timestampFormat'='yyyy-MM-dd HH:mm:ss')
+         | """.stripMargin)
+    sql("insert into origin_csv select 1, '3aa', to_date('2019-11-11')")
+    sql("insert into origin_csv select 2, '2bb', to_date('2019-11-12')")
+    sql("insert into origin_csv select 3, '1cc', to_date('2019-11-13')")
+    verifyInsertForPartitionTable("tbl_p_ns", "no_sort")
+    verifyInsertForPartitionTable("tbl_p_ls", "local_sort")
+    verifyInsertForPartitionTable("tbl_p_gs", "global_sort")
+    sql("drop table origin_csv")
+  }
+
+  def verifyInsertForPartitionTable(tableName: String, sort_scope: String): Unit = {
+    sql(s"drop table if exists $tableName")
+    sql(

Review comment:
include one int column also in test
akashrn5 commented on a change in pull request #3645: [CARBONDATA-3728] Fix insert failure on partition table with local sort
URL: https://github.com/apache/carbondata/pull/3645#discussion_r386083102

##########
File path: processing/src/main/java/org/apache/carbondata/processing/loading/constants/DataLoadProcessorConstants.java
##########
@@ -38,4 +38,6 @@
   public static final String FACT_FILE_PATH = "FACT_FILE_PATH";

+  // to indicate that it is optimized insert flow without rearrange of each data rows
+  public static final String NO_REARRANGE_OF_ROWS = "NO_REARRANGE_OF_ROWS";

Review comment:
please add a default value instead of giving true or false
akashrn5 commented on a change in pull request #3645: [CARBONDATA-3728] Fix insert failure on partition table with local sort
URL: https://github.com/apache/carbondata/pull/3645#discussion_r386083184

##########
File path: processing/src/main/java/org/apache/carbondata/processing/util/CarbonDataProcessorUtil.java
##########
@@ -375,6 +375,25 @@ public static boolean isHeaderValid(String tableName, String[] csvHeader,
     return type.toArray(new DataType[type.size()]);
   }

+  /**
+   * get visible no dictionary dimensions as per data field order
+   *
+   * @param dataFields
+   * @return
+   */
+  public static DataType[] getNoDictDataTypesAsDataFieldOrder(DataField[] dataFields) {
+    List<DataType> type = new ArrayList<>();
+    for (DataField dataField : dataFields) {

Review comment:
can use stream
akashrn5 commented on a change in pull request #3645: [CARBONDATA-3728] Fix insert failure on partition table with local sort
URL: https://github.com/apache/carbondata/pull/3645#discussion_r386083208

##########
File path: processing/src/main/java/org/apache/carbondata/processing/util/CarbonDataProcessorUtil.java
##########
@@ -375,6 +375,25 @@ public static boolean isHeaderValid(String tableName, String[] csvHeader,
     return type.toArray(new DataType[type.size()]);
   }

+  /**
+   * get visible no dictionary dimensions as per data field order

Review comment:
I think it is better to avoid mentioning dictionary.
ajantha-bhat commented on a change in pull request #3645: [CARBONDATA-3728] Fix insert failure on partition table with local sort
URL: https://github.com/apache/carbondata/pull/3645#discussion_r386087326

##########
File path: processing/src/main/java/org/apache/carbondata/processing/loading/constants/DataLoadProcessorConstants.java
##########
@@ -38,4 +38,6 @@
   public static final String FACT_FILE_PATH = "FACT_FILE_PATH";

+  // to indicate that it is optimized insert flow without rearrange of each data rows
+  public static final String NO_REARRANGE_OF_ROWS = "NO_REARRANGE_OF_ROWS";

Review comment:
This is not a carbon property and is not user-configurable. All the places just check whether this key is present; they don't care about the value.
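Editorial sketch of the presence-only check described in the reply above. Only the constant name `NO_REARRANGE_OF_ROWS` comes from the quoted diff; the class `PresenceFlagSketch`, the helper `isNoRearrangeFlow`, and the use of a plain `Map` as the property holder are assumptions made for illustration, not CarbonData's actual code.

```java
import java.util.HashMap;
import java.util.Map;

final class PresenceFlagSketch {
  // Constant name taken from the diff; everything else here is hypothetical.
  static final String NO_REARRANGE_OF_ROWS = "NO_REARRANGE_OF_ROWS";

  // The flag is "on" as soon as the key exists; the stored value is never read.
  static boolean isNoRearrangeFlow(Map<String, Object> properties) {
    return properties.containsKey(NO_REARRANGE_OF_ROWS);
  }

  public static void main(String[] args) {
    Map<String, Object> properties = new HashMap<>();
    System.out.println(isNoRearrangeFlow(properties));
    // Any value, even "false", switches the flag on because only the key matters.
    properties.put(NO_REARRANGE_OF_ROWS, "false");
    System.out.println(isNoRearrangeFlow(properties));
  }
}
```

Under this reading, a default value would have no effect, since no caller ever inspects what is stored under the key.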
ajantha-bhat commented on a change in pull request #3645: [CARBONDATA-3728] Fix insert failure on partition table with local sort
URL: https://github.com/apache/carbondata/pull/3645#discussion_r386087427

##########
File path: processing/src/main/java/org/apache/carbondata/processing/util/CarbonDataProcessorUtil.java
##########
@@ -375,6 +375,25 @@ public static boolean isHeaderValid(String tableName, String[] csvHeader,
     return type.toArray(new DataType[type.size()]);
   }

+  /**
+   * get visible no dictionary dimensions as per data field order

Review comment:
All these functions are replicated from the same file. The whole sort step has dictionary and no-dictionary mapping. I do agree that we don't have dictionary, but we cannot change all the current sort step variables and logic in this PR.
ajantha-bhat commented on a change in pull request #3645: [CARBONDATA-3728] Fix insert failure on partition table with local sort
URL: https://github.com/apache/carbondata/pull/3645#discussion_r386087434

##########
File path: processing/src/main/java/org/apache/carbondata/processing/util/CarbonDataProcessorUtil.java
##########
@@ -401,6 +420,33 @@ public static boolean isHeaderValid(String tableName, String[] csvHeader,
     return noDicSortColMapping;
   }

+  /**
+   * get mapping based on data fields order
+   *
+   * @param dataFields
+   * @return
+   */
+  public static boolean[] getNoDictSortColMappingAsDataFieldOrder(DataField[] dataFields) {
+    List<Boolean> noDicSortColMap = new ArrayList<>();
+    for (DataField dataField : dataFields) {

Review comment:
All these functions are replicated from the same file. The whole sort step has dictionary and no-dictionary mapping. I do agree that we don't have dictionary, but we cannot change all the current sort step variables and logic in this PR.
ajantha-bhat commented on a change in pull request #3645: [CARBONDATA-3728] Fix insert failure on partition table with local sort
URL: https://github.com/apache/carbondata/pull/3645#discussion_r386087496

##########
File path: integration/spark/src/test/scala/org/apache/carbondata/spark/testsuite/standardpartition/StandardPartitionTableLoadingTestCase.scala
##########
@@ -502,6 +502,51 @@ class StandardPartitionTableLoadingTestCase extends QueryTest with BeforeAndAfte
     assert(result.get(0).get(7).equals(dataAndIndexSize._2))
   }

+  test("test partition with all sort scope") {
+    sql("drop table if exists origin_csv")
+    sql(
+      s"""
+         | create table origin_csv(col1 int, col2 string, col3 date)
+         | using csv
+         | options('dateFormat'='yyyy-MM-dd', 'timestampFormat'='yyyy-MM-dd HH:mm:ss')
+         | """.stripMargin)
+    sql("insert into origin_csv select 1, '3aa', to_date('2019-11-11')")
+    sql("insert into origin_csv select 2, '2bb', to_date('2019-11-12')")
+    sql("insert into origin_csv select 3, '1cc', to_date('2019-11-13')")
+    verifyInsertForPartitionTable("tbl_p_ns", "no_sort")
+    verifyInsertForPartitionTable("tbl_p_ls", "local_sort")
+    verifyInsertForPartitionTable("tbl_p_gs", "global_sort")
+    sql("drop table origin_csv")
+  }
+
+  def verifyInsertForPartitionTable(tableName: String, sort_scope: String): Unit = {
+    sql(s"drop table if exists $tableName")
+    sql(

Review comment:
Int and float are already present. col1 is itself an int.
ajantha-bhat commented on a change in pull request #3645: [CARBONDATA-3728] Fix insert failure on partition table with local sort
URL: https://github.com/apache/carbondata/pull/3645#discussion_r386087526

##########
File path: processing/src/main/java/org/apache/carbondata/processing/util/CarbonDataProcessorUtil.java
##########
@@ -457,6 +534,34 @@ public static boolean isHeaderValid(String tableName, String[] csvHeader,
     return noDictSortAndNoSortTypes;
   }

+  /**
+   * Get the data types of the no dictionary sort columns as per dataFields order

Review comment:
All these functions are replicated from the same file. The whole sort step has dictionary and no-dictionary mapping. I do agree that we don't have dictionary, but we cannot change all the current sort step variables and logic in this PR.
akashrn5 commented on a change in pull request #3645: [CARBONDATA-3728] Fix insert failure on partition table with local sort
URL: https://github.com/apache/carbondata/pull/3645#discussion_r386199857

##########
File path: processing/src/main/java/org/apache/carbondata/processing/util/CarbonDataProcessorUtil.java
##########
@@ -457,6 +534,34 @@ public static boolean isHeaderValid(String tableName, String[] csvHeader,
     return noDictSortAndNoSortTypes;
   }

+  /**
+   * Get the data types of the no dictionary sort columns as per dataFields order

Review comment:
I agree, but at least for the methods which you have written it is better to avoid it, and for the existing ones we can have a refactor PR later.