CarbonDataQA1 commented on issue #3608: [WIP]Si feature
URL: https://github.com/apache/carbondata/pull/3608#issuecomment-585143772 Build Failed with Spark 2.4.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.4/261/ ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email] With regards, Apache Git Services |
In reply to this post by GitBox
CarbonDataQA1 commented on issue #3608: [WIP]Si feature
URL: https://github.com/apache/carbondata/pull/3608#issuecomment-585146097 Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/1964/
akashrn5 commented on issue #3608: [WIP]Si feature
URL: https://github.com/apache/carbondata/pull/3608#issuecomment-585167971
> please add description
done
CarbonDataQA1 commented on issue #3608: [CARBONDATA-3680]Support Secondary Index feature on carbon table.
URL: https://github.com/apache/carbondata/pull/3608#issuecomment-585180711 Build Success with Spark 2.4.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.4/264/
jackylk commented on a change in pull request #3608: [CARBONDATA-3680]Support Secondary Index feature on carbon table.
URL: https://github.com/apache/carbondata/pull/3608#discussion_r378261576 ########## File path: core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java ########## @@ -2341,4 +2347,78 @@ private CarbonCommonConstants() { * Default first day of week */ public static final String CARBON_TIMESERIES_FIRST_DAY_OF_WEEK_DEFAULT = "SUNDAY"; + + @CarbonProperty + public static final String CARBON_PUSH_LEFTSEMIEXIST_JOIN_AS_IN_FILTER = + "carbon.infilter.subquery.pushdown.enable"; + + + /** + * CARBON_PUSH_LEFTSEMIEXIST_JOIN_AS_IN_FILTER_DEFAULT + */ + public static final String CARBON_PUSH_LEFTSEMIEXIST_JOIN_AS_IN_FILTER_DEFAULT = "false"; + + /** + * key to get broadcast record size from properties + */ + @CarbonProperty + public static final String BROADCAST_RECORD_SIZE = "broadcast.record.size"; + + /** + * default broadcast record size + */ + public static final String DEFAULT_BROADCAST_RECORD_SIZE = "100"; + + /** + * to enable SI lookup partial string + */ + @CarbonProperty + public static final String ENABLE_SI_LOOKUP_PARTIALSTRING = "carbon.si.lookup.partialstring"; + + /** + * default value of ENABLE_SI_LOOKUP_PARTIALSTRING + */ + public static final String ENABLE_SI_LOOKUP_PARTIALSTRING_DEFAULT = "true"; + + /** + * configuration for launching the number of threads during secondary index creation + */ + @CarbonProperty + public static final String CARBON_SECONDARY_INDEX_CREATION_THREADS = + "carbon.secondary.index.creation.threads"; + + /** + * default value configuration for launching the number of threads during secondary + * index creation + */ + public static final String CARBON_SECONDARY_INDEX_CREATION_THREADS_DEFAULT = "1"; + + /** + * max value configuration for launching the number of threads during secondary + * index creation + */ + public static final int CARBON_SECONDARY_INDEX_CREATION_THREADS_MAX = 50; + + /** + * threshold of high cardinality + */ + @CarbonProperty + public static final String HIGH_CARDINALITY_THRESHOLD = "high.cardinality.threshold"; Review comment: I think this is not required now
jackylk commented on a change in pull request #3608: [CARBONDATA-3680]Support Secondary Index feature on carbon table.
URL: https://github.com/apache/carbondata/pull/3608#discussion_r378261883 ########## File path: core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java ########## @@ -2341,4 +2347,78 @@ private CarbonCommonConstants() { * Default first day of week */ public static final String CARBON_TIMESERIES_FIRST_DAY_OF_WEEK_DEFAULT = "SUNDAY"; + + @CarbonProperty + public static final String CARBON_PUSH_LEFTSEMIEXIST_JOIN_AS_IN_FILTER = + "carbon.infilter.subquery.pushdown.enable"; + + + /** + * CARBON_PUSH_LEFTSEMIEXIST_JOIN_AS_IN_FILTER_DEFAULT + */ + public static final String CARBON_PUSH_LEFTSEMIEXIST_JOIN_AS_IN_FILTER_DEFAULT = "false"; + + /** + * key to get broadcast record size from properties + */ + @CarbonProperty + public static final String BROADCAST_RECORD_SIZE = "broadcast.record.size"; + + /** + * default broadcast record size + */ + public static final String DEFAULT_BROADCAST_RECORD_SIZE = "100"; + + /** + * to enable SI lookup partial string + */ + @CarbonProperty + public static final String ENABLE_SI_LOOKUP_PARTIALSTRING = "carbon.si.lookup.partialstring"; + + /** + * default value of ENABLE_SI_LOOKUP_PARTIALSTRING + */ + public static final String ENABLE_SI_LOOKUP_PARTIALSTRING_DEFAULT = "true"; + + /** + * configuration for launching the number of threads during secondary index creation + */ + @CarbonProperty + public static final String CARBON_SECONDARY_INDEX_CREATION_THREADS = + "carbon.secondary.index.creation.threads"; + + /** + * default value configuration for launching the number of threads during secondary + * index creation + */ + public static final String CARBON_SECONDARY_INDEX_CREATION_THREADS_DEFAULT = "1"; + + /** + * max value configuration for launching the number of threads during secondary + * index creation + */ + public static final int CARBON_SECONDARY_INDEX_CREATION_THREADS_MAX = 50; + + /** + * threshold of high cardinality + */ + @CarbonProperty + public static final String HIGH_CARDINALITY_THRESHOLD = "high.cardinality.threshold"; + + public static final String HIGH_CARDINALITY_THRESHOLD_DEFAULT = "1000000"; + + public static final int HIGH_CARDINALITY_THRESHOLD_MIN = 10000; + + /** + * Enable SI segment Compaction / merge small files + */ + @CarbonProperty + public static final String CARBON_SI_SEGMENT_MERGE = "carbon.si.segment.merge"; + + /** + * Default value for SI segment Compaction / merge small files + * Making this true degrade the LOAD performance Review comment: Please explain in the comment when the user should set this to true.
jackylk commented on a change in pull request #3608: [CARBONDATA-3680]Support Secondary Index feature on carbon table.
URL: https://github.com/apache/carbondata/pull/3608#discussion_r378262296 ########## File path: core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java ########## @@ -2341,4 +2347,78 @@ private CarbonCommonConstants() { * Default first day of week */ public static final String CARBON_TIMESERIES_FIRST_DAY_OF_WEEK_DEFAULT = "SUNDAY"; + + @CarbonProperty + public static final String CARBON_PUSH_LEFTSEMIEXIST_JOIN_AS_IN_FILTER = Review comment: Please explain in the comment when the user should set this to true.
jackylk commented on a change in pull request #3608: [CARBONDATA-3680]Support Secondary Index feature on carbon table.
URL: https://github.com/apache/carbondata/pull/3608#discussion_r378264017 ########## File path: core/src/main/java/org/apache/carbondata/core/datamap/TableDataMap.java ########## @@ -526,4 +532,29 @@ public long getRowCount(List<Segment> allsegments, final List<PartitionSpec> par return totalRowCount; } + /** + * Method to prune the segments based on task min/max values + * + * @param segments Review comment: Remove it if you are not writing a description.
jackylk commented on a change in pull request #3608: [CARBONDATA-3680]Support Secondary Index feature on carbon table.
URL: https://github.com/apache/carbondata/pull/3608#discussion_r378265311 ########## File path: core/src/main/java/org/apache/carbondata/core/datamap/dev/expr/DataMapExprWrapper.java ########## @@ -32,14 +32,14 @@ * It is the wrapper around datamap and related filter expression. By using it user can apply * datamaps in expression style. */ -public interface DataMapExprWrapper extends Serializable { +public abstract class DataMapExprWrapper implements Serializable { Review comment: You can still use an interface; a Java 8 interface can have default implementations.
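For readers following the suggestion above, here is a minimal sketch of a Java 8 interface that carries shared behaviour through a default method, so no abstract base class is needed. `ExprWrapper`, `prune`, `pruneNonEmpty` and `PassThroughWrapper` are hypothetical names chosen for illustration; they are not the actual `DataMapExprWrapper` API from the PR.

```java
import java.io.Serializable;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class DefaultMethodSketch {

  // Hypothetical stand-in for DataMapExprWrapper; method names are illustrative only.
  interface ExprWrapper extends Serializable {
    // Abstract method that every implementation must provide.
    List<String> prune(List<String> segments);

    // Java 8 default method: shared logic lives in the interface itself.
    default List<String> pruneNonEmpty(List<String> segments) {
      List<String> kept = new ArrayList<>();
      for (String s : prune(segments)) {
        if (!s.isEmpty()) {
          kept.add(s);
        }
      }
      return kept;
    }
  }

  // The implementation supplies prune() only and inherits pruneNonEmpty().
  static class PassThroughWrapper implements ExprWrapper {
    @Override
    public List<String> prune(List<String> segments) {
      return segments;
    }
  }

  public static void main(String[] args) {
    ExprWrapper wrapper = new PassThroughWrapper();
    System.out.println(wrapper.pruneNonEmpty(Arrays.asList("0", "", "1"))); // prints [0, 1]
  }
}
```

Default methods need a Java 8 or later compiler and runtime, which is the constraint raised later in the thread.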
jackylk commented on a change in pull request #3608: [CARBONDATA-3680]Support Secondary Index feature on carbon table.
URL: https://github.com/apache/carbondata/pull/3608#discussion_r378265942 ########## File path: core/src/main/java/org/apache/carbondata/core/indexstore/AbstractMemoryDMStore.java ########## @@ -46,7 +48,17 @@ public void finishWriting() { // do nothing in default implementation } + public void serializeMemoryBlock() { + } + + public void copyToMemoryBlock() { Review comment: Why an empty implementation? Why not abstract?
jackylk commented on a change in pull request #3608: [CARBONDATA-3680]Support Secondary Index feature on carbon table.
URL: https://github.com/apache/carbondata/pull/3608#discussion_r378266500 ########## File path: core/src/main/java/org/apache/carbondata/core/indexstore/BlockletDataMapIndexStore.java ########## @@ -113,8 +113,9 @@ public BlockletDataMapIndexWrapper get(TableBlockIndexUniqueIdentifierWrapper id BlockDataMap blockletDataMap = loadAndGetDataMap(identifier, indexFileStore, blockMetaInfoMap, identifierWrapper.getCarbonTable(), - identifierWrapper.isAddTableBlockToUnsafeAndLRUCache(), - identifierWrapper.getConfiguration(), indexInfos); + identifierWrapper.isAddToUnsafe(), + identifierWrapper.getConfiguration(), + identifierWrapper.isSerializeDmStore(), indexInfos); Review comment: Move indexInfos to the next line.
jackylk commented on a change in pull request #3608: [CARBONDATA-3680]Support Secondary Index feature on carbon table.
URL: https://github.com/apache/carbondata/pull/3608#discussion_r378266561 ########## File path: core/src/main/java/org/apache/carbondata/core/indexstore/BlockletDataMapIndexStore.java ########## @@ -133,8 +134,9 @@ public BlockletDataMapIndexWrapper get(TableBlockIndexUniqueIdentifierWrapper id BlockDataMap blockletDataMap = loadAndGetDataMap(blockIndexUniqueIdentifier, indexFileStore, blockMetaInfoMap, identifierWrapper.getCarbonTable(), - identifierWrapper.isAddTableBlockToUnsafeAndLRUCache(), - identifierWrapper.getConfiguration(), indexInfos); + identifierWrapper.isAddToUnsafe(), + identifierWrapper.getConfiguration(), + identifierWrapper.isSerializeDmStore(), indexInfos); Review comment: Move indexInfos to the next line.
CarbonDataQA1 commented on issue #3608: [CARBONDATA-3680]Support Secondary Index feature on carbon table.
URL: https://github.com/apache/carbondata/pull/3608#issuecomment-585220935 Build Success with Spark 2.4.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.4/265/
akashrn5 commented on a change in pull request #3608: [CARBONDATA-3680][alpha-feature]Support Secondary Index feature on carbon table.
URL: https://github.com/apache/carbondata/pull/3608#discussion_r378309350 ########## File path: core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java ########## @@ -2341,4 +2347,78 @@ private CarbonCommonConstants() { * Default first day of week */ public static final String CARBON_TIMESERIES_FIRST_DAY_OF_WEEK_DEFAULT = "SUNDAY"; + + @CarbonProperty + public static final String CARBON_PUSH_LEFTSEMIEXIST_JOIN_AS_IN_FILTER = + "carbon.infilter.subquery.pushdown.enable"; + + + /** + * CARBON_PUSH_LEFTSEMIEXIST_JOIN_AS_IN_FILTER_DEFAULT + */ + public static final String CARBON_PUSH_LEFTSEMIEXIST_JOIN_AS_IN_FILTER_DEFAULT = "false"; + + /** + * key to get broadcast record size from properties + */ + @CarbonProperty + public static final String BROADCAST_RECORD_SIZE = "broadcast.record.size"; + + /** + * default broadcast record size + */ + public static final String DEFAULT_BROADCAST_RECORD_SIZE = "100"; + + /** + * to enable SI lookup partial string + */ + @CarbonProperty + public static final String ENABLE_SI_LOOKUP_PARTIALSTRING = "carbon.si.lookup.partialstring"; + + /** + * default value of ENABLE_SI_LOOKUP_PARTIALSTRING + */ + public static final String ENABLE_SI_LOOKUP_PARTIALSTRING_DEFAULT = "true"; + + /** + * configuration for launching the number of threads during secondary index creation + */ + @CarbonProperty + public static final String CARBON_SECONDARY_INDEX_CREATION_THREADS = + "carbon.secondary.index.creation.threads"; + + /** + * default value configuration for launching the number of threads during secondary + * index creation + */ + public static final String CARBON_SECONDARY_INDEX_CREATION_THREADS_DEFAULT = "1"; + + /** + * max value configuration for launching the number of threads during secondary + * index creation + */ + public static final int CARBON_SECONDARY_INDEX_CREATION_THREADS_MAX = 50; + + /** + * threshold of high cardinality + */ + @CarbonProperty + public static final String HIGH_CARDINALITY_THRESHOLD = "high.cardinality.threshold"; Review comment: yes, removed
akashrn5 commented on a change in pull request #3608: [CARBONDATA-3680][alpha-feature]Support Secondary Index feature on carbon table.
URL: https://github.com/apache/carbondata/pull/3608#discussion_r378309399 ########## File path: core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java ########## @@ -2341,4 +2347,78 @@ private CarbonCommonConstants() { * Default first day of week */ public static final String CARBON_TIMESERIES_FIRST_DAY_OF_WEEK_DEFAULT = "SUNDAY"; + + @CarbonProperty + public static final String CARBON_PUSH_LEFTSEMIEXIST_JOIN_AS_IN_FILTER = + "carbon.infilter.subquery.pushdown.enable"; + + + /** + * CARBON_PUSH_LEFTSEMIEXIST_JOIN_AS_IN_FILTER_DEFAULT + */ + public static final String CARBON_PUSH_LEFTSEMIEXIST_JOIN_AS_IN_FILTER_DEFAULT = "false"; + + /** + * key to get broadcast record size from properties + */ + @CarbonProperty + public static final String BROADCAST_RECORD_SIZE = "broadcast.record.size"; + + /** + * default broadcast record size + */ + public static final String DEFAULT_BROADCAST_RECORD_SIZE = "100"; + + /** + * to enable SI lookup partial string + */ + @CarbonProperty + public static final String ENABLE_SI_LOOKUP_PARTIALSTRING = "carbon.si.lookup.partialstring"; + + /** + * default value of ENABLE_SI_LOOKUP_PARTIALSTRING + */ + public static final String ENABLE_SI_LOOKUP_PARTIALSTRING_DEFAULT = "true"; + + /** + * configuration for launching the number of threads during secondary index creation + */ + @CarbonProperty + public static final String CARBON_SECONDARY_INDEX_CREATION_THREADS = + "carbon.secondary.index.creation.threads"; + + /** + * default value configuration for launching the number of threads during secondary + * index creation + */ + public static final String CARBON_SECONDARY_INDEX_CREATION_THREADS_DEFAULT = "1"; + + /** + * max value configuration for launching the number of threads during secondary + * index creation + */ + public static final int CARBON_SECONDARY_INDEX_CREATION_THREADS_MAX = 50; + + /** + * threshold of high cardinality + */ + @CarbonProperty + public static final String HIGH_CARDINALITY_THRESHOLD = "high.cardinality.threshold"; + + public static final String HIGH_CARDINALITY_THRESHOLD_DEFAULT = "1000000"; + + public static final int HIGH_CARDINALITY_THRESHOLD_MIN = 10000; + + /** + * Enable SI segment Compaction / merge small files + */ + @CarbonProperty + public static final String CARBON_SI_SEGMENT_MERGE = "carbon.si.segment.merge"; + + /** + * Default value for SI segment Compaction / merge small files + * Making this true degrade the LOAD performance Review comment: done
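The diff quoted above defines the key `carbon.secondary.index.creation.threads` with a string default of "1" and an integer maximum of 50. The sketch below shows one way such a key, default and maximum could be combined when reading the setting. The thread does not show how CarbonData actually validates the value, so the fallback-to-default and clamp-to-maximum policy here is an assumption, and `resolveThreads` is a hypothetical helper built on plain `java.util.Properties`.

```java
import java.util.Properties;

public class SiThreadConfigSketch {

  // Key, default and maximum copied from the diff quoted in the thread.
  static final String KEY = "carbon.secondary.index.creation.threads";
  static final String DEFAULT = "1";
  static final int MAX = 50;

  // Hypothetical helper: the validation policy below is an assumption,
  // not CarbonData's actual behaviour.
  static int resolveThreads(Properties props) {
    String raw = props.getProperty(KEY, DEFAULT);
    int value;
    try {
      value = Integer.parseInt(raw.trim());
    } catch (NumberFormatException e) {
      // Unparseable input falls back to the default.
      return Integer.parseInt(DEFAULT);
    }
    if (value < 1) {
      // Zero or negative thread counts also fall back to the default.
      return Integer.parseInt(DEFAULT);
    }
    // Values above the maximum are clamped to it.
    return Math.min(value, MAX);
  }

  public static void main(String[] args) {
    Properties props = new Properties();
    System.out.println(resolveThreads(props)); // prints 1 (property not set)
    props.setProperty(KEY, "200");
    System.out.println(resolveThreads(props)); // prints 50 (clamped to MAX)
  }
}
```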
akashrn5 commented on a change in pull request #3608: [CARBONDATA-3680][alpha-feature]Support Secondary Index feature on carbon table.
URL: https://github.com/apache/carbondata/pull/3608#discussion_r378309460 ########## File path: core/src/main/java/org/apache/carbondata/core/datamap/TableDataMap.java ########## @@ -526,4 +532,29 @@ public long getRowCount(List<Segment> allsegments, final List<PartitionSpec> par return totalRowCount; } + /** + * Method to prune the segments based on task min/max values + * + * @param segments Review comment: done
akashrn5 commented on a change in pull request #3608: [CARBONDATA-3680][alpha-feature]Support Secondary Index feature on carbon table.
URL: https://github.com/apache/carbondata/pull/3608#discussion_r378309510 ########## File path: core/src/main/java/org/apache/carbondata/core/datamap/dev/expr/DataMapExprWrapper.java ########## @@ -32,14 +32,14 @@ * It is the wrapper around datamap and related filter expression. By using it user can apply * datamaps in expression style. */ -public interface DataMapExprWrapper extends Serializable { +public abstract class DataMapExprWrapper implements Serializable { Review comment: Since some users are still on older Java versions, I think we can keep the abstract class until we have moved off them completely.
akashrn5 commented on a change in pull request #3608: [CARBONDATA-3680][alpha-feature]Support Secondary Index feature on carbon table.
URL: https://github.com/apache/carbondata/pull/3608#discussion_r378309533 ########## File path: core/src/main/java/org/apache/carbondata/core/indexstore/AbstractMemoryDMStore.java ########## @@ -46,7 +48,17 @@ public void finishWriting() { // do nothing in default implementation } + public void serializeMemoryBlock() { + } + + public void copyToMemoryBlock() { Review comment: It is required only for the unsafe implementation.
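The trade-off discussed here (empty method bodies in the base class versus abstract methods) can be sketched as follows. `MemoryStore`, `SafeStore` and `UnsafeStore` are hypothetical stand-ins for the classes in the PR, and what the two hooks do inside `UnsafeStore` is an assumption made only to keep the example runnable.

```java
public class NoOpHookSketch {

  // Hypothetical stand-in for AbstractMemoryDMStore.
  abstract static class MemoryStore {
    // No-op by default, so stores that do not need the hooks stay untouched.
    public void serializeMemoryBlock() {
    }

    public void copyToMemoryBlock() {
    }

    abstract String name();
  }

  // Inherits both no-op hooks; declaring them abstract would force empty overrides here.
  static class SafeStore extends MemoryStore {
    @Override
    String name() {
      return "safe";
    }
  }

  // Only this store overrides the hooks. The byte-array handling is illustrative.
  static class UnsafeStore extends MemoryStore {
    private byte[] block = {1, 2, 3};
    private byte[] serialized;

    @Override
    public void serializeMemoryBlock() {
      if (block != null) {
        serialized = block.clone();
        block = null;
      }
    }

    @Override
    public void copyToMemoryBlock() {
      if (serialized != null) {
        block = serialized.clone();
        serialized = null;
      }
    }

    boolean hasBlock() {
      return block != null;
    }

    @Override
    String name() {
      return "unsafe";
    }
  }

  public static void main(String[] args) {
    UnsafeStore unsafe = new UnsafeStore();
    unsafe.serializeMemoryBlock();
    System.out.println(unsafe.hasBlock()); // prints false
    unsafe.copyToMemoryBlock();
    System.out.println(unsafe.hasBlock()); // prints true
  }
}
```

The cost of the no-op approach is that a new subclass which does need the hooks can forget to override them without any compiler error, which is the concern behind the reviewer's question.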
akashrn5 commented on a change in pull request #3608: [CARBONDATA-3680][alpha-feature]Support Secondary Index feature on carbon table.
URL: https://github.com/apache/carbondata/pull/3608#discussion_r378309556 ########## File path: core/src/main/java/org/apache/carbondata/core/indexstore/BlockletDataMapIndexStore.java ########## @@ -113,8 +113,9 @@ public BlockletDataMapIndexWrapper get(TableBlockIndexUniqueIdentifierWrapper id BlockDataMap blockletDataMap = loadAndGetDataMap(identifier, indexFileStore, blockMetaInfoMap, identifierWrapper.getCarbonTable(), - identifierWrapper.isAddTableBlockToUnsafeAndLRUCache(), - identifierWrapper.getConfiguration(), indexInfos); + identifierWrapper.isAddToUnsafe(), + identifierWrapper.getConfiguration(), + identifierWrapper.isSerializeDmStore(), indexInfos); Review comment: done