GitHub user aniketadnaik opened a pull request:
https://github.com/apache/carbondata/pull/1064 [CARBONDATA-<1173>] Stream ingestion - write path framework Description: This is a over all streaming write path framework implementation. It is mainly targeted for "**streaming_ingest**" branch. This framework implementation will help other developers to leverage a common code path for developing other pieces of the streaming ingestion feature. - Whether new unit test cases have been added or why no new tests are required? - This is a just framework implementation, unit tests will be added with individual functionality commits - What manual testing you have done? - mvn clean verify [INFO] ------------------------------------------------------------------------ [INFO] Reactor Summary: [INFO] [INFO] Apache CarbonData :: Parent ........................ SUCCESS [ 1.533 s] [INFO] Apache CarbonData :: Common ........................ SUCCESS [ 1.335 s] [INFO] Apache CarbonData :: Core .......................... SUCCESS [02:34 min] [INFO] Apache CarbonData :: Processing .................... SUCCESS [ 5.158 s] [INFO] Apache CarbonData :: Hadoop ........................ SUCCESS [ 5.208 s] [INFO] Apache CarbonData :: Spark Common .................. SUCCESS [ 13.989 s] [INFO] Apache CarbonData :: Spark ......................... SUCCESS [02:33 min] [INFO] Apache CarbonData :: Spark Common Test ............. SUCCESS [05:15 min] [INFO] Apache CarbonData :: Assembly ...................... SUCCESS [ 1.683 s] [INFO] Apache CarbonData :: Spark Examples ................ SUCCESS [ 5.123 s] [INFO] ------------------------------------------------------------------------ [INFO] BUILD SUCCESS [INFO] ------------------------------------------------------------------------ [INFO] Total time: 10:58 min [INFO] Finished at: 2017-06-16T17:24:23-07:00 [INFO] Final Memory: 85M/1078M [INFO] ------------------------------------------------------------------------ - Made sure write path class invocation happens correctly with with spark structured streaming (2.1) - Made sure write path execution work flow with structured streaming(2.1) using both socket and file sources - Any additional information to help reviewers in testing this change. - structured streaming now accepts carbondata as a file format in addition to data source - temporary functionality writes out data in text format instead of carbondata format, will be replaced with new functionality additions. You can merge this pull request into a Git repository by running: $ git pull https://github.com/aniketadnaik/carbondataStreamIngest streamIngest-1173 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/carbondata/pull/1064.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #1064 ---- commit 5403972898dc0d62471296476d6d5603dccbda10 Author: Aniket Adnaik <[hidden email]> Date: 2017-06-15T18:57:43Z [CARBONDATA-1173] Streaming Ingest - write path framework implementation commit 5270952a43b5e373957682dd75edd15048cb5aa3 Author: Aniket Adnaik <[hidden email]> Date: 2017-06-15T19:11:48Z [CARBONDATA-1173] Streaming Ingest - write path framework implementation commit ac3f1bbbee9c33472726afe10bac73a17a607cce Author: Aniket Adnaik <[hidden email]> Date: 2017-06-16T18:04:19Z [CARBONDATA-1173] Streaming Ingest - write path framework implementation commit 4fc68b6c7a986ceda2c02713884d9293d29961fc Author: Aniket Adnaik <[hidden email]> Date: 2017-06-19T22:19:21Z [CARBONDATA-1173] Streaming Ingest - write path framework implementation ---- --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at [hidden email] or file a JIRA ticket with INFRA. --- |
Github user asfgit commented on the issue:
https://github.com/apache/carbondata/pull/1064 Can one of the admins verify this patch? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at [hidden email] or file a JIRA ticket with INFRA. --- |
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/1064 Can one of the admins verify this patch? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at [hidden email] or file a JIRA ticket with INFRA. --- |
In reply to this post by qiuchenjian-2
Github user asfgit commented on the issue:
https://github.com/apache/carbondata/pull/1064 Refer to this link for build results (access rights to CI server needed): https://builds.apache.org/job/carbondata-pr-spark-1.6/497/ --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at [hidden email] or file a JIRA ticket with INFRA. --- |
In reply to this post by qiuchenjian-2
Github user jackylk commented on the issue:
https://github.com/apache/carbondata/pull/1064 @aniketadnaik I have push latest master to streaming_ingest branch, you may need to rebase --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at [hidden email] or file a JIRA ticket with INFRA. --- |
In reply to this post by qiuchenjian-2
Github user asfgit commented on the issue:
https://github.com/apache/carbondata/pull/1064 Refer to this link for build results (access rights to CI server needed): https://builds.apache.org/job/carbondata-pr-spark-1.6/524/ --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at [hidden email] or file a JIRA ticket with INFRA. --- |
In reply to this post by qiuchenjian-2
Github user aniketadnaik commented on the issue:
https://github.com/apache/carbondata/pull/1064 @jackylk - Thanks for rebase on streaming_ingest branch . I have rebased and tested my branch streamIngest-1173 in aniketadnaik/carbondataStreamIngest repo. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at [hidden email] or file a JIRA ticket with INFRA. --- |
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/1064 Can one of the admins verify this patch? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at [hidden email] or file a JIRA ticket with INFRA. --- |
In reply to this post by qiuchenjian-2
Github user aniketadnaik commented on the issue:
https://github.com/apache/carbondata/pull/1064 Can someone review the changes ? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at [hidden email] or file a JIRA ticket with INFRA. --- |
In reply to this post by qiuchenjian-2
Github user jackylk commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/1064#discussion_r124167312 --- Diff: core/src/main/java/org/apache/carbondata/core/streaming/CarbonStreamingCommitInfo.java --- @@ -0,0 +1,109 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.carbondata.core.streaming; + +/* + * Commit info for streaming writes + * The commit info can be used to recover valid offset in the file + * in the case of write failure. + */ + --- End diff -- remove empty line. and use `/**` instead of `/*` --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at [hidden email] or file a JIRA ticket with INFRA. --- |
In reply to this post by qiuchenjian-2
Github user jackylk commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/1064#discussion_r124167548 --- Diff: core/src/main/java/org/apache/carbondata/core/streaming/CarbonStreamingMetaStoreImpl.java --- @@ -0,0 +1,55 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.carbondata.core.streaming; + +import java.io.IOException; + +/* JSON format can be used to store the metadata */ + --- End diff -- remove empty line --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at [hidden email] or file a JIRA ticket with INFRA. --- |
In reply to this post by qiuchenjian-2
Github user jackylk commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/1064#discussion_r124168000 --- Diff: hadoop/src/main/java/org/apache/carbondata/hadoop/streaming/CarbonStreamingOutputFormat.java --- @@ -0,0 +1,70 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.carbondata.hadoop.streaming; + +import java.io.IOException; + +import org.apache.carbondata.core.streaming.CarbonStreamingConstants; +import org.apache.carbondata.processing.csvload.CSVInputFormat; + +import org.apache.hadoop.conf.Configuration; +import org.apache.hadoop.mapreduce.TaskAttemptContext; +import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat; + + +/** + * Output format to write streaming data to carbondata file + * + * @param <V> - type of record + */ +public class CarbonStreamingOutputFormat<K, V> extends FileOutputFormat<K, V> { + + public static long getBlockSize(Configuration conf) { + --- End diff -- please remove all unnecessary empty lines within function --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at [hidden email] or file a JIRA ticket with INFRA. --- |
In reply to this post by qiuchenjian-2
Github user jackylk commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/1064#discussion_r124168146 --- Diff: core/src/main/java/org/apache/carbondata/core/util/path/CarbonTablePath.java --- @@ -51,6 +51,16 @@ protected static final String INDEX_FILE_EXT = ".carbonindex"; protected static final String DELETE_DELTA_FILE_EXT = ".deletedelta"; + /** + * Streaming ingest related paths + */ + protected static final String STREAM_PREFIX = "Streaming"; + public static final String STREAM_FILE_NAME_EXT = ".carbondata.stream"; --- End diff -- why `public`? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at [hidden email] or file a JIRA ticket with INFRA. --- |
In reply to this post by qiuchenjian-2
Github user chenliang613 commented on the issue:
https://github.com/apache/carbondata/pull/1064 @aniketadnaik can you change the PR title to : [CARBONDATA-1173] Stream ingestion - write path framework --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at [hidden email] or file a JIRA ticket with INFRA. --- |
In reply to this post by qiuchenjian-2
Github user jackylk commented on the issue:
https://github.com/apache/carbondata/pull/1064 only formatting issues, others LGTM --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at [hidden email] or file a JIRA ticket with INFRA. --- |
In reply to this post by qiuchenjian-2
Github user chenliang613 commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/1064#discussion_r124168973 --- Diff: core/src/main/java/org/apache/carbondata/core/streaming/CarbonStreamingMetaStoreImpl.java --- @@ -0,0 +1,55 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.carbondata.core.streaming; + +import java.io.IOException; + +/* JSON format can be used to store the metadata */ --- End diff -- please use the standard comment format : /** * xxxxxxx */ --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at [hidden email] or file a JIRA ticket with INFRA. --- |
In reply to this post by qiuchenjian-2
Github user chenliang613 commented on the issue:
https://github.com/apache/carbondata/pull/1064 please update some comment format. everything LGTM. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at [hidden email] or file a JIRA ticket with INFRA. --- |
In reply to this post by qiuchenjian-2
Github user asfgit commented on the issue:
https://github.com/apache/carbondata/pull/1064 Refer to this link for build results (access rights to CI server needed): https://builds.apache.org/job/carbondata-pr-spark-1.6/718/ --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at [hidden email] or file a JIRA ticket with INFRA. --- |
In reply to this post by qiuchenjian-2
Github user aniketadnaik commented on the issue:
https://github.com/apache/carbondata/pull/1064 @jackylk, @chenliang613 - All review comments are addressed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at [hidden email] or file a JIRA ticket with INFRA. --- |
In reply to this post by qiuchenjian-2
Github user aniketadnaik commented on the issue:
https://github.com/apache/carbondata/pull/1064 Can someone please merge this PR to "streaming_ingest" branch? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at [hidden email] or file a JIRA ticket with INFRA. --- |
Free forum by Nabble | Edit this page |