[GitHub] carbondata pull request #1064: [CARBONDATA-<1173>] Stream ingestion - write ...

classic Classic list List threaded Threaded
23 messages Options
12
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #1064: [CARBONDATA-<1173>] Stream ingestion - write ...

qiuchenjian-2
GitHub user aniketadnaik opened a pull request:

    https://github.com/apache/carbondata/pull/1064

    [CARBONDATA-<1173>] Stream ingestion - write path framework

    Description:
    This is a over all streaming write path framework implementation. It is mainly targeted for "**streaming_ingest**" branch.  This framework implementation will help other developers to leverage a common code path for developing other pieces of the streaming ingestion feature.
    - Whether new unit test cases have been added or why no new tests are required?
        - This is a just framework implementation, unit tests will be added with individual functionality commits        - What manual testing you have done?
       - mvn clean verify
        [INFO] ------------------------------------------------------------------------
        [INFO] Reactor Summary:
        [INFO]
        [INFO] Apache CarbonData :: Parent ........................ SUCCESS [  1.533 s]
        [INFO] Apache CarbonData :: Common ........................ SUCCESS [  1.335 s]
        [INFO] Apache CarbonData :: Core .......................... SUCCESS [02:34 min]
        [INFO] Apache CarbonData :: Processing .................... SUCCESS [  5.158 s]
        [INFO] Apache CarbonData :: Hadoop ........................ SUCCESS [  5.208 s]
        [INFO] Apache CarbonData :: Spark Common .................. SUCCESS [ 13.989 s]
        [INFO] Apache CarbonData :: Spark ......................... SUCCESS [02:33 min]
        [INFO] Apache CarbonData :: Spark Common Test ............. SUCCESS [05:15 min]
        [INFO] Apache CarbonData :: Assembly ...................... SUCCESS [  1.683 s]
        [INFO] Apache CarbonData :: Spark Examples ................ SUCCESS [  5.123 s]
        [INFO] ------------------------------------------------------------------------
        [INFO] BUILD SUCCESS
        [INFO] ------------------------------------------------------------------------
        [INFO] Total time: 10:58 min
        [INFO] Finished at: 2017-06-16T17:24:23-07:00
        [INFO] Final Memory: 85M/1078M
        [INFO] ------------------------------------------------------------------------
   
       - Made sure write path class invocation happens correctly with with spark structured streaming (2.1)
       - Made sure write path execution work flow with structured streaming(2.1) using both socket and file
         sources
   
    - Any additional information to help reviewers in testing this change.
       - structured streaming now accepts carbondata as a file format in addition to data source
       - temporary functionality writes out data in text format instead of carbondata format, will be replaced with new functionality additions.
     

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/aniketadnaik/carbondataStreamIngest streamIngest-1173

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/carbondata/pull/1064.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1064
   
----
commit 5403972898dc0d62471296476d6d5603dccbda10
Author: Aniket Adnaik <[hidden email]>
Date:   2017-06-15T18:57:43Z

    [CARBONDATA-1173] Streaming Ingest - write path framework implementation

commit 5270952a43b5e373957682dd75edd15048cb5aa3
Author: Aniket Adnaik <[hidden email]>
Date:   2017-06-15T19:11:48Z

    [CARBONDATA-1173] Streaming Ingest - write path framework implementation

commit ac3f1bbbee9c33472726afe10bac73a17a607cce
Author: Aniket Adnaik <[hidden email]>
Date:   2017-06-16T18:04:19Z

    [CARBONDATA-1173] Streaming Ingest - write path framework implementation

commit 4fc68b6c7a986ceda2c02713884d9293d29961fc
Author: Aniket Adnaik <[hidden email]>
Date:   2017-06-19T22:19:21Z

    [CARBONDATA-1173] Streaming Ingest - write path framework implementation

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [hidden email] or file a JIRA ticket
with INFRA.
---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #1064: [CARBONDATA-<1173>] Stream ingestion - write path fr...

qiuchenjian-2
Github user asfgit commented on the issue:

    https://github.com/apache/carbondata/pull/1064
 
    Can one of the admins verify this patch?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [hidden email] or file a JIRA ticket
with INFRA.
---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #1064: [CARBONDATA-<1173>] Stream ingestion - write path fr...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1064
 
    Can one of the admins verify this patch?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [hidden email] or file a JIRA ticket
with INFRA.
---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #1064: [CARBONDATA-<1173>] Stream ingestion - write path fr...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user asfgit commented on the issue:

    https://github.com/apache/carbondata/pull/1064
 
   
    Refer to this link for build results (access rights to CI server needed):
    https://builds.apache.org/job/carbondata-pr-spark-1.6/497/



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [hidden email] or file a JIRA ticket
with INFRA.
---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #1064: [CARBONDATA-<1173>] Stream ingestion - write path fr...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user jackylk commented on the issue:

    https://github.com/apache/carbondata/pull/1064
 
    @aniketadnaik I have push latest master to streaming_ingest branch, you may need to rebase


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [hidden email] or file a JIRA ticket
with INFRA.
---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #1064: [CARBONDATA-<1173>] Stream ingestion - write path fr...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user asfgit commented on the issue:

    https://github.com/apache/carbondata/pull/1064
 
   
    Refer to this link for build results (access rights to CI server needed):
    https://builds.apache.org/job/carbondata-pr-spark-1.6/524/



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [hidden email] or file a JIRA ticket
with INFRA.
---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #1064: [CARBONDATA-<1173>] Stream ingestion - write path fr...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user aniketadnaik commented on the issue:

    https://github.com/apache/carbondata/pull/1064
 
    @jackylk - Thanks for rebase on streaming_ingest branch . I have rebased and tested my branch streamIngest-1173 in aniketadnaik/carbondataStreamIngest repo.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [hidden email] or file a JIRA ticket
with INFRA.
---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #1064: [CARBONDATA-<1173>] Stream ingestion - write path fr...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1064
 
    Can one of the admins verify this patch?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [hidden email] or file a JIRA ticket
with INFRA.
---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #1064: [CARBONDATA-<1173>] Stream ingestion - write path fr...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user aniketadnaik commented on the issue:

    https://github.com/apache/carbondata/pull/1064
 
    Can someone review the changes ?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [hidden email] or file a JIRA ticket
with INFRA.
---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #1064: [CARBONDATA-<1173>] Stream ingestion - write ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user jackylk commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1064#discussion_r124167312
 
    --- Diff: core/src/main/java/org/apache/carbondata/core/streaming/CarbonStreamingCommitInfo.java ---
    @@ -0,0 +1,109 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.carbondata.core.streaming;
    +
    +/*
    + * Commit info for streaming writes
    + * The commit info can be used to recover valid offset in the file
    + * in the case of write failure.
    + */
    +
    --- End diff --
   
    remove empty line.
    and use `/**` instead of `/*`


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [hidden email] or file a JIRA ticket
with INFRA.
---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #1064: [CARBONDATA-<1173>] Stream ingestion - write ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user jackylk commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1064#discussion_r124167548
 
    --- Diff: core/src/main/java/org/apache/carbondata/core/streaming/CarbonStreamingMetaStoreImpl.java ---
    @@ -0,0 +1,55 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.carbondata.core.streaming;
    +
    +import java.io.IOException;
    +
    +/* JSON format can be used to store the metadata */
    +
    --- End diff --
   
    remove empty line


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [hidden email] or file a JIRA ticket
with INFRA.
---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #1064: [CARBONDATA-<1173>] Stream ingestion - write ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user jackylk commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1064#discussion_r124168000
 
    --- Diff: hadoop/src/main/java/org/apache/carbondata/hadoop/streaming/CarbonStreamingOutputFormat.java ---
    @@ -0,0 +1,70 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.carbondata.hadoop.streaming;
    +
    +import java.io.IOException;
    +
    +import org.apache.carbondata.core.streaming.CarbonStreamingConstants;
    +import org.apache.carbondata.processing.csvload.CSVInputFormat;
    +
    +import org.apache.hadoop.conf.Configuration;
    +import org.apache.hadoop.mapreduce.TaskAttemptContext;
    +import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
    +
    +
    +/**
    + * Output format to write streaming data to carbondata file
    + *
    + * @param <V> - type of record
    + */
    +public class CarbonStreamingOutputFormat<K, V> extends FileOutputFormat<K, V> {
    +
    +  public static long getBlockSize(Configuration conf) {
    +
    --- End diff --
   
    please remove all unnecessary empty lines within function


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [hidden email] or file a JIRA ticket
with INFRA.
---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #1064: [CARBONDATA-<1173>] Stream ingestion - write ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user jackylk commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1064#discussion_r124168146
 
    --- Diff: core/src/main/java/org/apache/carbondata/core/util/path/CarbonTablePath.java ---
    @@ -51,6 +51,16 @@
       protected static final String INDEX_FILE_EXT = ".carbonindex";
       protected static final String DELETE_DELTA_FILE_EXT = ".deletedelta";
     
    +  /**
    +   * Streaming ingest related paths
    +   */
    +  protected static final String STREAM_PREFIX = "Streaming";
    +  public static final String STREAM_FILE_NAME_EXT = ".carbondata.stream";
    --- End diff --
   
    why `public`?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [hidden email] or file a JIRA ticket
with INFRA.
---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #1064: [CARBONDATA-<1173>] Stream ingestion - write path fr...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user chenliang613 commented on the issue:

    https://github.com/apache/carbondata/pull/1064
 
    @aniketadnaik can you change the PR title to :  [CARBONDATA-1173] Stream ingestion - write path framework


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [hidden email] or file a JIRA ticket
with INFRA.
---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #1064: [CARBONDATA-<1173>] Stream ingestion - write path fr...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user jackylk commented on the issue:

    https://github.com/apache/carbondata/pull/1064
 
    only formatting issues, others LGTM


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [hidden email] or file a JIRA ticket
with INFRA.
---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #1064: [CARBONDATA-<1173>] Stream ingestion - write ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user chenliang613 commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1064#discussion_r124168973
 
    --- Diff: core/src/main/java/org/apache/carbondata/core/streaming/CarbonStreamingMetaStoreImpl.java ---
    @@ -0,0 +1,55 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.carbondata.core.streaming;
    +
    +import java.io.IOException;
    +
    +/* JSON format can be used to store the metadata */
    --- End diff --
   
    please use the standard comment format :
    /**
     * xxxxxxx
     */


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [hidden email] or file a JIRA ticket
with INFRA.
---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #1064: [CARBONDATA-<1173>] Stream ingestion - write path fr...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user chenliang613 commented on the issue:

    https://github.com/apache/carbondata/pull/1064
 
    please update some comment format. everything LGTM.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [hidden email] or file a JIRA ticket
with INFRA.
---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #1064: [CARBONDATA-<1173>] Stream ingestion - write path fr...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user asfgit commented on the issue:

    https://github.com/apache/carbondata/pull/1064
 
   
    Refer to this link for build results (access rights to CI server needed):
    https://builds.apache.org/job/carbondata-pr-spark-1.6/718/



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [hidden email] or file a JIRA ticket
with INFRA.
---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #1064: [CARBONDATA-1173] Stream ingestion - write path fram...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user aniketadnaik commented on the issue:

    https://github.com/apache/carbondata/pull/1064
 
    @jackylk, @chenliang613 - All review comments are addressed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [hidden email] or file a JIRA ticket
with INFRA.
---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #1064: [CARBONDATA-1173] Stream ingestion - write path fram...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user aniketadnaik commented on the issue:

    https://github.com/apache/carbondata/pull/1064
 
    Can someone please merge this PR to "streaming_ingest" branch?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [hidden email] or file a JIRA ticket
with INFRA.
---
12