[GitHub] carbondata pull request #1147: [WIP][CARBONDATA-1277] Dictionary generation ...

qiuchenjian-2
GitHub user manishgupta88 opened a pull request:

    https://github.com/apache/carbondata/pull/1147

    [WIP][CARBONDATA-1277] Dictionary generation failure if there is failure in closing output stream in HDFS

    Analysis: If closing the output stream of a dictionary file in HDFS fails, dictionary generation fails during the next data load, update, or insert into operation. This is because the dictionary file is opened in append mode, and when we try to get the output stream for that file, HDFS throws an exception saying that the lease is already held by another client.
   
    Fix: If the exception is caused by a lease failure, recover the lease programmatically from CarbonData code.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/manishgupta88/carbondata hdfs_lease_recovery_exception

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/carbondata/pull/1147.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1147
   
----
commit 6ffcf18068b914a86bb83241300c7d8e7ba44c07
Author: manishgupta88 <[hidden email]>
Date:   2017-07-08T10:16:25Z

    Problem: Dictionary generation failure if there is failure in closing output stream in HDFS
   
    Analysis: If closing the output stream of a dictionary file in HDFS fails, dictionary generation fails during the next data load, update, or insert into operation. This is because the dictionary file is opened in append mode, and when we try to get the output stream for that file, HDFS throws an exception saying that the lease is already held by another client.
   
    Fix: If the exception is caused by a lease failure, recover the lease programmatically from CarbonData code.

----
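A minimal Java sketch of the fix described above, under stated assumptions and not the PR's actual code: if opening the dictionary file for append fails with a lease error, recover the lease and retry the append once. `DictionaryAppendSketch`, `AppendOpener`, and `LeaseRecoverer` are hypothetical names; the two interfaces stand in for the HDFS append call and `DistributedFileSystem.recoverLease(Path)`, and the `"Failed to APPEND_FILE"` check copies the message test used in the PR diff.

```java
import java.io.IOException;

/**
 * Hypothetical sketch of the "recover the lease, then retry the append" flow.
 * AppendOpener and LeaseRecoverer stand in for opening the dictionary file in
 * append mode and for DistributedFileSystem.recoverLease(Path).
 */
final class DictionaryAppendSketch {

  /** Hypothetical: opens the file at filePath in append mode. */
  interface AppendOpener<T> {
    T open(String filePath) throws IOException;
  }

  /** Hypothetical: asks the file system to release the stale lease on filePath. */
  interface LeaseRecoverer {
    boolean recoverLease(String filePath) throws IOException;
  }

  /** Mirrors the message check in the PR diff. */
  static boolean isLeaseRecoveryException(String message) {
    return message != null && message.contains("Failed to APPEND_FILE");
  }

  static <T> T openForAppend(String filePath, AppendOpener<T> opener,
      LeaseRecoverer recoverer) throws IOException {
    try {
      return opener.open(filePath);
    } catch (IOException e) {
      // Only lease failures are handled here; any other I/O error propagates unchanged.
      if (!isLeaseRecoveryException(e.getMessage())) {
        throw e;
      }
      if (!recoverer.recoverLease(filePath)) {
        throw new IOException("Could not recover lease on " + filePath, e);
      }
      // The previous writer's lease has been released, so retry the append once.
      return opener.open(filePath);
    }
  }

  private DictionaryAppendSketch() {}
}
```

Keeping the message check separate from the recovery step means unrelated I/O errors (for example, a quota failure) are rethrown instead of triggering lease recovery.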


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [hidden email] or file a JIRA ticket
with INFRA.
---
[GitHub] carbondata issue #1147: [WIP][CARBONDATA-1277] Dictionary generation failure...

qiuchenjian-2
Github user asfgit commented on the issue:

    https://github.com/apache/carbondata/pull/1147
 
    Can one of the admins verify this patch?


[GitHub] carbondata issue #1147: [WIP][CARBONDATA-1277] Dictionary generation failure...

qiuchenjian-2
Github user asfgit commented on the issue:

    https://github.com/apache/carbondata/pull/1147
 
    Can one of the admins verify this patch?


[GitHub] carbondata issue #1147: [WIP][CARBONDATA-1277] Dictionary generation failure...

qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1147
 
    Build Success with Spark 1.6, Please check CI http://144.76.159.231:8080/job/ApacheCarbonPRBuilder/370/



[GitHub] carbondata issue #1147: [WIP][CARBONDATA-1277] Dictionary generation failure...

qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1147
 
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/2958/



[GitHub] carbondata issue #1147: [WIP][CARBONDATA-1277] Dictionary generation failure...

qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1147
 
    Build Success with Spark 1.6, Please check CI http://144.76.159.231:8080/job/ApacheCarbonPRBuilder/371/



[GitHub] carbondata issue #1147: [WIP][CARBONDATA-1277] Dictionary generation failure...

qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1147
 
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/2959/



[GitHub] carbondata pull request #1147: [CARBONDATA-1277] Dictionary generation failu...

qiuchenjian-2
Github user gvramana commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1147#discussion_r126415422
 
    --- Diff: core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java ---
    @@ -1287,6 +1287,12 @@
     
       public static final String CARBON_BAD_RECORDS_ACTION_DEFAULT = "FORCE";
     
    +  @CarbonProperty
    +  public static final String CARBON_LEASE_RECOVERY_RETRY_COUNT =
    +      "carbon.lease.recovery.retry.count";
    +  public static final String CARBON_LEASE_RECOVERY_RETRY_INTERVAL =
    --- End diff --
   
    Add the @CarbonProperty annotation to this constant as well.


[GitHub] carbondata pull request #1147: [CARBONDATA-1277] Dictionary generation failu...

qiuchenjian-2
Github user gvramana commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1147#discussion_r126418899
 
    --- Diff: core/src/main/java/org/apache/carbondata/core/util/path/HDFSUtils.java ---
    @@ -0,0 +1,188 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.carbondata.core.util.path;
    +
    +import java.io.FileNotFoundException;
    +import java.io.IOException;
    +
    +import org.apache.carbondata.common.logging.LogService;
    +import org.apache.carbondata.common.logging.LogServiceFactory;
    +import org.apache.carbondata.core.constants.CarbonCommonConstants;
    +import org.apache.carbondata.core.datastore.impl.FileFactory;
    +import org.apache.carbondata.core.util.CarbonProperties;
    +
    +import org.apache.hadoop.fs.FileSystem;
    +import org.apache.hadoop.fs.Path;
    +import org.apache.hadoop.hdfs.DistributedFileSystem;
    +import org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException;
    +
    +/**
    + * Implementation for HDFS utility methods
    + */
    +public class HDFSUtils {
    --- End diff --
   
    Rename it to hdfsLeaseUtils.


[GitHub] carbondata pull request #1147: [CARBONDATA-1277] Dictionary generation failu...

qiuchenjian-2
Github user gvramana commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1147#discussion_r126419166
 
    --- Diff: core/src/main/java/org/apache/carbondata/core/util/path/HDFSUtils.java ---
    @@ -0,0 +1,188 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.carbondata.core.util.path;
    +
    +import java.io.FileNotFoundException;
    +import java.io.IOException;
    +
    +import org.apache.carbondata.common.logging.LogService;
    +import org.apache.carbondata.common.logging.LogServiceFactory;
    +import org.apache.carbondata.core.constants.CarbonCommonConstants;
    +import org.apache.carbondata.core.datastore.impl.FileFactory;
    +import org.apache.carbondata.core.util.CarbonProperties;
    +
    +import org.apache.hadoop.fs.FileSystem;
    +import org.apache.hadoop.fs.Path;
    +import org.apache.hadoop.hdfs.DistributedFileSystem;
    +import org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException;
    +
    +/**
    + * Implementation for HDFS utility methods
    + */
    +public class HDFSUtils {
    +
    +  private static final int CARBON_LEASE_RECOVERY_RETRY_COUNT_MIN = 1;
    +  private static final int CARBON_LEASE_RECOVERY_RETRY_COUNT_MAX = 5;
    --- End diff --
   
    Make the maximum 50.
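A minimal Java sketch of how the retry settings could be range-checked, using the minimum, default, and interval bounds from the constants quoted in the diff, with the maximum retry count raised to 50 as suggested. Falling back to the default for a missing, non-numeric, or out-of-range value is an assumption of this sketch; the PR's own `getLeaseRecoveryRetryCount()` and `getLeaseRecoveryRetryInterval()` bodies are not shown in this thread, and `LeaseRecoveryConfigSketch` is a hypothetical name.

```java
/**
 * Hypothetical sketch of range-checked parsing for the lease-recovery settings.
 * Invalid values fall back to the default (an assumption, not the PR's confirmed behavior).
 */
final class LeaseRecoveryConfigSketch {
  static final int RETRY_COUNT_MIN = 1;
  static final int RETRY_COUNT_MAX = 50; // raised from 5 as suggested in review
  static final String RETRY_COUNT_DEFAULT = "3";
  static final int RETRY_INTERVAL_MIN = 100;
  static final int RETRY_INTERVAL_MAX = 10000;
  static final String RETRY_INTERVAL_DEFAULT = "1000";

  /** Returns the configured value if it parses and lies in [min, max], else the default. */
  static int validatedInt(String configured, int min, int max, String defaultValue) {
    int fallback = Integer.parseInt(defaultValue);
    if (configured == null) {
      return fallback;
    }
    try {
      int value = Integer.parseInt(configured.trim());
      return (value >= min && value <= max) ? value : fallback;
    } catch (NumberFormatException e) {
      return fallback;
    }
  }

  static int retryCount(String configured) {
    return validatedInt(configured, RETRY_COUNT_MIN, RETRY_COUNT_MAX, RETRY_COUNT_DEFAULT);
  }

  static int retryIntervalMillis(String configured) {
    return validatedInt(configured, RETRY_INTERVAL_MIN, RETRY_INTERVAL_MAX,
        RETRY_INTERVAL_DEFAULT);
  }

  private LeaseRecoveryConfigSketch() {}
}
```

Silently falling back keeps a misconfigured value from failing a data load; logging the rejected value would make the fallback visible to operators.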


[GitHub] carbondata pull request #1147: [CARBONDATA-1277] Dictionary generation failu...

qiuchenjian-2
Github user gvramana commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1147#discussion_r126420489
 
    --- Diff: core/src/main/java/org/apache/carbondata/core/util/path/HDFSUtils.java ---
    @@ -0,0 +1,188 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.carbondata.core.util.path;
    +
    +import java.io.FileNotFoundException;
    +import java.io.IOException;
    +
    +import org.apache.carbondata.common.logging.LogService;
    +import org.apache.carbondata.common.logging.LogServiceFactory;
    +import org.apache.carbondata.core.constants.CarbonCommonConstants;
    +import org.apache.carbondata.core.datastore.impl.FileFactory;
    +import org.apache.carbondata.core.util.CarbonProperties;
    +
    +import org.apache.hadoop.fs.FileSystem;
    +import org.apache.hadoop.fs.Path;
    +import org.apache.hadoop.hdfs.DistributedFileSystem;
    +import org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException;
    +
    +/**
    + * Implementation for HDFS utility methods
    + */
    +public class HDFSUtils {
    +
    +  private static final int CARBON_LEASE_RECOVERY_RETRY_COUNT_MIN = 1;
    +  private static final int CARBON_LEASE_RECOVERY_RETRY_COUNT_MAX = 5;
    +  private static final String CARBON_LEASE_RECOVERY_RETRY_COUNT_DEFAULT = "3";
    +  private static final int CARBON_LEASE_RECOVERY_RETRY_INTERVAL_MIN = 100;
    +  private static final int CARBON_LEASE_RECOVERY_RETRY_INTERVAL_MAX = 10000;
    +  private static final String CARBON_LEASE_RECOVERY_RETRY_INTERVAL_DEFAULT = "1000";
    +
    +  /**
    +   * LOGGER
    +   */
    +  private static final LogService LOGGER =
    +      LogServiceFactory.getLogService(HDFSUtils.class.getName());
    +
    +  /**
    +   * This method will validate whether the exception thrown if for lease recovery from HDFS
    +   *
    +   * @param message
    +   * @return
    +   */
    +  public static boolean checkExceptionMessageForLeaseRecovery(String message) {
    +    // depending on the scenario few more cases can be added for validating lease recovery exception
    +    if (null != message && message.contains("Failed to APPEND_FILE")) {
    +      return true;
    +    }
    +    return false;
    +  }
    +
    +  /**
    +   * This method will make attempts to recover lease on a file using the
    +   * distributed file system utility.
    +   *
    +   * @param filePath
    +   * @return
    +   * @throws IOException
    +   */
    +  public static boolean recoverFileLease(String filePath) throws IOException {
    +    LOGGER.info("Trying to recover lease on file: " + filePath);
    +    FileFactory.FileType fileType = FileFactory.getFileType(filePath);
    +    switch (fileType) {
    +      case ALLUXIO:
    +      case HDFS:
    +      case VIEWFS:
    +        DistributedFileSystem dfs = null;
    +        Path path = FileFactory.getPath(filePath);
    +        FileSystem fs = FileFactory.getFileSystem(path);
    +        dfs = (DistributedFileSystem) fs;
    +        int maxAttempts = getLeaseRecoveryRetryCount();
    +        int retryInterval = getLeaseRecoveryRetryInterval();
    +        boolean leaseRecovered = false;
    +        IOException ioException = null;
    +        for (int retryCount = 1; retryCount <= maxAttempts; retryCount++) {
    +          try {
    +            leaseRecovered = dfs.recoverLease(path);
    --- End diff --
   
    Check the viewfs lease recovery mechanism.
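The review point matters because Hadoop's `ViewFileSystem` (the `viewfs://` scheme) extends `FileSystem` directly rather than `DistributedFileSystem`, so the unconditional `(DistributedFileSystem) fs` cast in the quoted code would throw a `ClassCastException` for VIEWFS paths. Below is a minimal Java sketch of the retry loop with a guarded type check. `LeaseRetrySketch` and `LeaseRecoverable` are hypothetical names; `LeaseRecoverable` stands in for `DistributedFileSystem.recoverLease(Path)` so the sketch compiles without Hadoop, and returning `false` for unsupported file systems is an assumption, not the PR's behavior.

```java
import java.io.IOException;

/**
 * Hypothetical sketch of a bounded lease-recovery retry loop with a guarded type check.
 * LeaseRecoverable stands in for DistributedFileSystem so no Hadoop dependency is needed.
 */
final class LeaseRetrySketch {

  /** Hypothetical: a file system that supports lease recovery. */
  interface LeaseRecoverable {
    boolean recoverLease(String path) throws IOException;
  }

  /**
   * Returns true as soon as recoverLease reports success. Returns false if the file
   * system cannot recover leases or no attempt succeeded; if the last attempt threw,
   * that IOException is rethrown instead.
   */
  static boolean recoverWithRetries(Object fileSystem, String path, int maxAttempts,
      long retryIntervalMillis) throws IOException, InterruptedException {
    // Check the type instead of casting blindly: not every scheme routed here is lease-capable.
    if (!(fileSystem instanceof LeaseRecoverable)) {
      return false;
    }
    LeaseRecoverable fs = (LeaseRecoverable) fileSystem;
    IOException lastFailure = null;
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      try {
        if (fs.recoverLease(path)) {
          return true;
        }
        lastFailure = null;
      } catch (IOException e) {
        lastFailure = e;
      }
      // Wait between attempts, but not after the final one.
      if (attempt < maxAttempts) {
        Thread.sleep(retryIntervalMillis);
      }
    }
    if (lastFailure != null) {
      throw lastFailure;
    }
    return false;
  }

  private LeaseRetrySketch() {}
}
```

Retrying matters because HDFS lease recovery can complete asynchronously, so an early `recoverLease` call may return `false` before a later one succeeds.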


[GitHub] carbondata issue #1147: [CARBONDATA-1277] Dictionary generation failure if t...

qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1147
 
    Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/2999/



[GitHub] carbondata issue #1147: [CARBONDATA-1277] Dictionary generation failure if t...

qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1147
 
    Build Success with Spark 1.6, Please check CI http://144.76.159.231:8080/job/ApacheCarbonPRBuilder/410/



[GitHub] carbondata issue #1147: [CARBONDATA-1277] Dictionary generation failure if t...

qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1147
 
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/3001/



[GitHub] carbondata issue #1147: [CARBONDATA-1277] Dictionary generation failure if t...

qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1147
 
    Build Success with Spark 1.6, Please check CI http://144.76.159.231:8080/job/ApacheCarbonPRBuilder/412/



[GitHub] carbondata issue #1147: [CARBONDATA-1277] Dictionary generation failure if t...

qiuchenjian-2
Github user gvramana commented on the issue:

    https://github.com/apache/carbondata/pull/1147
 
    LGTM


[GitHub] carbondata pull request #1147: [CARBONDATA-1277] Dictionary generation failu...

qiuchenjian-2
Github user asfgit closed the pull request at:

    https://github.com/apache/carbondata/pull/1147

