From: Kleber Sacilotto de Souza Date: Mon, 16 Jan 2012 21:30:25 +0000 (-0200) Subject: [SCSI] ipr: fix eeh recovery for 64-bit adapters X-Git-Tag: v3.3-rc5~5^2~15 X-Git-Url: https://openfabrics.org/gitweb/?a=commitdiff_plain;h=a92fa25c63a788758bd52e9123504d133210c8b7;p=~emulex%2Finfiniband.git [SCSI] ipr: fix eeh recovery for 64-bit adapters In some scenarios, an EEH error can take a long time to be detected, since the driver issues an MMIO read only after a device reset command times out and we try to reset the adapter. This patch adds some code in ipr_cancel_op() to read a hardware register so we detect the error earlier in case the op is being aborted because of a timeout caused by a frozen adapter slot. Another problem in such scenarios is that in __ipr_eh_host_reset() we change the dump state flag from WAIT_FOR_DUMP to GET_DUMP, and the flag is later changed from GET_DUMP to READ_DUMP in ipr_reset_restore_cfg_space(). However, if when __ipr_eh_host_reset() is called by the SCSI error handling the function ipr_reset_restore_cfg_space() has already been called by the PCI EEH code, we end up with the flag in an inconsistent state. This patch also prevents this problem. Signed-off-by: Kleber Sacilotto de Souza Acked-by: Brian King Signed-off-by: James Bottomley --- diff --git a/drivers/scsi/ipr.c b/drivers/scsi/ipr.c index 67b169b7a5b..b538f0883fd 100644 --- a/drivers/scsi/ipr.c +++ b/drivers/scsi/ipr.c @@ -4613,11 +4613,13 @@ static int __ipr_eh_host_reset(struct scsi_cmnd * scsi_cmd) ENTER; ioa_cfg = (struct ipr_ioa_cfg *) scsi_cmd->device->host->hostdata; - dev_err(&ioa_cfg->pdev->dev, - "Adapter being reset as a result of error recovery.\n"); + if (!ioa_cfg->in_reset_reload) { + dev_err(&ioa_cfg->pdev->dev, + "Adapter being reset as a result of error recovery.\n"); - if (WAIT_FOR_DUMP == ioa_cfg->sdt_state) - ioa_cfg->sdt_state = GET_DUMP; + if (WAIT_FOR_DUMP == ioa_cfg->sdt_state) + ioa_cfg->sdt_state = GET_DUMP; + } rc = ipr_reset_reload(ioa_cfg, IPR_SHUTDOWN_ABBREV); @@ -4907,7 +4909,7 @@ static int ipr_cancel_op(struct scsi_cmnd * scsi_cmd) struct ipr_ioa_cfg *ioa_cfg; struct ipr_resource_entry *res; struct ipr_cmd_pkt *cmd_pkt; - u32 ioasc; + u32 ioasc, int_reg; int op_found = 0; ENTER; @@ -4920,7 +4922,17 @@ static int ipr_cancel_op(struct scsi_cmnd * scsi_cmd) */ if (ioa_cfg->in_reset_reload || ioa_cfg->ioa_is_dead) return FAILED; - if (!res || !ipr_is_gscsi(res)) + if (!res) + return FAILED; + + /* + * If we are aborting a timed out op, chances are that the timeout was caused + * by a still not detected EEH error. In such cases, reading a register will + * trigger the EEH recovery infrastructure. + */ + int_reg = readl(ioa_cfg->regs.sense_interrupt_reg); + + if (!ipr_is_gscsi(res)) return FAILED; list_for_each_entry(ipr_cmd, &ioa_cfg->pending_q, queue) {