|
|
It seems that very few low-level driver maintainers have made the effort to convert over and use the new error handling code. I would like to take this opportunity to explain the many advantages of switching over. As you all know, the old error handling code is quite imperfect. I see all kinds of bug reports where people get an infinite stream of SCSI resets, and the system is more or less hopelessly wedged. This is a common symptom, and given the way the old code worked, wouldn't be easy to fix. The point of this message is to try and sell the concept, and encourage driver maintainers to make the effort to switch over. It isn't that hard, and the benefits are definitely worthwhile. A second objective of this note is to explain in detail what a driver author would need to do. Finally, I should point out that this stuff isn't set in stone. If anyone has objections/comments, please send them on. In particular, some people have expressed an interest in doing their own timeout handling (i.e. a boolean in the host template which would tell the midlevel not to start a timer before the queuecommand interface is called). I don't want to make the change unless someone seriously intends to use it, as I would like to choose an implementation that suits the needs of the people who would want to use it. A point about terminology - I still tend to refer to mid-layer, upper-layer, and lower-layer, mainly out of force of habit. With the new queuing code, the upper level drivers (disk, cdrom, etc) aren't really so much a layer any more, but I haven't decided what to call it instead. Similarly, the mid-level isn't really so much a level, but represents the scsi core, I suppose.
Architecturally there are a couple of really interesting points about the new error handling code.
Converting a driver to use the new error handling code is fairly easy, and consists of several steps. These are summarized here:
Ensure that sense data is automatically requested on a CHECK_CONDITION The idea here is that sometimes when we look at the results of a command execution, that the command might have returned a CHECK_CONDITION, meaning that something went wrong. The sense data has details about what went wrong. If the driver doesn't get this automatically, then the error handler is forced to wake up and ask for it explicitly. While not a strict requirement that you get the sense data, it tends to streamline things quite a bit, as the error handler thread is only involved when there is something that really needs attention. There are actually two separate ways which a driver author could to do make use of the new error handling code. They can be roughly called the "easy" and the "hard" way. The easy way is designed to make it easy for people to convert from the old style error handling to the new style. The prescription is as follows: Define up to 4 functions in the driver: int (*eh_abort_handler)(Scsi_Cmnd *);
int (*eh_device_reset_handler)(Scsi_Cmnd *);
int (*eh_bus_reset_handler)(Scsi_Cmnd *);
int (*eh_host_reset_handler)(Scsi_Cmnd *);
These things perform the obvious jobs. A couple of points here.
The hard way of doing this involves instead defining a single function: int (*eh_strategy_handler)(struct Scsi_Host *);
In addition to defining the error handling functions, you will need to modify the queuecommand interface to return a meaningful value. A return value of 0 implies that the command was correctly queued. A non-zero return value implies that the host was unable to accept the command (probably because of some resource shortage) - in this case the command is left on the queue for the device. No further commands will be sent to this host until some command that is currently running on the host completes. Note that it is assumed that at least one command can be queued to a host at any given time.
Finally, you need to set the use_new_eh_code flag to 1 in the host template. This essentially tells the mid-layer that the driver is ready to use the new error handling code. This should be a trivial one-line change to your host template - probably just adding a line like: use_new_eh_code: 1, \should be sufficient.
As long as people are working on these things, I have a request to make. I would like people to delete the command() interface if there is also a queuecommand() interface. The command() interface will never be used if there is a queuecommand interface(). At the same time, check for any other dead functions (completion functions that would be used to wake up the thread in the command() interface).
When the SCSI layer was first made SMP safe, locking was added all over the place. The general idea was to hold io_request_lock as long as control was within the SCSI layer. While this made the thing SMP-safe, in the long run it wasn't a great idea, as it led to a lot of latency issues - the problem is that as long as you hold this lock, all other block devices are prevented from doing any work. There is also an architectural problem whereby the low-level drivers are assuming that the mid-level is holding the io_request_lock when the low-level interfaces are called, and this isn't optimal. The fundamental issue here is that a finer granularity of lock needs to be used so as to make the system more responsive. The new queuing code helps with a lot of this - new locks were added to protect specific data structures, and this reduced the need for holding io_request_lock in the mid-upper levels itself. I haven't changed anything WRT locking and low-level drivers, however. Typically there are two major entrypoints that are of significance. The queuecommand() interface for inserting new commands, and the interrupt handler which processes commands that are now done. As I mentioned before, the error handling routines are always run from the context of a single error handler thread. There is the possibility that an interrupt will arrive for a timed out command during error processing, and thus it is also important that you also do the appropriate locking in each one of the error handling strategy routines (assuming that they actually do anything in the first place). Some driver authors have taken it upon themselves to implement their own locking. Thus upon entering the queuecommand entrypoint, they immediately release the io_request_lock, and then grab the actual low-level lock that they implemented. This is mostly the right solution, but there is still some small overhead in locking/unlocking io_request_lock when we don't need to be. My long range goal is to turn things around so that the low-level drivers are completely responsible for their own locking, and make no assumptions about what locks might be held upon entry. In addition, I would like it if none of the low-level drivers touched io_request_lock - each driver should have it's own locks for this purpose. It would be theoretically possible for the mid-layer to use finer grained locks when calling the low-level drivers, but I believe that this is not the correct approach. It is architecturally more correct to instead insist that each low-level driver be responsible for any locking to protect it's own data structures. Ideally I would also like to eliminate all references to io_request_lock from low-level drivers. To accomplish this, I am thinking of adding a flag to the host template "smp_safe". If you do not set this (default case), then io_request_lock is assumed to be held when the mid-level calls into a low-level driver. In addition, it will also assumed that when the low-level driver calls back to the mid-level to report command completion, that the low-level driver has taken hold of this lock. If you set "smp_safe", then these assumptions are reversed. In particular, it is assumed that the low-level driver does all of it's own locking and thus the mid-level does not grab io_request_lock prior to calling the low-level driver. The low-level driver really shouldn't use io_request_lock at all. A low-level driver can protect it's data structures in a couple of ways. The two main choices are to use a driver specific spinlock, or to use a host specific spinlock. The difference is only apparent and important if there are multiple instances of identical cards on a system. In the case of a driver specific spinlock, there is one spinlock shared by all hosts that are using the driver. The case of a host-specific spinlock would imply that there is one spinlock per instance of host. In the event that you choose to use host specific spinlocks, beware of accessing any data structures shared between hosts. Finally, if you set "smp_safe", there is no lock held for the duration of error recovery. Thus it may be the case that the eh_abort_handler() is called for a command that may have already completed. The error handling routines need to allow for the possibility that the command is no longer running on the host, and in such an event just return FAILED. It is important that the interrupt service routine and the error handler functions each grab the same lock prior to doing work in order to eliminate race conditions. At this point, I haven't coded the actual changes. In terms of the mid-level, I would need to define the flag in the host template, and then fix it so that the io_request_lock isn't grabbed when calling the queuecommand function. In addition, the new error handling code will need to be examined to make sure that it is SMP-safe without holding the lock, and then fix it so that this lock isn't held during error processing. Note that I have no intention of trying to support smp_safe for drivers that use the old error handling code. I don't know when I will get around to trying to do this - it will be after the new queuing code is in and has been shaken down for a few months.
There are a couple of other known problems with SCSI - sometimes it is easy to forget about them as the sands of time pass. I have enumerated these here: scsi_todo.html. This is in addition to the stuff I discussed in future directions.
If you have comments or suggestions, Last updated: 1/3/00. |