   Linux Driver for Mylex DAC960/AcceleRAID/eXtremeRAID PCI RAID Controllers

                        Version 2.2.11 for Linux 2.2.19
                        Version 2.4.11 for Linux 2.4.12

                              PRODUCTION RELEASE

                                11 October 2001

                               Leonard N. Zubkoff
                               Dandelion Digital
                               lnz@dandelion.com

         Copyright 1998-2001 by Leonard N. Zubkoff <lnz@dandelion.com>


                                 INTRODUCTION

Mylex, Inc. designs and manufactures a variety of high performance PCI RAID
controllers. Mylex Corporation is located at 34551 Ardenwood Blvd., Fremont,
California 94555, USA and can be reached at 510.796.6100 or on the World Wide
Web at http://www.mylex.com. Mylex Technical Support can be reached by
electronic mail at mylexsup@us.ibm.com, by voice at 510.608.2400, or by FAX at
510.745.7715. Contact information for offices in Europe and Japan is available
on their Web site.

The latest information on Linux support for DAC960 PCI RAID Controllers, as
well as the most recent release of this driver, will always be available from
my Linux Home Page at URL "http://www.dandelion.com/Linux/". The Linux DAC960
driver supports all current Mylex PCI RAID controllers including the new
eXtremeRAID 2000/3000 and AcceleRAID 352/170/160 models which have an entirely
new firmware interface from the older eXtremeRAID 1100, AcceleRAID 150/200/250,
and DAC960PJ/PG/PU/PD/PL. See below for a complete controller list as well as
minimum firmware version requirements. For simplicity, in most places this
documentation refers to DAC960 generically rather than explicitly listing all
the supported models.

Driver bug reports should be sent via electronic mail to "lnz@dandelion.com".
Please include with the bug report the complete configuration messages reported
by the driver at startup, along with any subsequent system messages relevant to
the controller's operation, and a detailed description of your system's
hardware configuration.
Driver bugs are actually quite rare; if you encounter
problems with disks being marked offline, for example, please contact Mylex
Technical Support as the problem is related to the hardware configuration
rather than the Linux driver.

Please consult the RAID controller documentation for detailed information
regarding installation and configuration of the controllers. This document
primarily provides information specific to the Linux support.


                                DRIVER FEATURES

The DAC960 RAID controllers are supported solely as high performance RAID
controllers, not as interfaces to arbitrary SCSI devices. The Linux DAC960
driver operates at the block device level, the same level as the SCSI and IDE
drivers. Unlike other RAID controllers currently supported on Linux, the
DAC960 driver is not dependent on the SCSI subsystem, and hence avoids all the
complexity and unnecessary code that would be associated with an implementation
as a SCSI driver. The DAC960 driver is designed for as high a performance as
possible with no compromises or extra code for compatibility with lower
performance devices. The DAC960 driver includes extensive error logging and
online configuration management capabilities. Except for initial configuration
of the controller and adding new disk drives, most everything can be handled
from Linux while the system is operational.

The DAC960 driver is architected to support up to 8 controllers per system.
Each DAC960 parallel SCSI controller can support up to 15 disk drives per
channel, for a maximum of 60 drives on a four channel controller; the fibre
channel eXtremeRAID 3000 controller supports up to 125 disk drives per loop for
a total of 250 drives. The drives installed on a controller are divided into
one or more "Drive Groups", and then each Drive Group is subdivided further
into 1 to 32 "Logical Drives".
Each Logical Drive has a specific RAID Level
and caching policy associated with it, and it appears to Linux as a single
block device. Logical Drives are further subdivided into up to 7 partitions
through the normal Linux and PC disk partitioning schemes. Logical Drives are
also known as "System Drives", and Drive Groups are also called "Packs". Both
terms are in use in the Mylex documentation; I have chosen to standardize on
the more generic "Logical Drive" and "Drive Group".

DAC960 RAID disk devices are named in the style of the obsolete Device File
System (DEVFS). The device corresponding to Logical Drive D on Controller C
is referred to as /dev/rd/cCdD, and the partitions are called /dev/rd/cCdDp1
through /dev/rd/cCdDp7. For example, partition 3 of Logical Drive 5 on
Controller 2 is referred to as /dev/rd/c2d5p3. Note that unlike with SCSI
disks the device names will not change in the event of a disk drive failure.
The DAC960 driver is assigned major numbers 48 - 55 with one major number per
controller. The 8 bits of minor number are divided into 5 bits for the Logical
Drive and 3 bits for the partition.


         SUPPORTED DAC960/AcceleRAID/eXtremeRAID PCI RAID CONTROLLERS

The following list comprises the supported DAC960, AcceleRAID, and eXtremeRAID
PCI RAID Controllers as of the date of this document. It is recommended that
anyone purchasing a Mylex PCI RAID Controller not in the following table
contact the author beforehand to verify that it is or will be supported.

eXtremeRAID 3000
            1 Wide Ultra-2/LVD SCSI channel
            2 External Fibre FC-AL channels
            233MHz StrongARM SA 110 Processor
            64 Bit 33MHz PCI (backward compatible with 32 Bit PCI slots)
            32MB/64MB ECC SDRAM Memory

eXtremeRAID 2000
            4 Wide Ultra-160 LVD SCSI channels
            233MHz StrongARM SA 110 Processor
            64 Bit 33MHz PCI (backward compatible with 32 Bit PCI slots)
            32MB/64MB ECC SDRAM Memory

AcceleRAID 352
            2 Wide Ultra-160 LVD SCSI channels
            100MHz Intel i960RN RISC Processor
            64 Bit 33MHz PCI (backward compatible with 32 Bit PCI slots)
            32MB/64MB ECC SDRAM Memory

AcceleRAID 170
            1 Wide Ultra-160 LVD SCSI channel
            100MHz Intel i960RM RISC Processor
            16MB/32MB/64MB ECC SDRAM Memory

AcceleRAID 160 (AcceleRAID 170LP)
            1 Wide Ultra-160 LVD SCSI channel
            100MHz Intel i960RS RISC Processor
            Built in 16MB ECC SDRAM Memory
            PCI Low Profile Form Factor - fit for 2U height

eXtremeRAID 1100 (DAC1164P)
            3 Wide Ultra-2/LVD SCSI channels
            233MHz StrongARM SA 110 Processor
            64 Bit 33MHz PCI (backward compatible with 32 Bit PCI slots)
            16MB/32MB/64MB Parity SDRAM Memory with Battery Backup

AcceleRAID 250 (DAC960PTL1)
            Uses onboard Symbios SCSI chips on certain motherboards
            Also includes one onboard Wide Ultra-2/LVD SCSI Channel
            66MHz Intel i960RD RISC Processor
            4MB/8MB/16MB/32MB/64MB/128MB ECC EDO Memory

AcceleRAID 200 (DAC960PTL0)
            Uses onboard Symbios SCSI chips on certain motherboards
            Includes no onboard SCSI Channels
            66MHz Intel i960RD RISC Processor
            4MB/8MB/16MB/32MB/64MB/128MB ECC EDO Memory

AcceleRAID 150 (DAC960PRL)
            Uses onboard Symbios SCSI chips on certain motherboards
            Also includes one onboard Wide Ultra-2/LVD SCSI Channel
            33MHz Intel i960RP RISC Processor
            4MB Parity EDO Memory

DAC960PJ    1/2/3 Wide Ultra SCSI-3 Channels
            66MHz Intel i960RD RISC Processor
            4MB/8MB/16MB/32MB/64MB/128MB ECC EDO Memory

DAC960PG    1/2/3 Wide Ultra SCSI-3 Channels
            33MHz Intel i960RP RISC Processor
            4MB/8MB ECC EDO Memory

DAC960PU    1/2/3 Wide Ultra SCSI-3 Channels
            Intel i960CF RISC Processor
            4MB/8MB EDRAM or 2MB/4MB/8MB/16MB/32MB DRAM Memory

DAC960PD    1/2/3 Wide Fast SCSI-2 Channels
            Intel i960CF RISC Processor
            4MB/8MB EDRAM or 2MB/4MB/8MB/16MB/32MB DRAM Memory

DAC960PL    1/2/3 Wide Fast SCSI-2 Channels
            Intel i960 RISC Processor
            2MB/4MB/8MB/16MB/32MB DRAM Memory

DAC960P     1/2/3 Wide Fast SCSI-2 Channels
            Intel i960 RISC Processor
            2MB/4MB/8MB/16MB/32MB DRAM Memory

For the eXtremeRAID 2000/3000 and AcceleRAID 352/170/160, firmware version
6.00-01 or above is required.

For the eXtremeRAID 1100, firmware version 5.06-0-52 or above is required.

For the AcceleRAID 250, 200, and 150, firmware version 4.06-0-57 or above is
required.

For the DAC960PJ and DAC960PG, firmware version 4.06-0-00 or above is required.

For the DAC960PU, DAC960PD, DAC960PL, and DAC960P, either firmware version
3.51-0-04 or above is required (for dual Flash ROM controllers), or firmware
version 2.73-0-00 or above is required (for single Flash ROM controllers).

Please note that not all SCSI disk drives are suitable for use with DAC960
controllers, and only particular firmware versions of any given model may
actually function correctly. Similarly, not all motherboards have a BIOS that
properly initializes the AcceleRAID 250, AcceleRAID 200, AcceleRAID 150,
DAC960PJ, and DAC960PG because the Intel i960RD/RP is a multi-function device.
If in doubt, contact Mylex RAID Technical Support (mylexsup@us.ibm.com) to
verify compatibility. Mylex makes available a hard disk compatibility list at
http://www.mylex.com/support/hdcomp/hd-lists.html.


                              DRIVER INSTALLATION

This distribution was prepared for Linux kernel version 2.2.19 or 2.4.12.

To install the DAC960 RAID driver, you may use the following commands,
replacing "/usr/src" with wherever you keep your Linux kernel source tree:

  cd /usr/src
  tar -xvzf DAC960-2.2.11.tar.gz (or DAC960-2.4.11.tar.gz)
  mv README.DAC960 linux/Documentation
  mv DAC960.[ch] linux/drivers/block
  patch -p0 < DAC960.patch (if DAC960.patch is included)
  cd linux
  make config
  make bzImage (or zImage)

Then install "arch/x86/boot/bzImage" or "arch/x86/boot/zImage" as your
standard kernel, run lilo if appropriate, and reboot.

To create the necessary devices in /dev, the "make_rd" script included in
"DAC960-Utilities.tar.gz" from http://www.dandelion.com/Linux/ may be used.
LILO 21 and FDISK v2.9 include DAC960 support; also included in this archive
are patches to LILO 20 and FDISK v2.8 that add DAC960 support, along with
statically linked executables of LILO and FDISK. This modified version of LILO
will allow booting from a DAC960 controller and/or mounting the root file
system from a DAC960.

Red Hat Linux 6.0 and SuSE Linux 6.1 include support for Mylex PCI RAID
controllers. Installing directly onto a DAC960 may be problematic from other
Linux distributions until their installation utilities are updated.


                              INSTALLATION NOTES

Before installing Linux or adding DAC960 logical drives to an existing Linux
system, the controller must first be configured to provide one or more logical
drives using the BIOS Configuration Utility or DACCF. Please note that since
there are only at most 6 usable partitions on each logical drive, systems
requiring more partitions should subdivide a drive group into multiple logical
drives, each of which can have up to 6 usable partitions.
Also, note that with
large disk arrays it is advisable to enable the 8GB BIOS Geometry (255/63)
rather than accepting the default 2GB BIOS Geometry (128/32); failing to do so
will cause the logical drive geometry to have more than 65535 cylinders which
will make it impossible for FDISK to be used properly. The 8GB BIOS Geometry
can be enabled by configuring the DAC960 BIOS, which is accessible via Alt-M
during the BIOS initialization sequence.

For maximum performance and the most efficient E2FSCK performance, it is
recommended that EXT2 file systems be built with a 4KB block size and 16 block
stride to match the DAC960 controller's 64KB default stripe size. The command
"mke2fs -b 4096 -R stride=16 <device>" is appropriate. Unless there will be a
large number of small files on the file systems, it is also beneficial to add
the "-i 16384" option to increase the bytes per inode parameter thereby
reducing the file system metadata. Finally, on systems that will only be run
with Linux 2.2 or later kernels it is beneficial to enable sparse superblocks
with the "-s 1" option.


                       DAC960 ANNOUNCEMENTS MAILING LIST

The DAC960 Announcements Mailing List provides a forum for informing Linux
users of new driver releases and other announcements regarding Linux support
for DAC960 PCI RAID Controllers. To join the mailing list, send a message to
"dac960-announce-request@dandelion.com" with the line "subscribe" in the
message body.


                CONTROLLER CONFIGURATION AND STATUS MONITORING

The DAC960 RAID controllers running firmware 4.06 or above include a Background
Initialization facility so that system downtime is minimized both for initial
installation and subsequent configuration of additional storage.
The BIOS
Configuration Utility (accessible via Alt-R during the BIOS initialization
sequence) is used to quickly configure the controller, and then the logical
drives that have been created are available for immediate use even while they
are still being initialized by the controller. The primary need for online
configuration and status monitoring is then to avoid system downtime when disk
drives fail and must be replaced. Mylex's online monitoring and configuration
utilities are being ported to Linux and will become available at some point in
the future. Note that with a SAF-TE (SCSI Accessed Fault-Tolerant Enclosure)
enclosure, the controller is able to rebuild failed drives automatically as
soon as a drive replacement is made available.

The primary interfaces for controller configuration and status monitoring are
special files created in the /proc/rd/... hierarchy along with the normal
system console logging mechanism. Whenever the system is operating, the DAC960
driver queries each controller for status information every 10 seconds, and
checks for additional conditions every 60 seconds. The initial status of each
controller is always available for controller N in /proc/rd/cN/initial_status,
and the current status as of the last status monitoring query is available in
/proc/rd/cN/current_status. In addition, status changes are also logged by the
driver to the system console and will appear in the log files maintained by
syslog. The progress of asynchronous rebuild or consistency check operations
is also available in /proc/rd/cN/current_status, and progress messages are
logged to the system console at most every 60 seconds.
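
As a rough sketch of how /proc/rd/cN/current_status might be condensed for
routine monitoring, the shell function below prints only the logical drives
that are not Online and any rebuild or consistency check progress line. The
function name "dac960_summary" is hypothetical and is not part of the driver
distribution, and the patterns assume the "Logical Drives:" and "in Progress:"
line layout shown in the sample output elsewhere in this document; other
driver versions may format these lines differently.

```shell
# Hypothetical helper, not shipped with the driver.
# $1 is the status file to read; it defaults to controller 0.
dac960_summary() {
    status_file="${1:-/proc/rd/c0/current_status}"

    # Logical drive lines look like:
    #   /dev/rd/c0d0: RAID-5, Critical, 5498880 blocks, Write Thru
    # Keep only those whose state is something other than Online.
    grep '/dev/rd/c[0-9]*d[0-9]*:' "$status_file" | grep -v ', Online,' || true

    # Progress lines look like:
    #   Rebuild in Progress: Logical Drive 0 (/dev/rd/c0d0) 6% completed
    # The idle line "No Rebuild or Consistency Check in Progress" has no
    # colon after "Progress" and is therefore not printed.
    grep 'in Progress:' "$status_file" || true
}
```

Since the driver refreshes the status only every 10 seconds, there is little
point in calling such a function more often than that; an empty result means
that no logical drive is degraded and no background operation is running.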

Starting with the 2.2.3/2.0.3 versions of the driver, the status information
available in /proc/rd/cN/initial_status and /proc/rd/cN/current_status has been
augmented to include the vendor, model, revision, and serial number (if
available) for each physical device found connected to the controller:

***** DAC960 RAID Driver Version 2.2.3 of 19 August 1999 *****
Copyright 1998-1999 by Leonard N. Zubkoff <lnz@dandelion.com>
Configuring Mylex DAC960PRL PCI RAID Controller
  Firmware Version: 4.07-0-07, Channels: 1, Memory Size: 16MB
  PCI Bus: 1, Device: 4, Function: 1, I/O Address: Unassigned
  PCI Address: 0xFE300000 mapped at 0xA0800000, IRQ Channel: 21
  Controller Queue Depth: 128, Maximum Blocks per Command: 128
  Driver Queue Depth: 127, Maximum Scatter/Gather Segments: 33
  Stripe Size: 64KB, Segment Size: 8KB, BIOS Geometry: 255/63
  SAF-TE Enclosure Management Enabled
  Physical Devices:
    0:0  Vendor: IBM       Model: DRVS09D           Revision: 0270
         Serial Number: 68016775HA
         Disk Status: Online, 17928192 blocks
    0:1  Vendor: IBM       Model: DRVS09D           Revision: 0270
         Serial Number: 68004E53HA
         Disk Status: Online, 17928192 blocks
    0:2  Vendor: IBM       Model: DRVS09D           Revision: 0270
         Serial Number: 13013935HA
         Disk Status: Online, 17928192 blocks
    0:3  Vendor: IBM       Model: DRVS09D           Revision: 0270
         Serial Number: 13016897HA
         Disk Status: Online, 17928192 blocks
    0:4  Vendor: IBM       Model: DRVS09D           Revision: 0270
         Serial Number: 68019905HA
         Disk Status: Online, 17928192 blocks
    0:5  Vendor: IBM       Model: DRVS09D           Revision: 0270
         Serial Number: 68012753HA
         Disk Status: Online, 17928192 blocks
    0:6  Vendor: ESG-SHV   Model: SCA HSBP M6       Revision: 0.61
  Logical Drives:
    /dev/rd/c0d0: RAID-5, Online, 89640960 blocks, Write Thru
  No Rebuild or Consistency Check in Progress

To simplify the monitoring process for custom software, the special file
/proc/rd/status returns "OK" when all DAC960 controllers in the system are
operating normally and no failures have occurred, or "ALERT" if any logical
drives are offline or critical or any non-standby physical drives are dead.

Configuration commands for controller N are available via the special file
/proc/rd/cN/user_command. A human readable command can be written to this
special file to initiate a configuration operation, and the results of the
operation can then be read back from the special file in addition to being
logged to the system console. The shell command sequence

  echo "<configuration-command>" > /proc/rd/c0/user_command
  cat /proc/rd/c0/user_command

is typically used to execute configuration commands. The configuration
commands are:

  flush-cache

    The "flush-cache" command flushes the controller's cache. The system
    automatically flushes the cache at shutdown or if the driver module is
    unloaded, so this command is only needed to be certain a write back cache
    is flushed to disk before the system is powered off by a command to a UPS.
    Note that the flush-cache command also stops an asynchronous rebuild or
    consistency check, so it should not be used except when the system is being
    halted.

  kill <channel>:<target-id>

    The "kill" command marks the physical drive <channel>:<target-id> as DEAD.
    This command is provided primarily for testing, and should not be used
    during normal system operation.

  make-online <channel>:<target-id>

    The "make-online" command changes the physical drive <channel>:<target-id>
    from status DEAD to status ONLINE. In cases where multiple physical drives
    have been killed simultaneously, this command may be used to bring all but
    one of them back online, after which a rebuild to the final drive is
    necessary.

    Warning: make-online should only be used on a dead physical drive that is
    an active part of a drive group, never on a standby drive.
    The command
    should never be used on a dead drive that is part of a critical logical
    drive; rebuild should be used if only a single drive is dead.

  make-standby <channel>:<target-id>

    The "make-standby" command changes physical drive <channel>:<target-id>
    from status DEAD to status STANDBY. It should only be used in cases where
    a dead drive was replaced after an automatic rebuild was performed onto a
    standby drive. It cannot be used to add a standby drive to the controller
    configuration if one was not created initially; the BIOS Configuration
    Utility must be used for that currently.

  rebuild <channel>:<target-id>

    The "rebuild" command initiates an asynchronous rebuild onto physical drive
    <channel>:<target-id>. It should only be used when a dead drive has been
    replaced.

  check-consistency <logical-drive-number>

    The "check-consistency" command initiates an asynchronous consistency check
    of <logical-drive-number> with automatic restoration. It can be used
    whenever it is desired to verify the consistency of the redundancy
    information.

  cancel-rebuild
  cancel-consistency-check

    The "cancel-rebuild" and "cancel-consistency-check" commands cancel any
    rebuild or consistency check operations previously initiated.


               EXAMPLE I - DRIVE FAILURE WITHOUT A STANDBY DRIVE

The following annotated logs demonstrate the controller configuration and
online status monitoring capabilities of the Linux DAC960 Driver. The test
configuration comprises 6 1GB Quantum Atlas I disk drives on two channels of a
DAC960PJ controller. The physical drives are configured into a single drive
group without a standby drive, and the drive group has been configured into two
logical drives, one RAID-5 and one RAID-6.
Note that these logs are from an
earlier version of the driver and the messages have changed somewhat with newer
releases, but the functionality remains similar. First, here is the current
status of the RAID configuration:

gwynedd:/u/lnz# cat /proc/rd/c0/current_status
***** DAC960 RAID Driver Version 2.0.0 of 23 March 1999 *****
Copyright 1998-1999 by Leonard N. Zubkoff <lnz@dandelion.com>
Configuring Mylex DAC960PJ PCI RAID Controller
  Firmware Version: 4.06-0-08, Channels: 3, Memory Size: 8MB
  PCI Bus: 0, Device: 19, Function: 1, I/O Address: Unassigned
  PCI Address: 0xFD4FC000 mapped at 0x8807000, IRQ Channel: 9
  Controller Queue Depth: 128, Maximum Blocks per Command: 128
  Driver Queue Depth: 127, Maximum Scatter/Gather Segments: 33
  Stripe Size: 64KB, Segment Size: 8KB, BIOS Geometry: 255/63
  Physical Devices:
    0:1 - Disk: Online, 2201600 blocks
    0:2 - Disk: Online, 2201600 blocks
    0:3 - Disk: Online, 2201600 blocks
    1:1 - Disk: Online, 2201600 blocks
    1:2 - Disk: Online, 2201600 blocks
    1:3 - Disk: Online, 2201600 blocks
  Logical Drives:
    /dev/rd/c0d0: RAID-5, Online, 5498880 blocks, Write Thru
    /dev/rd/c0d1: RAID-6, Online, 3305472 blocks, Write Thru
  No Rebuild or Consistency Check in Progress

gwynedd:/u/lnz# cat /proc/rd/status
OK

The above messages indicate that everything is healthy, and /proc/rd/status
returns "OK" indicating that there are no problems with any DAC960 controller
in the system. For demonstration purposes, while I/O is active Physical Drive
1:1 is now disconnected, simulating a drive failure.
The failure is noted by
the driver within 10 seconds of the controller's having detected it, and the
driver logs the following console status messages indicating that Logical
Drives 0 and 1 are now CRITICAL as a result of Physical Drive 1:1 being DEAD:

DAC960#0: Physical Drive 1:2 Error Log: Sense Key = 6, ASC = 29, ASCQ = 02
DAC960#0: Physical Drive 1:3 Error Log: Sense Key = 6, ASC = 29, ASCQ = 02
DAC960#0: Physical Drive 1:1 killed because of timeout on SCSI command
DAC960#0: Physical Drive 1:1 is now DEAD
DAC960#0: Logical Drive 0 (/dev/rd/c0d0) is now CRITICAL
DAC960#0: Logical Drive 1 (/dev/rd/c0d1) is now CRITICAL

The Sense Keys logged here are just Check Condition / Unit Attention conditions
arising from a SCSI bus reset that is forced by the controller during its error
recovery procedures. Concurrently with the above, the driver status available
from /proc/rd also reflects the drive failure. The status message in
/proc/rd/status has changed from "OK" to "ALERT":

gwynedd:/u/lnz# cat /proc/rd/status
ALERT

and /proc/rd/c0/current_status has been updated:

gwynedd:/u/lnz# cat /proc/rd/c0/current_status
  ...
  Physical Devices:
    0:1 - Disk: Online, 2201600 blocks
    0:2 - Disk: Online, 2201600 blocks
    0:3 - Disk: Online, 2201600 blocks
    1:1 - Disk: Dead, 2201600 blocks
    1:2 - Disk: Online, 2201600 blocks
    1:3 - Disk: Online, 2201600 blocks
  Logical Drives:
    /dev/rd/c0d0: RAID-5, Critical, 5498880 blocks, Write Thru
    /dev/rd/c0d1: RAID-6, Critical, 3305472 blocks, Write Thru
  No Rebuild or Consistency Check in Progress

Since there are no standby drives configured, the system can continue to access
the logical drives in a performance degraded mode until the failed drive is
replaced and a rebuild operation completed to restore the redundancy of the
logical drives.
Once Physical Drive 1:1 is replaced with a properly
functioning drive, or if the physical drive was killed without having failed
(e.g., due to electrical problems on the SCSI bus), the user can instruct the
controller to initiate a rebuild operation onto the newly replaced drive:

gwynedd:/u/lnz# echo "rebuild 1:1" > /proc/rd/c0/user_command
gwynedd:/u/lnz# cat /proc/rd/c0/user_command
Rebuild of Physical Drive 1:1 Initiated

The echo command instructs the controller to initiate an asynchronous rebuild
operation onto Physical Drive 1:1, and the status message that results from the
operation is then available for reading from /proc/rd/c0/user_command, as well
as being logged to the console by the driver.

Within 10 seconds of this command the driver logs the initiation of the
asynchronous rebuild operation:

DAC960#0: Rebuild of Physical Drive 1:1 Initiated
DAC960#0: Physical Drive 1:1 Error Log: Sense Key = 6, ASC = 29, ASCQ = 01
DAC960#0: Physical Drive 1:1 is now WRITE-ONLY
DAC960#0: Rebuild in Progress: Logical Drive 0 (/dev/rd/c0d0) 1% completed

and /proc/rd/c0/current_status is updated:

gwynedd:/u/lnz# cat /proc/rd/c0/current_status
  ...
  Physical Devices:
    0:1 - Disk: Online, 2201600 blocks
    0:2 - Disk: Online, 2201600 blocks
    0:3 - Disk: Online, 2201600 blocks
    1:1 - Disk: Write-Only, 2201600 blocks
    1:2 - Disk: Online, 2201600 blocks
    1:3 - Disk: Online, 2201600 blocks
  Logical Drives:
    /dev/rd/c0d0: RAID-5, Critical, 5498880 blocks, Write Thru
    /dev/rd/c0d1: RAID-6, Critical, 3305472 blocks, Write Thru
  Rebuild in Progress: Logical Drive 0 (/dev/rd/c0d0) 6% completed

As the rebuild progresses, the current status in /proc/rd/c0/current_status is
updated every 10 seconds:

gwynedd:/u/lnz# cat /proc/rd/c0/current_status
  ...
  Physical Devices:
    0:1 - Disk: Online, 2201600 blocks
    0:2 - Disk: Online, 2201600 blocks
    0:3 - Disk: Online, 2201600 blocks
    1:1 - Disk: Write-Only, 2201600 blocks
    1:2 - Disk: Online, 2201600 blocks
    1:3 - Disk: Online, 2201600 blocks
  Logical Drives:
    /dev/rd/c0d0: RAID-5, Critical, 5498880 blocks, Write Thru
    /dev/rd/c0d1: RAID-6, Critical, 3305472 blocks, Write Thru
  Rebuild in Progress: Logical Drive 0 (/dev/rd/c0d0) 15% completed

and every minute a progress message is logged to the console by the driver:

DAC960#0: Rebuild in Progress: Logical Drive 0 (/dev/rd/c0d0) 32% completed
DAC960#0: Rebuild in Progress: Logical Drive 0 (/dev/rd/c0d0) 63% completed
DAC960#0: Rebuild in Progress: Logical Drive 0 (/dev/rd/c0d0) 94% completed
DAC960#0: Rebuild in Progress: Logical Drive 1 (/dev/rd/c0d1) 94% completed

Finally, the rebuild completes successfully. The driver logs the status of the
logical and physical drives and the rebuild completion:

DAC960#0: Rebuild Completed Successfully
DAC960#0: Physical Drive 1:1 is now ONLINE
DAC960#0: Logical Drive 0 (/dev/rd/c0d0) is now ONLINE
DAC960#0: Logical Drive 1 (/dev/rd/c0d1) is now ONLINE

/proc/rd/c0/current_status is updated:

gwynedd:/u/lnz# cat /proc/rd/c0/current_status
  ...
  Physical Devices:
    0:1 - Disk: Online, 2201600 blocks
    0:2 - Disk: Online, 2201600 blocks
    0:3 - Disk: Online, 2201600 blocks
    1:1 - Disk: Online, 2201600 blocks
    1:2 - Disk: Online, 2201600 blocks
    1:3 - Disk: Online, 2201600 blocks
  Logical Drives:
    /dev/rd/c0d0: RAID-5, Online, 5498880 blocks, Write Thru
    /dev/rd/c0d1: RAID-6, Online, 3305472 blocks, Write Thru
  Rebuild Completed Successfully

and /proc/rd/status indicates that everything is healthy once again:

gwynedd:/u/lnz# cat /proc/rd/status
OK


                EXAMPLE II - DRIVE FAILURE WITH A STANDBY DRIVE

The following annotated logs demonstrate the controller configuration and
online status monitoring capabilities of the Linux DAC960 Driver. The test
configuration comprises 6 1GB Quantum Atlas I disk drives on two channels of a
DAC960PJ controller. The physical drives are configured into a single drive
group with a standby drive, and the drive group has been configured into two
logical drives, one RAID-5 and one RAID-6. Note that these logs are from an
earlier version of the driver and the messages have changed somewhat with newer
releases, but the functionality remains similar. First, here is the current
status of the RAID configuration:

gwynedd:/u/lnz# cat /proc/rd/c0/current_status
***** DAC960 RAID Driver Version 2.0.0 of 23 March 1999 *****
Copyright 1998-1999 by Leonard N. Zubkoff <lnz@dandelion.com>
Configuring Mylex DAC960PJ PCI RAID Controller
  Firmware Version: 4.06-0-08, Channels: 3, Memory Size: 8MB
  PCI Bus: 0, Device: 19, Function: 1, I/O Address: Unassigned
  PCI Address: 0xFD4FC000 mapped at 0x8807000, IRQ Channel: 9
  Controller Queue Depth: 128, Maximum Blocks per Command: 128
  Driver Queue Depth: 127, Maximum Scatter/Gather Segments: 33
  Stripe Size: 64KB, Segment Size: 8KB, BIOS Geometry: 255/63
  Physical Devices:
    0:1 - Disk: Online, 2201600 blocks
    0:2 - Disk: Online, 2201600 blocks
    0:3 - Disk: Online, 2201600 blocks
    1:1 - Disk: Online, 2201600 blocks
    1:2 - Disk: Online, 2201600 blocks
    1:3 - Disk: Standby, 2201600 blocks
  Logical Drives:
    /dev/rd/c0d0: RAID-5, Online, 4399104 blocks, Write Thru
    /dev/rd/c0d1: RAID-6, Online, 2754560 blocks, Write Thru
  No Rebuild or Consistency Check in Progress

gwynedd:/u/lnz# cat /proc/rd/status
OK

The above messages indicate that everything is healthy, and /proc/rd/status
returns "OK" indicating that there are no problems with any DAC960 controller
in the system. For demonstration purposes, while I/O is active Physical Drive
1:2 is now disconnected, simulating a drive failure.
The failure is noted by
the driver within 10 seconds of the controller's having detected it, and the
driver logs the following console status messages:

DAC960#0: Physical Drive 1:1 Error Log: Sense Key = 6, ASC = 29, ASCQ = 02
DAC960#0: Physical Drive 1:3 Error Log: Sense Key = 6, ASC = 29, ASCQ = 02
DAC960#0: Physical Drive 1:2 killed because of timeout on SCSI command
DAC960#0: Physical Drive 1:2 is now DEAD
DAC960#0: Physical Drive 1:2 killed because it was removed
DAC960#0: Logical Drive 0 (/dev/rd/c0d0) is now CRITICAL
DAC960#0: Logical Drive 1 (/dev/rd/c0d1) is now CRITICAL

Since a standby drive is configured, the controller automatically begins
rebuilding onto the standby drive:

DAC960#0: Physical Drive 1:3 is now WRITE-ONLY
DAC960#0: Rebuild in Progress: Logical Drive 0 (/dev/rd/c0d0) 4% completed

Concurrently with the above, the driver status available from /proc/rd also
reflects the drive failure and automatic rebuild. The status message in
/proc/rd/status has changed from "OK" to "ALERT":

gwynedd:/u/lnz# cat /proc/rd/status
ALERT

and /proc/rd/c0/current_status has been updated:

gwynedd:/u/lnz# cat /proc/rd/c0/current_status
  ...
  Physical Devices:
    0:1 - Disk: Online, 2201600 blocks
    0:2 - Disk: Online, 2201600 blocks
    0:3 - Disk: Online, 2201600 blocks
    1:1 - Disk: Online, 2201600 blocks
    1:2 - Disk: Dead, 2201600 blocks
    1:3 - Disk: Write-Only, 2201600 blocks
  Logical Drives:
    /dev/rd/c0d0: RAID-5, Critical, 4399104 blocks, Write Thru
    /dev/rd/c0d1: RAID-6, Critical, 2754560 blocks, Write Thru
  Rebuild in Progress: Logical Drive 0 (/dev/rd/c0d0) 4% completed

As the rebuild progresses, the current status in /proc/rd/c0/current_status is
updated every 10 seconds:

gwynedd:/u/lnz# cat /proc/rd/c0/current_status
  ...
  Physical Devices:
    0:1 - Disk: Online, 2201600 blocks
    0:2 - Disk: Online, 2201600 blocks
    0:3 - Disk: Online, 2201600 blocks
    1:1 - Disk: Online, 2201600 blocks
    1:2 - Disk: Dead, 2201600 blocks
    1:3 - Disk: Write-Only, 2201600 blocks
  Logical Drives:
    /dev/rd/c0d0: RAID-5, Critical, 4399104 blocks, Write Thru
    /dev/rd/c0d1: RAID-6, Critical, 2754560 blocks, Write Thru
  Rebuild in Progress: Logical Drive 0 (/dev/rd/c0d0) 40% completed

and every minute a progress message is logged on the console by the driver:

DAC960#0: Rebuild in Progress: Logical Drive 0 (/dev/rd/c0d0) 40% completed
DAC960#0: Rebuild in Progress: Logical Drive 0 (/dev/rd/c0d0) 76% completed
DAC960#0: Rebuild in Progress: Logical Drive 1 (/dev/rd/c0d1) 66% completed
DAC960#0: Rebuild in Progress: Logical Drive 1 (/dev/rd/c0d1) 84% completed

Finally, the rebuild completes successfully. The driver logs the status of the
logical and physical drives and the rebuild completion:

DAC960#0: Rebuild Completed Successfully
DAC960#0: Physical Drive 1:3 is now ONLINE
DAC960#0: Logical Drive 0 (/dev/rd/c0d0) is now ONLINE
DAC960#0: Logical Drive 1 (/dev/rd/c0d1) is now ONLINE

/proc/rd/c0/current_status is updated:

***** DAC960 RAID Driver Version 2.0.0 of 23 March 1999 *****
Copyright 1998-1999 by Leonard N. Zubkoff <lnz@dandelion.com>
Configuring Mylex DAC960PJ PCI RAID Controller
  Firmware Version: 4.06-0-08, Channels: 3, Memory Size: 8MB
  PCI Bus: 0, Device: 19, Function: 1, I/O Address: Unassigned
  PCI Address: 0xFD4FC000 mapped at 0x8807000, IRQ Channel: 9
  Controller Queue Depth: 128, Maximum Blocks per Command: 128
  Driver Queue Depth: 127, Maximum Scatter/Gather Segments: 33
  Stripe Size: 64KB, Segment Size: 8KB, BIOS Geometry: 255/63
  Physical Devices:
    0:1 - Disk: Online, 2201600 blocks
    0:2 - Disk: Online, 2201600 blocks
    0:3 - Disk: Online, 2201600 blocks
    1:1 - Disk: Online, 2201600 blocks
    1:2 - Disk: Dead, 2201600 blocks
    1:3 - Disk: Online, 2201600 blocks
  Logical Drives:
    /dev/rd/c0d0: RAID-5, Online, 4399104 blocks, Write Thru
    /dev/rd/c0d1: RAID-6, Online, 2754560 blocks, Write Thru
  Rebuild Completed Successfully

and /proc/rd/status indicates that everything is healthy once again:

gwynedd:/u/lnz# cat /proc/rd/status
OK

Note that the absence of a viable standby drive does not create an "ALERT"
status. Once dead Physical Drive 1:2 has been replaced, the controller must be
told that this has occurred and that the newly replaced drive should become the
new standby drive:

gwynedd:/u/lnz# echo "make-standby 1:2" > /proc/rd/c0/user_command
gwynedd:/u/lnz# cat /proc/rd/c0/user_command
Make Standby of Physical Drive 1:2 Succeeded

The echo command instructs the controller to make Physical Drive 1:2 into a
standby drive, and the status message that results from the operation is then
available for reading from /proc/rd/c0/user_command, as well as being logged to
the console by the driver.
Within 60 seconds of this command the driver logs:

DAC960#0: Physical Drive 1:2 Error Log: Sense Key = 6, ASC = 29, ASCQ = 01
DAC960#0: Physical Drive 1:2 is now STANDBY
DAC960#0: Make Standby of Physical Drive 1:2 Succeeded

and /proc/rd/c0/current_status is updated:

gwynedd:/u/lnz# cat /proc/rd/c0/current_status
  ...
  Physical Devices:
    0:1 - Disk: Online, 2201600 blocks
    0:2 - Disk: Online, 2201600 blocks
    0:3 - Disk: Online, 2201600 blocks
    1:1 - Disk: Online, 2201600 blocks
    1:2 - Disk: Standby, 2201600 blocks
    1:3 - Disk: Online, 2201600 blocks
  Logical Drives:
    /dev/rd/c0d0: RAID-5, Online, 4399104 blocks, Write Thru
    /dev/rd/c0d1: RAID-6, Online, 2754560 blocks, Write Thru
  Rebuild Completed Successfully
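
The echo/cat sequence used throughout both examples, together with the
/proc/rd/status check, could be wrapped in small shell functions along the
following lines. This is only a sketch: the function names "dac960_command"
and "dac960_healthy" and the DAC960_PROC_ROOT variable are hypothetical and
are not provided by the driver or by DAC960-Utilities. DAC960_PROC_ROOT exists
solely so that the functions can be tried against an ordinary directory; when
it is unset, /proc/rd is used.

```shell
# Hypothetical wrappers, not shipped with the driver.

# Usage: dac960_command <controller-number> <configuration-command...>
# Writes the command to the controller's user_command file and then reads
# the file back, as in the echo/cat sequence shown in the examples.
dac960_command() {
    root="${DAC960_PROC_ROOT:-/proc/rd}"
    controller="$1"
    shift
    command_file="$root/c$controller/user_command"
    echo "$*" > "$command_file" || return 1
    cat "$command_file"
}

# Succeeds (exit status 0) only when the status file reads "OK".
dac960_healthy() {
    root="${DAC960_PROC_ROOT:-/proc/rd}"
    [ "$(cat "$root/status")" = "OK" ]
}
```

With these definitions, the make-standby step above would be written as

  dac960_command 0 make-standby 1:2

and a cron job or monitoring script could test "dac960_healthy" and raise an
alarm when it fails. Against the real /proc/rd the read-back is the driver's
result message; against an ordinary file it is simply the command text.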