A disk is a device that can read and write fixed length sectors. Various flavors of disks differ in how sectors are organized. A hard disk has multiple surfaces where sectors of typically 512 bytes are organized in concentric tracks. A floppy disk has one or two surfaces where sectors of typically 512 bytes are organized in concentric tracks. A compact disk has one surface where sectors of typically 2048 bytes are organized in a spiral track.
Initially, sectors on a disk were addressed using the surface, track and sector numbers. This had several problems. First, implementations of the ATA hardware interface and the BIOS software interface typically limited the number of surfaces to 16, the number of cylinders to 1024, and the number of sectors per track to 63. Second, because the length of a track depends on its distance from the center of the disk, it is advantageous to vary the number of sectors per track, which a single fixed geometry cannot express. Sectors on a disk are therefore now addressed using a logical block address (LBA) that numbers sectors sequentially.
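The relationship between the two addressing schemes can be sketched as follows. This is the conventional conversion, assuming a fixed geometry and sectors numbered from 1 within a track; the function and parameter names are illustrative only.

```python
def chs_to_lba(cylinder, head, sector, heads, sectors_per_track):
    """Convert a cylinder/head/sector triple to a sequential block address.

    Assumes a fixed geometry and the usual convention that sectors
    within a track are numbered from 1, cylinders and heads from 0.
    """
    return (cylinder * heads + head) * sectors_per_track + (sector - 1)


def lba_to_chs(lba, heads, sectors_per_track):
    """Inverse of chs_to_lba for the same fixed geometry."""
    cylinder, rest = divmod(lba, heads * sectors_per_track)
    head, sector0 = divmod(rest, sectors_per_track)
    return cylinder, head, sector0 + 1


# The limits quoted above: 1024 cylinders, 16 surfaces, 63 sectors per track.
MAX_SECTORS = 1024 * 16 * 63
MAX_BYTES = MAX_SECTORS * 512
```

With these limits and 512 byte sectors, the largest addressable disk has 1032192 sectors, that is 528482304 bytes or 504 MiB.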
An ATA disk denotes a disk using the Advanced Technology Attachment (ATA) or the Advanced Technology Attachment with Packet Interface (ATAPI) standard, which describes an interface between the disk and the computer. The ATA standard allows the disk to be accessed using the command block registers. The ATAPI standard allows the disk to be accessed using either the command block registers or the packet commands.
The command block registers interface relies on a number of registers, including the Cylinder High, Cylinder Low, Device/Head, Sector Count, Sector Number, Command, Status, Features, Error, and Data registers. Issuing a command entails reading the Status register until its BSY bit is cleared and its DRDY bit is set, which indicates that the disk is ready, then writing the other registers with the required parameters, and finally writing the Command register with the required command. When the Command register is written, the disk will set the Status register to indicate that a command is being executed, execute the command, and finally generate an interrupt to indicate that the command has been executed. Data are transferred either through the Data register or using Direct Memory Access.
The packet commands interface relies on the command block registers interface to issue a command that sends a data packet, which is interpreted as another command. The packet commands interface is suitable for complex commands that cannot be described using the command block registers interface.
Because of the mechanical properties of the disk, the relative speed of the computer and the disk must be considered. One problem arises when the computer issues requests for consecutive sectors too slowly relative to the rotation speed, so that the next sector has already passed under the head when the request arrives. This can be solved by interleaving of sectors. Another problem arises when the computer issues requests for random sectors too quickly relative to the access speed. This can be solved by queuing of requests. The strategy of processing queued requests is important.
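Interleaving can be illustrated by computing where logical sectors end up on a track. The sketch below assumes the classic scheme in which consecutive logical sectors are placed a fixed number of physical slots apart; the function name and the `step` parameter are illustrative, not taken from any particular controller.

```python
def interleaved_layout(sectors_per_track, step):
    """Return a list where index = physical slot, value = logical sector.

    Consecutive logical sectors are placed `step` slots apart; when the
    target slot is already taken, the next free slot is used instead.
    """
    layout = [None] * sectors_per_track
    slot = 0
    for logical in range(sectors_per_track):
        while layout[slot] is not None:
            slot = (slot + 1) % sectors_per_track
        layout[slot] = logical
        slot = (slot + step) % sectors_per_track
    return layout


print(interleaved_layout(8, 1))  # no interleaving
print(interleaved_layout(8, 2))  # one foreign sector between neighbours
```

With a step of 2, the computer has the time of one full sector passing under the head to issue the request for the next logical sector.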
The FIFO strategy of processing requests directs the disk to always service the first of the waiting requests. The strategy can suffer from excessive seeking across tracks.
The Shortest Seek First strategy of processing requests directs the disk to service the request that has the shortest distance from the current position of the disk head. The strategy can suffer from letting too distant requests starve.
The Bidirectional Elevator strategy of processing requests directs the disk to service the request that has the shortest distance from the current position of the disk head in the selected direction, which changes when no more requests in the selected direction are waiting. The strategy lets distant requests wait for at most two passes over the disk, one in each direction.
The Unidirectional Sweep strategy of processing requests directs the disk to service the request that has the shortest distance from the current position of the disk head in the selected direction, or the longest distance from the current position of the disk head when no more requests in the selected direction are waiting. The strategy lets distant requests wait for at most one pass over the disk in the selected direction.
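The four strategies can be compared on a small example. The sketch below is a simplified model that assumes all requests are already waiting and no new ones arrive, that a request is identified only by its track number, and that the selected direction is towards higher tracks; the function names and the sample queue are illustrative.

```python
def fifo(head, requests):
    """Service requests in arrival order."""
    return list(requests)


def shortest_seek_first(head, requests):
    """Always service the waiting request closest to the head."""
    pending, order = list(requests), []
    while pending:
        nxt = min(pending, key=lambda t: abs(t - head))
        pending.remove(nxt)
        order.append(nxt)
        head = nxt
    return order


def bidirectional_elevator(head, requests, direction=+1):
    """Closest request in the selected direction, reversing when none is left."""
    pending, order = list(requests), []
    while pending:
        ahead = [t for t in pending if (t - head) * direction >= 0]
        if not ahead:
            direction = -direction
            continue
        nxt = min(ahead, key=lambda t: abs(t - head))
        pending.remove(nxt)
        order.append(nxt)
        head = nxt
    return order


def unidirectional_sweep(head, requests, direction=+1):
    """Closest request in the selected direction, else the most distant one."""
    pending, order = list(requests), []
    while pending:
        ahead = [t for t in pending if (t - head) * direction >= 0]
        if ahead:
            nxt = min(ahead, key=lambda t: abs(t - head))
        else:
            nxt = max(pending, key=lambda t: abs(t - head))
        pending.remove(nxt)
        order.append(nxt)
        head = nxt
    return order


def seek_distance(head, order):
    """Total number of tracks the head crosses when servicing `order`."""
    total = 0
    for track in order:
        total += abs(track - head)
        head = track
    return total


queue = [98, 183, 37, 122, 14, 124, 65, 67]  # sample track numbers
for strategy in (fifo, shortest_seek_first, bidirectional_elevator, unidirectional_sweep):
    order = strategy(53, queue)
    print(strategy.__name__, order, seek_distance(53, order))
```

For this queue and a head starting at track 53, the total seek distances are 640 tracks for FIFO, 236 for Shortest Seek First, 299 for Bidirectional Elevator and 322 for Unidirectional Sweep. A static queue cannot show starvation, which only appears when new requests keep arriving near the head.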
The strategy used to process the queue of requests can be implemented either by the computer in software or by the disk in hardware. The computer typically only considers the current track that the disk head is on, because it does not change without the computer commanding the disk to do so, as opposed to the current sector that the disk head moves over.
Most versions of the ATA interface do not support issuing a new request to the disk before the previous request is completed, so the disk itself cannot implement any strategy to process the queue of requests. In contrast, most versions of the SCSI and the SATA interfaces do support issuing a new request to the disk before the previous request is completed.
A SATA disk uses Native Command Queuing as the mechanism to maintain the queue of requests. The mechanism is coupled with First Party Direct Memory Access, which allows the drive to instruct the controller to set up Direct Memory Access for a particular part of a particular request.
[Linux 2.2.18 /drivers/block/ll_rw_blk.c] Although the Linux sources persistently use the name Elevator, the kernel actually sorts incoming requests by linear sector number, with the exception of requests that have been waiting too long (skipped 256 times for reads, 512 times for writes), which are no longer skipped. As a result, programs that work intensively with the beginning of the disk block programs that work elsewhere.
[Linux 2.4.2 /drivers/block/ll_rw_blk.c & elevator.c] Newer Linux has improved. It first tries to merge a new request into a sequence with existing requests (subject to a limit on the maximum sequence length), and otherwise inserts it according to its sector number, but never at the head of the queue and never ahead of requests that have been waiting for a long time. The result is a unidirectional sweep with aging.
[Linux 2.6.x] The kernel makes it possible to associate a queueing discipline with a block device by providing modular request schedulers. The three schedulers implemented by the kernel are anticipatory, deadline driven and complete fairness queueing.
The Anticipatory scheduler implements a modified version of the Unidirectional Sweep strategy, which permits processing of requests that are close to the current position of the disk head but lie in the direction opposite to the selected one. Additionally, the scheduler enforces an upper limit on the time a request can starve.
The scheduler handles read and write requests separately and inserts delays between read requests when it judges that the process that made the last request is likely to submit another one soon. Note that this implies sending the read requests to the disk one by one and therefore giving up the option of queueing read requests in hardware.
The Deadline Driven scheduler actually also implements a modified version of the Unidirectional Sweep strategy, except that it assigns deadlines to all requests and when a deadline of a request expires, it processes the expired request and continues from that position of the disk head.
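The interplay of the sweep and the deadlines can be shown on a toy model. The sketch below is not the kernel implementation; it assumes that all requests are queued at time zero, that servicing any request takes one time unit, and that a request is a hypothetical (sector, deadline) pair.

```python
def deadline_order(requests, head=0, service_time=1):
    """Return the sectors in the order a toy deadline scheduler services them.

    requests: list of (sector, deadline) pairs, all waiting at time 0.
    The scheduler sweeps towards higher sectors, but an expired request
    is serviced first and the sweep continues from its position.
    """
    pending = list(requests)
    now, order = 0, []
    while pending:
        expired = [r for r in pending if r[1] <= now]
        if expired:
            # Service the request whose deadline expired first.
            nxt = min(expired, key=lambda r: r[1])
        else:
            ahead = [r for r in pending if r[0] >= head]
            # Nothing ahead: restart the sweep from the lowest sector.
            nxt = min(ahead or pending, key=lambda r: r[0])
        pending.remove(nxt)
        order.append(nxt[0])
        head = nxt[0]
        now += service_time
    return order


# The request for sector 500 expires at time 2 and jumps the queue.
print(deadline_order([(10, 100), (500, 2), (20, 100), (30, 100), (40, 100)]))
```

In this example the sweep services sectors 10 and 20, is interrupted by the expired request for sector 500, and then restarts from the lowest waiting sector.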
The Complete Fairness Queueing scheduler is based on the idea of queueing requests from processes separately and servicing the queues in a round robin fashion, or in a weighted round robin fashion directed by priorities.
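The round robin idea can be sketched as follows. This is a simplified illustration rather than the kernel algorithm; it assumes a request is a hypothetical (process, sector) pair and that a weight is simply the number of requests a process may submit per round.

```python
from collections import deque


def fair_order(requests, weights=None):
    """Service per-process queues in (weighted) round robin order.

    requests: list of (process, sector) pairs in arrival order.
    weights:  optional dict mapping a process to the number of requests
              it may have serviced per round (default 1).
    """
    weights = weights or {}
    queues = {}
    for process, sector in requests:
        queues.setdefault(process, deque()).append(sector)

    order = []
    while queues:
        for process in list(queues):
            queue = queues[process]
            for _ in range(min(weights.get(process, 1), len(queue))):
                order.append((process, queue.popleft()))
            if not queue:
                del queues[process]
    return order


reqs = [("A", 1), ("A", 2), ("A", 3), ("B", 100), ("C", 200), ("B", 101)]
print(fair_order(reqs))
print(fair_order(reqs, weights={"A": 2}))
```

Process A submits most of the requests but cannot monopolize the disk, because each round services at most its share before moving on to B and C.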
[ This information is current for kernel 2.6.19. ]
References.
Hao Ran Liu: Linux I/O Schedulers. http://www.cs.ccu.edu.tw/~lhr89/linux-kernel/Linux IO Schedulers.pdf
Handling of disk errors, retries, controller reset, errors in software. Management of bad blocks, or possibly bad tracks, in hardware, SMART diagnostics. Caching, whole track caching, read ahead, write back. Mention mirroring and redundant disk arrays.
RAID 0 uses striping to speed up reading and writing. RAID 1 uses plain mirroring and therefore requires pairs of disks of the same size. RAID 2 uses bit striping and Hamming code. RAID 3 uses byte striping and a parity disk. RAID 4 uses block striping and a parity disk. RAID 5 uses block striping and parity striping. RAID 6 uses block striping and double parity striping. The levels were initially defined in a paper by authors from the University of California, Berkeley, but vendors tend to tweak levels as they see fit. RAID 2 is not used, RAID 3 is rare, RAID 5 is frequent. RAID 0+1 and RAID 1+0 (also written RAID 10) combine RAID 0 and RAID 1.
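The parity used by RAID 3, 4 and 5 is commonly a bytewise XOR of the data blocks in a stripe, which allows any single lost block to be recomputed from the remaining ones. A minimal sketch, assuming blocks of equal length and illustrative function names:

```python
from functools import reduce


def parity(blocks):
    """Bytewise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def reconstruct(surviving_blocks, parity_block):
    """Recompute the single missing data block of a stripe."""
    return parity(list(surviving_blocks) + [parity_block])


stripe = [b"disk", b"data", b"demo"]       # data blocks on three disks
p = parity(stripe)                          # block stored on the parity disk
lost = reconstruct([stripe[0], stripe[2]], p)
print(lost)
```

The same function serves for both computing the parity and rebuilding a block, because XOR is its own inverse. Double parity as in RAID 6 needs a second, different code and is not covered by this sketch.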
[Linux 2.6.10] The command smartctl -a /dev/hda prints all SMART information about the device. Each attribute has a raw value and a normalized value. The raw value is usually, but not necessarily, human readable. The normalized value ranges from 1 to 254 and is associated with a threshold that ranges from 0 to 255. The worst normalized value seen over the lifetime of the disk is also kept. If the normalized value is less than or equal to the threshold, the attribute has failed. Attributes are of two types, pre failure and old age. A failed pre failure attribute signals imminent failure. A failed old age attribute signals end of life. Attributes are numbered and some numbers are standardized.
Mention partitioning and logical volume management.
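The evaluation rule for attributes can be written down directly. The sketch below only models the rule described above and does not parse smartctl output; the class, the field names and the sample values are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class SmartAttribute:
    number: int      # attribute number
    name: str        # illustrative label
    prefail: bool    # True for pre failure, False for old age
    value: int       # current normalized value
    worst: int       # worst normalized value seen so far
    threshold: int   # failure threshold


def assess(attr):
    """Apply the rule: an attribute fails when value <= threshold."""
    if attr.value > attr.threshold:
        return "ok"
    return "imminent failure" if attr.prefail else "end of life"


# Sample values are invented, not taken from a real disk.
samples = [
    SmartAttribute(5, "reallocated sectors", True, 100, 100, 36),
    SmartAttribute(5, "reallocated sectors", True, 30, 30, 36),
    SmartAttribute(9, "power on hours", False, 1, 1, 1),
]
for attr in samples:
    print(attr.number, attr.name, assess(attr))
```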
Physical volumes, logical volumes, extents (size e.g. 32M), mapping of extents (linear or striped), snapshots.
> vgdisplay
  --- Volume group ---
  VG Name               volumes
  System ID
  Format                lvm2
  Metadata Areas        2
  Metadata Sequence No  10
  VG Access             read/write
  VG Status             resizable
  MAX LV                0
  Cur LV                3
  Open LV               3
  Max PV                0
  Cur PV                2
  Act PV                2
  VG Size               1.27 TiB
  PE Size               32.00 MiB
  Total PE              41695
  Alloc PE / Size       24692 / 771.62 GiB
  Free PE / Size        17003 / 531.34 GiB
  VG UUID               fbvtrb-GFbS-Nvf4-Ogg3-J4fX-dj83-ebh39q

> pvdisplay --map
  --- Physical volume ---
  PV Name               /dev/md0
  VG Name               volumes
  PV Size               931.39 GiB / not usable 12.56 MiB
  Allocatable           yes
  PE Size               32.00 MiB
  Total PE              29804
  Free PE               17003
  Allocated PE          12801
  PV UUID               hvfcSD-FvSp-xJn4-lsR3-40Kx-LdDD-wvfGfV

  --- Physical Segments ---
  Physical extent 0 to 6875:
    Logical volume      /dev/volumes/home
    Logical extents     0 to 6875
  Physical extent 6876 to 6876:
    Logical volume      /dev/volumes/var
    Logical extents     11251 to 11251
  Physical extent 6877 to 12800:
    Logical volume      /dev/volumes/home
    Logical extents     6876 to 12799
  Physical extent 12801 to 29803:
    FREE

> lvdisplay --map
  --- Logical volume ---
  LV Name                /dev/volumes/home
  VG Name                volumes
  LV UUID                OAdf3v-zfI1-w5vq-tFVr-Sfgv-yvre-GWFb3v
  LV Write Access        read/write
  LV Status              available
  LV Size                400.00 GiB
  Current LE             12800
  Segments               2
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     256
  Block device           253:2

  --- Segments ---
  Logical extent 0 to 6875:
    Type                 linear
    Physical volume      /dev/md0
    Physical extents     0 to 6875
  Logical extent 6876 to 12799:
    Type                 linear
    Physical volume      /dev/md0
    Physical extents     6877 to 12800
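The linear mapping of extents reported by lvdisplay for /dev/volumes/home can be sketched as a lookup table. The code below is an illustration of the mapping only, using the two segments and the 32 MiB extent size from the listing; the function names are hypothetical and the offsets are relative to the start of the extent area of the physical volume, ignoring any metadata stored before it.

```python
PE_SIZE = 32 * 1024 * 1024  # extent size from the listing, 32 MiB

# (first logical extent, last logical extent, physical volume, first physical extent)
HOME_SEGMENTS = [
    (0, 6875, "/dev/md0", 0),
    (6876, 12799, "/dev/md0", 6877),
]


def map_extent(segments, logical_extent):
    """Translate a logical extent number to (physical volume, physical extent)."""
    for first_le, last_le, volume, first_pe in segments:
        if first_le <= logical_extent <= last_le:
            return volume, first_pe + (logical_extent - first_le)
    raise ValueError("logical extent outside the volume")


def map_offset(segments, offset, pe_size=PE_SIZE):
    """Translate a byte offset in the logical volume to (physical volume, byte offset)."""
    logical_extent, within = divmod(offset, pe_size)
    volume, physical_extent = map_extent(segments, logical_extent)
    return volume, physical_extent * pe_size + within


print(map_extent(HOME_SEGMENTS, 6875))   # last extent of the first segment
print(map_extent(HOME_SEGMENTS, 6876))   # skips physical extent 6876, used by var
```

Physical extent 6876 belongs to /dev/volumes/var, which is why the second segment of home starts one extent later.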