This article was produced by translating the official virtio-1.2 specification, which can be downloaded from: http://docs.oasis-open.org/virtio/virtio/v1.2/
The virtio block device is a simple virtual block device (i.e. disk). Read and write requests (and other exotic requests) are placed in one of its queues, and serviced (probably out of order) by the device except where noted.
0 requestq1
. . .
N-1 requestqN
N=1 if VIRTIO_BLK_F_MQ is not negotiated, otherwise N is set by num_queues.
VIRTIO_BLK_F_SIZE_MAX (1) Maximum size of any single segment is in size_max.
VIRTIO_BLK_F_SEG_MAX (2) Maximum number of segments in a request is in seg_max.
VIRTIO_BLK_F_GEOMETRY (4) Disk-style geometry specified in geometry.
VIRTIO_BLK_F_RO (5) Device is read-only.
VIRTIO_BLK_F_BLK_SIZE (6) Block size of disk is in blk_size.
VIRTIO_BLK_F_FLUSH (9) Cache flush command support.
VIRTIO_BLK_F_TOPOLOGY (10) Device exports information on optimal I/O alignment.
VIRTIO_BLK_F_CONFIG_WCE (11) Device can toggle its cache between writeback and writethrough modes.
VIRTIO_BLK_F_MQ (12) Device supports multiqueue.
VIRTIO_BLK_F_DISCARD (13) Device can support discard command, maximum discard sectors size in max_discard_sectors and maximum discard segment number in max_discard_seg.
VIRTIO_BLK_F_WRITE_ZEROES (14) Device can support write zeroes command, maximum write zeroes sectors size in max_write_zeroes_sectors and maximum write zeroes segment number in max_write_zeroes_seg.
VIRTIO_BLK_F_LIFETIME (15) Device supports providing storage lifetime information.
VIRTIO_BLK_F_SECURE_ERASE (16) Device supports secure erase command, maximum erase sectors count in max_secure_erase_sectors and maximum erase segment number in max_secure_erase_seg.
VIRTIO_BLK_F_BARRIER (0) Device supports request barriers.
VIRTIO_BLK_F_SCSI (7) Device supports scsi packet commands.
Note: In the legacy interface, VIRTIO_BLK_F_FLUSH was also called VIRTIO_BLK_F_WCE.
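The feature bits and the queue-count rule above can be sketched in C as follows. This is a minimal illustration, not part of the specification: the helper names blk_has_feature and blk_queue_count are hypothetical, and the negotiated features are assumed to be held in a 64-bit mask.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit numbers copied from the feature list above. */
#define VIRTIO_BLK_F_RO     5
#define VIRTIO_BLK_F_FLUSH  9
#define VIRTIO_BLK_F_MQ    12

/* Hypothetical helper: test one bit in a 64-bit feature mask. */
static bool blk_has_feature(uint64_t features, unsigned bit)
{
    assert(bit < 64);
    return (features >> bit) & 1u;
}

/* Hypothetical helper: N = 1 unless VIRTIO_BLK_F_MQ was negotiated,
 * in which case N comes from the num_queues configuration field. */
static unsigned blk_queue_count(uint64_t negotiated, uint16_t num_queues)
{
    return blk_has_feature(negotiated, VIRTIO_BLK_F_MQ) ? num_queues : 1u;
}
```

A driver would call blk_queue_count only after feature negotiation has finished, since num_queues is meaningful only when VIRTIO_BLK_F_MQ is set.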
The capacity of the device (expressed in 512-byte sectors) is always present. The availability of the other fields depends on the feature bits indicated above.
The field num_queues only exists if VIRTIO_BLK_F_MQ is set. This field specifies the number of queues.
The configuration space parameters max_discard_sectors and discard_sector_alignment are expressed in 512-byte units if the VIRTIO_BLK_F_DISCARD feature bit is negotiated. The max_write_zeroes_sectors parameter is expressed in 512-byte units if the VIRTIO_BLK_F_WRITE_ZEROES feature bit is negotiated. The configuration space parameters max_secure_erase_sectors and secure_erase_sector_alignment are expressed in 512-byte units if the VIRTIO_BLK_F_SECURE_ERASE feature bit is negotiated.
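Because capacity and the sector limits above are counted in 512-byte units, a driver usually converts them to bytes before reporting sizes. A minimal sketch, assuming a hypothetical helper name and macro name:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical name; the text above fixes the unit at 512 bytes. */
#define BLK_SECTOR_SIZE 512u

/* Convert a count of 512-byte sectors to bytes. */
static uint64_t blk_sectors_to_bytes(uint64_t sectors)
{
    /* Guard against overflow of the 64-bit byte count. */
    assert(sectors <= UINT64_MAX / BLK_SECTOR_SIZE);
    return sectors * BLK_SECTOR_SIZE;
}
```

For example, a capacity of 2097152 sectors corresponds to 1 GiB. Note that the unit is 512 bytes regardless of the value reported in blk_size.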
struct virtio_blk_config {
    le64 capacity;
    le32 size_max;
    le32 seg_max;
    struct virtio_blk_geometry {
        le16 cylinders;
        u8 heads;
        u8 sectors;
    } geometry;
    le32 blk_size;
    struct virtio_blk_topology {
        // # of logical blocks per physical block (log2)
        u8 physical_block_exp;
        // offset of first aligned logical block
        u8 alignment_offset;
        // suggested minimum I/O size in blocks
        le16 min_io_size;
        // optimal (suggested maximum) I/O size in blocks
        le32 opt_io_size;
    } topology;
    u8 writeback;
    u8 unused0;
    u16 num_queues;
    le32 max_discard_sectors;
    le32 max_discard_seg;
    le32 discard_sector_alignment;
    le32 max_write_zeroes_sectors;
    le32 max_write_zeroes_seg;
    u8 write_zeroes_may_unmap;
    u8 unused1[3];
    le32 max_secure_erase_sectors;
    le32 max_secure_erase_seg;
    le32 secure_erase_sector_alignment;
};
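As an illustration of how the little-endian fields might be decoded from a raw copy of the configuration space, the sketch below reads capacity and blk_size. The offsets are derived from the structure above under the assumption that it has no padding; the function and macro names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Offsets derived from struct virtio_blk_config, assuming no padding. */
#define CFG_OFF_CAPACITY 0
#define CFG_OFF_BLK_SIZE 20

/* Load a 32-bit little-endian value independent of host endianness. */
static uint32_t le32_load(const uint8_t *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* Load a 64-bit little-endian value as two 32-bit halves. */
static uint64_t le64_load(const uint8_t *p)
{
    return (uint64_t)le32_load(p) | (uint64_t)le32_load(p + 4) << 32;
}

/* capacity is always present. */
static uint64_t cfg_capacity(const uint8_t *cfg, size_t len)
{
    assert(len >= CFG_OFF_CAPACITY + 8);
    return le64_load(cfg + CFG_OFF_CAPACITY);
}

/* blk_size is only meaningful if VIRTIO_BLK_F_BLK_SIZE was negotiated. */
static uint32_t cfg_blk_size(const uint8_t *cfg, size_t len)
{
    assert(len >= CFG_OFF_BLK_SIZE + 4);
    return le32_load(cfg + CFG_OFF_BLK_SIZE);
}
```

Decoding byte by byte avoids any dependence on the host's native endianness, which matters because the non-legacy interface is always little-endian.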
When using the legacy interface, transitional devices and drivers MUST format the fields in struct virtio_blk_config according to the native endian of the guest rather than (necessarily when not using the legacy interface) little-endian.
Drivers SHOULD NOT negotiate VIRTIO_BLK_F_FLUSH if they are incapable of sending VIRTIO_BLK_T_FLUSH commands.
If neither VIRTIO_BLK_F_CONFIG_WCE nor VIRTIO_BLK_F_FLUSH are negotiated, the driver MAY deduce the presence of a writethrough cache. If VIRTIO_BLK_F_CONFIG_WCE was not negotiated but VIRTIO_BLK_F_FLUSH was, the driver SHOULD assume presence of a writeback cache.
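The cache-mode deduction described above could look roughly like this in a driver. This is a sketch under stated assumptions: the enum and function names are hypothetical, and the writeback value is interpreted as 0 for writethrough and 1 for writeback, as the driver requirements for VIRTIO_BLK_F_CONFIG_WCE describe.

```c
#include <assert.h>
#include <stdint.h>

/* Bit numbers copied from the feature list. */
#define VIRTIO_BLK_F_FLUSH       9
#define VIRTIO_BLK_F_CONFIG_WCE 11

enum blk_cache_mode { BLK_CACHE_WRITETHROUGH, BLK_CACHE_WRITEBACK };

/* Hypothetical helper. `writeback` is the configuration field; it is only
 * consulted when VIRTIO_BLK_F_CONFIG_WCE has been negotiated. */
static enum blk_cache_mode blk_deduce_cache(uint64_t negotiated, uint8_t writeback)
{
    int wce   = (negotiated >> VIRTIO_BLK_F_CONFIG_WCE) & 1;
    int flush = (negotiated >> VIRTIO_BLK_F_FLUSH) & 1;

    if (wce)
        return writeback ? BLK_CACHE_WRITEBACK : BLK_CACHE_WRITETHROUGH;
    /* Without CONFIG_WCE: FLUSH implies writeback, neither implies writethrough. */
    return flush ? BLK_CACHE_WRITEBACK : BLK_CACHE_WRITETHROUGH;
}
```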
The driver MUST NOT read writeback before setting the FEATURES_OK device status bit.
Devices SHOULD always offer VIRTIO_BLK_F_FLUSH, and MUST offer it if they offer VIRTIO_BLK_F_CONFIG_WCE.
If VIRTIO_BLK_F_CONFIG_WCE is negotiated but VIRTIO_BLK_F_FLUSH is not, the device MUST initialize writeback to 0.
The device MUST initialize padding bytes unused0 and unused1 to 0.
Because legacy devices do not have FEATURES_OK, transitional devices MUST implement slightly different behavior around feature negotiation when used through the legacy interface. In particular, when using the legacy interface:
• the driver MAY read or write writeback before setting the DRIVER or DRIVER_OK device status bit
• the device MUST NOT modify the cache mode (and writeback) as a result of a driver setting a status bit, unless the DRIVER_OK bit is being set and the driver has not set the VIRTIO_BLK_F_CONFIG_WCE driver feature bit.
• the device MUST NOT modify the cache mode (and writeback) as a result of a driver modifying the driver feature bits, for example if the driver sets the VIRTIO_BLK_F_CONFIG_WCE driver feature bit but does not set the VIRTIO_BLK_F_FLUSH bit.
The driver queues requests to the virtqueues, and they are used by the device (not necessarily in order). Each request is of form:
struct virtio_blk_req {
le32 type;
le32 reserved;
le64 sector;
u8 data[];
u8 status;
};
The type of the request is either a read (VIRTIO_BLK_T_IN), a write (VIRTIO_BLK_T_OUT), a discard (VIRTIO_BLK_T_DISCARD), a write zeroes (VIRTIO_BLK_T_WRITE_ZEROES), a flush (VIRTIO_BLK_T_FLUSH), a get device ID string command (VIRTIO_BLK_T_GET_ID), a secure erase (VIRTIO_BLK_T_SECURE_ERASE), or a get device lifetime command (VIRTIO_BLK_T_GET_LIFETIME).
#define VIRTIO_BLK_T_IN 0
#define VIRTIO_BLK_T_OUT 1
#define VIRTIO_BLK_T_FLUSH 4
#define VIRTIO_BLK_T_GET_ID 8
#define VIRTIO_BLK_T_GET_LIFETIME 10
#define VIRTIO_BLK_T_DISCARD 11
#define VIRTIO_BLK_T_WRITE_ZEROES 13
#define VIRTIO_BLK_T_SECURE_ERASE 14
The sector number indicates the offset (multiplied by 512) where the read or write is to occur. This field is unused and set to 0 for commands other than read or write.
VIRTIO_BLK_T_IN requests populate data with the contents of sectors read from the block device (in multiples of 512 bytes). VIRTIO_BLK_T_OUT requests write the contents of data to the block device (in multiples of 512 bytes).
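To make the header layout concrete, the following sketch fills the device-readable part of a read or write request (type, reserved, sector) for the non-legacy, little-endian interface. The helper names and BLK_SECTOR_SIZE are hypothetical; the caller is assumed to pass a byte offset that is a multiple of 512.

```c
#include <assert.h>
#include <stdint.h>

#define VIRTIO_BLK_T_IN  0
#define VIRTIO_BLK_T_OUT 1
#define BLK_SECTOR_SIZE  512u /* hypothetical name for the fixed 512-byte unit */

/* Store a 32-bit value in little-endian byte order. */
static void le32_store(uint8_t *p, uint32_t v)
{
    for (int i = 0; i < 4; i++)
        p[i] = (uint8_t)(v >> (8 * i));
}

/* Store a 64-bit value in little-endian byte order. */
static void le64_store(uint8_t *p, uint64_t v)
{
    for (int i = 0; i < 8; i++)
        p[i] = (uint8_t)(v >> (8 * i));
}

/* Fill the 16-byte header: le32 type, le32 reserved, le64 sector. */
static void blk_fill_header(uint8_t hdr[16], uint32_t type, uint64_t byte_offset)
{
    assert(byte_offset % BLK_SECTOR_SIZE == 0);
    le32_store(hdr, type);
    le32_store(hdr + 4, 0);                              /* reserved */
    le64_store(hdr + 8, byte_offset / BLK_SECTOR_SIZE);  /* sector */
}
```

A write at byte offset 4096 therefore carries sector 8 in the header.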
The data used for discard, secure erase or write zeroes commands consists of one or more segments. The maximum number of segments is max_discard_seg for discard commands, max_secure_erase_seg for secure erase commands and max_write_zeroes_seg for write zeroes commands. Each segment is of form:
struct virtio_blk_discard_write_zeroes {
le64 sector;
le32 num_sectors;
struct {
le32 unmap:1;
le32 reserved:31;
} flags;
};
sector indicates the starting offset (in 512-byte units) of the segment, while num_sectors indicates the number of sectors in each discarded range. unmap is only used in write zeroes commands and allows the device to discard the specified range, provided that following reads return zeroes.
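Since C bit-field layout is implementation-defined, a portable driver may prefer to encode each segment explicitly. The sketch below serializes one 16-byte segment; it assumes that unmap occupies the least significant bit of the flags word, and the names blk_encode_segment, BLK_SEG_SIZE and BLK_SEG_FLAG_UNMAP are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define BLK_SEG_SIZE       16   /* le64 sector + le32 num_sectors + le32 flags */
#define BLK_SEG_FLAG_UNMAP 0x1u /* assumption: unmap is the least significant bit */

/* Store the low `n` bytes of `v` in little-endian byte order. */
static void le_store(uint8_t *p, uint64_t v, int n)
{
    for (int i = 0; i < n; i++)
        p[i] = (uint8_t)(v >> (8 * i));
}

/* Serialize one discard / write zeroes / secure erase segment. */
static void blk_encode_segment(uint8_t out[BLK_SEG_SIZE], uint64_t sector,
                               uint32_t num_sectors, bool unmap)
{
    le_store(out, sector, 8);
    le_store(out + 8, num_sectors, 4);
    le_store(out + 12, unmap ? BLK_SEG_FLAG_UNMAP : 0u, 4);
}
```

For discard requests the caller would always pass unmap as false, because the flag is only used by write zeroes commands.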
VIRTIO_BLK_T_GET_ID requests fetch the device ID string from the device into data. The device ID string is a NUL-padded ASCII string up to 20 bytes long. If the string is 20 bytes long then there is no NUL terminator.
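Because a 20-byte ID has no NUL terminator, code that treats the result as a C string needs a 21-byte buffer. A minimal sketch with hypothetical names:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BLK_ID_BYTES 20 /* hypothetical name for the 20-byte ID length */

/* Copy the NUL-padded device ID into a NUL-terminated string.
 * Returns the length of the ID without the terminator. */
static size_t blk_id_to_cstr(const uint8_t id[BLK_ID_BYTES],
                             char out[BLK_ID_BYTES + 1])
{
    size_t n = 0;

    while (n < BLK_ID_BYTES && id[n] != '\0') {
        out[n] = (char)id[n];
        n++;
    }
    out[n] = '\0'; /* always terminate, even for a full 20-byte ID */
    return n;
}
```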
The data used for VIRTIO_BLK_T_GET_LIFETIME requests is populated by the device, and is of the form
struct virtio_blk_lifetime {
le16 pre_eol_info;
le16 device_lifetime_est_typ_a;
le16 device_lifetime_est_typ_b;
};
The pre_eol_info specifies the percentage of reserved blocks that are consumed and will have one of these values:
/* Value not available */
#define VIRTIO_BLK_PRE_EOL_INFO_UNDEFINED 0
/* < 80% of reserved blocks are consumed */
#define VIRTIO_BLK_PRE_EOL_INFO_NORMAL 1
/* 80% of reserved blocks are consumed */
#define VIRTIO_BLK_PRE_EOL_INFO_WARNING 2
/* 90% of reserved blocks are consumed */
#define VIRTIO_BLK_PRE_EOL_INFO_URGENT 3
/* All other values are reserved */
The device_lifetime_est_typ_a refers to wear of SLC cells and is provided in increments of 10%, with 0 meaning undefined, 1 meaning up-to 10% of lifetime used, and so on, thru to 11 meaning estimated lifetime exceeded. All values above 11 are reserved.
The device_lifetime_est_typ_b refers to wear of MLC cells and is provided with the same semantics as device_lifetime_est_typ_a.
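A driver might translate the two lifetime estimates into a percentage for reporting. The sketch below assumes the virtio 1.2 encoding in which 0 means undefined, values 1 to 10 mean up to 10% to 100% of lifetime used, and 11 means the estimated lifetime is exceeded. The function name and the two sentinel values are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define BLK_LIFETIME_UNKNOWN  (-1) /* 0 (undefined) or a reserved value above 11 */
#define BLK_LIFETIME_EXCEEDED (-2) /* 11: estimated lifetime exceeded */

/* Return the upper bound of consumed lifetime in percent (10..100),
 * or one of the negative sentinels above. */
static int blk_lifetime_used_pct(uint16_t est)
{
    if (est == 0 || est > 11)
        return BLK_LIFETIME_UNKNOWN;
    if (est == 11)
        return BLK_LIFETIME_EXCEEDED;
    return (int)est * 10;
}
```

The same helper applies to both device_lifetime_est_typ_a and device_lifetime_est_typ_b, since they share their semantics.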
The final status byte is written by the device: either VIRTIO_BLK_S_OK for success, VIRTIO_BLK_S_IOERR for device or driver error or VIRTIO_BLK_S_UNSUPP for a request unsupported by device:
#define VIRTIO_BLK_S_OK 0
#define VIRTIO_BLK_S_IOERR 1
#define VIRTIO_BLK_S_UNSUPP 2
The status of individual segments is indeterminate when a discard or write zeroes command produces VIRTIO_BLK_S_IOERR. A segment may have completed successfully, failed, or not been processed by the device.
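Drivers typically map the status byte to their own error codes. The mapping to errno values below is an illustrative choice, not something the specification prescribes; the function name is hypothetical, and unknown status values are treated as I/O errors by assumption.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define VIRTIO_BLK_S_OK     0
#define VIRTIO_BLK_S_IOERR  1
#define VIRTIO_BLK_S_UNSUPP 2

/* Translate the device-written status byte into 0 or a negative errno. */
static int blk_status_to_errno(uint8_t status)
{
    switch (status) {
    case VIRTIO_BLK_S_OK:
        return 0;
    case VIRTIO_BLK_S_UNSUPP:
        return -ENOTSUP;
    case VIRTIO_BLK_S_IOERR:
    default: /* assumption: unknown values are reported as I/O errors */
        return -EIO;
    }
}
```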
A driver MUST NOT submit a request which would cause a read or write beyond capacity.
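The capacity check can be written so that it does not overflow for large sector numbers. A minimal sketch with a hypothetical function name, where all three values are counted in 512-byte sectors:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if [sector, sector + nsectors) lies within the device capacity.
 * Written without computing sector + nsectors, which could wrap around. */
static bool blk_range_in_capacity(uint64_t sector, uint64_t nsectors,
                                  uint64_t capacity)
{
    return sector <= capacity && nsectors <= capacity - sector;
}
```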
A driver SHOULD accept the VIRTIO_BLK_F_RO feature if offered.
A driver MUST set sector to 0 for a VIRTIO_BLK_T_FLUSH request. A driver SHOULD NOT include any data in a VIRTIO_BLK_T_FLUSH request.
The length of data MUST be a multiple of 512 bytes for VIRTIO_BLK_T_IN and VIRTIO_BLK_T_OUT requests.
The length of data MUST be a multiple of the size of struct virtio_blk_discard_write_zeroes for VIRTIO_BLK_T_DISCARD, VIRTIO_BLK_T_SECURE_ERASE and VIRTIO_BLK_T_WRITE_ZEROES requests.
The length of data MUST be 20 bytes for VIRTIO_BLK_T_GET_ID requests.
VIRTIO_BLK_T_DISCARD requests MUST NOT contain more than max_discard_seg struct virtio_blk_discard_write_zeroes segments in data.
VIRTIO_BLK_T_SECURE_ERASE requests MUST NOT contain more than max_secure_erase_seg struct virtio_blk_discard_write_zeroes segments in data.
VIRTIO_BLK_T_WRITE_ZEROES requests MUST NOT contain more than max_write_zeroes_seg struct virtio_blk_discard_write_zeroes segments in data.
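The length rules above can be collected into one driver-side check, sketched below. The function name is hypothetical, max_seg stands for whichever of max_discard_seg, max_secure_erase_seg or max_write_zeroes_seg applies to the request type, and the segment size of 16 bytes assumes struct virtio_blk_discard_write_zeroes has no padding.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define VIRTIO_BLK_T_IN            0
#define VIRTIO_BLK_T_OUT           1
#define VIRTIO_BLK_T_FLUSH         4
#define VIRTIO_BLK_T_GET_ID        8
#define VIRTIO_BLK_T_DISCARD      11
#define VIRTIO_BLK_T_WRITE_ZEROES 13
#define VIRTIO_BLK_T_SECURE_ERASE 14

#define BLK_SEG_SIZE 16 /* assumed size of one discard/write zeroes segment */

/* Check the length of `data` against the rules for the request type. */
static bool blk_data_len_ok(uint32_t type, size_t data_len, uint32_t max_seg)
{
    switch (type) {
    case VIRTIO_BLK_T_IN:
    case VIRTIO_BLK_T_OUT:
        return data_len % 512 == 0;
    case VIRTIO_BLK_T_GET_ID:
        return data_len == 20;
    case VIRTIO_BLK_T_FLUSH:
        return data_len == 0; /* SHOULD NOT include any data */
    case VIRTIO_BLK_T_DISCARD:
    case VIRTIO_BLK_T_WRITE_ZEROES:
    case VIRTIO_BLK_T_SECURE_ERASE:
        /* one or more whole segments, at most max_seg of them */
        return data_len >= BLK_SEG_SIZE && data_len % BLK_SEG_SIZE == 0 &&
               data_len / BLK_SEG_SIZE <= max_seg;
    default:
        return true; /* no length rule is stated above for other types */
    }
}
```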
If the VIRTIO_BLK_F_CONFIG_WCE feature is negotiated, the driver MAY switch to writethrough or writeback mode by writing respectively 0 and 1 to the writeback field. After writing a 0 to writeback, the driver MUST NOT assume that any volatile writes have been committed to persistent device backend storage.
The unmap bit MUST be zero for discard commands. The driver MUST NOT assume anything about the data returned by read requests after a range of sectors has been discarded.
A driver MUST NOT assume that individual segments in a multi-segment VIRTIO_BLK_T_DISCARD or VIRTIO_BLK_T_WRITE_ZEROES request completed successfully, failed, or were processed by the device at all if the request failed with VIRTIO_BLK_S_IOERR.
A device MUST set the status byte to VIRTIO_BLK_S_IOERR for a write request if the VIRTIO_BLK_F_RO feature is offered, and MUST NOT write any data.
The device MUST set the status byte to VIRTIO_BLK_S_UNSUPP for discard, secure erase and write zeroes commands if any unknown flag is set. Furthermore, the device MUST set the status byte to VIRTIO_BLK_S_UNSUPP for discard commands if the unmap flag is set.
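On the device side, the flag rules above amount to a short validation step per segment. The sketch below is one possible shape for it, assuming that unmap is the least significant bit of the flags word; the function and macro names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define VIRTIO_BLK_T_DISCARD      11
#define VIRTIO_BLK_T_WRITE_ZEROES 13
#define VIRTIO_BLK_T_SECURE_ERASE 14

#define VIRTIO_BLK_S_OK     0
#define VIRTIO_BLK_S_UNSUPP 2

#define BLK_SEG_FLAG_UNMAP 0x1u /* assumption: unmap is the least significant bit */

/* Return the status the device should report for a segment's flags. */
static uint8_t blk_check_segment_flags(uint32_t type, uint32_t flags)
{
    if (flags & ~BLK_SEG_FLAG_UNMAP)
        return VIRTIO_BLK_S_UNSUPP; /* unknown flag set */
    if (type == VIRTIO_BLK_T_DISCARD && (flags & BLK_SEG_FLAG_UNMAP))
        return VIRTIO_BLK_S_UNSUPP; /* unmap is not allowed for discard */
    return VIRTIO_BLK_S_OK;
}
```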
For discard commands, the device MAY deallocate the specified range of sectors in the device backend storage.
For write zeroes commands, if the unmap is set, the device MAY deallocate the specified range of sectors in the device backend storage, as if the discard command had been sent. After a write zeroes command is completed, reads of the specified ranges of sectors MUST return zeroes. This is true independent of whether unmap was set or clear.
The device SHOULD clear the write_zeroes_may_unmap field of the virtio configuration space if and only if a write zeroes request cannot result in deallocating one or more sectors. The device MAY change the content of the field during operation of the device; when this happens, the device SHOULD trigger a configuration change notification.
A write is considered volatile when it is submitted; the contents of sectors covered by a volatile write are undefined in persistent device backend storage until the write becomes stable. A write becomes stable once it is completed and one or more of the following conditions is true:
When using the legacy interface, transitional devices and drivers MUST format the fields in struct virtio_blk_req according to the native endian of the guest rather than (necessarily when not using the legacy interface) little-endian.
When using the legacy interface, transitional drivers SHOULD ignore the used length values.
Note: Historically, some devices put the total descriptor length, or the total length of device-writable buffers there, even when only the status byte was actually written.
The reserved field was previously called ioprio. ioprio is a hint about the relative priorities of requests to the device: higher numbers indicate more important requests.
#define VIRTIO_BLK_T_FLUSH_OUT 5
The command VIRTIO_BLK_T_FLUSH_OUT was a synonym for VIRTIO_BLK_T_FLUSH; a driver MUST treat it as a VIRTIO_BLK_T_FLUSH command.
#define VIRTIO_BLK_T_BARRIER 0x80000000
If the device has VIRTIO_BLK_F_BARRIER feature the high bit (VIRTIO_BLK_T_BARRIER) indicates that this request acts as a barrier and that all preceding requests SHOULD be complete before this one, and all following requests SHOULD NOT be started until this is complete.
Note: A barrier does not flush caches in the underlying backend device in host, and thus does not serve as data consistency guarantee. Only a VIRTIO_BLK_T_FLUSH request does that.
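Since the barrier indication is carried in the high bit of type, a legacy device would separate it from the base request type before dispatching. A minimal sketch with hypothetical helper names:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define VIRTIO_BLK_T_OUT     1
#define VIRTIO_BLK_T_BARRIER 0x80000000u

/* True if the request asks to act as a barrier. */
static bool blk_type_is_barrier(uint32_t type)
{
    return (type & VIRTIO_BLK_T_BARRIER) != 0;
}

/* The request type with the barrier bit masked off. */
static uint32_t blk_type_base(uint32_t type)
{
    return type & ~VIRTIO_BLK_T_BARRIER;
}
```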
Some older legacy devices did not commit completed writes to persistent device backend storage when VIRTIO_BLK_F_FLUSH was offered but not negotiated. In order to work around this, the driver MAY set the writeback to 0 (if available) or it MAY send an explicit flush request after every completed write.
Note that in this case, according to 5.2.5.2, the device will not have offered VIRTIO_BLK_F_CONFIG_WCE either.
If the device has VIRTIO_BLK_F_SCSI feature, it can also support scsi packet command requests. Each of these requests is of form:
/* All fields are in guest's native endian. */
struct virtio_scsi_pc_req {
u32 type;
u32 ioprio;
u64 sector;
u8 cmd[];
u8 data[][512];
#define SCSI_SENSE_BUFFERSIZE 96
u8 sense[SCSI_SENSE_BUFFERSIZE];
u32 errors;
u32 data_len;
u32 sense_len;
u32 residual;
u8 status;
};
A request type can also be a scsi packet command (VIRTIO_BLK_T_SCSI_CMD or VIRTIO_BLK_T_SCSI_CMD_OUT). The two types are equivalent; the device does not distinguish between them:
#define VIRTIO_BLK_T_SCSI_CMD 2
#define VIRTIO_BLK_T_SCSI_CMD_OUT 3
The cmd field is only present for scsi packet command requests, and indicates the command to perform. This field MUST reside in a single, separate device-readable buffer; command length can be derived from the length of this buffer.
Note that these first three (four for scsi packet commands) fields are always device-readable: data is either device-readable or device-writable, depending on the request. The size of the read or write can be derived from the total size of the request buffers.
sense is only present for scsi packet command requests, and indicates the buffer for scsi sense data. data_len is only present for scsi packet command requests; this field is deprecated and SHOULD be ignored by the driver. Historically, devices copied data length there.
sense_len is only present for scsi packet command requests and indicates the number of bytes actually written to the sense buffer.
residual field is only present for scsi packet command requests and indicates the residual size, calculated as data length - number of bytes actually transferred.
When using legacy interfaces, transitional drivers which have not negotiated VIRTIO_F_ANY_LAYOUT:
• MUST use a single 16-byte descriptor containing type, reserved and sector (4 + 4 + 8 bytes), followed by descriptors for data, then finally a separate 1-byte descriptor for status.
• For SCSI commands there are additional constraints. sense MUST reside in a single separate device-writable descriptor of size 96 bytes, and errors, data_len, sense_len and residual MUST reside in a single separate device-writable descriptor.
See 2.7.4.
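The fixed framing for legacy drivers without VIRTIO_F_ANY_LAYOUT can be pictured as a three-part descriptor chain. The sketch below only models descriptor lengths and direction; struct blk_desc and blk_legacy_layout are hypothetical, a single data buffer is assumed, and the header length of 16 bytes follows from the le32 type, le32 reserved and le64 sector fields of struct virtio_blk_req.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified view of one descriptor in the chain. */
struct blk_desc {
    uint32_t len;
    bool device_writable;
};

/* Lay out header, one data buffer and status for a read or write request.
 * Returns the number of descriptors used. */
static size_t blk_legacy_layout(struct blk_desc d[3], uint32_t data_len,
                                bool is_read)
{
    d[0] = (struct blk_desc){ 16, false };         /* type + reserved + sector */
    d[1] = (struct blk_desc){ data_len, is_read }; /* data: writable for reads */
    d[2] = (struct blk_desc){ 1, true };           /* status byte */
    return 3;
}
```

A real driver may split data across several descriptors; only the header and status descriptors have fixed sizes.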