virtio device type: Block Device

This article is based on a translation of the official virtio 1.2 specification, which can be downloaded from: http://docs.oasis-open.org/virtio/virtio/v1.2/

5.2 Block Device

The virtio block device is a simple virtual block device (i.e., a disk). Read and write requests (and other exotic requests) are placed in one of its queues, and serviced (probably out of order) by the device except where noted.

5.2.1 Device ID

2

5.2.2 Virtqueues

0 requestq1
...
N-1 requestqN
N=1 if VIRTIO_BLK_F_MQ is not negotiated, otherwise N is set by num_queues.

5.2.3 Feature bits

VIRTIO_BLK_F_SIZE_MAX (1) Maximum size of any single segment is in size_max.
VIRTIO_BLK_F_SEG_MAX (2) Maximum number of segments in a request is in seg_max.
VIRTIO_BLK_F_GEOMETRY (4) Disk-style geometry specified in geometry.
VIRTIO_BLK_F_RO (5) Device is read-only.
VIRTIO_BLK_F_BLK_SIZE (6) Block size of disk is in blk_size.
VIRTIO_BLK_F_FLUSH (9) Cache flush command support.
VIRTIO_BLK_F_TOPOLOGY (10) Device exports information on optimal I/O alignment.
VIRTIO_BLK_F_CONFIG_WCE (11) Device can toggle its cache between writeback and writethrough modes.
VIRTIO_BLK_F_MQ (12) Device supports multiqueue.
VIRTIO_BLK_F_DISCARD (13) Device can support discard command, maximum discard sectors size in max_discard_sectors and maximum discard segment number in max_discard_seg.
VIRTIO_BLK_F_WRITE_ZEROES (14) Device can support write zeroes command, maximum write zeroes sectors size in max_write_zeroes_sectors and maximum write zeroes segment number in max_write_zeroes_seg.
VIRTIO_BLK_F_LIFETIME (15) Device supports providing storage lifetime information.
VIRTIO_BLK_F_SECURE_ERASE (16) Device supports secure erase command, maximum erase sectors count in max_secure_erase_sectors and maximum erase segment number in max_secure_erase_seg.
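
The numbers in parentheses are bit positions in the feature bitmap, not mask values. The following is a minimal sketch of testing them, assuming the negotiated features are held in a 64-bit integer. The helper names virtio_has_feature and blk_num_request_queues are hypothetical and do not come from the specification.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit positions from section 5.2.3. */
#define VIRTIO_BLK_F_SIZE_MAX      1
#define VIRTIO_BLK_F_SEG_MAX       2
#define VIRTIO_BLK_F_GEOMETRY      4
#define VIRTIO_BLK_F_RO            5
#define VIRTIO_BLK_F_BLK_SIZE      6
#define VIRTIO_BLK_F_FLUSH         9
#define VIRTIO_BLK_F_TOPOLOGY     10
#define VIRTIO_BLK_F_CONFIG_WCE   11
#define VIRTIO_BLK_F_MQ           12
#define VIRTIO_BLK_F_DISCARD      13
#define VIRTIO_BLK_F_WRITE_ZEROES 14
#define VIRTIO_BLK_F_LIFETIME     15
#define VIRTIO_BLK_F_SECURE_ERASE 16

/* Hypothetical helper: true if feature bit `bit` is set in `features`. */
static bool virtio_has_feature(uint64_t features, unsigned bit)
{
	return (features >> bit) & 1u;
}

/* Number of request queues (section 5.2.2): 1 unless VIRTIO_BLK_F_MQ was
 * negotiated, in which case it comes from the num_queues config field. */
static unsigned blk_num_request_queues(uint64_t features, uint16_t num_queues)
{
	return virtio_has_feature(features, VIRTIO_BLK_F_MQ) ? num_queues : 1;
}
```

A mask for a single feature is therefore `1ULL << VIRTIO_BLK_F_FLUSH`, not the literal value 9.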

5.2.3.1 Legacy Interface: Feature bits

VIRTIO_BLK_F_BARRIER (0) Device supports request barriers.
VIRTIO_BLK_F_SCSI (7) Device supports scsi packet commands.
Note: In the legacy interface, VIRTIO_BLK_F_FLUSH was also called VIRTIO_BLK_F_WCE.

5.2.4 Device configuration layout

The capacity of the device (expressed in 512-byte sectors) is always present. The availability of the other fields depends on various feature bits as indicated above.
The field num_queues only exists if VIRTIO_BLK_F_MQ is set. This field specifies the number of queues.
The configuration space parameters max_discard_sectors and discard_sector_alignment are expressed in 512-byte units if the VIRTIO_BLK_F_DISCARD feature bit is negotiated. The max_write_zeroes_sectors parameter is expressed in 512-byte units if the VIRTIO_BLK_F_WRITE_ZEROES feature bit is negotiated. The configuration space parameters max_secure_erase_sectors and secure_erase_sector_alignment are expressed in 512-byte units if the VIRTIO_BLK_F_SECURE_ERASE feature bit is negotiated.

struct virtio_blk_config {
	le64 capacity;
	le32 size_max;
	le32 seg_max;
	struct virtio_blk_geometry {
		le16 cylinders;
		u8 heads;
		u8 sectors;
	} geometry;
	le32 blk_size;
	struct virtio_blk_topology {
		// # of logical blocks per physical block (log2)
		u8 physical_block_exp;
		// offset of first aligned logical block
		u8 alignment_offset;
		// suggested minimum I/O size in blocks
		le16 min_io_size;
		// optimal (suggested maximum) I/O size in blocks
		le32 opt_io_size;
	} topology;
	u8 writeback;
	u8 unused0;
	u16 num_queues;
	le32 max_discard_sectors;
	le32 max_discard_seg;
	le32 discard_sector_alignment;
	le32 max_write_zeroes_sectors;
	le32 max_write_zeroes_seg;
	u8 write_zeroes_may_unmap;
	u8 unused1[3];
	le32 max_secure_erase_sectors;
	le32 max_secure_erase_seg;
	le32 secure_erase_sector_alignment;
};
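
As a sanity check on the layout, the sketch below mirrors the structure with fixed-width C types and decodes capacity from raw little-endian bytes. The names blk_config_layout, load_le64 and capacity_in_bytes are illustrative only. The mirror struct assumes a common ABI with natural alignment and is meant for checking offsets, not as a portable way to access configuration space.

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative mirror of struct virtio_blk_config. Multi-byte fields are
 * little-endian in the device, so this is used for offsets only. */
struct blk_config_layout {
	uint64_t capacity;
	uint32_t size_max;
	uint32_t seg_max;
	struct {
		uint16_t cylinders;
		uint8_t heads;
		uint8_t sectors;
	} geometry;
	uint32_t blk_size;
	struct {
		uint8_t physical_block_exp;
		uint8_t alignment_offset;
		uint16_t min_io_size;
		uint32_t opt_io_size;
	} topology;
	uint8_t writeback;
	uint8_t unused0;
	uint16_t num_queues;
	uint32_t max_discard_sectors;
	uint32_t max_discard_seg;
	uint32_t discard_sector_alignment;
	uint32_t max_write_zeroes_sectors;
	uint32_t max_write_zeroes_seg;
	uint8_t write_zeroes_may_unmap;
	uint8_t unused1[3];
	uint32_t max_secure_erase_sectors;
	uint32_t max_secure_erase_seg;
	uint32_t secure_erase_sector_alignment;
};

/* Read a little-endian 64-bit value regardless of host byte order. */
static uint64_t load_le64(const uint8_t *p)
{
	uint64_t v = 0;
	for (int i = 7; i >= 0; i--)
		v = (v << 8) | p[i];
	return v;
}

/* capacity is expressed in 512-byte sectors, so scale it to bytes. */
static uint64_t capacity_in_bytes(const uint8_t *config_space)
{
	size_t off = offsetof(struct blk_config_layout, capacity);
	return load_le64(config_space + off) * 512u;
}
```

Under these assumptions the structure occupies 72 bytes, with writeback at offset 32 and num_queues at offset 34.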

5.2.4.1 Legacy Interface: Device configuration layout

When using the legacy interface, transitional devices and drivers MUST format the fields in struct virtio_blk_config according to the native endian of the guest rather than (necessarily when not using the legacy interface) little-endian.

5.2.5 Device Initialization

  1. The device size can be read from capacity.
  2. If the VIRTIO_BLK_F_BLK_SIZE feature is negotiated, blk_size can be read to determine the optimal sector size for the driver to use. This does not affect the units used in the protocol (always 512 bytes), but awareness of the correct value can affect performance.
  3. If the VIRTIO_BLK_F_RO feature is set by the device, any write requests will fail.
  4. If the VIRTIO_BLK_F_TOPOLOGY feature is negotiated, the fields in the topology struct can be read to determine the physical block size and optimal I/O lengths for the driver to use. This also does not affect the units in the protocol, only performance.
  5. If the VIRTIO_BLK_F_CONFIG_WCE feature is negotiated, the cache mode can be read or set through the writeback field. 0 corresponds to a writethrough cache, 1 to a writeback cache. The cache mode after reset can be either writeback or writethrough. The actual mode can be determined by reading writeback after feature negotiation.
  6. If the VIRTIO_BLK_F_DISCARD feature is negotiated, max_discard_sectors and max_discard_seg can be read to determine the maximum discard sectors and maximum number of discard segments for the block driver to use. discard_sector_alignment can be used by OS when splitting a request based on alignment.
  7. If the VIRTIO_BLK_F_WRITE_ZEROES feature is negotiated, max_write_zeroes_sectors and max_write_zeroes_seg can be read to determine the maximum write zeroes sectors and maximum number of write zeroes segments for the block driver to use.
  8. If the VIRTIO_BLK_F_MQ feature is negotiated, num_queues field can be read to determine the number of queues.
  9. If the VIRTIO_BLK_F_SECURE_ERASE feature is negotiated, max_secure_erase_sectors and max_secure_erase_seg can be read to determine the maximum secure erase sectors and maximum number of secure erase segments for the block driver to use. secure_erase_sector_alignment can be used by OS when splitting a request based on alignment.
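
Steps 2 and 4 above can be sketched as follows. This is an illustration only: it assumes the configuration fields have already been read and converted to host byte order, and the names blk_limits and blk_read_limits are hypothetical. Falling back to 512 bytes when VIRTIO_BLK_F_BLK_SIZE is not negotiated is a choice made for this sketch.

```c
#include <stdint.h>

#define VIRTIO_BLK_F_BLK_SIZE  6
#define VIRTIO_BLK_F_TOPOLOGY 10

/* Hypothetical summary of the block sizes a driver might derive. */
struct blk_limits {
	uint32_t logical_block_size;   /* bytes */
	uint32_t physical_block_size;  /* bytes */
};

/* Inputs are config fields already converted to host byte order. */
static struct blk_limits blk_read_limits(uint64_t features, uint32_t blk_size,
					 uint8_t physical_block_exp)
{
	struct blk_limits l;

	/* Without VIRTIO_BLK_F_BLK_SIZE the field is not available; this
	 * sketch falls back to the 512-byte protocol unit. */
	l.logical_block_size =
		((features >> VIRTIO_BLK_F_BLK_SIZE) & 1) ? blk_size : 512;

	/* physical_block_exp is log2(logical blocks per physical block). */
	l.physical_block_size = l.logical_block_size;
	if ((features >> VIRTIO_BLK_F_TOPOLOGY) & 1)
		l.physical_block_size <<= physical_block_exp;
	return l;
}
```

Neither value changes the units of the sector field in requests, which remain 512 bytes.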

5.2.5.1 Driver Requirements: Device Initialization

Drivers SHOULD NOT negotiate VIRTIO_BLK_F_FLUSH if they are incapable of sending VIRTIO_BLK_T_FLUSH commands.
If neither VIRTIO_BLK_F_CONFIG_WCE nor VIRTIO_BLK_F_FLUSH are negotiated, the driver MAY deduce the presence of a writethrough cache. If VIRTIO_BLK_F_CONFIG_WCE was not negotiated but VIRTIO_BLK_F_FLUSH was, the driver SHOULD assume presence of a writeback cache.
The driver MUST NOT read writeback before setting the FEATURES_OK device status bit.
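
The cache-mode deduction described above can be written as a small decision function. This is a sketch under stated assumptions: negotiated is the negotiated feature bitmap, writeback is the value of the writeback configuration field read after FEATURES_OK was set, and the names blk_cache_mode and blk_deduce_cache_mode are hypothetical.

```c
#include <stdint.h>

#define VIRTIO_BLK_F_FLUSH       9
#define VIRTIO_BLK_F_CONFIG_WCE 11

enum blk_cache_mode {
	BLK_CACHE_WRITETHROUGH = 0,
	BLK_CACHE_WRITEBACK = 1
};

/* `writeback` is only consulted when VIRTIO_BLK_F_CONFIG_WCE was negotiated;
 * the caller must have read it after setting FEATURES_OK. */
static enum blk_cache_mode blk_deduce_cache_mode(uint64_t negotiated,
						 uint8_t writeback)
{
	int wce = (negotiated >> VIRTIO_BLK_F_CONFIG_WCE) & 1;
	int flush = (negotiated >> VIRTIO_BLK_F_FLUSH) & 1;

	if (wce)	/* the config field reports the actual mode */
		return writeback ? BLK_CACHE_WRITEBACK : BLK_CACHE_WRITETHROUGH;
	if (flush)	/* driver SHOULD assume a writeback cache */
		return BLK_CACHE_WRITEBACK;
	return BLK_CACHE_WRITETHROUGH;	/* driver MAY deduce writethrough */
}
```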

5.2.5.2 Device Requirements: Device Initialization

Devices SHOULD always offer VIRTIO_BLK_F_FLUSH, and MUST offer it if they offer VIRTIO_BLK_F_CONFIG_WCE.
If VIRTIO_BLK_F_CONFIG_WCE is negotiated but VIRTIO_BLK_F_FLUSH is not, the device MUST initialize writeback to 0.
The device MUST initialize padding bytes unused0 and unused1 to 0.

5.2.5.3 Legacy Interface: Device Initialization

Because legacy devices do not have FEATURES_OK, transitional devices MUST implement slightly different behavior around feature negotiation when used through the legacy interface. In particular, when using the legacy interface:
• the driver MAY read or write writeback before setting the DRIVER or DRIVER_OK device status bit
• the device MUST NOT modify the cache mode (and writeback) as a result of a driver setting a status bit, unless the DRIVER_OK bit is being set and the driver has not set the VIRTIO_BLK_F_CONFIG_WCE driver feature bit.
• the device MUST NOT modify the cache mode (and writeback) as a result of a driver modifying the driver feature bits, for example if the driver sets the VIRTIO_BLK_F_CONFIG_WCE driver feature bit but does not set the VIRTIO_BLK_F_FLUSH bit.

5.2.6 Device Operation

The driver queues requests to the virtqueues, and they are used by the device (not necessarily in order). Each request is of form:

struct virtio_blk_req {
	le32 type;
	le32 reserved;
	le64 sector;
	u8 data[];
	u8 status;
};

The type of the request is either a read (VIRTIO_BLK_T_IN), a write (VIRTIO_BLK_T_OUT), a discard (VIRTIO_BLK_T_DISCARD), a write zeroes (VIRTIO_BLK_T_WRITE_ZEROES), a flush (VIRTIO_BLK_T_FLUSH), a get device ID string command (VIRTIO_BLK_T_GET_ID), a secure erase (VIRTIO_BLK_T_SECURE_ERASE), or a get device lifetime command (VIRTIO_BLK_T_GET_LIFETIME).

#define VIRTIO_BLK_T_IN           0
#define VIRTIO_BLK_T_OUT          1
#define VIRTIO_BLK_T_FLUSH        4
#define VIRTIO_BLK_T_GET_ID       8
#define VIRTIO_BLK_T_GET_LIFETIME 10
#define VIRTIO_BLK_T_DISCARD      11
#define VIRTIO_BLK_T_WRITE_ZEROES 13
#define VIRTIO_BLK_T_SECURE_ERASE 14

The sector number indicates the offset (multiplied by 512) where the read or write is to occur. This field is unused and set to 0 for commands other than read or write.
VIRTIO_BLK_T_IN requests populate data with the contents of sectors read from the block device (in multiples of 512 bytes). VIRTIO_BLK_T_OUT requests write the contents of data to the block device (in multiples of 512 bytes).
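
To make the sector arithmetic concrete, the sketch below fills the device-readable part of a read or write request (type, reserved, sector) as 16 little-endian bytes. The function names are hypothetical, and rejecting byte offsets that are not a multiple of 512 is a choice made for this sketch.

```c
#include <stdint.h>

#define VIRTIO_BLK_T_IN  0
#define VIRTIO_BLK_T_OUT 1

/* Store values in little-endian order regardless of host byte order. */
static void store_le32(uint8_t *p, uint32_t v)
{
	for (int i = 0; i < 4; i++)
		p[i] = (uint8_t)(v >> (8 * i));
}

static void store_le64(uint8_t *p, uint64_t v)
{
	for (int i = 0; i < 8; i++)
		p[i] = (uint8_t)(v >> (8 * i));
}

/* Fill type, reserved and sector for a read or write at `byte_offset`.
 * Returns 0 on success, -1 if the offset is not a multiple of 512. */
static int blk_fill_rw_header(uint8_t hdr[16], uint32_t type,
			      uint64_t byte_offset)
{
	if (byte_offset % 512 != 0)
		return -1;
	store_le32(hdr, type);
	store_le32(hdr + 4, 0);			/* reserved */
	store_le64(hdr + 8, byte_offset / 512);	/* sector */
	return 0;
}
```

For example, a write at byte offset 1024 uses sector 2.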
The data used for discard, secure erase or write zeroes commands consists of one or more segments. The maximum number of segments is max_discard_seg for discard commands, max_secure_erase_seg for secure erase commands and max_write_zeroes_seg for write zeroes commands. Each segment is of form:

struct virtio_blk_discard_write_zeroes {
	le64 sector;
	le32 num_sectors;
	struct {
		le32 unmap:1;
		le32 reserved:31;
	} flags;
};

sector indicates the starting offset (in 512-byte units) of the segment, while num_sectors indicates the number of sectors in each discarded range. unmap is only used in write zeroes commands and allows the device to discard the specified range, provided that following reads return zeroes.
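
A segment can be serialized by hand as 16 little-endian bytes: sector, num_sectors, then a 32-bit flags word. The sketch below assumes unmap occupies the least significant bit of the flags word. The function name blk_encode_segment is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Encode one discard / write zeroes / secure erase segment.
 * Layout: le64 sector, le32 num_sectors, le32 flags (bit 0 = unmap). */
static void blk_encode_segment(uint8_t out[16], uint64_t sector,
			       uint32_t num_sectors, bool unmap)
{
	uint32_t flags = unmap ? 1u : 0u;	/* reserved bits stay zero */

	for (int i = 0; i < 8; i++)
		out[i] = (uint8_t)(sector >> (8 * i));
	for (int i = 0; i < 4; i++)
		out[8 + i] = (uint8_t)(num_sectors >> (8 * i));
	for (int i = 0; i < 4; i++)
		out[12 + i] = (uint8_t)(flags >> (8 * i));
}
```

For discard commands the caller would pass false for unmap, since the bit is only meaningful for write zeroes.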
VIRTIO_BLK_T_GET_ID requests fetch the device ID string from the device into data. The device ID string is a NUL-padded ASCII string up to 20 bytes long. If the string is 20 bytes long then there is no NUL terminator.
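
Because a full 20-byte ID has no NUL terminator, a driver cannot treat the buffer as a C string directly. The following sketch copies it into a 21-byte buffer that is always terminated. The constant BLK_ID_BYTES and the function name are hypothetical labels for this example.

```c
#include <stddef.h>
#include <string.h>

#define BLK_ID_BYTES 20	/* length of the GET_ID data buffer */

/* Copy the NUL-padded ID into `out` (BLK_ID_BYTES + 1 bytes) and terminate
 * it. Returns the length of the resulting string. */
static size_t blk_id_to_cstring(const unsigned char id[BLK_ID_BYTES],
				char out[BLK_ID_BYTES + 1])
{
	size_t n = 0;

	while (n < BLK_ID_BYTES && id[n] != '\0')
		n++;
	memcpy(out, id, n);
	out[n] = '\0';
	return n;
}
```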
The data used for VIRTIO_BLK_T_GET_LIFETIME requests is populated by the device, and is of the form:

struct virtio_blk_lifetime {
	le16 pre_eol_info;
	le16 device_lifetime_est_typ_a;
	le16 device_lifetime_est_typ_b;
};

The pre_eol_info specifies the percentage of reserved blocks that are consumed and will have one of these values:

/* Value not available */
#define VIRTIO_BLK_PRE_EOL_INFO_UNDEFINED 0
/* < 80% of reserved blocks are consumed */
#define VIRTIO_BLK_PRE_EOL_INFO_NORMAL 1
/* 80% of reserved blocks are consumed */
#define VIRTIO_BLK_PRE_EOL_INFO_WARNING 2
/* 90% of reserved blocks are consumed */
#define VIRTIO_BLK_PRE_EOL_INFO_URGENT 3
/* All others values are reserved */

The device_lifetime_est_typ_a refers to wear of SLC cells and is provided in increments of 10%, with 0 meaning undefined, 1 meaning up to 10% of lifetime used, and so on, through to 11 meaning estimated lifetime exceeded. All values above 11 are reserved.
The device_lifetime_est_typ_b refers to wear of MLC cells and is provided with the same semantics as device_lifetime_est_typ_a.
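
A driver might decode either estimate as shown below. The sketch assumes the encoding in which 0 means undefined, 1 through 10 mean up to 10% through 100% of lifetime used, 11 means lifetime exceeded, and larger values are reserved. The function name and the negative sentinel values are hypothetical.

```c
#include <stdint.h>

/* Hypothetical sentinels for values that are not a percentage. */
#define BLK_LIFETIME_UNDEFINED (-1)
#define BLK_LIFETIME_EXCEEDED  (-2)
#define BLK_LIFETIME_RESERVED  (-3)

/* Returns the upper bound of lifetime used in percent (10..100),
 * or one of the negative sentinels above. */
static int blk_lifetime_percent_used(uint16_t est)
{
	if (est == 0)
		return BLK_LIFETIME_UNDEFINED;
	if (est <= 10)
		return est * 10;
	if (est == 11)
		return BLK_LIFETIME_EXCEEDED;
	return BLK_LIFETIME_RESERVED;
}
```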
The final status byte is written by the device: either VIRTIO_BLK_S_OK for success, VIRTIO_BLK_S_IOERR for device or driver error or VIRTIO_BLK_S_UNSUPP for a request unsupported by device:

#define VIRTIO_BLK_S_OK 0
#define VIRTIO_BLK_S_IOERR 1
#define VIRTIO_BLK_S_UNSUPP 2

The status of individual segments is indeterminate when a discard or write zero command produces VIRTIO_BLK_S_IOERR. A segment may have completed successfully, failed, or not been processed by the device.
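
One possible way for a driver to surface the status byte is to map it to an errno-style code. The mapping below is an assumption of this sketch, not something the specification defines, and the function name is hypothetical. Unknown status values are treated as I/O errors.

```c
#include <errno.h>
#include <stdint.h>

#define VIRTIO_BLK_S_OK     0
#define VIRTIO_BLK_S_IOERR  1
#define VIRTIO_BLK_S_UNSUPP 2

/* Map the device-written status byte to 0 or a negative errno value. */
static int blk_status_to_errno(uint8_t status)
{
	switch (status) {
	case VIRTIO_BLK_S_OK:
		return 0;
	case VIRTIO_BLK_S_UNSUPP:
		return -ENOTSUP;
	case VIRTIO_BLK_S_IOERR:
	default:
		return -EIO;
	}
}
```

For a failed multi-segment discard or write zeroes request, the single error code says nothing about which segments were processed.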

5.2.6.1 Driver Requirements: Device Operation

A driver MUST NOT submit a request which would cause a read or write beyond capacity.
A driver SHOULD accept the VIRTIO_BLK_F_RO feature if offered.
A driver MUST set sector to 0 for a VIRTIO_BLK_T_FLUSH request. A driver SHOULD NOT include any data in a VIRTIO_BLK_T_FLUSH request.
The length of data MUST be a multiple of 512 bytes for VIRTIO_BLK_T_IN and VIRTIO_BLK_T_OUT requests.
The length of data MUST be a multiple of the size of struct virtio_blk_discard_write_zeroes for VIRTIO_BLK_T_DISCARD, VIRTIO_BLK_T_SECURE_ERASE and VIRTIO_BLK_T_WRITE_ZEROES requests.
The length of data MUST be 20 bytes for VIRTIO_BLK_T_GET_ID requests.
VIRTIO_BLK_T_DISCARD requests MUST NOT contain more than max_discard_seg struct virtio_blk_discard_write_zeroes segments in data.
VIRTIO_BLK_T_SECURE_ERASE requests MUST NOT contain more than max_secure_erase_seg struct virtio_blk_discard_write_zeroes segments in data.
VIRTIO_BLK_T_WRITE_ZEROES requests MUST NOT contain more than max_write_zeroes_seg struct virtio_blk_discard_write_zeroes segments in data.
If the VIRTIO_BLK_F_CONFIG_WCE feature is negotiated, the driver MAY switch to writethrough or writeback mode by writing respectively 0 and 1 to the writeback field. After writing a 0 to writeback, the driver MUST NOT assume that any volatile writes have been committed to persistent device backend storage.
The unmap bit MUST be zero for discard commands. The driver MUST NOT assume anything about the data returned by read requests after a range of sectors has been discarded.
A driver MUST NOT assume that individual segments in a multi-segment VIRTIO_BLK_T_DISCARD or VIRTIO_BLK_T_WRITE_ZEROES request completed successfully, failed, or were processed by the device at all if the request failed with VIRTIO_BLK_S_IOERR.
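
Several of these requirements are simple arithmetic checks that a driver can run before queuing a request. The sketch below covers the capacity bound, the 512-byte multiple rule, and the segment count limit. The function names are hypothetical, and the 16-byte segment size is the sum of the fields of struct virtio_blk_discard_write_zeroes.

```c
#include <stdbool.h>
#include <stdint.h>

#define BLK_SECTOR_SIZE  512u
#define BLK_SEGMENT_SIZE 16u	/* le64 sector + le32 num_sectors + le32 flags */

/* Read/write check: data length is a multiple of 512 bytes and the range
 * [sector, sector + nsectors) stays within capacity (in sectors). */
static bool blk_rw_request_ok(uint64_t capacity, uint64_t sector,
			      uint64_t data_len)
{
	if (data_len % BLK_SECTOR_SIZE != 0)
		return false;
	if (sector > capacity)
		return false;
	/* Compare against the remaining space to avoid overflow. */
	return data_len / BLK_SECTOR_SIZE <= capacity - sector;
}

/* Discard / write zeroes / secure erase check: data is a whole number of
 * segments and does not exceed the relevant max_*_seg limit. */
static bool blk_segments_ok(uint64_t data_len, uint32_t max_seg)
{
	if (data_len % BLK_SEGMENT_SIZE != 0)
		return false;
	return data_len / BLK_SEGMENT_SIZE <= max_seg;
}
```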

5.2.6.2 Device Requirements: Device Operation

A device MUST set the status byte to VIRTIO_BLK_S_IOERR for a write request if the VIRTIO_BLK_F_RO feature is offered, and MUST NOT write any data.
The device MUST set the status byte to VIRTIO_BLK_S_UNSUPP for discard, secure erase and write zeroes commands if any unknown flag is set. Furthermore, the device MUST set the status byte to VIRTIO_BLK_S_UNSUPP for discard commands if the unmap flag is set.
For discard commands, the device MAY deallocate the specified range of sectors in the device backend storage.
For write zeroes commands, if the unmap is set, the device MAY deallocate the specified range of sectors in the device backend storage, as if the discard command had been sent. After a write zeroes command is completed, reads of the specified ranges of sectors MUST return zeroes. This is true independent of whether unmap was set or clear.
The device SHOULD clear the write_zeroes_may_unmap field of the virtio configuration space if and only if a write zeroes request cannot result in deallocating one or more sectors. The device MAY change the content of the field during operation of the device; when this happens, the device SHOULD trigger a configuration change notification.
A write is considered volatile when it is submitted; the contents of sectors covered by a volatile write are undefined in persistent device backend storage until the write becomes stable. A write becomes stable once it is completed and one or more of the following conditions is true:

  1. neither VIRTIO_BLK_F_CONFIG_WCE nor VIRTIO_BLK_F_FLUSH feature were negotiated, but VIRTIO_BLK_F_FLUSH was offered by the device;
  2. the VIRTIO_BLK_F_CONFIG_WCE feature was negotiated and the writeback field in configuration space was 0 all the time between the submission of the write and its completion;
  3. a VIRTIO_BLK_T_FLUSH request is sent after the write is completed and is completed itself.
If the device is backed by persistent storage, the device MUST ensure that stable writes are committed to it, before reporting completion of the write (cases 1 and 2) or the flush (case 3). Failure to do so can cause data loss in case of a crash.
If the driver changes writeback between the submission of the write and its completion, the write could be either volatile or stable when its completion is reported; in other words, the exact behavior is undefined.
If VIRTIO_BLK_F_FLUSH was not offered by the device, the device MAY also commit writes to persistent device backend storage before reporting their completion. Unlike case 1, however, this is not an absolute requirement of the specification.
Note: An implementation that does not offer VIRTIO_BLK_F_FLUSH and does not commit completed writes will not be resilient to data loss in case of crashes. Not offering VIRTIO_BLK_F_FLUSH is an absolute requirement for implementations that do not wish to be safe against such data losses.
If the device is backed by storage providing lifetime metrics (such as eMMC or UFS persistent storage), the device SHOULD offer the VIRTIO_BLK_F_LIFETIME flag. The flag MUST NOT be offered if the device is backed by storage for which the lifetime metrics described in this document cannot be obtained or for which such metrics have no useful meaning. If the metrics are offered, the device MUST NOT send any reserved values, as defined in this specification.
Note: The device lifetime metrics pre_eol_info, device_lifetime_est_typ_a and device_lifetime_est_typ_b are discussed in the JESD84-B50 specification. The complete JESD84-B50 is available at the JEDEC website (https://www.jedec.org) pursuant to JEDEC’s licensing terms and conditions. This information is provided to simplify passthrough implementations from eMMC devices.
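
The three stability conditions listed in this section can be expressed as a predicate. This is a sketch for reasoning about the rules, not device code: the struct blk_write_state and its fields are hypothetical bookkeeping that a model or test harness would have to maintain.

```c
#include <stdbool.h>
#include <stdint.h>

#define VIRTIO_BLK_F_FLUSH       9
#define VIRTIO_BLK_F_CONFIG_WCE 11

/* Hypothetical record of what is known about one write. */
struct blk_write_state {
	bool completed;			/* the write request has completed */
	bool writeback_zero_throughout;	/* writeback was 0 from submission to completion */
	bool flush_completed_after;	/* a flush sent after completion has completed */
};

static bool blk_write_is_stable(uint64_t offered, uint64_t negotiated,
				const struct blk_write_state *w)
{
	bool flush_offered = (offered >> VIRTIO_BLK_F_FLUSH) & 1;
	bool flush_neg = (negotiated >> VIRTIO_BLK_F_FLUSH) & 1;
	bool wce_neg = (negotiated >> VIRTIO_BLK_F_CONFIG_WCE) & 1;

	if (!w->completed)
		return false;	/* still volatile */
	if (!wce_neg && !flush_neg && flush_offered)
		return true;	/* case 1 */
	if (wce_neg && w->writeback_zero_throughout)
		return true;	/* case 2 */
	return w->flush_completed_after;	/* case 3 */
}
```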

5.2.6.3 Legacy Interface: Device Operation

When using the legacy interface, transitional devices and drivers MUST format the fields in struct virtio_blk_req according to the native endian of the guest rather than (necessarily when not using the legacy interface) little-endian.
When using the legacy interface, transitional drivers SHOULD ignore the used length values.
Note: Historically, some devices put the total descriptor length, or the total length of device-writable buffers there, even when only the status byte was actually written.
The reserved field was previously called ioprio. ioprio is a hint about the relative priorities of requests to the device: higher numbers indicate more important requests.

#define VIRTIO_BLK_T_FLUSH_OUT    5

The command VIRTIO_BLK_T_FLUSH_OUT was a synonym for VIRTIO_BLK_T_FLUSH; a driver MUST treat it as a VIRTIO_BLK_T_FLUSH command.

#define VIRTIO_BLK_T_BARRIER     0x80000000

If the device has VIRTIO_BLK_F_BARRIER feature the high bit (VIRTIO_BLK_T_BARRIER) indicates that this request acts as a barrier and that all preceding requests SHOULD be complete before this one, and all following requests SHOULD NOT be started until this is complete.
Note: A barrier does not flush caches in the underlying backend device in host, and thus does not serve as data consistency guarantee. Only a VIRTIO_BLK_T_FLUSH request does that.
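
The legacy type encodings can be normalized in one place. The sketch below separates the barrier bit from the request type and folds VIRTIO_BLK_T_FLUSH_OUT into VIRTIO_BLK_T_FLUSH. Leaving the high bit untouched when VIRTIO_BLK_F_BARRIER is absent is a choice made for this sketch, and the struct and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define VIRTIO_BLK_F_BARRIER   0
#define VIRTIO_BLK_T_FLUSH     4
#define VIRTIO_BLK_T_FLUSH_OUT 5
#define VIRTIO_BLK_T_BARRIER   0x80000000u

/* Hypothetical decoded form of a legacy request type. */
struct blk_legacy_type {
	uint32_t type;	/* request type with the barrier bit removed */
	bool barrier;	/* request acts as a barrier */
};

static struct blk_legacy_type blk_decode_legacy_type(uint32_t raw,
						     uint64_t features)
{
	struct blk_legacy_type t;
	bool has_barrier = (features >> VIRTIO_BLK_F_BARRIER) & 1;

	/* The high bit only has barrier meaning with VIRTIO_BLK_F_BARRIER. */
	t.barrier = has_barrier && (raw & VIRTIO_BLK_T_BARRIER) != 0;
	t.type = has_barrier ? (raw & ~VIRTIO_BLK_T_BARRIER) : raw;

	/* FLUSH_OUT was a synonym for FLUSH. */
	if (t.type == VIRTIO_BLK_T_FLUSH_OUT)
		t.type = VIRTIO_BLK_T_FLUSH;
	return t;
}
```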
Some older legacy devices did not commit completed writes to persistent device backend storage when VIRTIO_BLK_F_FLUSH was offered but not negotiated. In order to work around this, the driver MAY set the writeback to 0 (if available) or it MAY send an explicit flush request after every completed write.
Note: In the case described in 5.2.6.2 where VIRTIO_BLK_F_FLUSH was not offered by the device, the device will not have offered VIRTIO_BLK_F_CONFIG_WCE either, according to 5.2.5.2.
If the device has the VIRTIO_BLK_F_SCSI feature, it can also support scsi packet command requests. Each of these requests is of form:

/* All fields are in guest's native endian. */
struct virtio_scsi_pc_req {
	u32 type;
	u32 ioprio;
	u64 sector;
	u8 cmd[];
	u8 data[][512];
#define SCSI_SENSE_BUFFERSIZE   96
	u8 sense[SCSI_SENSE_BUFFERSIZE];
	u32 errors;
	u32 data_len;
	u32 sense_len;
	u32 residual;
	u8 status;
};

A request type can also be a scsi packet command (VIRTIO_BLK_T_SCSI_CMD or VIRTIO_BLK_T_SCSI_CMD_OUT). The two types are equivalent; the device does not distinguish between them:

#define VIRTIO_BLK_T_SCSI_CMD     2
#define VIRTIO_BLK_T_SCSI_CMD_OUT 3

The cmd field is only present for scsi packet command requests, and indicates the command to perform. This field MUST reside in a single, separate device-readable buffer; command length can be derived from the length of this buffer.
Note that these first three (four for scsi packet commands) fields are always device-readable: data is either device-readable or device-writable, depending on the request. The size of the read or write can be derived from the total size of the request buffers.
sense is only present for scsi packet command requests, and indicates the buffer for scsi sense data. data_len is only present for scsi packet command requests; this field is deprecated and SHOULD be ignored by the driver. Historically, devices copied the data length there.
sense_len is only present for scsi packet command requests and indicates the number of bytes actually written to the sense buffer.
The residual field is only present for scsi packet command requests and indicates the residual size, calculated as data length minus the number of bytes actually transferred.

5.2.6.4 Legacy Interface: Framing Requirements

When using legacy interfaces, transitional drivers which have not negotiated VIRTIO_F_ANY_LAYOUT:
• MUST use a single 16-byte descriptor containing type, reserved and sector, followed by descriptors for data, then finally a separate 1-byte descriptor for status.
• For SCSI commands there are additional constraints. sense MUST reside in a single separate device-writable descriptor of size 96 bytes, and errors, data_len, sense_len and residual MUST reside in a single separate device-writable descriptor.
See 2.7.4.
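
The framing for a plain read or write can be pictured as a three-part descriptor chain. The sketch below is illustrative only: the struct blk_desc_plan and the function name are hypothetical, the data is shown as a single descriptor although a real driver may need several, and the header is sized as 16 bytes, the sum of the le32 type, le32 reserved and le64 sector fields of struct virtio_blk_req.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one descriptor in a request chain. */
struct blk_desc_plan {
	uint32_t len;
	bool device_writable;
};

/* Plan a legacy (no VIRTIO_F_ANY_LAYOUT) read or write request as
 * header, data, status. Returns the number of descriptors planned. */
static size_t blk_plan_legacy_chain(struct blk_desc_plan plan[3],
				    uint32_t data_len, bool is_read)
{
	plan[0].len = 16;			/* type + reserved + sector */
	plan[0].device_writable = false;
	plan[1].len = data_len;
	plan[1].device_writable = is_read;	/* device fills data on reads */
	plan[2].len = 1;			/* status byte */
	plan[2].device_writable = true;
	return 3;
}
```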
