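Pipeline forwarding, also called operand forwarding or data forwarding, is a CPU optimization that limits the performance loss caused by pipeline stalls. A pipeline stall occurs when the current operation cannot proceed until an earlier, still unfinished operation has produced its result.
Example: if the following two assembly instructions run in a pipeline, then after the second instruction has been fetched and decoded, the pipeline stalls until the first instruction has read its operands, executed the ADD, and written its result, because SUB needs the value of A that ADD produces.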
// A=B+C
ADD A B C
// D=C-A
SUB D C A
Pipeline without operand forwarding
| 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
|---|---|---|---|---|---|---|---|
| Fetch ADD | Decode ADD | Read Operands for ADD | Execute ADD | Write Result | | | |
| | Fetch SUB | Decode SUB | Stall | Stall | Read Operands for SUB | Execute SUB | Write Result |
Pipeline with operand forwarding; the stalls are eliminated completely
| 1 | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|
| Fetch ADD | Decode ADD | Read Operands for ADD | Execute ADD | Write Result | |
| | Fetch SUB | Decode SUB | Read Operands for SUB (using the forwarded result of the earlier operation) | Execute SUB | Write Result |
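As the example shows, with pipeline forwarding the same two instructions finish in 6 cycles instead of 8, because SUB no longer has to wait for ADD to write its result back.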
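The cycle counts in this example can be reproduced with a small Python sketch. It is a deliberately simplified model, not a description of any real CPU: it assumes five one-cycle stages, in-order issue of one instruction per cycle, and that a forwarded result may be consumed as soon as the producer's Execute stage finishes. The names `schedule`, `total_cycles`, and `PROGRAM` are hypothetical helpers introduced only for illustration.

```python
# Simplified in-order pipeline model (an illustrative assumption, not a real CPU).
# Stages: Fetch, Decode, Read Operands, Execute, Write Result, one cycle each.
# Only the Read Operands stage is allowed to stall; back-pressure on the
# Fetch/Decode stages of later instructions is not modeled separately.

def schedule(program, forwarding):
    """Return a list of (read, execute, write) cycle numbers, one per instruction."""
    ready = {}      # register -> first cycle in which a consumer may read it
    prev_read = 0   # Read Operands cycle of the previous instruction
    timeline = []
    for index, (_op, dest, src1, src2) in enumerate(program):
        decode = index + 2  # instruction `index` is fetched in cycle index + 1
        read = max(decode + 1, prev_read + 1,
                   ready.get(src1, 0), ready.get(src2, 0))
        execute = read + 1
        write = execute + 1
        # With forwarding, the consumer may sit in Read Operands during the
        # producer's Execute cycle and pick up the result directly.
        # Without it, the consumer must wait until after Write Result.
        ready[dest] = execute if forwarding else write + 1
        timeline.append((read, execute, write))
        prev_read = read
    return timeline


def total_cycles(program, forwarding):
    """Cycle in which the last instruction writes its result."""
    return schedule(program, forwarding)[-1][2]


# (opcode, destination, source 1, source 2), matching ADD A B C / SUB D C A
PROGRAM = [("ADD", "A", "B", "C"), ("SUB", "D", "C", "A")]

if __name__ == "__main__":
    print(total_cycles(PROGRAM, forwarding=False))  # 8
    print(total_cycles(PROGRAM, forwarding=True))   # 6
```

Under these assumptions SUB reads its operands in cycle 6 without forwarding and in cycle 4 with forwarding, which matches the two tables.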
// Flat thread ID for a 2D grid of 3D blocks
grid_dim = (2, 2);
block_idx = (1, 1);
block_id = block_idx.y * grid_dim.x + block_idx.x = 3;
block_dim = (4, 2, 2);
thread_idx = (3, 0, 0);
thread_id = block_id * (block_dim.x * block_dim.y * block_dim.z)
+ thread_idx.z * (block_dim.x * block_dim.y)
+ thread_idx.y * block_dim.x + thread_idx.x
= 51;
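The index arithmetic can be checked on the host with a short Python sketch. It assumes a 2D grid of 3D blocks, as in the numbers above, and takes the number of threads per block to be the product of the three block dimensions. The function name `flat_thread_id` is a hypothetical helper, not part of any CUDA API.

```python
from itertools import product


def flat_thread_id(grid_dim, block_idx, block_dim, thread_idx):
    """Flatten (block_idx, thread_idx) into one global thread ID.

    grid_dim and block_idx are (x, y); block_dim and thread_idx are (x, y, z).
    """
    grid_x, _grid_y = grid_dim
    bx, by = block_idx
    dim_x, dim_y, dim_z = block_dim
    tx, ty, tz = thread_idx

    block_id = by * grid_x + bx                 # row-major position of the block
    threads_per_block = dim_x * dim_y * dim_z   # 4 * 2 * 2 = 16 in the example
    return (block_id * threads_per_block
            + tz * (dim_x * dim_y)              # full z-slices before this thread
            + ty * dim_x                        # full rows before this thread
            + tx)


def all_thread_ids(grid_dim, block_dim):
    """Enumerate the flat ID of every thread in the grid (for sanity checks)."""
    ids = []
    for by, bx in product(range(grid_dim[1]), range(grid_dim[0])):
        for tz, ty, tx in product(range(block_dim[2]), range(block_dim[1]),
                                  range(block_dim[0])):
            ids.append(flat_thread_id(grid_dim, (bx, by), block_dim, (tx, ty, tz)))
    return ids


if __name__ == "__main__":
    print(flat_thread_id((2, 2), (1, 1), (4, 2, 2), (3, 0, 0)))  # 51
```

A quick sanity check on a formula like this is that every thread in the grid receives a distinct ID in the range 0 to 63 (4 blocks of 16 threads), which `all_thread_ids` lets you verify.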