NASM 2.05 based x86 Instruction Reference[ch255]
A.5.231 PMADDWD: MMX Packed Multiply and Add

PMADDWD mm1,mm2/m64           ; 0F F5 /r        [PENT,MMX]
PMADDWD xmm1,xmm2/m128        ; 66 0F F5 /r     [WILLAMETTE,SSE2]

PMADDWD treats its two inputs as vectors of signed words. It multiplies corresponding elements of the two operands, giving doubleword results. Each adjacent pair of doubleword products is then added together, and the sums are stored in the destination operand.

The operation of this instruction is:

   dst[0-31]   := (dst[0-15] * src[0-15]) + (dst[16-31] * src[16-31]);
   dst[32-63]  := (dst[32-47] * src[32-47]) + (dst[48-63] * src[48-63]);

The following apply to the SSE2 version of the instruction:

   dst[64-95]  := (dst[64-79] * src[64-79]) + (dst[80-95] * src[80-95]);
   dst[96-127] := (dst[96-111] * src[96-111]) + (dst[112-127] * src[112-127]).
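
The multiply-and-add operation can be sketched as a scalar model in C. This is only an illustration, not part of NASM or any library: the function name pmaddwd_model and its parameters are hypothetical, and the model writes its sums to a separate output array instead of overwriting the destination operand as the instruction does. Pass pairs = 2 for the MMX form and pairs = 4 for the SSE2 form.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar model of PMADDWD (not a NASM or compiler API).
 * dst and src each hold 2*pairs signed words; out receives `pairs`
 * signed doublewords. */
static void pmaddwd_model(int32_t *out, const int16_t *dst,
                          const int16_t *src, int pairs)
{
    assert(pairs > 0);
    for (int i = 0; i < pairs; i++) {
        /* Signed word products, widened so the addition cannot overflow. */
        int64_t lo = (int64_t)dst[2 * i]     * src[2 * i];
        int64_t hi = (int64_t)dst[2 * i + 1] * src[2 * i + 1];
        /* Keep the low 32 bits of the sum. The sum only exceeds the
         * signed doubleword range when all four words are 0x8000; this
         * model assumes the usual two's complement wraparound there. */
        out[i] = (int32_t)(uint32_t)(lo + hi);
    }
}
```

For example, with dst words {1, 2, 3, 4} and src words {5, 6, 7, 8} the MMX form yields the doublewords 1*5 + 2*6 = 17 and 3*7 + 4*8 = 53.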