A.5.239 PMULHRWA: Multiply Packed 16-bit Integers With Rounding, and Store
High Word
PMULHRWA mm1,mm2/m64 ; 0F 0F /r B7 [PENT,3DNOW]
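PMULHRWA takes two packed signed 16-bit integer inputs, multiplies
the corresponding words of the inputs, rounds each 32-bit product by
adding 0x8000 (half the weight of bit 16), then stores bits 16-31 of
each result to the corresponding position of the destination
register. The first operand, mm1, is both src1 and the destination.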
The operation of this instruction is:
dst[0-15] := (src1[0-15] *src2[0-15] + 0x00008000)[16-31];
dst[16-31] := (src1[16-31]*src2[16-31] + 0x00008000)[16-31];
dst[32-47] := (src1[32-47]*src2[32-47] + 0x00008000)[16-31];
dst[48-63] := (src1[48-63]*src2[48-63] + 0x00008000)[16-31].
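As an illustration only, the four lane operations above can be
modelled in plain C. This is a sketch, not taken from any assembler
or processor source: the names pmulhrw_lane and pmulhrwa_model are
hypothetical, and it assumes the words are treated as signed 16-bit
values, with lane 0 holding bits 0-15.

```c
#include <stdint.h>

/* Hypothetical scalar model of one 16-bit lane:
 * (a * b + 0x00008000)[16-31], with a and b taken as signed words. */
static int16_t pmulhrw_lane(int16_t a, int16_t b)
{
    /* The full product always fits in 32 bits (|a*b| <= 2^30). */
    int32_t product = (int32_t)a * (int32_t)b;

    /* Add the rounding constant in unsigned arithmetic so that
     * wraparound is well defined, then keep bits 16-31. */
    uint32_t rounded = (uint32_t)product + 0x00008000u;
    uint16_t high = (uint16_t)(rounded >> 16);

    /* Reinterpret the high word as signed without relying on
     * implementation-defined narrowing conversions. */
    if (high >= 0x8000u)
        return (int16_t)((int32_t)high - 0x10000);
    return (int16_t)high;
}

/* Hypothetical model of the whole instruction: dst plays the role
 * of mm1 (src1 and destination), src the role of mm2/m64 (src2). */
static void pmulhrwa_model(int16_t dst[4], const int16_t src[4])
{
    for (int i = 0; i < 4; i++)
        dst[i] = pmulhrw_lane(dst[i], src[i]);
}
```

Under this model, multiplying 0x4000 by 0x4000 gives 0x1000, and
multiplying 2 by 0x4000 gives 1, because the product 0x00008000 is
rounded up to 0x00010000 before the high word is taken.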
See also PMULHRWC (section A.5.238) for a Cyrix version of this
instruction.