NASM 2.05 based x86 Instruction Reference[ch263]
A.5.239 PMULHRWA: Multiply Packed 16-bit Integers With Rounding, and Store High Word

PMULHRWA mm1,mm2/m64          ; 0F 0F /r B7          [PENT,3DNOW]

PMULHRWA takes two packed 16-bit integer inputs, multiplies the values in the inputs, rounds on bit 15 of each result (by adding 0x00008000 to the 32-bit product), then stores bits 16-31 of each result to the corresponding position of the destination register.

The operation of this instruction is:

   dst[0-15]  := (src1[0-15] *src2[0-15]  + 0x00008000)[16-31];
   dst[16-31] := (src1[16-31]*src2[16-31] + 0x00008000)[16-31];
   dst[32-47] := (src1[32-47]*src2[32-47] + 0x00008000)[16-31];
   dst[48-63] := (src1[48-63]*src2[48-63] + 0x00008000)[16-31].

See also PMULHRWC (section A.5.238) for a Cyrix version of this instruction.
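The PMULHRWA operation described above can be modelled in portable C, which may help when checking results without 3DNow! hardware. This is a minimal sketch, not NASM or AMD reference code: the names pmulhrw_lane and pmulhrwa_model are hypothetical, and it assumes each 16-bit lane is treated as a signed integer, which the text above does not state explicitly.

```c
#include <stdint.h>

/* Hypothetical helper: models one 16-bit lane of PMULHRWA.
   Assumption: lanes are signed 16-bit integers. */
static int16_t pmulhrw_lane(int16_t a, int16_t b)
{
    /* Full 32-bit signed product; it cannot overflow int32_t. */
    int32_t product = (int32_t)a * (int32_t)b;

    /* Add the rounding constant (bit 15) in unsigned arithmetic so
       that wraparound and the right shift are well defined. */
    uint32_t rounded = (uint32_t)product + 0x00008000u;

    /* Keep bits 16-31. Converting back to int16_t assumes the usual
       two's-complement wrap for values above 0x7FFF. */
    return (int16_t)(uint16_t)(rounded >> 16);
}

/* Hypothetical model of the packed form: dst plays the role of mm1
   (both source and destination), src the role of mm2/m64. */
static void pmulhrwa_model(int16_t dst[4], const int16_t src[4])
{
    for (int i = 0; i < 4; i++)
        dst[i] = pmulhrw_lane(dst[i], src[i]);
}
```

As an example, a lane holding 0x4000 multiplied by 0x4000 gives the product 0x10000000; adding 0x00008000 yields 0x10008000, whose bits 16-31 are 0x1000.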