A.5.240 PMULHUW: Multiply Packed Unsigned 16-bit Integers, and Store High Word
PMULHUW mm1,mm2/m64 ; 0F E4 /r [KATMAI,MMX]
PMULHUW xmm1,xmm2/m128 ; 66 0F E4 /r [WILLAMETTE,SSE2]
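PMULHUW treats both operands as packed unsigned 16-bit integers (four
words in the MMX form, eight in the SSE2 form), multiplies each pair of
corresponding words to form an unsigned 32-bit product, then stores
bits 16-31 of each product in the corresponding word of the
destination register (the first operand).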
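The per-word behaviour can be sketched as a scalar model in C. This is
only an illustration of the arithmetic described above, not how the
processor implements it; the names pmulhuw_lane and pmulhuw_model are
hypothetical, and the lane count is passed as a parameter (4 for the
MMX form, 8 for the SSE2 form).

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: one word of PMULHUW.
   Forms the full unsigned 32-bit product and keeps bits 16-31. */
static uint16_t pmulhuw_lane(uint16_t a, uint16_t b)
{
    uint32_t product = (uint32_t)a * (uint32_t)b;
    return (uint16_t)(product >> 16);
}

/* Hypothetical helper: applies the lane operation to every word.
   dst plays the role of the first (destination) operand and is
   overwritten; src plays the role of the second operand.
   lanes is 4 for the MMX form and 8 for the SSE2 form. */
static void pmulhuw_model(uint16_t *dst, const uint16_t *src, size_t lanes)
{
    for (size_t i = 0; i < lanes; i++)
        dst[i] = pmulhuw_lane(dst[i], src[i]);
}
```

Because the product is unsigned, 0xFFFF times 0xFFFF gives 0xFFFE0001
and the stored word is 0xFFFE. The low word of each product is
discarded.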