A.5.242 PMULUDQ: Multiply Packed Unsigned 32-bit Integers, and Store.
PMULUDQ mm1,mm2/m64 ; 0F F4 /r [WILLAMETTE,SSE2]
PMULUDQ xmm1,xmm2/m128 ; 66 0F F4 /r [WILLAMETTE,SSE2]
PMULUDQ multiplies unsigned doublewords taken from the destination
and source operands, forming full unsigned quadword products. In the
64-bit form, each input is the unsigned doubleword in the low
doubleword of its operand; in the 128-bit form, each input is the pair
of unsigned doublewords in the first and third doublewords of its
operand. This produces one or two 64-bit results respectively, which
are stored in the corresponding quadword locations of the destination
register. The second doubleword (and, in the 128-bit form, the fourth)
of each operand does not take part in the multiplication.
The operation is (the second line applies only to the 128-bit form):
dst[0-63] := dst[0-31] * src[0-31];
dst[64-127] := dst[64-95] * src[64-95].
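
The operation above can be modelled in portable C as a rough sketch.
The names pmuludq_mmx, pmuludq_xmm and xmm_model are hypothetical,
chosen for illustration only; this is a scalar model of the arithmetic,
not a description of how the processor implements the instruction.

```c
#include <stdint.h>

/* Hypothetical model of a 128-bit register as two quadwords:
   q[0] holds bits 0-63, q[1] holds bits 64-127. */
typedef struct { uint64_t q[2]; } xmm_model;

/* 64-bit form: only the low doubleword of each operand is used.
   The cast to uint32_t discards bits 32-63 before multiplying, and
   widening one factor to uint64_t keeps the full 64-bit product. */
uint64_t pmuludq_mmx(uint64_t dst, uint64_t src)
{
    return (uint64_t)(uint32_t)dst * (uint32_t)src;
}

/* 128-bit form: the same multiply applied to each quadword lane,
   i.e. to the first and third doublewords of each operand. */
xmm_model pmuludq_xmm(xmm_model dst, xmm_model src)
{
    xmm_model r;
    r.q[0] = (uint64_t)(uint32_t)dst.q[0] * (uint32_t)src.q[0];
    r.q[1] = (uint64_t)(uint32_t)dst.q[1] * (uint32_t)src.q[1];
    return r;
}
```

Because the product of two 32-bit values always fits in 64 bits, no
overflow or saturation case arises in this model.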