NASM 2.05 based x86 Instruction Reference[ch266]
A.5.242 PMULUDQ: Multiply Packed Unsigned 32-bit Integers, and Store

PMULUDQ mm1,mm2/m64           ; 0F F4 /r             [WILLAMETTE,SSE2]
PMULUDQ xmm1,xmm2/m128        ; 66 0F F4 /r          [WILLAMETTE,SSE2]

PMULUDQ multiplies packed unsigned 32-bit integers taken from its two operands, forming unsigned quadword results. Each operand supplies either one unsigned doubleword, in the low doubleword of a 64-bit operand, or two unsigned doublewords, in the first and third doublewords of a 128-bit operand. This produces either one or two 64-bit results, which are stored in the respective quadword locations of the destination register.

The operation is:

   dst[0-63]   := dst[0-31]  * src[0-31];
   dst[64-127] := dst[64-95] * src[64-95].

The second line applies only to the 128-bit (xmm) form.
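The operation above can be modelled in a few lines of Python. This is only an illustrative sketch of the arithmetic, not NASM code. The function names pmuludq_mmx and pmuludq_xmm are hypothetical, and registers are represented as plain Python integers with bit 0 as the least significant bit.

```python
MASK32 = 0xFFFFFFFF  # selects one unsigned doubleword


def pmuludq_mmx(dst: int, src: int) -> int:
    """Model of the 64-bit form: dst[0-63] := dst[0-31] * src[0-31]."""
    # The product of two 32-bit unsigned values always fits in 64 bits.
    return (dst & MASK32) * (src & MASK32)


def pmuludq_xmm(dst: int, src: int) -> int:
    """Model of the 128-bit form: two independent 32x32 -> 64 multiplies."""
    # Low quadword: first doublewords (bits 0-31) of each operand.
    lo = (dst & MASK32) * (src & MASK32)
    # High quadword: third doublewords (bits 64-95) of each operand.
    hi = ((dst >> 64) & MASK32) * ((src >> 64) & MASK32)
    return (hi << 64) | lo


# The second and fourth doublewords (0xBBBBBBBB, 0xAAAAAAAA, ...) are ignored.
dst = 0xAAAAAAAA_00000003_BBBBBBBB_FFFFFFFF
src = 0xCCCCCCCC_00000005_DDDDDDDD_00000002
print(f"{pmuludq_xmm(dst, src):032X}")  # 000000000000000F00000001FFFFFFFE
```

In this example the low quadword is 0xFFFFFFFF * 2 = 0x1FFFFFFFE and the high quadword is 3 * 5 = 0xF. The low result needs more than 32 bits, which is why the instruction stores full quadwords.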