NASM 2.05 based x86 Instruction Reference[ch220]
A.5.196 PACKSSDW, PACKSSWB, PACKUSWB: Pack Data

PACKSSDW mm1,mm2/m64          ; 0F 6B /r             [PENT,MMX]
PACKSSWB mm1,mm2/m64          ; 0F 63 /r             [PENT,MMX]
PACKUSWB mm1,mm2/m64          ; 0F 67 /r             [PENT,MMX]

PACKSSDW xmm1,xmm2/m128       ; 66 0F 6B /r          [WILLAMETTE,SSE2]
PACKSSWB xmm1,xmm2/m128       ; 66 0F 63 /r          [WILLAMETTE,SSE2]
PACKUSWB xmm1,xmm2/m128       ; 66 0F 67 /r          [WILLAMETTE,SSE2]

All these instructions start by combining the source and destination operands, then split the combined value into smaller elements, reduce each element to half its size, and pack the results into the destination register. The reduced elements of the destination (first) operand fill the low half of the result, and those of the source (second) operand fill the high half. The MMX versions pack two 64-bit operands into one 64-bit register, while the SSE2 versions pack two 128-bit operands into one 128-bit register.

- PACKSSWB splits the combined value into words, and then reduces the words to bytes, using signed saturation. It then packs the bytes into the destination register in the same order the words were in.

- PACKSSDW performs the same operation as PACKSSWB, except that it reduces doublewords to words, then packs them into the destination register.

- PACKUSWB performs the same operation as PACKSSWB, except that it uses unsigned saturation when reducing the size of the elements.

To perform signed saturation on a number, if it is too large to fit it is replaced by the largest signed number (7FFFh or 7Fh) that _will_ fit, and if it is too small it is replaced by the smallest signed number (8000h or 80h) that will fit. To perform unsigned saturation, the input is still treated as a signed word: if it is larger than the largest unsigned number that will fit (FFh) it is replaced by that number, and if it is negative it is replaced by zero (00h).
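
The saturation and packing rules described above can be illustrated with a small scalar model in C. This is only a sketch of the behaviour of the MMX forms, not how the processor implements them. The function names (sat_word_to_sbyte, sat_word_to_ubyte, sat_dword_to_sword, model_packsswb_mmx, model_packuswb_mmx, model_packssdw_mmx) are hypothetical and chosen for illustration. Operands are modelled as arrays whose element 0 is the least significant element, and the model assumes that the destination operand supplies the low half of the result and the source operand the high half.

```c
#include <stdint.h>

/* Signed saturation, word -> byte (the reduction used by PACKSSWB). */
static int8_t sat_word_to_sbyte(int16_t w)
{
    if (w > INT8_MAX) return INT8_MAX;   /* too large: becomes 7Fh */
    if (w < INT8_MIN) return INT8_MIN;   /* too small: becomes 80h */
    return (int8_t)w;
}

/* Unsigned saturation, signed word -> unsigned byte (used by PACKUSWB). */
static uint8_t sat_word_to_ubyte(int16_t w)
{
    if (w > UINT8_MAX) return UINT8_MAX; /* too large: becomes FFh */
    if (w < 0) return 0;                 /* negative: becomes 00h */
    return (uint8_t)w;
}

/* Signed saturation, doubleword -> word (used by PACKSSDW). */
static int16_t sat_dword_to_sword(int32_t d)
{
    if (d > INT16_MAX) return INT16_MAX; /* too large: becomes 7FFFh */
    if (d < INT16_MIN) return INT16_MIN; /* too small: becomes 8000h */
    return (int16_t)d;
}

/* Model of PACKSSWB mm1,mm2/m64: dst plays the role of mm1 before the
 * instruction, src the role of mm2/m64, out the role of mm1 afterwards. */
static void model_packsswb_mmx(const int16_t dst[4], const int16_t src[4],
                               int8_t out[8])
{
    for (int i = 0; i < 4; i++) {
        out[i]     = sat_word_to_sbyte(dst[i]); /* low half from destination */
        out[i + 4] = sat_word_to_sbyte(src[i]); /* high half from source */
    }
}

/* Model of PACKUSWB mm1,mm2/m64: same layout, unsigned saturation. */
static void model_packuswb_mmx(const int16_t dst[4], const int16_t src[4],
                               uint8_t out[8])
{
    for (int i = 0; i < 4; i++) {
        out[i]     = sat_word_to_ubyte(dst[i]);
        out[i + 4] = sat_word_to_ubyte(src[i]);
    }
}

/* Model of PACKSSDW mm1,mm2/m64: two doublewords from each operand
 * are reduced to four words. */
static void model_packssdw_mmx(const int32_t dst[2], const int32_t src[2],
                               int16_t out[4])
{
    for (int i = 0; i < 2; i++) {
        out[i]     = sat_dword_to_sword(dst[i]);
        out[i + 2] = sat_dword_to_sword(src[i]);
    }
}
```

Under this model, packing the words 300 and -300 with PACKSSWB yields the bytes 7Fh and 80h, while PACKUSWB yields FFh and 00h for the same inputs. The SSE2 forms follow the same pattern with twice as many elements per operand.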