{"id":558,"date":"2015-08-14T09:46:29","date_gmt":"2015-08-13T23:46:29","guid":{"rendered":"http:\/\/www.nickdu.com\/?p=558"},"modified":"2015-08-14T09:46:29","modified_gmt":"2015-08-13T23:46:29","slug":"simd-high-performace-xorcpy-in-c","status":"publish","type":"post","link":"https:\/\/nickdu.com\/?p=558","title":{"rendered":"SIMD high-performance xorcpy in C"},"content":{"rendered":"<p>My current project needs a high-performance xorcpy implemented in C. Here is my implementation. Tested on my Xeon E5-2680 PC, it reaches almost 1.7 GB\/s.<\/p>\n<pre lang=\"c\">#include &lt;emmintrin.h&gt;\n...\n\/\/ XOR block_size bytes of src into dst (dst[i] ^= src[i]) and return dst.\n\/\/ __forceinline is MSVC-specific.\n__forceinline unsigned char* xorcpy(unsigned char* dst, const unsigned char* src, unsigned block_size)\n{\n    \/\/ Process the bulk of the buffer one __m128i (16 bytes) at a time\n    __m128i* mto = (__m128i*)dst;\n    const __m128i* mfrom = (const __m128i*)src;\n    for(unsigned i = block_size \/ sizeof(__m128i); i &gt; 0; i--)\n    {\n        \/\/ Unaligned loads and stores, so dst and src need no 16-byte alignment\n        __m128i xmm1 = _mm_loadu_si128(mto);\n        __m128i xmm2 = _mm_loadu_si128(mfrom);\n\n        xmm1 = _mm_xor_si128(xmm1, xmm2);     \/\/  XOR 16 bytes\n        _mm_storeu_si128(mto, xmm1);\n        ++mto;\n        ++mfrom;\n    }\n\n    \/\/ XOR the remaining (block_size % 16) bytes one at a time\n    unsigned char* cto = (unsigned char*) mto;\n    const unsigned char* cfrom = (const unsigned char*)mfrom;\n    for(unsigned i = block_size % sizeof(__m128i); i &gt; 0; i--)\n    {\n        *cto++ ^= (*cfrom++);\n    }\n    return dst;\n}\n<\/pre>\n","protected":false},"excerpt":{"rendered":"<p>My current project needs a high-performance xorcpy implemented in C. Here is my implementation. Tested on my Xeon E5-2680 PC, it reaches almost 1.7 GB\/s. 
#include &lt;emmintrin.h&gt; &#8230; __forceinline unsigned char* xorcpy(unsigned char* dst, const unsigned char* src, unsigned block_size) { \/\/ Do the bulk of the copy &hellip; <a href=\"https:\/\/nickdu.com\/?p=558\" class=\"more-link\">Continue reading<span class=\"screen-reader-text\"> &#8220;SIMD high-performance xorcpy in C&#8221;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[12,2],"tags":[],"class_list":["post-558","post","type-post","status-publish","format-standard","hentry","category-cc","category-it"],"_links":{"self":[{"href":"https:\/\/nickdu.com\/index.php?rest_route=\/wp\/v2\/posts\/558","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/nickdu.com\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/nickdu.com\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/nickdu.com\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/nickdu.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=558"}],"version-history":[{"count":0,"href":"https:\/\/nickdu.com\/index.php?rest_route=\/wp\/v2\/posts\/558\/revisions"}],"wp:attachment":[{"href":"https:\/\/nickdu.com\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=558"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/nickdu.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=558"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/nickdu.com\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=558"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}