Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Fastest way to get IPv4 address from string

I have the following code which is about 7 times faster than inet_addr . I was wondering if there is a way to improve this to make it even faster or if a faster alternative exists.

This code requires that a valid null terminated IPv4 address is supplied with no whitespace, which in my case is always the way, so I optimized for that case. Usually you would have more error checking, but if there is a way to make the following even faster or a faster alternative exists I would really appreciate it.

UINT32 GetIP(const char *p)
{
    UINT32 dwIP=0,dwIP_Part=0;
    while(true)
    {
        if(p[0] == 0)
        {
            dwIP = (dwIP << 8) | dwIP_Part;
            break;
        }
        if(p[0]=='.') 
        {       
            dwIP = (dwIP << 8) | dwIP_Part;                     
            dwIP_Part = 0;
           p++;
        }
        dwIP_Part = (dwIP_Part*10)+(p[0]-'0');
        p++;
    }
    return dwIP;
}
like image 847
Harry Avatar asked Jul 28 '15 14:07

Harry


People also ask

What is general expression for extracting IP address from logs?

Grep and Regular Expressions You've used grep and regular expression syntax to search for IP addresses in a log file.

Is IP address a string?

An IP address is a string of numbers separated by periods. IP addresses are expressed as a set of four numbers — an example address might be 192.158. 1.38. Each number in the set can range from 0 to 255.

How can I find my IP address without CMD?

To find the IP address on Windows 10, without using the command prompt: Click the Start icon and select Settings. Click the Network & Internet icon. To view the IP address of a wired connection, select Ethernet on the left menu pane and select your network connection, your IP address will appear next to "IPv4 Address".

How do I get just an IP address in PowerShell?

Use Get-NetIPAddress to Get IPv4 Address Into a Variable in PowerShell. The Get-NetIPAddress cmdlet gets the IP address configuration, such as IPv4 addresses, IPv6 addresses, and the IP interfaces associated with addresses.


2 Answers

Since we are speaking about maximizing throughput of IP address parsing, I suggest using a vectorized solution.

Here is x86-specific fast solution (needs SSE4.1, or at least SSSE3 for poor):

__m128i shuffleTable[65536];    //can be reduced 256x times, see @IwillnotexistIdonotexist

UINT32 MyGetIP(const char *str) {
    __m128i input = _mm_lddqu_si128((const __m128i*)str);   //"192.167.1.3"
    input = _mm_sub_epi8(input, _mm_set1_epi8('0'));        //1 9 2 254 1 6 7 254 1 254 3 208 245 0 8 40 
    __m128i cmp = input;                                    //...X...X.X.XX...  (signs)
    UINT32 mask = _mm_movemask_epi8(cmp);                   //6792 - magic index
    __m128i shuf = shuffleTable[mask];                      //10 -1 -1 -1 8 -1 -1 -1 6 5 4 -1 2 1 0 -1 
    __m128i arr = _mm_shuffle_epi8(input, shuf);            //3 0 0 0 | 1 0 0 0 | 7 6 1 0 | 2 9 1 0 
    __m128i coeffs = _mm_set_epi8(0, 100, 10, 1, 0, 100, 10, 1, 0, 100, 10, 1, 0, 100, 10, 1);
    __m128i prod = _mm_maddubs_epi16(coeffs, arr);          //3 0 | 1 0 | 67 100 | 92 100 
    prod = _mm_hadd_epi16(prod, prod);                      //3 | 1 | 167 | 192 | ? | ? | ? | ?
    __m128i imm = _mm_set_epi8(-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, 6, 4, 2, 0);
    prod = _mm_shuffle_epi8(prod, imm);                     //3 1 167 192 0 0 0 0 0 0 0 0 0 0 0 0
    return _mm_extract_epi32(prod, 0);
//  return (UINT32(_mm_extract_epi16(prod, 1)) << 16) + UINT32(_mm_extract_epi16(prod, 0)); //no SSE 4.1
}

And here is the required precalculation for shuffleTable:

void MyInit() {
    memset(shuffleTable, -1, sizeof(shuffleTable));
    int len[4];
    for (len[0] = 1; len[0] <= 3; len[0]++)
        for (len[1] = 1; len[1] <= 3; len[1]++)
            for (len[2] = 1; len[2] <= 3; len[2]++)
                for (len[3] = 1; len[3] <= 3; len[3]++) {
                    int slen = len[0] + len[1] + len[2] + len[3] + 4;
                    int rem = 16 - slen;
                    for (int rmask = 0; rmask < 1<<rem; rmask++) {
//                    { int rmask = (1<<rem)-1;    //note: only maximal rmask is possible if strings are zero-padded
                        int mask = 0;
                        char shuf[16] = {-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1};
                        int pos = 0;
                        for (int i = 0; i < 4; i++) {
                            for (int j = 0; j < len[i]; j++) {
                                shuf[(3-i) * 4 + (len[i]-1-j)] = pos;
                                pos++;
                            }
                            mask ^= (1<<pos);
                            pos++;
                        }
                        mask ^= (rmask<<slen);
                        _mm_store_si128(&shuffleTable[mask], _mm_loadu_si128((__m128i*)shuf));
                    }
                }
}

Full code with testing is avaliable here. On Ivy Bridge processor it prints:

C0A70103
Time = 0.406   (1556701184)
Time = 3.133   (1556701184)

It means that the suggested solution is 7.8 times faster in terms of throughput than the code by OP. It processes 336 millions of addresses per second (single core of 3.4 Ghz).

Now I'll try to explain how it works. Note that on each line of the listing you can see contents of the value just computed. All the arrays are printed in little-endian order (though set intrinsics use big-endian).

First of all, we load 16 bytes from unaligned address by lddqu instruction. Note that in 64-bit mode memory is allocated by 16-byte chunks, so this works well automatically. On 32-bit it may theoretically cause issues with out of range access. Though I do not believe that it really can. The subsequent code would work properly regardless of the values in the after-the-end bytes. Anyway, you'd better ensure that each IP address takes at least 16 bytes of storage.

Then we subtract '0' from all the chars. After that '.' turns into -2, and zero turns into -48, all the digits remain nonnegative. Now we take bitmask of signs of all the bytes with _mm_movemask_epi8.

Depending on the value of this mask, we fetch a nontrivial 16-byte shuffling mask from lookup table shuffleTable. The table is quite large: 1Mb total. And it takes quite some time to precompute. However, it does not take precious space in CPU cache, because only 81 elements from this table are really used. That is because each part of IP address can be either one, two, three digits long => hence 81 variants in total. Note that random trashy bytes after the end of the string may in principle cause increased memory footprint in the lookup table.

EDIT: you can find a version modified by @IwillnotexistIdonotexist in comments, which uses lookup table of only 4Kb size (it is a bit slower, though).

The ingenious _mm_shuffle_epi8 intrinsic allows us to reorder the bytes with our shuffle mask. As a result XMM register contains four 4-byte blocks, each block contains digits in little-endian order. We convert each block into a 16-bit number by _mm_maddubs_epi16 followed by _mm_hadd_epi16. Then we reorder bytes of the register, so that the whole IP address occupies the lower 4 bytes.

Finally, we extract the lower 4 bytes from the XMM register to GP register. It is done with SSE4.1 intrinsic (_mm_extract_epi32). If you don't have it, replace it with other line using _mm_extract_epi16, but it will run a bit slower.

Finally, here is the generated assembly (MSVC2013), so that you can check that your compiler does not generate anything suspicious:

lddqu   xmm1, XMMWORD PTR [rcx]
psubb   xmm1, xmm6
pmovmskb ecx, xmm1
mov ecx, ecx               //useless, see @PeterCordes and @IwillnotexistIdonotexist
add rcx, rcx               //can be removed, see @EvgenyKluev
pshufb  xmm1, XMMWORD PTR [r13+rcx*8]
movdqa  xmm0, xmm8
pmaddubsw xmm0, xmm1
phaddw  xmm0, xmm0
pshufb  xmm0, xmm7
pextrd  eax, xmm0, 0

P.S. If you are still reading it, be sure to check out comments =)

like image 191
stgatilov Avatar answered Oct 17 '22 13:10

stgatilov


As for alternatives: this is similar to yours but with some error checking:

#include <iostream>
#include <string>
#include <cstdint>

uint32_t getip(const std::string &sip)
{
    uint32_t r=0, b, p=0, c=0;
    const char *s;
    s = sip.c_str();
    while (*s)
    {
        r<<=8;
        b=0;
        while (*s&&((*s==' ')||(*s=='\t'))) s++;
        while (*s)
        {
            if ((*s==' ')||(*s=='\t')) { while (*s&&((*s==' ')||(*s=='\t'))) s++; if (*s!='.') break; }
            if (*s=='.') { p++; s++; break; }
            if ((*s>='0')&&(*s<='9'))
            {
                b*=10;
                b+=(*s-'0');
                s++;
            }
        }
        if ((b>255)||(*s=='.')) return 0;
        r+=b;
        c++;
    }
    return ((c==4)&&(p==3))?r:0;
}

void testip(const std::string &sip)
{
    uint32_t nIP=0;
    nIP = getip(sip);
    std::cout << "\nsIP = " << sip << " --> " << std::hex << nIP << "\n";
}

int main()
{
    testip("192.167.1.3");
    testip("292.167.1.3");
    testip("192.267.1.3");
    testip("192.167.1000.3");
    testip("192.167.1.300");
    testip("192.167.1.");
    testip("192.167.1");
    testip("192.167..1");
    testip("192.167.1.3.");
    testip("192.1 67.1.3.");
    testip("192 . 167 . 1 . 3");
    testip(" 192 . 167 . 1 . 3 ");
    return 0;
}
like image 1
slashmais Avatar answered Oct 17 '22 11:10

slashmais