Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

save IP address in mongoDB

Currently in order to save an IP address I am converting it to number and store it in the collection. Basically I am doing this for logging purposes. This means that I care to store information as fast as possible and with smallest amount of space.

I will be rarely using it for querying.

My ideas that

  • Storing as strings is for sure inefficient.
  • Storing as 4 digits will be slower and will take more space.

Nonetheless I think that this is an adequate method, but is there a better one for my purpose?

like image 235
Salvador Dali Avatar asked Mar 23 '13 19:03

Salvador Dali


People also ask

How do you store IP address in MongoDB?

200.104/27", you can easily do this by masking an integer address with an integer subnet mask. (Mongo doesn't support this particular query, but most RDBMS do.) If you store addresses as strings, then your query will need to convert each row to an integer, then mask it, which is several orders of magnitude slower.

How do you whitelist IP address in MongoDB?

To whitelist multiple IP addresses, go to your target cluster on MongoDB Atlas. Next, go to Network Access under the Security tab. On the IP Whitelist tab, click on Add IP address. Type your IP address manually under Whitelist Entry, then click Confirm.

What is IP address in MongoDB?

An IP is a unique numeric identifier for a device connecting to a network. In MongoDB Atlas for Government, you can only connect to a cluster from a trusted IP address.

How does MongoDB connect to IP address?

Enable MongoDB Auth In the same config file, go to the network interfaces section and change the bindIp from 127.0. 0.1 to 0.0. 0.0 which means allow connections from all ip addresses. Now save and exit the config file and restart mongodb server.


2 Answers

Definitely save IP addresses as numbers, if you don't mind the extra bit of work that it takes, especially if you need to do queries on the addresses and you have large tables/collections.

Here's why:

Storage

  • An IPv4 address is 4 bytes if stored as unsigned integer.
  • An IPv4 address varies between 10 bytes and 18 bytes when written out as a string in dotted octed form. (Let's assume the average is 14 bytes.)

That is 7-15 bytes for the characters, plus 2-3 bytes if you're using a variable length string type, which varies based on the database you're using. If you have a fixed length string representation available, then you must use a 15-character fixed width field.

Disk storage is cheap, so that's not a factor in most use cases. Memory, however, is not as cheap, and if you have a large table/collection and you want to do fast queries, then you need an index. The 2-3x storage penalty of string encoding drastically reduces the amount of records you can index while still keeping the index resident in memory.

  • An IPv6 address is 16 bytes if stored as an unsigned integer. (Likely as multiple 4 or 8 byte integers, depending on your platform.)
  • An IPv6 address ranges from 6 bytes to 42 bytes when encoded as a string in abbreviated hex notation.

On the low end, a loop back address (::1) is 3 bytes plus the variable length string overhead. On the high end, an address like 2002:4559:1FE2:1FE2:4559:1FE2:4559:1FE2 uses 39 bytes plus the variable length string overhead.

Unlike with IPv4, it's not safe to assume the average IPv6 string length will be mean of 6 and 42, because the number of addresses with a significant number of consecutive zeroes is a very small fraction of the overall IPv6 address space. Only some special addresses, like loopback and autoconf addresses, are likely to be compressible in this way.

Again, this is a storage penalty of >2x for string encoding versus integer encoding.

Network Math

Do you think routers store IP addresses as strings? Of course they don't.

If you need to do network math on IP addresses, the string representation is a hassle. E.g. if you want to write a query that searches for all addresses on a specific subnet ("return all records with an IP address in 10.7.200.104/27", you can easily do this by masking an integer address with an integer subnet mask. (Mongo doesn't support this particular query, but most RDBMS do.) If you store addresses as strings, then your query will need to convert each row to an integer, then mask it, which is several orders of magnitude slower. (Bitwise masking for an IPv4 address can be done in a few CPU cycles using 2 registers. Converting a string to an integer requires looping over the string.)

Similarly, range queries ("return all records all records between 192.168.1.50 and 192.168.50.100") with integer addresses will be able to use indexes, whereas range queries on string addresses will not.

The Bottom Line

It takes a little bit more work, but not much (there are a million aton() and ntoa() functions out there), but if you're building something serious and solid and you want to future-proof it against future requirements and the possibility of a large dataset, you should store IP addresses as integers, not strings.

If you're doing something quick and dirty and don't mind the possibility of remodeling in the future, then use strings.

For the OP's purpose, if you are optimizing for speed and space and you don't think you want to query it often, then why use a database at all? Just print IP addresses to a file. That would be faster and more storage efficient than storing it in a database (with associated API and storage overhead).

like image 169
Mark E. Haase Avatar answered Sep 17 '22 01:09

Mark E. Haase


An efficient way to save a ip address as a int. If you want to tag a ip with cidr filter, a demo here:

> db.getCollection('iptag').insert({tags: ['office'], hostmin: 2886991873, hostmax: 2887057406, cidr: '172.20.0.0/16'})
> db.getCollection('iptag').insert({tags: ['server'], hostmin: 173867009, hostmax: 173932542, cidr: '10.93.0.0/16'})
> db.getCollection('iptag').insert({tags: ['server'], hostmin: 173932545, hostmax: 173998078, cidr: '10.94.0.0/16'})

Create tags index.

> db.getCollection('iptag').ensureIndex(tags: 1)

Filter ip with cidr range. ip2int('10.94.25.32') == 173938976.

> db.getCollection('iptag').find({hostmin: {$lte: 173938976}, hostmax: {$gte: 173938976}})
like image 44
debug Avatar answered Sep 19 '22 01:09

debug