I have two DataFrames, df1 and df2, structured as follows:
ip_address property_A
1.1.1.1 AAA
1.2.2.2 BBB
1.3.3.3 CCC
... ...
1.255.255.255.255 ZZZ
ip_address property_B
1.1.1.1 YRG
1.2.2.2 HJK
1.3.3.3 KJH
... ...
1.255.255.255.255 TYU
And I want to merge them on the column "ip_address". Due to the nature of the data contained in that column, this command is failing:
pd.merge(df1, df2, on='ip_address', how='inner')
>> dtype: object does not appear to be an IPv4 or IPv6 address
A possible solution would be to convert IP addresses to integers using the ipaddress module as in this example:
import ipaddress
int(ipaddress.IPv4Address('192.168.0.1'))
>> 3232235521
To do this efficiently, I tried this command:
import numpy as np
import pandas as pd
df1['int_ip'] = np.nan
df1.int_ip = int(ipaddress.IPv4Address(df1.ip_address))
However, even this command is failing:
>> AddressValueError: Expected 4 octets in [...]
The only approach that seems to be feasible is the following:
for i in range(0, df1.shape[0]):
df1.int_ip[i] = int(ipaddress.IPv4Address(df1.ip_address[i]))
But this one is extremely inefficient.
Do you have a better approach?
d = {'ip_address': ['1.1.1.1', '2.2.2.2','3.3.3.3','1.255.255.255'], 'property_A': ['AAA','BBB','CCC','ZZZ']}
df1 = pd.DataFrame(data=d)
b = {'ip_address': ['1.1.1.1', '2.2.2.2','3.3.3.3','1.255.255.255'], 'property_B': ['YRG','HJK','KJH','TYU']}
df2 = pd.DataFrame(data=b)
I would try this:
df3= df1.merge(df2.set_index('ip_address'),
left_on=df1.ip_address,
right_index=True)
df1
ip_address property_A
0 1.1.1.1 AAA
1 2.2.2.2 BBB
2 3.3.3.3 CCC
3 1.255.255.255 ZZZ
df2
ip_address property_B
0 1.1.1.1 YRG
1 2.2.2.2 HJK
2 3.3.3.3 KJH
3 1.255.255.255 TYU
df3
ip_address property_A property_B
0 1.1.1.1 AAA YRG
1 2.2.2.2 BBB HJK
2 3.3.3.3 CCC KJH
3 1.255.255.255 ZZZ TYU
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With