I have a dataset that contains a single column with a huge number of rows. The column holds public IP addresses, so it is possible to get the geolocation for those IPs using services like http://freegeoip.net. I want to generate a column of country names, one for each IP in the rows. Here is my naive approach:
library(XML)
#Import your list of IPs
ip.addresses <- read.csv("ip-address.csv")
#Base URL of the freegeoip XML API
api.url <- "http://freegeoip.net/xml/"
#Prepend the API URL to each of the IPs
api.with.ip <- paste0(api.url, ip.addresses$IP.Addresses)
#Creating an empty vector for collecting the country names
country.vec <- c()
#Running a for loop to parse the country name for each IP
for (i in api.with.ip) {
  #Using xmlParse & xmlToList to extract the IP information
  data <- xmlParse(i)
  xml.data <- xmlToList(data)
  #Selecting only the country name via xml.data$CountryName;
  #if it is NULL, putting NA instead
  if (is.null(xml.data$CountryName)) {
    country.vec <- c(country.vec, NA)
  } else {
    country.vec <- c(country.vec, xml.data$CountryName)
  }
}
#Combining the IPs with their corresponding country names into a data frame
result <- data.frame(ip.addresses, country.vec)
colnames(result) <- c("IP Address", "Country")
#Exporting the dataframe as csv file
write.csv(result, "IP_to_Location.csv")
But as I have a huge number of rows, this for-loop approach is very slow. How can the process be made faster?
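Part of the slowness is that country.vec is grown with c() inside the loop, which copies the whole vector on every iteration. Below is a minimal sketch of the same lookup with a preallocated result via vapply; the tryCatch wrapper that returns NA on a failed request is an assumption, not part of the original code:
library(XML)
#Fetch one country name, returning NA on any network or parse error
get.country <- function(url) {
  tryCatch({
    xml.data <- xmlToList(xmlParse(url))
    if (is.null(xml.data$CountryName)) NA_character_ else xml.data$CountryName
  }, error = function(e) NA_character_)
}
country.vec <- vapply(api.with.ip, get.country, character(1), USE.NAMES = FALSE)
This still makes one HTTP request per IP, so the dominant cost (network latency) remains; the solution below removes it entirely.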
At last I solved this problem in a much faster way with the rgeolocate package and a local MaxMind .mmdb database, which avoids the per-IP network request entirely.
library(rgeolocate)
setwd("/home/imran/Documents/")
#Import the list of IPs
ipdf <- read.csv("IP_Address.csv")
#Path to the GeoLite2 country database bundled with rgeolocate
ipmmdb <- system.file("extdata", "GeoLite2-Country.mmdb", package = "rgeolocate")
#Look up every IP against the local .mmdb file in one vectorised call
results <- maxmind(ipdf$IP.Address, ipmmdb, "country_name")
#Combining the IPs with their country names and exporting as CSV
export.results <- data.frame(ipdf$IP.Address, results$country_name)
colnames(export.results) <- c("IP Address", "Country")
write.csv(export.results, "IP_to_Locationmmdb.csv")
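Note that the .mmdb file used above is the copy bundled with rgeolocate for convenience; for fresher data you can download the free GeoLite2-Country database from MaxMind and point maxmind() at it instead. A minimal sketch, assuming a hypothetical local path:
#Hypothetical path to a locally downloaded GeoLite2-Country database
mmdb.path <- "/home/imran/Documents/GeoLite2-Country.mmdb"
results <- maxmind(ipdf$IP.Address, mmdb.path, "country_name")
Since each lookup is a local file read rather than an HTTP request per row, this approach scales to a huge number of IPs.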