
Get continent name from country name in R

Tags:

r

I have a data frame with one column representing country names. My goal is to add one more column which gives the continent information. Please check the following use case:

my.df <- data.frame(country = c("Afghanistan","Algeria"))

Is there a package that I can use to append a column of data containing the continent names without having the original data?

jay asked Nov 27 '17 11:11


3 Answers

You can use the countrycode package for this task.

library(countrycode)
df <- data.frame(country = c("Afghanistan",
                             "Algeria",
                             "USA",
                             "France",
                             "New Zealand",
                             "Fantasyland"))

df$continent <- countrycode(sourcevar = df[, "country"],
                            origin = "country.name",
                            destination = "continent")
#Warning message:
#In countrycode(sourcevar = df[, "country"], origin = "country.name",  :
#  Some values were not matched unambiguously: Fantasyland

Result

df
#      country continent
#1 Afghanistan      Asia
#2     Algeria    Africa
#3         USA  Americas
#4      France    Europe
#5 New Zealand   Oceania
#6 Fantasyland      <NA>
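
If some names cannot be matched, one option is to supply manual overrides through the custom_match argument of countrycode(). The following is only a minimal sketch: the mapping from "Fantasyland" to "Imaginary" is made up for illustration, and the call is wrapped in a requireNamespace() check so that it is skipped when the package is not installed.

```r
# Sketch: supply a manual override for a name countrycode cannot match.
# "Fantasyland" -> "Imaginary" is a hypothetical mapping, for illustration only.
df <- data.frame(country = c("Afghanistan", "Algeria", "Fantasyland"),
                 stringsAsFactors = FALSE)

if (requireNamespace("countrycode", quietly = TRUE)) {
  df$continent <- countrycode::countrycode(
    sourcevar    = df$country,
    origin       = "country.name",
    destination  = "continent",
    # Named vector: names are the source values, values are the replacements.
    custom_match = c(Fantasyland = "Imaginary")
  )
  print(df)
}
```

Names not listed in custom_match are still looked up in the usual way.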
markus answered Sep 22 '22 22:09


Expanding on markus's answer: countrycode takes its continent values from the continent column of its built-in codelist data set.

?codelist

Definition of continent:

continent: Continent as defined in the World Bank Development Indicators

The question asked for continents, but sometimes continents are too coarse to separate the data usefully. For example, the continent destination groups North and South America together as Americas.

What you might want is region:

region: Regions as defined in the World Bank Development Indicators

It is unclear how the World Bank defines these regions, but the code below shows that this destination is more granular than continent.

library(countrycode)

egnations <- c("Afghanistan", "Algeria", "USA", "France", "New Zealand", "Fantasyland")

countrycode(sourcevar = egnations, origin = "country.name", destination = "region")

Output:

[1] "Southern Asia"            
[2] "Northern Africa"          
[3] "Northern America"         
[4] "Western Europe"           
[5] "Australia and New Zealand"
[6] NA      
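
Which grouping the region destination returns may depend on the installed countrycode version, so it is worth checking ?codelist before relying on the output above. A minimal sketch that lists the region-like columns the local install actually provides; the column name region23 is an assumption (it is not mentioned in this answer) and is used only if it is present:

```r
# Sketch: inspect which region-like destinations the installed countrycode
# version offers. "region23" is an assumed column name and is used only
# when it really exists in codelist.
region_cols <- character(0)

if (requireNamespace("countrycode", quietly = TRUE)) {
  cols <- names(countrycode::codelist)
  region_cols <- grep("region", cols, value = TRUE)
  print(region_cols)

  egnations <- c("Afghanistan", "Algeria", "USA", "France", "New Zealand")
  dest <- if ("region23" %in% cols) "region23" else "region"
  print(countrycode::countrycode(egnations,
                                 origin = "country.name",
                                 destination = dest))
}
```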
Justapigeon answered Sep 25 '22 22:09


You can try

my.df <- data.frame(country = c("Afghanistan","Algeria"),
                    continent= as.factor(c("Asia","Africa")))
merge(my.df, raster::ccodes()[, c("NAME", "CONTINENT")], by.x = "country", by.y = "NAME", all.x = TRUE)
#       country continent CONTINENT
# 1 Afghanistan      Asia      Asia
# 2     Algeria    Africa    Africa

Some country values might need adjusting so that they match the names in the lookup table; I can't tell, since you did not provide all values.
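
To find out which country values need adjusting, the same kind of join can be done in base R with match(), and the rows that come back as NA are the ones to look at. This is only a sketch: the small lookup table below is hand-written for illustration and stands in for a full table such as the one returned by raster::ccodes().

```r
# Sketch: join continents from a small hand-written lookup table (base R only).
# The lookup rows are illustrative; a real table would cover every country.
lookup <- data.frame(
  NAME      = c("Afghanistan", "Algeria", "France"),
  CONTINENT = c("Asia", "Africa", "Europe"),
  stringsAsFactors = FALSE
)

my.df <- data.frame(country = c("Afghanistan", "Algeria", "Fantasyland"),
                    stringsAsFactors = FALSE)

# match() keeps the original row order, whereas merge() sorts by key by default.
my.df$continent <- lookup$CONTINENT[match(my.df$country, lookup$NAME)]

# Names with no entry in the lookup table come back as NA.
unmatched <- my.df$country[is.na(my.df$continent)]
print(my.df)
print(unmatched)
```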

lukeA answered Sep 22 '22 22:09