A scientific name usually consists of three pieces of information: genus, specific epithet and author. A simple example would be the following:
Acanthus ilicifolius L.
Easy. However, the matter gets more complicated when we have to deal with hybrids, infraspecific ranks (subspecies, varieties, forms), several authors and other inconsistencies. In these cases, a species name might look like this:
cf. Andrographis paniculata (Burm.f.) Wall. ex Nees
or this:
Ipomoea pes-caprae (L.) DC. subsp. brasiliensis (L.) Ooststr.f
I'm trying to find a reliable way to deconstruct such names. I could write some hackish code using tons of if/else statements, but I'm looking for something more elegant (and robust). I was thinking of some kind of parser that parses the name much like a calculator parses a mathematical expression. Unfortunately, I'm not the most sophisticated programmer: I have never written a real parser, and I don't know whether one would make sense here, as there is quite a lot of variation in scientific names. What do you think is the best way to tackle this problem? My preferred language is R, or perhaps Julia if it suits the task better.
You're in luck (kind of). GBIF have a name parser, and the taxize package hooks into its API with the gbif_parse function.
library(taxize)
gbif_parse(c('Acanthus ilicifolius L.',
'cf. Andrographis paniculata (Burm.f.) Wall. ex Nees',
'Ipomoea pes-caprae (L.) DC. subsp. brasiliensis (L.) Ooststr.f'))
#   scientificname                                                  type
# 1 Acanthus ilicifolius L.                                         WELLFORMED
# 2 cf. Andrographis paniculata (Burm.f.) Wall. ex Nees             INFORMAL
# 3 Ipomoea pes-caprae (L.) DC. subsp. brasiliensis (L.) Ooststr.f  SCINAME
#
#   genusorabove  specificepithet  authorsparsed  authorship
# 1 Acanthus      ilicifolius      TRUE           L.
# 2 Andrographis  paniculata       TRUE           Wall. ex Nees
# 3 Ipomoea       pes-caprae       TRUE           Ooststr.f
#
#   canonicalname                    canonicalnamewithmarker
# 1 Acanthus ilicifolius             Acanthus ilicifolius
# 2 Andrographis paniculata          Andrographis paniculata
# 3 Ipomoea pes-caprae brasiliensis  Ipomoea pes-caprae subsp. brasiliensis
#
#   canonicalnamecomplete
# 1 Acanthus ilicifolius L.
# 2 Andrographis paniculata (Burm. f.) Wall. ex Nees
# 3 Ipomoea pes-caprae subsp. brasiliensis (L.) Ooststr.f
#
#   bracketauthorship  infraspecificepithet  rankmarker
# 1 <NA>               <NA>                  <NA>
# 2 Burm. f.           <NA>                  <NA>
# 3 L.                 brasiliensis          subsp.
See ?gbif_parse for more info. You can also find GBIF on GitHub.
taxize also takes advantage of the EOL API; see ?gni_parse.
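If calling a web service is not an option, the simpler cases can be approximated offline with regular expressions in base R. The following is only a rough sketch under stated assumptions, not a replacement for the GBIF parser: parse_name_simple and its column names are hypothetical (they are not part of taxize), it only recognises the qualifiers "cf." and "aff." and the rank markers "subsp." and "var.", it does not handle hybrids, and it leaves the authorship as an unparsed remainder.

```r
# Hypothetical helper (not part of taxize): a rough, offline regex sketch.
parse_name_simple <- function(name) {
  # Pull off a leading qualifier such as "cf." or "aff.", if present
  qual_pattern <- "^((cf|aff)\\.)\\s+.*$"
  qualifier <- if (grepl(qual_pattern, name)) {
    sub(qual_pattern, "\\1", name)
  } else {
    NA_character_
  }
  core <- sub("^(cf|aff)\\.\\s+", "", name)

  # Genus (capitalised word), specific epithet (lower case, hyphens allowed),
  # then everything else as an unparsed remainder
  main <- "^([A-Z][a-z]+)\\s+([a-z][a-z-]+)\\s*(.*)$"
  m <- regmatches(core, regexec(main, core))[[1]]
  if (length(m) == 0) {
    # Name does not fit the assumed shape: return NAs rather than guess
    return(data.frame(name = name, qualifier = qualifier,
                      genus = NA_character_, epithet = NA_character_,
                      rank = NA_character_, infra_epithet = NA_character_,
                      remainder = NA_character_, stringsAsFactors = FALSE))
  }
  rest <- m[4]

  # Optional infraspecific rank marker followed by its epithet
  infra <- regmatches(rest, regexec("(subsp\\.|var\\.)\\s+([a-z][a-z-]+)", rest))[[1]]
  has_infra <- length(infra) > 0

  data.frame(name = name, qualifier = qualifier,
             genus = m[2], epithet = m[3],
             rank = if (has_infra) infra[2] else NA_character_,
             infra_epithet = if (has_infra) infra[3] else NA_character_,
             remainder = rest,  # authors and anything else, left unparsed
             stringsAsFactors = FALSE)
}

sci_names <- c("Acanthus ilicifolius L.",
               "cf. Andrographis paniculata (Burm.f.) Wall. ex Nees",
               "Ipomoea pes-caprae (L.) DC. subsp. brasiliensis (L.) Ooststr.f")
parsed <- do.call(rbind, lapply(sci_names, parse_name_simple))
```

Names that do not match the assumed shape come back as rows of NA, so they can be filtered out and handed to gbif_parse instead of being silently misparsed.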