I have long lists of strings, such as this machine-readable example:
A <- list(c("Biology","Cell Biology","Art","Humanities, Multidisciplinary; Psychology, Experimental","Astronomy & Astrophysics; Physics, Particles & Fields","Economics; Mathematics, Interdisciplinary Applications; Social Sciences, Mathematical Methods","Geriatrics & Gerontology","Gerontology","Management","Operations Research & Management Science","Computer Science, Artificial Intelligence; Computer Science, Information Systems; Engineering, Electrical & Electronic","Economics; Mathematics, Interdisciplinary Applications; Social Sciences, Mathematical Methods; Statistics & Probability"))
So it looks like this:
> A
[[1]]
[1] "Biology"
[2] "Cell Biology"
[3] "Art"
[4] "Humanities, Multidisciplinary; Psychology, Experimental"
[5] "Astronomy & Astrophysics; Physics, Particles & Fields"
[6] "Economics; Mathematics, Interdisciplinary Applications; Social Sciences, Mathematical Methods"
[7] "Geriatrics & Gerontology"
[8] "Gerontology"
[9] "Management"
[10] "Operations Research & Management Science"
[11] "Computer Science, Artificial Intelligence; Computer Science, Information Systems; Engineering, Electrical & Electronic"
[12] "Economics; Mathematics, Interdisciplinary Applications; Social Sciences, Mathematical Methods; Statistics & Probability"
I would like to edit these terms and eliminate duplicates in order to get this result:
[1] "Science"
[2] "Science"
[3] "Arts & Humanities"
[4] "Arts & Humanities; Social Sciences"
[5] "Science"
[6] "Social Sciences; Science"
[7] "Science"
[8] "Social Sciences"
[9] "Social Sciences"
[10] "Science"
[11] "Science"
[12] "Social Sciences; Science"
So far I have only come up with this:
stringedit <- function(A)
{
  A <- gsub("Biology", "Science", A)
  A <- gsub("Cell Biology", "Science", A)
  A <- gsub("Art", "Arts & Humanities", A)
  A <- gsub("Humanities, Multidisciplinary", "Arts & Humanities", A)
  A <- gsub("Psychology, Experimental", "Social Sciences", A)
  A <- gsub("Astronomy & Astrophysics", "Science", A)
  A <- gsub("Physics, Particles & Fields", "Science", A)
  A <- gsub("Economics", "Social Sciences", A)
  A <- gsub("Mathematics", "Science", A)
  A <- gsub("Mathematics, Applied", "Science", A)
  A <- gsub("Mathematics, Interdisciplinary Applications", "Science", A)
  A <- gsub("Social Sciences, Mathematical Methods", "Social Sciences", A)
  A <- gsub("Geriatrics & Gerontology", "Science", A)
  A <- gsub("Gerontology", "Social Sciences", A)
  A <- gsub("Management", "Social Sciences", A)
  A <- gsub("Operations Research & Management Science", "Science", A)
  A <- gsub("Computer Science, Artificial Intelligence", "Science", A)
  A <- gsub("Computer Science, Information Systems", "Science", A)
  A <- gsub("Engineering, Electrical & Electronic", "Science", A)
  A <- gsub("Statistics & Probability", "Science", A)
}
B <- lapply(A, stringedit)
But it does not work properly; many terms are only partially replaced:
> B
[[1]]
[1] "Science"
[2] "Cell Science"
[3] "Arts & Humanities"
[4] "Arts & Humanities; Social Sciences"
[5] "Science; Science"
[6] "Social Sciences; Science, Interdisciplinary Applications; Social Sciences"
[7] "Science"
[8] "Social Sciences"
[9] "Social Sciences"
[10] "Operations Research & Social Sciences Science"
[11] "Computer Science, Arts & Humanitiesificial Intelligence; Science; Science"
[12] "Social Sciences; Science, Interdisciplinary Applications; Social Sciences; Science"
How can I achieve the correct output mentioned above?
Thank you very much in advance for your consideration!
I found it easiest to have a two-column data.frame as a lookup, with one column for the course name and one column for the category. Here's an example:
course.categories <- data.frame(
Course =
c("Art", "Humanities, Multidisciplinary", "Biology", "Cell Biology",
"Astronomy & Astrophysics", "Physics, Particles & Fields", "Mathematics",
"Mathematics, Applied", "Mathematics, Interdisciplinary Applications",
"Geriatrics & Gerontology", "Operations Research & Management Science",
"Computer Science, Artificial Intelligence",
"Computer Science, Information Systems",
"Engineering, Electrical & Electronic", "Statistics & Probability",
"Psychology, Experimental", "Economics",
"Social Sciences, Mathematical Methods",
"Gerontology", "Management"),
Category =
c("Arts & Humanities", "Arts & Humanities", "Science", "Science",
"Science", "Science", "Science", "Science", "Science", "Science",
"Science", "Science", "Science", "Science", "Science", "Social Sciences",
"Social Sciences", "Social Sciences", "Social Sciences", "Social Sciences"))
Then, assuming A is a list as in your question:
sapply(strsplit(unlist(A), "; "),
function(x)
paste(unique(course.categories[match(x, course.categories[["Course"]]),
"Category"]),
collapse = "; "))
# [1] "Science" "Science"
# [3] "Arts & Humanities" "Arts & Humanities; Social Sciences"
# [5] "Science" "Social Sciences; Science"
# [7] "Science" "Social Sciences"
# [9] "Social Sciences" "Science"
# [11] "Science" "Social Sciences; Science"
match matches the values from A with the course names in the course.categories dataset and says which rows the match occurs on; this is used to extract the category the course belongs to. Then, unique makes sure we just have one of each category, and paste puts things back together.
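If you prefer, the lookup can also be done with a named character vector instead of match; this is just a sketch of the same idea, not a different algorithm:

# Named lookup vector: names are course names, values are categories
categories <- setNames(as.character(course.categories[["Category"]]),
                       course.categories[["Course"]])
sapply(strsplit(unlist(A), "; "),
       function(x) paste(unique(categories[x]), collapse = "; "))

Indexing categories[x] by the course names replaces the match step; the unique/paste part is unchanged.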