My data set is stored in a single-column table named "Formula", which looks like this:
row.identity..main.ID.
C5H6O2N3
C10H12N
C5H6O2N3S
I want to extend the current table so that each letter gets its own column, with the corresponding number shown in each row. Basically, I want something like this:
row.identity..main.ID.   C   H  O  N  S  X
C5H6O2N3                 5   6  2  3  0  0
C10H12N                 10  12  0  1  0  0
C5H6O2N3S                5   6  2  3  1  0
It would be great if the code were flexible enough to handle longer data sets with varying letters. So far, I have tried to implement the solution from Onyambu:
library(tidyverse)
library(stringr)
Formula%>%mutate(row.identity..main.ID.=gsub("\\b([A-Za-z]+)\\b","\\30",row.identity..main.ID.),
elements=str_extract_all(row.identity..main.ID.,"[A-Za-z]+"),
value=str_extract_all(row.identity..main.ID.,"\\d+"))%>%
unnest()%>%pivot_wider(elements,value,fill=0)
However, this results in several errors such as "Incompatible lengths: 4, 3." and/or "cols is now required when using unnest()."
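A likely cause of the "Incompatible lengths" error is that a trailing element without a number (such as the final N in C10H12N) yields more letters than numbers, so the two list columns cannot be unnested side by side. The following base R sketch only illustrates that mismatch on the sample data; the variable names are made up for the illustration.

```r
# Sample formulas from the question
formulas <- c("C5H6O2N3", "C10H12N", "C5H6O2N3S")

# Extract runs of letters and runs of digits separately, as the attempt does
letters_found <- regmatches(formulas, gregexpr("[A-Za-z]+", formulas))
numbers_found <- regmatches(formulas, gregexpr("[0-9]+", formulas))

# Compare how many of each were found per formula
mismatch <- data.frame(
  formula   = formulas,
  n_letters = lengths(letters_found),
  n_numbers = lengths(numbers_found)
)
print(mismatch)
```

Rows where n_letters and n_numbers differ are the ones that cannot be paired up without first adding the implicit count of 1.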
You could also do:
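# Rewrite each formula as "element:count" lines; a trailing element without a count gets 1
a <- sub("([A-Z]$)", "\\1:1", gsub("(\\D+)(\\d+)", "\\1:\\2\n", df[, 1]))
# lapply always returns a list, whereas sapply may simplify when all formulas have the same number of elements
e <- lapply(a, function(x) data.frame(read.dcf(textConnection(x))))
f <- cbind(df, plyr::rbind.fill(e))
f[is.na(f)] <- 0
f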
  row.identity..main.ID.  C  H O N S
1               C5H6O2N3  5  6 2 3 0
2                C10H12N 10 12 0 1 0
3              C5H6O2N3S  5  6 2 3 1
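To see why read.dcf can parse the formulas, it may help to look at the intermediate text. This is a small sketch on the sample data; the names formulas, dcf_text and rec are only for illustration.

```r
formulas <- c("C5H6O2N3", "C10H12N", "C5H6O2N3S")

# Same two substitutions as above: insert ":" and a newline after each
# letter/number pair, then give a trailing bare letter the count 1
dcf_text <- sub("([A-Z]$)", "\\1:1",
                gsub("(\\D+)(\\d+)", "\\1:\\2\n", formulas))

# Show the "tag:value" lines produced for the second formula
writeLines(dcf_text[2])

# read.dcf turns those lines into a one-row character matrix
con <- textConnection(dcf_text[2])
rec <- read.dcf(con)
close(con)
print(rec)
```

Note that the values come back as character strings, so they may need as.numeric() or type.convert() before doing arithmetic on them.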
Another option is to convert the text to JSON and then read it into R:
# Wrap each letter/number pair as a JSON key/value pair
a <- gsub("(\\D)(\\d+)", '"\\1":\\2,', df[,1])
# Drop the trailing comma and give a trailing bare letter the count 1
b <- gsub("([A-Z])$", '"\\1":1', trimws(a, whitespace = ","))
f <- cbind(df, jsonlite::fromJSON(sprintf("[{%s}]",paste(b, collapse = "}, {"))))
replace(f, is.na(f), 0)
  row.identity..main.ID.  C  H O N S
1               C5H6O2N3  5  6 2 3 0
2                C10H12N 10 12 0 1 0
3              C5H6O2N3S  5  6 2 3 1
You can try the code below:
df <- cbind(
  df,
  do.call(
    rbind,
    Map(function(x) {
      # Append an explicit count of 1 to any element that has no number
      x <- gsub("(?<=[A-Za-z])(?![0-9])", "1", x, perl = TRUE)
      # Repeat each letter by its count and tabulate against fixed levels
      table(
        factor(rep(
          gsub("\\d+", "", x),
          as.numeric(gsub("\\D+", "", x))
        ), levels = c("C", "H", "O", "N", "S", "X"))
      )
    }, regmatches(df$ID, gregexpr("[A-Za-z]+(\\d+)?", df$ID)))
  )
)
which gives
> df
         ID  C  H O N S X
1  C5H6O2N3  5  6 2 3 0 0
2   C10H12N 10 12 0 1 0 0
3 C5H6O2N3S  5  6 2 3 1 0
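The levels above are fixed in advance. If the set of letters varies between data sets, one possible base R sketch derives the columns from the data instead. The function name count_elements is hypothetical. It also assumes that an element symbol is one capital letter optionally followed by lowercase letters (so "Cl" is treated as one element), which goes beyond the single-letter examples in the question.

```r
count_elements <- function(formulas) {
  # Split each formula into tokens such as "C10", "H12", "N"
  tokens <- regmatches(formulas, gregexpr("[A-Z][a-z]*[0-9]*", formulas))

  # Element symbol of each token (token with trailing digits removed)
  symbols <- lapply(tokens, function(tok) sub("[0-9]+$", "", tok))

  # Count of each token; a missing number means 1
  counts <- lapply(tokens, function(tok) {
    n <- sub("^[A-Za-z]+", "", tok)
    n[n == ""] <- "1"
    as.integer(n)
  })

  # One column per element, in order of first appearance
  elements <- unique(unlist(symbols))
  out <- matrix(0L, nrow = length(formulas), ncol = length(elements),
                dimnames = list(NULL, elements))

  # Fill the matrix, summing in case a symbol is repeated within a formula
  for (i in seq_along(formulas)) {
    for (j in seq_along(symbols[[i]])) {
      s <- symbols[[i]][j]
      out[i, s] <- out[i, s] + counts[[i]][j]
    }
  }

  data.frame(ID = formulas, out, stringsAsFactors = FALSE)
}

res <- count_elements(c("C5H6O2N3", "C10H12N", "C5H6O2N3S"))
print(res)
```

Unlike the fixed-levels version, this only creates columns for elements that actually occur, so an all-zero column such as X would not appear unless it is added afterwards.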
Data
> dput(df)
structure(list(ID = c("C5H6O2N3", "C10H12N", "C5H6O2N3S"), C = c(5L,
10L, 5L), H = c(6L, 12L, 6L), O = c(2L, 0L, 2L), N = c(3L, 1L,
3L), S = c(0L, 0L, 1L), X = c(0L, 0L, 0L)), class = "data.frame", row.names = c(NA,
-3L))