 

Going from a list of elements to chemical formula

I have a list of elemental compositions, with each element in its own column. Sometimes an element's count is zero.

   C H N O S
1  5 5 0 0 0
2  6 4 1 0 1
3  4 6 2 1 0

I need to combine them so that they read, e.g. C5H5, C6H4NS, C4H6N2O. This means that for any element of value "1" I should only take the column name, and for anything with value 0, the column should be skipped altogether.

I'm not really sure where to start here. I could add new columns holding the element symbols to make it easier to read across the columns, e.g.

   c C h H n N o O s S
1  C 5 H 5 N 0 O 0 S 0
2  C 6 H 4 N 1 O 0 S 1
3  C 4 H 6 N 2 O 1 S 0

This way, I only need to paste each row into a single string, but I still need to skip any zero values and drop the 1 after the element name.

HarD asked Dec 24 '22 03:12



2 Answers

Here is a base R solution:

df = read.table(text = "
C H N O S
5 5 0 0 0
6 4 1 0 1
4 6 2 1 0
", header = TRUE)

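apply(df, 1, function(x) {
  # paste element symbols and counts for the non-zero entries of this row
  formula <- paste0(colnames(df)[x > 0], x[x > 0], collapse = '')
  # drop a count of exactly 1, but leave the 1s inside counts such as 10 or 21
  gsub('(?<![0-9])1(?![0-9])', '', formula, perl = TRUE)
})
[1] "C5H5"    "C6H4NS"  "C4H6N2O"

paste0(colnames(df)[x > 0], x[x > 0], collapse='') pastes each column name whose row value is greater than zero together with that value. gsub then removes the ones; the lookarounds restrict this to a 1 standing on its own, since a plain gsub('1', '', ...) would also turn C10H12 into C0H2. And apply does this for each row in the data frame.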
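
As a sketch of an alternative, the count can be checked before pasting rather than editing the finished string, which sidesteps any risk of touching the 1s inside counts such as 10 or 12. The helper name make_formula is hypothetical, and the third row of the sample data has been changed to include two-digit counts for illustration:

```r
# make_formula is a hypothetical helper name, not part of the answer above.
make_formula <- function(df) {
  apply(df, 1, function(x) {
    keep <- x > 0                                # skip elements with a zero count
    counts <- ifelse(x[keep] == 1, "", x[keep])  # a count of 1 is written as nothing
    paste0(colnames(df)[keep], counts, collapse = "")
  })
}

# Sample data with an assumed third row containing counts above 9
df <- read.table(text = "
C H N O S
5 5 0 0 0
6 4 1 0 1
10 12 0 1 0
", header = TRUE)

make_formula(df)
# [1] "C5H5"    "C6H4NS"  "C10H12O"
```

A row whose counts are all zero comes back as an empty string here, so the result always has one entry per input row.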

bobbel answered Dec 30 '22 11:12



Here's a tidyverse solution that uses some reshaping:

df = read.table(text = "
C H N O S
5 5 0 0 0
6 4 1 0 1
4 6 2 1 0
", header = TRUE)

library(tidyverse)

df %>%
  mutate(id = row_number()) %>%                      # add row id
  gather(key, value, -id) %>%                        # reshape data
  filter(value != 0) %>%                             # remove any zero rows
  mutate(value = ifelse(value == 1, "", value)) %>%  # replace 1 with ""
  group_by(id) %>%                                   # for each row
  summarise(v = paste0(key, value, collapse = ""))   # create the string value

# # A tibble: 3 x 2
#      id v      
#   <int> <chr>  
# 1     1 C5H5   
# 2     2 C6H4NS 
# 3     3 C4H6N2O
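
In current tidyr, gather() is superseded by pivot_longer(). The same pipeline might be sketched as below, assuming the dplyr and tidyr packages are installed; the result name res is only for illustration. As with the pipeline above, a row whose counts are all zero is filtered out entirely and so produces no output row.

```r
library(dplyr)
library(tidyr)

df <- read.table(text = "
C H N O S
5 5 0 0 0
6 4 1 0 1
4 6 2 1 0
", header = TRUE)

# res is an illustrative name for the summarised result
res <- df %>%
  mutate(id = row_number()) %>%                                  # add row id
  pivot_longer(-id, names_to = "key", values_to = "value") %>%  # one row per id/element pair
  filter(value != 0) %>%                                         # drop zero counts
  mutate(value = ifelse(value == 1, "", as.character(value))) %>%  # a count of 1 becomes ""
  group_by(id) %>%                                               # for each original row
  summarise(v = paste0(key, value, collapse = ""))              # build the formula string

res$v
# [1] "C5H5"    "C6H4NS"  "C4H6N2O"
```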
AntoniosK answered Dec 30 '22 10:12
