
strsplit a melted dataset

I am trying to graph the results of a genetic test that comes as a large CSV file. Each x,y position in the CSV is a numeric score, most of which are zero, and I am only interested in the non-zero data. The row and column names also carry additional information that I would like to use to subset the data further. My plan was to melt the data, drop all the rows with zero values, and string-split the melted data into extra columns that I could use for casting. However, I run into a problem when I try to string-split the melted data. Here are the commands and some sample data:

library(reshape2)  # assumed source of melt(); the older reshape package also provides it
test <- read.csv("~/Documents/Bioinformatics/Python_Scripts/test.csv", as.is=TRUE)
smalltest <- test[1:10, 1:4]
small.melt <- melt(smalltest)
head(smalltest)
head(small.melt)

This results in the data below:

head(smalltest)
BlastCompare               Triostin_A_2  Triostin_A_1  Myxochelin_2  Myxochelin_1
HA9WEQA05FUABT_497_TxR_K2             0             0           105           120
G9VUOJT08JA64I_426_TxC_N3             0             0             0             0
HA9WEQA06G2SFP_457_TxC_J4             0             0             0             0
HA9WEQA05GCP8Q_506_TxR_J7           150           150             0             0
HA9WEQA07HU6MW_421_TxR_P7             0             0             0             0
G9VUOJT05FST3W_399_TxR_J4             0             0           255           240

head(small.melt)
BlastCompare               variable      value
HA9WEQA05FUABT_497_TxR_K2  Triostin_A_2      0
G9VUOJT08JA64I_426_TxC_N3  Triostin_A_2      0
HA9WEQA06G2SFP_457_TxC_J4  Triostin_A_2      0
HA9WEQA05GCP8Q_506_TxR_J7  Triostin_A_2    150
HA9WEQA07HU6MW_421_TxR_P7  Triostin_A_2      0
G9VUOJT05FST3W_399_TxR_J4  Triostin_A_2      0

However, when I try to string-split the $variable column, I get this error:

small.melt$name <- sapply(strsplit(small.melt$variable, "_") , "[", 1)
Error in strsplit(small.melt$variable, "_") : non-character argument

Any thoughts on why? Or how to get around this?

thanks zach cp

zach asked Nov 24 '25 14:11


1 Answer

The problem is that small.melt$variable is of class factor, while strsplit() expects a character vector as its first argument (as the error message above indicates, and as the stripped-down example below shows):

f <- as.factor(c("a_b", "a_c"))
strsplit(f, "_")
Error in strsplit(f, "_") : non-character argument
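
One way to confirm the diagnosis before changing anything is to inspect the class of the vector. The following is a minimal sketch on the same toy factor; the variable name msg and the use of tryCatch() are illustrative choices, not part of the original question.

```r
f <- as.factor(c("a_b", "a_c"))

class(f)          # reports the class of the vector
is.character(f)   # FALSE for a factor

# strsplit() raises an error for a factor; tryCatch() captures the
# message text so the script keeps running.
msg <- tryCatch(strsplit(f, "_"), error = function(e) conditionMessage(e))
print(msg)
```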

To make strsplit() happy, simply use as.character() to convert the factor to a character vector:

sapply(strsplit(as.character(small.melt$variable), "_") , "[", 1)
#  [1] "Triostin"   "Triostin"   "Triostin"   "Triostin"   "Triostin"  
#  [6] "Triostin"   "Triostin"   "Triostin"   "Triostin"   "Triostin"  
# [11] "Triostin"   "Triostin"   "Myxochelin" "Myxochelin" "Myxochelin"
# [16] "Myxochelin" "Myxochelin" "Myxochelin" "Myxochelin" "Myxochelin"
# [21] "Myxochelin" "Myxochelin" "Myxochelin" "Myxochelin"
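
With that fix in place, the rest of the workflow described in the question (drop the zero rows, then split the names into extra columns) might look like the sketch below. It uses only base R and a small hand-built stand-in for small.melt, so it runs without the CSV file; the new column names read and type are hypothetical labels, since the question does not say what the pieces of each identifier mean.

```r
# Hypothetical stand-in for the melted data; stringsAsFactors = TRUE
# mimics the factor column that caused the error.
small.melt <- data.frame(
  BlastCompare = rep(c("HA9WEQA05FUABT_497_TxR_K2",
                       "HA9WEQA05GCP8Q_506_TxR_J7"), times = 2),
  variable     = rep(c("Triostin_A_2", "Myxochelin_2"), each = 2),
  value        = c(0, 150, 105, 0),
  stringsAsFactors = TRUE
)

# Keep only the non-zero scores.
nonzero <- small.melt[small.melt$value != 0, ]

# Convert the factor to character before splitting, then take the first piece.
nonzero$name <- sapply(strsplit(as.character(nonzero$variable), "_"), "[", 1)

# Same idea for the row labels: split once, then pick pieces by position.
parts <- strsplit(as.character(nonzero$BlastCompare), "_")
nonzero$read <- sapply(parts, "[", 1)  # hypothetical label for the first piece
nonzero$type <- sapply(parts, "[", 3)  # hypothetical label for the third piece

print(nonzero)
```

The extra columns can then be used for subsetting or casting in the same way as any other column of the melted data frame.
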
Josh O'Brien answered Nov 26 '25 04:11


