How can I get a data frame containing the same data as an existing matrix?
A simplified example of my matrix:
mat <- matrix(c(0, 0.5, 1, 0.1, 0.2, 0.3, 0.3, 0.4, 0.5),
              ncol = 3, nrow = 3,
              dimnames = list(NULL, c("time", "C_0", "C_1")))
> mat
     time C_0 C_1
[1,]  0.0 0.1 0.3
[2,]  0.5 0.2 0.4
[3,]  1.0 0.3 0.5
I would like to create a data frame that looks like this:
  name time val
1  C_0  0.0 0.1
2  C_0  0.5 0.2
3  C_0  1.0 0.3
4  C_1  0.0 0.3
5  C_1  0.5 0.4
6  C_1  1.0 0.5
All my attempts are quite clumsy, for example:
data.frame(cbind(c(rep("C_0", 3), rep("C_1", 3)),
                 rbind(cbind(mat[,"time"], mat[,"C_0"]),
                       cbind(mat[,"time"], mat[,"C_1"]))))
Does anyone have an idea of how to do this more elegantly? Please note that my real data has a few more columns (40 columns).
Matrices and data frames both represent 'rectangular' data types, meaning that they store tabular data in rows and columns. The main difference is that a matrix can hold only a single class of data, while a data frame can hold a different class of data in each column.
If you change your time column into row names, then you can use as.data.frame(as.table(mat)) for simple cases like this.
Example:
data <- c(0.1, 0.2, 0.3, 0.3, 0.4, 0.5)
dimnames <- list(time=c(0, 0.5, 1), name=c("C_0", "C_1"))
mat <- matrix(data, ncol=2, nrow=3, dimnames=dimnames)
as.data.frame(as.table(mat))
time name Freq
1 0 C_0 0.1
2 0.5 C_0 0.2
3 1 C_0 0.3
4 0 C_1 0.3
5 0.5 C_1 0.4
6 1 C_1 0.5
In this case time and name are both factors. You may want to convert time back to numeric, or it may not matter.
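Starting from the matrix in the question, the whole route can be sketched as below. The names m2 and long are placeholders chosen for this illustration; responseName is an argument of as.data.frame() for tables that replaces the default "Freq" column name.

```r
# Original matrix from the question, with time stored as an ordinary column
mat <- matrix(c(0, 0.5, 1, 0.1, 0.2, 0.3, 0.3, 0.4, 0.5),
              ncol = 3, nrow = 3,
              dimnames = list(NULL, c("time", "C_0", "C_1")))

# Drop the time column and reuse its values as row names,
# naming both dimensions so the result gets sensible column names
m2 <- mat[, colnames(mat) != "time", drop = FALSE]
dimnames(m2) <- list(time = mat[, "time"], name = colnames(m2))

# Wide to long; responseName replaces the default "Freq"
long <- as.data.frame(as.table(m2), responseName = "val")

# time is a factor here, so go through character to recover the numbers
long$time <- as.numeric(as.character(long$time))

# Reorder the columns as requested in the question
long <- long[c("name", "time", "val")]
long
```

Converting with as.numeric(long$time) directly would return the factor's integer codes (1, 2, 3) rather than the original times, which is why the intermediate as.character() step is there.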
You can use stack, which ships with R (it lives in the utils package). You first need to coerce your matrix to a data.frame, and then reorder the columns once the data is stacked.
mat <- as.data.frame(mat)
res <- data.frame(time = mat$time, stack(mat, select = -time))
res[, c(3, 1, 2)]
  ind time values
1 C_0  0.0    0.1
2 C_0  0.5    0.2
3 C_0  1.0    0.3
4 C_1  0.0    0.3
5 C_1  0.5    0.4
6 C_1  1.0    0.5
Note that stack is generally more efficient than melt() from the reshape2 package.
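If you would rather not overwrite mat with a data frame, the same stacking can be sketched directly on the matrix with rep() and as.vector(). This is an alternative illustration, not what stack() does internally; value_cols is a name introduced here, and the code assumes every column other than time should be stacked.

```r
mat <- matrix(c(0, 0.5, 1, 0.1, 0.2, 0.3, 0.3, 0.4, 0.5),
              ncol = 3, nrow = 3,
              dimnames = list(NULL, c("time", "C_0", "C_1")))

# Every column except time is treated as a value column (assumption)
value_cols <- setdiff(colnames(mat), "time")

res <- data.frame(
  name = rep(value_cols, each = nrow(mat)),              # one label per stacked value
  time = rep(mat[, "time"], times = length(value_cols)), # time repeated for each column
  val  = as.vector(mat[, value_cols]),                   # matrix unrolled column by column
  stringsAsFactors = FALSE
)
res
```

Because the column names are read from the matrix, the same code should work unchanged for the 40-column case.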
melt() from the reshape2 package gets you close ...
library(reshape2)
(res <- melt(as.data.frame(mat), id.vars = "time"))
# time variable value
# 1 0.0 C_0 0.1
# 2 0.5 C_0 0.2
# 3 1.0 C_0 0.3
# 4 0.0 C_1 0.3
# 5 0.5 C_1 0.4
# 6 1.0 C_1 0.5
... although you may want to post-process its results to get your preferred column names and ordering.
setNames(res[c("variable", "time", "value")], c("name", "time", "val"))
# name time val
# 1 C_0 0.0 0.1
# 2 C_0 0.5 0.2
# 3 C_0 1.0 0.3
# 4 C_1 0.0 0.3
# 5 C_1 0.5 0.4
# 6 C_1 1.0 0.5
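As a possible shortcut, melt() for data frames also accepts variable.name and value.name, so the renaming can happen in the same call and only the column order is left to fix. A minimal sketch, assuming reshape2 is installed and mat is the matrix from the question:

```r
library(reshape2)

mat <- matrix(c(0, 0.5, 1, 0.1, 0.2, 0.3, 0.3, 0.4, 0.5),
              ncol = 3, nrow = 3,
              dimnames = list(NULL, c("time", "C_0", "C_1")))

# Name the key and value columns while melting
res <- melt(as.data.frame(mat), id.vars = "time",
            variable.name = "name", value.name = "val")

# Only the column order still needs adjusting
res <- res[c("name", "time", "val")]
res
```

The name column comes back as a factor; wrap it in as.character() if you need plain strings.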
Using dplyr and tidyr:
library(dplyr)
library(tidyr)
df <- as_tibble(mat) %>%           # convert the matrix to a tibble (as_data_frame() is deprecated)
  gather(name, val, C_0:C_1) %>%   # convert the data frame from wide to long
  select(name, time, val)          # reorder the columns
df
# A tibble: 6 x 3
name time val
<chr> <dbl> <dbl>
1 C_0 0.0 0.1
2 C_0 0.5 0.2
3 C_0 1.0 0.3
4 C_1 0.0 0.3
5 C_1 0.5 0.4
6 C_1 1.0 0.5
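In recent versions of tidyr, gather() is superseded by pivot_longer(). A rough equivalent is sketched below, assuming tidyr 1.0.0 or later. Selecting with -time rather than C_0:C_1 avoids spelling out column names, which may help with the 40-column case. pivot_longer() returns the rows grouped by time rather than by name, so arrange() is added to reproduce the ordering shown in the question.

```r
library(dplyr)
library(tidyr)

mat <- matrix(c(0, 0.5, 1, 0.1, 0.2, 0.3, 0.3, 0.4, 0.5),
              ncol = 3, nrow = 3,
              dimnames = list(NULL, c("time", "C_0", "C_1")))

df <- as_tibble(mat) %>%
  pivot_longer(cols = -time,            # every column except time
               names_to = "name",       # old column names go here
               values_to = "val") %>%   # cell values go here
  arrange(name, time) %>%               # group rows by name, then by time
  select(name, time, val)               # reorder the columns
df
```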