I have data like dataframe df_a, and want to have it converted to the format as in dataframe df_b.
xtabs() gives similar result, but I did not find a way to access elements as in the example code below. Accessing through xa[1,1] gives no advantage since there is a weak correlation between indexing by numbers ("1") and names ("A"). As you can see there is a sort difference in the xtabs() result, so xa[2,2]=2 and not 0 as on the df_b listing.
> df_a
ItemName Feature Amount
1 First A 2
2 First B 3
3 First A 4
4 Second C 3
5 Second C 2
6 Third D 1
7 Fourth B 2
8 Fourth D 3
9 Fourth D 2
> df_b
ItemName A B C D
1 First 6 3 0 0
2 Second 0 0 5 0
3 Third 0 0 0 1
4 Fourth 0 2 0 5
> df_b$A
[1] 6 0 0 0
> xa<-xtabs(df_a$Amount~df_a$ItemName+df_a$Feature)
> xa
df_a$Feature
df_a$ItemName A B C D
First 6 3 0 0
Fourth 0 2 0 5
Second 0 0 5 0
Third 0 0 0 1
> xa$A
Error in xa$A : $ operator is invalid for atomic vectors
There is a way of iterative conversion with for() loops, but totally inefficient in my case because my data has millions of records.
For the purpose of further processing my required output format is dataframe. If anyone solved similar problem please share.
The as. data. frame() function converts a table to a data frame in a format that you need for regression analysis on count data. If you need to summarize the counts first, you use table() to create the desired table.
A matrix can be converted to a dataframe by using a function called as. data. frame(). It will take each column from the matrix and convert it to each column in the dataframe.
How to Create a Data Frame. We can create a dataframe in R by passing the variable a,b,c,d into the data. frame() function. We can R create dataframe and name the columns with name() and simply specify the name of the variables.
You can just use as.data.frame.matrix(xa)
# output
A B C D
First 6 3 0 0
Fourth 0 2 0 5
Second 0 0 5 0
Third 0 0 0 1
## or
df_b <- as.data.frame.matrix(xa)[unique(df_a$ItemName), ]
data.frame(ItemName = row.names(df_b), df_b, row.names = NULL)
# output
ItemName A B C D
1 First 6 3 0 0
2 Second 0 0 5 0
3 Third 0 0 0 1
4 Fourth 0 2 0 5
Without using xtabs
you can do something like this:
df %>%
dplyr::group_by(ItemName, Feature) %>%
dplyr::summarise(Sum=sum(Amount, na.rm = T)) %>%
tidyr::spread(Feature, Sum, fill=0) %>%
as.data.frame()
This will transform as you require and it stays as a data.frame
Or, you can just as.data.frame(your_xtabs_result)
and that should work too
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With