
Data transformation for machine learning

I have a dataset of SKU IDs and their counts. I need to feed this data into a machine learning algorithm in such a way that the SKU IDs become columns and the COUNT values sit at the intersection of transaction ID and SKU ID. Can anyone suggest how to achieve this transformation?

CURRENT DATA

TransID     SKUID      COUNT
1           31         1  
1           32         2 
1           33         1  
2           31         2  
2           34         -1  

DESIRED DATA

TransID     31      32      33      34
1           1       2       1       0
2           2       0       0       -1
Arslán asked May 25 '26 12:05


1 Answer

In R, there are several options. One is xtabs

xtabs(COUNT~., df1)
#         SKUID
#TransID 31 32 33 34
#     1  1  2  1  0
#     2  2  0  0 -1
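
The answer does not show how df1 is built, so here is a minimal sketch that rebuilds it from the table in the question (the numeric column types are an assumption) and turns the xtabs result into an ordinary data frame, which is usually easier to hand to a modelling function than a table object. The name wide is only illustrative.

```r
# Rebuild the example data from the question.
# Column types are assumed to be numeric; the question does not say.
df1 <- data.frame(
  TransID = c(1, 1, 1, 2, 2),
  SKUID   = c(31, 32, 33, 31, 34),
  COUNT   = c(1, 2, 1, 2, -1)
)

# Cross-tabulate: rows are TransID, columns are SKUID,
# and each cell holds the summed COUNT for that pair (0 if absent).
tab <- xtabs(COUNT ~ TransID + SKUID, data = df1)

# Convert the table into a plain data frame with one column per SKU.
# TransID ends up in the row names rather than in a column.
wide <- as.data.frame.matrix(tab)
wide
```

Note that xtabs sums COUNT when the same TransID and SKUID pair appears more than once, which may or may not be what you want for repeated rows.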

Or dcast

library(reshape2)
dcast(df1, TransID~SKUID, value.var="COUNT", fill=0)
#  TransID 31 32 33 34
#1       1  1  2  1  0
#2       2  2  0  0 -1

Or spread, which gives the same wide layout as dcast

library(tidyr)
spread(df1, SKUID, COUNT, fill=0)
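
In recent tidyr releases spread is marked as superseded by pivot_wider. A sketch of the equivalent call, assuming tidyr 1.0.0 or later is installed and that df1 looks like the table in the question (numeric columns are an assumption):

```r
library(tidyr)

# Example data rebuilt from the question; column types are assumed.
df1 <- data.frame(
  TransID = c(1, 1, 1, 2, 2),
  SKUID   = c(31, 32, 33, 31, 34),
  COUNT   = c(1, 2, 1, 2, -1)
)

# One row per TransID, one column per SKUID, missing pairs filled with 0.
# The named-list form of values_fill is used so the call also works
# on tidyr 1.0.0, where a bare scalar was not yet accepted.
wide <- pivot_wider(
  df1,
  names_from  = SKUID,
  values_from = COUNT,
  values_fill = list(COUNT = 0)
)
wide
```

Unlike the xtabs result, TransID stays as a regular column here, so it may need to be dropped before the data is passed to a model.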
akrun answered May 28 '26 08:05


