Transform row data into column by certain row name in R

Tags:

merge

r

transpose

transformation

Hey, so I'm pretty new to R and only familiar with a few functions. I have raw data of around 2,000,000 rows.

The raw data look like this: each item has up to four kinds of tariff (AHS, BND, MFN, PRF). Some items have a PRF value and some don't. The goal is to turn each tariff type into its own column, with one row per item.

AHS      3.00 
BND      3.80
MFN      4.00
PRF      2.00
AHS      4.00
BND      3.80
MFN      4.00

How can I transform the raw data into something like this?

AHS   BND   MFN   PRF
3.00  3.80  4.00  2.00
4.00  3.80  4.00  NA

I tried rbind, but for the items that don't have PRF, R puts the AHS value into the PRF column.

Can anyone tell me how to do this transformation? Thanks a lot!
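The original rbind code isn't shown, so here is a hypothetical reconstruction, assuming each item's tariffs were collected into a plain named numeric vector. It shows why the AHS value ends up under PRF: rbind() recycles the shorter vector to fill the row.

```r
# Hypothetical setup: one numeric vector per item; item2 has no PRF value
item1 <- c(AHS = 3, BND = 3.8, MFN = 4, PRF = 2)
item2 <- c(AHS = 4, BND = 3.8, MFN = 4)

# rbind() takes the column names from item1 (length 4) and recycles item2
# to length 4 with a warning, so item2's first value (its AHS, 4)
# is reused in the PRF column
m <- rbind(item1, item2)
m
```

This is why an explicit item id (or group) column is needed before reshaping: position alone can't tell R which tariff is missing.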


asked Oct 03 '14 23:10

StatCC

2 Answers

You can use ave in base R (or a comparable function from a package) to create an "id" variable that numbers the items. Since some "PRF" values are missing, you probably also need cummax when creating the id, so that a tariff type with fewer occurrences is not assigned to an earlier item.
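To see why cummax matters, here is a small sketch on hypothetical data (DF2 is made up for illustration) in which the first item, rather than the last, is missing PRF. The per-type counter alone would give the lone PRF an id of 1; cummax carries the running item number forward so it lands on item 2.

```r
# Hypothetical data: item 1 has no PRF, item 2 does
DF2 <- data.frame(
  V1 = c("AHS", "BND", "MFN", "AHS", "BND", "MFN", "PRF"),
  V2 = c(4, 3.8, 4, 3, 3.8, 4, 2),
  stringsAsFactors = FALSE)

# Per-type running counter; ave() returns character here because V1 is
# character, so convert it explicitly
per_type <- as.integer(ave(DF2$V1, DF2$V1, FUN = seq_along))
per_type  # 1 1 1 2 2 2 1  (PRF would wrongly pair with item 1)

# cummax() keeps the largest id seen so far, so PRF gets item 2's id
id <- cummax(per_type)
id        # 1 1 1 2 2 2 2
```

This relies on the rows of each item being contiguous and in order, which is also what the sample data assume.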

Here are some alternatives, all using @G.Grothendieck's sample data. My vote would go for the "data.table" approach.

DF <- data.frame(
  V1 = c("AHS", "BND", "MFN", "PRF", "AHS", "BND", "MFN"), 
  V2 = c(3, 3.8, 4, 2, 4, 3.8, 4), 
  stringsAsFactors = FALSE)

Base R: `reshape`

Notorious for its syntax, and probably not recommended for working with 2M rows.

reshape(within(DF, {
  # number each item; ave() returns character here, so convert to integer
  id <- cummax(as.integer(ave(V1, V1, FUN = seq_along)))
}), direction = "wide", idvar = "id", timevar = "V1")
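reshape() names the new columns by pasting the value column and the time value together ("V2.AHS", "V2.BND", ...) and keeps the original row names. A minimal cleanup sketch, assuming you want plain tariff names as in the question:

```r
DF <- data.frame(
  V1 = c("AHS", "BND", "MFN", "PRF", "AHS", "BND", "MFN"),
  V2 = c(3, 3.8, 4, 2, 4, 3.8, 4),
  stringsAsFactors = FALSE)

long <- within(DF, {
  # ave() returns character here because V1 is character
  id <- cummax(as.integer(ave(V1, V1, FUN = seq_along)))
})
wide <- reshape(long, direction = "wide", idvar = "id", timevar = "V1")

# Drop the "V2." prefix that reshape() adds, and renumber the rows
names(wide) <- sub("^V2\\.", "", names(wide))
rownames(wide) <- NULL
wide
```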

Base R: `xtabs`

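Easier-to-remember syntax, but less flexible. It also returns a contingency table (class "xtabs"/"table") rather than a data.frame, so you'll need as.data.frame.matrix if you want a data.frame. Missing combinations are filled with 0, which may not be desirable, since a genuine tariff of 0 then looks the same as a missing one.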

xtabs(V2 ~ id + V1, within(DF, {
  id <- cummax(as.integer(ave(V1, V1, FUN = seq_along)))
}))
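If the 0 fill is a problem, one option (a sketch, not part of the original answer) is to build a second table of row counts and blank out only the cells that had no data, so a real 0 tariff is preserved:

```r
DF <- data.frame(
  V1 = c("AHS", "BND", "MFN", "PRF", "AHS", "BND", "MFN"),
  V2 = c(3, 3.8, 4, 2, 4, 3.8, 4),
  stringsAsFactors = FALSE)

d <- within(DF, {
  id <- cummax(as.integer(ave(V1, V1, FUN = seq_along)))
})

sums   <- xtabs(V2 ~ id + V1, d)  # summed tariffs; absent cells become 0
counts <- xtabs(~ id + V1, d)     # rows per cell; 0 means no data at all

wide <- as.data.frame.matrix(sums)
# Replace only the truly empty cells with NA (logical-matrix indexing)
wide[unclass(counts) == 0] <- NA
wide
```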

"data.table"

Fast. Its behavior is predictable because dcast.data.table follows the conventions long established by dcast from "reshape2".

library(data.table)
dcast.data.table(
  # per-type counter, then cummax() so each row gets its item number
  as.data.table(DF)[, id := sequence(.N), by = V1][, id := cummax(id)],
  id ~ V1, value.var = "V2")
#    id AHS BND MFN PRF
# 1:  1   3 3.8   4   2
# 2:  2   4 3.8   4  NA
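The same idea can be written with the dcast() generic and rowid(). This sketch assumes a data.table version that exports rowid(); rowid(V1) numbers the rows within each tariff type, which is the same counter that sequence(.N) by V1 builds above.

```r
library(data.table)

DF <- data.frame(
  V1 = c("AHS", "BND", "MFN", "PRF", "AHS", "BND", "MFN"),
  V2 = c(3, 3.8, 4, 2, 4, 3.8, 4),
  stringsAsFactors = FALSE)

dt <- as.data.table(DF)
# Per-type counter, then cummax() so a missing PRF doesn't shift ids
dt[, id := cummax(rowid(V1))]

# Cells with no matching row are filled with NA by default
wide <- dcast(dt, id ~ V1, value.var = "V2")
wide
```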


answered Oct 11 '22 10:10

A5C1D2H2I1M1N2O1R2T1

Create a grp variable that is 1 for the first group, 2 for the second, and so on (here by counting the "AHS" rows, which assumes every item starts with AHS). Then use tapply:

grp <- cumsum(DF$V1 == "AHS")
tapply(DF$V2, list(grp, DF$V1), sum)

giving:

  AHS BND MFN PRF
1   3 3.8   4   2
2   4 3.8   4  NA

We used this as the data:

DF <- data.frame(V1 = c("AHS", "BND", "MFN", "PRF", "AHS", "BND", "MFN"), 
                 V2 = c(3, 3.8, 4, 2, 4, 3.8, 4), stringsAsFactors = FALSE)
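tapply() returns a matrix with the group numbers as row names, and cells with no data come out as NA (unlike xtabs, which fills them with 0). A small sketch, assuming you want a data.frame with the item number as an ordinary column:

```r
DF <- data.frame(V1 = c("AHS", "BND", "MFN", "PRF", "AHS", "BND", "MFN"),
                 V2 = c(3, 3.8, 4, 2, 4, 3.8, 4), stringsAsFactors = FALSE)

# Assumes every item starts with an AHS row
grp <- cumsum(DF$V1 == "AHS")
m <- tapply(DF$V2, list(grp, DF$V1), sum)  # matrix; empty cells are NA

# Move the group numbers from the row names into a column
wide <- data.frame(item = as.integer(rownames(m)), m, row.names = NULL)
wide
```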

answered Oct 11 '22 10:10

G. Grothendieck

Related questions

Naming list items via loop in R
Knitr-error from Task Schedule Manager
How do I diagnose "unable to create socket"?
How to declare input into Rcpp functions?
ggplot2 stat_function with calculated argument for different data subset inside a facet_grid
Inserting Latex equations in R Markdown in Shiny mode
R regex: remove times from character string
Regex to extract US zip codes but not faux codes
Error when knitr has to download a zip file
R removing unicode linebreaks
Fill in missing year in ordered list of dates
Ranking NAs in a vector equally [r]
Selecting most recently changed reactive expressions in Shiny
R Generic solution to create 2*2 confusion matrix
subset a matrix, and get NA if index is not valid
xtable thead in html output
Aligning text annotation in ggplot2
R: Error in nrow[w] * ncol[w] : non-numeric argument to binary operator, while using neuralnet package
caret's helper functions for feature selection: caretSBF and caretFuncts
Disable Selectize Input Shiny
