Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Reshape data with repeated columns

Tags:

r

reshape

I am trying to use reshape to restructure my data set.

Here is a subset of my data, which is a 16 X 198 data frame. Every odd column is a list of 16 years, and and even column has values a different country.

   Algeria.x Algeria.y Argentina.x Argentina.y
1       1985     37.48        1985       27.86
2       1986     36.26        1986       27.52
3       1987     35.04        1987       27.18
4       1988     33.82        1988       26.84
5       1989     32.60        1989       26.50
6       1990        NA        1990       25.50
7       1991        NA        1991       24.50
8       1992        NA        1992       23.50
9       1993        NA        1993       22.50
10      1994        NA        1994       21.50
11      1995        NA        1995       22.12
12      1996        NA        1996       22.74
13      1997        NA        1997       23.36
14      1998        NA        1998       23.98
15      1999        NA        1999       24.60
16      2000        NA        2000          NA

I would like to reshape the data so that it has three columns. The first for country name, the second for year, and the third for value. This would be a long matrix that is 1584 x 3.

like image 342
user1288578 Avatar asked Mar 23 '12 22:03

user1288578


4 Answers

I would use the stack function twice, after splitting the data into two data.frames: one for the years and one for the values:

# split the data into two data.frames
years.df  <- df[, seq(from = 1, to = ncol(df), by = 2)]
values.df <- df[, seq(from = 2, to = ncol(df), by = 2)]

# remove ".x" and ".y" at the end of the country names
names(years.df)  <- sub("\\.x$", "", names(years.df))
names(values.df) <- sub("\\.y$", "", names(values.df))

# stack each data.frame into a two-column data.frame
years.stack  <- stack(years.df)
values.stack <- stack(values.df)

# gather everything into a single data.frame
final.df <- data.frame(country = years.stack$ind,
                       year    = years.stack$value,
                       value   = values.stack$value)
final.df
#      country year value
# 1    Algeria 1985 37.48
# 2    Algeria 1986 36.26
# 3    Algeria 1987 35.04
# 4    Algeria 1988 33.82
# 5    Algeria 1989 32.60
# 6    Algeria 1990    NA
# 7    Algeria 1991    NA
# 8    Algeria 1992    NA
# 9    Algeria 1993    NA
# 10   Algeria 1994    NA
# 11   Algeria 1995    NA
# 12   Algeria 1996    NA
# 13   Algeria 1997    NA
# 14   Algeria 1998    NA
# 15   Algeria 1999    NA
# 16   Algeria 2000    NA
# 17 Argentina 1985 27.86
# 18 Argentina 1986 27.52
# 19 Argentina 1987 27.18
# 20 Argentina 1988 26.84
# 21 Argentina 1989 26.50
# 22 Argentina 1990 25.50
# 23 Argentina 1991 24.50
# 24 Argentina 1992 23.50
# 25 Argentina 1993 22.50
# 26 Argentina 1994 21.50
# 27 Argentina 1995 22.12
# 28 Argentina 1996 22.74
# 29 Argentina 1997 23.36
# 30 Argentina 1998 23.98
# 31 Argentina 1999 24.60
# 32 Argentina 2000    NA
like image 130
flodel Avatar answered Oct 21 '22 07:10

flodel


One liner using the base function reshape.

reshape(dat, varying = 1:4, direction = 'long')
like image 41
Ramnath Avatar answered Oct 21 '22 06:10

Ramnath


With such a small data frame I think I'd piece this together simply by tearing apart the vectors of the original:

#read in your data
dat <- read.table(text="   Algeria.x Algeria.y Argentina.x Argentina.y
1       1985     37.48        1985       27.86
2       1986     36.26        1986       27.52
3       1987     35.04        1987       27.18
4       1988     33.82        1988       26.84
5       1989     32.60        1989       26.50
6       1990        NA        1990       25.50
7       1991        NA        1991       24.50
8       1992        NA        1992       23.50
9       1993        NA        1993       22.50
10      1994        NA        1994       21.50
11      1995        NA        1995       22.12
12      1996        NA        1996       22.74
13      1997        NA        1997       23.36
14      1998        NA        1998       23.98
15      1999        NA        1999       24.60
16      2000        NA        2000          NA")

The solution:

dat2 <- data.frame(  #tear apart original vectors & piece 'em together
    country_name = rep(c("Algeria", "Argentina"), each = nrow(dat)),
    year = unlist(dat[, c(1, 3)]), 
    value = unlist(dat[, c(2, 4)])
)

rownames(dat2) <- 1:nrow(dat2) #give proper row names
dat2
like image 28
Tyler Rinker Avatar answered Oct 21 '22 08:10

Tyler Rinker


Assuming your dataset is called "df": Original answer (using the "reshape" package):

library(reshape)
# make a new column called year, and select only even columns
df = data.frame(year=1985:2000, 
                df[, seq(from=2, to=length(names(df)), by=2)])
# optional--for removing ".y" from country name
names(df) = sub("\\.y$", "", names(df))
# "melt" your dataset
m.df2 = melt(df, id=1)

Update: A more streamlined approach

You can make use of the fact that all countries have the same year value, thus making any of the ".x" columns a potential id.var for melting your data.frame.

A little bit of cleanup is still required.

library(reshape2)
names(df) <- gsub(".y", "", names(df))
df_long <- setNames(melt(df[, c("Algeria.x", grep(".x", names(df), 
                                                  invert=TRUE, value=TRUE))],
                         id.vars="Algeria.x"), c("Year", "Country", "Value"))
list(head(df_long), tail(df_long))
# [[1]]
#   Year Country Value
# 1 1985 Algeria 37.48
# 2 1986 Algeria 36.26
# 3 1987 Algeria 35.04
# 4 1988 Algeria 33.82
# 5 1989 Algeria 32.60
# 6 1990 Algeria    NA
# 
# [[2]]
#    Year   Country Value
# 27 1995 Argentina 22.12
# 28 1996 Argentina 22.74
# 29 1997 Argentina 23.36
# 30 1998 Argentina 23.98
# 31 1999 Argentina 24.60
# 32 2000 Argentina    NA
like image 1
A5C1D2H2I1M1N2O1R2T1 Avatar answered Oct 21 '22 07:10

A5C1D2H2I1M1N2O1R2T1