Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Transpose of Julia DataFrame

Let's create Julia DataFrame

df=convert(DataFrame, rand(10, 4))

It would look like this. I am trying to take the transpose of this dataFrame. "transpose" function appears to be not working for Julia Data Frame as shown below.

enter image description here

I have used Python Pandas dataframe package extensively in the past. In Python, it would be as easy as "df.T" Please let me know a way to Tranpose this dataframe.

like image 745
nyan314sn Avatar asked Jun 06 '16 23:06

nyan314sn


2 Answers

I had the same question and tried the strategy suggested in the comments to your question. The problem I encountered, however, is that converting to a Matrix won't work if your DataFrame has NA values. You have to change them to something else, then convert to a Matrix. I had a lot of problems converting back to NA when I wanted to get from a Matrix back to a DataFrame type.

Here's a way to do it using DataFrame's stack and unstack functions.

julia> using DataFrames

julia> df = DataFrame(A = 1:4, B = 5:8)
4×2 DataFrame
│ Row │ A     │ B     │
│     │ Int64 │ Int64 │
├─────┼───────┼───────┤
│ 1   │ 1     │ 5     │
│ 2   │ 2     │ 6     │
│ 3   │ 3     │ 7     │
│ 4   │ 4     │ 8     │

julia> colnames = names(df)
2-element Array{Symbol,1}:
 :A
 :B

julia> df[!, :id] = 1:size(df, 1)
1:4

julia> df
4×3 DataFrame
│ Row │ A     │ B     │ id    │
│     │ Int64 │ Int64 │ Int64 │
├─────┼───────┼───────┼───────┤
│ 1   │ 1     │ 5     │ 1     │
│ 2   │ 2     │ 6     │ 2     │
│ 3   │ 3     │ 7     │ 3     │
│ 4   │ 4     │ 8     │ 4     │

Adding the :id column is suggested by the DataFrame documentation as a way to help with unstacking.

Now stack the columns you want to transpose:

julia> dfl = stack(df, colnames)
8×3 DataFrame
│ Row │ variable │ value │ id    │
│     │ Symbol   │ Int64 │ Int64 │
├─────┼──────────┼───────┼───────┤
│ 1   │ A        │ 1     │ 1     │
│ 2   │ A        │ 2     │ 2     │
│ 3   │ A        │ 3     │ 3     │
│ 4   │ A        │ 4     │ 4     │
│ 5   │ B        │ 5     │ 1     │
│ 6   │ B        │ 6     │ 2     │
│ 7   │ B        │ 7     │ 3     │
│ 8   │ B        │ 8     │ 4     │

Then unstack, switching the id and variable names (this is why adding the :id column is necessary).

julia> dfnew = unstack(dfl, :variable, :id, :value)
2×5 DataFrame
│ Row │ variable │ 1      │ 2      │ 3      │ 4      │
│     │ Symbol   │ Int64⍰ │ Int64⍰ │ Int64⍰ │ Int64⍰ │
├─────┼──────────┼────────┼────────┼────────┼────────┤
│ 1   │ A        │ 1      │ 2      │ 3      │ 4      │
│ 2   │ B        │ 5      │ 6      │ 7      │ 8      │
like image 110
Stephen Nicar Avatar answered Oct 06 '22 23:10

Stephen Nicar


The problem with Stephen answer, is that order of columns is not preserved (try if you are not convinced with the following DataFrame

julia> df = DataFrame(A = 1:4, B = 5:8, AA = 15:18)
4×3 DataFrame
│ Row │ A     │ B     │ AA    │
│     │ Int64 │ Int64 │ Int64 │
├─────┼───────┼───────┼───────┤
│ 1   │ 1     │ 5     │ 15    │
│ 2   │ 2     │ 6     │ 16    │
│ 3   │ 3     │ 7     │ 17    │
│ 4   │ 4     │ 8     │ 18    │

but this DataFrame can be transposed (keeping order of columns/rows) using:

julia> DataFrame([[names(df)]; collect.(eachrow(df))], [:column; Symbol.(axes(df, 1))])
3×5 DataFrame
│ Row │ column │ 1     │ 2     │ 3     │ 4     │
│     │ Symbol │ Int64 │ Int64 │ Int64 │ Int64 │
├─────┼────────┼───────┼───────┼───────┼───────┤
│ 1   │ A      │ 1     │ 2     │ 3     │ 4     │
│ 2   │ B      │ 5     │ 6     │ 7     │ 8     │
│ 3   │ AA     │ 15    │ 16    │ 17    │ 18    │

Reference: https://github.com/JuliaData/DataFrames.jl/issues/2065#issuecomment-568937464

like image 23
scls Avatar answered Oct 07 '22 01:10

scls