Julia DataFrame: remove column by name

The DataFrame type in Julia allows you to access it as an array, so it is possible to remove columns via indexing:

df = df[:, [1:2; 4:end]] # remove column 3

The problem with this approach is that I often only know the column's name, not its column index in the table.

Is there a built-in way to remove a column by name?

Alternatively, is there a better way to do it than this?

colind = findfirst(==(colsymbol), propertynames(df))
df = df[:, [1:colind-1; colind+1:end]]

The above is failure-prone; there are a few edge cases (single column, first column, last column, symbol not in table, etc.).

Thank you

Mageek asked Jul 09 '14 23:07


1 Answer

You can use select!:

julia> df = DataFrame(A = 1:4, B = ["M", "F", "F", "M"], C = 2:5)
4x3 DataFrame
|-------|---|-----|---|
| Row # | A | B   | C |
| 1     | 1 | "M" | 2 |
| 2     | 2 | "F" | 3 |
| 3     | 3 | "F" | 4 |
| 4     | 4 | "M" | 5 |

julia> select!(df, Not(:B))
4x2 DataFrame
|-------|---|---|
| Row # | A | C |
| 1     | 1 | 2 |
| 2     | 2 | 3 |
| 3     | 3 | 4 |
| 4     | 4 | 5 |
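As a minimal sketch of how this behaves, assuming a recent DataFrames.jl release (1.x) in which `select`, `select!` and `Not` are all available: `select!` changes the table in place, while `select` returns a new one. The names `df2` and `df3` are only illustrative.

```julia
using DataFrames

df = DataFrame(A = 1:4, B = ["M", "F", "F", "M"], C = 2:5)

# select returns a new DataFrame and leaves df untouched
df2 = select(df, Not(:B))

# select! removes the column from df itself
select!(df, Not(:B))

# The column name can also be given as a string
df3 = select(df2, Not("C"))
```

Use `select` when the original table is still needed afterwards, and `select!` when it is not.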

For more general operations, remember that you can also index with an array of Symbols or a Bool array, so arbitrarily complicated selections like

julia> df[:, .![(x in [:B, :C]) for x in propertynames(df)]]
4x1 DataFrame
|-------|---|
| Row # | A |
| 1     | 1 |
| 2     | 2 |
| 3     | 3 |
| 4     | 4 |

julia> df[:, setdiff(propertynames(df), [:C])]
4x1 DataFrame
|-------|---|
| Row # | A |
| 1     | 1 |
| 2     | 2 |
| 3     | 3 |
| 4     | 4 |

will also work.
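The same kinds of selection can probably also be written with `select`. The following is a rough sketch, again assuming DataFrames.jl 1.x; the column name "Z" is a made-up example of a name that is not in the table.

```julia
using DataFrames

df = DataFrame(A = 1:4, B = ["M", "F", "F", "M"], C = 2:5)

# Drop several columns at once by passing a vector to Not
only_a = select(df, Not([:B, :C]))

# setdiff on the list of names simply ignores names that are absent,
# so the hypothetical "Z" causes no error here
no_c = select(df, setdiff(names(df), ["C", "Z"]))

# A regex inside Not drops every column whose name matches it
no_bc = select(df, Not(r"^[BC]$"))
```

The `setdiff` form is the one to reach for when the column might not exist, which covers the "symbol not in table" case from the question.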

DSM answered Oct 18 '22 12:10
