The DataFrame type in Julia allows you to access it as an array, so it is possible to remove columns via indexing:
    df = df[:, [1:2; 4:end]]  # remove column 3
The problem with this approach is that I often only know the column's name, not its column index in the table.
Is there a built-in way to remove a column by name?
Alternatively, is there a better way to do it than this?
    colind = findfirst(==(colsymbol), propertynames(df))
    df = df[:, [1:colind-1; colind+1:end]]
The above is failure-prone; there are a few edge cases (single column, first column, last column, symbol not in table, etc.).
Thank you
You can use select!:
    julia> df = DataFrame(A = 1:4, B = ["M", "F", "F", "M"], C = 2:5)
    4x3 DataFrame
    |-------|---|-----|---|
    | Row # | A | B   | C |
    | 1     | 1 | "M" | 2 |
    | 2     | 2 | "F" | 3 |
    | 3     | 3 | "F" | 4 |
    | 4     | 4 | "M" | 5 |

    julia> select!(df, Not(:B))
    4x2 DataFrame
    |-------|---|---|
    | Row # | A | C |
    | 1     | 1 | 2 |
    | 2     | 2 | 3 |
    | 3     | 3 | 4 |
    | 4     | 4 | 5 |
For more general selections, remember that you can also index columns with an array of Symbols or a Bool array, so arbitrarily complicated selections like
    # names(df) returns Strings in current DataFrames.jl; propertynames(df) returns Symbols
    julia> df[:, [!(x in [:B, :C]) for x in propertynames(df)]]
    4x1 DataFrame
    |-------|---|
    | Row # | A |
    | 1     | 1 |
    | 2     | 2 |
    | 3     | 3 |
    | 4     | 4 |

    julia> df[:, setdiff(propertynames(df), [:C])]
    4x1 DataFrame
    |-------|---|
    | Row # | A |
    | 1     | 1 |
    | 2     | 2 |
    | 3     | 3 |
    | 4     | 4 |
will also work.
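If you would rather not modify the table in place, or you are not sure the column exists, something along these lines may help. This is only a sketch, assuming a recent DataFrames.jl where `select`, `Not` and `hasproperty` are available; `drop_if_present` is a hypothetical helper written for illustration, not a library function.

```julia
using DataFrames

df = DataFrame(A = 1:4, B = ["M", "F", "F", "M"], C = 2:5)

# Non-mutating variant: returns a new DataFrame without :B, df itself is unchanged.
df_no_b = select(df, Not(:B))

# Drop several columns at once by passing a vector of names to Not.
df_only_a = select(df, Not([:B, :C]))

# Hypothetical helper (not part of DataFrames.jl): drops only the requested
# columns that actually exist, so an unknown name is skipped instead of erroring.
function drop_if_present(df::AbstractDataFrame, cols::Vector{Symbol})
    present = filter(c -> hasproperty(df, c), cols)
    return select(df, Not(present))
end

df_safe = drop_if_present(df, [:B, :Z])  # :Z is not in the table and is ignored
```

Because `Not` works on names rather than positions, the first-column, last-column and single-column cases from the question need no special handling; only the "symbol not in table" case needs a check like the one above.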