
Multiple column selection on a Julia DataFrame

Suppose I have the following DataFrame:

10 rows x 26 columns named A to Z

I would like to select several groups of columns by name (not by index). For instance, I want columns A to D and P to Z in a new DataFrame named df2.

I tried something like this, but it doesn't work:

df2=df[:,[:A,:D ; :P,:Z]]

syntax: unexpected semicolon in array expression
top-level scope at Slicing.jl:1

Any idea how to do this? Thanks for any help.

Bebio asked Mar 03 '23 05:03


2 Answers

df2 = select(df, Between(:A,:D), Between(:P,:Z))

or

df2 = df[:, Cols(Between(:A,:D), Between(:P,:Z))]

If you are sure the columns are named only :A to :Z, you can also write:

df2 = select(df, Not(Between(:E, :O)))

or

df2 = df[:, Not(Between(:E, :O))]
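As a quick sanity check, here is a minimal sketch that builds a frame shaped like the one in the question and confirms that both approaches pick the same columns. The random contents and the variable names `by_between` and `by_not` are assumptions made only for illustration.

```julia
using DataFrames

# Hypothetical sample data: 10 rows x 26 columns named A to Z.
df = DataFrame(rand(10, 26), Symbol.('A':'Z'))

# Select A to D and P to Z by name.
by_between = select(df, Between(:A, :D), Between(:P, :Z))

# Equivalent here only because the columns are exactly A to Z.
by_not = df[:, Not(Between(:E, :O))]

# Both results should hold the same 15 columns in the same order.
same_columns = names(by_between) == names(by_not)
```

Note that `Not(Between(:E, :O))` keeps every column outside that range, so the two forms stop being equivalent as soon as the frame has columns other than A to Z.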

Finally, you can easily find the index of a column with the columnindex function, e.g.:

columnindex(df, :A)

and then use column numbers, if that is what you prefer.
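A minimal sketch of that index-based route, assuming the same hypothetical 10 x 26 frame with columns A to Z (the variable names are illustrative):

```julia
using DataFrames

# Hypothetical sample data: 10 rows x 26 columns named A to Z.
df = DataFrame(rand(10, 26), Symbol.('A':'Z'))

# Look up the positions of the boundary columns by name.
a, d = columnindex(df, :A), columnindex(df, :D)
p, z = columnindex(df, :P), columnindex(df, :Z)

# Build the integer positions and index with them.
idx = vcat(a:d, p:z)
df2 = df[:, idx]
```

This relies on the columns being stored in the expected order; if a name is missing, columnindex should return 0 rather than throw, so it is worth checking the results before building the ranges.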

Bogumił Kamiński answered Mar 05 '23 17:03


In Julia you can also build ranges of Chars, so when your columns are named with single letters, yet another option is:

df[:, Symbol.(vcat('A':'D', 'P':'Z'))]

Przemyslaw Szufel answered Mar 05 '23 16:03