
How to select only a subset of dataframe columns in julia

Tags:

julia

I have a DataFrame with several columns, say column1, column2, ..., column100. How do I select only a subset of the columns? For example, (not column1) should return all columns column2 ... column100.

data[[colnames(data) .!= "column1"]])

doesn't seem to work.

I don't want to mutate the DataFrame. I just want to select all the columns that don't have a particular column name, as in my example.

710

asked Sep 14 '15 06:09

Vishnu

3 Answers

EDIT 2/7/2021: as people still seem to find this on Google, I'll note at the top that current DataFrames (1.0+) allows both Not() selection, supported by InvertedIndices.jl, and string types as column names, including regex selection with the r"" string macro. Examples:

julia> df = DataFrame(a1 = rand(2), a2 = rand(2), x1 = rand(2), x2 = rand(2), y = rand(["a", "b"], 2))
2×5 DataFrame
 Row │ a1        a2        x1        x2        y      
     │ Float64   Float64   Float64   Float64   String 
─────┼────────────────────────────────────────────────
   1 │ 0.784704  0.963761  0.124937  0.37532   a
   2 │ 0.814647  0.986194  0.236149  0.468216  a

julia> df[!, r"2"]
2×2 DataFrame
 Row │ a2        x2       
     │ Float64   Float64  
─────┼────────────────────
   1 │ 0.963761  0.37532
   2 │ 0.986194  0.468216

julia> df[!, Not(r"2")]
2×3 DataFrame
 Row │ a1        x1        y      
     │ Float64   Float64   String 
─────┼────────────────────────────
   1 │ 0.784704  0.124937  a
   2 │ 0.814647  0.236149  a
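Applied to the question's case, the same Not selector can take a single column name or a vector of names. The following is a minimal sketch, assuming a recent DataFrames release (1.0+); the three-column `data` table is a hypothetical stand-in for the 100-column one in the question.

```julia
using DataFrames  # Not (from InvertedIndices.jl) is re-exported by DataFrames

# Hypothetical stand-in for the question's `data`
data = DataFrame(column1 = [1, 2, 3], column2 = [4, 5, 6], column3 = [7, 8, 9])

# Drop one column, naming it by Symbol or by String
copied = data[:, Not(:column1)]    # `:` as row selector copies the columns
shared = data[!, Not("column1")]   # `!` builds a new DataFrame that reuses the column vectors

# Drop several columns by passing a vector of names to Not
only_third = data[:, Not(["column1", "column2"])]

names(copied)   # column2 and column3
names(data)     # still all three columns; `data` itself is unchanged
```

Neither form removes anything from `data`. The difference is only whether the result owns fresh copies of the column vectors (`:`) or shares them with `data` (`!`), which matters if the result is later modified in place.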

Finally, the names function has a method which takes a type as its second argument, which is handy for subsetting DataFrames by the element type of each column:


julia> df[!, names(df, String)]
2×1 DataFrame
 Row │ y      
     │ String 
─────┼────────
   1 │ a
   2 │ a
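As a rough illustration of the type-based method, the sketch below uses a small hypothetical table (the column names `a1`, `a2`, `n`, `y` are made up for this example). It assumes DataFrames 1.0+, where `names(df, T)` returns the columns whose element type is a subtype of `T`.

```julia
using DataFrames

# Hypothetical table mixing Float64, Int and String columns
df = DataFrame(a1 = [1.0, 2.0], a2 = [3.0, 4.0], n = [1, 2], y = ["a", "b"])

float_cols = names(df, Float64)   # columns with element type Float64
real_cols  = names(df, Real)      # an abstract type also matches the Int column

# The result is an ordinary vector of names, so it can be wrapped in Not
non_string = df[:, Not(names(df, String))]
```

Passing an abstract type such as `Real` is one way to pick up all numeric columns at once without listing them.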

In addition to indexing with square brackets, there's also the select function (and its mutating equivalent select!), which basically takes the same input as the column index in []-indexing as its second argument:

julia> select(df, Not(r"a"))
2×3 DataFrame
 Row │ x1        x2        y      
     │ Float64   Float64   String 
─────┼────────────────────────────
   1 │ 0.124937  0.37532   a
   2 │ 0.236149  0.468216  a
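Since the question asks specifically for a non-mutating selection, here is a small sketch contrasting select with select!. The table and the `scratch` variable are hypothetical and only serve to show which object changes.

```julia
using DataFrames

df = DataFrame(a1 = [1, 2], a2 = [3, 4], x1 = [5, 6])

# select returns a new DataFrame; df keeps all of its columns
trimmed = select(df, Not(r"a"))

# select! drops the columns from its argument in place,
# so it is applied to a copy here to leave df alone
scratch = copy(df)
select!(scratch, Not(r"a"))

names(df)       # a1, a2, x1
names(trimmed)  # x1
names(scratch)  # x1
```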

Original answer below

As @Reza Afzalan said, what you're trying to do returns an array of strings, while column names in DataFrames are symbols.

Given that Julia doesn't have conditional list comprehensions, the nicest thing you could do, I guess, would be

data[:, filter(x -> x != :column1, names(data))]

This will give you the data set with column1 removed (without mutating it). You could extend this to checking against lists of names as well:

data[:, filter(x -> !(x in [:column1,:column2]), names(data))]

UPDATE: As Ian says below, for this use case the Not syntax is now the best way to go.

More generally, conditional list comprehensions are also available by now, so you could do:

data[:, [x for x in names(data) if x != :column1]]
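One caveat for readers on a recent DataFrames release: names now returns a vector of Strings rather than Symbols, so a comparison such as `x != :column1` is true for every name and nothing gets filtered out. A minimal sketch of the same filter and comprehension patterns adapted to that behaviour, on a hypothetical three-column `data`:

```julia
using DataFrames

data = DataFrame(column1 = [1, 2], column2 = [3, 4], column3 = [5, 6])

# names returns Strings, so compare against a String
keep_filter = filter(x -> x != "column1", names(data))

# propertynames returns Symbols, if Symbol comparison is preferred
keep_syms = [x for x in propertynames(data) if x != :column1]

# Membership test against several names
keep_multi = filter(x -> !(x in ["column1", "column2"]), names(data))

subset = data[:, keep_filter]   # new DataFrame without column1
```

Either name vector can be used as the column index; the choice between Strings and Symbols is mostly a matter of taste.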

137

answered Oct 21 '22 23:10

Nils Gudat

As of DataFrames 0.19, it seems that you can now do

select(data, Not(:column1))

to select all but the column column1. To select all except for multiple columns, use an array in the inverted index:

select(data, Not([:column1, :column2]))
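To check this at the scale described in the question, the sketch below builds a hypothetical 100-column table named column1 through column100 (filled with random numbers purely as placeholder data) and confirms the column counts after each selection.

```julia
using DataFrames

# Hypothetical 100-column table: column1, column2, ..., column100
data = DataFrame([Symbol("column", i) => rand(3) for i in 1:100])

rest  = select(data, Not(:column1))               # column2 ... column100
fewer = select(data, Not([:column1, :column2]))   # column3 ... column100

ncol(data), ncol(rest), ncol(fewer)   # 100, 99 and 98 columns respectively
```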

answered Oct 21 '22 23:10

Ian Fiske

To select several columns by name:

 df[[:col1, :col2]]

or, for other versions of the DataFrames library, I use:

select(df, [:col1, :col2])
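In recent DataFrames releases, indexing a data frame requires both a row selector and a column selector, so the bracket form needs a leading `:` (or `!`). A short sketch on a hypothetical three-column `df`, showing that it gives the same result as select:

```julia
using DataFrames

df = DataFrame(col1 = [1, 2], col2 = [3, 4], col3 = [5, 6])

by_index  = df[:, [:col1, :col2]]        # all rows, two named columns (copied)
by_select = select(df, [:col1, :col2])   # same selection via select

by_index == by_select   # both hold col1 and col2 with identical values
```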

answered Oct 21 '22 23:10

Timothée HENRY

