
Conditional Selection on Julia DataFrames [duplicate]

#Let the CSV contain the two columns "Age" and "Gender" where:

  Age = [30, 24, 55, 61, 70, 21]

  Gender = ["Male", "Female", "Male", "Male", "Male", "Female"]

#I want it to show me all the values of Age (and how many there are) that correspond to Gender = "Male", and the same for "Female".

  using CSV, DataFrames

#So this is what I tried

julia> df = CSV.read(raw"Clocation)", DataFrame)
julia> df.Age
6-element Vector{Int64}:
30
24
55
61
70
21

#Adjusted for the example

julia> df.Age, Gender
ERROR: UndefVarError: Gender not defined
Stacktrace:
 [1] top-level scope
   @ REPL[26]:1

#What I want is something like 'df.Age, Gender=Male', but this doesn't work either and I'm really stuck :(

Source: https://testdataframesjl.readthedocs.io/en/readthedocs/subsets/

#Any advice? Thank you in advance!

#Edit: So then I tried

julia> combine(groupby(df, :Age), :Gender=>"Male")
200×2 DataFrame
 Row │ Age    Male
     │ Int64  String7
─────┼────────────────
   1 │    18  Male
   2 │    18  Male
   3 │    18  Male
   4 │    18  Female
   5 │    19  Male
   6 │    19  Male
   7 │    19  Male
   8 │    19  Female
   9 │    19  Male
  10 │    19  Female
  11 │    19  Male
  12 │    19  Male
  13 │    20  Female
  14 │    20  Male
  15 │    20  Female
  16 │    20  Male
  17 │    20  Male
  18 │    21  Male
  19 │    21  Female
  20 │    21  Female
  21 │    21  Female
  22 │    21  Female
  23 │    22  Female
  24 │    22  Male
  25 │    22  Female
  26 │    23  Female
  27 │    23  Female
  28 │    23  Female
  ⋮  │   ⋮       ⋮
 173 │    57  Male
 174 │    57  Female
 175 │    58  Female
 176 │    58  Male
 177 │    59  Male
 178 │    59  Male
 179 │    59  Male
 180 │    59  Male
 181 │    60  Male
 182 │    60  Female
 183 │    60  Female
 184 │    63  Male
 185 │    63  Female
 186 │    64  Male
 187 │    65  Female
 188 │    65  Male
 189 │    66  Female
 190 │    66  Male
 191 │    67  Male
 192 │    67  Female
 193 │    67  Male
 194 │    67  Male
 195 │    68  Female
 196 │    68  Female
 197 │    68  Male
 198 │    69  Male
 199 │    70  Male
 200 │    70  Male
      144 rows omitted

#And now I'm just confused.

Source: https://discourse.julialang.org/t/how-to-count-the-number-of-categories-present-in-a-column-of-a-dataframe/33244/3

Jeremy S asked Jun 25 '26 18:06


2 Answers

julia> df.Age, Gender

Where did you see this syntax?

Is this what you want?

julia> df = DataFrame(Age = [30, 24, 55, 61, 70, 21], Gender = ["Male", "Female", "Male", "Male", "Male", "Female"]);

julia> df[df.Gender .== "Male", :]
4×2 DataFrame
 Row │ Age    Gender 
     │ Int64  String 
─────┼───────────────
   1 │    30  Male
   2 │    55  Male
   3 │    61  Male
   4 │    70  Male

julia> df.Age[df.Gender .== "Male"]
4-element Vector{Int64}:
 30
 55
 61
 70
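
The question also asks how many values match. A minimal sketch, assuming the same `df` as above (the names `mask`, `male_ages`, `n_male`, `female_ages` and `n_female` are introduced here only for illustration): the Boolean mask used for selection can be reused for counting.

```julia
using DataFrames

df = DataFrame(Age = [30, 24, 55, 61, 70, 21],
               Gender = ["Male", "Female", "Male", "Male", "Male", "Female"])

# Boolean mask: true where the row's Gender is "Male"
mask = df.Gender .== "Male"

male_ages = df.Age[mask]   # ages of the matching rows
n_male = count(mask)       # number of matching rows

# Negating the mask selects the remaining rows ("Female" in this data)
female_ages = df.Age[.!mask]
n_female = count(.!mask)
```

Negating the mask only works as a shortcut here because the column holds exactly two distinct values; with more categories, a separate comparison per category would be needed.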
jling answered Jun 27 '26 08:06


Apart from the answer by jling, which is the simplest one, here are some alternatives.

Using groupby, you can partition the rows of the data frame by the values of the grouping columns:

julia> gdf = groupby(df, :Gender)
GroupedDataFrame with 2 groups based on key: Gender
First Group (4 rows): Gender = "Male"
 Row │ Age    Gender
     │ Int64  String
─────┼───────────────
   1 │    30  Male
   2 │    55  Male
   3 │    61  Male
   4 │    70  Male
⋮
Last Group (2 rows): Gender = "Female"
 Row │ Age    Gender
     │ Int64  String
─────┼───────────────
   1 │    24  Female
   2 │    21  Female

julia> gdf[("Male",)]
4×2 SubDataFrame
 Row │ Age    Gender
     │ Int64  String
─────┼───────────────
   1 │    30  Male
   2 │    55  Male
   3 │    61  Male
   4 │    70  Male

julia> gdf[("Female",)]
2×2 SubDataFrame
 Row │ Age    Gender
     │ Int64  String
─────┼───────────────
   1 │    24  Female
   2 │    21  Female
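
To get the ages and their count for every gender in one call, `combine` can be applied to the grouped data frame. This is a sketch, assuming the same `df` as above; the variable name `per_gender` and the column names `Count` and `Ages` are chosen here for illustration. It also suggests why the attempt in the question did not filter anything: in `:Gender => "Male"` the right-hand side is treated as a new column name, so the `Gender` column was only renamed to `Male`.

```julia
using DataFrames

df = DataFrame(Age = [30, 24, 55, 61, 70, 21],
               Gender = ["Male", "Female", "Male", "Male", "Male", "Female"])

# One output row per gender
per_gender = combine(groupby(df, :Gender),
                     nrow => :Count,        # number of rows in each group
                     :Age => Ref => :Ages)  # Ref keeps each group's ages together in a single cell
```

Without `Ref`, the vector of ages would be expanded back into one row per value instead of one row per group.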

If you want only one subset, you can also use filter or subset (they do a similar thing but with different syntax):

julia> filter(:Gender => ==("Male"), df)
4×2 DataFrame
 Row │ Age    Gender
     │ Int64  String
─────┼───────────────
   1 │    30  Male
   2 │    55  Male
   3 │    61  Male
   4 │    70  Male

julia> subset(df, :Gender => ByRow(==("Male")))
4×2 DataFrame
 Row │ Age    Gender
     │ Int64  String
─────┼───────────────
   1 │    30  Male
   2 │    55  Male
   3 │    61  Male
   4 │    70  Male

Finally, you can consider using DataFramesMeta.jl, whose syntax is probably a bit easier to read:

julia> using DataFramesMeta

julia> @subset(df, :Gender .== "Male")
4×2 DataFrame
 Row │ Age    Gender
     │ Int64  String
─────┼───────────────
   1 │    30  Male
   2 │    55  Male
   3 │    61  Male
   4 │    70  Male

julia> @rsubset(df, :Gender == "Male") # "r" prefix stands for "row" so you do not need to broadcast the operation
4×2 DataFrame
 Row │ Age    Gender
     │ Int64  String
─────┼───────────────
   1 │    30  Male
   2 │    55  Male
   3 │    61  Male
   4 │    70  Male
Bogumił Kamiński answered Jun 27 '26 07:06