Replace missing values with values from another column in a Julia DataFrame

I have a data frame in which some columns have missing values. Whenever a value is missing, I would like to pick a replacement from a second column. For example, in:

df = DataFrame(x = [0, missing, 2], y=[2, 4, 6])

I would like missing to be substituted with 4.

At the moment I am solving the problem with this solution:

for row in eachrow(df)
    if ismissing(row[:x])
        row[:x] = row[:y]
    end
end

But I wonder whether there is a better solution that avoids for loops 🤔.

I tried replace(A, old_new::Pair...; [count::Integer]), but the pairs seem to accept only scalars, and I could not get it to work with broadcasting either.

Do you have any suggestions?

Asked Nov 25 '20 at 17:11 by Michele Garau


1 Answer

You can use coalesce:

julia> df = DataFrame(x = [0, missing, 2], y=[2, 4, 6])
3×2 DataFrame
 Row │ x        y
     │ Int64?   Int64
─────┼────────────────
   1 │       0      2
   2 │ missing      4
   3 │       2      6

julia> df.x .= coalesce.(df.x, df.y)
3-element Array{Union{Missing, Int64},1}:
 0
 4
 2

julia> df
3×2 DataFrame
 Row │ x       y
     │ Int64?  Int64
─────┼───────────────
   1 │      0      2
   2 │      4      4
   3 │      2      6
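
This works because coalesce returns its first argument that is not missing, and the dot broadcasts it over the two columns element by element. A minimal sketch of the same idea on plain vectors, with no DataFrames dependency (the names x, y and filled are only illustrative):

```julia
# Two plain vectors standing in for the columns df.x and df.y
x = [0, missing, 2]
y = [2, 4, 6]

# coalesce(a, b) returns a unless a is missing, in which case it returns b;
# the dot applies it to each pair (x[i], y[i])
filled = coalesce.(x, y)

println(filled)
```

Because a new vector is allocated here rather than written into x, its element type can narrow once no missing values remain.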

or if you like piping-aware functions:

julia> df = DataFrame(x = [0, missing, 2], y=[2, 4, 6])
3×2 DataFrame
 Row │ x        y
     │ Int64?   Int64
─────┼────────────────
   1 │       0      2
   2 │ missing      4
   3 │       2      6

julia> transform!(df, [:x, :y] => ByRow(coalesce) => :x)
3×2 DataFrame
 Row │ x      y
     │ Int64  Int64
─────┼──────────────
   1 │     0      2
   2 │     4      4
   3 │     2      6

and this does the same thing without requiring you to remember coalesce:

julia> df = DataFrame(x = [0, missing, 2], y=[2, 4, 6])
3×2 DataFrame
 Row │ x        y
     │ Int64?   Int64
─────┼────────────────
   1 │       0      2
   2 │ missing      4
   3 │       2      6

julia> transform!(df, [:x, :y] => ByRow((x,y) -> ismissing(x) ? y : x) => :x)
3×2 DataFrame
 Row │ x      y
     │ Int64  Int64
─────┼──────────────
   1 │     0      2
   2 │     4      4
   3 │     2      6

Answered Dec 31 '22 at 19:12 by Bogumił Kamiński