I have a data frame in which some columns have missing values. When a missing value is found, I would like it to be replaced by the value from a second column. For example, in:
df = DataFrame(x = [0, missing, 2], y=[2, 4, 6])
I would like missing to be substituted with 4.
At the moment I am solving the problem with this solution:
for row in eachrow(df)
    if ismissing(row[:x])
        row[:x] = row[:y]
    end
end
But I wonder if a better solution that avoids for-loops can be found.
I tried replace(A, old_new::Pair...; [count::Integer]), but it seems that the pair accepts only scalars, and I had no success with broadcasting either.
Do you have any suggestions?
You can use coalesce:
julia> df = DataFrame(x = [0, missing, 2], y=[2, 4, 6])
3×2 DataFrame
 Row │ x        y
     │ Int64?   Int64
─────┼────────────────
   1 │       0      2
   2 │ missing      4
   3 │       2      6
julia> df.x .= coalesce.(df.x, df.y)
3-element Array{Union{Missing, Int64},1}:
0
4
2
julia> df
3×2 DataFrame
 Row │ x       y
     │ Int64?  Int64
─────┼───────────────
   1 │      0      2
   2 │      4      4
   3 │      2      6
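Note that x is still displayed as Int64? after the in-place assignment. The following is a minimal sketch, using plain Base vectors rather than a data frame (the names x, y and filled are illustrative only), of how the allocating and in-place forms of the broadcast differ in the element type they leave behind:

```julia
x = [0, missing, 2]   # Vector{Union{Missing, Int64}}
y = [2, 4, 6]         # Vector{Int64}

# Allocating broadcast: a new vector is created, and since no element
# is missing any more its element type is narrowed to Int64.
filled = coalesce.(x, y)

# In-place broadcast: the values are written into the existing vector,
# so x keeps its original element type Union{Missing, Int64}.
x .= coalesce.(x, y)

println(eltype(filled))
println(eltype(x))
```

If the narrower column type matters, assigning without the dot before the equals sign, as in df.x = coalesce.(df.x, df.y), should replace the column with the newly allocated vector instead of writing into the old one.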
or if you like piping-aware functions:
julia> df = DataFrame(x = [0, missing, 2], y=[2, 4, 6])
3×2 DataFrame
 Row │ x        y
     │ Int64?   Int64
─────┼────────────────
   1 │       0      2
   2 │ missing      4
   3 │       2      6
julia> transform!(df, [:x, :y] => ByRow(coalesce) => :x)
3×2 DataFrame
 Row │ x      y
     │ Int64  Int64
─────┼──────────────
   1 │     0      2
   2 │     4      4
   3 │     2      6
and this does the same, without requiring you to remember coalesce:
julia> df = DataFrame(x = [0, missing, 2], y=[2, 4, 6])
3Γ2 DataFrame
Row β x y
β Int64? Int64
ββββββΌββββββββββββββββ
1 β 0 2
2 β missing 4
3 β 2 6
julia> transform!(df, [:x, :y] => ByRow((x,y) -> ismissing(x) ? y : x) => :x)
3×2 DataFrame
 Row │ x      y
     │ Int64  Int64
─────┼──────────────
   1 │     0      2
   2 │     4      4
   3 │     2      6
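One reason to prefer coalesce over the hand-written ternary is that it accepts any number of arguments and returns the first one that is not missing. This is a small sketch with plain vectors, assuming a hypothetical third fallback z and a scalar default of -1 that are not part of the original question:

```julia
x = [0, missing, missing, missing]
y = [2, 4, missing, missing]
z = [7, 8, 9, missing]        # hypothetical extra fallback column

# For each position, take the first non-missing value among x, y, z,
# and fall back to the scalar -1 when all three are missing.
filled = coalesce.(x, y, z, -1)

println(filled)
```

The same idea should carry over to the data frame case by listing more source columns, for example [:x, :y, :z] => ByRow(coalesce) => :x, provided a column z exists.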