I have a dataframe. I want to multiply column "b" by a "log" and then replace NaN by 0s.
How can I do that in Julia? I have been reading the DataFrames.jl documentation, but I do not understand it.
using DataFrames
df = DataFrame(a = repeat([1, 2, 3, 4], outer=[2]),
               b = repeat([2, 1], outer=[4]),
               c = randn(8))

I want to multiply column "b" by a "log"
Assuming you mean you want to apply the (natural) log to each element in column :b, you can do the following:
log.(df.b)
log(x) applies the (natural) log to an individual element x. By putting a dot after the log, you are broadcasting the log function across each element.
If you wanted to replace column b, do the following:
df.b = log.(df.b)
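As a minimal sketch of the broadcasting behaviour described above, here is the same idea on a plain vector. The vector `b` is only a stand-in for the column `df.b` (an assumption made so the snippet runs without any packages), and `logged` is a name chosen for illustration.

```julia
# Plain vector standing in for a DataFrame column such as df.b (assumption).
b = [2, 1, 2, 1]

# log is defined for individual numbers; the dot form applies it to each
# element and returns a new vector of floating-point results.
logged = log.(b)

println(logged)
```

Note that the integer input produces a `Float64` result, which is why assigning `log.(df.b)` back to `df.b` changes the column's element type.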
and then replace NaN by 0s
I'm assuming you want to handle the case where you have a DomainError (i.e. taking the log of a negative number). Your best bet is to handle the error before it arises:
map( x -> x <= 0 ? 0.0 : log(x), df.b)
This maps the anonymous function x -> x <= 0 ? 0.0 : log(x) across each element of column b in your DataFrame. This function tests if x is less than or equal to zero - if yes then return 0.0 else return log(x). This "one-line if statement" is called a ternary operator.
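If the column can also contain literal NaN values, the guard above does not catch them, because `NaN <= 0` is false and `log(NaN)` is NaN. The following is a rough sketch of one way to cover both cases, using a plain vector in place of the column; `safe_log`, `vals`, `out` and `cleaned` are hypothetical names, not part of DataFrames.jl.

```julia
# Hypothetical helper: 0.0 for non-positive input, log(x) otherwise.
safe_log(x) = x <= 0 ? 0.0 : log(x)

# Stand-in for a column (assumption); includes zero, a negative and a NaN.
vals = [2.0, 0.0, -1.5, NaN]

# The guard avoids the DomainError for -1.5 and the -Inf for 0.0,
# but NaN passes through the comparison and stays NaN.
out = map(safe_log, vals)

# Second pass: replace any remaining NaN with 0.0.
cleaned = map(x -> isnan(x) ? 0.0 : x, out)

println(cleaned)
```

With a DataFrame, the same two steps could be applied to `df.b` and the result assigned back to the column.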
Use a generator:
( v <= 0. ? 0. : log(v) for v in df.c )
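A small sketch of what the generator does on its own, again with a plain vector standing in for `df.c` (an assumption so the snippet needs no packages). The names `c`, `gen` and `result` are illustrative only.

```julia
# Stand-in for the column df.c (assumption).
c = [0.5, -1.0, 2.0]

# A generator is lazy: this line builds the recipe but computes nothing yet.
gen = (v <= 0.0 ? 0.0 : log(v) for v in c)

# collect runs the generator and materialises the values into a vector.
result = collect(gen)

println(result)
```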
If you want to add a new column:
df[!, :d] .= ( v <= 0. ? 0. : log(v) for v in df.c)
In this benchmark it is marginally faster than using map (the tests assume that df.d already exists):
julia> using BenchmarkTools
julia> @btime $df[!, :d] .= ( v <= 0.0 ? 0.0 : log(v) for v in $df.c)
1.440 μs (14 allocations: 720 bytes)
julia> @btime $df[!, :d] .= map( x -> x <= 0.0 ? 0.0 : log(x), $df.c);
1.570 μs (14 allocations: 720 bytes)