 

Change values in a data set in Julia

Tags: r, julia

I am converting a function in R to Julia, but I do not know how to convert the following R code:

x[x==0]=4

Basically, x contains rows of numbers, and whenever there is a 0 I need to change it to a 4. The data set x comes from a binomial distribution. How can I write the above code in Julia?

asked Jun 21 '18 16:06 by Ellie


3 Answers

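Use .== (the broadcasted ==) to build the mask, i.e.:

  • Dot Syntax for Vectorizing Functions (Julia manual)

The assignments below use the older form x[mask] = 4, which Julia 0.7 deprecates; on Julia 1.0 and later write x[mask] .= 4 instead.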

With vector:

julia> x = round.(Int, rand(5))  # notice how round is also broadcasted here
5-element Array{Int64,1}:
 0
 0
 1
 0
 1

julia> x .== 0
5-element BitArray{1}:
  true
  true
 false
  true
 false

julia> x[x .== 0] = 4
4

julia> x
5-element Array{Int64,1}:
 4
 4
 1
 4
 1
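On Julia 1.0 and later the same vector example needs broadcast assignment (.=) on the masked positions. A minimal sketch, assuming a fixed vector in place of the random 0/1 data so the result is reproducible:

```julia
# A fixed vector standing in for the random 0/1 data above (assumption)
x = [0, 0, 1, 0, 1]

mask = x .== 0   # BitVector: true wherever x is 0
x[mask] .= 4     # broadcast-assign 4 into every masked position

println(x)       # prints [4, 4, 1, 4, 1]
```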

With matrix:

julia> y = round.(Int, rand(5, 5))
5×5 Array{Int64,2}:
 0  1  1  0  0
 1  0  1  1  1
 0  0  0  0  1
 1  1  0  0  0
 0  1  0  1  1

julia> y[y .== 0] = 4
4

julia> y
5×5 Array{Int64,2}:
 4  1  1  4  4
 1  4  1  1  1
 4  4  4  4  1
 1  1  4  4  4
 4  1  4  1  1
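Every transcript in this answer overwrites the original array. If the original data must be kept, one alternative (our choice, not part of the answer) is to broadcast ifelse, which builds a new array and leaves the input untouched. A sketch, assuming a fixed 3×3 matrix in place of the random one:

```julia
# A fixed 0/1 matrix standing in for the random data above (assumption)
y = [0 1 1;
     1 0 1;
     0 0 1]

# ifelse is applied element-wise thanks to the dot; the result is a new matrix
z = ifelse.(y .== 0, 4, y)

println(z == [4 1 1; 1 4 1; 4 4 1])  # prints true
println(count(iszero, y))            # prints 4: y still holds its zeros
```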

With dataframe:

julia> using DataFrames

julia> df = DataFrame(x = round.(Int, rand(5)), y = round.(Int, rand(5)))
5×2 DataFrames.DataFrame
│ Row │ x │ y │
├─────┼───┼───┤
│ 1   │ 0 │ 0 │
│ 2   │ 0 │ 1 │
│ 3   │ 0 │ 0 │
│ 4   │ 0 │ 1 │
│ 5   │ 1 │ 0 │

julia> df[:x][df[:x] .== 0] = 4   # older DataFrames syntax; current versions use df.x[df.x .== 0] .= 4
4

julia> df
5×2 DataFrames.DataFrame
│ Row │ x │ y │
├─────┼───┼───┤
│ 1   │ 4 │ 0 │
│ 2   │ 4 │ 1 │
│ 3   │ 4 │ 0 │
│ 4   │ 4 │ 1 │
│ 5   │ 1 │ 0 │
answered Nov 14 '22 23:11 by HarmonicaMuse


The simplest solution is to use the replace! function:

replace!(x, 0=>4)

Use replace(x, 0=>4) (without the !) to do the same thing but return a modified copy, leaving x unchanged.

Note that these functions are only available from Julia 0.7 onward.
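A short sketch contrasting the two forms on a small example vector (the data is illustrative). It also shows the count keyword of replace, which caps how many matches are replaced, taken in iteration order:

```julia
x = [0, 3, 0, 1]

# replace returns a modified copy and leaves x alone
y = replace(x, 0 => 4)
println(y)                               # prints [4, 3, 4, 1]
println(x)                               # prints [0, 3, 0, 1]

# count limits the number of replacements (earliest matches first)
println(replace(x, 0 => 4; count = 1))   # prints [4, 3, 0, 1]

# replace! mutates x in place (and also returns it)
replace!(x, 0 => 4)
println(x)                               # prints [4, 3, 4, 1]
```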

answered Nov 15 '22 01:11 by DNF


Two small points that are too long for a comment:

In Julia 0.7 and later you should write x[x .== 0] .= 4 (with a second dot in the assignment as well).

In general it is faster to use foreach or a plain loop than to allocate a temporary Boolean mask with x .== 0, for example:

julia> using BenchmarkTools

julia> x = rand(1:4, 10^8);

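julia> function f1(x)
           # mask-based version; note that as written it replaces 4 with 0
           # (the reverse of f2), but it allocates the same length-n mask
           x[x .== 4] .= 0
       end
f1 (generic function with 1 method)

julia> function f2(x)
           # visit each index once; && short-circuits, so x[i] is written
           # only when it equals 0, and no mask is allocated
           foreach(i -> x[i] == 0 && (x[i] = 4), eachindex(x))
       end
f2 (generic function with 1 method)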

julia> @benchmark f1($x)
BenchmarkTools.Trial:
  memory estimate:  11.93 MiB
  allocs estimate:  10
  --------------
  minimum time:     137.889 ms (0.00% GC)
  median time:      142.335 ms (0.00% GC)
  mean time:        143.145 ms (1.08% GC)
  maximum time:     160.591 ms (0.00% GC)
  --------------
  samples:          35
  evals/sample:     1

julia> @benchmark f2($x)
BenchmarkTools.Trial:
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     86.904 ms (0.00% GC)
  median time:      87.916 ms (0.00% GC)
  mean time:        88.504 ms (0.00% GC)
  maximum time:     91.289 ms (0.00% GC)
  --------------
  samples:          57
  evals/sample:     1
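The answer names a plain loop as the other mask-free option besides foreach. A minimal sketch of such a loop, written as a hypothetical helper replace_zeros! (the name and the val argument are ours, not part of Base); like f2 it allocates nothing beyond the array it modifies. Its speed relative to f2 has not been measured here:

```julia
# Hypothetical helper: overwrite every zero in x with val, in place, no mask
function replace_zeros!(x::AbstractArray, val)
    @inbounds for i in eachindex(x)
        if iszero(x[i])
            x[i] = val
        end
    end
    return x
end

x = rand(0:3, 1_000)           # test data that can contain zeros
expected = replace(x, 0 => 4)  # reference result computed on a copy
replace_zeros!(x, 4)
println(x == expected)         # prints true
```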
answered Nov 14 '22 23:11 by Bogumił Kamiński