I have an array of numeric (integer or float) values (it's actually a column in a DataFrame object) and would like to replace, for example, all instances of 0 with "NaN" or some other text (or convert 1 --> "M" and 2 --> "F").
I am running into a problem: when I write array[i] = "text", I get the error:
`convert` has no method matching convert(::Type{Int64}, ::ASCIIString)
How do I get around this? Also, what is the most efficient equivalent of Pandas' df.column.replace({1:"M", 2:"F"}, inplace=True)?
I did try this:
df[:sex] = [ {1 => "M", 2 => "F"}[i] for i in df[:sex] ]
... but this runs into a problem when I only want to replace some of the values: I then get a "key X not found" error, because I am passing a value from df[:sex] that is not in my dict.
Here's a start:
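# Widen the column's element type so it can hold both integers and strings
df[:sex] = convert(DataArray{Union(Int64, ASCIIString), 1}, df[:sex])
# Assign each label to the rows holding the matching code
df[df[:sex] .== 1, :sex] = "M"
df[df[:sex] .== 2, :sex] = "F"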
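The "key X not found" error from the question can probably also be avoided by looking each value up with a fallback. The following is only a minimal sketch on a plain array: `codes`, `labels` and `recoded` are hypothetical names standing in for the df[:sex] column and its mapping, and the `Dict(...)` constructor assumes a more recent Julia than the `{...}` literal used in the question.

```julia
# Hypothetical stand-in for the df[:sex] column; 3 has no entry in the mapping.
codes = [1, 2, 3, 1]

# Mapping from numeric code to label.
labels = Dict(1 => "M", 2 => "F")

# get(dict, key, default) returns the default instead of throwing a KeyError,
# so values without a mapping pass through unchanged.
recoded = [get(labels, c, c) for c in codes]
```

Because the result mixes strings and integers, its element type is widened, which is the same reason the convert call above is needed before assigning text into an Int64 column.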
Perhaps you're better off with a PooledDataArray:
PooledDataArray{T}: A variant of DataArray{T} optimized for representing arrays that contain many repetitions of a small number of unique values -- as commonly occurs when working with categorical data.
...it is equivalent to a Categorical in pandas or a factor in R.
julia> df = DataFrame([1 3; 2 4; 1 6])
3x2 DataFrames.DataFrame
| Row | x1 | x2 |
|-----|----|----|
| 1 | 1 | 3 |
| 2 | 2 | 4 |
| 3 | 1 | 6 |
julia> PooledDataArray(DataArrays.RefArray(df[:x1]), [:Male, :Female])
3-element DataArrays.PooledDataArray{Symbol,Int64,1}:
:Male
:Female
:Male
julia> df[:x1] = PooledDataArray(DataArrays.RefArray(df[:x1]), [:Male, :Female])
3-element DataArrays.PooledDataArray{Symbol,Int64,1}:
:Male
:Female
:Male
julia> df
3x2 DataFrames.DataFrame
| Row | x1 | x2 |
|-----|--------|----|
| 1 | Male | 3 |
| 2 | Female | 4 |
| 3 | Male | 6 |
Note: this works because the values in the reference array are exactly the integers from 1 to the number of labels (2), so each value can be used directly as an index into the label list.
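To illustrate the note, here is a rough sketch of the idea in plain Julia arrays, not the actual PooledDataArray internals: `pool`, `refs`, `decoded` and `all_valid` are hypothetical names chosen for this example.

```julia
# Label pool: position 1 is :Male, position 2 is :Female.
pool = [:Male, :Female]

# Reference values, as in the x1 column above; each one is an index into pool.
refs = [1, 2, 1]

# Indexing the pool with the references recovers the labels.
decoded = pool[refs]

# The trick only works if every reference lies in 1:length(pool).
all_valid = all(r -> 1 <= r <= length(pool), refs)
```

If the column held other codes (say 0 or 5), they would have no label to point at, so they would need to be remapped to 1..n first.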