Julia: Replacing a number with a string in an array

I have an array of numeric (integer or float) values (it is actually a column in a DataFrame object) and would like to replace, for example, every 0 with "NaN" or some other text (or convert 1 to "M" and 2 to "F").

I am running into the problem that when I write array[i] = "text", I get the error:

`convert` has no method matching convert(::Type{Int64}, ::ASCIIString)

How do I get around this? Also, what is the most efficient way of doing an equivalent of Pandas' df.column.replace({1:"M", 2:"F"}, inplace=True)?

I did try this:

df[:sex] = [ {1 => "M", 2 => "F"}[i] for i in df[:sex] ]

... but this fails when I only want to replace some of the values: I get a "key X not found" error, because a value from df[:sex] is not a key in my dict.

Anarcho-Chossid asked Oct 19 '22 19:10


2 Answers

Here's a start:

df[:sex] = convert(DataArray{Union(Int64, ASCIIString), 1}, df[:sex])

df[df[:sex] .== 1, :sex] = "M"
df[df[:sex] .== 2, :sex] = "F"
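
For readers on a recent Julia release, the same idea (widen the element type first, then assign) can be sketched on a plain vector. This is an illustration under assumptions, not the DataArrays API used above: it assumes Julia 1.x, where `Union{...}` and `String` replace `Union(...)` and `ASCIIString`, and the names `codes`, `widened`, `mapping`, and `relabeled` are made up for the example. The `get` call with a fallback also sidesteps the "key not found" error from the question when only some values are mapped.

```julia
# Hypothetical sample data; the value 3 has no replacement on purpose.
codes = [1, 2, 1, 3]

# A Vector{Int64} cannot hold a String, which is what the `convert` error says.
# Copy into a vector whose element type allows both Int and String.
widened = Vector{Union{Int, String}}(codes)

# Replace by boolean mask, mirroring the two assignments above.
widened[widened .== 1] .= "M"
widened[widened .== 2] .= "F"

# Dictionary-based alternative: get(dict, key, default) returns the default
# (here the original value) for keys that are not in the mapping.
mapping = Dict(1 => "M", 2 => "F")
relabeled = [get(mapping, x, x) for x in codes]
```

Both `widened` and `relabeled` keep the unmapped value 3 as it was; `codes` itself is left untouched.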

rickhg12hs answered Nov 02 '22 08:11


Perhaps you're better off with a PooledDataArray:

PooledDataArray{T}: A variant of DataArray{T} optimized for representing arrays that contain many repetitions of a small number of unique values -- as commonly occurs when working with categorical data.

It is roughly equivalent to a Categorical in pandas or a factor in R.


julia> df = DataFrame([1 3; 2 4; 1 6])
3x2 DataFrames.DataFrame
| Row | x1 | x2 |
|-----|----|----|
| 1   | 1  | 3  |
| 2   | 2  | 4  |
| 3   | 1  | 6  |

julia> PooledDataArray(DataArrays.RefArray(df[:x1]), [:Male, :Female])
3-element DataArrays.PooledDataArray{Symbol,Int64,1}:
 :Male
 :Female
 :Male

julia> df[:x1] = PooledDataArray(DataArrays.RefArray(df[:x1]), [:Male, :Female])
3-element DataArrays.PooledDataArray{Symbol,Int64,1}:
 :Male
 :Female
 :Male

julia> df
3x2 DataFrames.DataFrame
| Row | x1     | x2 |
|-----|--------|----|
| 1   | Male   | 3  |
| 2   | Female | 4  |
| 3   | Male   | 6  |

Note: this works only because the values in the column are exactly the integers from 1 to the number of labels (here 2), so each value can be used directly as an index into the label pool.
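
The note above amounts to ordinary integer indexing: each stored code picks a label by position. A minimal sketch on plain arrays, assuming Julia 1.x and the made-up names `labels`, `codes`, and `decoded` (it does not use `PooledDataArray` itself):

```julia
# Labels in the order of the codes they stand for: 1 => :Male, 2 => :Female.
labels = [:Male, :Female]

# Codes as they appear in column x1 of the example DataFrame.
codes = [1, 2, 1]

# Indexing a vector with a vector of integers looks up each code in turn.
decoded = labels[codes]

# A code outside 1:length(labels) (for example 0 or 3) would throw a
# BoundsError, which is why the codes must run from 1 to the number of labels.
```

If the stored values are not 1 through n, positional indexing no longer applies and a dictionary lookup is the safer route.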

Andy Hayden answered Nov 02 '22 08:11