Converting String to unique integer in R

Tags: r

I have a data frame of this type

string1,string2,value1
string3,string1,value2
string3,string5,value3
...
...

I would like to convert the strings to unique integers:

1,2,value1
3,1,value2
3,5,value3
...
...

I am trying to use c() to convert each string to a unique integer. The problem is how to handle the two columns of the data frame together, so that the same string gets the same integer in both. How can I do this?

emanuele asked Dec 08 '12 15:12


2 Answers

If you want to assign numbers to the strings, rather than removing the text 'string', you can use a factor with known levels, then coerce to numeric.

d <- read.csv(header=TRUE, file=textConnection("a,b,c
string1,string2,value1
string3,string1,value2
string3,string5,value3"))

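# Shared set of levels: every distinct string from both columns
l <- unique(c(as.character(d$a), as.character(d$b)))

# Encode both columns against the same levels so the codes agree
d1 <- data.frame(a = as.numeric(factor(d$a, levels = l)),
                 b = as.numeric(factor(d$b, levels = l)),
                 c = d$c)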
> d1
  a b      c
1 1 3 value1
2 2 1 value2
3 2 4 value3

Note that the numeric values chosen do not agree with the numerals in the strings, but each string is given a unique number.
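
To see why the shared levels matter, here is a minimal sketch that compares encoding each column separately with encoding both against one level set. The data frame is rebuilt by hand, and the names separate_a, separate_b, shared_levels, shared_a and shared_b are only illustrative, not part of the answer above.

```r
# Toy data mirroring the question; column names a, b, c follow the answer above.
d <- data.frame(
  a = c("string1", "string3", "string3"),
  b = c("string2", "string1", "string5"),
  c = c("value1", "value2", "value3"),
  stringsAsFactors = FALSE
)

# Encoding each column on its own gives each column its own numbering,
# so different strings can end up with the same code.
separate_a <- as.integer(factor(d$a))  # levels: string1, string3
separate_b <- as.integer(factor(d$b))  # levels: string1, string2, string5

# Encoding against one shared level set keeps the mapping consistent:
# the same string gets the same integer in either column.
shared_levels <- unique(c(d$a, d$b))
shared_a <- as.integer(factor(d$a, levels = shared_levels))
shared_b <- as.integer(factor(d$b, levels = shared_levels))
```

With separate factors, "string3" in column a and "string2" in column b both get the code 2, which is exactly the collision the shared levels avoid.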

Matthew Lundberg answered Nov 15 '22 06:11


Here's a simple solution using match:

df <- read.csv(text="string1,string2,value1
string3,string1,value2
string3,string5,value3", header = FALSE)

cbind(sapply(df[-3], match, unique(unlist(df[-3]))), df[3])

  V1 V2     V3
1  1  3 value1
2  2  1 value2
3  2  4 value3

How it works: The values of both columns are matched against a vector of the unique values pooled from these columns. match returns the position of each value in that vector, and that position serves as its integer.
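
The one-liner can be unpacked into separate steps, roughly like this. It is only a sketch: the names lookup, ids_v1, ids_v2 and decoded_v1 are made up for illustration, and stringsAsFactors = FALSE is set explicitly so it behaves the same on older R versions.

```r
df <- read.csv(text = "string1,string2,value1
string3,string1,value2
string3,string5,value3", header = FALSE, stringsAsFactors = FALSE)

# Step 1: pool the strings from the first two columns and keep one copy of each,
# in order of first appearance.
lookup <- unique(unlist(df[-3], use.names = FALSE))

# Step 2: match() gives the position of each string in the lookup vector;
# that position is the integer assigned to the string.
ids_v1 <- match(df$V1, lookup)
ids_v2 <- match(df$V2, lookup)

# The lookup vector also works as a decoder: indexing it with the
# integers returns the original strings.
decoded_v1 <- lookup[ids_v1]
```

Keeping lookup around is handy if the integers need to be translated back to the strings later.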

Sven Hohenstein answered Nov 15 '22 04:11