I have two columns of data of the same type (strings).
I want to combine the levels of the two columns, i.e. given:
col1 col2
Bob John
Tom Bob
Frank Jane
Jim Bob
Tom Bob
... ... (and so on)
Now col1 has 4 levels (Bob, Tom, Frank, Jim) and col2 has 3 levels (John, Jane, Bob).
But I want both columns to have all the factor levels (Bob, Tom, Frank, Jim, John, Jane), so that I can later replace each of the names with a unique id, such that the final output would be:
col1 col2
1 5
2 1
3 6
4 1
2 1
That is, Bob -> 1, Tom -> 2, etc. in both columns.
Any ideas :) ?
edit: Thanks all for the wonderful answers! You are all awesome as far as I know :)
x <- structure(
  list(
    col1 = structure(c(1L, 4L, 2L, 3L, 4L), .Label = c("Bob", "Frank", "Jim", "Tom"), class = "factor"),
    col2 = structure(c(3L, 1L, 2L, 1L, 1L), .Label = c("Bob", "Jane", "John"), class = "factor")
  ),
  .Names = c("col1", "col2"), class = "data.frame", row.names = c(NA, -5L)
)
Take the union of the levels of the two factors:
both <- union(levels(x$col1), levels(x$col2))
Then rebuild both factors using the combined levels:
x$col1 <- factor(x$col1, levels=both)
x$col2 <- factor(x$col2, levels=both)
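With more than two columns, the same idea can be applied to every column at once. This is only a sketch: it rebuilds the example data with data.frame() instead of the dput() output, and the Reduce() and x[] <- lapply(...) idioms are my own addition, not part of the original answer.

```r
# Example data (same values as in the question)
x <- data.frame(
  col1 = factor(c("Bob", "Tom", "Frank", "Jim", "Tom")),
  col2 = factor(c("John", "Bob", "Jane", "Bob", "Bob"))
)

# Union of the levels of every column
both <- Reduce(union, lapply(x, levels))

# Rebuild each column as a factor with the shared levels;
# assigning into x[] keeps x a data frame
x[] <- lapply(x, factor, levels = both)

levels(x$col1)
levels(x$col2)
```

Both calls should now print the same six levels.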
Edit: added an example of converting the factors to numeric values.
You could simply convert each factor to its integer codes, e.g.:
as.numeric(x$col1)
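Or, a simpler and nicer one-step solution based on @Gavin Simpson's hint below, which converts every column at once (the ids only agree across columns once both factors share the same levels):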
data.matrix(x)
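To see what this produces, here is a small self-contained check, assuming the same example data (rebuilt with data.frame() for brevity). Note that the ids follow the order of the combined levels, which is alphabetical within each original factor, so the numbering differs from the one in the question while still being consistent across both columns.

```r
x <- data.frame(
  col1 = factor(c("Bob", "Tom", "Frank", "Jim", "Tom")),
  col2 = factor(c("John", "Bob", "Jane", "Bob", "Bob"))
)

# Give both factors the same set of levels
both <- union(levels(x$col1), levels(x$col2))
x$col1 <- factor(x$col1, levels = both)
x$col2 <- factor(x$col2, levels = both)

# Integer codes of each factor column, as a matrix
m <- data.matrix(x)
m
# col1: 1 4 2 3 4
# col2: 6 1 5 1 1

# The level vector doubles as the lookup from id back to name
both[m[, "col1"]]
```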
You want the factors to include all the unique names from both columns.
col1 <- factor(c("Bob", "Tom", "Frank", "Jim", "Tom"))
col2 <- factor(c("John", "Bob", "Jane", "Bob", "Bob"))
mynames <- unique(c(levels(col1), levels(col2)))
fcol1 <- factor(col1, levels = mynames)
fcol2 <- factor(col2, levels = mynames)
EDIT: a little nicer if you replace the third line with this:
mynames <- union(levels(col1), levels(col2))
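If the ids should follow the order in which the names first appear (Bob -> 1, Tom -> 2, ...), as in the question's expected output, one option is to skip the factor levels and use match() against the unique names. A minimal sketch, assuming the columns are available as plain character vectors (call as.character() on a factor first); the names ids1 and ids2 are just for illustration:

```r
col1 <- c("Bob", "Tom", "Frank", "Jim", "Tom")
col2 <- c("John", "Bob", "Jane", "Bob", "Bob")

# Unique names in order of first appearance, col1 first
mynames <- unique(c(col1, col2))

# Position of each name in mynames is its id
ids1 <- match(col1, mynames)
ids2 <- match(col2, mynames)

data.frame(col1 = ids1, col2 = ids2)
```

This gives 1 2 3 4 2 for col1 and 5 1 6 1 1 for col2.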