
In R, how can I access the first element of each level of a factor?

I have a data frame like this:

n = c(2, 2, 3, 3, 4, 4) 
n <- as.factor(n)
s = c("a", "b", "c", "d", "e", "f") 
df = data.frame(n, s)  

df
  n s
1 2 a
2 2 b
3 3 c
4 3 d
5 4 e
6 4 f

and I want to access the first element of each level of my factor (and have in this example a vector containing a, c, e).

It is possible to reach the first element of one level, with

df$s[df$n == 2][1]

but it does not work for all levels:

df$s[df$n == levels(n)]
[1] a f

How would you do that?

And to go further, I'd like to add a column to my data frame that shows, on every row, the first element of that row's level. In my example, the new column (firstelement) should be:

  n s rep firstelement
1 2 a   a            a
2 2 b   c            a
3 3 c   e            c
4 3 d   a            c
5 4 e   c            e
6 4 f   e            e
hhh asked Mar 19 '14 22:03



1 Answer

Edit. The first part of my answer addresses the original question, i.e. before "And to go further" (which was added by OP in an edit).

Another possibility, using duplicated. From ?duplicated: "duplicated() determines which elements of a vector or data frame are duplicates of elements with smaller subscripts."

Here we use !, the logical negation (NOT), to select not duplicated elements of 'n', i.e. first elements of each level of 'n'.

df[!duplicated(df$n), ]
#   n s
# 1 2 a
# 3 3 c
# 5 4 e
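The question asks for a vector rather than a subset of rows. As a minimal sketch, the same logical index can be applied to the column s directly. The names is_first and first_s are introduced here for illustration only, and the as.character() call is an assumption to cover the case where s is stored as a factor.

```r
# Rebuild the example data from the question.
df <- data.frame(
  n = factor(c(2, 2, 3, 3, 4, 4)),
  s = c("a", "b", "c", "d", "e", "f")
)

# TRUE on the first row of each level of n, FALSE on later repeats.
is_first <- !duplicated(df$n)

# Keep only the matching elements of s. as.character() guards against
# s being a factor (the data.frame() default before R 4.0.0).
first_s <- as.character(df$s[is_first])
first_s
# [1] "a" "c" "e"
```

This relies on row order: "first" means the first row in which each level appears, so sort the data frame beforehand if a different ordering is intended.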

Update. I didn't see your "And to go further" edit until now. My first suggestion would definitely be to use ave, as already proposed by @thelatemail and @sparrow. But just to dig around in the R toolbox and show you an alternative, here's a dplyr way:

Group the data by n, then use the mutate function to create a new variable 'first' that holds the first element of s within each group (s[1]):

library(dplyr)

df %>%
  group_by(n) %>%
  mutate(
    first = s[1])
#   n s first
# 1 2 a     a
# 2 2 b     a
# 3 3 c     c
# 4 3 d     c
# 5 4 e     e
# 6 4 f     e

Or go all in with dplyr convenience functions and use first instead of [1]:

df %>%
  group_by(n) %>%
  mutate(
    first = first(s))

A dplyr solution for your original question would be to use summarise:

df %>%
  group_by(n) %>%
  summarise(
    first = first(s))

#   n first
# 1 2     a
# 2 3     c
# 3 4     e
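The ave approach recommended earlier in this answer is not spelled out here. A minimal base R sketch, assuming the new column should be called firstelement as in the question, and converting s to character so the result does not depend on whether s is a factor, might look like this:

```r
# Rebuild the example data from the question.
df <- data.frame(
  n = factor(c(2, 2, 3, 3, 4, 4)),
  s = c("a", "b", "c", "d", "e", "f")
)

# ave() splits its first argument by the grouping factor, applies FUN to
# each piece, and writes the results back in the original row order.
# Returning x[1] repeats the first element of each group on all its rows.
df$firstelement <- ave(as.character(df$s), df$n, FUN = function(x) x[1])
df$firstelement
# [1] "a" "a" "c" "c" "e" "e"
```

Unlike the dplyr version, this needs no extra package and modifies df in place.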
Henrik answered Sep 20 '22 21:09