Count occurrences of value in a set of variables in R (per row)

Tags: r, count, find-occurrences

Let's say I have a data frame with 10 numeric variables V1-V10 (columns) and multiple rows (cases).

What I would like R to do is: For each case, give me the number of occurrences of a certain value in a set of variables.

For example, the number of occurrences of the numeric value 99 in that single row across V2, V3 and V6, which can range from 0 (none of the three has the value 99) to 3 (all three have the value 99).

I am really looking for an equivalent to the SPSS function COUNT: "COUNT creates a numeric variable that, for each case, counts the occurrences of the same value (or list of values) across a list of variables."

I thought about table() and count() from the plyr package, but I cannot figure out how to use them for this. Vectorized computation preferred. Thanks a lot!

asked Jun 03 '14 12:06

nilsole

2 Answers

I think there ought to be a simpler way to do this, but the best way I can think of to get a table of counts is to loop (implicitly, using sapply) over the unique values in the data frame.

#Some example data
df <- data.frame(a=c(1,1,2,2,3,9),b=c(1,2,3,2,3,1))
df
#  a b
#1 1 1
#2 1 2
#3 2 3
#4 2 2
#5 3 3
#6 9 1

vals <- unique(do.call(c, df)) #all unique values in df (named vals to avoid masking base::levels)
out <- sapply(vals, function(x) rowSums(df == x)) #count occurrences of x in each row
colnames(out) <- vals
out
#     1 2 3 9
#[1,] 2 0 0 0
#[2,] 1 1 0 0
#[3,] 0 1 1 0
#[4,] 0 2 0 0
#[5,] 0 0 2 0
#[6,] 1 0 0 1

answered Sep 29 '22 09:09

Miff

If you need to count a particular word or letter in each row, you can use apply().

#Let df be a data frame with four variables (V1-V4)
#"L" must be quoted; unquoted, R looks for an object named L and fails
df <- data.frame(V1=c(1,1,2,1,"L"), V2=c(1,"L",2,2,"L"),
                 V3=c(1,2,2,1,"L"), V4=c("L","L",1,2,"L"))

To count the number of "L" values in each row, use:

#Compute a new variable counting occurrences of "L" in V1-V4
df$count.L <- apply(df[c("V1","V2","V3","V4")], 1, function(x) sum(x == "L"))

The result looks like this:

> df
  V1 V2 V3 V4 count.L
1  1  1  1  L       1
2  1  L  2  L       2
3  2  2  2  1       0
4  1  2  1  2       0
5  L  L  L  L       4
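Because == on a data frame is already vectorized, the row-wise apply() call can also be replaced by rowSums(), which skips the per-row function calls and the conversion to a character matrix that apply() performs. A minimal sketch with the same example data; selecting V1-V4 by name is an assumption made so the new count column itself is never included in the count.

```r
# Same example data, with "L" quoted so it is a string
df <- data.frame(V1 = c(1, 1, 2, 1, "L"), V2 = c(1, "L", 2, 2, "L"),
                 V3 = c(1, 2, 2, 1, "L"), V4 = c("L", "L", 1, 2, "L"))

# Compare only V1-V4 with "L" (logical matrix) and count TRUEs per row
df$count.L <- rowSums(df[c("V1", "V2", "V3", "V4")] == "L")
df
```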

answered Sep 29 '22 09:09

Adi.sr
