Count the number of rows matching a criterion

Tags:

r

I am looking for a command in R that is equivalent to this SQL statement. I want a very simple, basic solution that does not use complex functions or packages such as dplyr.

Select count(*) as number_of_states 
  from myTable
where  sCode = "CA"

So essentially I want to count the number of rows matching my WHERE condition.

I have imported a CSV file into mydata as a data frame. So far I have tried these, to no avail.

nrow(mydata$sCode == "CA") ## ==>> returns NULL
sum(mydata[mydata$sCode == 'CA',], na.rm=T) ## ==>> gives Error in FUN(X[[1L]], ...) : only defined on a data frame with all numeric variables
sum(subset(mydata, sCode='CA', select=c(sCode)), na.rm=T) ## ==>> FUN(X[[1L]], ...) : only defined on a data frame with all numeric variables
sum(mydata$sCode == "CA", na.rm=T) ## ==>> returns count of all rows in the entire data set, which is not the correct result.

and some variations of the above samples. Any help would be appreciated! Thanks.


asked Jan 28 '15 15:01

multi-sam

3 Answers

mydata$sCode == "CA" will return a logical vector, with a TRUE value everywhere that the condition is met. To illustrate:

> mydata = data.frame(sCode = c("CA", "CA", "AC"))
> mydata$sCode == "CA"
[1]  TRUE  TRUE FALSE

There are a couple of ways to deal with this:

1. sum(mydata$sCode == "CA"), as suggested in the comments; because TRUE is interpreted as 1 and FALSE as 0, this returns the number of TRUE values in your vector.
2. length(which(mydata$sCode == "CA")); the which() function returns a vector of the indices where the condition is met, the length of which is the count of "CA".

Edit to expand upon what's happening in #2:

> which(mydata$sCode == "CA")
[1] 1 2

which() returns a vector identifying each row where the condition is met (in this case, rows 1 and 2 of the data frame). The length() of this vector is the number of occurrences.
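As a minimal sketch of both approaches side by side, the snippet below uses the same three-row example data frame. The variable names is_ca, count_sum and count_which are only illustrative and do not come from the question.

```r
# Example data frame with the sCode column from the question
mydata <- data.frame(sCode = c("CA", "CA", "AC"), stringsAsFactors = FALSE)

# Element-wise comparison gives one TRUE/FALSE per row
is_ca <- mydata$sCode == "CA"

# Approach 1: TRUE is coerced to 1 and FALSE to 0, so the sum is the count
count_sum <- sum(is_ca)

# Approach 2: which() gives the positions of the TRUE values;
# the number of positions is the count
count_which <- length(which(is_ca))

print(as.integer(is_ca))  # the 0/1 values that sum() adds up
print(count_sum)
print(count_which)
```

Both counts agree here; either one plays the role of count(*) with a WHERE clause.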

answered Oct 22 '22 12:10

Joe

sum is used to add elements; nrow is used to count the number of rows in a rectangular array (typically a matrix or data.frame); length is used to count the number of elements in a vector. You need to apply these functions correctly.

Let's assume your data is a data frame named "dat". Correct solutions:

nrow(dat[dat$sCode == "CA",])
length(dat$sCode[dat$sCode == "CA"])
sum(dat$sCode == "CA")
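These three expressions give the same count when sCode has no missing values. If the column may contain NA, they behave differently, which the sketch below illustrates on a small hypothetical data frame (the pop column and the n_* names are made up for the example).

```r
# Hypothetical data with one missing state code
dat <- data.frame(sCode = c("CA", NA, "CA", "NY"),
                  pop   = c(1, 2, 3, 4),
                  stringsAsFactors = FALSE)

# Logical indexing keeps a placeholder for every NA in the condition,
# so the NA row is counted along with the two "CA" rows
n_rows   <- nrow(dat[dat$sCode == "CA", ])
n_length <- length(dat$sCode[dat$sCode == "CA"])

# sum() propagates the NA unless na.rm = TRUE is given
n_sum      <- sum(dat$sCode == "CA")
n_sum_narm <- sum(dat$sCode == "CA", na.rm = TRUE)

# Wrapping the condition in which() drops the NA before indexing
n_rows_which <- nrow(dat[which(dat$sCode == "CA"), ])

print(c(n_rows, n_length, n_sum, n_sum_narm, n_rows_which))
```

If missing values are possible, sum(..., na.rm = TRUE) or the which() form is probably the safer choice.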

answered Oct 22 '22 10:10

Alex W

1. mydata$sCode == "CA" is a vector, not a data frame, which is why nrow returns NULL.
2. mydata[mydata$sCode == 'CA',] returns a data.frame of the rows where sCode == 'CA'. The sCode column holds character data rather than numbers, which is why sum gives you the error.
3. In subset(mydata, sCode='CA', select=c(sCode)), you should use sCode=='CA' instead of sCode='CA'. subset then returns a one-column data frame of the rows where sCode equals CA, so count its rows with nrow instead of summing it:

nrow(subset(mydata, sCode=='CA', select=c(sCode)))

Or you can try this: sum(na.omit(mydata$sCode) == "CA")
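To see how the operator and the counting function interact with subset(), here is a small sketch on hypothetical data (the id column and the result names are assumptions made for the example).

```r
# Hypothetical data: two "CA" rows, one other state, one missing value
mydata <- data.frame(sCode = c("CA", "CA", "AC", NA),
                     id    = 1:4,
                     stringsAsFactors = FALSE)

# Single '=': sCode is taken as a named argument, not as a condition,
# so no filtering happens and every row comes back
all_rows <- nrow(subset(mydata, sCode = "CA", select = c(sCode)))

# Double '==': a real comparison; subset() also drops rows where it is NA
ca_rows <- nrow(subset(mydata, sCode == "CA", select = c(sCode)))

# length() of a data frame is its number of columns, not its number of rows
n_cols <- length(subset(mydata, sCode == "CA", select = c(sCode)))

# Vector-based alternative: drop the NA values first, then count matches
ca_sum <- sum(na.omit(mydata$sCode) == "CA")

print(c(all_rows, ca_rows, n_cols, ca_sum))
```

The single '=' form returning every row is consistent with a count equal to the size of the whole data set.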

answered Oct 22 '22 12:10

Fedorenko Kristina
