I want a simple count of the number of subjects in each condition of a study. The data look something like this:
subjectid cond obser variable
1234 1 1 12
1234 1 2 14
2143 2 1 19
3456 1 1 12
3456 1 2 14
3456 1 3 13
etc etc etc etc
This is a large dataset and it is not always obvious how many unique subjects contribute to each condition, etc.
I have this in a data.frame.
What I want is something like
cond ofSs
1 122
2 98
Where for each "condition" I get a count of the number of unique Ss contributing data to that condition. Seems like this should be painfully simple.
Use the ddply function from the plyr package:
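library(plyr)
# Toy data: subject ids and conditions are drawn at random,
# so the counts shown below will differ from run to run.
df <- data.frame(subjectid = sample(1:3, 7, TRUE),
                 cond = sample(1:2, 7, TRUE), obser = sample(1:7))
# For each value of cond, count the distinct subject ids.
ddply(df, .(cond), summarize, NumSubs = length(unique(subjectid)))
#   cond NumSubs
# 1    1       1
# 2    2       2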
The ddply function "splits" the data frame by the cond variable, and produces a summary column NumSubs for each sub-data-frame.
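To make the split-and-summarize idea concrete, here is a rough base R sketch of the same steps, with no plyr involved. It is only an illustration of the logic, not how ddply is implemented. The data frame is the six example rows from the question typed in by hand, and the names ids_by_cond, n_subs and result are made up for this sketch.

```r
# The six example rows from the question, entered by hand.
df <- data.frame(
  subjectid = c(1234, 1234, 2143, 3456, 3456, 3456),
  cond      = c(1, 1, 2, 1, 1, 1),
  obser     = c(1, 2, 1, 1, 2, 3),
  variable  = c(12, 14, 19, 12, 14, 13)
)

# Split: one vector of subject ids per condition.
ids_by_cond <- split(df$subjectid, df$cond)

# Apply: count the distinct subject ids within each condition.
n_subs <- sapply(ids_by_cond, function(ids) length(unique(ids)))

# Combine: one row per condition, shaped like the ddply result.
result <- data.frame(cond = as.numeric(names(n_subs)),
                     NumSubs = unname(n_subs))
print(result)
```

On these six rows, condition 1 has two unique subjects (1234 and 3456) and condition 2 has one (2143).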