let's say that the following is my table
id | name | topic | reply
...
1 | user1 | 00001 | yes ## replied on topic 00001
2 | user2 | 00002 | yes ** replied on topic 00002
3 | user1 | 00001 | yes ## replied on topic 00001
4 | user5 | 00001 | no
5 | user1 | 00001 | yes ## replied on topic 00001
6 | user1 | 00002 | no
7 | user2 | 00002 | yes ** replied on topic 00002
8 | user3 | 00001 | no
9 | user4 | 00002 | yes
Imagine that this is a topic table. Only user1 and user reply on a single topic more than once (topic 00001 and 00002) How can I select the number of user that reply on a topic more than once, which should return 2 from 5
You can count the number of duplicate rows by counting True in pandas. Series obtained with duplicated() . The number of True can be counted with sum() method. If you want to count the number of False (= the number of non-duplicate rows), you can invert it with negation ~ and then count True with sum() .
This will give you the number of users that replied more than once on a topic:
SELECT COUNT(DISTINCT name)
FROM (
SELECT name
FROM yourtable
WHERE reply = 'yes'
GROUP BY name, topic
HAVING COUNT(*) > 1
) T1
First you need to work out the number of replies per user per topic and then count up the number of distinct users that have more than one reply for any topic.
SELECT COUNT(DISTINCT name) FROM (
SELECT name, topic, count(*) replies
FROM table
WHERE reply = 'yes'
GROUP BY name, topic
) a
WHERE replies > 1
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With