
Count item pairs linked by column value

Tags: r, aggregation

I'm struggling to solve this problem in R. I have data like this:

item   id
1      500
2      500
2      600
2      700
3      500
3      600

x <- data.frame(item = c(1, 2, 2, 2, 3, 3),
                id = c(500, 500, 600, 700, 500, 600))

I want to count the number of ids that each pair of items has in common. So I want this output:

item1    item2    count
    1        2        1
    2        3        2
    1        3        1

I've tried approaching this with commands like:

x_agg = aggregate(x, by=list(x$id), c)

and then

x_agg_id = lapply(x_agg$item, unique)

thinking that I could then count the occurrences of each item. But aggregate() with the by argument seems to return list columns, which I don't know how to manipulate. I am hoping there is a simpler way.

Harry Palmer asked Aug 22 '12 11:08


1 Answer

# your data
df<-read.table(text="item   id
1      500
2      500
2      600
2      700
3      500
3      600",header=TRUE)


library(tnet)
item_item<-projecting_tm(df, method="sum")
names(item_item)<-c("item1","item2","count")

item_item

#   item1 item2 count
# 1     1     2     1
# 2     1     3     1
# 3     2     1     1
# 4     2     3     2
# 5     3     1     1
# 6     3     2     2
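
The output above lists every pair in both directions. If each pair should appear only once, the same counts can be cross-checked in base R without tnet. This is a minimal sketch, not the tnet method: it assumes each (item, id) row is one link, and the names `incidence`, `co` and `pairs` are only illustrative.

```r
# Same data as in the question
df <- data.frame(item = c(1, 2, 2, 2, 3, 3),
                 id   = c(500, 500, 600, 700, 500, 600))

# Drop duplicate (item, id) rows so each link is counted once
df <- unique(df)

# Incidence matrix: rows are ids, columns are items, entries are 0/1
incidence <- unclass(table(df$id, df$item))

# t(incidence) %*% incidence: entry [i, j] is the number of ids
# shared by item i and item j
co <- crossprod(incidence)

# Keep each unordered pair once (upper triangle, diagonal excluded)
idx <- which(upper.tri(co), arr.ind = TRUE)
pairs <- data.frame(item1 = as.numeric(rownames(co)[idx[, "row"]]),
                    item2 = as.numeric(colnames(co)[idx[, "col"]]),
                    count = co[idx])

# Drop pairs that share no id at all
pairs <- pairs[pairs$count > 0, ]
pairs
```

For the sample data this should give three rows: (1, 2) with count 1, (1, 3) with count 1 and (2, 3) with count 2. The incidence matrix is dense, so this approach may not scale to very large numbers of ids and items.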

EDIT

How many ids and items do you have? You could always relabel the items and ids with new numbers first, e.g.:

# relabel items as 9000, 9001, ... (count the unique items, not the ids)
numberitems<-length(unique(df$item))+9000
items<-data.frame(item=unique(df$item),newitems=c(9000:(numberitems-1)))
# relabel ids as 1000, 1001, ...
numberids<-length(unique(df$id))+1000
ids<-data.frame(id=unique(df$id),newids=c(1000:(numberids-1)))
# attach the new labels to the original rows
newdf<-merge(df,items,by="item")
newdf<-merge(newdf,ids,by="id")
DF<-data.frame(item=newdf$newitems,id=newdf$newids)

library(tnet)
item_item<-projecting_tm(DF, method="sum")
names(item_item)<-c("item1","item2","count")

Then merge the original names back afterwards.
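
One way to map the labels back is with match() against the lookup table. This is a rough sketch that assumes the projection result carries the relabelled ids in its first two columns. The `item_item` values below are a hand-made stand-in for illustration, not real tnet output.

```r
# Lookup table with the same structure as `items` above
items <- data.frame(item = c(1, 2, 3), newitems = 9000:9002)

# Stand-in for the projection of the relabelled data (assumed values)
item_item <- data.frame(item1 = c(9000, 9000, 9001),
                        item2 = c(9001, 9002, 9002),
                        count = c(1, 1, 2))

# match() finds the position of each new label in the lookup table;
# indexing items$item with that position restores the original label
item_item$item1 <- items$item[match(item_item$item1, items$newitems)]
item_item$item2 <- items$item[match(item_item$item2, items$newitems)]
item_item
```

Using match() rather than two merge() calls keeps the row order of item_item unchanged and avoids having to rename the join columns.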

user1317221_G answered Sep 30 '22 11:09