I have my data in table as:
id Author_ID Research_Area Category_ID Paper_Count Paper_Year Rank
---------------------------------------------------------------------------------
1 677 feature extraction 8 1 2005 1
2 677 image annotation 11 1 2005 2
3 677 probabilistic model 12 1 2005 3
4 677 semantic 19 1 2007 1
5 677 feature extraction 8 1 2009 1
6 677 image annotation 11 1 2011 1
7 677 semantic 19 1 2012 1
8 677 video sequence 5 2 2013 1
9 1359 adversary model 1 2 2005 1
10 1359 ensemble method 14 2 2005 2
11 1359 image represent 11 2 2005 3
12 1359 adversary model 1 7 2006 1
13 1359 concurrency control 17 5 2006 2
14 1359 information system 12 2 2006 3
15 ...
16 ...
Whereas I want to have an output of query as:
id Author_ID Category_ID Paper_Count Category_Prob Paper_Year Rank
---------------------------------------------------------------------------------
1 677 8 1 0.333 2005 1
2 677 11 1 0.333 2005 2
3 677 12 1 0.333 2005 3
4 677 19 1 1.0 2007 1
5 677 8 1 1.0 2009 1
6 677 11 1 1.0 2011 1
7 677 19 1 1.0 2012 1
8 677 5 2 1.0 2013 1
9 1359 1 2 0.333 2005 1
10 1359 14 2 0.333 2005 2
11 1359 11 2 0.333 2005 3
12 1359 1 7 0.5 2006 1
13 1359 17 5 0.357 2006 2
14 1359 12 2 0.142 2006 3
15 ...
16 ...
Whereas Category_Prob
is a calculated column which is calculated in two steps as:
Step First, we have to have a SUM
of Paper_Count
in each Paper_Year
for instance i.e. Paper_Year = 2005
and Author_ID = 677
, the SUM(Paper_Count) = 3
Step Second, then for each Category_ID
, we have to divide Paper_Count
with value of SUM(Paper_Count)
in that Paper_Year
which will be 1/3
i.e. 0.333
and so on...
Moreover, I have tried this query:
SELECT
Author_ID, Abstract_Category, Paper_Count,
[Category_Prob] = Paper_Count / SUM(Paper_Count),
Paper_Year, Rank
FROM
Author_Areas
GROUP BY
Author_ID, Abstract_Category, Paper_Year, Paper_Count, Rank
ORDER BY
Author_ID, Paper_Year
But it returns just 1
in the column Category_Prob
for all of the rows in the table.
A computed column is a virtual column that is not physically stored in the table, unless the column is marked PERSISTED. A computed column expression can use data from other columns to calculate a value for the column to which it belongs.
Go to your database, right click on tables, select “New Table” option. Create all columns that you require and to mark any column as computed, select that column and go to column Properties window and write your formula for computed column.
A calculated field is a field that uses existing database fields and applies additional logic — it allows you to create new data from your existing data. A calculated field either: performs some calculation on database fields to create a value that is not directly stored in the database or.
The problem with your query is that you are not grouping by Paper_Year
, but also by Author_ID, Abstract_Category, Paper_Count, Rank
. Hence SUM(Paper_Count)
is equal to Paper_Count for each group.
You can use SUM OVER
for this:
SELECT id, Author_ID, Abstract_Category [Category_ID],
Paper_Count,
Paper_Count * 1.0 / SUM(Paper_Count)
OVER (PARTITION BY Author_ID, Paper_Year) AS [Category_Prob],
Paper_Year, Rank
FROM Author_Areas
ORDER BY Author_ID, Paper_Year
Note: You have to multiply by 1.0
so as to avoid integer division.
Note 2: Perhaps you have to add Author_ID
field in the PARTITION BY
clause as well, if your actual requirement is to group by author, year.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With