Some background to help explain this difficult question first:
I have a database that estimates the reliability of users by comparing the values that they have submitted to a global average. Values range between 0 and 1. So, where:
r
a
g
Reliability:
r = 1 - ABS(g - a)
This is how the reliability of each user is calculated.
Now, the Global, "agreed upon" average g
, is calculated using a weighted mean, where the weighting is the r
and the value is a
. If there are 3 users in total:
g = ((r1 * a1) + (r2 * a2) + (r3 * a3)) / (r1 + r2 + r3)
The problem is, that once users have a high reliability, they have a complete monopoly, and no new values can change this. Taking an example:
g was initially 0.5
user1 r was initially 0.5
user2 r was initially 0.5
user3 r was initially 0.5
Now, they will submit values one by one, and watch what happens:
user1 a is submitted, 1.0
user1 reliability goes slightly down because it differs from g (0.5)
user2 a is submitted, 1.0
user1 and user2 reliability go up to 100%, g is now 1.0.
user3 a is submitted, 0.0
user3 reliability goes down to 0%. g is still 1.0.
Since user3 has a very low reliability, the weighting does not have any effect at all on g
. User3's reliability goes down because the values submitted differ completely from the global average. What can be done to make user3's submissions have some effect on the final value? Maybe I need to add some constant so that reliability never ends up completely zero (but close)?
Now, for the SQL code. I've added an SQL fiddle that demonstrates the problem: http://sqlfiddle.com/#!3/d3fd1/21 I've abstracted the code to keep it as short, but it's still quite long.
Table creation, stored procedures and trigger:
-- Stores user info
CREATE TABLE dbo.Users(
[UserID] [int] NOT NULL,
[Reliability] [float] NOT NULL
)
-- Contains global averages from all users who submitted data
CREATE TABLE dbo.GlobalSubmission(
GlobalSubmissionID [int] NOT NULL,
Name [varchar](50) NULL,
GlobalAverage [float] NOT NULL,
)
CREATE TABLE dbo.UserSubmission(
SubValue float NOT NULL,
GlobalSubmissionID int NOT NULL,
UserID int NOT NULL,
)
GO
--Calculate the "ideal value", used for GlobalSubmission.
CREATE FUNCTION dbo.IdealValueCalc(@globalSubmissionID INT)
RETURNS int
AS
BEGIN
DECLARE @tmpReliability TABLE (SubValue float, Reliability float)
INSERT INTO @tmpReliability
SELECT AVG(us.SubValue) as SubValue, usr.Reliability Reliability FROM UserSubmission us
JOIN Users usr
ON us.UserID = usr.UserID
WHERE GlobalSubmissionID = @GlobalSubmissionID
GROUP BY us.UserID, usr.Reliability
--Perform weighted mean calculations.
Return (SELECT SUM(SubValue * Reliability) / SUM(Reliability) FROM @tmpReliability)
END
go
--Calculate the reliability of one user.
CREATE FUNCTION dbo.GetReliabilityForUser
(@userID int)
Returns Float
AS BEGIN
Return (SELECT 1 - AVG(ABS(db.userAvg - db.GlobalAverage))
FROM (
SELECT pmd.UserID,
gs.GlobalAverage,
AVG(pmd.SubValue) as userAvg
FROM UserSubmission pmd
-- Joins average value for each user with "ideal" value from GlobalSubmission
JOIN GlobalSubmission gs
ON gs.GlobalSubmissionID = pmd.GlobalSubmissionID
WHERE pmd.UserID = 1
GROUP BY pmd.UserID, gs.GlobalSubmissionID, gs.GlobalAverage
) db
GROUP BY db.UserID)
End
go
CREATE TRIGGER trg_SubmissionComputation
ON UserSubmission
AFTER INSERT, UPDATE
AS BEGIN
--Calculate this uer's reliability
DECLARE @userID int = (SELECT TOP(1) UserID FROM inserted)
DECLARE @userReliability float = dbo.GetReliabilityForUser(@userID)
UPDATE Users
SET Reliability=@userReliability
WHERE UserID = @userID
--Recalculate globalSubmission values:
DECLARE @globalSubmissionID int = (SELECT TOP(1) GlobalSubmissionID FROM inserted)
DECLARE @globalAverage float = dbo.IdealValueCalc(@globalSubmissionID)
--The global average for this set of submissions has been recalculated. Now inserting:
UPDATE GlobalSubmission
SET GlobalAverage = @globalAverage
WHERE GlobalSubmissionID = @globalSubmissionID
END
GO
Testing it:
--Creating 3 new users
INSERT INTO Users
(UserID, Reliability)
values
(1, 0.5),
(2, 0.5),
(3, 0.5)
GO
--Creating a new GlobalSubmission
INSERT INTO GlobalSubmission
(GlobalSubmissionID, NAME, GlobalAverage)
values (1, 'BOILER2B' , 0.5)
GO
--First, we will submit values of 1 for two users:
INSERT INTO UserSubmission values (1.0, 1, 1); -- Value: 1.0, User 1, Submission 1
GO
INSERT INTO UserSubmission values (1.0, 1, 2); -- Value: 1.0, User 2, Submission 1
GO
INSERT INTO UserSubmission values (1.0, 1, 1); -- Value: 1.0, User 1, Submission 1
GO
INSERT INTO UserSubmission values (1.0, 1, 2); -- Value: 1.0, User 2, Submission 1
GO
--Now, we will submit values of 0 for the third user:
INSERT INTO UserSubmission values (0.0, 1, 3); -- Value: 0.0, User 3, Submission 1
GO
INSERT INTO UserSubmission values (0.0, 1, 3); -- Value: 0.0, User 3, Submission 1
GO
SELECT * FROM Users -- This results in 0% reliability for the last user.
--If we create new users and add them, the reliability won't budge:
INSERT INTO Users
(UserID, Reliability)
values
(4, 0.5),
(5, 0.5),
(6, 0.5),
(7, 0.5),
(8, 0.5)
GO
INSERT INTO UserSubmission values (0, 1, 4); -- Value: 0, User 4, Submission 1
GO
INSERT INTO UserSubmission values (0, 1, 5); -- Value: 0, User 5, Submission 1
GO
INSERT INTO UserSubmission values (0, 1, 6); -- Value: 0, User 6, Submission 1
GO
INSERT INTO UserSubmission values (0, 1, 7); -- Value: 0, User 7, Submission 1
GO
INSERT INTO UserSubmission values (0, 1, 8); -- Value: 0, User 8, Submission 1
GO
INSERT INTO UserSubmission values (0, 1, 4); -- Value: 0, User 4, Submission 1
GO
INSERT INTO UserSubmission values (0, 1, 5); -- Value: 0, User 5, Submission 1
GO
INSERT INTO UserSubmission values (0, 1, 6); -- Value: 0, User 6, Submission 1
GO
INSERT INTO UserSubmission values (0, 1, 7); -- Value: 0, User 7, Submission 1
GO
INSERT INTO UserSubmission values (0, 1, 8); -- Value: 0, User 8, Submission 1
GO
SELECT * FROM Users -- Even though we've added loads of new users suggesting 0 as value, the final value
-- is remaining 1.0, because when a new value (0) is submitted, it varies too much from the global average
--(1), causing the reliability of that user to go down, and that user ends up making no influence on the
-- global average!
Here is an alternative estimate that is still a bit ad hoc but doesn't ever produce weight 0.
1) For each user produce an exponentially decaying estimate of squared error. Start with a tunable arbitrary estimate K. Then every time the user produces a value a and the group mean is g produce a squared error E = (a - g) * (a - g) and change the estimate of squared error from before to after = before * x + E * (1 - x) where x is another tunable constant between 0 and 1 which tunes how fast old estimates decay. This estimate can never quite get down to zero, but because of the next step it might be as well to stop it decreasing below some tunable value.
2) To get a new global estimate use a weighted mean as before, but make the weights the reciprocal of the current estimate of squared error for that user.
If all the users were unbiased then the exponentially decaying estimates might end up as decent estimates of average squared error for each user, and then the weights would be the linear combination of estimates which minimises the expected squared error of the global estimate. Check: if the different users i submitted averages of Ni estimates from the same source then the mean squared error of each user's estimates would be 1/Ni so multiplying by the reciprocal of this would turn their averages into the original sum of estimates produced by each user and the weighted estimate would end up just pooling the estimates.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With