Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Weighted mean deadlock: Value depending on value

Some background to help explain this difficult question first:

I have a database that estimates the reliability of users by comparing the values that they have submitted to a global average. Values range between 0 and 1. So, where:

  • Reliability of this particular user = r
  • Average of this particular user's submitted values = a
  • Global, "agreed upon" average = g

Reliability:

r = 1 - ABS(g - a)

This is how the reliability of each user is calculated. Now, the Global, "agreed upon" average g, is calculated using a weighted mean, where the weighting is the r and the value is a. If there are 3 users in total:

  g = ((r1 * a1) + (r2 * a2) + (r3 * a3)) / (r1 + r2 + r3)

The problem is, that once users have a high reliability, they have a complete monopoly, and no new values can change this. Taking an example:

g was initially 0.5
user1 r was initially 0.5
user2 r was initially 0.5
user3 r was initially 0.5

Now, they will submit values one by one, and watch what happens:

user1 a is submitted, 1.0
user1 reliability goes slightly down because it differs from g (0.5)
user2 a is submitted, 1.0
user1 and user2 reliability go up to 100%, g is now 1.0.
user3 a is submitted, 0.0
user3 reliability goes down to 0%. g is still 1.0.

Since user3 has a very low reliability, the weighting does not have any effect at all on g. User3's reliability goes down because the values submitted differ completely from the global average. What can be done to make user3's submissions have some effect on the final value? Maybe I need to add some constant so that reliability never ends up completely zero (but close)?

Now, for the SQL code. I've added an SQL fiddle that demonstrates the problem: http://sqlfiddle.com/#!3/d3fd1/21 I've abstracted the code to keep it as short, but it's still quite long.

Table creation, stored procedures and trigger:

-- Stores user info
CREATE TABLE dbo.Users(
    [UserID] [int] NOT NULL,
    [Reliability] [float] NOT NULL
  )

-- Contains global averages from all users who submitted data
CREATE TABLE dbo.GlobalSubmission(
    GlobalSubmissionID  [int] NOT NULL,
    Name [varchar](50) NULL,
    GlobalAverage [float] NOT NULL,
)

CREATE TABLE dbo.UserSubmission(
    SubValue float NOT NULL,
    GlobalSubmissionID int NOT NULL,
    UserID int NOT NULL,
)


GO

--Calculate the "ideal value", used for GlobalSubmission.
CREATE FUNCTION dbo.IdealValueCalc(@globalSubmissionID INT)
RETURNS int
AS
BEGIN

DECLARE @tmpReliability TABLE (SubValue float, Reliability float)


INSERT INTO @tmpReliability
    SELECT AVG(us.SubValue) as SubValue, usr.Reliability Reliability FROM UserSubmission us
    JOIN Users usr 
    ON us.UserID = usr.UserID
    WHERE GlobalSubmissionID = @GlobalSubmissionID
    GROUP BY us.UserID, usr.Reliability

--Perform weighted mean calculations.
Return (SELECT SUM(SubValue * Reliability) / SUM(Reliability) FROM @tmpReliability)
END
go


--Calculate the reliability of one user.
CREATE FUNCTION dbo.GetReliabilityForUser
(@userID int)
Returns Float
AS BEGIN
Return (SELECT 1 - AVG(ABS(db.userAvg - db.GlobalAverage))
    FROM (
      SELECT pmd.UserID,
            gs.GlobalAverage, 
            AVG(pmd.SubValue) as userAvg
      FROM UserSubmission pmd
      -- Joins average value for each user with "ideal" value from GlobalSubmission
      JOIN GlobalSubmission gs 
        ON gs.GlobalSubmissionID = pmd.GlobalSubmissionID
        WHERE pmd.UserID = 1
      GROUP BY pmd.UserID, gs.GlobalSubmissionID, gs.GlobalAverage
     ) db
     GROUP BY db.UserID)
End
go



CREATE TRIGGER trg_SubmissionComputation
ON UserSubmission 
AFTER INSERT, UPDATE
AS BEGIN
    --Calculate this uer's reliability
    DECLARE @userID int = (SELECT TOP(1) UserID FROM inserted)
    DECLARE @userReliability float = dbo.GetReliabilityForUser(@userID)

    UPDATE Users
    SET Reliability=@userReliability
    WHERE UserID = @userID

    --Recalculate globalSubmission values:
    DECLARE @globalSubmissionID int = (SELECT TOP(1) GlobalSubmissionID FROM inserted)
    DECLARE @globalAverage float = dbo.IdealValueCalc(@globalSubmissionID)
        --The global average for this set of submissions has been recalculated. Now inserting:

    UPDATE GlobalSubmission
    SET GlobalAverage = @globalAverage 
    WHERE GlobalSubmissionID = @globalSubmissionID
END
GO

Testing it:

--Creating 3 new users
INSERT INTO Users
(UserID, Reliability)
values 
(1, 0.5),
(2, 0.5),
(3, 0.5)
GO

--Creating a new GlobalSubmission
INSERT INTO GlobalSubmission
(GlobalSubmissionID, NAME, GlobalAverage)
values (1, 'BOILER2B' , 0.5)
GO

--First, we will submit values of 1 for two users:
INSERT INTO UserSubmission values (1.0, 1, 1); -- Value: 1.0, User 1, Submission 1
GO
INSERT INTO UserSubmission values (1.0, 1, 2); -- Value: 1.0, User 2, Submission 1
GO
INSERT INTO UserSubmission values (1.0, 1, 1); -- Value: 1.0, User 1, Submission 1
GO
INSERT INTO UserSubmission values (1.0, 1, 2); -- Value: 1.0, User 2, Submission 1
GO


--Now, we will submit values of 0 for the third user:
INSERT INTO UserSubmission values (0.0, 1, 3); -- Value: 0.0, User 3, Submission 1
GO
INSERT INTO UserSubmission values (0.0, 1, 3); -- Value: 0.0, User 3, Submission 1
GO

SELECT * FROM Users -- This results in 0% reliability for the last user.

--If we create new users and add them, the reliability won't budge:
INSERT INTO Users
(UserID, Reliability)
values 
(4, 0.5),
(5, 0.5),
(6, 0.5),
(7, 0.5),
(8, 0.5)
GO


INSERT INTO UserSubmission values (0, 1, 4); -- Value: 0, User 4, Submission 1
GO
INSERT INTO UserSubmission values (0, 1, 5); -- Value: 0, User 5, Submission 1
GO
INSERT INTO UserSubmission values (0, 1, 6); -- Value: 0, User 6, Submission 1
GO
INSERT INTO UserSubmission values (0, 1, 7); -- Value: 0, User 7, Submission 1
GO
INSERT INTO UserSubmission values (0, 1, 8); -- Value: 0, User 8, Submission 1
GO
INSERT INTO UserSubmission values (0, 1, 4); -- Value: 0, User 4, Submission 1
GO
INSERT INTO UserSubmission values (0, 1, 5); -- Value: 0, User 5, Submission 1
GO
INSERT INTO UserSubmission values (0, 1, 6); -- Value: 0, User 6, Submission 1
GO
INSERT INTO UserSubmission values (0, 1, 7); -- Value: 0, User 7, Submission 1
GO
INSERT INTO UserSubmission values (0, 1, 8); -- Value: 0, User 8, Submission 1
GO


SELECT * FROM Users -- Even though we've added loads of new users suggesting 0 as value, the final value
-- is remaining 1.0, because when a new value (0) is submitted, it varies too much from the global average
--(1), causing the reliability of that user to go down, and that user ends up making no influence on the
-- global average!
like image 372
David Avatar asked Apr 11 '13 18:04

David


1 Answers

Here is an alternative estimate that is still a bit ad hoc but doesn't ever produce weight 0.

1) For each user produce an exponentially decaying estimate of squared error. Start with a tunable arbitrary estimate K. Then every time the user produces a value a and the group mean is g produce a squared error E = (a - g) * (a - g) and change the estimate of squared error from before to after = before * x + E * (1 - x) where x is another tunable constant between 0 and 1 which tunes how fast old estimates decay. This estimate can never quite get down to zero, but because of the next step it might be as well to stop it decreasing below some tunable value.

2) To get a new global estimate use a weighted mean as before, but make the weights the reciprocal of the current estimate of squared error for that user.

If all the users were unbiased then the exponentially decaying estimates might end up as decent estimates of average squared error for each user, and then the weights would be the linear combination of estimates which minimises the expected squared error of the global estimate. Check: if the different users i submitted averages of Ni estimates from the same source then the mean squared error of each user's estimates would be 1/Ni so multiplying by the reciprocal of this would turn their averages into the original sum of estimates produced by each user and the weighted estimate would end up just pooling the estimates.

like image 136
mcdowella Avatar answered Oct 19 '22 20:10

mcdowella