I've set up an AWS SageMaker Ground Truth labeling project and am using a private team for the work. I want to track which member of my team gives each answer.
The only user-specific information is a workerId, as seen, for example, here.
The SageMaker documentation does not have any information about this ID, nor is it mentioned anywhere in the Cognito documentation, which I need to use to manage my worker team.
As far as I can tell, the workerId is an MTurk-related ID. A workerId shows up in the data structures here.
My question is: how can I map the workerId to the specific user in my Cognito group? Without the ability to do so, the project will not work.
This is actually possible programmatically, without relying on workers to report their identity. I ran into the same problem and found the following:
SageMaker Ground Truth automatically logs worker actions. Among the things it logs are the workerId, with which you're familiar, the cognito_user_pool_id, and the cognito_sub_id (take a look at the track worker performance docs).
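To illustrate, here is a minimal sketch of pulling those pairings out of raw log messages. It assumes each log event is a JSON object with worker_id, cognito_user_pool_id, and cognito_sub_id keys; those key names are my recollection of the log format, so check them against your own logs. The function name extract_worker_pairs is made up for this example.

```python
import json


def extract_worker_pairs(log_messages):
    """Map each worker id to the Cognito ids logged alongside it.

    Assumes every message is a JSON string. The key names used here are an
    assumption and should be verified against real worker activity logs.
    """
    pairs = {}
    for message in log_messages:
        try:
            event = json.loads(message)
        except (TypeError, ValueError):
            continue  # skip anything that is not valid JSON
        if not isinstance(event, dict):
            continue
        worker_id = event.get("worker_id")
        sub_id = event.get("cognito_sub_id")
        if worker_id and sub_id:
            pairs[worker_id] = {
                "cognito_user_pool_id": event.get("cognito_user_pool_id"),
                "cognito_sub_id": sub_id,
            }
    return pairs
```

The messages themselves could come from CloudWatch Logs, for example by passing the "message" field of each event returned by boto3's filter_log_events into this function.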
The workerId is Ground Truth specific and opaque, and there isn't a way to get Ground Truth to tell you which Cognito user a workerId maps to. However, a Cognito user is uniquely identified by its sub id.
You can leverage the logs' pairing of workerId and cognito_sub_id to generate mappings, by using the Cognito sub id to query the Cognito username.
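A rough sketch of that lookup, assuming a client that behaves like boto3.client("cognito-idp") and using its list_users call with a filter on the sub attribute. The helper name username_for_sub is hypothetical, and the client is passed in as an argument so the function can be tried out with a stub.

```python
def username_for_sub(cognito_client, user_pool_id, sub_id):
    """Return the Cognito username for a sub id, or None if no user matches.

    cognito_client is expected to behave like boto3.client("cognito-idp").
    """
    response = cognito_client.list_users(
        UserPoolId=user_pool_id,
        Filter=f'sub = "{sub_id}"',  # sub is a searchable standard attribute
        Limit=1,
    )
    users = response.get("Users", [])
    return users[0]["Username"] if users else None
```

With real credentials this would be called as username_for_sub(boto3.client("cognito-idp"), pool_id, sub_id), where pool_id and sub_id come from the logged cognito_user_pool_id and cognito_sub_id.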
You can use the mappings above to maintain a database of workerId - cognito sub id - username triplets, and use that database whenever you need to figure out which user a workerId belongs to. Note that this means the first time you see a workerId in a Ground Truth job, you won't have a way to find its mapping. If this is a problem, you can solve it by using a throwaway job as suggested previously. The logs of that job will include the mappings you need.
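One possible way to keep those triplets is a small SQLite table. This is only a sketch: the table name worker_map and the helper functions are invented for illustration, and any other store would work just as well.

```python
import sqlite3


def open_mapping_db(path=":memory:"):
    """Open (or create) the mapping database and make sure the table exists."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS worker_map ("
        "worker_id TEXT PRIMARY KEY, "
        "cognito_sub_id TEXT NOT NULL, "
        "username TEXT NOT NULL)"
    )
    return conn


def remember_worker(conn, worker_id, cognito_sub_id, username):
    """Insert or update one workerId - cognito sub id - username triplet."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR REPLACE INTO worker_map VALUES (?, ?, ?)",
            (worker_id, cognito_sub_id, username),
        )


def lookup_username(conn, worker_id):
    """Return the stored username for a workerId, or None if it is unknown."""
    row = conn.execute(
        "SELECT username FROM worker_map WHERE worker_id = ?", (worker_id,)
    ).fetchone()
    return row[0] if row else None
```

A None result from lookup_username corresponds to the case described above, where a workerId has not yet appeared in any logged job.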