I'm hoping that someone might be able to answer whether what I'm trying to accomplish below can be done with the MongoDB Aggregation Framework.
I have a user data structure that resembles the following with close to 1 million documents.
{
"firstName" : "John",
"lastName" : "Doe",
"state" : "NJ",
"email" : "[email protected]"
"source" : [
{
"type" : "SOURCE-A",
"data" : {
"info" : "abc",
"info2" : "xyz"
}
},
{
"type" : "SOURCE-B",
"data" : {
"info3" : "abc"
}
}
]
}
For the purposes of feeding data to another system, I need to generate a flat file structure with limited information from the previous dataset. The columns need to represent:
firstname, lastname, email, is_source-a, is_source-b
The part that I'm having difficulty with is the conditional code that attempts to populate "is_source-a" and "is_source-b". I have tried to use the following aggregation query, but can't figure out how to get it working since the $EQ operator used along with $COND doesn't seem to evaluate data inside of an array (always false).
db.collection.aggregate([
{
$project : {
_id : 0,
firstName : 1,
lastName: 1,
"is_source-a" : {
$cond : [
{ $eq: [ "$source.type", "source-a" ] },
1,
0
]
},
"is_source-b" : {
$cond : [
{ $eq: [ "$source.type", "source-b" ] },
1,
0
]
}
}
}
]);
I could $UNWIND the array first, but then I wind up with a multiple records for each user document and don't understand how to consolidate them back.
Is there something that I'm missing with how to use $EQ (or some other operator) along with $COND when dealing with arrays of objects?
You're definitely on the right track, and using $unwind
can get you there if you follow it up with a $group
to put things back together:
db.collection.aggregate([
{$unwind: '$source'},
{$project: {
_id: 1,
firstName: 1,
lastName: 1,
email: 1,
'is_source-a': {$eq: ['$source.type', 'SOURCE-A']},
'is_source-b': {$eq: ['$source.type', 'SOURCE-B']}
}},
// group the docs that were duplicated in the $unwind back together by _id,
// taking the values for most fields from the $first occurrence of the _id,
// but the $max of the is_source fields so that if its true in any of the
// docs for that _id it will be true in the output for that _id.
{$group: {
_id: '$_id',
firstName: {$first: '$firstName'},
lastName: {$first: '$lastName'},
email: {$first: '$email'},
'is_source-a': {$max: '$is_source-a'},
'is_source-b': {$max: '$is_source-b'}
}},
// project again to remove _id
{$project: {
_id: 0,
firstName: 1,
lastName: 1,
email: 1,
'is_source-a': '$is_source-a',
'is_source-b': '$is_source-b'
}}
])
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With