
Extracting a list of substrings from MongoDB using a Regular Expression

I need to extract a part of a string that matches a regex and return it.

I have a set of documents such as:

{"_id" :12121, "fileName" : "apple.doc"}, 
{"_id" :12125, "fileName" : "rap.txt"},
{"_id" :12126, "fileName" : "tap.pdf"}, 
{"_id" :12126, "fileName" : "cricket.txt"}, 

I need to extract all file extensions and return {".doc", ".txt", ".pdf"}.

I am trying to use the $regex operator to find the substrings and aggregate on the results, but I am unable to extract the required part and pass it down the pipeline.

I have tried something like this without success:

aggregate([
  { $match: { "name": { $regex: '/\.[0-9a-z]+$/i', "$options": "i" } } },
  { $group: { _id: null, tot: { $push: "$name" } } }
])
Macky asked Oct 30 '25 22:10



1 Answer

It is close to impossible to do this in the aggregation pipeline. You would want to project your matches and include only the part after the period, but there is not (yet) an operator that locates the position of the period. You need that position because $substr (https://docs.mongodb.com/manual/reference/operator/aggregation/substr/) requires a start position. In addition, $regex is only for matching; you cannot use it in a projection to replace or extract part of a string.

I think for now it is easier to do it in code. There you could use a regex replace, or any other solution provided by your language.
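As a rough illustration of the "do it in code" route, here is a minimal JavaScript sketch. It assumes the documents have already been fetched into an array (for example from a find() on the collection); the docs array below just mirrors the fileName values from the question, with _id left out. The helper name extractExtensions is made up for this example and is not part of any MongoDB API.

```javascript
// Sample data mirroring the fileName values from the question (_id omitted).
// In a real application these would come from a query on the collection.
const docs = [
  { fileName: "apple.doc" },
  { fileName: "rap.txt" },
  { fileName: "tap.pdf" },
  { fileName: "cricket.txt" },
];

// Hypothetical helper: returns the distinct file extensions, lowercased,
// in the order they are first seen.
function extractExtensions(documents) {
  // Same pattern as in the question, written as a regex literal:
  // a period followed by letters or digits at the end of the string.
  const pattern = /\.[0-9a-z]+$/i;
  const seen = new Set();
  for (const doc of documents) {
    if (typeof doc.fileName !== "string") continue; // skip missing or odd values
    const match = pattern.exec(doc.fileName);
    if (match) seen.add(match[0].toLowerCase());
  }
  return [...seen];
}

const extensions = extractExtensions(docs);
console.log(extensions); // the array contains ".doc", ".txt", ".pdf"
```

The Set takes care of the de-duplication that $group would otherwise have done, so "rap.txt" and "cricket.txt" contribute a single ".txt" entry. File names without an extension are simply skipped.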

HoefMeistert answered Nov 02 '25 11:11
