Can I retrieve basic information about all collections in a MongoDB
with F#
?
I have a MongoDB
with > 450 collections. I can access the db with
open MongoDB.Bson
open MongoDB.Driver
open MongoDB.Driver.Core
open MongoDB.FSharp
open System.Collections.Generic
let connectionString = "mystring"
let client = new MongoClient(connectionString)
let db = client.GetDatabase(name = "Production")
I had considered trying to just get all collections then loop through each collection name and get basic information about each collection with
let collections = db.ListCollections()
and
db.GetCollection([name of a collection])
but the db.GetCollection([name])
requires me to define a type to pull the information about each collection. This is challenging for me as I don't want to have to define a type for each collection, of which there are > 450, and frankly, I don't really know much about this DB. (Actually, no one in my org does; that's why I'm trying to put together a very basic data dictionary.)
Is defining the type for each collection really necessary? Can I use the MongoCollection methods available here without having to define a type for each collection?
EDIT: Ultimately, I'd like to be able to output collection name, the n documents in each collection, a list of the field names in each collection, and a list of each field type.
I chose to write my examples in C# as i'm more familiar with the C# driver and it is a listed tag on the question. You can run an aggregation against each collection to find all top level fields and their (mongodb) types for each document.
The aggregation is done in 3 steps. Lets assume the input is 10 documents which all have this form:
{
"_id": ObjectId("myId"),
"num": 1,
"str": "Hello, world!"
}
$project
Convert each document into an array of documents with values fieldName
and fieldType
. Outputs 10 documents with a single array field. The array field will have 3 elements.
$unwind
the arrays of field infos. Outputs 30 documents each with a single field corresponding to an element from the output of step 1.
$group
the fields by fieldName
and fieldType
to get distinct values. Outputs 3 documents. Since all fields with the same name always have the same type in this example, there is only one final output document for each field. If two different documents defined the same field, one as string and one as int there would be separate entries in this result set for both.
// Define our aggregation steps.
// Step 1, $project:
var project = new BsonDocument
{ {
"$project", new BsonDocument
{
{
"_id", 0
},
{
"fields", new BsonDocument
{ {
"$map", new BsonDocument
{
{ "input", new BsonDocument { { "$objectToArray", "$$ROOT" } } },
{ "in", new BsonDocument {
{ "fieldName", "$$this.k" },
{ "fieldType", new BsonDocument { { "$type", "$$this.v" } } }
} }
}
} }
}
}
} };
// Step 2, $unwind
var unwind = new BsonDocument
{ {
"$unwind", "$fields"
} };
// Step 3, $group
var group = new BsonDocument
{
{
"$group", new BsonDocument
{
{
"_id", new BsonDocument
{
{ "fieldName", "$fields.fieldName" },
{ "fieldType", "$fields.fieldType" }
}
}
}
}
};
// Connect to our database
var client = new MongoClient("myConnectionString");
var db = client.GetDatabase("myDatabase");
var collections = db.ListCollections().ToEnumerable();
/*
We will store the results in a dictionary of collections.
Since the same field can have multiple types associated with it the inner value corresponding to each field is `List<string>`.
The outer dictionary keys are collection names. The inner dictionary keys are field names.
The inner dictionary values are the types for the provided inner dictionary's key (field name).
List<string> fieldTypes = allCollectionFieldTypes[collectionName][fieldName]
*/
Dictionary<string, Dictionary<string, List<string>>> allCollectionFieldTypes = new Dictionary<string, Dictionary<string, List<string>>>();
foreach (var collInfo in collections)
{
var collName = collInfo["name"].AsString;
var coll = db.GetCollection<BsonDocument>(collName);
Console.WriteLine("Finding field information for " + collName);
var pipeline = PipelineDefinition<BsonDocument, BsonDocument>.Create(project, unwind, group);
var cursor = coll.Aggregate(pipeline);
var lst = cursor.ToList();
allCollectionFieldTypes.Add(collName, new Dictionary<string, List<string>>());
foreach (var item in lst)
{
var innerDict = allCollectionFieldTypes[collName];
var fieldName = item["_id"]["fieldName"].AsString;
var fieldType = item["_id"]["fieldType"].AsString;
if (!innerDict.ContainsKey(fieldName))
{
innerDict.Add(fieldName, new List<string>());
}
innerDict[fieldName].Add(fieldType);
}
}
Now you can iterate over your result set:
foreach(var collKvp in allCollectionFieldTypes)
{
foreach(var fieldKvp in collKvp.Value)
{
foreach(var fieldType in fieldKvp.Value)
{
Console.WriteLine($"Collection {collKvp.Key} has field name {fieldKvp.Key} with type {fieldType}");
}
}
}
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With