Getting general information about MongoDB collections with FSharp

Tags:

c#

mongodb

f#

Can I retrieve basic information about all collections in a MongoDB with F#?

I have a MongoDB database with more than 450 collections. I can access the database with:

open MongoDB.Bson
open MongoDB.Driver
open MongoDB.Driver.Core 
open MongoDB.FSharp
open System.Collections.Generic

let connectionString = "mystring"
let client = new MongoClient(connectionString)
let db = client.GetDatabase(name = "Production")

I considered listing all the collections, then looping through the collection names and getting basic information about each one with

let collections = db.ListCollections()

and

db.GetCollection([name of a collection])

but db.GetCollection([name]) requires me to define a type in order to pull information about each collection. That is a problem because I don't want to define a type for each of the more than 450 collections and, frankly, I don't know much about this DB. (Actually, no one in my org does; that's why I'm trying to put together a very basic data dictionary.)

Is defining the type for each collection really necessary? Can I use the MongoCollection methods available here without having to define a type for each collection?


EDIT: Ultimately, I'd like to be able to output the collection name, the number of documents in each collection, a list of the field names in each collection, and a list of each field's type.

Steven asked Aug 23 '18 17:08


1 Answer

I chose to write my examples in C# because I'm more familiar with the C# driver and c# is a listed tag on the question. You can run an aggregation against each collection to find all top-level fields and their (MongoDB) types across its documents.

The aggregation is done in 3 steps. Let's assume the input is 10 documents which all have this form:

{
  "_id": ObjectId("myId"),
  "num": 1,
  "str": "Hello, world!"
}

  1. $project: convert each document into an array of sub-documents with the fields fieldName and fieldType. This outputs 10 documents, each with a single array field of 3 elements.

  2. $unwind: unwind the arrays of field info. This outputs 30 documents, each with a single field holding one element from the arrays produced in step 1.

  3. $group: group by fieldName and fieldType to get the distinct pairs. This outputs 3 documents. Since every field with the same name has the same type in this example, there is exactly one output document per field. If two documents defined the same field with different types, say one as a string and one as an int, the result set would contain a separate entry for each type.
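
To make the 10 → 30 → 3 document counts concrete, here is a minimal in-memory sketch of the three steps using LINQ instead of a MongoDB server. The class and method names (PipelineShapeDemo, StageCounts) are hypothetical, and the type labels are hard-coded stand-ins rather than values returned by $type; it only mirrors the shape of each stage's output.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class PipelineShapeDemo
{
    // Returns the number of outputs after each simulated stage.
    public static (int projected, int unwound, int grouped) StageCounts()
    {
        // Ten stand-ins for the sample document: field name -> illustrative type label.
        var docs = Enumerable.Range(1, 10)
            .Select(_ => new Dictionary<string, string>
            {
                { "_id", "objectId" },
                { "num", "int" },
                { "str", "string" }
            })
            .ToList();

        // Step 1 ($project): one list of (fieldName, fieldType) pairs per document.
        var projected = docs
            .Select(d => d.Select(kv => (fieldName: kv.Key, fieldType: kv.Value)).ToList())
            .ToList();

        // Step 2 ($unwind): one output element per list entry.
        var unwound = projected.SelectMany(fields => fields).ToList();

        // Step 3 ($group): keep only the distinct (fieldName, fieldType) pairs.
        var grouped = unwound.Distinct().ToList();

        return (projected.Count, unwound.Count, grouped.Count);
    }

    static void Main()
    {
        var counts = StageCounts();
        Console.WriteLine($"{counts.projected} {counts.unwound} {counts.grouped}"); // prints "10 30 3"
    }
}
```

If one of the ten stand-in documents labelled "num" as "string" instead, the last count would become 4, which matches the note in step 3 about a field that appears with two types.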


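// Namespaces required by the snippets below.
using System;
using System.Collections.Generic;
using MongoDB.Bson;
using MongoDB.Driver;

// Define our aggregation steps.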
// Step 1, $project:
var project = new BsonDocument
{ {
    "$project", new BsonDocument
    {
        {
            "_id", 0
        },
        {
            "fields", new BsonDocument
            { {
                "$map", new BsonDocument
                {
                    { "input", new BsonDocument { { "$objectToArray", "$$ROOT" } } },
                    { "in", new BsonDocument {
                        { "fieldName", "$$this.k" },
                        { "fieldType", new BsonDocument { { "$type", "$$this.v" } } }
                    } }
                }
            } }
        }
    }
} };

// Step 2, $unwind
var unwind = new BsonDocument
{ {
    "$unwind", "$fields"
} };

// Step 3, $group
var group = new BsonDocument
{
    {
        "$group", new BsonDocument
        {
            {
                "_id", new BsonDocument
                {
                    { "fieldName", "$fields.fieldName" },
                    { "fieldType", "$fields.fieldType" }
                }
            }
        }
    }
};

// Connect to our database
var client = new MongoClient("myConnectionString");
var db = client.GetDatabase("myDatabase");

var collections = db.ListCollections().ToEnumerable();

/*
Store the results in a dictionary keyed by collection name.
Each inner dictionary is keyed by field name, and its values are the types seen for that field.
Because the same field can have more than one type across documents, each inner value is a `List<string>`:
List<string> fieldTypes = allCollectionFieldTypes[collectionName][fieldName]
*/
var allCollectionFieldTypes = new Dictionary<string, Dictionary<string, List<string>>>();

foreach (var collInfo in collections)
{
    var collName = collInfo["name"].AsString;
    var coll = db.GetCollection<BsonDocument>(collName);

    Console.WriteLine("Finding field information for " + collName);                

    var pipeline = PipelineDefinition<BsonDocument, BsonDocument>.Create(project, unwind, group);
    var cursor = coll.Aggregate(pipeline);
    var lst = cursor.ToList();

    // Collect the (fieldName, fieldType) pairs found in this collection.
    var innerDict = new Dictionary<string, List<string>>();
    allCollectionFieldTypes.Add(collName, innerDict);
    foreach (var item in lst)
    {
        var fieldName = item["_id"]["fieldName"].AsString;
        var fieldType = item["_id"]["fieldType"].AsString;

        if (!innerDict.ContainsKey(fieldName))
        {
            innerDict.Add(fieldName, new List<string>());
        }

        innerDict[fieldName].Add(fieldType);
    }
}

Now you can iterate over your result set:

foreach(var collKvp in allCollectionFieldTypes)
{
  foreach(var fieldKvp in collKvp.Value)
  {
    foreach(var fieldType in fieldKvp.Value)
    {
      Console.WriteLine($"Collection {collKvp.Key} has field name {fieldKvp.Key} with type {fieldType}");
    }
  }
}
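
The question's edit also asks for the number of documents in each collection, which the aggregation above does not report. A minimal sketch of one way to get it, assuming a driver version that provides CountDocuments (2.7 or later, as far as I know; older versions expose Count instead). The CollectionCounts class and CountAll method are hypothetical names used only for illustration.

```csharp
using System;
using System.Collections.Generic;
using MongoDB.Bson;
using MongoDB.Driver;

static class CollectionCounts
{
    // Hypothetical helper: maps each collection name in the database to its document count.
    public static Dictionary<string, long> CountAll(IMongoDatabase db)
    {
        var counts = new Dictionary<string, long>();
        foreach (var collInfo in db.ListCollections().ToEnumerable())
        {
            var collName = collInfo["name"].AsString;
            var coll = db.GetCollection<BsonDocument>(collName);

            // An empty filter matches every document in the collection.
            counts[collName] = coll.CountDocuments(Builders<BsonDocument>.Filter.Empty);
        }
        return counts;
    }
}
```

Calling CollectionCounts.CountAll(db) with the db object from above returns a dictionary you can print alongside the field information. On very large collections an exact count can be slow; EstimatedDocumentCount() is a faster alternative if an approximate figure is enough.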
Neil answered Sep 20 '22 12:09