Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Convert array of paths into data structure

I have an array of paths like so:

/doc/data/main.js
/doc/data/xl.js
/doc/data/dandu/sdasa.js
/mnt/data/la.js

I'm trying to build the following structure:

{
  "directories": {
    "/doc/data": {
      "directories": {
        "dandu": {
          "files": {
            "sdasa.js": 1
          }
        }
      },
      "files": {
        "main.js": 1,
        "xl.js": 1
      }
    },
    "/mnt/data": {
      "directories": {},
      "files": {
        "la.js": 1
      }
    }
  },
  "files": {}
}

Please ignore the value of the files in that examples. I will assign more complex data for that in the future. Currently the values are 1.

From previous topic I found out that I could use the following function in order to get something similar:

var parsePathArray = function() {
    var parsed = {};
    for(var i = 0; i < paths.length; i++) {
        var position = parsed;
        var split = paths[i].split('/');
        for(var j = 0; j < split.length; j++) {
            if(split[j] !== "") {
                if(typeof position[split[j]] === 'undefined')
                    position[split[j]] = {};
                position = position[split[j]];
            }
        }
    }
    return parsed;
}

The main problem with that solution is that it splits each directory. But I don't want to split each directory, rather get the directories that contain at least one file. For example, /doc does not have files in my example (only directory - /data) so we continue with it. I tried to change the function a bit but it didn't work:

var str = '';
for (var j = 0; j < split.length; j++) {
    if (j < split.length - 1 && typeof this.files[str] === 'undefined') {
        str += '/' + split[j];
        continue;
    }
    if (str !== '') {
        if (typeof this.files[str] === 'undefined')
            this.files[str] = {};
        this.files = this.files[str];
    }
}

What would be the best way to convert those strings into that data structure?

like image 788
abuka123 Avatar asked Mar 10 '26 01:03

abuka123


1 Answers

Here is the solution I came up with. It works by building each path up one piece at a time and comparing it against the existing data structure. It should also handle files by themselves, as your original post seemed to imply that was necessary. I decided to split it into two functions in the end as it might make it easier to explain.

The Code:

const paths = [
    '/doc/data/main.js',
    'doc/data/xl.js',
    '/etc/further/owy.js',
    '/etc/further/abc.js',
    'etc/mma.js',
    '/mnt/data/it.js',
    '/mnt/data/path/is/long/la.js',
    'mnt/data/path/is/la.js',
    '/doc/data/dandu/sdasa.js',
    '/etc/i/j/k/l/thing.js',
    '/etc/i/j/areallylongname.js',
    'thing.js'
];

function buildStructure(paths) {
    let structure = {
        directories: {},
        files: {}
    };

    const compare = (a, b) => {
        return a.split('/').length - b.split('/').length;
    };

    [...paths]
    .map(path => path = path.charAt(0) === '/' ? path : `/${path}`)
    .sort((a, b) => compare(a, b)).forEach(path => {
        const nodes = path.split('/').slice(1);
        const file = nodes.pop();
        
        let pointer = findDirectory(nodes[0] ? structure.directories : structure, '', [...nodes]);

        pointer.files = pointer.files || {};
        pointer.files = {
            ...pointer.files,
            [file]: 1
        };
    });

    return structure;
};

function findDirectory(pointer, subPath, nodes) {
    if (nodes.length === 0) {
        if (subPath) {
            pointer[subPath] = {};
            pointer = pointer[subPath];
        };
        return pointer;
    };

    let newPath = `${subPath}/${nodes[0]}`;
    nodes.shift();

    if (pointer[newPath]) {
        pointer = pointer[newPath];

        if (nodes.length >= 1) {
            pointer.directories = pointer.directories || {};
            pointer = pointer.directories;
        };

        newPath = '';
    };

    return findDirectory(pointer, newPath, nodes);
};

const structure = buildStructure(paths);
console.log(structure);
.as-console-wrapper { min-height: 100%!important; top: 0; }

The Explanation:

This ended up being a lot tricker (and much more interesting) than I imagined when I started working on it. Once you start concatenating directories the order of operation really matters.

Starting in buildStructure, we map over the array of paths to catch any entries without a leading slash. Then, sort them according to the number of directories they reference. This is so we can be sure we are working from the top of the structure towards the bottom.

Separate each path into an array of nodes, and pop off the file string. You're left with something like this:

const nodes = ['doc', 'data'];
const file = 'main.js';

Now we have to feed these nodes through findDirectory to find/create the file's location. The variable pointer is there to keep track of our position in the structure object, and any changes we make to the pointer will be replicated in the structure since they share reference equality.

The findDirectory function recursively processes each of the nodes to gradually build the path back up to its full length. Whenever we create a path that already exists in structures directories, we move inside it and start building the path up again to try and find the next one. If we can't find it, then we've got a brand new directory. The aim is to always end up inside the correct directory when we exit the function - creating it along the way if needs be.

To simplify things, say we have only two paths to log:

const paths = [
  'doc/data/main.js',
  'doc/data/dandu/sdasa.js'
];

For the first path, findDirectory is going to make three passes. These are the parameters that will be given to it on each pass:

pointer = structure.directories > same > same

subPath = '' > '/doc' > '/doc/data'

nodes = ['doc', 'data'] > ['data'] > []

We never got a match, so as the function exits it creates that directory on structure.directories. Now, the second path is going to make four passes:

pointer = 
  structure.directories > 
  same > 
  structure.directories./doc/data.directories > 
  same

subPath = '' > '/doc' > '' > '/dandu' 

nodes = ['doc', 'data', 'dandu'] > ['data', 'dandu'] > ['dandu'] > []

As you can see, on the second pass we made the string /doc/data which does exist on structure.directories. So we move into it, and since there's more nodes to process we create a new directories object there and enter that too. If there were no more nodes to process, we'd know we'd already arrived at the correct level and this would not be necessary. From here it's a case of simply building the path back up again and repeating the process.

Once we're in the right directory, we can place the file directly on the pointer and it will be registered on the structure. Once we move to the next path, the pointer will once again be pointing at structure.directories.

In cases where the are no nodes to process (file name only) - pass findDirectory the whole structures object instead and the file will go into the top level of the object.


Hopefully this explains things well enough and will be useful to you. I enjoyed working on this and would be pleased for any suggestions on how to improve it.

like image 162
lawrence-witt Avatar answered Mar 11 '26 14:03

lawrence-witt