I need a way to remove "unused" images from my filesystem, i.e. images that are never accessed from any point in my website (doesn't matter if I break external links. I might disable external hotlinking altogether). What's the best way of going about this? Regular users can add multiple attachments to topics/posts and content contributers can bulk upload large numbers of images which can be used in articles or image galleries.
The problem is that the images could be referenced in any of the following ways:
And to top it off, the path of the image could be an absolute or relative HTTP or PHP path and may or may not be built with string concatenation in PHP.
So obviously find/replace or regexing the database or filesystem is out of the question. But luckily for you and me, this system isn't fully implemented yet and I don't need anything that deals with an existing hoard of images. I just need to set up some efficient structure that will allow this in the future.
Some ideas I've thought of:
Ideas/suggestions? Is this just something you have to ignore and live with even if you're making a site with a ridiculous amount of images? Even if it's not worth it, how would something work just for proof-of-concept (I added the garbage-collection tag just because this might be going into that area conceptually).
I will admit that my experience with this was simpler than yours. I had no 'user generated content' so to speak, and my images were all in only templates or database with full path. But what I did is create a perl script that
<img> tagsA couple of notes: I realize that I could have made this script 'smoother' and done multiple things in multiple steps, but its use grew over time and I wanted clearly delineated processing steps so it couldn't ever run amok. I used the CSV to go back and clean up the information where the image didn't exist.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With