In C, using POSIX calls, how can I determine if a path is inside a target directory?
For example, a web server has its root directory in /srv, which is also the daemon's getcwd(). When parsing a request for /index.html, it returns the contents of /srv/index.html.
How can I filter out requests for paths outside of /srv, such as /../etc/passwd, /valid/../../etc/passwd, etc.?
Splitting the path at / and rejecting any array containing .. will break valid accesses such as /srv/valid/../index.html.
Is there a canonical way to do this with system calls? Or do I need to manually walk the path and count directory depth?
There's always realpath:
The realpath() function shall derive, from the pathname pointed to by *file_name*, an absolute pathname that resolves to the same directory entry, whose resolution does not involve '.' , '..' , or symbolic links.
Then compare what realpath gives you with your desired root directory and check that the root is a prefix of the result, ending on a path component boundary.
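A minimal sketch of that comparison, assuming a POSIX.1-2008 system where realpath() accepts NULL as its second argument and allocates the result. The function name path_is_under is hypothetical, not part of any library. Note that realpath() fails for paths that do not exist, which this sketch reports as -1.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns 1 if `path` resolves to `root` or to
 * something below it, 0 if it resolves elsewhere, and -1 if either
 * path cannot be resolved (for example because it does not exist). */
int path_is_under(const char *root, const char *path)
{
    assert(root != NULL && path != NULL);

    char *real_root = realpath(root, NULL);   /* malloc'd by realpath */
    if (real_root == NULL)
        return -1;

    char *real_path = realpath(path, NULL);
    if (real_path == NULL) {
        free(real_root);
        return -1;
    }

    size_t n = strlen(real_root);
    int inside;
    if (strcmp(real_root, "/") == 0) {
        /* Every absolute path is under the file system root. */
        inside = 1;
    } else {
        /* The root must be a prefix that ends on a component boundary,
         * so that "/srv2/x" is not treated as being under "/srv". */
        inside = strncmp(real_path, real_root, n) == 0 &&
                 (real_path[n] == '/' || real_path[n] == '\0');
    }

    free(real_root);
    free(real_path);
    return inside;
}
```

For the web server in the question, the call would look something like path_is_under("/srv", "/srv/valid/../../etc/passwd"), which should not return 1 unless symbolic links make that path resolve back under /srv.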
You could also clean up the filename by hand by expanding the double-dots before you prepend the "/srv". Split the incoming path on slashes and walk through it piece by piece. If you get a ".", remove it and move on; if you get a "..", remove it and the previous component (taking care not to go past the first entry in your list); if you get anything else, just move on to the next component. Then paste what's left back together with slashes between the components and prepend your "/srv/". So if someone gives you "/valid/../../etc/passwd", you'll end up with "/srv/etc/passwd", and "/where/is/../pancakes/house" will end up as "/srv/where/pancakes/house".
That way you can't get outside "/srv" (except through symbolic links, of course) and an incoming "/../.." will be the same as "/" (just like in a normal file system). But you'd still want to use realpath if you're worried about symbolic links under "/srv".
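The component-by-component cleanup described above can be sketched as follows. This is only an illustration: the name map_request is hypothetical, root is assumed to be given without a trailing slash, and the work is purely lexical, so symbolic links are not examined at all.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: lexically normalises `request` and prepends
 * `root` (assumed to have no trailing slash). Returns a malloc'd
 * string that the caller must free, or NULL if allocation fails. */
char *map_request(const char *root, const char *request)
{
    assert(root != NULL && request != NULL);

    size_t root_len = strlen(root);
    /* root + one '/' per component + the components + NUL always fits. */
    char *out = malloc(root_len + strlen(request) + 2);
    if (out == NULL)
        return NULL;

    memcpy(out, root, root_len);
    size_t len = root_len;            /* the result never shrinks below this */

    const char *p = request;
    while (*p != '\0') {
        while (*p == '/')             /* skip runs of slashes */
            p++;
        const char *start = p;
        while (*p != '\0' && *p != '/')
            p++;
        size_t n = (size_t)(p - start);

        if (n == 0 || (n == 1 && start[0] == '.'))
            continue;                 /* empty component or "." */

        if (n == 2 && start[0] == '.' && start[1] == '.') {
            /* Drop the previous component, but never climb above root. */
            while (len > root_len && out[len - 1] != '/')
                len--;
            if (len > root_len)
                len--;                /* drop the separating slash too */
            continue;
        }

        out[len++] = '/';
        memcpy(out + len, start, n);
        len += n;
    }

    if (len == root_len)
        out[len++] = '/';             /* the request collapsed to "/" */
    out[len] = '\0';
    return out;
}
```

With root "/srv", the request "/valid/../../etc/passwd" maps to "/srv/etc/passwd" and "/../.." maps to "/srv/", matching the examples above.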
Working with the path name component by component would also allow you to break the connection between the layout you present to the outside world and the actual file system layout; there's no need for "/this/that/other/thing" to map to an actual "/srv/this/that/other/thing" file anywhere. The path could just be a key in some sort of database or some sort of namespace path to a function call.
To determine if a file F is within a directory D, first stat D to determine its device number and inode number (members st_dev and st_ino of struct stat).
Then stat F to determine if it is a directory. If not, call dirname to determine the name of the directory containing it. Set G to the name of this directory. If F was already a directory, set G=F.
Now, F is within D if and only if G is within D. Next we have a loop.
    while (1) {
        int same = samefile(d_statinfo.st_dev, d_statinfo.st_ino, G);
        if (same < 0) {
            return -1; // stat failed, so we cannot tell
        } else if (same) {
            return 1; // F was within D
        } else if (0 == strcmp("/", G)) {
            return 0; // F was not within D.
        }
        G = dirname(G); // note: dirname may modify the buffer G points to
    }
The samefile function is simple:
    // Returns 1 if path refers to the given device/inode pair, 0 if not, -1 on error.
    int samefile(dev_t ddev, ino_t dino, const char *path) {
        struct stat st;
        if (0 == stat(path, &st)) {
            return ddev == st.st_dev && dino == st.st_ino;
        } else {
            return -1; // error value; the caller checks for it
        }
    }
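Putting the loop and the comparison together, a self-contained sketch might look like the following. The name is_within is hypothetical. It assumes F is an absolute path, and because dirname() works purely on the text of the path, it also assumes F contains no ".." components (for example because it was normalised first); otherwise "/srv/../etc" would be walked up to "/srv" and wrongly accepted. The walk starts at F itself rather than at its containing directory, which gives the same answer whenever D is a directory.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <libgen.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>

/* Hypothetical helper: returns 1 if absolute path `f` is `d` or lies
 * below it (judged by device and inode numbers), 0 if not, and -1 on
 * error (relative path, stat failure, or out of memory). */
int is_within(const char *f, const char *d)
{
    assert(f != NULL && d != NULL);

    struct stat d_st, st;
    if (f[0] != '/' || stat(d, &d_st) != 0)
        return -1;

    /* dirname() may modify its argument, so work on a private copy. */
    char *g = strdup(f);
    if (g == NULL)
        return -1;

    int result = -1;
    for (;;) {
        if (stat(g, &st) != 0)
            break;                                  /* cannot tell */
        if (st.st_dev == d_st.st_dev && st.st_ino == d_st.st_ino) {
            result = 1;                             /* reached D */
            break;
        }
        if (strcmp(g, "/") == 0) {
            result = 0;                             /* hit the root first */
            break;
        }
        /* dirname() may return a pointer into g or into static storage;
         * copy the (never longer) result back into our own buffer. */
        char *parent = dirname(g);
        memmove(g, parent, strlen(parent) + 1);
    }

    free(g);
    return result;
}
```

A caller would treat anything other than 1 as a refusal, for example rejecting the request unless is_within(resolved_path, "/srv") == 1, where resolved_path is whatever absolute path the server built for the request.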
This will work on POSIX filesystems. But many filesystems are not POSIX. Problems to look out for include:
If /a is a bind mount of /b, then /a/1 correctly appears to be inside /a, but with the implementation above, /b/1 also appears to be inside /a. I think that's probably the correct answer. However, if this is not the result you prefer, it is easily fixed by changing the return 1 case to call strcmp() to compare the path names too. For this to work you will need to start by calling realpath on both F and D. The realpath call can be quite expensive (since it may need to hit the disk a number of times).
//foo/bar. POSIX allows a path name beginning with exactly two slashes to be interpreted in an implementation-defined manner, so I think //foo/bar and //baz/ugh could refer to the same file. The device/inode check should still do the right thing for you, but you may find it does not (i.e. you may find that //foo/bar and //baz/ugh refer to the same file but have different device/inode numbers).
This answer assumes that we start with an absolute path for both F and D. If this is not guaranteed, you may need to do some conversion using realpath() and getcwd(). This will be a problem if the name of the current directory is longer than PATH_MAX (which can certainly happen).