Is there a way to safely sanitize path input without using realpath()?
The aim is to prevent malicious input such as ../../../../../path/to/file when opening a file like this:
$handle = fopen($path . '/' . $filename, 'r');
Not sure why you wouldn't want to use realpath(), but path name sanitisation is a very simple concept, along the following lines:
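1. If the path is relative (it does not start with "/"), prefix it with the current working directory and "/", making it an absolute path.
2. Replace every sequence of more than one "/" with a single one (a).
3. Replace every "/./" with "/".
4. Remove "/." if at the end.
5. Replace every "/anything/../" with "/".
6. Remove "/anything/.." if at the end.
The text "anything" in this case means the longest sequence of characters that aren't "/".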
Note that those rules should be applied continuously until such time as none of them result in a change. In other words, do all six (one pass). If the string changed, then go back and do all six again (another pass). Keep doing that until the string is the same as before the pass just executed.
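The repeated passes can be sketched as a loop that runs until the string stops changing. This is only a minimal sketch under stated assumptions: normalize_path is a hypothetical name, the working directory is passed in as $cwd rather than read from getcwd(), and $cwd is assumed to be an absolute path. Two details go beyond the literal rules and are labeled in the comments: "anything" is not allowed to be "." or ".." itself, and an empty result is mapped back to "/".

```php
<?php
// Hypothetical helper: apply the six rewrite rules until a pass changes nothing.
function normalize_path(string $path, string $cwd): string
{
    // Rule 1: make a relative path absolute ($cwd is assumed to be absolute).
    if ($path === '' || $path[0] !== '/') {
        $path = $cwd . '/' . $path;
    }

    do {
        $before = $path;
        // Rule 2: collapse runs of "/" into a single "/".
        $path = preg_replace('#/{2,}#', '/', $path);
        // Rule 3: "/./" becomes "/".
        $path = str_replace('/./', '/', $path);
        // Rule 4: drop a trailing "/.".
        $path = preg_replace('#/\.\z#', '', $path);
        // Rule 5: "/anything/../" becomes "/".
        // Added guard (assumption): "anything" may not itself be "." or "..",
        // otherwise a pending dot segment could cancel the wrong directory.
        $path = preg_replace('#/(?!\.{1,2}/)[^/]+/\.\./#', '/', $path);
        // Rule 6: drop a trailing "/anything/.." (same guard).
        $path = preg_replace('#/(?!\.{1,2}/)[^/]+/\.\.\z#', '', $path);
    } while ($path !== $before);

    // Assumption: rules 4 and 6 can empty the string at the root, so map that to "/".
    return $path === '' ? '/' : $path;
}

var_dump(normalize_path('../../etc/passwd', '/var/www'));
var_dump(normalize_path('a//b/./c/..', '/srv'));
```

With these inputs the first call climbs out of /var/www and ends at /etc/passwd, which is exactly the kind of result the later pattern check has to reject.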
Once those steps are done, you have a canonical path name that can be checked for a valid pattern. Most likely that will be anything that doesn't start with "../" (in other words, it doesn't try to move above the starting point). There may be other rules you want to apply but that's outside the scope of this question.
(a) If you're working on a system that treats "//" at the start of a path as special, make sure you replace multiple "/" characters at the start with two of them. This is the only place where POSIX allows (but does not mandate) special handling for multiples; in all other cases, multiple "/" characters are equivalent to a single one.
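For the pattern check on the canonical path, one option is a prefix comparison against an allowed base directory, since an absolute canonical path no longer contains a leading "../" to test for. The sketch below assumes both arguments are already canonical absolute paths; is_inside_base and the example directories are hypothetical. Note that a string comparison like this does not resolve symbolic links.

```php
<?php
// Hypothetical helper: true if $canonical is $base itself or lies below it.
// Both arguments are assumed to be canonical absolute paths (no ".", "..", or "//").
function is_inside_base(string $canonical, string $base): bool
{
    // Tolerate a trailing slash on the base directory.
    $base = rtrim($base, '/');

    // Compare against "$base/" so that "/var/www-evil" does not pass for "/var/www".
    return $canonical === $base
        || strncmp($canonical, $base . '/', strlen($base) + 1) === 0;
}

var_dump(is_inside_base('/var/www/uploads/a.txt', '/var/www'));
var_dump(is_inside_base('/var/www-evil/a.txt', '/var/www'));
```

Appending the separator before comparing is the important design choice here: a bare prefix match would accept sibling directories whose names merely start with the base name.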
There is a Remove Dot Segments algorithm described in RFC 3986 that is used to interpret and remove the special "." and ".." complete path segments from a referenced path during the process of relative URI reference resolution.
You could use this algorithm for file system paths as well:
// as per RFC 3986
// @see https://www.rfc-editor.org/rfc/rfc3986#section-5.2.4
function remove_dot_segments($input) {
    // 1. The input buffer is initialized with the now-appended path
    //    components and the output buffer is initialized to the empty
    //    string.
    $output = '';

    // 2. While the input buffer is not empty, loop as follows:
    while ($input !== '') {
        // A. If the input buffer begins with a prefix of "../" or "./",
        //    then remove that prefix from the input buffer; otherwise,
        if (
            ($prefix = substr($input, 0, 3)) == '../' ||
            ($prefix = substr($input, 0, 2)) == './'
        ) {
            $input = substr($input, strlen($prefix));
        }
        // B. if the input buffer begins with a prefix of "/./" or "/.",
        //    where "." is a complete path segment, then replace that
        //    prefix with "/" in the input buffer; otherwise,
        elseif (
            ($prefix = substr($input, 0, 3)) == '/./' ||
            ($prefix = $input) == '/.'
        ) {
            $input = '/' . substr($input, strlen($prefix));
        }
        // C. if the input buffer begins with a prefix of "/../" or "/..",
        //    where ".." is a complete path segment, then replace that
        //    prefix with "/" in the input buffer and remove the last
        //    segment and its preceding "/" (if any) from the output
        //    buffer; otherwise,
        elseif (
            ($prefix = substr($input, 0, 4)) == '/../' ||
            ($prefix = $input) == '/..'
        ) {
            $input = '/' . substr($input, strlen($prefix));
            $output = substr($output, 0, strrpos($output, '/'));
        }
        // D. if the input buffer consists only of "." or "..", then remove
        //    that from the input buffer; otherwise,
        elseif ($input == '.' || $input == '..') {
            $input = '';
        }
        // E. move the first path segment in the input buffer to the end of
        //    the output buffer, including the initial "/" character (if
        //    any) and any subsequent characters up to, but not including,
        //    the next "/" character or the end of the input buffer.
        else {
            $pos = strpos($input, '/');
            if ($pos === 0) {
                // Skip the leading "/" and look for the next one.
                $pos = strpos($input, '/', 1);
            }
            if ($pos === false) {
                // No further "/": the segment runs to the end of the input.
                $pos = strlen($input);
            }
            $output .= substr($input, 0, $pos);
            $input = (string) substr($input, $pos);
        }
    }

    // 3. Finally, the output buffer is returned as the result of remove_dot_segments.
    return $output;
}
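The same idea can also be sketched with a segment stack instead of string prefixes, which some readers may find easier to audit. This is a simplification under stated assumptions rather than the RFC routine itself: resolve_segments is a hypothetical name, empty segments are treated as no-ops (so "//" collapses), and ".." segments that would climb above the start are silently dropped. The values of $path and $filename are examples only.

```php
<?php
// Hypothetical helper: resolve "." and ".." by walking the path segment by segment.
function resolve_segments(string $path): string
{
    $stack = [];
    foreach (explode('/', $path) as $segment) {
        if ($segment === '' || $segment === '.') {
            continue;           // empty and "." segments change nothing
        }
        if ($segment === '..') {
            array_pop($stack);  // step up; does nothing when already at the start
            continue;
        }
        $stack[] = $segment;
    }

    // Keep a leading "/" if the input was absolute.
    $prefix = ($path !== '' && $path[0] === '/') ? '/' : '';
    return $prefix . implode('/', $stack);
}

// Applied to the fopen() call from the question (example values).
$path = '/var/www/files';
$filename = '../../../../../path/to/file';
$target = $path . '/' . resolve_segments($filename);
var_dump($target);
```

Because the file name is cleaned before it is joined to $path, the leading "../" segments have nothing to pop and the result stays below the chosen directory.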