Regex for UTF-8 valid filenames

I am trying to process the names of the files my users upload. I want to support all valid UTF-8 characters except those that might pose a problem for display on an HTML page, access from a CLI, or storage and retrieval on a filesystem.

I came up with the following lenient function and I'm wondering if it's safe enough to use. I use prepared statements for all database queries and I always HTML-encode my output, but I would still like to know whether this is a well-thought-out approach.

// $filename = $_FILES['file']['name'];

$filename = 'Filename 123;".\'"."la\l[a]*(/.jpg
∮ E⋅da = Q,  n → ∞, ∑ f(i) = ∏ g(i), ∀x∈ℝ: ⌈x⌉ = −⌊−x⌋, α ∧ ¬β = ¬(¬α ∨ β),
  ℕ ⊆ ℕ₀ ⊂ ℤ ⊂ ℚ ⊂ ℝ ⊂ ℂ, ⊥ < a ≠ b ≡ c ≤ d ≪ ⊤ ⇒ (A ⇔ B),
  2H₂ + O₂ ⇌ 2H₂O, R = 4.7 kΩ, ⌀ 200 mm
sfajs,-=[];\',./09μετράει
าวนั้นเป็นชน
Καλημέρα κόσμε, コンニチハ
()_+{}|":?><';


// Replace symbols, punctuation, and ASCII control characters like \n or [BEL]
$filename = preg_replace('~[\p{S}\p{P}\p{C}]+~u', ' ', $filename);

Is this approach safe for me, and suitable for my users?

Update

To clarify, I do not use the filename for the name of the file on the filesystem. I generate a unique hash and use that. I just need to save the original name for the users' benefit, since that is how they recognize their files. A SHA-1 hash or UUID doesn't mean a thing to them.

asked Aug 14 '12 18:08

Xeoncross

1 Answer

The very first thing you need to do is check that your input is valid UTF-8.

mb_internal_encoding and mb_check_encoding are your friends.
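For illustration, a minimal sketch of that check, assuming the mbstring extension is loaded. The function name validate_utf8_filename is hypothetical, and the encoding is passed explicitly so the result does not depend on the current mb_internal_encoding setting:

```php
<?php
// Hypothetical helper: returns the name unchanged when it is valid UTF-8,
// or null when the bytes do not form a valid UTF-8 sequence.
function validate_utf8_filename(string $name): ?string
{
    // mb_check_encoding() returns true only if $name is valid in the given encoding.
    if (!mb_check_encoding($name, 'UTF-8')) {
        return null;
    }
    return $name;
}
```

A caller would reject the upload (or ask the user for a new name) when null comes back, before any regex is applied.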

You are using a blacklist, whereas good security practice is to use a whitelist of allowed input.
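As a rough sketch of what a whitelist could look like here: the set of allowed characters below (letters, combining marks, numbers, space, dot, underscore, hyphen) is only an assumption and should be adjusted to your needs, and the function name whitelist_filename is hypothetical.

```php
<?php
// Hypothetical allow-list sanitizer: anything that is NOT a letter, combining mark,
// number, space, dot, underscore or hyphen is replaced with a space.
function whitelist_filename(string $name): string
{
    // [^...] negates the class, so only characters outside the allow-list match.
    $clean = preg_replace('~[^\p{L}\p{M}\p{N} ._-]+~u', ' ', $name);
    if ($clean === null) {
        // With the /u flag, preg_replace() returns null when $name is not valid UTF-8.
        return '';
    }
    // Collapse runs of spaces left behind by the replacement, then trim the ends.
    return trim(preg_replace('~ {2,}~', ' ', $clean));
}
```

With this sketch, 'a/b\\c.txt' becomes 'a b c.txt', while a name such as 'Καλημέρα κόσμε.jpg' passes through unchanged. The difference from the blacklist is that a character category you forgot about is rejected by default instead of accepted by default.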

Edit after the clarification:

You should be safe. Remember to filter Lm and No as well, and to limit runs of combining marks (Mn and Me), if you don't want to summon Zalgo.
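Zalgo text is built from long runs of combining marks (Unicode category M), so one way to defuse it without breaking scripts that legitimately use such marks is to cap the length of each run. A minimal sketch, assuming a cap of two marks per run is acceptable for your users; the function name limit_combining_marks and the default of 2 are hypothetical, and some scripts may need a higher limit:

```php
<?php
// Hypothetical helper: keep at most $max consecutive combining marks and
// drop the remainder of any longer run.
function limit_combining_marks(string $name, int $max = 2): string
{
    $max = max(1, $max);
    // \p{M} covers Mn, Mc and Me. The first $max marks are captured and kept;
    // the additional marks matched by \p{M}+ are discarded.
    $pattern = '~(\p{M}{' . $max . '})\p{M}+~u';
    $result = preg_replace($pattern, '$1', $name);
    // preg_replace() returns null on error, e.g. for invalid UTF-8 input.
    return $result ?? '';
}
```

Removing every \p{M} character outright would also stop Zalgo, but it would damage text such as the Thai sample in the question, where a consonant can carry two marks (นั้น), so capping the run is the gentler option.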

answered Sep 22 '22 15:09

InternetSeriousBusiness

Tags: php, filenames, file-upload, utf-8, sanitization
