I'm considering a regex to restrict punctuation in city names (worldwide). What would be a fairly inclusive whitelist of these?
I'm thinking:
(space)
. period
- hyphen
' apostrophe
Also thinking maybe comma or slash but I don't have any examples. Are there others?
This is the most inclusive whitelist of punctuation to be found in city names. The ASCII apostrophe codepoint may not be the one used when someone is entering an apostrophe on their keyboard.
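To illustrate the apostrophe point, here is a minimal Python sketch. It assumes the whitelist from the question (space, period, hyphen, apostrophe) plus the typographic apostrophe U+2019, which many phones and word processors insert instead of the ASCII one. The names `CITY_RE` and `is_plausible_city` are made up for this example.

```python
import re

# [^\W\d_] matches roughly "any Unicode letter" in Python's re module
# (word characters minus digits and underscore).
# The character class adds the whitelisted punctuation:
# space, period, hyphen, ASCII apostrophe, and U+2019 (typographic apostrophe).
CITY_RE = re.compile(r"(?:[^\W\d_]|[ .\-'\u2019])+")


def is_plausible_city(name: str) -> bool:
    """Return True if name holds only letters and whitelisted punctuation."""
    return CITY_RE.fullmatch(name) is not None
```

With this pattern, "Coeur d'Alene" is accepted whether it is typed with the ASCII apostrophe or with U+2019, while input containing a semicolon or digits is rejected. Letters written with separate combining marks would not match, so this assumes precomposed (NFC) input.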
If you've discerned the encoding of the submitted text, you should be able to test whether a character falls in the Unicode General Punctuation block:
/\p{InGeneral_Punctuation}/
If you are limiting yourself to Latin Extended-A, match against that block instead:
/\p{InLatin_Extended-A}/
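The `\p{In...}` block syntax above is not supported by every regex engine; Python's standard `re` module, for one, does not accept it. As a rough equivalent, the same block tests can be sketched with code point ranges. The ranges are the Unicode block boundaries (General Punctuation is U+2000 to U+206F, Latin Extended-A is U+0100 to U+017F); the helper names are hypothetical.

```python
# Unicode block boundaries expressed as code point ranges.
GENERAL_PUNCTUATION = range(0x2000, 0x2070)  # U+2000..U+206F
LATIN_EXTENDED_A = range(0x0100, 0x0180)     # U+0100..U+017F


def in_block(ch: str, block: range) -> bool:
    """Return True if the single character ch lies inside the given block."""
    return ord(ch) in block


def chars_in_block(text: str, block: range) -> list[str]:
    """List the characters of text that fall inside the given block."""
    return [ch for ch in text if in_block(ch, block)]
```

For example, the typographic apostrophe U+2019 falls in General Punctuation, while the ASCII apostrophe does not, so a block test alone will not catch both.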
Also, ask yourself: What are the consequences of someone putting a funny character into my city name? Is that worse than the consequences of someone not being able to enter their correct address, if I exclude too much?
USPS standard address formatting calls for stripping all special characters except 'necessary' hyphens and dashes used in the primary and/or secondary street address lines and hyphens in the ZIP.
So if an address is:
John O'Toole
456 N 4-1/2 St
San José, CA 99999-4545
The post office prefers envelopes be labeled:
John O Toole
456 N 4 1/2 St
San Jose CA 99999-4545
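A simplification along these lines could be sketched in Python as follows. This is only an approximation for the name and city/state/ZIP lines: it folds accented letters to their base letters, keeps every hyphen, and turns other punctuation into spaces. It does not attempt the street-line rules (where the example drops a hyphen but keeps the fraction slash), and the function name `simplify_line` is invented for illustration.

```python
import re
import unicodedata


def simplify_line(line: str) -> str:
    """Fold accents and drop punctuation other than hyphens (rough sketch)."""
    # Decompose accented letters, then discard the combining marks.
    decomposed = unicodedata.normalize("NFKD", line)
    no_marks = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Replace anything that is not an ASCII letter, digit, hyphen, or space.
    cleaned = re.sub(r"[^A-Za-z0-9\- ]", " ", no_marks)
    # Collapse runs of spaces left behind by the replacement.
    return re.sub(r" +", " ", cleaned).strip()
```

Applied to "John O'Toole" this gives "John O Toole", and "San José, CA 99999-4545" becomes "San Jose CA 99999-4545". Letters outside ASCII that have no decomposition would be replaced by spaces, so this is only suitable for addresses that are meant to end up in plain ASCII anyway.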