
What punctuation characters are necessary for a city field?

I'm considering a regex to restrict punctuation in city names (worldwide). What would be a fairly inclusive whitelist of these?

I'm thinking:

 (space)
. period
- hyphen
' apostrophe

Also thinking maybe comma or slash but I don't have any examples. Are there others?

asked Feb 26 '14 03:02 by User


2 Answers

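Space, period, hyphen, and apostrophe make a fairly inclusive whitelist of the punctuation found in city names. One caveat: the ASCII apostrophe (U+0027) may not be the code point a user actually submits. Text typed on some keyboards or pasted from a word processor often carries the typographic apostrophe (U+2019) instead.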

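Once you know the encoding of the submitted text, you can check whether a character falls in the Unicode General Punctuation block (U+2000 to U+206F), which is where U+2019 lives. In a regex engine with Perl-style Unicode block properties: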

/\p{InGeneral_Punctuation}/

If you only want to accept characters from the Latin Extended-A block (U+0100 to U+017F, accented Latin letters such as ő and ł), match that block instead:

/\p{InLatin_Extended-A}/
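
Python's built-in `re` module does not support `\p{...}` block properties, so here is a rough sketch of the same idea using `unicodedata` instead. It checks general categories rather than blocks, which is a substitution on my part. The function name `is_plausible_city` and the exact whitelist are assumptions for illustration: the four characters from the question plus the typographic apostrophe.

```python
import unicodedata

# Assumed whitelist: the four characters from the question,
# plus the typographic apostrophe (U+2019).
ALLOWED_PUNCTUATION = {" ", ".", "-", "'", "\u2019"}


def is_plausible_city(name: str) -> bool:
    """Return True if every character is a letter, a combining mark,
    or one of the whitelisted punctuation characters."""
    if not name.strip():
        return False
    for ch in name:
        if ch in ALLOWED_PUNCTUATION:
            continue
        # General categories starting with "L" are letters and those
        # starting with "M" are combining marks, in any script.
        if unicodedata.category(ch)[0] in ("L", "M"):
            continue
        return False
    return True
```

This accepts names like `San José`, `St. Louis`, and `Winston-Salem`, and rejects input containing characters such as `;` or digits. Whether rejecting digits is right for your data is a separate decision.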

Also, ask yourself: What are the consequences of someone putting a funny character into my city name? Is that worse than the consequences of someone not being able to enter their correct address, if I exclude too much?

answered Oct 01 '22 02:10 by heptadecagram


USPS standard address formatting calls for stripping all special characters except 'necessary' hyphens and dashes used in the primary and/or secondary street address lines and hyphens in the ZIP.

So if an address is:

John O'Toole
456 N 4-1/2 St
San José, CA 99999-4545

The post office prefers envelopes be labeled:

John O Toole
456 N 4 1/2 St
San Jose CA 99999-4545
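
A rough Python sketch of that kind of cleanup, for illustration only. This is not an official USPS algorithm. The function `simplify_line` and its `keep` parameter are hypothetical names, and the caller decides which punctuation is "necessary" on each line.

```python
import unicodedata


def simplify_line(text: str, keep: str = "") -> str:
    """Strip diacritics and replace punctuation with spaces,
    except for the characters listed in `keep`."""
    # Decompose accented letters (é becomes e + combining accent),
    # then drop the combining marks.
    decomposed = unicodedata.normalize("NFKD", text)
    without_marks = "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    )
    # Anything that is not alphanumeric, whitespace, or explicitly
    # kept becomes a space.
    cleaned = "".join(
        ch if ch.isalnum() or ch.isspace() or ch in keep else " "
        for ch in without_marks
    )
    # Collapse runs of whitespace left behind by the replacements.
    return " ".join(cleaned.split())


print(simplify_line("John O'Toole"))
print(simplify_line("456 N 4-1/2 St", keep="/"))
print(simplify_line("San José, CA 99999-4545", keep="-"))
```

Applied to the three lines of the example address, with `/` kept on the street line and `-` kept on the ZIP line, this reproduces the labeled form shown above.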
answered Oct 01 '22 00:10 by user5203006