What characters are allowed in twitter hashtags?

In developing an iOS app containing a twitter client, I must allow for user generated hashtags (which may be created elsewhere within the app, not just in the tweet body).

I would like to ensure any such hashtags are valid for twitter, so I would like to error check the entered value for invalid characters. Bear in mind that users may be from non-English speaking countries.

I am aware of the usual limitations, such as not beginning a hashtag with a number, and no special punctuation characters, but I was wondering if there is a known list of all additional characters that are technically allowed within hashtags (i.e. international characters).

asked Feb 12 '13 00:02

Karl White

1 Answer

Karl, as you've rightly pointed out, any word in any language can be a valid Twitter hashtag (as long as it meets a number of basic criteria). As such, what you are asking for is a list of valid international word characters. I'm sure someone has compiled such a list somewhere, but using it would not be the most efficient way to reach what appears to be your actual goal: ensuring that a given hashtag is valid for Twitter.

I believe what you are looking for is a regular expression that matches word characters across the whole Unicode range. Such an expression would not be dependent on your locale and would match any character in modern typography that can appear as part of a word.

You didn't specify what language you are writing your app in, so I can't help you with a language-specific implementation. However, the basic approach would be as follows:

Check if any of the bracket expressions or character classes already support Unicode character ranges in your language. If yes, then use them.
Check if there is a regex modifier that can enable Unicode character range support for your language.

Most modern languages implement regular expressions in a fairly similar way and a lot of them borrow heavily from Perl, so I hope the following two examples will put you on the right track:
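To make the first check concrete, here is a minimal probe in Ruby (picked only because it is one of the two languages used in the examples further down). The sample letters, the `CLASSES` table and the `unicode_aware?` helper are all hypothetical names chosen for illustration; the idea is simply to try each candidate class against a few non-ASCII letters and see which ones match.

```ruby
# Hypothetical probe: which character classes in this regex engine
# recognise non-ASCII letters? The sample letters are arbitrary choices
# (Latin with accent, Cyrillic, Han).
SAMPLES = ["é", "ж", "日"].freeze

# Candidate classes, each anchored so it must match exactly one character.
CLASSES = {
  '\w'          => /\A\w\z/,
  '[[:alpha:]]' => /\A[[:alpha:]]\z/,
  '\p{L}'       => /\A\p{L}\z/
}.freeze

# True only if the class matches every non-ASCII sample letter.
def unicode_aware?(regex)
  SAMPLES.all? { |ch| regex.match?(ch) }
end

CLASSES.each do |name, regex|
  verdict = unicode_aware?(regex) ? "matches" : "does not match"
  puts "#{name}: #{verdict} the non-ASCII samples"
end
```

In Ruby, `\w` is limited to ASCII, so it fails this probe, while the POSIX bracket expression and the `\p{L}` property pass. The same kind of probe can be repeated in whatever language the app is written in.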

Perl:

Use POSIX bracket expressions (e.g. [[:alpha:]], [[:alnum:]], [[:digit:]], etc.) as they give you greater control over the characters you want to match, compared to character classes (e.g. \w).

Use the /u modifier to enable Unicode support when pattern matching. Under this modifier, the ASCII platform effectively becomes a Unicode platform; and hence, for example, \w will match any of the more than 100,000 word characters in Unicode.

See Perl documentation for more info:

http://perldoc.perl.org/perlre.html#Character-set-modifiers
http://perldoc.perl.org/perlrecharclass.html#POSIX-Character-Classes

Ruby:

Use POSIX bracket expressions as they encompass non-ASCII characters. For instance, /\d/ matches only the ASCII decimal digits (0-9); whereas /[[:digit:]]/ matches any character in the Unicode Nd category.

See Ruby documentation for more info:

http://www.ruby-doc.org/core-2.1.1/Regexp.html#class-Regexp-label-Character+Classes
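A quick way to see the difference for yourself is the sketch below. The constant names are made up for this example, and U+0663 (ARABIC-INDIC DIGIT THREE) is just one convenient member of the Nd category.

```ruby
# U+0663 ARABIC-INDIC DIGIT THREE belongs to the Unicode Nd category.
ARABIC_THREE = "\u0663"

# Both patterns are anchored so they must match exactly one character.
ASCII_DIGIT   = /\A\d\z/           # \d covers only 0-9 in Ruby
UNICODE_DIGIT = /\A[[:digit:]]\z/  # POSIX bracket expression, Unicode-aware

puts ASCII_DIGIT.match?("3")             # ASCII digit against \d
puts ASCII_DIGIT.match?(ARABIC_THREE)    # non-ASCII digit against \d
puts UNICODE_DIGIT.match?("3")           # ASCII digit against [[:digit:]]
puts UNICODE_DIGIT.match?(ARABIC_THREE)  # non-ASCII digit against [[:digit:]]
```

Only the second line should print false: the non-ASCII digit is rejected by \d but accepted by [[:digit:]].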

Examples:

Given a list of hashtags, the following regex will match every hashtag that starts with a letter (including international letters) followed by at least one more letter, digit or underscore:

    m/^#[[:alpha:]][[:alnum:]_]+$/u     # Perl

    /\A#[[:alpha:]][[:alnum:]_]+\z/     # Ruby (\A and \z anchor the whole string; ^ and $ only anchor lines)
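For completeness, here is how the Ruby pattern could be wrapped in a small validation helper. This is only a sketch: `HASHTAG_PATTERN` and `valid_hashtag?` are hypothetical names, the sample tags are made up, and the pattern is anchored with \A and \z so that the whole string has to match, not just one line of it.

```ruby
# Pattern from the answer, anchored with \A and \z so the whole string must match.
HASHTAG_PATTERN = /\A#[[:alpha:]][[:alnum:]_]+\z/

# valid_hashtag? is a hypothetical helper name, not part of any Twitter API.
def valid_hashtag?(text)
  HASHTAG_PATTERN.match?(text)
end

# Arbitrary sample input: some tags that fit the pattern and some that do not.
candidates = ["#hello", "#日本語", "#привет", "#snake_case1", "#1abc", "#foo-bar", "hello"]

candidates.each do |tag|
  puts "#{tag.inspect}: #{valid_hashtag?(tag)}"
end
```

The first four candidates pass; the last three fail because they start with a digit, contain a hyphen, or lack the leading #. Keep in mind that this checks the pattern above only, so it approximates Twitter's rules and does not replace them.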

answered Oct 05 '22 06:10

UrsaDK
