
An elegant way to get hashtags out of a string in Python?

I'm looking for a clean way to get a set (list, array, whatever) of words starting with # inside a given string.

In C#, I would write

var hashtags = input
    .Split (' ')
    .Where (s => s[0] == '#')
    .Select (s => s.Substring (1))
    .Distinct ();

What is comparatively elegant code to do this in Python?


Sample input: "Hey guys! #stackoverflow really #rocks #rocks #announcement"
Expected output: ["stackoverflow", "rocks", "announcement"]

Dan Abramov asked Jun 13 '11 14:06

2 Answers

There are some problems with the answers presented here.

  1. {tag.strip("#") for tag in tags.split() if tag.startswith("#")}

     [i[1:] for i in line.split() if i.startswith("#")]

     These won't work if you have hashtags like '#one#two#'.

  2. re.compile(r"#(\w+)") won't work for many Unicode languages (even using re.UNICODE).

I have seen more ways to extract hashtags, but found none of them handles all cases.

So I wrote some small Python code to handle most of the cases. It works for me.
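To make the first problem concrete, here is a minimal sketch that runs both split-based snippets on '#one#two#'. The variable names `text`, `stripped` and `sliced` are only for illustration.

```python
# Illustrative input: several hashtags glued together with no spaces.
text = "#one#two#"

# str.split() sees a single whitespace-delimited token, so each
# snippet yields one mangled tag instead of 'one' and 'two'.
stripped = {tag.strip("#") for tag in text.split() if tag.startswith("#")}
sliced = [i[1:] for i in text.split() if i.startswith("#")]

print(stripped)  # {'one#two'}
print(sliced)    # ['one#two#']
```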

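def get_hashtagslist(string):
    ret = []
    s = ''  # characters of the hashtag currently being read
    hashtag = False
    for char in string:
        if char == '#':
            # a new '#' closes any hashtag in progress (handles '#one#two#')
            if s:
                ret.append(s)
                s = ''
            hashtag = True
            continue

        # take only the prefix of the hashtag if it contains one of these chars
        # (e.g. for '#happy,but i..' it takes only 'happy')
        if hashtag and char in [' ', '.', ',', '(', ')', ':', '{', '}']:
            if s:
                ret.append(s)
                s = ''
            hashtag = False

        if hashtag:
            s += char

    # flush a hashtag that runs to the end of the string
    if s:
        ret.append(s)

    return set(ret)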
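
For comparison, a similar rule can be sketched as a regular expression that uses a negated character class instead of \w. This is an alternative technique, not the code above: the names `HASHTAG_RE` and `extract_hashtags` are hypothetical, the delimiter set is copied from the function above, and `\s` is assumed to be an acceptable stand-in for the single space it checks. It also keeps first-seen order rather than returning a set.

```python
import re

# Assumption: a hashtag ends at whitespace, another '#', or one of the
# delimiter characters the hand-written loop checks for.
HASHTAG_RE = re.compile(r"#([^\s#.,(){}:]+)")

def extract_hashtags(text):
    """Return the hashtags in text, without '#', in first-seen order."""
    # dict.fromkeys drops duplicates while keeping insertion order
    # (guaranteed for dict since Python 3.7).
    return list(dict.fromkeys(HASHTAG_RE.findall(text)))

print(extract_hashtags("Hey guys! #stackoverflow really #rocks #rocks #announcement"))
# ['stackoverflow', 'rocks', 'announcement']
print(extract_hashtags("#one#two#"))       # ['one', 'two']
print(extract_hashtags("#happy,but i.."))  # ['happy']
```

Because the pattern lists what ends a tag rather than what may appear inside one, it does not depend on how \w classifies characters in a given script.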
Eyal Ch answered Oct 18 '22 16:10

With @inspectorG4dget's answer, if you want no duplicates, you can use set comprehensions instead of list comprehensions.

>>> tags="Hey guys! #stackoverflow really #rocks #rocks #announcement"
>>> {tag.strip("#") for tag in tags.split() if tag.startswith("#")}
set(['announcement', 'rocks', 'stackoverflow'])

Note that the { } syntax for set comprehensions is only available from Python 2.7 onward.
If you're working with an older version, pass the list comprehension ([ ]) output to the set function, as suggested by @Bertrand.
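
A minimal sketch of that fallback, reusing the `tags` string from the session above (the name `unique` is only for illustration). It also runs on Python 3, where a set prints as {...} rather than set([...]).

```python
tags = "Hey guys! #stackoverflow really #rocks #rocks #announcement"

# Works on versions without set comprehensions: build a list, then
# let set() discard the duplicates.
unique = set([tag.strip("#") for tag in tags.split() if tag.startswith("#")])

# Sets are unordered, so sort when a stable display order is wanted.
print(sorted(unique))  # ['announcement', 'rocks', 'stackoverflow']
```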

utdemir answered Oct 18 '22 14:10
