
Splitting a Domain name into constituent words (if possible)?

I want to break a domain name into its constituent words and numbers, e.g.

iamadomain11.com = ['i', 'am', 'a', 'domain', '11']

How do I do this? I am aware that more than one split may be possible; for now, I am fine with getting just one of them.

demos asked Dec 18 '25 08:12


1 Answer

This is actually solved in the O'Reilly Media book Beautiful Data. In chapter 14, "Natural Language Corpus Data", Peter Norvig builds a word segmenter in Python that does exactly what you want, using a very large, freely available token frequency data set.
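
To give a feel for the approach, here is a minimal sketch of frequency-based segmentation. It is not the code from the book: it uses a simplified unigram model, and the tiny `WORD_COUNTS` table, the unknown-word penalty, and the names `word_logprob`, `segment_letters` and `split_domain` are all made up for illustration. A real splitter would load counts from a large corpus-derived frequency list instead.

```python
import math
import re
from functools import lru_cache

# Hypothetical toy counts; replace with a real word-frequency list.
WORD_COUNTS = {
    "i": 500, "am": 300, "a": 800, "domain": 50, "do": 200,
    "main": 60, "ad": 20, "in": 700, "name": 90,
}
TOTAL = sum(WORD_COUNTS.values())


def word_logprob(word):
    """Log-probability of a single word under the toy unigram model."""
    if word in WORD_COUNTS:
        return math.log(WORD_COUNTS[word] / TOTAL)
    # Assumed penalty for unknown strings: the longer, the less likely.
    return math.log(10.0 / (TOTAL * 10 ** len(word)))


@lru_cache(maxsize=None)
def segment_letters(text):
    """Return (score, words) for the most probable split of a letter run."""
    if not text:
        return 0.0, ()
    best = None
    # Try every possible first word, then split the remainder recursively.
    for i in range(1, len(text) + 1):
        head, tail = text[:i], text[i:]
        tail_score, tail_words = segment_letters(tail)
        candidate = (word_logprob(head) + tail_score, (head,) + tail_words)
        if best is None or candidate[0] > best[0]:
            best = candidate
    return best


def split_domain(domain):
    """Split the first label of a domain into words and digit groups."""
    label = domain.lower().split(".")[0]
    words = []
    # Digit runs are kept whole; letter runs go through the segmenter.
    for chunk in re.findall(r"[a-z]+|[0-9]+", label):
        if chunk.isdigit():
            words.append(chunk)
        else:
            words.extend(segment_letters(chunk)[1])
    return words


print(split_domain("iamadomain11.com"))
```

With the toy counts above this prints `['i', 'am', 'a', 'domain', '11']`. The result depends entirely on the frequency table, so words missing from it will be split badly. The sketch also only looks at the first dot-separated label, which is a simplification for names such as `www.example.com`.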

Thien answered Dec 21 '25 01:12