Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Programmatically extract keywords from domain names

Let's say I have a list of domain names that I would like to analyze. Unless the domain name is hyphenated, I don't see a particularly easy way to "extract" the keywords used in the domain. Yet I see it done on sites such as DomainTools.com, Estibot.com, etc. For example:

ilikecheese.com becomes "i like cheese"
sanfranciscohotels.com becomes "san francisco hotels"
...

Any suggestions for accomplishing this efficiently and effectively?

Edit: I'd like to write this in PHP.

like image 607
Kevin Avatar asked Aug 22 '09 07:08

Kevin


People also ask

How do I search for a keyword in Python?

Python in its language defines an inbuilt module “keyword” which handles certain operations related to keywords. A function “iskeyword()” checks if a string is a keyword or not. Returns true if a string is a keyword, else returns false.


2 Answers

Ok, I ran the script I wrote for this SO question, with a couple of minor changes -- using log probabilities to avoid underflow, and modifying it to read multiple files as the corpus.

For my corpus I downloaded a bunch of files from project Gutenberg -- no real method to this, just grabbed all english-language files from etext00, etext01, and etext02.

Below are the results, I saved the top three for each combination.

expertsexchange: 97 possibilities
 -  experts exchange -23.71
 -  expert sex change -31.46
 -  experts ex change -33.86

penisland: 11 possibilities
 -  pen island -20.54
 -  penis land -22.64
 -  pen is land -25.06

choosespain: 28 possibilities
 -  choose spain -21.17
 -  chooses pain -23.06
 -  choose spa in -29.41

kidsexpress: 15 possibilities
 -  kids express -23.56
 -  kid sex press -32.65
 -  kids ex press -34.98

childrenswear: 34 possibilities
 -  children swear -19.85
 -  childrens wear -25.26
 -  child ren swear -32.70

dicksonweb: 8 possibilities
 -  dickson web -27.09
 -  dick son web -30.51
 -  dicks on web -33.63
like image 81
SquareCog Avatar answered Sep 28 '22 01:09

SquareCog


Might want to check out this SO question.

like image 40
Zed Avatar answered Sep 28 '22 03:09

Zed