
PHP Syllable Detection [closed]

Tags: php, nlp

I would like to find a way to split a word into syllables with PHP. For example, the word "nevermore", run through detect_syllables(), would return "nev-er-more". Are there any good APIs or libraries out there for this?

jeremy asked Apr 12 '11 22:04



2 Answers

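There's a useful PhD thesis by Frank Liang ("Word Hy-phen-a-tion by Com-put-er", 1983) that describes an exceptionally accurate algorithm for this kind of word splitting; it is the pattern-based hyphenation algorithm used by TeX. Written over 25 years ago, it's still valid. But I'm not aware of any implementation in PHP.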
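To give a feel for how a Liang-style pattern approach works, here is a minimal PHP sketch. It is not taken from the thesis or from any existing library: the function names `parse_pattern()` and `hyphenate_sketch()` are made up, the nine patterns are a tiny hand-picked set that only covers the word "hyphenation", and it assumes plain ASCII input. Each pattern carries digits between its letters; for every position in the word the highest digit from any matching pattern wins, and an odd value marks an allowed break.

```php
<?php
// Split a pattern such as "hy3ph" into its letters ("hyph") and the digit
// found before each letter, plus one extra slot after the last letter.
function parse_pattern(string $pattern): array
{
    $letters = '';
    $digits = [0];
    foreach (str_split($pattern) as $ch) {
        if (ctype_digit($ch)) {
            $digits[count($digits) - 1] = (int) $ch;
        } else {
            $letters .= $ch;
            $digits[] = 0;
        }
    }
    return [$letters, $digits];
}

// Hypothetical helper: returns the pieces of $word between allowed breaks.
// $leftMin/$rightMin keep very short fragments off either end of the word.
function hyphenate_sketch(string $word, array $patterns, int $leftMin = 2, int $rightMin = 3): array
{
    // Dots mark the word boundaries so patterns can anchor to them.
    $dotted = '.' . strtolower($word) . '.';
    $points = array_fill(0, strlen($dotted) + 1, 0);

    foreach ($patterns as $pattern) {
        [$letters, $digits] = parse_pattern($pattern);
        if ($letters === '') {
            continue; // nothing to match
        }
        $offset = 0;
        while (($pos = strpos($dotted, $letters, $offset)) !== false) {
            // Keep the highest value seen for every inter-letter position.
            foreach ($digits as $k => $value) {
                $points[$pos + $k] = max($points[$pos + $k], $value);
            }
            $offset = $pos + 1;
        }
    }

    $parts = [];
    $start = 0;
    $length = strlen($word);
    for ($i = $leftMin; $i <= $length - $rightMin; $i++) {
        // $points is indexed on the dotted word, hence the +1.
        if ($points[$i + 1] % 2 === 1) {
            $parts[] = substr($word, $start, $i - $start);
            $start = $i;
        }
    }
    $parts[] = substr($word, $start);
    return $parts;
}

// Illustrative patterns only; a real implementation would load a full
// TeX pattern file with thousands of entries.
$patterns = ['hy3ph', 'he2n', 'hena4', 'hen5at', '1na', 'n2at', '1tio', '2io', 'o2n'];
echo implode('-', hyphenate_sketch('hyphenation', $patterns)), "\n"; // hy-phen-ation
```

Note that this finds hyphenation points, which are close to but not always the same as syllable boundaries.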

EDIT

A quick Google search turned up a Text Statistics library in PHP, which includes algorithms for counting syllables within words (among other readability-measuring algorithms). You should be able to find code for syllable splitting there.
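For counting alone (as opposed to splitting), a common rough heuristic is to count runs of consecutive vowels and discount a silent trailing "e". The sketch below is only an illustration of that idea and is not the Text Statistics library's code; the name `count_syllables_sketch()` is made up, and the rules are English-only and will miscount plenty of irregular words.

```php
<?php
// Rough English-only estimate of the number of syllables in a word.
function count_syllables_sketch(string $word): int
{
    // Keep ASCII letters only and normalise case.
    $word = strtolower(preg_replace('/[^a-zA-Z]/', '', $word));
    if ($word === '') {
        return 0;
    }

    // Each run of consecutive vowels (y counted as a vowel) is one candidate syllable.
    $count = preg_match_all('/[aeiouy]+/', $word);

    // A final consonant + "e" is usually silent ("more"), but a
    // consonant + "le" ending is usually pronounced ("table").
    if ($count > 1
        && preg_match('/[^aeiouy]e$/', $word)
        && !preg_match('/[^aeiouy]le$/', $word)
    ) {
        $count--;
    }

    return max(1, $count);
}

echo count_syllables_sketch('nevermore'), "\n"; // 3
```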

Mark Baker answered Sep 22 '22 08:09



I'm actually in the finishing stages of making a PHP Hyphenator class based on Frank Liang's algorithm and the TeX dictionaries, which seems to be pretty much the approach taken by all office suites. (I actually found this topic while looking for a good name for it that wasn't already taken.) With slowly improving browser support for the &shy; entity, it's becoming a realistic option to hyphenate content on websites.
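As a small illustration of the soft-hyphen idea (not code from the class described here; `soft_hyphenate()` is a made-up name), once a word has been split into parts they can be joined with the &shy; entity so the browser only shows a hyphen where it actually breaks the line:

```php
<?php
// Join already-split word parts with the HTML soft-hyphen entity.
function soft_hyphenate(array $parts): string
{
    // Escape each part for HTML first, then insert the entity between parts.
    return implode('&shy;', array_map('htmlspecialchars', $parts));
}

echo soft_hyphenate(['nev', 'er', 'more']), "\n"; // nev&shy;er&shy;more
```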

Core functionality is working: splitting (and thus counting) and/or hyphenating text and/or HTML, parsing TeX hyphenation dictionaries, and caching those parsed dictionaries. Some planned features are still missing, but nothing that stops you from using it. There's also no good documentation, samples, formal unit tests or vanity site yet.

I've created a GitHub site for it and will post the current version there ASAP, so check back in a few days.

I've only tested it with Dutch (my native language) and US English, so it may still have some issues with languages using different character sets.

Martijn van der Lee answered Sep 20 '22 08:09
