English is the most orthographically difficult language written with the Roman alphabet: a word’s spelling does not always tell you how to pronounce it, and its pronunciation does not always tell you how to spell it. Only speakers of English can enjoy the perverse event of the grammar school Spelling Bee. I was so bad at spelling in school that I assumed it was a “Spelling Be.”
[ Note: Unfortunately, this is a problem that is well suited only for native speakers of English, or candidates who have a high degree of proficiency with English, and a large enough vocabulary that they will be able to understand the subtlety of what is being asked. ]
There are well-known measurements of reading difficulty, and most of them rely on the ratio of long words to short words. In other languages, we could define a long word as one with more than some fixed number of letters, but in English a long word is defined as one with more than a certain number of syllables, usually two or three.
Counting the syllables in an English word is an example of a task that is not trivial, not mind-bogglingly complex, and not entirely algorithmic. By this string of “nots” I imply the following:
- Not trivial: (num-letters)/k, where k is a constant, fails with a lot of common words like neighbor.
- Not complex: the number of syllables can be reliably estimated (as opposed to exactly calculated) in a single calculation within a program that reads English text.
- Not algorithmic: It is an estimation: the only way to do it perfectly is with a dictionary that lists every word and the number of syllables it has.
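To make the "not trivial" point concrete, here is a minimal sketch of the (num-letters)/k estimate. The function name `estimate_by_length` and the value k = 3.0 are assumptions chosen for illustration; they are not a tuned constant from any particular solution.

```python
def estimate_by_length(word: str, k: float = 3.0) -> int:
    """Estimate syllables as (number of letters) / k, never below 1.

    k = 3.0 is an illustrative guess, not an experimentally tuned value.
    """
    letters = sum(ch.isalpha() for ch in word)
    return max(1, round(letters / k))


# Compare the estimate with the usual syllable count for a few words.
for word, actual in [("neighbor", 2), ("strengths", 1), ("idea", 3), ("banana", 3)]:
    print(f"{word}: estimated {estimate_by_length(word)}, actual {actual}")
```

With this k, the estimate is wrong for all four sample words, in both directions: long one-syllable words are overcounted and short vowel-heavy words are undercounted.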
No method counts the syllables in English words with 100% accuracy, but a number of methods are more than 99% accurate for a piece of text, particularly when you consider that no passage of 100 words has 100 unique words. In fact, the seventy small words known as function words make up about 40% of all written text, regardless of the writer, the writer’s fluency, or the subject matter.
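One way to take advantage of that repetition, sketched here under assumptions, is to check a small table of known counts before falling back to an estimate. The table `KNOWN_COUNTS`, the function `syllables_with_lookup`, and the fallback rule are all hypothetical; the handful of entries is hand-picked and is not the list of seventy function words.

```python
from typing import Callable

# Hand-picked illustrative entries, not a real function-word list.
KNOWN_COUNTS = {
    "the": 1, "of": 1, "and": 1, "to": 1, "in": 1,
    "into": 2, "over": 2, "about": 2, "because": 2, "idea": 3,
}


def syllables_with_lookup(word: str, fallback: Callable[[str], int]) -> int:
    """Return the tabulated count if the word is known, else an estimate."""
    key = word.lower()
    if key in KNOWN_COUNTS:
        return KNOWN_COUNTS[key]
    # Whatever the fallback says, every word gets at least one syllable.
    return max(1, fallback(key))


def crude_fallback(word: str) -> int:
    """Stand-in estimator: one syllable per three letters (an assumption)."""
    return round(len(word) / 3)


print(syllables_with_lookup("Because", crude_fallback))  # found in the table
print(syllables_with_lookup("syllable", crude_fallback))  # estimated
```

The table only has to cover the most frequent words to lift the accuracy over a whole passage, since those words account for so much of the text.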
Explain your method for counting the number of syllables in English words.
Note that this question is not the much more complicated typographic question “How do we exactly hyphenate words on a printed page so that we don’t wind up with travesties like ‘learn-t’?” Rather, this is the simpler problem, “How can we approximately count the number of syllables simply and reliably?”
Commentary on the solutions:
There are many approaches and refinements. Eight years ago I used this problem in a programming class I taught at VCU, and there were quite a few solutions submitted as answers. Among twenty students, there were about three major approaches and a dozen or so refinements. One student did submit a (num-letters)/k solution that was generally effective. I was impressed that he had tested his approach with a number of passages of text, and had experimentally determined a value that worked as well as this approach can work.
Rather than present some particular solution here and thereby make it seem like there is only one worth using, here are some features that are likely to be found in solutions worth considering:
- A strategy for dealing with and recognizing silent letters.
- A consideration of the distribution of vowels and consonants.
- Counting the letters in a word.
- Ensuring that every word has at least one syllable.
- Pairs of letters.
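As one illustration of how several of these features can combine, here is a minimal sketch, not a recommended or complete solution. It counts runs of consecutive vowels, treats a final "e" after a consonant as silent (except in a consonant + "le" ending such as "table"), and enforces the one-syllable minimum. The name `count_syllables` and the specific rules are assumptions made for this example, including the choice to treat "y" as a vowel everywhere.

```python
import re

VOWELS = "aeiouy"  # assumption: "y" is always treated as a vowel


def count_syllables(word: str) -> int:
    """Estimate the syllables in one English word; returns 0 for no letters."""
    w = re.sub(r"[^a-z]", "", word.lower())
    if not w:
        return 0
    # Silent letter rule: drop a final "e" that follows a consonant,
    # unless the word ends in consonant + "le" (table, people).
    if len(w) > 2 and w.endswith("e") and w[-2] not in VOWELS:
        consonant_le = w.endswith("le") and w[-3] not in VOWELS
        if not consonant_le:
            w = w[:-1]
    # Letter pairs: a run of adjacent vowels counts as one syllable.
    groups = re.findall(r"[aeiouy]+", w)
    # Every word gets at least one syllable ("the" has no vowels left).
    return max(1, len(groups))


for word in ["neighbor", "table", "make", "the", "syllable", "idea"]:
    print(word, count_syllables(word))
```

The sketch gets "neighbor", "table", "make", "the", and "syllable" right, but reports two syllables for "idea" because it merges "ea" into one vowel group. Misses of that kind are what the further refinements, or a table of exceptions, would have to address.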