my top 5 “most difficult spelling systems” list

English orthography is a beautifully inconsistent, inconsistently beautiful, sadistic excuse for a “system.”  My favorite example is what happens when you add letters to the word “tough.”  (The symbols on the right are how the words are pronounced, and even if you don’t understand what they mean exactly, you can see that more changes than just a single character.)
  • tough /tʌf/
  • though /ðoʊ/
  • through /θɹu:/  (notice, so far, not a single sound in common)
  • thought /θɔ:t/ (or in my dialect, /θɑ:t/)
  • thorough /ˈθʊɹoʊ/
So, there are at least four different vowels and sometimes an /f/ represented by <ough>, and there are two different sounds represented by the <th>.  Add words like “laugh,” “taught,” “draught,” and “drought” and you start to see how this spirals out of control really quickly.  Part of the problem is that we have, depending on the dialect, anywhere between about 35 and 45 distinct sounds (called phonemes), and yet a mere 26 letters to write them. Linguists, very appropriately, call that a “defective script,” and English is astonishingly defective.  Of the 26 letters we do have, several overlap in strange Venn-diagram-like sound mappings such as s-c-k-q-x.  It’s kind of mind-boggling really to think about this, but there is literally one sound in the entire English language that is written with one and only one group of letters, and that’s the so-called “soft th” /ð/ in “though” from above.  The letters <th> can represent multiple different sounds (e.g. Thomas), but the phoneme /ð/ can be written only one way.  Every other phoneme in English has at least two, sometimes more than 10, and, in the case of the sound /eɪ/ as in bay, pain, base, bass, ballet, obey, dossier, resume, or even résumé, etc., more than 20 different graphical representations!  So you can imagine that, if you were a speaker of another language trying to learn English, you might assume that native speakers had devised the system as some sort of cruel joke to keep foreigners from ever mastering it.
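For anyone who wants to check the <ough> tally above, here is a minimal Python sketch. The transcriptions are the ones from the list above; splitting each word into "the part written <ough>" and "everything else" is my own assumption for illustration, not a formal phonological analysis.

```python
# Pronunciation of the letters <ough> in each example word, taken from the
# transcriptions listed above. Deciding which sounds "belong" to <ough> is an
# illustrative assumption.
OUGH_SOUNDS = {
    "tough": "ʌf",
    "though": "oʊ",
    "through": "uː",
    "thought": "ɔː",
    "thorough": "oʊ",
}


def group_by_sound(sounds):
    """Group words by how their <ough> is pronounced."""
    groups = {}
    for word, sound in sounds.items():
        groups.setdefault(sound, []).append(word)
    return groups


groups = group_by_sound(OUGH_SOUNDS)
for sound, words in groups.items():
    print(f"/{sound}/: {', '.join(words)}")

print(f"{len(groups)} different pronunciations for one spelling")
```

Only “though” and “thorough” end up in the same group, which is the whole problem in miniature: five words, one spelling, four ways to say it.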
To every rule, there is some exception.  For every word like “photograph,” there are words like “haphazard” and “Stephen.”  Chore?  Choir and charade.  Singer?  Finger and angst.  Bureau?  Bureaucracy and beauty. Of course there are hundreds of homophones that are spelled (or spelt) differently but are pronounced the same, like write/rite/right, prince/prints, or soared/sword.  Those are decently common across languages.  It is quite rare, however, to find a language with anywhere near English’s number of heteronyms that are spelled the same and yet pronounced differently, such as having a gaping wound as opposed to being wound up, the bow of a ship or a bow and arrow.  It is not a moderate task to moderate discussions of English spelling.  Everyone knows about our silent “e,” but then there are things like silent “b” (debt, comb), silent “p” (psychology, pneumonia), silent “t” (castle, listen, soften, not to mention words that vary like “often”), silent “s” (island, debris), silent “l” (salmon, talk), and it just goes on and on like that.  It is genuinely pretty hellish, but are other languages any easier?  Are there even more horrific systems?  How does English orthography really stack up against the rest of the world’s written codes?  Well, first we need to survey the landscape and see what’s out there.
Before that, though, if you’re hearing the word “orthography” for the first time, it simply means the whole system of writing.  It comes from the Greek stems orthós-, meaning “standard,” “legal,” or “correct” (think “orthodox”), and -graphéin, meaning “to write.”  Linguists tend to use the term “orthography” rather than “spelling” for two main reasons: first and foremost, it makes us sound smarter and thus feel better about ourselves.  Second and almost equally important is the notion that a strict definition of “spelling” is relatively narrow in scope.  Spelling means the way in which words are written using letters and diacritics (which are accent marks like in résumé), and therefore doesn’t necessarily apply to all languages that have writing.  In contrast, orthography is a very broad term that encompasses many aspects of writing.  Here’s a simple graph:
Put simply, spelling is one part of orthography, but spelling is not everything you’d have to learn in trying to master English writing.  If I “spell out” some of the additional features that orthography covers, it might give a good idea of how daunting it could appear to would-be learners:
  • Punctuation and other unpronounced characters
    • Some languages don’t (or at least, didn’t) use punctuation at all, which might make an adjustment to a fully-punctuated system more taxing. Where there is punctuation, rules on usage are often quite complex, and they differ across languages, even between sister languages like French « Allons-y! » and Spanish «¡Vamos!»
    • Languages that use punctuation often use it to contrast meaningfully different phrases, like the famous “A woman without her man is useless” and “A woman: without her, man is useless.”  Those differences aren’t universal; they have to be learned
    • Emoticons and other non-letter characters like ellipses and dashes often carry meaning or perform important discourse functions, especially in informal contexts.  Anyone who has ever tried learning the stupefying array of Japanese emoticons knows the true meaning of despair. I mean, for crying out loud, there’s a specific emoticon for the act of playing volleyball! (/o^)/ °⊥ \(^o\)  When I first got that in a text message, I didn’t read that as it was intended: an invitation to go to the beach. I thought someone had to go to the hospital
  • Orientation
    • Just to list some examples, English, Russian, and Inuktitut are written horizontally from left to right
    • Arabic and Hebrew are written horizontally right to left
    • Classical Mongolian is written vertically left to right
    • Chữ Nôm (Old Vietnamese) is written vertically right to left
    • Modern Mandarin and Japanese are written in multiple different orientations depending on the format and level of formality
    • Egyptian Hieroglyphs were prototypically horizontal and right to left, but varied based on other stylistic factors and some characters were read in their own special order
    • There are yet other ways of orienting writing, too
  • Penmanship, scripts, and written styles
    • There are sometimes large differences between the way (or ways) in which things are written in the same language depending on the people, place, purpose, and time.  When people study a Chinese language, for example, they have to learn that type-font characters like 書道 can appear radically different in hand-written forms, but they have to recognize that the intent is the same.  Beginning learners of English are often confused by the Times New Roman “g” and “a/ɑ”
    • Block-print and cursive writing are other good examples in English, but even within cursive, there are different styles like Spencerian and D’Nealian, and there are often idiosyncrasies across languages
      • French cursive “1” often has the hook extending all the way down such that, to North American eyes, it looks almost like “Λ”
      • Japanese teachers of English were taught for many years to write “s” with a hook such that many still write it as “ʂ,” which is a different letter in other languages
  • Alternate characters and characters for special purposes
    • In English, accountants and mathematicians often write 0 (as well as 7 and z) with additional slashes to disambiguate or prevent fraud, but Ø is a separate letter in Danish and Norwegian, for example, so writers of those languages tend to write 0 with a dot in the center instead.  Similarly, many Japanese and Chinese legal documents write the numbers 1, 2, and 3 as “壱、弐、参” instead of the more general “一、二、三”
    • Roman numerals (like MMXII) are another part of our writing system that has to be learned in order to be fully literate
For some languages, parts of orthography fall into spelling, too, such as:
  • Spacing
    • Some languages, like modern Korean, put spaces between words; others, like Japanese and Chinese, do not
    • Some languages or language varieties that put spaces between words write compound words as one continuous sequence. German is probably the most famous example with its words like “Geschwindigkeitsbegrenzung,” which English spaces out into “speed limit”
    • English is wildly inconsistent on this one, though. We write fireman and highway in unbroken strings, fire truck and high school with spaces. Rollerskate, roller-skate, and roller skate are all well-attested in corpora, and even the snootiest of dictionaries often have multiple listings
  • Capitalization
    • Most writing systems are unicameral, meaning that they don’t have upper- and lower-case letters, but even closely-related languages whose scripts are bicameral often differ wildly in this respect; English doesn’t capitalize every noun anymore, but German still does
    • English is perhaps unique among languages in requiring that the pronoun “I,” but not “we,” “you,” or any other pronoun for that matter, is always capitalized
    • Conventions change over time. If you have an old version of Word (which is capitalized) and you run a spellcheck (no space), you might get prompted to capitalize words like “internet,” but that’s no longer the case (if you got that last pun, you can join me in tears of shame)
    • We can capitalize words mid-sentence for emphasis or to make a contrast, such as religion/Religion, or we can sometimes use ALL CAPS
    • French, English, and several other languages also have an interesting trend in the opposite direction, writing common acronyms in all lower-case (e.g. HIV/AIDS in French is “sida”; the word “radar” started as an acronym)
Even after all that, the most important, and in some ways the most obvious difference between the terms “orthography” and “spelling” is that “orthography” can refer to more types of grapheme systems. (A grapheme is just a unit of writing)  For example, it would be a little strange to talk about “spelling” for logographic writing systems like Chinese characters.  There are different ways to represent characters in Chinese, and to be sure, there are strict rules for how to write them with correct stroke order and such, but the characters don’t directly or consistently represent sounds per se.  In fact, there are quite a number of different types of written systems:
  • Alphabets like Georgian (there are actually three different Georgian alphabets) or Korean, where each character (or component part of the character) represents usually only one sound, although even the purest ones like Korean have exceptions and situational rules
    • English is alphabetic in a sense, but there are only two letters that consistently represent only one sound, <v> and <q>. Both of those represent sounds that are sometimes also written using other letters, and there are an increasingly large number of foreign loanwords that break even those patterns
  • Abjads like Arabic or Hebrew, where the consonant sounds are written but some or all of the vowels are left unspecified, and the reader has to figure it out from context
  • Syllabaries where each character represents a whole syllable, not just a sound, which can be further divided into:
    • Abugidas like Ge’ez in Ethiopia where relations between related syllables are shown by added marks or changes to the same base character.  For example, in Inuktitut, the language of the Inuit people in North America, /ki/, /ka/, /ku/ are written ᑭ, ᑲ, ᑯ
    • “Arbitrary syllabaries” like Cherokee or Japanese kana where there is no relationship between similar sounds. Using the same syllables, /ki/ /ka/ /ku/ in Japanese hiragana are written き,か,く
  • There are yet other types of absolutely crazy hybrid systems like Mayan that used logograms and syllabics and plentiful rebuses (think of things like “gr8” for “great”) all together in the same characters.  How Mayans were able to read without having an aneurysm is beyond me.
So after all that, there’s quite a lot to consider, really, when we try to compare written systems in terms of difficulty level.  In truth, any objective comparison is flat-out impossible.  How could we compare English to a system like Hong Kong Cantonese that has more than 20,000 characters in common usage, some of which are just brutal like 戲劇 (which means “drama”), but which are nevertheless regular, represent units of meaning and not always sound, are pronounced almost always in only one way, and are composed of a limited set of simpler parts?   Put simply, we can’t.  They’re completely different challenges.  So, instead, I’ve arbitrarily limited my list to languages that either use syllabaries or alphabets, I’ve kept the list to languages that are alive today (otherwise, in my opinion, Mayan is hands-down the most astonishingly complex orthography ever devised), and I’ve excluded exceedingly rare languages or scripts like Afaka.  The remaining list is, I think understandably, not perfect, but I challenge you to come up with a better one, haha!
Last note: I have a few runners-up.  First, Thai orthography is pretty rough.  There are several characters borrowed from Sanskrit that are pronounced the same as other Thai letters but which are not interchangeable.  Thai is also interesting in that it marks tone in writing.  Its Eastern neighbour, Khmer, from which large parts of Thai script are derived, is also highly complex and irregular, with lots of variability depending on the surrounding sounds.  Both of these are, however, not quite so irregular and sadistic as to make it into my subjectively-judged top 5.  The reason is simple, and also explains why English does make the list: spelling reform.
Most languages have undergone at least one, and often multiple spelling reforms, usually because the government or another authoritative body wants to standardize the language and modernize it.  Languages change over time, and pronunciation in particular is highly variable, but ink blots printed on a page don’t often change shape in response to social trends.  For French, the Académie Française was created in 1635, during the reign of Louis XIII, to standardize the language.  They publish dictionaries, make recommendations for school curricula, and have helped to rein in the chaos (note: rein, not reign, or rain).  The French language gets a lot of flak for words like “ils accueillent” where half the letters are silent, and then words like “lent” where most of those same letters are pronounced, but once you understand a few rules about grammatical categories, reading is not so bad.  Linguists call this a distinction between “encoding” (writing) and “decoding” (making sense of that writing, usually reading), and while French is difficult to encode, the decoding process is much more doable.  That doable-ness is largely thanks to the reforms enforced (sometimes even through violence!) by the French powers that be.  Khmer, Thai, Spanish, Russian, Italian, Swedish, Mandarin, Korean, Hindi, Mongolian, indeed a great many of the world’s languages have undergone systemic reforms to try to make the writing system more regular than it was before.  In fact, another “runner-up” for me would be Danish, which is a particularly interesting example since it’s so closely related to Swedish but hasn’t gone through the same kinds of spelling reforms.  As for English, there have been multiple attempts, but the only major influential spelling reform (Noah Webster’s) only succeeded in creating a chasm between two separate standards with equally absurd inconsistencies and idiosyncrasies.  Hopefully, that provides enough context and caveats.  Here, finally, are my top 5 crazy spelling systems:
5) Uyghur
  • Uyghur has four completely separate alphabets that are all standard in modern usage, and historically there have been quite a few others. It’s one of very few languages with a Persian-derived script to obligatorily represent vowels, and there are quite a few of them: /y/ /ɪ/ /ø/ /æ/ /ɑ/ /u/ /e/ /o/, but the real irregularities come from its many Chinese loanwords that are often quite difficult to distinguish from other words and follow their own set of phonological rules
4) Burmese
  • The fact that this language’s orthography doesn’t match up with its pronunciation is a point of pride for some Burmese nationals.  There are even different words for “written language” and “spoken language” that mark the distinction.  In many ways, literate Burmese people can be said to practice diglossia, the command of multiple different dialects for different functions.  The spelling system is more or less regular viewed from the inside, but its difficulties and irregularities come from being written in stone hundreds of years ago and far removed from what has happened with the language since then
3) Irish Gaelic
  • The language is growing after it had declined in previous generations, but the writing system reflects its diverse and chaotic history.  There are digraphs and complicated allophonic rules up the wazoo, such that the word for Prime Minister, Taoiseach, is pronounced /ˈt̪ˠiːʃəx/.  A combination of having no standard spelling until the mid-20th century and huge dialectal variation, especially in vowels, has created cases where the orthography is regular in some regions for some words, and in others for other words, but nowhere for all of them
2) English
  • Truly, English is among the craziest spelling systems in the world.  Ruth Shemesh and Sheila Waller’s book explains a great deal of the subtle regularities, and I highly recommend it, but even they can’t make sense of the huge number of exceptions that continue to grow by the day.  There are some theorists who believe that English is becoming more and more like a logographic system, where basically each word has to be memorized as one chunky symbol with component parts, rather than analyzing each word internally from left to right.  That’s not far off, and yet English, I think, still only gets the silver medal in my book
1) Japanese
  • Interestingly, many of the same historical reasons for English’s, shall we say, “diversity” of spelling rules are shared by Japanese: several chronologically disparate waves of mass importation of foreign words, several of which use entirely foreign writing systems that were only sometimes, and then only partially, regularized.  Japanese uses three largely independent writing systems together, or increasingly four if you include roma-ji, which many academics do because of words like t-shirt, “Tシャツ,” and acronyms.  Two of those systems are mostly faithful phonetic syllabaries, but there are exceptions like particles.  The third system has more than 2,000 characters in common usage each with numerous, largely unpredictable readings depending on when and where the word originated.  All three are commonly used together in the same sentence, and combinations of two are often together in the same word: サボる (to skip class), アメリカ的 (American-style).  Pitch accent, which is contrastive for most dialects, isn’t marked anywhere (e.g. the “three hashi”: 端、橋、箸), and all the while many other characters are written with several different variants despite no meaningful phonological or even semantic contrast in any dialect (again, I’ll use “hashi”: 橋、槗).  Like English, there are tons of heteronyms, sometimes among quite frequent words like 甘い(umai, delicious / amai, sweet), 辛い(karai, spicy / tsurai, painful), and then we get into proper nouns and fossilized expressions and what little regularity was left in the system breaks down completely
So, in my opinion, when we ask how bad English spelling is compared to other languages, the answer is: among the worst.  There are other, frighteningly complex systems out there, to be sure, but English finds a way to take its deceptively simple 26 letters and make the absolute most it can out of them.  As a final note, I think it’s important to say that, by “worst,” I don’t mean to say that we should look down on English orthography, or Japanese orthography for that matter.  In fact, in my mind, I could have equally replaced the “most difficult” in the title with “most interesting.”  Irish, English, Uyghur, Burmese, Japanese, and even French are fascinatingly rich and complex, and indeed in many ways I think they are tremendously valuable in their idiosyncrasies.  The spelling of a word in these languages contains an immense amount of information; we can know just by looking at a word like “know” that it came from Germanic roots, whereas a word like “ascertain” came from Latin through French, which in turn tells us more about the connotations of those words, the usage patterns, and the history of English-speaking people.  We’d have “no” way (or at least, no immediately apparent visual way) of doing that if we spelled no and know identically.  These “top 5” are a testament to society and language’s ability to evolve and thrive within the infinitely complex interactions of people and peoples, and while that does imply a lot of baggage, I for one am not upset about having all that stuff to cart around.  I like stuff.
So yes, English spelling is a handful, but it’s not alone, and that’s a good thing.  Perhaps, instead of denouncing others for their poor spelling of a choice few words, we should in fact celebrate the fact that people get so many other highly irregular, largely nonsense spellings correct!  (Like, for instance, “people”)  At the very least, if you deal with foreign learners of English, I hope you can sympathize (or even sympathise) with their struggle.  They truly do have it pretty bad.

in defence (with a ‘c’) of ranting

In responce to cajoling from my peers, I’m starting up a blog as a public location where I can hopefully kickstart some productive discussion and thinking about language, its use, and the teaching and learning thereof.  If nothing else, this blog would serve some purpoce by allowing me a cathartic outlet to my often irrational, often disproportionate reactions to the various happenstanses, positive and negative, I experiense.  I can’t promice any cogent thematic elements or a regular schedule of posting–such is not really the spirit of ranting, and vicariously such is not the intended spirit of this blog.  Ranting by definition entails the semi-spontaneous, often emotionally charged, admittedly often even egocentric expression of in-the-moment thoughts. When exactly those moments occur, and whether they will inspire me to seek solase through this blog, I couldn’t say.  But if past behaviour is even a remotely useful predictor, I might post rather often.

By now, hopefully you’ve notised the pattern of the italicised words.  I had originally planned a different entry altogether, but then when I started writing the title, I got caught up in the longstanding litigious business of “Defence v. Defense.”  Let’s clarify a few things on this one, sinse, as with all too many things concerning language, there is a great deal of misinformation out there, and a great many laymen who, despite knowing very little, argue very passionately for one side or the other.

The word’s etymology follows an extremely common pattern for English words.  It traces back to the Latin “defensum,” which meant “something forbidden, defended against.”  From there it entered French, where that connotation is still very strong (e.g. “défense de fumer” for “No smoking”), and eventually made its way into English.  In French it’s still spelled with an ‘s’ at the end, and in both English and French it has historically had quite a large variety of spellings: deffans, deffenz, desfens are all attested amongst the various langues d’oïl, whereas the Brits have used diffens, diffense, diffence, and difence.  That raises the question, though, of why Anglophones seemingly uniquely decided to spell the word with a ‘c’ instead of the ‘s’ that all the other Romanse languages were using.

As with pretty much everything else in English orthography, there’s a rather simple underlying logic that ended up causing a highly complicated and idiosyncratic pattern.  We use -s in English as an affix to mark both plural nouns and third person singular subject-verb agreement (as in one pen, two pens; I pen a novel, he/she pens a novel).  When we add -s to words that end in a nasal like ‘n,’ it typically is pronounced [z].  That’s unlike French or Latin, where the additional -s either isn’t pronounced at all, or when it is, it’s pronounced with the hard [s] sound.  In English, though, defense/diffens/diffense and their ilk don’t end with [nz].  At some point, English writers noticed that the overwhelming majority of words that ended in -ce were unambiguously pronounced with the [s] sound, and so for many of the potentially troublesome pairs, we came up with the idea to use -ce or -s in contrast to mark the word’s intended rendering.  See, for example, the following pairs:

pens / pence — ones / once — hens / hence — fens / fence

We also have singular nouns that end in [nz] like “lens,” and singular nouns like “dance” that end with the sound [ns].  There are additionally pairs like lands / lance, and that last one is particularly relevant.

In addition to the noun “defence,” English has the verb “to defend,” which becomes “he/she/it defends,” and now we have our pair: defends/defence.  (That “d” is almost imperceptible, and often we insert a [d] or [t] sound in between an [n] and [s]/[z] naturally as the result of the tongue’s movement between the two positions. When English people listen to the two words, we pay much more attention to the voicing ([s] v. [z]) than we do to the “presense” or “absense” of the ‘d’.)

So, problem solved! Right?  Now we have two clear spellings for two different sounds.  Not quite though.  We continued borrowing words from French and Latin, which almost exclusively used the ‘s’ version, like “tense,” “sense,” and “suspense,” and those don’t use the -ce innovation.  We add additional morphology to words, creating things like “defensive,” which not even the OED spells “defencive.”  So the distinction gets really blurred.  We end up making decisions on an almost word-by-word, morpheme-by-morpheme basis, which change over time.  Very famously, the American spelling of the word in question was influenced by Webster’s dictionary, where he came down on the side of -se.
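
The -s versus -ce idea, and the way borrowed -se words blur it, can be sketched in a few lines of Python. The function name `guess_final_sound` is hypothetical and the rule is deliberately naive, built only from the examples in this post; real English needs a pronouncing dictionary, not a suffix check.

```python
def guess_final_sound(word):
    """Guess whether a word ending in n + sibilant ends in [ns] or [nz].

    Naive rule from the discussion above: -nce is read [ns], while a bare
    -ns (usually the -s affix after a nasal) is read [nz]. Borrowed -nse
    words are also [ns], which is what blurs the contrast in spelling.
    """
    if word.endswith(("nce", "nse")):
        return "ns"
    if word.endswith("ns"):
        return "nz"
    return None  # the rule says nothing about other endings


for pair in [("pens", "pence"), ("hens", "hence"), ("fens", "fence")]:
    print(pair, [guess_final_sound(w) for w in pair])

# Two spellings, one sound: the rule cannot tell you which one to write.
print(guess_final_sound("defence"), guess_final_sound("defense"))
```

Going from spelling to sound, the rule holds up for these pairs. Going from sound to spelling, it gives no way to choose between “defence” and “defense,” which is where the word-by-word decisions come in.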

And so it remains today, where Americans usually spell the noun “defense,” and the rest of the English-speaking world sticks with “defence,” excepting instances where both groups are, understandably, confused.  As a proud Canuck, I’m going to stick with the “Canadian” spelling of the word, defence, insomuch as there even is such a thing as a unified Canadian spelling system.  (The careful reader, or at least the overly nationalistic reader, probably already noticed spellings like “behaviour” and “italicised”)

If someone starts going on about how their particular choice of “defence” or “defense” is in any way “better,” “more systematic,” or especially, as I was once told, “more academic” than the other, at least now you can know that it’s a bunch of self-righteous hot air.  Neither spelling rule really works out in the end, and it becomes, as with so much else in language, a matter of choice to conform to group norms, whether the group or the norm actually exists or only exists in the mind of the language user.  In my mind, that’s justification enough.  Excuse me for being defencive about it.