The hardest language in the world – what are we talking about when we talk about complexity?

I recently came across a post in a language-learning community that presented a list of the 10 hardest languages in the world. I don’t remember exactly which languages were on the list, but I do recall that all of them were well known, and that the list included Mandarin, Japanese and Danish. Sure, both Japanese and Mandarin work very differently from English in many domains, and use scripts (or a variety thereof, in the case of Japanese) that can take years to fully master. And sure, Danish phonology is a tough nut to crack by just about any standard. But surely the 10 hardest languages in the world wouldn’t, by some coincidence, all be among those well known to us?

I thought I could name 10 languages off the top of my head that were all demonstrably harder than the languages on the list, and flirted with the idea of sharing this in the comment section. I didn’t, because, as I immediately realised, my comment would be snowed under by other candidates for ‘world’s hardest language’ – it was in fact French, one commenter claimed, while another insisted it was Norwegian. As a rule I do not take part in internet discussions, especially when people claim Norwegian is the hardest language in the world. But it did give me the urge to clarify what constitutes a ‘hard’ language, and why establishing a realistic top ten of such languages would be extremely difficult, if not impossible. This post is about the difficulty with difficulty as a concept. While a number of readers were undoubtedly hoping for a simple and exciting answer, I claim that the question ultimately cannot be answered. I also argue that complexity is a more workable term than difficulty, and that (contrary to popular belief among linguists) some languages are indeed more complex than others.

What is difficulty?
There is a difference between difficulty, which is a tricky concept in and of itself, and overall complexity. Difficulty is more about language learning: how quickly one masters a second language ultimately depends on factors like talent, intelligence, memory, and so on. Moreover, the closeness of a language to your mother tongue in terms of vocabulary and structure matters as well. We at Fuzzy Grammar tend to communicate in English, but can easily speak our own languages to one another without ever having put much effort into learning each other’s language (we are Dutch and German respectively). For the average Turkish immigrant, however, learning Dutch or German is a daunting task. Likewise, Koreans are often baffled by how easy it is for Turkish immigrants to learn their language – Korean and Turkish are not demonstrably related, but do share many structural features such as SOV word order, strong agglutination and vowel harmony. Difficulty is thus relative to the learner. We should therefore not work with difficulty as a notion, but with the complexity of a language, which is a potentially absolute concept (note: Trudgill 2001: 371-2 does in fact, surprisingly, seem to collapse difficulty into complexity).

What is complexity?
According to John McWhorter, “an area of grammar is more complex than the same area in another grammar to the extent that it encompasses more overt distinctions and/or rules than another grammar” (McWhorter 2005: 45; 2001: 136-7). It can mean, for instance, that a language has many marked phonemes or complex phonological rules. English, for example, has three plural allomorphs (ignoring pairs like mouse-mice and ox-oxen for the sake of convenience): /ᵻz/, /z/ and /s/, depending on the preceding sound. Dutch only has two: /s/ and /ən/. English is therefore more complex in this domain than Dutch, but less complex than, say, Tiv, which has more than 10.

It can also be that a language has more syntactic rules than another. German and Dutch are known for their unusual alternation between verb-second (V2) word order in main clauses, which often surfaces as SVO, and SOV word order in subordinate clauses. In most languages, there is little or no difference in word order between main and subordinate clauses, so these languages are less complex in that particular domain than German and Dutch.

Another source of complexity is additional obligatory grammatical categories. Many South American languages have obligatory evidentiality markers, the choice of which depends on the information source of the expressed proposition. Any language with such evidentiality markers is more complex than a hypothetical language that is exactly the same, except for lacking these markers.

But how do we measure complexity? More analytically than by using simple comparisons, of course. There is no generally accepted metric for complexity, but there have been several attempts at devising one. McWhorter (2001) claims that creoles are simpler than non-creoles, but uses no metric and bases this primarily on a comparison between Tsez and Saramaccan. This is an unfortunate choice: while Saramaccan is arguably representative of most creoles, Tsez is a notoriously complex language. Parkvall (2008) uses a simple metric to measure the complexity of a large number of languages, partly as a response to McWhorter, and shows that it is indeed true that creoles are less complex than non-creoles. Nichols (2009) argues for a survey of complexity and sets out by proposing a rudimentary system that measures complexity.

The problem with measuring complexity is that complexity itself is a human concept and does not exist objectively in nature. Any attempt at measuring complexity therefore relies on our perception of the phenomenon, and no god-given, ’true metric’ really exists. While I do think complexity is a real thing, I think it is notoriously difficult to quantify. Take the example of Dutch vs. English plural marking, for instance. While English has three plural markers, Dutch has two, rendering English more complex. The English plural markers are very predictable, however: /ᵻz/ comes after sibilant consonants, /z/ after voiced non-sibilants and /s/ after voiceless non-sibilants. In Dutch the choice of plural marking is only partly predictable: /s/ comes after an unstressed syllable, /ən/ comes after a stressed syllable. But there are many exceptions to this rule, more so than is the case with the English plural. Does this make it more complex? Arguably it does. But how much more complex does it then become? How much complexity are such irregularities worth?
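
To make the predictability point concrete, the two default rules as stated above can be sketched in a few lines of Python. This is only a toy illustration: the function names and the small phoneme sets are assumptions made for the example, not a full phonological analysis, and the Dutch function deliberately ignores the many exceptions mentioned above.

```python
# Simplified, assumed phoneme classes for illustration only.
SIBILANTS = {"s", "z", "ʃ", "ʒ", "tʃ", "dʒ"}
VOICELESS_NON_SIBILANTS = {"p", "t", "k", "f", "θ"}


def english_plural_allomorph(final_sound: str) -> str:
    """Pick the regular English plural ending from the stem's final sound."""
    if final_sound in SIBILANTS:
        return "ᵻz"
    if final_sound in VOICELESS_NON_SIBILANTS:
        return "s"
    # Every other sound (voiced consonants and vowels) is a voiced non-sibilant.
    return "z"


def dutch_plural_allomorph(final_syllable_stressed: bool) -> str:
    """Default Dutch rule as stated in the text; exceptions are not modelled."""
    return "ən" if final_syllable_stressed else "s"


# Final sounds of 'bus', 'dog' and 'cat' respectively.
print([english_plural_allomorph(s) for s in ("s", "g", "t")])
```

The English function needs nothing but the final sound to cover every regular noun, whereas a faithful Dutch version would need an exception list on top of the stress rule. How to weigh such a list against one extra allomorph is precisely the open question.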

Are all languages equally complex?
In earlier times, it was not only assumed that some languages are more complex than others, it was also assumed that some languages are simply superior to others. This idea was particularly popular during the Romantic era, when scholars like Wilhelm von Humboldt and the Schlegel brothers considered a richness in forms (e.g. many case suffixes, rich verbal inflection) to be a sign of superiority, quite possibly due to the supposed superiority of Latin. Inherent to this mode of thought was the idea that languages reflect, or shape, the minds of their speakers. While scholars from each era were opposed to this idea (e.g. the Neogrammarians towards the end of the 19th century), it persisted for quite a while. Indeed, I recently read the introduction to an 1850s Zulu grammar book where it was stated that “like all backward cultures, Zulus disprefer sentences with many words”. We would at least expect the writer, a grammarian himself, to have acknowledged the fact that Zulu is a highly agglutinative language with heavily inflected verbs, and therefore does not need as many words as English does, as most words themselves are laden with complexity.

It was not until the 20th century, likely due to a considerable (though unfortunately far from complete) decrease in racism, that the opposite idea became popular – all languages are equally complex, just as all humans are equal. In fact, it was the descriptivists in America, who were prominent during the first two thirds of the twentieth century, who assumed that languages did not differ in complexity. As Franz Boas famously said, “There are oral languages, but no primitive languages. Each is complex in its own way”. On what grounds they believed this to be true is unclear, but it is easy to see why the idea is attractive; the descriptivists saw language as a reflection of its speakers’ minds, and to state that some languages are more complex than others is then easily interpreted as some cultures being inherently backward. The idea of uniformitarianism has been taken for truth ever since, but since no one has (to my knowledge) ever convincingly shown it to be true, it is worth investigating.

Let’s consider Boas’ statement and the idea behind it first. If we assume language reflects the mind of its speakers, it’s no wonder we wouldn’t want to say anything about differences in complexity between languages. Although I consider language to be a window into the human mind, I don’t think it’s fair to say that languages completely reflect the human mind. Architecture and dance are also windows into the human mind, but we are completely okay with saying some people build less complex houses or have more complex dance routines than others. Besides, primitiveness is not the same as reduced complexity; it is clear that all languages are able to fulfil their function, but this is not the same as saying every language is equally complex. Thus there is a minimum amount of complexity needed for communication, but there is no reason why languages could not be more complex than that.

A more linguistically grounded argument is known as the trade-off: high complexity in one language domain would lead to simplification in another. Languages like Finnish or Latin with elaborate case systems also have freer word order; a reduction of complexity in syntax is thus ‘bought’ by means of complex morphology.

An interesting contribution to the notion of trade-off comes from Bisang (2014). He demonstrates that languages can be covertly complex, which means that what is not expressed by overt forms has to be inferred. Bisang shows, for instance, that in the English sentence ‘the farmer kills the duckling’ (i) definiteness, (ii) number, (iii) tense, and (iv) agreement are obligatory, whereas in Mandarin none of these are. In Mandarin, therefore, this information has to be inferred rather than expressed. This dichotomy between overt and hidden complexity also involves the hearer, and is quite a well-known concept in linguistics; in Optimality Theory it is called faithfulness vs. markedness, in typology it is known as economy vs. iconicity, and it is also captured by Levinson’s (2000) famous “inference is cheap, articulation is expensive”. Another example of this is the way languages mark plurality; a language can mark both plural and singular overtly, mark just the plural (there are very few languages marking only the singular), or never mark number. Obviously, the last option is efficient and the least complex structurally, but it potentially leaves the semantics of number open to interpretation (to the extent that this is not clear from the context, or from quantifiers such as ‘two’ or ‘some’). The first option seems redundant to speakers of most languages: the mere presence or absence of a single marker distinguishes between two options about as well as two separate markers do, and marking both is less economical because it requires additional material to be stored and uttered.

Another interesting addition comes from Hawkins’ (2004: 16-7, passim) efficiency principle Minimise Forms (MiF). Hawkins shows that languages vary in how many semantic roles may be assigned to a certain form. In English, for instance, a subject in the nominative form (that is, not accompanied by a preposition, and preceding the verb) may encode a theme (e.g. ‘the book sold a thousand copies’), an instrument (e.g. ‘the key opened the door’), or a location (e.g. ‘this tent sleeps four’). While in German these all need to be accompanied by prepositions that encode a certain semantic role, in English these examples do not need elements to introduce them, so this is arguably a reduction in complexity. Yet we are dealing with a complex mapping of semantic roles onto a single type of constituent (the subject) here. This was also concluded by Müller-Gotama, who investigated the transparency of subjects and objects in terms of which semantic roles they can take. He found that the less case marking a language had, the less transparent an argument generally was (Müller-Gotama 1994: 143). There is thus definitely some sort of trade-off here too.

What about this trade-off then?
As the previous section shows, there are direct trade-offs in languages in the sense that simplification in one domain results in complexification in another. There are a number of problems, however. For starters, even though we can demonstrate a trade-off, we stumble upon the same quantification problems mentioned earlier. As Sampson (2009) said:

“Consider for instance Archi, spoken by a thousand people in one village 2,300 metres above sea level in the Caucasus. According to Aleksandr Kibrik [..], an Archi verb can inflect into any of about 1.5 million contrasting forms. English is said to be simple morphologically but more complex syntactically than some languages; but how much syntactic complexity would it take to balance the morphology of Archi? – and does English truly have that complex a syntax? Relative to some languages I know, English syntax as well as English morphology seems to be on the simple side.” (Sampson 2009: 3)

I agree with Sampson that although there is some trade-off, it’s definitely not true that this trade-off is absolute; there are languages that are simple in all domains, and there are languages that are more complex in all domains. So while McWhorter (2001) did not unequivocally demonstrate that creoles are necessarily simpler than non-creoles, he did demonstrate that one language can be more complex than another in all domains.

Secondly, there also seem to be traits in languages that are mainly (if not only) complexifying, without resulting in reduced complexity in another domain. A good example of this is the gender systems found in many European languages. While these arguably served a semantic classifying function at some point in history, today’s remnants do not, and merely serve to complexify a language. Another example is the alternation between regular and irregular plurals, such as moose vs mooses and goose vs geese. When this alternation emerged it served an economical function; nowadays it is merely an irregularity. It therefore also seems to be the case that as speakers of a language we are left with the remnants of previous speakers’ striving for economy without gaining anything ourselves.

Thirdly, Bisang’s (2014) principle of hidden complexity sounds good, but it is difficult to demonstrate to what degree pragmatic inference actually takes place. It is tempting to say that a language that does not have a plural marker leaves the hearer guessing as to what number the speaker refers to, but it is not clear that number is as relevant to a speaker of such a language as it is for an English speaker. And that’s a crucial point: when a language lacks an English-type category we say this has to be inferred pragmatically, but when English lacks a non-English-type category we would not. English, for instance, lacks evidential markers, but we wouldn’t say we pragmatically infer where a speaker got their information.

The idea of this essay was to demonstrate, within the space of a blog post, what we mean by difficulty and complexity, and how difficult it is to determine these. I also tried to show that there is indeed such a thing as complexity, but that it is notoriously difficult to quantify and to show exactly how much more complex some languages are than others. While it is difficult enough to measure complexity, I also argued that showing that every language displays exactly the same (somehow operationalised) amount of it is near impossible. So unless we can quantify complexity in all domains, and show that languages have an inherent complexity equilibrium, it seems that equicomplexity is more of an intuitive, well-meant idea than one we can actually work with.

As a bonus, here are 5 very complex languages, in at least one domain:

- Georgian. According to some linguists, Georgian has the most complex inflection system in the world. The phonology is no walk in the park either.
- Iau. This Papuan Lakes Plain language has eight different tones that are lexically contrastive on nouns but which denote tense and aspect on verbs, on which they can also be combined.
- !Xóõ. This Khoisan language, according to most analyses, holds the record for the most consonants and the most clicks in the world. In addition, vowels can be plain, nasal, murmured, glottalised and strident.
- Bella Coola. A Salish language known for immensely complex morphology and extreme consonant clusters. The famous sequence [xɬpʼχʷɬtʰɬpʰɬːskʷʰt͡sʼ] ‘he has in his possession a bunchberry plant’ is from Bella Coola.
- Archi. This Northeast Caucasian language really has it all. It has well over 70 consonants (as always, depending on the analysis), the vowels can be short, long and pharyngealised, and there are two tones. In addition, verbs can inflect into over a million forms (see above), and for nouns there is number and there are 4 genders (although not marked on the noun itself), 10 cases and 5 ‘locative cases’ which can all take one of the 6 directional affixes.

Sources
Bisang, W. (2014). Overt and Hidden Complexity: Two types of Complexity and their Implications. In Poznań Studies in Contemporary Linguistics 50 (2). 127–143.

Hawkins, J. (2004). Efficiency & Complexity in Grammars. Oxford: Oxford University Press.

McWhorter, J. (2001). The World’s Simplest Grammars are Creole Grammars. In Linguistic Typology 5. 125-166.

McWhorter, J. H. (2005). Defining Creole. Oxford: Oxford University Press.

Nichols, J. (2009). Linguistic complexity: a comprehensive definition and survey. In Sampson, G., Gil, D. & Trudgill, P. (eds.) Language Complexity as an Evolving Variable. Oxford: Oxford University Press. 111-125.

Müller-Gotama, F. (1994). Grammatical relations: a cross-linguistic perspective on their syntax and semantics. Berlin: de Gruyter.

Parkvall, M. (2008). The simplicity of creoles in a cross-linguistic perspective. In Miestamo, M., Sinnemäki, K., & Karlsson, F. (eds.) Language Complexity. Typology, contact, change. 265-286.

Sampson, G. (2009). A linguistic axiom challenged. In Sampson, G., Gil, D. & Trudgill, P. (eds.) Language Complexity as an Evolving Variable. Oxford: Oxford University Press. 1-18.

Trudgill, P. (2001). Contact and simplification: historical baggage and directionality in linguistic change. In Linguistic Typology 5: 371–3.

The Lingvistiska Samlarbilder were taken from Mikael Parkvall’s collection. We kindly thank him for permission to use them.