Brevity law

From Wikipedia, the free encyclopedia

In linguistics, the brevity law (also called Zipf's law of abbreviation) is a linguistic law stating, qualitatively, that the more frequently a word is used, the shorter it tends to be, and conversely, the less frequently a word is used, the longer it tends to be.[1] It is a statistical regularity found in natural languages and in other natural systems, and it has been proposed as a general rule.

The brevity law was originally formulated by the linguist George Kingsley Zipf in 1945 as a negative correlation between the frequency of a word and its size. Analyzing a written corpus of American English, he showed that average word length, measured in phonemes, fell as frequency of occurrence increased. Similarly, in a Latin corpus, he found a negative correlation between the number of syllables in a word and its frequency. The observation implies that the most frequent words in a language tend to be the shortest; for example, the most common words in English are: the, be (in its different forms), to, of, and, a; each containing one to three phonemes. Zipf claimed that this law of abbreviation is a universal structural property of language, and hypothesized that it arises from individuals optimising form-meaning mappings under competing pressures to communicate accurately but also efficiently.[2][3]

Since then, the law has been empirically verified for almost a thousand languages from 80 linguistic families, for the relationship between the number of letters in a written word and its frequency in text.[4] The brevity law appears to be universal and has also been observed acoustically, when word size is measured as word duration.[5] Evidence from 2016 suggests that it also holds in the acoustic communication of other primates.[6]

Log per-million word count as a function of word length (number of characters) in the Brown Corpus, illustrating Zipf's brevity law.
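
The negative correlation between word frequency and word length described above can be checked on any text. The following is a minimal sketch, not Zipf's own procedure: it assumes word length is measured in letters, tokenizes naively on runs of the letters a to z, and uses Spearman's rank correlation as one possible measure of association. The function names are hypothetical and chosen for illustration only.

```python
from collections import Counter
import re


def word_frequencies(text):
    """Count lowercase alphabetic tokens (a naive tokenizer, assumed here)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(xs, ys):
    """Pearson correlation; undefined (division by zero) if either input is constant."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def frequency_length_correlation(text):
    """Spearman correlation between frequency and length over distinct words."""
    freqs = word_frequencies(text)
    words = list(freqs)
    counts = [freqs[w] for w in words]
    lengths = [len(w) for w in words]
    # Spearman's rho is the Pearson correlation of the two rank vectors.
    return pearson(average_ranks(counts), average_ranks(lengths))


# Toy lexicon in which frequency falls exactly as length grows.
toy_text = "a a a a bb bb bb ccc ccc dddd"
print(frequency_length_correlation(toy_text))  # → -1.0
```

On a real corpus the coefficient would be expected to be negative but well short of -1, since the law is a statistical tendency rather than a strict ordering.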

The origin of this statistical pattern appears to be related to optimization principles, arising from a trade-off between two major constraints: the pressure to reduce the cost of production and the pressure to maximize transmission success. This idea is closely related to the principle of least effort, which postulates that efficiency selects a path of least resistance or "effort". The principle of reducing production cost may also be related to principles of optimal data compression in information theory.[7]
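
The link to optimal data compression can be illustrated with Huffman coding, a standard construction of an optimal prefix code in which more frequent symbols receive codewords that are no longer than those of rarer symbols. The sketch below is an illustration chosen here, not a method taken from the cited work; the function name and the word counts are hypothetical.

```python
import heapq
from itertools import count


def huffman_code_lengths(freqs):
    """Return {symbol: codeword length in bits} for a Huffman code."""
    if not freqs:
        return {}
    if len(freqs) == 1:
        # A single symbol still needs one bit to be transmitted.
        return {next(iter(freqs)): 1}
    tiebreak = count()  # unique counter so heap entries never compare lists
    heap = [(f, next(tiebreak), [sym]) for sym, f in freqs.items()]
    heapq.heapify(heap)
    lengths = {sym: 0 for sym in freqs}
    while len(heap) > 1:
        # Merge the two least frequent subtrees; every symbol inside
        # them moves one level deeper, so its codeword grows by one bit.
        f1, _, syms1 = heapq.heappop(heap)
        f2, _, syms2 = heapq.heappop(heap)
        for sym in syms1 + syms2:
            lengths[sym] += 1
        heapq.heappush(heap, (f1 + f2, next(tiebreak), syms1 + syms2))
    return lengths


# Hypothetical counts for five words, from very frequent to rare.
counts = {"the": 8, "of": 4, "language": 2, "abbreviation": 1, "optimisation": 1}
print(huffman_code_lengths(counts))
# → {'the': 1, 'of': 2, 'language': 3, 'abbreviation': 4, 'optimisation': 4}
```

In this toy case the most frequent word receives a 1-bit codeword and the rarest words receive 4-bit codewords, mirroring the frequency and length relationship that the brevity law describes for natural vocabularies.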

References

  1. ^ Zipf GK. 1949 Human behavior and the principle of least effort. Cambridge, MA: Addison-Wesley
  2. ^ Zipf GK. 1935 The Psychobiology of language, an introduction to dynamic philology. Boston, MA: Houghton–Mifflin
  3. ^ Zipf GK. 1949 Human behavior and the principle of least effort. Cambridge, MA: Addison-Wesley
  4. ^ Bentz C, Ferrer-i-Cancho R. 2016 Zipf's Law of abbreviation as a language universal. Universitätsbibliothek Tübingen.
  5. ^ Tomaschek F, Wieling M, Arnold D, Baayen RH. 2013 Word frequency, vowel length and vowel quality in speech production: an EMA study of the importance of experience. In Proc. of the 14th Annual Conf. of the International Speech Communication Association (INTERSPEECH 2013), Lyon, France, 25–29 August (eds F Bimbot et al.), pp. 1302–1306
  6. ^ Gustison ML, Semple S, Ferrer-i-Cancho R, Bergman TJ. 2016 Gelada vocal sequences follow Menzerath's linguistic law. Proc. Natl Acad. Sci. USA 113, E2750–E2758
  7. ^ Kanwal J, Smith K, Culbertson J, Kirby S. 2017 Zipf's Law of abbreviation and the principle of least effort: language users optimise a miniature lexicon for efficient communication. Cognition 165, 45–52. (doi:10.1016/j.cognition.2017.05.001)