Diacritic: Difference between revisions

From Citizendium
imported>Domergue Sumien
imported>Ro Thorpe
mNo edit summary
 
(51 intermediate revisions by 2 users not shown)
Line 1: Line 1:
{{subpages}}
{{subpages}}
A '''diacritic''' or '''diacritic(al) mark''' or '''diacritic(al) sign''', in several [[writing system]]s,  is a little sign added on a character, modifying slightly this character, in order to give any information about the pronunciation or, sometimes, in order to distinguish a word from another word. For instance: the character '''e''' becomes '''é''', '''c''' becomes '''č''', '''o''' becomes '''ø''', '''s''' becomes '''ș''', '''ω''' becomes '''ώ''', '''и''' becomes '''й''', '''nh''' becomes '''n·h'''.
{{TOC|right}}
A '''diacritic''' or '''diacritic'''('''al''')''' mark''' or '''diacritic'''('''al''')''' sign''', in several [[writing system]]s, is a little sign added on to a character, modifying it slightly, in order to give some information about its pronunciation or, sometimes, in order to distinguish one word from another. For instance: the character '''e''' becomes '''é''', '''c''' becomes '''č''', '''o''' becomes '''ø''', '''s''' becomes '''ș''', '''nh''' becomes '''n·h''', '''ω''' becomes '''ώ''', '''и''' becomes '''й''', '''د''' becomes '''دّ'''.
 
A letter with a diacritic is called a '''modified letter'''.


==Concerned writing systems==
==Concerned writing systems==
Diacritics may occur in most writing systems.  
Diacritics may occur in most writing systems.  
* Some diacritics are unique to one writing system. For instance, the diacritic called [[shadda]], indicating that a consonant is geminate (doubled), is typical of the [[Arabic alphabet]]: د ''(d)'' with a shadda becomes دّ ''(dd)''.
* Some diacritics are unique to one writing system. For instance, the diacritic called [[shadda]], indicating that a consonant is geminate (doubled), is typical of the [[Arabic alphabet]]: '''د''' ''(d)'' with a shadda becomes '''دّ''' ''(dd)''.
* Several diacritics may be shared by different but resembling writing systems. It is notably the case for the [[Roman alphabet|Roman]], the [[Greek alphabet|Greek]] and the [[Cyrillic alphabet|Cyrillic]] alphabets, which can share the [[acute accent]] (´) and the [[dieresis]] (¨).
* Several diacritics may be shared by different but resembling writing systems. It is notably the case for the [[Roman alphabet|Roman]], the [[Greek alphabet|Greek]] and the [[Cyrillic alphabet|Cyrillic]] alphabets, which can share the [[acute accent]] (´) and the [[dieresis]] (¨).


==Examples of diacritics==
==Examples of diacritics==
Main diacritics found in the [[Roman alphabet|Roman]], [[Greek alphabet|Greek]] and [[Cyrillic alphabet|Cyrillic]] alphabets:
===[[Roman alphabet]]===
*[[accent]]
*[[accent]]
**[[acute accent]] (´): '''á, ć, é, ǵ, í, ń, ó, ŕ, ś, ú, ẃ, ý, ź'''...
**[[acute accent]] '''(´)''': '''á, ć, é, ǵ, í, ń, ó, ŕ, ś, ú, ẃ, ý, ź'''...
**[[grave accent]] (`): '''à, è, ì, ò, ù, ẁ, ỳ'''...
**[[grave accent]] '''(`)''': '''à, è, ì, ò, ù, ẁ, ỳ'''...
**[[double acute accent]] ( ˝ ): '''ő, ű'''...
**[[double acute accent]] '''( ˝ )''': '''ő, ű'''...
**[[circumflex accent]] ( ˆ ): '''â, ĉ, ê, ĝ, ĥ, î, ĵ, ô, ŝ, û, ŵ, ŷ, ẑ'''...
**[[circumflex accent]] '''(<sup>^</sup>)''': '''â, ĉ, ê, ĝ, ĥ, î, ĵ, ô, ŝ, û, ŵ, ŷ, ẑ'''...
*[[breve]] ( ˘ ): '''ă, ĕ, ğ, ĭ, ŏ, ŭ'''...
*[[breve]] '''( ˘ )''': '''ă, ĕ, ğ, ĭ, ŏ, ŭ'''...
*[[caron]] or [[haček]] ( ˇ ): '''č, ď (Ď), ě, ǧ, ň, ř, š, ť (Ť), ž'''...
*[[caron]] or [[haček]] '''( ˇ )''': '''č, ď (Ď), ě, ǧ, ň, ř, š, ť (Ť), ž'''...
*[[dieresis]] or [[umlaut]] (¨): '''ä, ë, ï, ö, ü, ÿ'''...
*[[dieresis]] or [[umlaut]] '''(¨)''': '''ä, ë, ï, ö, ü, ÿ'''...
*[[macron]] ( ¯ ): '''ā, ē, ī, ō, ū, ȳ'''...
*[[macron]] '''( ¯ )''': '''ā, ē, ī, ō, ū, ȳ'''...
*[[cedilla]] ( ¸ ): '''ç, ş'''...
*[[cedilla]] '''( ¸ )''': '''ç, ş'''...
*[[comma]] (,): '''ģ (Ģ), ķ, ļ, ņ, ș, ț'''...
*[[comma]] '''(,)''': '''ģ (Ģ), ķ, ļ, ņ, ș, ț'''...
*[[ogonek]] or [[nosinė]] ( ˛ ): '''ą, ę, į, ǫ, ų'''...
*[[ogonek]] or [[nosinė]] '''( ˛ )''': '''ą, ę, į, ǫ, ų'''...
*[[dot]]
*[[dot]]
**[[overdot]] (  ̇ ): '''ċ, ė, ż'''...
**[[overdot]] '''(  ̇ )''': '''ċ, ė, ġ, ż'''...
**[[underdot]] (&nbsp;  ̣ ): '''ạ, ḍ, ẹ, ḥ, ị, ọ, ṣ, ṭ, ụ, ẓ'''...
*** Note that the dot over '''i''' and '''j''' is not a diacritic mark and doesn't occur on uppercases ('''I, J'''). However, several [[Turkic languages]] distinguish a “dotted '''i'''” including the uppercase ('''İi''') and  “dotless '''ı'''” including the uppercase ('''Iı''').
**[[interpunct]] (·): '''ch·, g·, l·l, n·h, s·h'''...
**[[underdot]] '''(&nbsp;  ̣ )''': '''ạ, ḍ, ẹ, ḥ, ị, ọ, ṣ, ṭ, ụ, ẓ'''...
*[[hook]] or [[dấu hỏi]] (  ̉ ): '''ả, ɓ (Ɓ), ƈ, ɗ (Ɗ), ẻ, ƒ (Ƒ), ɠ (Ɠ), ỉ, ƙ (Ƙ), ɱ (Ɱ), ŋ (Ŋ), ỏ, ƥ (Ƥ), ƭ (Ƭ), ủ , ʋ (Ʋ), ⱳ, ỷ, ƴ, ȥ'''...
**[[interpunct]] '''(·)''': '''ch·, g·, l·l, n·h, s·h'''...
*[[horn]] or [[dấu móc]] (  ̛ ): '''ơ, ư'''...
*[[hook]] or [[dấu hỏi]] '''(  ̉ )''': '''ả, ɓ, ƈ, ɗ, ẻ, ƒ, ɠ, ỉ, ƙ, ŋ, ỏ, ƥ, ƭ, ủ , ʋ, ⱳ, ỷ, ƴ, ȥ'''...
*[[horn]] or [[dấu móc]] '''(  ̛ )''': '''ơ, ư'''...
*[[ring]]
*[[ring]]
**[[ring above]] or [[kroužek]] ( ˚ ): '''å, ů'''...
**[[ring above]] or [[kroužek]] '''( ˚ )''': '''å, ů'''...
**[[ring below]] ( ˳ ): '''ḁ'''...
**[[ring below]] '''( ˳ )''': '''ḁ'''...
*[[tilde]] (&nbsp; ̃ ): '''ã, ẽ, ĩ, ñ, õ, ũ'''...
*[[tilde]] '''(<sup>~</sup>)''': '''ã, ẽ, ĩ, ñ, õ, ũ'''...
*[[apostrophe]] (’): '''c’h, g’, o’'''...
*[[apostrophe]] '''(’)''': '''c’h''', '''ľ''', '''’s'''...
*[[stroke]] (/): '''ð (Ð), đ (Đ), ħ (Ħ), ł, ø'''...
*single opening [[quotation mark]] '''(‘)''': '''g‘, o‘'''...
*[[rough breathing]] or [[dasia]] (  )
*[[stroke]] '''(/)''': '''ð, đ, ħ, ł, ø'''...
*[[smooth breathing]] or [[psili]] (  ᾿ )
 
===[[Greek alphabet]]===
*[[accent]]
**[[acute accent]] '''(´)''': '''ά, έ, ή, ί, ό, ύ, ώ'''
**[[grave accent]] '''(`)''': '''ὰ, ὲ, ὴ, ὶ, ὸ, ὺ, ὼ'''
**''[[perispomene]]'', indifferently shaped like a Roman [[circumflex accent]] '''(<sup>^</sup>)''' or like a Roman [[tilde]] '''(<sup>~</sup>)''': '''ᾶ, ῆ, ῖ, ῦ, ῶ'''
*[[dieresis]] '''(¨)''': '''ϊ, ϋ'''
*[[iota subscript]] '''(ͺ)''': '''ᾳ, ῃ, ῳ'''
*[[breathing]]
**[[smooth breathing]] or [[psili]], resembling an apostrophe '''( ᾿ )''': '''ἀ, ἐ, ἠ, ἰ, ὀ, ὐ, ὠ, ῤ'''
**[[rough breathing]] or [[dasia]], resembling a right-oriented apostrophe '''( ῾ )''': '''ἁ, ἑ, ἡ, ἱ, ὁ, ὑ, ὡ, ῥ'''
Since 1982, diacritics have been simplified in modern [[Greek language|Greek]]: only the acute accent '''(´)''' and the dieresis '''(¨)''' are still mandatory.


==Modified letters==
==Status of modified letters==
A letter with a diacritic is called a ''modified letter''.  
A letter with a diacritic is called a ''modified letter''.  
* In some languages, a modified letter (with a diacritic) is considered as a simple variant of the basic letter (without diacritic). For instance, in Portuguese, ''ç'' is nothing but a variant of the letter ''c''.
* In some languages, a modified letter (with a diacritic) is considered as a simple variant of the basic letter (without diacritic). For instance, in Portuguese, ''ç'' is nothing but a variant of the letter ''c''.
Line 44: Line 59:
The quantity and the frequency of diacritics may differ.
The quantity and the frequency of diacritics may differ.


* Some languages have no diacritics at all in the current use. It is notably the case of [[English language|English]] and [[Malay language|Malay]] (although some diacritics may be seen in some borrowings, as in English ''café'' or ''cafe'', a word of French origin).
* A few languages have no diacritics at all in the general use. It is notably the case of [[English language|English]] and [[Malay language|Malay]] (although some diacritics may be used optionally in some borrowings, as in English ''café'' or ''cafe'', from French ''café'').


* A lot of languages use diacritics, which frequency varies a lot according to the language in question. For instance, diacritics are quite rare in [[Dutch language|Dutch]], which uses only ''ë'', and in [[Italian language|Italian]], which uses mainly ''à, è, é, ì, ò, ù''. On the opposite, other languages use a lot of different diacritics, sometimes placed on nearly each sentence or on nearly each word, as in [[Vietnamese language|Vietnamese]] or in classical [[Greek language|Greek]].
* A lot of languages use diacritics, which frequency varies a lot according to the language in question.  
** For instance, diacritics are quite rare in [[Dutch language|Dutch]], which uses sometimes ''ë'' (and rarely ''ä, ö, ï, ü''), and in [[Italian language|Italian]], which uses sometimes ''‑à, ‑è, ‑é, ‑ì, ‑ò, ‑ù'' at word ending.  
** On the opposite, other languages use a lot of different diacritics and may place them on nearly each word, as in [[Greek language|Greek]], [[Slovak language|Slovak]] or [[Czech language|Czech]], or even on each syllable, as in [[Vietnamese language|Vietnamese]] or [[Yoruba language|Yoruba]].


==Mandatory or optional uses==
==Diacritic affecting two characters==
Diacritics may be mandatory or optional, depending on the language in question.
In general, a diacritic affects one character.  


===Pedagogical use===
In a few languages, however, a diacritic may modify a group of letters, for instance:
Some languages use certain diacritics only as a pedagogical help and remove them in general use. For instance, [[Russian language|Russian]] only uses the [[acute accent]] (´) in learner-oriented publications, in order to show the place of the stress.
*The dieresis in [[Spanish language|Spanish]] ('''gu → gü''').
*The stroke in [[Maltese language|Maltese]] ('''gh → għ''').
*The cedilla in [[Manx language|Manx]] ('''ch → çh''').
*The interpunct in [[Francoprovençal language|Francoprovençal]] ('''ch → ch·''').
It occurs especially when the diacritic is placed between two letters, for example:
*The apostrophe in [[Breton language|Breton]] ('''ch → c’h''').
*The interpunct in [[Catalan language|Catalan]] ('''ll → l·l''') and in [[Occitan language|Occitan]] ('''nh → n·h''', '''sh → s·h''').


===Diacritics on uppercases===
==Diacritics avoided on uppercases==
In the writing systems which distinguish [[uppercase]] and [[lowercase]] letters, a few languages tend to use diacritics in general writings where lowercases and uppercases are mixed, but supress certain diacritics in all-uppercase sequences. This is a rule in [[Greek language|Greek]]; this is a frequent but nonstandard use in [[Spanish language|Spanish]] and [[French language|French]]: Greek ''νερό'' (''nero'', “water”) becomes ''ΝΕΡΟ'', Spanish ''águila'' (“eagle”) becomes ''ÁGUILA'' or less correctly ''AGUILA'', French ''côté'' (“side”) becomes ''CÔTÉ'' or less correctly ''COTE''.  
A few languages tend to avoid certain diacritics attached to [[uppercase]] letters, under certain circumstances.
*When a word begins with an uppercase (the rest of the word being in lowercase):
**In [[Greek language|Greek]], a diacritic is put above a lowercase but goes on the upper left side of an initial uppercase: ''ύφαλος'' (''ýfalos'') “underwater reef” becomes ''Ύφαλος''.
**Some users of [[French language|French]] remove diacritics on initial uppercases, but this is nonstandard: ''école'' “school” becomes ''École'' or less correctly ''Ecole''.
*In all-uppercase writings:
**In [[Greek language|Greek]], diacritics are removed in all-uppercase writings: ''ύφαλος'' (''ýfalos'') “underwater reef” becomes ''ΥΦΑΛΟΣ'', ''νερό'' (''neró'') “water” becomes ''ΝΕΡΟ''. However, the dieresis (¨) remains in all cases: ''Ταΰγετος'' (''Taÿ́getos'') “Taygetus” becomes ''ΤΑΫΓΕΤΟΣ''.
**In [[Spanish language|Spanish]] and [[French language|French]], some users remove diacritics in all-uppercase writings, but this is nonstandard: Spanish ''águila'' “eagle” becomes ''ÁGUILA'' or less correctly ''AGUILA'', French ''séquençage'' “sequencing” becomes ''SÉQUENÇAGE'' or less correctly ''SEQUENCAGE''. However, the Spanish tilde (<sup>~</sup>) remains in all cases: ''España'' “Spain” becomes ''ESPAÑA'' (never ''ESPANA*'').
**In [[italian language|Italian]], the only usual diacritic is an [[acute accent|acute]] or a [[grave accent]] at word ending. This accent may be replaced by an [[apostrophe]] on the upper right side of the last letter, but this is nonstandard: ''libertà'' “freedom” becomes ''LIBERTÀ'' or less correctly ''LIBERTA'''’'''''.


Some users of French keep diacritics on all-uppercase sequences but remove them on initial uppercases when followed by lowercases, so ''école'' “school” becomes ''École'' or less correctly ''Ecole''.
==Optional diacritics for pedagogical use==
 
Some languages use certain diacritics only as a pedagogical help and remove them in general use. For instance, [[Russian language|Russian]] only uses the [[acute accent]] (´) in learner-oriented publications, in order to show the place of the stress.
In [[italian language|Italian]], the only frequent diacritics are an [[acute accent]] and sometimes a [[grave accent]] on the last letter of a word. This accent may be replaced by an [[apostrophe]] at the right of the last letter, especially in all-uppercase sequences, but this is nonstandard: ''libertà'' (“freedom”) becomes ''LIBERTÀ'' or less correctly ''LIBERTA’''.

Latest revision as of 13:24, 11 November 2012

This article is a stub and thus not approved.

A diacritic, also called a diacritic(al) mark or diacritic(al) sign, is a small sign added to a character in several writing systems. It modifies the character slightly, either to give information about its pronunciation or, sometimes, to distinguish one word from another. For instance: the character e becomes é, c becomes č, o becomes ø, s becomes ș, nh becomes n·h, ω becomes ώ, и becomes й, د becomes دّ.

A letter with a diacritic is called a modified letter.
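
The relation between a modified letter and its basic letter can be illustrated with a short Python sketch. It assumes Unicode canonical decomposition (the standard library's `unicodedata` module), which is a choice made for this example and not something the article prescribes; `split_modified_letter` is a hypothetical helper name.

```python
import unicodedata


def split_modified_letter(letter):
    """Return (basic letter, names of its diacritics) for one modified letter.

    Hypothetical helper: it relies on Unicode canonical decomposition (NFD),
    so it only sees diacritics that Unicode encodes as combining marks.
    """
    decomposed = unicodedata.normalize("NFD", letter)
    # Characters with a non-zero combining class are the diacritics.
    base = "".join(c for c in decomposed if not unicodedata.combining(c))
    marks = [unicodedata.name(c) for c in decomposed if unicodedata.combining(c)]
    return base, marks


# Examples taken from the introduction above.
for letter in ["é", "č", "ș", "ώ", "й", "ø"]:
    print(letter, split_modified_letter(letter))
```

Under this approach, a letter such as ø is returned unchanged with no marks, because Unicode encodes the stroke as part of the letter and not as a separable combining mark. Diacritics placed between two letters, such as the interpunct in n·h, are likewise outside the scope of this sketch.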

Concerned writing systems

Diacritics may occur in most writing systems.

  • Some diacritics are unique to one writing system. For instance, the diacritic called shadda, indicating that a consonant is geminate (doubled), is typical of the Arabic alphabet: د (d) with a shadda becomes دّ (dd).
  • Several diacritics may be shared by different but similar writing systems. This is notably the case for the Roman, Greek and Cyrillic alphabets, which can share the acute accent (´) and the dieresis (¨).

Examples of diacritics

Roman alphabet

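  • accent
    • acute accent (´): á, ć, é, ǵ, í, ń, ó, ŕ, ś, ú, ẃ, ý, ź...
    • grave accent (`): à, è, ì, ò, ù, ẁ, ỳ...
    • double acute accent ( ˝ ): ő, ű...
    • circumflex accent (^): â, ĉ, ê, ĝ, ĥ, î, ĵ, ô, ŝ, û, ŵ, ŷ, ẑ...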
  • breve ( ˘ ): ă, ĕ, ğ, ĭ, ŏ, ŭ...
  • caron or haček ( ˇ ): č, ď (Ď), ě, ǧ, ň, ř, š, ť (Ť), ž...
  • dieresis or umlaut (¨): ä, ë, ï, ö, ü, ÿ...
  • macron ( ¯ ): ā, ē, ī, ō, ū, ȳ...
  • cedilla ( ¸ ): ç, ş...
  • comma (,): ģ (Ģ), ķ, ļ, ņ, ș, ț...
  • ogonek or nosinė ( ˛ ): ą, ę, į, ǫ, ų...
  • dot
    • overdot ( ̇ ): ċ, ė, ġ, ż...
      • Note that the dot over i and j is not a diacritic mark and does not occur on the uppercase letters (I, J). However, several Turkic languages distinguish a “dotted i”, which keeps its dot in uppercase (İi), from a “dotless ı”, which has none in either case (Iı).
    • underdot (  ̣ ): ạ, ḍ, ẹ, ḥ, ị, ọ, ṣ, ṭ, ụ, ẓ...
    • interpunct (·): ch·, g·, l·l, n·h, s·h...
  • hook or dấu hỏi ( ̉ ): ả, ɓ, ƈ, ɗ, ẻ, ƒ, ɠ, ỉ, ƙ, ŋ, ỏ, ƥ, ƭ, ủ , ʋ, ⱳ, ỷ, ƴ, ȥ...
  • horn or dấu móc ( ̛ ): ơ, ư...
  • ring
    • ring above or kroužek ( ˚ ): å, ů...
    • ring below ( ˳ ): ḁ...
  • tilde (~): ã, ẽ, ĩ, ñ, õ, ũ...
  • apostrophe (’): c’h, ľ, ’s...
  • single opening quotation mark (‘): g‘, o‘...
  • stroke (/): ð, đ, ħ, ł, ø...

Greek alphabet

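  • accent
    • acute accent (´): ά, έ, ή, ί, ό, ύ, ώ
    • grave accent (`): ὰ, ὲ, ὴ, ὶ, ὸ, ὺ, ὼ
    • perispomene, shaped either like a Roman circumflex accent (^) or like a Roman tilde (~): ᾶ, ῆ, ῖ, ῦ, ῶ
  • dieresis (¨): ϊ, ϋ
  • iota subscript (ͺ): ᾳ, ῃ, ῳ
  • breathing
    • smooth breathing or psili, resembling an apostrophe ( ᾿ ): ἀ, ἐ, ἠ, ἰ, ὀ, ὐ, ὠ, ῤ
    • rough breathing or dasia, resembling a right-oriented apostrophe ( ῾ ): ἁ, ἑ, ἡ, ἱ, ὁ, ὑ, ὡ, ῥ

Since 1982, diacritics have been simplified in modern Greek: only the acute accent (´) and the dieresis (¨) are still mandatory.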

Status of modified letters

A letter with a diacritic is called a modified letter.

  • In some languages, a modified letter (with a diacritic) is considered a simple variant of the basic letter (without a diacritic). For instance, in Portuguese, ç is nothing but a variant of the letter c.
  • In other languages, a modified letter may be considered an independent letter, with its own place in the alphabet and entirely distinct from the letter without the diacritic. For instance, in Turkish, ç is a different letter from c.

Quantity and frequency

The number and frequency of diacritics vary from language to language.

  • A few languages have no diacritics at all in general use. This is notably the case of English and Malay (although diacritics may optionally be used in some borrowings, as in English café or cafe, from French café).
  • Many languages use diacritics, but their frequency varies considerably from one language to another.
    • For instance, diacritics are quite rare in Dutch, which sometimes uses ë (and rarely ä, ö, ï, ü), and in Italian, which sometimes uses ‑à, ‑è, ‑é, ‑ì, ‑ò, ‑ù at the end of a word.
    • Conversely, other languages use many different diacritics and may place them on nearly every word, as in Greek, Slovak or Czech, or even on every syllable, as in Vietnamese or Yoruba.

Diacritic affecting two characters

In general, a diacritic affects one character.

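In a few languages, however, a diacritic may modify a group of letters, for instance:

  • The dieresis in Spanish (gu → gü).
  • The stroke in Maltese (gh → għ).
  • The cedilla in Manx (ch → çh).
  • The interpunct in Francoprovençal (ch → ch·).

This occurs especially when the diacritic is placed between two letters, for example: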

  • The apostrophe in Breton (ch → c’h).
  • The interpunct in Catalan (ll → l·l) and in Occitan (nh → n·h, sh → s·h).

Diacritics avoided on uppercases

A few languages tend to avoid certain diacritics attached to uppercase letters, under certain circumstances.

  • When a word begins with an uppercase letter (the rest of the word being in lowercase):
    • In Greek, a diacritic is placed above a lowercase letter but on the upper left side of an initial uppercase letter: ύφαλος (ýfalos) “underwater reef” becomes Ύφαλος.
    • Some users of French remove diacritics from initial uppercase letters, but this is nonstandard: école “school” becomes École or less correctly Ecole.
  • In all-uppercase text:
    • In Greek, diacritics are removed in all-uppercase text: ύφαλος (ýfalos) “underwater reef” becomes ΥΦΑΛΟΣ, νερό (neró) “water” becomes ΝΕΡΟ. However, the dieresis (¨) remains in all cases: Ταΰγετος (Taÿ́getos) “Taygetus” becomes ΤΑΫΓΕΤΟΣ.
    • In Spanish and French, some users remove diacritics in all-uppercase text, but this is nonstandard: Spanish águila “eagle” becomes ÁGUILA or less correctly AGUILA, French séquençage “sequencing” becomes SÉQUENÇAGE or less correctly SEQUENCAGE. However, the Spanish tilde (~) remains in all cases: España “Spain” becomes ESPAÑA (never *ESPANA).
    • In Italian, the only usual diacritic is an acute or a grave accent at the end of a word. This accent may be replaced by an apostrophe on the upper right side of the last letter, but this is nonstandard: libertà “freedom” becomes LIBERTÀ or less correctly LIBERTA’.
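
The Greek all-uppercase rule described above (the acute accent is dropped, the dieresis is kept) can be sketched in Python. This is only an illustration under stated assumptions: it handles modern monotonic Greek, it relies on Unicode normalization from the standard library's `unicodedata` module, and `greek_all_uppercase` is a hypothetical helper name, not an established function.

```python
import unicodedata

# Combining acute accent: the mark that is dropped in all-uppercase Greek.
ACUTE = "\u0301"


def greek_all_uppercase(word):
    """Uppercase a modern Greek word, dropping the acute accent only.

    Hypothetical helper: the dieresis (U+0308) is deliberately left in place,
    and polytonic diacritics are not handled.
    """
    decomposed = unicodedata.normalize("NFD", word)  # letter + combining marks
    without_accent = decomposed.replace(ACUTE, "")   # remove the acute accent
    # Uppercase, then recompose any remaining letter + dieresis pairs.
    return unicodedata.normalize("NFC", without_accent.upper())


for word in ["ύφαλος", "νερό", "Ταΰγετος"]:
    print(word, "→", greek_all_uppercase(word))
```

Decomposing first keeps the accent and the dieresis separate, so one can be removed without touching the other. The same idea would not reproduce the nonstandard Spanish, French or Italian practices, which depend on the writer and not on a fixed rule.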

Optional diacritics for pedagogical use

Some languages use certain diacritics only as a pedagogical aid and omit them in general use. For instance, Russian uses the acute accent (´) only in learner-oriented publications, in order to show the position of the stress.