Combining Grapheme Joiner

Article Id: WHEBN0007271660

Title: Combining Grapheme Joiner  
Author: World Heritage Encyclopedia
Language: English
Subject: Combining Diacritical Marks, Naming conventions (Unicode) (draft), Control characters, Unicode anomaly, Old Turkic (Unicode block)
Collection: Control Characters, Unicode Special Code Points
Publisher: World Heritage Encyclopedia

Combining Grapheme Joiner

The combining grapheme joiner (CGJ), U+034F COMBINING GRAPHEME JOINER (HTML &#847;), is a Unicode character that has no visible glyph and is "default ignorable" by applications. Its name is a misnomer and does not describe its function: the character does not join graphemes.[1] Its purpose is to separate characters that should not be considered digraphs.
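
The basic properties of the character can be inspected with Python's standard `unicodedata` module. This is only a small sketch; the variable names are illustrative and are not part of any standard.

```python
import html
import unicodedata

CGJ = "\u034f"  # U+034F COMBINING GRAPHEME JOINER

name = unicodedata.name(CGJ)           # formal Unicode character name
category = unicodedata.category(CGJ)   # general category (nonspacing mark)
ccc = unicodedata.combining(CGJ)       # canonical combining class
from_html = html.unescape("&#847;")    # decimal 847 == hexadecimal 34F

print(name, category, ccc, from_html == CGJ)
```

The combining class of 0 is what matters for normalization: a character with class 0 is never moved by canonical reordering.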

For example, in a Hungarian-language context, the adjacent characters c and s would normally be treated as the digraph cs, for instance when sorting or searching. If they are separated by a CGJ, they are treated as two separate graphemes.
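
The idea can be illustrated with a toy tokenizer. This is a hypothetical sketch, not how any real collation library works: the function `hungarian_letters` and the `MULTIGRAPHS` list are invented for illustration, and only lowercase input is handled. A real system would rely on tailored collation rather than a hand-written scanner.

```python
CGJ = "\u034f"  # U+034F COMBINING GRAPHEME JOINER

# Hungarian multi-letter graphemes, longest first (illustrative list).
MULTIGRAPHS = ("dzs", "cs", "dz", "gy", "ly", "ny", "sz", "ty", "zs")


def hungarian_letters(text):
    """Split lowercase text into letters, honoring CGJ as a digraph breaker."""
    letters = []
    i = 0
    while i < len(text):
        if text[i] == CGJ:
            # The CGJ itself is invisible; it only prevents a digraph match.
            i += 1
            continue
        for m in MULTIGRAPHS:
            if text.startswith(m, i):
                letters.append(m)
                i += len(m)
                break
        else:
            # No multigraph starts here, so take a single character.
            letters.append(text[i])
            i += 1
    return letters


print(hungarian_letters("cs"))             # digraph kept together
print(hungarian_letters("c" + CGJ + "s"))  # split into two letters
```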

It is also needed for complex scripts. For example, the Hebrew cantillation accent Metheg is in most cases supposed to appear to the left of the vowel point, and by default most display systems render it there even if it is typed before the vowel. In some words in Biblical Hebrew, however, the Metheg appears to the right of the vowel; to make the display engine render it on the right, a CGJ must be typed between the Metheg and the vowel. Compare:

he + pathah + metheg הַֽ
he + metheg + pathah הַֽ
he + metheg + CGJ + pathah הֽ͏ַ

(The examples in the table may not display correctly without a font that properly supports Hebrew cantillation marks. Ezra SIL SR is recommended.)

In the case of several consecutive combining diacritics, an intervening CGJ indicates that they should not be subject to canonical reordering.[2]
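
The effect on canonical reordering can be checked with Python's standard `unicodedata` module, using the Hebrew sequences from the comparison above. This is a minimal sketch; the helper name `nfd` and the constant names are illustrative. Pathah has a lower canonical combining class than Metheg, so normalization moves it in front of the Metheg unless a CGJ (class 0) stands between them.

```python
import unicodedata

HE = "\u05d4"     # HEBREW LETTER HE
PATAH = "\u05b7"  # HEBREW POINT PATAH
METEG = "\u05bd"  # HEBREW POINT METEG
CGJ = "\u034f"    # COMBINING GRAPHEME JOINER


def nfd(s):
    """Return the canonical decomposition (NFD) of s."""
    return unicodedata.normalize("NFD", s)


plain = HE + METEG + PATAH          # typed order: metheg before the vowel
blocked = HE + METEG + CGJ + PATAH  # same, with a CGJ in between

# Without the CGJ the two marks are swapped into canonical order.
print(nfd(plain) == HE + PATAH + METEG)

# With the CGJ the typed order survives normalization.
print(nfd(blocked) == blocked)
```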

Compare this with the "zero-width non-joiner" (ZWNJ) at U+200C in the General Punctuation block, a zero-width character that prevents adjacent characters from being joined or ligated when rendered.

  1. ^ http://unicode.org/notes/tn27/
  2. ^ http://www.unicode.org/versions/Unicode6.0.0/ch16.pdf

External links

  • Unicode FAQ - Characters and Combining Marks
  • Unicode FAQ - Normalization

This article was sourced under the Creative Commons Attribution-ShareAlike License; additional terms may apply.