
Files Reference


LC_CTYPE Category for the Locale Definition Source File Format

Purpose

Defines character classification, case conversion, and other character attributes.

Description

The LC_CTYPE category of a locale definition source file defines character classification, case conversion, and other character attributes. This category begins with an LC_CTYPE category header and terminates with an END LC_CTYPE category trailer.

All operands for LC_CTYPE category statements are defined as lists of characters. Each list consists of one or more semicolon-separated characters or symbolic character names.
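
As a rough illustration of the operand format described above, the following Python sketch splits an operand list into its members. It is not how the localedef command parses the file. The function name parse_operands is hypothetical, and the sketch assumes that a trailing backslash continues a list on the next line and that no member contains a literal semicolon.

```python
def parse_operands(text):
    """Split an LC_CTYPE operand list into its semicolon-separated members.

    Assumption: a backslash immediately before a newline continues the
    list, and no member contains a literal semicolon.
    """
    # Join continued lines by dropping each backslash-newline pair.
    joined = text.replace("\\\n", "")
    # Split on semicolons and discard surrounding whitespace.
    return [item.strip() for item in joined.split(";") if item.strip()]


operands = parse_operands("<A>;<B>;<C>;\\\n        <D>;<E>")
print(operands)
```

Each member is returned as written, so symbolic names such as <A> still have to be resolved against a charmap file in a separate step.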

The following keywords are recognized in the LC_CTYPE category. In the descriptions, the term automatically included means that it is not an error to either include or omit the referenced characters: they are provided if they are missing and accepted if they are present.

copy Specifies the name of an existing locale to be used as the definition of this category. If a copy statement is included in the file, no other keyword can be specified.
upper Defines uppercase letter characters. No character defined by the cntrl, digit, punct, or space keyword can be specified. At a minimum, the uppercase letters A-Z must be defined.
lower Defines lowercase letter characters. No character defined by the cntrl, digit, punct, or space keyword can be specified. At a minimum, the lowercase letters a-z must be defined.
alpha Defines all letter characters. No character defined by the cntrl, digit, punct, or space keyword can be specified. Characters defined by the upper and lower keywords are automatically included in this character class.
digit Defines numeric digit characters. Only the digits 0, 1, 2, 3, 4, 5, 6, 7, 8, and 9 can be specified.
alnum Defines alphanumeric characters. No character defined by the cntrl, punct, or space keyword can be specified. Characters defined by the alpha and digit keywords are automatically included in this character class.
space Defines whitespace characters. No character defined by the upper, lower, alpha, digit, graph, or xdigit keyword can be specified. At a minimum, the <space>, <form-feed>, <newline>, <carriage-return>, <tab>, and <vertical-tab> characters, and any characters defined by the blank keyword, must be specified.
cntrl Defines control characters. No character defined by the upper, lower, alpha, digit, punct, graph, print, or xdigit keyword can be specified.
punct Defines punctuation characters. A character defined as the <space> character and characters defined by the upper, lower, alpha, digit, cntrl, or xdigit keyword cannot be specified.
graph Defines printable characters, excluding the <space> character. If this keyword is not specified, characters defined by the upper, lower, alpha, digit, xdigit, and punct keywords are automatically included in this character class. No character defined by the cntrl keyword can be specified.
print Defines printable characters, including the <space> character. If this keyword is not specified, the <space> character and characters defined by the upper, lower, alpha, digit, xdigit, and punct keywords are automatically included in this character class. No character defined by the cntrl keyword can be specified.
xdigit Defines hexadecimal digit characters. Only the digits 0-9 and the letters A-F and a-f can be specified. The xdigit keyword defaults to its normal class limits.
blank Defines blank characters. If this keyword is not specified, the <space> and <horizontal-tab> characters are included in this character class. Any characters defined by this statement are automatically included in the space keyword class.
charclass Defines one or more locale-specific character class names as strings separated by semicolons. Each named character class can then be defined subsequently in the LC_CTYPE definition. A character class name consists of 1 to 32 bytes of alphanumeric characters from the portable character set. The first character of a character class name cannot be a digit. The name cannot match any of the LC_CTYPE keywords defined in this section.
charclass-name Defines characters to be classified as belonging to the named locale-specific character class. Locale-specific named character classes need not exist in the POSIX locale.

If a class name is defined by a charclass keyword, but no characters are subsequently assigned to it, it represents a class without any characters belonging to it.

The charclass-name can be used as the Property parameter in the wctype subroutine, in regular expressions and shell pattern-matching expressions, and by the tr command.

toupper Defines the mapping of lowercase characters to uppercase characters. Operands for this keyword consist of character pairs. Each pair is enclosed in ( ) (parentheses), the two characters within a pair are separated by a , (comma), and each pair is separated from the next by a ; (semicolon). The first character in each pair is considered lowercase; the second character is considered uppercase. Only characters defined by the lower and upper keywords can be specified.
tolower Defines the mapping of uppercase characters to lowercase characters. Operands for this keyword consist of character pairs. Each pair is enclosed in ( ) (parentheses), the two characters within a pair are separated by a , (comma), and each pair is separated from the next by a ; (semicolon). The first character in each pair is considered uppercase; the second character is considered lowercase. Only characters defined by the lower and upper keywords can be specified.

The tolower keyword is optional. If this keyword is not specified, the mapping defaults to the reverse mapping of the toupper keyword, if specified. If the toupper and tolower keywords are both unspecified, the mapping for each defaults to that of the C locale.
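
The default described above, where tolower is the reverse mapping of toupper, can be sketched in Python as follows. This is only a model of the rule, not the localedef implementation; the function name default_tolower and the use of plain characters instead of symbolic names are assumptions made for illustration.

```python
def default_tolower(toupper_pairs):
    """Return the tolower mapping implied by a toupper mapping.

    toupper_pairs is a list of (lowercase, uppercase) tuples, mirroring
    the (<a>,<A>) pairs given to the toupper keyword.
    """
    # Reverse each pair so the uppercase character becomes the key.
    return {upper: lower for lower, upper in toupper_pairs}


toupper_pairs = [("a", "A"), ("b", "B"), ("c", "C")]
tolower = default_tolower(toupper_pairs)
print(tolower["B"])
```

The reversal is well defined only when no two lowercase characters share the same uppercase character; a locale that needs such a mapping would have to specify tolower explicitly.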

The LC_CTYPE category does not support multicharacter elements. For example, the German sharp-s character is traditionally classified as a lowercase letter. There is no corresponding uppercase letter; in proper capitalization of German text, the sharp-s character is replaced by the two characters ss. This kind of conversion is outside of the scope of the toupper and tolower keywords.
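
The limitation described above can be seen by contrasting a one-to-one table with a conversion that is allowed to change the length of the text. In this Python sketch the table TOUPPER and the function map_upper are hypothetical stand-ins for a toupper definition. Python's own str.upper method follows Unicode case rules rather than any LC_CTYPE definition and is used here only for contrast.

```python
import string

# One-to-one table in the spirit of the toupper keyword: each lowercase
# ASCII letter maps to exactly one uppercase letter.
TOUPPER = dict(zip(string.ascii_lowercase, string.ascii_uppercase))


def map_upper(text):
    """Apply the one-to-one table; unmapped characters pass through."""
    return "".join(TOUPPER.get(ch, ch) for ch in text)


word = "stra\u00dfe"      # German word containing the sharp-s character
print(map_upper(word))    # the sharp-s is left unchanged
print(word.upper())       # Unicode rules replace it with two characters
```

A character-to-character table can only leave the sharp-s as it is or map it to a single character; it cannot produce the two-character replacement.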

Examples

The following is an example of a possible LC_CTYPE category listed in a locale definition source file:

LC_CTYPE
#"alpha" is by default "upper" and "lower"
#"alnum" is by default "alpha" and "digit"
#"print" is by default "alnum", "punct" and the space character
#"graph" is by default "alnum" and "punct"
#"tolower" is by default the reverse mapping of "toupper"
#
upper           <A>;<B>;<C>;<D>;<E>;<F>;<G>;<H>;<I>;<J>;<K>;<L>;<M>;\
                <N>;<O>;<P>;<Q>;<R>;<S>;<T>;<U>;<V>;<W>;<X>;<Y>;<Z>
#
lower           <a>;<b>;<c>;<d>;<e>;<f>;<g>;<h>;<i>;<j>;<k>;<l>;<m>;\
                <n>;<o>;<p>;<q>;<r>;<s>;<t>;<u>;<v>;<w>;<x>;<y>;<z>
#
digit           <zero>;<one>;<two>;<three>;<four>;<five>;<six>;\
                <seven>;<eight>;<nine>
#
space           <tab>;<newline>;<vertical-tab>;<form-feed>;\
                <carriage-return>;<space>
#
cntrl           <alert>;<backspace>;<tab>;<newline>;<vertical-tab>;\
                <form-feed>;<carriage-return>;<NUL>;<SOH>;<STX>;\
                <ETX>;<EOT>;<ENQ>;<ACK>;<SO>;<SI>;<DLE>;<DC1>;<DC2>;\
                <DC3>;<DC4>;<NAK>;<SYN>;<ETB>;<CAN>;<EM>;<SUB>;\
                <ESC>;<IS4>;<IS3>;<IS2>;<IS1>;<DEL>
#
punct           <exclamation-mark>;<quotation-mark>;<number-sign>;\
                <dollar-sign>;<percent-sign>;<ampersand>;<asterisk>;\
                <apostrophe>;<left-parenthesis>;<right-parenthesis>;\
                <plus-sign>;<comma>;<hyphen>;<period>;<slash>;\
                <colon>;<semicolon>;<less-than-sign>;<equals-sign>;\
                <greater-than-sign>;<question-mark>;<commercial-at>;\
                <left-square-bracket>;<backslash>;<circumflex>;\
                <right-square-bracket>;<underline>;<grave-accent>;\
                <left-curly-bracket>;<vertical-line>;<tilde>;\
                <right-curly-bracket>
#
xdigit          <zero>;<one>;<two>;<three>;<four>;<five>;<six>;\
                <seven>;<eight>;<nine>;<A>;<B>;<C>;<D>;<E>;<F>;\
                <a>;<b>;<c>;<d>;<e>;<f>
#
blank           <space>;<tab>
#
toupper         (<a>,<A>);(<b>,<B>);(<c>,<C>);(<d>,<D>);(<e>,<E>);\
                (<f>,<F>);(<g>,<G>);(<h>,<H>);(<i>,<I>);(<j>,<J>);\
                (<k>,<K>);(<l>,<L>);(<m>,<M>);(<n>,<N>);(<o>,<O>);\
                (<p>,<P>);(<q>,<Q>);(<r>,<R>);(<s>,<S>);(<t>,<T>);\
                (<u>,<U>);(<v>,<V>);(<w>,<W>);(<x>,<X>);(<y>,<Y>);\
                (<z>,<Z>)
#
END LC_CTYPE
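
The example above leaves out the alpha, alnum, graph, and print keywords and relies on the defaults noted in its comments. The following Python sketch models how those classes could be derived from the ones that are specified. It is an illustration of the rules in the Description section, not the localedef implementation; the function name derive_defaults and the use of plain characters instead of symbolic names are assumptions.

```python
def derive_defaults(classes):
    """Model the automatic inclusion rules for alpha, alnum, graph, print.

    classes maps a keyword name to an iterable of characters.
    """
    c = {name: set(chars) for name, chars in classes.items()}

    def get(name):
        return c.get(name, set())

    # alpha always includes the upper and lower classes.
    c["alpha"] = get("alpha") | get("upper") | get("lower")
    # alnum always includes the alpha and digit classes.
    c["alnum"] = get("alnum") | c["alpha"] | get("digit")
    # graph and print are derived only when they are not specified.
    base = c["alpha"] | get("digit") | get("xdigit") | get("punct")
    if "graph" not in c:
        c["graph"] = set(base)
    if "print" not in c:
        c["print"] = base | {" "}
    return c


derived = derive_defaults({
    "upper": "ABC",
    "lower": "abc",
    "digit": "012",
    "punct": "!?",
    "xdigit": "012ABCabc",
})
print(sorted(derived["alpha"]))
print(derived["print"] - derived["graph"])
```

With both keywords left unspecified, the only difference between the derived print and graph classes is the space character.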

Implementation Specifics

This category of the locale definition source file format is part of the Base Operating System (BOS) Runtime.

Files


/usr/lib/nls/loc/* Specifies locale definition source files for supported locales.
/usr/lib/nls/charmap/* Specifies character set description (charmap) source files for supported locales.

Related Information

The locale command, localedef command, tr command.

The wctype subroutine.

Character Set Description (charmap) Source File Format, Locale Definition Source File Format, Locale Method Source File Format.

For specific information about other locale categories and their keywords, see the LC_COLLATE category, LC_MESSAGES category, LC_MONETARY category, LC_NUMERIC category, and LC_TIME category for the locale definition source file format.

Changing Your Locale and Understanding the Locale Definition Source File in AIX 5L Version 5.1 System Management Concepts: Operating System and Devices.

