Unicode categories

This page summarizes the supported Unicode categories. To refer to a category in a lexical rule, use the construct uc{Name}. The following table lists each supported category with its abbreviation, its name, and notes where applicable:

Abbreviation  Category                    Notes
L             Letter
Lu            Letter, Uppercase
Ll            Letter, Lowercase
Lt            Letter, Titlecase
Lm            Letter, Modifier
Lo            Letter, Other
M             Mark
Mn            Mark, Non-spacing
Mc            Mark, Spacing combining
Me            Mark, Enclosing
N             Number
Nd            Number, Decimal Digit
Nl            Number, Letter              Includes Roman numerals
No            Number, Other
P             Punctuation
Pc            Punctuation, Connector      Includes the underscore
Pd            Punctuation, Dash           Includes hyphen characters
Ps            Punctuation, Open           Opening brackets
Pe            Punctuation, Close          Closing brackets
Pi            Punctuation, Initial quote  Opening quotation mark
Pf            Punctuation, Final quote    Closing quotation mark
Po            Punctuation, Other
S             Symbol
Sm            Symbol, Math
Sc            Symbol, Currency
Sk            Symbol, Modifier
So            Symbol, Other
Z             Separator
Zs            Separator, Space            Includes the ASCII space (U+0020)
Zl            Separator, Line             Only U+2028 LINE SEPARATOR
Zp            Separator, Paragraph        Only U+2029 PARAGRAPH SEPARATOR
C             Other
Cc            Other, Control
Cf            Other, Format
Cs            Other, Surrogate
Co            Other, Private Use
Cn            Other, Not assigned
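
To check which category a given character falls into, the categories above can be looked up with any Unicode-aware library. The following is a minimal sketch in Python using the standard unicodedata module. It is independent of the uc{Name} construct, and the names describe and MAJOR_CLASSES are hypothetical helpers introduced only for this illustration. Python ships its own version of the Unicode database, so results for recently assigned characters may differ from the tool described on this page.

```python
import unicodedata

# Major classes from the table above, keyed by the first letter of the
# two-letter abbreviation. (Hypothetical helper table for this sketch.)
MAJOR_CLASSES = {
    "L": "Letter",
    "M": "Mark",
    "N": "Number",
    "P": "Punctuation",
    "S": "Symbol",
    "Z": "Separator",
    "C": "Other",
}


def describe(ch: str) -> tuple[str, str]:
    """Return (two-letter category, major class name) for one character."""
    code = unicodedata.category(ch)  # e.g. "Lu" for an uppercase letter
    return code, MAJOR_CLASSES[code[0]]


if __name__ == "__main__":
    # Print the code point, category, and major class for a few samples.
    for ch in ["A", "a", "5", "_", "-", "(", ")", "$", "+", " ", "\u2028", "\n"]:
        code, major = describe(ch)
        print(f"U+{ord(ch):04X} {code} {major}")
```

Note that the tab and newline characters are reported as Cc (Other, Control), not Zs, which is why a rule meant to match general whitespace usually needs more than the Separator categories alone.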