Working with Text |
You can categorize characters according to their properties. For instance, 'X' is an upper case letter and '4' is a decimal digit. Checking character properties is a common way to verify the data entered by end-users. If you are selling books online, for example, your order entry screen should verify that the characters in the quantity field are all digits.
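As a rough illustration of the order entry check described above, the following sketch validates a quantity field one character at a time. The class name QuantityField and the helper isAllDigits are hypothetical names chosen for this example; only Character.isDigit comes from the standard library. The sketch assumes an empty field should be rejected.

```java
final class QuantityField {

    // Hypothetical helper: returns true only if the text is non-empty
    // and every char in it is a digit according to Character.isDigit.
    static boolean isAllDigits(String text) {
        if (text == null || text.isEmpty()) {
            return false;
        }
        for (int i = 0; i < text.length(); i++) {
            if (!Character.isDigit(text.charAt(i))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isAllDigits("12"));  // true
        System.out.println(isAllDigits("1a"));  // false
        System.out.println(isAllDigits(""));    // false
    }
}
```

Because Character.isDigit follows the Unicode definition of a digit, this check also accepts decimal digits from scripts other than Latin.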
Developers who aren't used to writing global software might determine a character's properties by comparing it with character constants. For instance, they might write code like this:

    char ch;
    ...

    // This code is WRONG!

    if ((ch >= 'a' && ch <= 'z') || (ch >= 'A' && ch <= 'Z'))
        // ch is a letter
    ...
    if (ch >= '0' && ch <= '9')
        // ch is a digit
    ...
    if ((ch == ' ') || (ch == '\n') || (ch == '\t'))
        // ch is a whitespace

The preceding code is wrong because it works only with English text. To internationalize the previous example, replace it with the following statements:

    char ch;
    ...

    // This code is OK!

    if (Character.isLetter(ch))
    ...
    if (Character.isDigit(ch))
    ...
    if (Character.isSpaceChar(ch))

Note that isSpaceChar tests only for Unicode space separators, so it returns false for '\n' and '\t'. If line breaks and tabs must also count as whitespace, use Character.isWhitespace instead.

The Character methods rely upon the Unicode Standard for determining the properties of a character. Unicode is a character encoding standard that supports the world's major languages. In the Java programming language, char values are 16-bit UTF-16 code units, so a single char can represent any Unicode character in the Basic Multilingual Plane. If you check the properties of a char with the appropriate Character method, your code will work with all major languages. For example, the Character.isLetter method returns true if the character is a letter in Chinese, German, Arabic, or some other language.

The following list gives some of the most useful Character comparison methods. The Character API documentation fully specifies the methods.
isDigit
isLetter
isLetterOrDigit
isLowerCase
isUpperCase
isSpaceChar
isDefined
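To make the behavior of the listed methods more concrete, here is a minimal sketch that applies several of them to characters from different scripts. The class CharacterPropertyDemo and its describe helper are hypothetical names used only for this illustration; the Character methods themselves are the standard ones named in the list.

```java
final class CharacterPropertyDemo {

    // Hypothetical helper: builds a space-separated summary of the
    // properties that the Character methods report for ch.
    static String describe(char ch) {
        if (!Character.isDefined(ch)) {
            return "undefined";
        }
        StringBuilder sb = new StringBuilder();
        if (Character.isLetter(ch))    sb.append("letter ");
        if (Character.isDigit(ch))     sb.append("digit ");
        if (Character.isUpperCase(ch)) sb.append("uppercase ");
        if (Character.isLowerCase(ch)) sb.append("lowercase ");
        if (Character.isSpaceChar(ch)) sb.append("space ");
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        System.out.println(describe('X'));       // letter uppercase
        System.out.println(describe('4'));       // digit
        System.out.println(describe('\u00E9'));  // e with acute accent: letter lowercase
        System.out.println(describe('\u4E2D'));  // a Chinese ideograph: letter
        System.out.println(describe('\u0BE7'));  // Tamil digit one: digit
        System.out.println(describe(' '));       // space
    }
}
```

The ideograph is reported as a letter with no case, which is the kind of distinction that a comparison against 'a' through 'z' cannot express.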
If you want to restrict your digit characters to the ISO-Latin-1 characters 0 - 9, then you should not use the Character.isDigit method. The isDigit method returns true for characters that represent decimal digits in many scripts. For example, it returns true for Tamil digits, which are in the Unicode range \u0BE7 - \u0BEF. To verify that a character is an ISO-Latin-1 digit, check it like this:

    if (ch >= '0' && ch <= '9')
        // ch is an ISO-Latin-1 digit

The Character.getType method returns the Unicode category of a character. Each category corresponds to a constant defined in the Character class. For instance, getType returns the Character.UPPERCASE_LETTER constant for the character 'A'. For a complete list of the category constants returned by getType, see the Character API documentation. The following example shows how to use getType and the Character category constants. All of the expressions in these if statements are true:

    if (Character.getType('a') == Character.LOWERCASE_LETTER)
    ...
    if (Character.getType('R') == Character.UPPERCASE_LETTER)
    ...
    if (Character.getType('>') == Character.MATH_SYMBOL)
    ...
    if (Character.getType('_') == Character.CONNECTOR_PUNCTUATION)
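The two ideas above, restricting digits to 0 - 9 and inspecting Unicode categories, can be combined in a short sketch. The class CategoryDemo and the helpers categoryName and isAsciiDigit are hypothetical names introduced for this example, and categoryName handles only a handful of the category constants; Character.getType and the constants themselves are standard.

```java
final class CategoryDemo {

    // Hypothetical helper: maps a few of the category constants
    // returned by Character.getType to readable names.
    static String categoryName(char ch) {
        switch (Character.getType(ch)) {
            case Character.UPPERCASE_LETTER:      return "UPPERCASE_LETTER";
            case Character.LOWERCASE_LETTER:      return "LOWERCASE_LETTER";
            case Character.DECIMAL_DIGIT_NUMBER:  return "DECIMAL_DIGIT_NUMBER";
            case Character.MATH_SYMBOL:           return "MATH_SYMBOL";
            case Character.CONNECTOR_PUNCTUATION: return "CONNECTOR_PUNCTUATION";
            default:                              return "OTHER";
        }
    }

    // Hypothetical helper: true only for the characters '0' through '9'.
    static boolean isAsciiDigit(char ch) {
        return ch >= '0' && ch <= '9';
    }

    public static void main(String[] args) {
        char tamilOne = '\u0BE7';  // Tamil digit one

        System.out.println(categoryName('R'));       // UPPERCASE_LETTER
        System.out.println(categoryName('>'));       // MATH_SYMBOL
        System.out.println(categoryName('7'));       // DECIMAL_DIGIT_NUMBER
        System.out.println(categoryName(tamilOne));  // DECIMAL_DIGIT_NUMBER

        System.out.println(Character.isDigit(tamilOne));  // true
        System.out.println(isAsciiDigit(tamilOne));       // false
        System.out.println(isAsciiDigit('7'));            // true
    }
}
```

The Tamil digit shares the DECIMAL_DIGIT_NUMBER category with '7', which is why the category alone cannot restrict input to 0 - 9 and the explicit range check is needed.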