Java Encodings and Charset (from java.sun.com)
The classes java.io.InputStreamReader, java.io.OutputStreamWriter,
java.lang.String, and classes in the java.nio.charset package can
convert between Unicode and a number of other character encodings.
The supported encodings vary between different implementations of the
Java 2 platform. The java.lang
package specification (http://java.sun.com/j2se/1.4.2/docs/api/java/lang/package-summary.html#charenc) and the
class description for java.nio.charset.Charset (http://java.sun.com/j2se/1.4.2/docs/api/java/nio/charset/Charset.html)
list the encodings that any
implementation of the Java 2 platform, Standard Edition, v. 1.4.2 is
required to support.
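As a minimal sketch of the conversions described above, the following example pushes a string through java.io.OutputStreamWriter and java.io.InputStreamReader with a named encoding. The class name EncodingRoundTrip and its helper methods are hypothetical, and the code is written against a current JDK rather than 1.4.2.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.io.Writer;

// Hypothetical helper class, for illustration only.
class EncodingRoundTrip {
    // Convert Unicode text to bytes in the named encoding via OutputStreamWriter.
    static byte[] encode(String text, String encodingName) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            Writer writer = new OutputStreamWriter(bytes, encodingName);
            writer.write(text);
            writer.close(); // closing flushes the encoder into the byte stream
            return bytes.toByteArray();
        } catch (IOException e) { // includes UnsupportedEncodingException
            throw new UncheckedIOException(e);
        }
    }

    // Convert bytes in the named encoding back to Unicode text via InputStreamReader.
    static String decode(byte[] data, String encodingName) {
        try {
            Reader reader = new InputStreamReader(new ByteArrayInputStream(data), encodingName);
            StringBuilder out = new StringBuilder();
            char[] buffer = new char[256];
            int n;
            while ((n = reader.read(buffer)) != -1) {
                out.append(buffer, 0, n);
            }
            reader.close();
            return out.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String text = "caf\u00e9"; // "café"; the last character is outside US-ASCII
        System.out.println("ISO-8859-1 bytes: " + encode(text, "ISO-8859-1").length);
        System.out.println("UTF-8 bytes:      " + encode(text, "UTF-8").length);
        System.out.println("Round trip ok:    " + decode(encode(text, "UTF-8"), "UTF-8").equals(text));
    }
}
```

The same conversion is available on java.lang.String through getBytes(String) and the String(byte[], String) constructor; the stream classes are the better fit when the data is read or written incrementally.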
Sun's Java 2 Software Development Kit, Standard Edition, v. 1.4.2
for all platforms (Solaris™
operating environment, Linux, and Microsoft Windows) and the Java 2
Runtime Environment, Standard Edition, v. 1.4.2 for Solaris and Linux
support all encodings shown on this page. Sun's Java 2 Runtime
Environment, Standard Edition, v. 1.4.2 for Windows may be installed
as a complete international version or as a European languages
version. The J2RE installer by default installs a European languages
version if it recognizes that the host operating system only supports
European languages. If the installer recognizes that any other
language is needed, or if the user requests support for non-European
languages in a customized installation, a complete international
version is installed. The European languages version only supports
the encodings shown in the first table. The international version
(which includes the lib/charsets.jar file) supports all
encodings shown on this page.
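Because the installed encoding set can differ between installations, an application may want to check for the encodings it needs at startup. The sketch below uses java.nio.charset.Charset.isSupported for that; the class name CharsetAvailability and the list of candidate names are illustrative assumptions. Note that this check only sees names known to the java.nio API.

```java
import java.nio.charset.Charset;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, for illustration only.
class CharsetAvailability {
    // Returns those of the given charset names that this runtime does not support.
    static List<String> missing(String... names) {
        List<String> result = new ArrayList<String>();
        for (String name : names) {
            if (!Charset.isSupported(name)) {
                result.add(name);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // The first three names are in the basic set; the rest are in the extended set.
        List<String> absent = missing("US-ASCII", "ISO-8859-1", "UTF-8",
                                      "Shift_JIS", "Big5", "EUC-KR");
        if (absent.isEmpty()) {
            System.out.println("All required encodings are available.");
        } else {
            System.out.println("Missing encodings: " + absent);
        }
    }
}
```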
The following tables show the encoding sets supported by J2SE
1.4.2. The canonical names used by the new java.nio APIs are in many
cases not the same as those used in the java.io and java.lang APIs.
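The difference between the two naming schemes can be observed directly. In the sketch below, Charset.name() reports the java.nio canonical name and InputStreamReader.getEncoding() reports the name used by the java.io and java.lang APIs. The class name CanonicalNames is hypothetical, and the code assumes a runtime in which both APIs accept either form of the name as an alias.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStreamReader;
import java.io.UnsupportedEncodingException;
import java.nio.charset.Charset;

// Hypothetical helper class, for illustration only.
class CanonicalNames {
    // Canonical name as reported by the java.nio API.
    static String nioName(String anyName) {
        return Charset.forName(anyName).name();
    }

    // Name as reported by the java.io API (the historical name where one exists).
    static String ioName(String anyName) {
        try {
            InputStreamReader reader =
                new InputStreamReader(new ByteArrayInputStream(new byte[0]), anyName);
            return reader.getEncoding();
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unsupported encoding: " + anyName, e);
        }
    }

    public static void main(String[] args) {
        String[] names = { "UTF-8", "ISO-8859-1", "US-ASCII", "Cp1252" };
        for (String name : names) {
            System.out.println(name + ": java.nio=" + nioName(name)
                               + ", java.io=" + ioName(name));
        }
    }
}
```

Code that stores or compares encoding names should pick one scheme and normalize to it, for example by always passing names through Charset.forName(...).name().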
Basic Encoding Set (contained in lib/rt.jar)
Supported by java.nio, java.io and java.lang APIs
Canonical Name for java.nio API | Canonical Name for java.io and java.lang API | Description
US-ASCII | ASCII | American Standard Code for Information Interchange
windows-1250 | Cp1250 | Windows Eastern European
windows-1251 | Cp1251 | Windows Cyrillic
windows-1252 | Cp1252 | Windows Latin-1
windows-1253 | Cp1253 | Windows Greek
windows-1254 | Cp1254 | Windows Turkish
windows-1257 | Cp1257 | Windows Baltic
ISO-8859-1 | ISO8859_1 | ISO 8859-1, Latin Alphabet No. 1
ISO-8859-2 | ISO8859_2 | Latin Alphabet No. 2
ISO-8859-4 | ISO8859_4 | Latin Alphabet No. 4
ISO-8859-5 | ISO8859_5 | Latin/Cyrillic Alphabet
ISO-8859-7 | ISO8859_7 | Latin/Greek Alphabet
ISO-8859-9 | ISO8859_9 | Latin Alphabet No. 5
ISO-8859-13 | ISO8859_13 | Latin Alphabet No. 7
ISO-8859-15 | ISO8859_15 | Latin Alphabet No. 9
KOI8-R | KOI8_R | KOI8-R, Russian
UTF-8 | UTF8 | Eight-bit UCS Transformation Format
UTF-16 | UTF-16 | Sixteen-bit UCS Transformation Format, byte order identified by an optional byte-order mark
UTF-16BE | UnicodeBigUnmarked | Sixteen-bit Unicode Transformation Format, big-endian byte order
UTF-16LE | UnicodeLittleUnmarked | Sixteen-bit Unicode Transformation Format, little-endian byte order
Not available | UnicodeBig | Sixteen-bit Unicode Transformation Format, big-endian byte order, with byte-order mark
Not available | UnicodeLittle | Sixteen-bit Unicode Transformation Format, little-endian byte order, with byte-order mark
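The byte-order distinctions among the UTF-16 entries are easiest to see in the bytes themselves. The sketch below encodes the single character "A" (U+0041) with the three java.nio UTF-16 charsets and prints the result in hexadecimal. The class name Utf16ByteOrder and the hex helper are hypothetical, and java.nio.charset.StandardCharsets requires a JDK newer than 1.4.2.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, for illustration only.
class Utf16ByteOrder {
    // Render a byte array as upper-case hexadecimal digits, two per byte.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // UTF-16 writes a big-endian byte-order mark (FE FF) before the data.
        System.out.println("UTF-16:   " + hex("A".getBytes(StandardCharsets.UTF_16)));   // FEFF0041
        // The BE and LE variants write no byte-order mark.
        System.out.println("UTF-16BE: " + hex("A".getBytes(StandardCharsets.UTF_16BE))); // 0041
        System.out.println("UTF-16LE: " + hex("A".getBytes(StandardCharsets.UTF_16LE))); // 4100

        // When decoding, UTF-16 uses a leading byte-order mark to pick the byte order.
        byte[] littleEndianWithMark = { (byte) 0xFF, (byte) 0xFE, 0x41, 0x00 };
        System.out.println(new String(littleEndianWithMark, StandardCharsets.UTF_16));   // A
    }
}
```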
Extended Encoding Set (contained in lib/charsets.jar)
Supported by java.nio, java.io and java.lang APIs
Canonical Name for java.nio API | Canonical Name for java.io and java.lang API | Description
windows-1255 | Cp1255 | Windows Hebrew
windows-1256 | Cp1256 | Windows Arabic
windows-1258 | Cp1258 | Windows Vietnamese
ISO-8859-3 | ISO8859_3 | Latin Alphabet No. 3
ISO-8859-6 | ISO8859_6 | Latin/Arabic Alphabet
ISO-8859-8 | ISO8859_8 | Latin/Hebrew Alphabet
windows-31j | MS932 | Windows Japanese
EUC-JP | EUC_JP | JISX 0201, 0208 and 0212, EUC encoding Japanese
x-EUC-JP-LINUX | EUC_JP_LINUX | JISX 0201, 0208, EUC encoding Japanese
Shift_JIS | SJIS | Shift-JIS, Japanese
ISO-2022-JP | ISO2022JP | JIS X 0201, 0208, in ISO 2022 form, Japanese
x-mswin-936 | MS936 | Windows Simplified Chinese
GB18030 | GB18030 | Simplified Chinese, PRC standard
x-EUC-CN | EUC_CN | GB2312, EUC encoding, Simplified Chinese
GBK | GBK | GBK, Simplified Chinese
ISCII91 | ISCII91 | ISCII91 encoding of Indic scripts
x-windows-949 | MS949 | Windows Korean
EUC-KR | EUC_KR | KS C 5601, EUC encoding, Korean
ISO-2022-KR | ISO2022KR | ISO 2022 KR, Korean
x-windows-950 | MS950 | Windows Traditional Chinese
x-MS950-HKSCS | MS950_HKSCS | Windows Traditional Chinese with Hong Kong extensions
x-EUC-TW | EUC_TW | CNS11643 (Plane 1-3), EUC encoding, Traditional Chinese
Big5 | Big5 | Big5, Traditional Chinese
Big5-HKSCS | Big5_HKSCS | Big5 with Hong Kong extensions, Traditional Chinese
TIS-620 | TIS620 | TIS620, Thai
Extended Encoding Set (contained in lib/charsets.jar)
Supported by java.io and java.lang APIs
Canonical Name | Description
Big5_Solaris | Big5 with seven additional Hanzi ideograph character mappings for the Solaris zh_TW.BIG5 locale
Cp037 | USA, Canada (Bilingual, French), Netherlands, Portugal, Brazil, Australia
Cp273 | IBM Austria, Germany
Cp277 | IBM Denmark, Norway
Cp278 | IBM Finland, Sweden
Cp280 | IBM Italy
Cp284 | IBM Catalan/Spain, Spanish Latin America
Cp285 | IBM United Kingdom, Ireland
Cp297 | IBM France
Cp420 | IBM Arabic
Cp424 | IBM Hebrew
Cp437 | MS-DOS United States, Australia, New Zealand, South Africa
Cp500 | EBCDIC 500V1
Cp737 | PC Greek
Cp775 | PC Baltic
Cp838 | IBM Thailand extended SBCS
Cp850 | MS-DOS Latin-1
Cp852 | MS-DOS Latin-2
Cp855 | IBM Cyrillic
Cp856 | IBM Hebrew
Cp857 | IBM Turkish
Cp858 | Variant of Cp850 with Euro character
Cp860 | MS-DOS Portuguese
Cp861 | MS-DOS Icelandic
Cp862 | PC Hebrew
Cp863 | MS-DOS Canadian French
Cp864 | PC Arabic
Cp865 | MS-DOS Nordic
Cp866 | MS-DOS Russian
Cp868 | MS-DOS Pakistan
Cp869 | IBM Modern Greek
Cp870 | IBM Multilingual Latin-2
Cp871 | IBM Iceland
Cp874 | IBM Thai
Cp875 | IBM Greek
Cp918 | IBM Pakistan (Urdu)
Cp921 | IBM Latvia, Lithuania (AIX, DOS)
Cp922 | IBM Estonia (AIX, DOS)
Cp930 | Japanese Katakana-Kanji mixed with 4370 UDC, superset of 5026
Cp933 | Korean Mixed with 1880 UDC, superset of 5029
Cp935 | Simplified Chinese Host mixed with 1880 UDC, superset of 5031
Cp937 | Traditional Chinese Host mixed with 6204 UDC, superset of 5033
Cp939 | Japanese Latin Kanji mixed with 4370 UDC, superset of 5035
Cp942 | IBM OS/2 Japanese, superset of Cp932
Cp942C | Variant of Cp942
Cp943 | IBM OS/2 Japanese, superset of Cp932 and Shift-JIS
Cp943C | Variant of Cp943
Cp948 | OS/2 Chinese (Taiwan) superset of 938
Cp949 | PC Korean
Cp949C | Variant of Cp949
Cp950 | PC Chinese (Hong Kong, Taiwan)
Cp964 | AIX Chinese (Taiwan)
Cp970 | AIX Korean
Cp1006 | IBM AIX Pakistan (Urdu)
Cp1025 | IBM Multilingual Cyrillic: Bulgaria, Bosnia, Herzegovina, Macedonia (FYR)
Cp1026 | IBM Latin-5, Turkey
Cp1046 | IBM Arabic - Windows
Cp1047 | Latin-1 character set for EBCDIC hosts
Cp1097 | IBM Iran (Farsi)/Persian
Cp1098 | IBM Iran (Farsi)/Persian (PC)
Cp1112 | IBM Latvia, Lithuania
Cp1122 | IBM Estonia
Cp1123 | IBM Ukraine
Cp1124 | IBM AIX Ukraine
Cp1140 | Variant of Cp037 with Euro character
Cp1141 | Variant of Cp273 with Euro character
Cp1142 | Variant of Cp277 with Euro character
Cp1143 | Variant of Cp278 with Euro character
Cp1144 | Variant of Cp280 with Euro character
Cp1145 | Variant of Cp284 with Euro character
Cp1146 | Variant of Cp285 with Euro character
Cp1147 | Variant of Cp297 with Euro character
Cp1148 | Variant of Cp500 with Euro character
Cp1149 | Variant of Cp871 with Euro character
Cp1381 | IBM OS/2, DOS People's Republic of China (PRC)
Cp1383 | IBM AIX People's Republic of China (PRC)
Cp33722 | IBM-eucJP - Japanese (superset of 5050)
ISO2022_CN_CNS | CNS11643 in ISO 2022 CN form, Traditional Chinese (conversion from Unicode only)
ISO2022_CN_GB | GB2312 in ISO 2022 CN form, Simplified Chinese (conversion from Unicode only)
JISAutoDetect | Detects and converts from Shift-JIS, EUC-JP, ISO 2022 JP (conversion to Unicode only)
MS874 | Windows Thai
MacArabic | Macintosh Arabic
MacCentralEurope | Macintosh Latin-2
MacCroatian | Macintosh Croatian
MacCyrillic | Macintosh Cyrillic
MacDingbat | Macintosh Dingbat
MacGreek | Macintosh Greek
MacHebrew | Macintosh Hebrew
MacIceland | Macintosh Iceland
MacRoman | Macintosh Roman
MacRomania | Macintosh Romania
MacSymbol | Macintosh Symbol
MacThai | Macintosh Thai
MacTurkish | Macintosh Turkish
MacUkraine | Macintosh Ukraine
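A few of the encodings above convert in one direction only. On runtimes where such a name is also known to the java.nio API (an assumption; in J2SE 1.4.2 the names in this table are listed for java.io and java.lang only), Charset.canEncode() reports whether conversion from Unicode is possible. The class name ConversionDirection and the returned labels are hypothetical.

```java
import java.nio.charset.Charset;

// Hypothetical helper class, for illustration only.
class ConversionDirection {
    // Classify a charset name by what the java.nio API reports about it.
    static String describe(String name) {
        if (!Charset.isSupported(name)) {
            return "not installed";
        }
        // canEncode() is false for decode-only charsets such as auto-detecting ones.
        return Charset.forName(name).canEncode() ? "can encode" : "decode only";
    }

    public static void main(String[] args) {
        String[] names = { "UTF-8", "JISAutoDetect", "ISO2022_CN_GB" };
        for (String name : names) {
            System.out.println(name + ": " + describe(name));
        }
    }
}
```

Checking isSupported first keeps the example usable on installations that lack the extended encoding set.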
Additional Extended Encoding Set
If you need support for additional charsets, such as UTF-7, contact us.