Seed7 - The extensible programming language
Charsets

Constant Summary
string
cp_437
Conversion table from code page 437 to Unicode.
string
cp_708
Conversion table from code page 708 to Unicode.
string
cp_720
Conversion table from code page 720 to Unicode.
string
cp_737
Conversion table from code page 737 to Unicode.
string
cp_775
Conversion table from code page 775 to Unicode.
string
cp_850
Conversion table from code page 850 to Unicode.
string
cp_852
Conversion table from code page 852 to Unicode.
string
cp_855
Conversion table from code page 855 to Unicode.
string
cp_857
Conversion table from code page 857 to Unicode.
string
cp_858
Conversion table from code page 858 to Unicode.
string
cp_860
Conversion table from code page 860 to Unicode.
string
cp_861
Conversion table from code page 861 to Unicode.
string
cp_862
Conversion table from code page 862 to Unicode.
string
cp_863
Conversion table from code page 863 to Unicode.
string
cp_864
Conversion table from code page 864 to Unicode.
string
cp_865
Conversion table from code page 865 to Unicode.
string
cp_866
Conversion table from code page 866 to Unicode.
string
cp_869
Conversion table from code page 869 to Unicode.
string
cp_874
Conversion table from code page 874 to Unicode.
string
cp_1125
Conversion table from code page 1125 to Unicode.
string
cp_1250
Conversion table from code page 1250 to Unicode.
string
cp_1251
Conversion table from code page 1251 to Unicode.
string
cp_1252
Conversion table from code page 1252 to Unicode.
string
cp_1253
Conversion table from code page 1253 to Unicode.
string
cp_1254
Conversion table from code page 1254 to Unicode.
string
cp_1255
Conversion table from code page 1255 to Unicode.
string
cp_1256
Conversion table from code page 1256 to Unicode.
string
cp_1257
Conversion table from code page 1257 to Unicode.
string
cp_1258
Conversion table from code page 1258 to Unicode.
string
iso_8859_1
Conversion table from ISO-8859-1 (Latin-1) to Unicode.
string
iso_8859_2
Conversion table from ISO-8859-2 (Latin-2) to Unicode.
string
iso_8859_3
Conversion table from ISO-8859-3 (Latin-3) to Unicode.
string
iso_8859_4
Conversion table from ISO-8859-4 (Latin-4) to Unicode.
string
iso_8859_5
Conversion table from ISO-8859-5 to Unicode.
string
iso_8859_6
Conversion table from ISO-8859-6 to Unicode.
string
iso_8859_7
Conversion table from ISO-8859-7 to Unicode.
string
iso_8859_8
Conversion table from ISO-8859-8 to Unicode.
string
iso_8859_9
Conversion table from ISO-8859-9 (Latin-5) to Unicode.
string
iso_8859_10
Conversion table from ISO-8859-10 (Latin-6) to Unicode.
string
iso_8859_11
Conversion table from ISO-8859-11 to Unicode.
string
iso_8859_13
Conversion table from ISO-8859-13 (Latin-7) to Unicode.
string
iso_8859_14
Conversion table from ISO-8859-14 (Latin-8) to Unicode.
string
iso_8859_15
Conversion table from ISO-8859-15 (Latin-9) to Unicode.
string
iso_8859_16
Conversion table from ISO-8859-16 (Latin-10) to Unicode.
string
mac_os_roman
Conversion table from Mac OS Roman encoding to Unicode.
string
koi8_r
Conversion table from KOI8-R encoding to Unicode.
string
koi8_u
Conversion table from KOI8-U encoding to Unicode.
string
mik
Conversion table from MIK encoding to Unicode.
string
tis_620
Conversion table from TIS-620 encoding to Unicode.
string
armscii_8
Conversion table from ArmSCII-8 encoding to Unicode.
string
geostd8
Conversion table from GEOSTD8 encoding to Unicode.
string
viscii
Conversion table from VISCII encoding to Unicode.
string
ns_4551_1
Conversion table from NS 4551-1 encoding to Unicode.
string
cp_037
Conversion table from code page 37 to Unicode.
string
cp_273
Conversion table from code page 273 to Unicode.
string
cp_277
Conversion table from code page 277 to Unicode.
string
cp_280
Conversion table from code page 280 to Unicode.
string
cp_285
Conversion table from code page 285 to Unicode.
string
cp_297
Conversion table from code page 297 to Unicode.
string
cp_500
Conversion table from code page 500 to Unicode.
string
cp_1047
Conversion table from code page 1047 to Unicode.

Function Summary
void
conv2unicode (inout string: stri, in string: codePage)
Convert a string with bytes from a code page encoding to UTF-32.
void
conv2unicodeByName (inout string: stri, in var string: charset)
Convert a string from a charset encoding to UTF-32.

Constant Detail

cp_437

const string: cp_437

Conversion table from code page 437 to Unicode. Code page 437 is the character set of the original IBM PC.


cp_708

const string: cp_708

Conversion table from code page 708 to Unicode. Code page 708 was defined by ASMO for writing Arabic.


cp_720

const string: cp_720

Conversion table from code page 720 to Unicode. The MS-DOS code page 720 is used to write Arabic.


cp_737

const string: cp_737

Conversion table from code page 737 to Unicode. The MS-DOS code page 737 is used to write the Greek language.


cp_775

const string: cp_775

Conversion table from code page 775 to Unicode. The MS-DOS code page 775 is used to write the Estonian, Lithuanian and Latvian languages.


cp_850

const string: cp_850

Conversion table from code page 850 to Unicode. The MS-DOS code page 850 is used to write Western European languages.


cp_852

const string: cp_852

Conversion table from code page 852 to Unicode. The MS-DOS code page 852 is used to write Central European languages that use Latin script, such as Bosnian, Croatian, Czech, Hungarian, Polish, Romanian, Serbian and Slovak.


cp_855

const string: cp_855

Conversion table from code page 855 to Unicode. The MS-DOS code page 855 is used to write Cyrillic script.


cp_857

const string: cp_857

Conversion table from code page 857 to Unicode. The MS-DOS code page 857 is used to write Turkish.


cp_858

const string: cp_858

Conversion table from code page 858 to Unicode. The MS-DOS code page 858 is used to write Western European languages.


cp_860

const string: cp_860

Conversion table from code page 860 to Unicode. The MS-DOS code page 860 is used to write Portuguese.


cp_861

const string: cp_861

Conversion table from code page 861 to Unicode. The MS-DOS code page 861 is used to write the Icelandic language.


cp_862

const string: cp_862

Conversion table from code page 862 to Unicode. The MS-DOS code page 862 is used to write Hebrew.


cp_863

const string: cp_863

Conversion table from code page 863 to Unicode. The MS-DOS code page 863 is used to write the French language.


cp_864

const string: cp_864

Conversion table from code page 864 to Unicode. The MS-DOS code page 864 is used to write Arabic.


cp_865

const string: cp_865

Conversion table from code page 865 to Unicode. The MS-DOS code page 865 is used to write Nordic languages.


cp_866

const string: cp_866

Conversion table from code page 866 to Unicode. The MS-DOS code page 866 is used to write Cyrillic script.


cp_869

const string: cp_869

Conversion table from code page 869 to Unicode. The MS-DOS code page 869 is used to write the Greek language.


cp_874

const string: cp_874

Conversion table from code page 874 to Unicode. The Windows code page 874 is used for the Thai language.


cp_1125

const string: cp_1125

Conversion table from code page 1125 to Unicode. The code page 1125 is used for the Ukrainian language.


cp_1250

const string: cp_1250

Conversion table from code page 1250 to Unicode. The Windows code page 1250 encodes the Latin alphabet for Central and Eastern European languages that use Latin script. It can be used for encoding German, Polish, Czech, Slovak, Hungarian, Slovene, Bosnian, Croatian, Serbian, Romanian and Albanian.


cp_1251

const string: cp_1251

Conversion table from code page 1251 to Unicode. The Windows code page 1251 encodes the Latin/Cyrillic alphabet. It can be used for encoding Russian, Bulgarian, Serbian and Macedonian.


cp_1252

const string: cp_1252

Conversion table from code page 1252 to Unicode. The Windows code page 1252 encodes the Latin alphabet for Western European languages. It is a superset of ISO 8859-1.


cp_1253

const string: cp_1253

Conversion table from code page 1253 to Unicode. The Windows code page 1253 encodes the Latin/Greek alphabet.


cp_1254

const string: cp_1254

Conversion table from code page 1254 to Unicode. The Windows code page 1254 covers the Turkish language.


cp_1255

const string: cp_1255

Conversion table from code page 1255 to Unicode. The Windows code page 1255 encodes the Latin/Hebrew alphabet.


cp_1256

const string: cp_1256

Conversion table from code page 1256 to Unicode. The Windows code page 1256 encodes the Latin/Arabic alphabet.


cp_1257

const string: cp_1257

Conversion table from code page 1257 to Unicode. The Windows code page 1257 covers the Baltic languages.


cp_1258

const string: cp_1258

Conversion table from code page 1258 to Unicode. The Windows code page 1258 covers the Vietnamese language.


iso_8859_1

const string: iso_8859_1

Conversion table from ISO-8859-1 (Latin-1) to Unicode. ISO-8859-1 is the character set for Western European languages. ISO-8859-1 defines the first 256 code point assignments in Unicode. It can be used for encoding Afrikaans, Albanian, Basque, Breton, Catalan, Danish, English, Faroese, Galician, German, Icelandic, Malay, Irish, Italian, Latin, Leonese, Luxembourgish, Norwegian, Occitan, Portuguese, Rhaeto-Romanic, Scottish Gaelic, Spanish, Swahili, Swedish and Walloon.


iso_8859_2

const string: iso_8859_2

Conversion table from ISO-8859-2 (Latin-2) to Unicode. ISO-8859-2 is the character set for Eastern European languages. It can be used for encoding Bosnian, Croatian, Czech, German, Hungarian, Polish, Serbian, Slovak, Slovene and Sorbian.


iso_8859_3

const string: iso_8859_3

Conversion table from ISO-8859-3 (Latin-3) to Unicode. ISO-8859-3 is the character set for South European languages. It can be used for encoding Turkish, Maltese and Esperanto.


iso_8859_4

const string: iso_8859_4

Conversion table from ISO-8859-4 (Latin-4) to Unicode. ISO-8859-4 is the character set for North European languages. It can be used for encoding Estonian, Latvian, Lithuanian, Greenlandic and Sami.


iso_8859_5

const string: iso_8859_5

Conversion table from ISO-8859-5 to Unicode. ISO-8859-5 is the character set for the Latin/Cyrillic alphabet. It can be used for encoding Bulgarian, Belarusian, Russian, Serbian and Macedonian.


iso_8859_6

const string: iso_8859_6

Conversion table from ISO-8859-6 to Unicode. ISO-8859-6 is the character set for the Latin/Arabic alphabet.


iso_8859_7

const string: iso_8859_7

Conversion table from ISO-8859-7 to Unicode. ISO-8859-7 is the character set for the Latin/Greek alphabet.


iso_8859_8

const string: iso_8859_8

Conversion table from ISO-8859-8 to Unicode. ISO-8859-8 is the character set for the Latin/Hebrew alphabet.


iso_8859_9

const string: iso_8859_9

Conversion table from ISO-8859-9 (Latin-5) to Unicode. ISO-8859-9 is the character set to cover the Turkish language.


iso_8859_10

const string: iso_8859_10

Conversion table from ISO-8859-10 (Latin-6) to Unicode. ISO-8859-10 is the character set to cover the Nordic languages.


iso_8859_11

const string: iso_8859_11

Conversion table from ISO-8859-11 to Unicode. ISO-8859-11 is the character set for the Latin/Thai alphabet.


iso_8859_13

const string: iso_8859_13

Conversion table from ISO-8859-13 (Latin-7) to Unicode. ISO-8859-13 is the character set to cover the Baltic languages.


iso_8859_14

const string: iso_8859_14

Conversion table from ISO-8859-14 (Latin-8) to Unicode. ISO-8859-14 is the character set to cover the Celtic languages. It can be used for encoding Irish, Manx, Scottish Gaelic, Welsh, Cornish and Breton.


iso_8859_15

const string: iso_8859_15

Conversion table from ISO-8859-15 (Latin-9) to Unicode. ISO-8859-15 is the character set for Western European languages. It can be used for encoding Afrikaans, Albanian, Breton, Catalan, Danish, Dutch, English, Estonian, Faroese, Finnish, French, Galician, German, Icelandic, Irish, Italian, Kurdish, Latin, Luxembourgish, Malay, Norwegian, Occitan, Portuguese, Rhaeto-Romanic, Scottish Gaelic, Scots, Spanish, Swahili, Swedish, Tagalog and Walloon.


iso_8859_16

const string: iso_8859_16

Conversion table from ISO-8859-16 (Latin-10) to Unicode. ISO-8859-16 is the character set for South-Eastern European languages. It can be used for encoding Albanian, Croatian, Hungarian, Polish, Romanian, Serbian and Slovenian, but also French, German, Italian and Irish Gaelic.


mac_os_roman

const string: mac_os_roman

Conversion table from Mac OS Roman encoding to Unicode.


koi8_r

const string: koi8_r

Conversion table from KOI8-R encoding to Unicode. KOI8-R is an encoding used for Russian and Bulgarian.


koi8_u

const string: koi8_u

Conversion table from KOI8-U encoding to Unicode. KOI8-U is an encoding used for Ukrainian and Belarusian.


mik

const string: mik

Conversion table from MIK encoding to Unicode. MIK is an encoding used for the Bulgarian language.


tis_620

const string: tis_620

Conversion table from TIS-620 encoding to Unicode. TIS-620 is the Thai Industrial Standard encoding for the Thai language.


armscii_8

const string: armscii_8

Conversion table from ArmSCII-8 encoding to Unicode. ArmSCII-8 is an encoding for the Armenian alphabet.


geostd8

const string: geostd8

Conversion table from GEOSTD8 encoding to Unicode. GEOSTD8 is an encoding for the Georgian language.


viscii

const string: viscii

Conversion table from VISCII encoding to Unicode. VISCII is the Vietnamese Standard Code for Information Interchange.


ns_4551_1

const string: ns_4551_1

Conversion table from NS 4551-1 encoding to Unicode. NS 4551 version 1 is the national variant of ISO 646 for Norway.


cp_037

const string: cp_037

Conversion table from code page 37 to Unicode. Code page 37 is an EBCDIC code page with the full ISO-8859-1 (Latin-1) character set. This code page is used in Australia, Brazil, Canada, New Zealand, Portugal, South Africa and the USA.


cp_273

const string: cp_273

Conversion table from code page 273 to Unicode. Code page 273 is an EBCDIC code page with the full ISO-8859-1 (Latin-1) character set. This code page is used in Austria and Germany.


cp_277

const string: cp_277

Conversion table from code page 277 to Unicode. Code page 277 is an EBCDIC code page with the full ISO-8859-1 (Latin-1) character set. This code page is used in Denmark and Norway.


cp_280

const string: cp_280

Conversion table from code page 280 to Unicode. Code page 280 is an EBCDIC code page with the full ISO-8859-1 (Latin-1) character set. This code page is used in Italy.


cp_285

const string: cp_285

Conversion table from code page 285 to Unicode. Code page 285 is an EBCDIC code page with the full ISO-8859-1 (Latin-1) character set. This code page is used in Ireland and the United Kingdom.


cp_297

const string: cp_297

Conversion table from code page 297 to Unicode. Code page 297 is an EBCDIC code page with the full ISO-8859-1 (Latin-1) character set. This code page is used in France.


cp_500

const string: cp_500

Conversion table from code page 500 to Unicode. Code page 500 is an EBCDIC code page with the full ISO-8859-1 (Latin-1) character set. This code page is used internationally.


cp_1047

const string: cp_1047

Conversion table from code page 1047 to Unicode. Code page 1047 is an EBCDIC code page with the full ISO-8859-1 (Latin-1) character set. This code page is used for Open Systems.


Function Detail

conv2unicode

const proc: conv2unicode (inout string: stri, in string: codePage)

Convert a string with bytes from a code page encoding to UTF-32. On entry, stri is assumed to be a string of bytes encoded with the specified code page. On return, stri contains a UTF-32 Unicode string.
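
A minimal sketch of a call, based on the signature shown above. The include file name "charsets.s7i" is an assumption derived from the library name, and the sample byte string is made up for illustration. Code page 437 assigns byte 130 (16#82) to U+00E9 (e with acute accent), so the last character of stri should have the ordinal 233 after the conversion.

```seed7
$ include "seed7_05.s7i";
  include "charsets.s7i";   # Assumed file name of the charsets library.

const proc: main is func
  local
    # Four bytes in code page 437: 'C', 'a', 'f' and byte 130.
    var string: stri is "Caf\130;";
  begin
    # stri is an inout parameter, so it is converted in place.
    conv2unicode(stri, cp_437);
    # Write the ordinal number of the fourth (converted) character.
    writeln(ord(stri[4]));
  end func;
```

Because the conversion happens in place, a copy of the original byte string must be made beforehand if it is still needed.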


conv2unicodeByName

const proc: conv2unicodeByName (inout string: stri, in var string: charset)

Convert a string from a charset encoding to UTF-32. On entry, stri is assumed to be a string of bytes encoded with the specified charset. On return, stri contains a UTF-32 Unicode string. The charset is specified with an IANA/MIME charset name, so the function can be used to convert encoded data for internet protocols such as NNTP.

Raises:
RANGE_ERROR - The charset is unknown.
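
A minimal sketch of selecting the charset by name and handling an unknown name. The include file name "charsets.s7i", the assumption that the name "ISO-8859-15" is among the recognized charset names, and the made-up name "NO-SUCH-CHARSET" are illustrative and not taken from this page. In ISO-8859-15 the byte 164 (16#A4) is the euro sign U+20AC.

```seed7
$ include "seed7_05.s7i";
  include "charsets.s7i";   # Assumed file name of the charsets library.

const proc: main is func
  local
    var string: stri is "\164;";   # One byte: 16#A4
  begin
    block
      # Assumes that "ISO-8859-15" is a recognized charset name.
      conv2unicodeByName(stri, "ISO-8859-15");
      writeln(ord(stri[1]));
      # A made-up name, which is expected to raise RANGE_ERROR.
      conv2unicodeByName(stri, "NO-SUCH-CHARSET");
    exception
      catch RANGE_ERROR:
        writeln("Unknown charset");
    end block;
  end func;
```

Catching RANGE_ERROR matters when the charset name comes from external data, such as the header of a message, where unsupported or misspelled names can occur.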

