Non-ASCII characters

February 23, 2010 | By greg

ISO-8859-1

ISO-8859-1 has been the default character set in most browsers.
The first 128 characters of ISO-8859-1 are the original ASCII character set (the digits 0-9, the uppercase and lowercase English alphabet, and some special characters).
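
As a quick check of that overlap, here is a minimal Python sketch. It assumes only Python's built-in `ascii` and `latin-1` codecs; the variable names are made up for illustration.

```python
# Bytes 0..127 decode to the same text under ASCII and ISO-8859-1.
low_bytes = bytes(range(128))
as_ascii = low_bytes.decode("ascii")
as_latin1 = low_bytes.decode("latin-1")
print(as_ascii == as_latin1)  # True

# A byte above 127 is valid ISO-8859-1 but is rejected by the ASCII codec.
high_char = b"\xe9".decode("latin-1")
print(high_char)  # é
try:
    b"\xe9".decode("ascii")
    ascii_rejected = False
except UnicodeDecodeError:
    ascii_rejected = True
print(ascii_rejected)  # True
```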

The second part of ISO-8859-1 (codes from 160-255) contains some characters used in Western European countries and some commonly used special characters.

Entities are used to implement reserved characters or to express characters that cannot easily be entered with the keyboard.

ISO 8859-1, more formally cited as ISO/IEC 8859-1 or less formally as Latin-1, is part 1 of ISO/IEC 8859, a standard character encoding defined by ISO. It encodes what it refers to as Latin alphabet no. 1, consisting of 191 characters from the Latin script, each encoded as a single 8-bit code value.
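
To illustrate "one character, one 8-bit value", the following Python sketch encodes a short sample string with the built-in `latin-1` codec and counts the printable code values. The sample text and variable names are assumptions chosen for the example.

```python
# Every ISO-8859-1 character occupies exactly one byte, and in Python's
# codec the byte value equals the Unicode code point of the character.
text = "Crème brûlée"
encoded = text.encode("latin-1")
print(len(text), len(encoded))  # 12 12
same_values = all(ord(ch) == b for ch, b in zip(text, encoded))
print(same_values)  # True

# Printable characters: 0x20-0x7E (ASCII) plus 0xA0-0xFF (upper half).
printable = [b for b in range(256) if 0x20 <= b <= 0x7E or 0xA0 <= b <= 0xFF]
print(len(printable))  # 191
```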

ISO/IEC 8859-1 suffers from a number of deficiencies, including the omission of a few characters needed for French and the lack of a euro symbol. For this reason, ISO/IEC 8859-15 was developed as an update of ISO/IEC 8859-1 that adds the missing characters. (This, however, required removing some less-used characters from ISO/IEC 8859-1, including fraction symbols and letter-free diacritics: ¤, ¦, ¨, ´, ¸, ¼, ½ and ¾.)
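
The eight replaced positions can be listed with a short Python sketch that decodes every byte value with both codecs. It assumes Python's built-in `iso-8859-1` and `iso-8859-15` codecs; the `differences` dictionary is just an illustrative name.

```python
# Decode each byte with both codecs and record where the results differ.
differences = {}
for value in range(256):
    raw = bytes([value])
    latin1 = raw.decode("iso-8859-1")
    latin9 = raw.decode("iso-8859-15")
    if latin1 != latin9:
        differences[value] = (latin1, latin9)

for value, (old, new) in differences.items():
    print(f"0x{value:02X}: {old} -> {new}")
```

The first line printed is the euro sign taking over the position of the generic currency sign (0xA4).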

The name Latin-1 is an informal alias rather than an official ISO name, but it is meaningful in much computer software (the IANA character set registry lists latin1 as an alias).

The following table shows ISO-8859-1, with the 3-letter abbreviations for the control characters.

ISO-8859-1
     x0   x1   x2   x3   x4   x5   x6   x7   x8   x9   xA   xB   xC   xD   xE   xF
0x   NUL  SOH  STX  ETX  EOT  ENQ  ACK  BEL  BS   HT   LF   VT   FF   CR   SO   SI
1x   DLE  DC1  DC2  DC3  DC4  NAK  SYN  ETB  CAN  EM   SUB  ESC  FS   GS   RS   US
2x   SP   !    "    #    $    %    &    '    (    )    *    +    ,    -    .    /
3x   0    1    2    3    4    5    6    7    8    9    :    ;    <    =    >    ?
4x   @    A    B    C    D    E    F    G    H    I    J    K    L    M    N    O
5x   P    Q    R    S    T    U    V    W    X    Y    Z    [    \    ]    ^    _
6x   `    a    b    c    d    e    f    g    h    i    j    k    l    m    n    o
7x   p    q    r    s    t    u    v    w    x    y    z    {    |    }    ~    DEL
8x   PAD  HOP  BPH  NBH  IND  NEL  SSA  ESA  HTS  HTJ  VTS  PLD  PLU  RI   SS2  SS3
9x   DCS  PU1  PU2  STS  CCH  MW   SPA  EPA  SOS  SGCI SCI  CSI  ST   OSC  PM   APC
Ax   NBSP ¡    ¢    £    ¤    ¥    ¦    §    ¨    ©    ª    «    ¬    SHY  ®    ¯
Bx   °    ±    ²    ³    ´    µ    ¶    ·    ¸    ¹    º    »    ¼    ½    ¾    ¿
Cx   À    Á    Â    Ã    Ä    Å    Æ    Ç    È    É    Ê    Ë    Ì    Í    Î    Ï
Dx   Ð    Ñ    Ò    Ó    Ô    Õ    Ö    ×    Ø    Ù    Ú    Û    Ü    Ý    Þ    ß
Ex   à    á    â    ã    ä    å    æ    ç    è    é    ê    ë    ì    í    î    ï
Fx   ð    ñ    ò    ó    ô    õ    ö    ÷    ø    ù    ú    û    ü    ý    þ    ÿ


IBM PC or MS-DOS Codepage 437, often abbreviated to CP437 and also known as DOS-US or OEM-US, is the original character set of the IBM PC, circa 1981.

The following is a table representing CP437 using the equivalent Unicode characters:

    .0      .1      .2      .3      .4      .5      .6      .7      .8      .9      .A      .B      .C      .D      .E      .F
0.  NUL 0   ☺ 263A  ☻ 263B  ♥ 2665  ♦ 2666  ♣ 2663  ♠ 2660  • 2022  ◘ 25D8  ○ 25CB  ◙ 25D9  ♂ 2642  ♀ 2640  ♪ 266A  ♫ 266B  ☼ 263C
1.  ► 25BA  ◄ 25C4  ↕ 2195  ‼ 203C  ¶ B6    § A7    ▬ 25AC  ↨ 21A8  ↑ 2191  ↓ 2193  → 2192  ← 2190  ∟ 221F  ↔ 2194  ▲ 25B2  ▼ 25BC
2.  SP 20   ! 21    " 22    # 23    $ 24    % 25    & 26    ' 27    ( 28    ) 29    * 2A    + 2B    , 2C    - 2D    . 2E    / 2F
3.  0 30    1 31    2 32    3 33    4 34    5 35    6 36    7 37    8 38    9 39    : 3A    ; 3B    < 3C    = 3D    > 3E    ? 3F
4.  @ 40    A 41    B 42    C 43    D 44    E 45    F 46    G 47    H 48    I 49    J 4A    K 4B    L 4C    M 4D    N 4E    O 4F
5.  P 50    Q 51    R 52    S 53    T 54    U 55    V 56    W 57    X 58    Y 59    Z 5A    [ 5B    \ 5C    ] 5D    ^ 5E    _ 5F
6.  ` 60    a 61    b 62    c 63    d 64    e 65    f 66    g 67    h 68    i 69    j 6A    k 6B    l 6C    m 6D    n 6E    o 6F
7.  p 70    q 71    r 72    s 73    t 74    u 75    v 76    w 77    x 78    y 79    z 7A    { 7B    | 7C    } 7D    ~ 7E    ⌂ 2302
8.  Ç C7    ü FC    é E9    â E2    ä E4    à E0    å E5    ç E7    ê EA    ë EB    è E8    ï EF    î EE    ì EC    Ä C4    Å C5
9.  É C9    æ E6    Æ C6    ô F4    ö F6    ò F2    û FB    ù F9    ÿ FF    Ö D6    Ü DC    ¢ A2    £ A3    ¥ A5    ₧ 20A7  ƒ 192
A.  á E1    í ED    ó F3    ú FA    ñ F1    Ñ D1    ª AA    º BA    ¿ BF    ⌐ 2310  ¬ AC    ½ BD    ¼ BC    ¡ A1    « AB    » BB
B.  ░ 2591  ▒ 2592  ▓ 2593  │ 2502  ┤ 2524  ╡ 2561  ╢ 2562  ╖ 2556  ╕ 2555  ╣ 2563  ║ 2551  ╗ 2557  ╝ 255D  ╜ 255C  ╛ 255B  ┐ 2510
C.  └ 2514  ┴ 2534  ┬ 252C  ├ 251C  ─ 2500  ┼ 253C  ╞ 255E  ╟ 255F  ╚ 255A  ╔ 2554  ╩ 2569  ╦ 2566  ╠ 2560  ═ 2550  ╬ 256C  ╧ 2567
D.  ╨ 2568  ╤ 2564  ╥ 2565  ╙ 2559  ╘ 2558  ╒ 2552  ╓ 2553  ╫ 256B  ╪ 256A  ┘ 2518  ┌ 250C  █ 2588  ▄ 2584  ▌ 258C  ▐ 2590  ▀ 2580
E.  α 3B1   ß DF    Γ 393   π 3C0   Σ 3A3   σ 3C3   µ B5    τ 3C4   Φ 3A6   Θ 398   Ω 3A9   δ 3B4   ∞ 221E  φ 3C6   ε 3B5   ∩ 2229
F.  ≡ 2261  ± B1    ≥ 2265  ≤ 2264  ⌠ 2320  ⌡ 2321  ÷ F7    ≈ 2248  ° B0    ∙ 2219  · B7    √ 221A  ⁿ 207F  ² B2    ■ 25A0  NBSP A0
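
The upper half of the CP437 table can be regenerated with a few lines of Python. This is a sketch that assumes Python's built-in `cp437` codec; `cp437_row` is a hypothetical helper name. Note that this codec maps bytes 0x01-0x1F to control characters, not to the graphical symbols shown in rows 0 and 1 of the table.

```python
def cp437_row(high: int) -> list[tuple[str, int]]:
    """Return (character, Unicode code point) pairs for one 16-byte row."""
    raw = bytes(range(high * 16, high * 16 + 16))
    return [(ch, ord(ch)) for ch in raw.decode("cp437")]

# Print rows 8 through F in the same "character code" layout as the table.
for high in range(0x8, 0x10):
    cells = "  ".join(f"{ch} {code:X}" for ch, code in cp437_row(high))
    print(f"{high:X}.  {cells}")
```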


The repertoire of CP437 was taken from the character set of Wang word-processing machines, as Bill Gates explicitly admitted in an interview with him and Paul Allen in the 2 October 1995 edition of Fortune magazine:

"... we were also fascinated by dedicated word processors from Wang, because we believed that general-purpose machines could do that just as well. That's why, when it came time to design the keyboard for the IBM PC, we put the funny Wang character set into the machine--you know, smiley faces and boxes and triangles and stuff. We were thinking we'd like to do a clone of Wang word-processing software someday."

CP437 is inadequate for internationalisation, as it lacks characters necessary for some languages, such as À (capital A with grave) for French, and has only a few Greek letters. Later MS-DOS character sets, such as CP850 (DOS Latin-1), CP852 (DOS Central-European) and CP737 (DOS Greek), filled the gaps for international use while still being nearly compatible with CP437 by retaining the box-drawing characters. All CP437 characters are in Unicode and in Microsoft's WGL4 character set, therefore in most of the fonts on Microsoft Windows, and also in the VGA font of Linux, and the ISO 10646 fonts for X11.
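
A small Python sketch makes the gaps and the shared box-drawing characters visible. It assumes Python's built-in `cp437`, `cp850`, `cp852` and `cp737` codecs; `can_encode` is a hypothetical helper written for this example.

```python
def can_encode(text: str, codec: str) -> bool:
    """Return True if every character of text exists in the given code page."""
    try:
        text.encode(codec)
    except UnicodeEncodeError:
        return False
    return True

codepages = ("cp437", "cp850", "cp852", "cp737")
for codec in codepages:
    # Report support for À and Greek β, plus the byte used for the ═ box character.
    box_byte = "═".encode(codec).hex()
    print(codec, "À:", can_encode("À", codec),
          "β:", can_encode("β", codec), "═: 0x" + box_byte)
```

The double horizontal line keeps the same byte value in all four code pages, which is what lets text-mode box layouts survive a code page switch.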


&middot; [·]     &copy; [©]     &reg; [®]     &trade; [™]     &#39; [']     &deg; [°] ex: 32&deg;F

For info on more special characters see
HTML Escape Character Codes (and 'cursor over')


Reserved Characters in HTML

Some characters are reserved in HTML and XHTML. For example, you cannot use
the greater than or less than signs within your text because the browser could
mistake them for markup.

HTML and XHTML processors must support the five special characters listed in
the table below:

Character  Entity Number  Entity Name                   Description
"          &#34;          &quot;                        quotation mark
'          &#39;          &apos; (does not work in IE)  apostrophe
&          &#38;          &amp;                         ampersand
<          &#60;          &lt;                          less-than
>          &#62;          &gt;                          greater-than

Note: Entity names are case sensitive!
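
In practice the reserved characters are rarely escaped by hand. As a sketch, Python's standard `html` module does it in both directions; the sample string below is made up for illustration. Note that `html.escape` writes the apostrophe as the hexadecimal numeric entity `&#x27;` instead of `&apos;`.

```python
import html

# A string containing all five reserved characters.
raw = 'if a < b && b > c then say "hi" & \'bye\''

escaped = html.escape(raw)        # quote=True by default, so " and ' are escaped too
print(escaped)

restored = html.unescape(escaped)  # turns entities back into characters
print(restored == raw)  # True
```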


ISO 8859-1 Symbols

Char (&#xxx;) Entity Name Description
NBSP (160) &nbsp; non-breaking space
¡ (161) &iexcl; inverted exclamation mark
¢ (162) &cent; cent
£ (163) &pound; pound
¤ (164) &curren; currency
¥ (165) &yen; yen
¦ (166) &brvbar; broken vertical bar
§ (167) &sect; section
¨ (168) &uml; spacing diaeresis
© (169) &copy; copyright
ª (170) &ordf; feminine ordinal indicator
« (171) &laquo; angle quotation mark (left)
¬ (172) &not; negation
SHY (173) &shy; soft hyphen
® (174) &reg; registered trademark
¯ (175) &macr; spacing macron
° (176) &deg; degree
± (177) &plusmn; plus-or-minus
² (178) &sup2; superscript 2
³ (179) &sup3; superscript 3
´ (180) &acute; spacing acute
µ (181) &micro; micro
¶ (182) &para; paragraph
· (183) &middot; middle dot
¸ (184) &cedil; spacing cedilla
¹ (185) &sup1; superscript 1
º (186) &ordm; masculine ordinal indicator
» (187) &raquo; angle quotation mark (right)
¼ (188) &frac14; fraction 1/4
½ (189) &frac12; fraction 1/2
¾ (190) &frac34; fraction 3/4
¿ (191) &iquest; inverted question mark
× (215) &times; multiplication
÷ (247) &divide; division
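
The number in the `&#xxx;` column is simply the character's code point in decimal. A minimal Python sketch, where `numeric_entity` is a hypothetical helper name and `html.unescape` is the standard-library function that decodes entities:

```python
import html

def numeric_entity(ch: str) -> str:
    """Build the decimal numeric entity (&#xxx;) for a single character."""
    return f"&#{ord(ch)};"

for ch in "¡¢£©®°÷":
    print(ch, numeric_entity(ch))

# Decoding goes the other way: the entity becomes the character again.
decoded = html.unescape("32&#176;F")
print(decoded)  # 32°F
```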

Char (&#xxx;) Entity Name Description
À (192) &Agrave; capital a, grave accent
Á (193) &Aacute; capital a, acute accent
Â (194) &Acirc; capital a, circumflex accent
Ã (195) &Atilde; capital a, tilde
Ä (196) &Auml; capital a, umlaut mark
Å (197) &Aring; capital a, ring
Æ (198) &AElig; capital ae
Ç (199) &Ccedil; capital c, cedilla
È (200) &Egrave; capital e, grave accent
É (201) &Eacute; capital e, acute accent
Ê (202) &Ecirc; capital e, circumflex accent
Ë (203) &Euml; capital e, umlaut mark
Ì (204) &Igrave; capital i, grave accent
Í (205) &Iacute; capital i, acute accent
Î (206) &Icirc; capital i, circumflex accent
Ï (207) &Iuml; capital i, umlaut mark
Ð (208) &ETH; capital eth, Icelandic
Ñ (209) &Ntilde; capital n, tilde
Ò (210) &Ograve; capital o, grave accent
Ó (211) &Oacute; capital o, acute accent
Ô (212) &Ocirc; capital o, circumflex accent
Õ (213) &Otilde; capital o, tilde
Ö (214) &Ouml; capital o, umlaut mark
Ø (216) &Oslash; capital o, slash
Ù (217) &Ugrave; capital u, grave accent
Ú (218) &Uacute; capital u, acute accent
Û (219) &Ucirc; capital u, circumflex accent
Ü (220) &Uuml; capital u, umlaut mark
Ý (221) &Yacute; capital y, acute accent
Þ (222) &THORN; capital THORN, Icelandic
ß (223) &szlig; small sharp s, German
à (224) &agrave; small a, grave accent
á (225) &aacute; small a, acute accent
â (226) &acirc; small a, circumflex accent
ã (227) &atilde; small a, tilde
ä (228) &auml; small a, umlaut mark
å (229) &aring; small a, ring
æ (230) &aelig; small ae
ç (231) &ccedil; small c, cedilla
è (232) &egrave; small e, grave accent
é (233) &eacute; small e, acute accent
ê (234) &ecirc; small e, circumflex accent
ë (235) &euml; small e, umlaut mark
ì (236) &igrave; small i, grave accent
í (237) &iacute; small i, acute accent
î (238) &icirc; small i, circumflex accent
ï (239) &iuml; small i, umlaut mark
ð (240) &eth; small eth, Icelandic
ñ (241) &ntilde; small n, tilde
ò (242) &ograve; small o, grave accent
ó (243) &oacute; small o, acute accent
ô (244) &ocirc; small o, circumflex accent
õ (245) &otilde; small o, tilde
ö (246) &ouml; small o, umlaut mark
ø (248) &oslash; small o, slash
ù (249) &ugrave; small u, grave accent
ú (250) &uacute; small u, acute accent
û (251) &ucirc; small u, circumflex accent
ü (252) &uuml; small u, umlaut mark
ý (253) &yacute; small y, acute accent
þ (254) &thorn; small thorn, Icelandic
ÿ (255) &yuml; small y, umlaut mark
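
The entity names for these characters can also be looked up programmatically. The sketch below assumes the `codepoint2name` mapping from Python's standard `html.entities` module; `entity_row` is a hypothetical helper that prints a character, its number and its entity name.

```python
from html.entities import codepoint2name

def entity_row(code: int) -> str:
    """Format one row: character, decimal number and named entity."""
    return f"{chr(code)} ({code}) &{codepoint2name[code]};"

# Codes 192-255 are the accented letters (plus the × and ÷ signs).
for code in range(192, 256):
    print(entity_row(code))
```

Because entity names are case sensitive, the lookup distinguishes pairs such as `&THORN;` (222) and `&thorn;` (254).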
