It is well known that, in order to include the characters <, & and > in an HTML document, the author has to write &lt;, &amp; and &gt;, since these characters have special meanings to HTML interpreters. A significant number of other special characters can be included using the same sort of notation. These are all characters outside the normal ASCII character set. Including them directly as binary is unreliable because they all have the "high" bit set, and there are several different interpretations of high-bit-set characters in use.
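The escaping of the three markup characters can be sketched in Python. This is only an illustration: the helper name `escape_markup` is hypothetical, and it relies on the standard library's `html.escape` rather than on anything specific to HTML 3.2.

```python
import html


def escape_markup(text):
    """Replace &, < and > with entity references so they display literally."""
    # quote=False leaves " and ' untouched; only the three markup
    # characters are rewritten.
    return html.escape(text, quote=False)


print(escape_markup("if a < b & b > c"))  # if a &lt; b &amp; b &gt; c
```

The ampersand is replaced first, so the `&` that begins each newly written reference is not escaped a second time.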
In HTML there are two ways of including such special characters: by entity name, written as &<entity>;, or by decimal value, written as &#<decimal>;.
For example, the French word for French is français. [Note the small hook under the letter c; this makes it a c cedilla.] The HTML could be either fran&ccedil;ais or fran&#231;ais.
There are some contexts in which HTML 3.2 distinguishes between letters and other symbols. The characters in the following table are the only ones for which the HTML 3.2 standard specifically lists entities; those treated as letters are marked Y in the last column.
Description | Entity | Decimal Value | &#<decimal> | &<entity> | Treated as letter |
---|---|---|---|---|---|
required space | nbsp | 160 | | | N |
inverted exclamation | iexcl | 161 | ¡ | ¡ | N |
cent sign | cent | 162 | ¢ | ¢ | N |
pound sign | pound | 163 | £ | £ | N |
currency sign | curren | 164 | ¤ | ¤ | N |
yen sign | yen | 165 | ¥ | ¥ | N |
broken bar | brvbar | 166 | ¦ | ¦ | N |
section sign | sect | 167 | § | § | N |
umlaut | uml | 168 | ¨ | ¨ | N |
copyright sign | copy | 169 | © | © | N |
feminine ordinal | ordf | 170 | ª | ª | N |
left guillemet | laquo | 171 | « | « | N |
logical not sign | not | 172 | ¬ | ¬ | N |
soft hyphen | shy | 173 | | | N |
registered trademark | reg | 174 | ® | ® | N |
spacing macron | macr | 175 | ¯ | ¯ | N |
degree | deg | 176 | ° | ° | N |
plus-minus sign | plusmn | 177 | ± | ± | N |
superscript 2 | sup2 | 178 | ² | ² | N |
superscript 3 | sup3 | 179 | ³ | ³ | N |
spacing acute | acute | 180 | ´ | ´ | N |
mu | micro | 181 | µ | µ | N |
pilcrow | para | 182 | ¶ | ¶ | N |
middle dot | middot | 183 | · | · | N |
spacing cedilla | cedil | 184 | ¸ | ¸ | N |
superscript 1 | sup1 | 185 | ¹ | ¹ | N |
masculine ordinal | ordm | 186 | º | º | N |
right guillemet | raquo | 187 | » | » | N |
one quarter | frac14 | 188 | ¼ | ¼ | N |
half | frac12 | 189 | ½ | ½ | N |
three quarters | frac34 | 190 | ¾ | ¾ | N |
inverted question mark | iquest | 191 | ¿ | ¿ | N |
A grave | Agrave | 192 | À | À | Y |
A acute | Aacute | 193 | Á | Á | Y |
A circumflex | Acirc | 194 | Â | Â | Y |
A tilde | Atilde | 195 | Ã | Ã | Y |
A diaeresis | Auml | 196 | Ä | Ä | Y |
A ring | Aring | 197 | Å | Å | Y |
AE ligature | AElig | 198 | Æ | Æ | Y |
C cedilla | Ccedil | 199 | Ç | Ç | Y |
E grave | Egrave | 200 | È | È | Y |
E acute | Eacute | 201 | É | É | Y |
E circumflex | Ecirc | 202 | Ê | Ê | Y |
E diaeresis | Euml | 203 | Ë | Ë | Y |
I grave | Igrave | 204 | Ì | Ì | Y |
I acute | Iacute | 205 | Í | Í | Y |
I circumflex | Icirc | 206 | Î | Î | Y |
I diaeresis | Iuml | 207 | Ï | Ï | Y |
ETH | ETH | 208 | Ð | Ð | Y |
N tilde | Ntilde | 209 | Ñ | Ñ | Y |
O grave | Ograve | 210 | Ò | Ò | Y |
O acute | Oacute | 211 | Ó | Ó | Y |
O circumflex | Ocirc | 212 | Ô | Ô | Y |
O tilde | Otilde | 213 | Õ | Õ | Y |
O diaeresis | Ouml | 214 | Ö | Ö | Y |
multiplication sign | times | 215 | × | × | N |
O slash | Oslash | 216 | Ø | Ø | Y |
U grave | Ugrave | 217 | Ù | Ù | Y |
U acute | Uacute | 218 | Ú | Ú | Y |
U circumflex | Ucirc | 219 | Û | Û | Y |
U diaeresis | Uuml | 220 | Ü | Ü | Y |
Y acute | Yacute | 221 | Ý | Ý | Y |
THORN | THORN | 222 | Þ | Þ | Y |
sharp s | szlig | 223 | ß | ß | Y |
a grave | agrave | 224 | à | à | Y |
a acute | aacute | 225 | á | á | Y |
a circumflex | acirc | 226 | â | â | Y |
a tilde | atilde | 227 | ã | ã | Y |
a diaeresis | auml | 228 | ä | ä | Y |
a ring | aring | 229 | å | å | Y |
ae ligature | aelig | 230 | æ | æ | Y |
c cedilla | ccedil | 231 | ç | ç | Y |
e grave | egrave | 232 | è | è | Y |
e acute | eacute | 233 | é | é | Y |
e circumflex | ecirc | 234 | ê | ê | Y |
e diaeresis | euml | 235 | ë | ë | Y |
i grave | igrave | 236 | ì | ì | Y |
i acute | iacute | 237 | í | í | Y |
i circumflex | icirc | 238 | î | î | Y |
i diaeresis | iuml | 239 | ï | ï | Y |
eth | eth | 240 | ð | ð | Y |
n tilde | ntilde | 241 | ñ | ñ | Y |
o grave | ograve | 242 | ò | ò | Y |
o acute | oacute | 243 | ó | ó | Y |
o circumflex | ocirc | 244 | ô | ô | Y |
o tilde | otilde | 245 | õ | õ | Y |
o diaeresis | ouml | 246 | ö | ö | Y |
division sign | divide | 247 | ÷ | ÷ | N |
o slash | oslash | 248 | ø | ø | Y |
u grave | ugrave | 249 | ù | ù | Y |
u acute | uacute | 250 | ú | ú | Y |
u circumflex | ucirc | 251 | û | û | Y |
u diaeresis | uuml | 252 | ü | ü | Y |
y acute | yacute | 253 | ý | ý | Y |
thorn | thorn | 254 | þ | þ | Y |
y diaeresis | yuml | 255 | ÿ | ÿ | Y |
eth and thorn are Icelandic letters.
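As a rough cross-check of the table, the two reference forms and the "treated as letter" column can be reproduced in Python. This is a sketch under stated assumptions: the function names are hypothetical, the entity names come from Python's `html.entities.codepoint2name` (which follows the later HTML 4 entity list, assumed here to agree with the table for codes 160 to 255), and the letter rule is simply read off the table rather than taken from the HTML 3.2 text.

```python
import html
from html.entities import codepoint2name


def references(code):
    """Return the (numeric, named) references for a code, e.g. 231."""
    return f"&#{code};", f"&{codepoint2name[code]};"


def treated_as_letter(code):
    """Letter rule as read off the table: 192 to 255, except the
    multiplication sign (215) and the division sign (247)."""
    return 192 <= code <= 255 and code not in (215, 247)


numeric, named = references(231)
print(numeric, named)  # &#231; &ccedil;

# Both spellings decode to the same word.
print(html.unescape(f"fran{numeric}ais") == html.unescape(f"fran{named}ais"))  # True

# Print entity name, decimal value and letter flag for every row.
for code in range(160, 256):
    flag = "Y" if treated_as_letter(code) else "N"
    print(f"{codepoint2name[code]} | {code} | {flag}")
```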
A spacing cedilla (184) is a cedilla that appears after the character it is associated with, i.e. c cedilla would appear as c¸ rather than ç. The usefulness of the spacing marks is unclear. Non-spacing marks would have been much more useful, as any composite special character could then have been formed, such as the w circumflex used in Welsh, the u macron used in some Anglo-Saxon place names, the l slash used in zloty (the Polish currency) and the S with an upside-down circumflex in Skoda (the Czech car manufacturer). There would also then have been room for more of the Greek alphabet, which is widely used in science and mathematics. Perhaps one day WWW browsers will be Unicode capable.
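The difference between spacing and non-spacing marks can be illustrated with Unicode's combining characters. The sketch below is not part of HTML 3.2: it assumes Python's `unicodedata` module, and the helper `compose` and the two constant names are hypothetical.

```python
import unicodedata

SPACING_CEDILLA = "\u00b8"    # code 184 in the table; occupies its own position
COMBINING_CEDILLA = "\u0327"  # non-spacing; attaches to the preceding letter


def compose(base, mark):
    """Join a base letter and a mark, then apply canonical composition (NFC)."""
    return unicodedata.normalize("NFC", base + mark)


# The spacing mark stays a separate character after the c.
print(len(compose("c", SPACING_CEDILLA)))            # 2
# The non-spacing mark composes into the single c cedilla (code 231).
print(compose("c", COMBINING_CEDILLA) == "\u00e7")   # True
# Composites that have no entity in the table:
print(compose("w", "\u0302"))  # w circumflex, as used in Welsh
print(compose("u", "\u0304"))  # u macron
```

Composition to a single code point only happens where Unicode defines a precomposed character; other base and mark pairs remain a two-character sequence that the display software has to combine.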
Both Netscape 3.0 and Microsoft Internet Explorer 3.0 handled all of these characters correctly, although I'm not too sure what a masculine or feminine ordinal is supposed to look like.