charset Attribute
The charset attribute declares the character encoding of an HTML document. It is typically used in a <meta> tag inside the document's <head> section. UTF-8 is the recommended character encoding, as it supports nearly all characters used in human languages.
Syntax
<meta charset="character-set">
Example
Setting the character encoding to UTF-8 in HTML:
<!DOCTYPE html>
<html>
<head>
<meta charset="UTF-8">
</head>
<body>
<h1>My Website</h1>
<p>Some text...</p>
</body>
</html>
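To see why the declaration matters, the following Python sketch simulates a browser reading UTF-8 bytes under a wrong charset. The sample string and variable names are illustrative assumptions, and Python's "cp1252" codec stands in for a page mislabeled as Windows-1252.

```python
# Sample text; the string itself is an arbitrary choice for illustration.
text = "Café 5 €"

# Bytes as they would be stored in a file saved as UTF-8.
utf8_bytes = text.encode("utf-8")

# Decoding with the matching encoding recovers the original text.
correct = utf8_bytes.decode("utf-8")

# Decoding the same bytes as Windows-1252 simulates a wrong charset declaration:
# each multi-byte UTF-8 sequence is shown as two or three unrelated characters.
garbled = utf8_bytes.decode("cp1252")

print(correct)
print(garbled)
```

The bytes in the file never change; only the interpretation does, which is why the declared charset has to match the encoding the file was actually saved in.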
ASCII Character Set
ASCII was one of the earliest character encoding standards, containing 128 characters:
- Uppercase and lowercase English letters (A-Z, a-z)
- Digits from 0 to 9
- Symbols such as !, $, +, -, @, <, and >
- Control characters such as tab and line feed (0–31 and 127)
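The 128-character limit can be checked directly. This is a minimal Python sketch; the helper name is_ascii is a hypothetical name chosen for the example.

```python
import string

# Every ASCII character has a number (code point) from 0 to 127.
ascii_chars = [chr(n) for n in range(128)]

# English letters, digits and common symbols each encode to exactly one byte.
sample = string.ascii_letters + string.digits + "!$+-@<>"
encoded = sample.encode("ascii")


def is_ascii(text: str) -> bool:
    """Return True if every character fits in the 128-character ASCII set."""
    try:
        text.encode("ascii")
        return True
    except UnicodeEncodeError:
        return False


print(len(ascii_chars), len(sample), len(encoded))
print(is_ascii("Hello!"), is_ascii("café"))
```

Anything outside this range, such as accented letters, needs one of the larger encodings described in the following sections.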
ANSI Character Set
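ANSI (Windows-1252) was an early Windows encoding system:
- Matches ASCII for characters 0–127
- Includes additional special characters from 128–159, such as € and curly quotation marks
- Aligns with ISO-8859-1 and with Unicode (the character set behind UTF-8) for characters 160–255; the character numbers match, although UTF-8 stores each of them as two bytes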
To use ANSI in HTML:
<meta charset="Windows-1252">
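The three ranges can be inspected with Python's "cp1252" codec, used here as a stand-in for Windows-1252. This is a sketch; Python's codec leaves five byte values in 128–159 unassigned, which the loop simply skips.

```python
# Byte values 128-159 that Windows-1252 maps to printable characters.
extras = {}
for value in range(128, 160):
    try:
        extras[value] = bytes([value]).decode("cp1252")
    except UnicodeDecodeError:
        pass  # Python's cp1252 codec has no character assigned to this byte

# Bytes 0-127 decode exactly as they do in ASCII.
low = bytes(range(128))
same_as_ascii = low.decode("cp1252") == low.decode("ascii")

# Bytes 160-255 decode exactly as they do in ISO-8859-1 (Latin-1).
high = bytes(range(160, 256))
same_as_latin1 = high.decode("cp1252") == high.decode("latin-1")

print(extras[0x80], extras[0x99])
print(same_as_ascii, same_as_latin1)
```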
ISO-8859-1 Character Set
ISO-8859-1 was the default encoding for HTML 4. It is a single-byte encoding, so it has room for at most 256 character values. It shares similarities with ASCII and ANSI but has certain differences.
- Matches ASCII for characters 0–127
- Defines no printable characters in the range 128–159
- Aligns with ANSI and with Unicode (the character set behind UTF-8) for 160–255
Usage in HTML
HTML 4
<meta http-equiv="Content-Type" content="text/html;charset=ISO-8859-1">
HTML 5
<meta charset="ISO-8859-1">
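Both declaration styles carry the same information. The sketch below reads either form with Python's standard html.parser module. CharsetFinder and declared_charset are hypothetical helper names, and real browsers use a more elaborate detection procedure than this.

```python
from html.parser import HTMLParser
from typing import Optional


class CharsetFinder(HTMLParser):
    """Records the first charset declared in a <meta> tag (illustrative helper)."""

    def __init__(self) -> None:
        super().__init__()
        self.charset: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta" or self.charset is not None:
            return
        attributes = dict(attrs)
        if attributes.get("charset"):
            # HTML5 form: <meta charset="...">
            self.charset = attributes["charset"].strip()
        elif (attributes.get("http-equiv") or "").lower() == "content-type":
            # HTML 4 form: charset is a parameter inside the content attribute
            for part in (attributes.get("content") or "").split(";"):
                name, _, value = part.partition("=")
                if name.strip().lower() == "charset" and value.strip():
                    self.charset = value.strip()
                    break


def declared_charset(markup: str) -> Optional[str]:
    """Return the charset named in the markup, or None if there is none."""
    finder = CharsetFinder()
    finder.feed(markup)
    return finder.charset


html4 = '<meta http-equiv="Content-Type" content="text/html;charset=ISO-8859-1">'
html5 = '<meta charset="ISO-8859-1">'
print(declared_charset(html4), declared_charset(html5))
```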
UTF-8 Character Set
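UTF-8 is a universal character encoding: it can represent every character in Unicode, which covers the repertoires of the older character sets.
- Matches ASCII for 0–127, byte for byte
- Has no printable characters at 128–159 (Unicode reserves these numbers for control characters)
- Uses the same character numbers as ANSI and ISO-8859-1 for 160–255, but stores each as two bytes
- Supports more than a hundred thousand additional characters, using up to four bytes each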
Usage in HTML
<meta charset="UTF-8">
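UTF-8 is a variable-length encoding, which a short Python sketch can make visible. The sample characters are arbitrary choices for illustration.

```python
# One sample character from each UTF-8 length class.
samples = ["A", "é", "€", "😀"]

# Number of bytes UTF-8 needs for each character.
utf8_lengths = {ch: len(ch.encode("utf-8")) for ch in samples}

for ch in samples:
    encoded = ch.encode("utf-8")
    print(f"U+{ord(ch):04X} needs {len(encoded)} byte(s): {encoded.hex()}")

# The same character number can be stored differently by different encodings:
# é is number 233 in both, but one byte in ISO-8859-1 and two bytes in UTF-8.
latin1_e_acute = "é".encode("latin-1")
utf8_e_acute = "é".encode("utf-8")
```

Because ASCII characters keep their single-byte form, a file containing only ASCII text is already valid UTF-8.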
Common Character Encodings
- UTF-8: The most widely used encoding, supporting almost all characters worldwide. Example: <meta charset="UTF-8">
- ISO-8859-1 (Latin-1): Supports most Western European languages. Example: <meta charset="ISO-8859-1">
- Windows-1252: Similar to ISO-8859-1 with additional characters. Example: <meta charset="Windows-1252">
- UTF-16: Less common for the web. Example: <meta charset="UTF-16"> (current browsers treat this declaration as UTF-8; genuine UTF-16 pages are recognized by a byte order mark or the HTTP Content-Type header)
- ISO-8859-2: Supports Central and Eastern European languages. Example: <meta charset="ISO-8859-2">
- GBK: Used for Simplified Chinese characters. Example: <meta charset="GBK">
- Shift_JIS: Used for Japanese text. Example: <meta charset="Shift_JIS">
- EUC-KR: Used for the Korean language. Example: <meta charset="EUC-KR">
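As a rough comparison of the encodings listed above, the following Python sketch encodes a short sample in each one and counts the bytes. The sample strings are illustrative assumptions. Python's codecs accept these labels, but they are not guaranteed to behave exactly like a browser's decoders.

```python
import codecs

# Encoding label -> sample text that the encoding can represent.
samples = {
    "UTF-8": "Hello, 世界",
    "ISO-8859-1": "café",
    "Windows-1252": "50 €",
    "UTF-16": "Hello",
    "ISO-8859-2": "Łódź",
    "GBK": "中文",
    "Shift_JIS": "日本語",
    "EUC-KR": "한국어",
}

byte_counts = {}
for label, text in samples.items():
    codecs.lookup(label)  # raises LookupError if Python does not know the label
    data = text.encode(label)
    # Decoding with the same encoding must give the original text back.
    assert data.decode(label) == text
    byte_counts[label] = len(data)

for label, count in byte_counts.items():
    print(f"{label}: {count} bytes")
```

An encoding can only store the characters it defines. For example, encoding "€" as ISO-8859-1 raises UnicodeEncodeError in Python, whereas UTF-8 accepts every sample in the table.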
Values
character-set
- Specifies the character set, such as UTF-8 or ISO-8859-1.
Applies To
The charset attribute is used in the following HTML element:
- <meta>
Character Set Comparison
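The table below highlights key differences between the mentioned character sets. Num is the character number; in the UTF-8 column it is the Unicode code point, which UTF-8 stores as more than one byte for numbers above 127.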
Num | ASCII | ANSI | ISO-8859-1 | UTF-8 | Description |
---|---|---|---|---|---|
0 | NUL | NUL | NUL | NUL | Null character |
1 | SOH | SOH | SOH | SOH | Start of Header |
2 | STX | STX | STX | STX | Start of Text |
3 | ETX | ETX | ETX | ETX | End of Text |
4 | EOT | EOT | EOT | EOT | End of Transmission |
5 | ENQ | ENQ | ENQ | ENQ | Enquiry |
6 | ACK | ACK | ACK | ACK | Acknowledgment |
7 | BEL | BEL | BEL | BEL | Bell |
8 | BS | BS | BS | BS | Backspace |
9 | TAB | TAB | TAB | TAB | Horizontal Tab |
10 | LF | LF | LF | LF | Line Feed |
32 | Space | Space | Space | Space | Space |
48-57 | 0-9 | 0-9 | 0-9 | 0-9 | Digits |
65-90 | A-Z | A-Z | A-Z | A-Z | Uppercase Latin letters |
97-122 | a-z | a-z | a-z | a-z | Lowercase Latin letters |
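128-159 | (not defined) | Extra printable characters (e.g. €, ™) | (unused) | Control characters (not used) | Range where the encodings differ |
160 | (not defined) | NBSP | NBSP | NBSP | Non-breaking space |
161-255 | (not defined) | Various | Various | Various | Extended characters |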
Conclusion
The charset attribute is essential for defining a webpage’s character encoding, ensuring correct text display, particularly for non-ASCII characters. While older encodings like ASCII, ANSI, and ISO-8859-1 were once common, UTF-8 is now the standard due to its wide-ranging language support. Using <meta charset="UTF-8"> is strongly recommended for modern web development, as it ensures compatibility and consistent text rendering across various platforms.