charset

The charset attribute in HTML defines the document's character encoding, ensuring proper display of text, especially for non-ASCII characters.

charset Attribute

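The charset attribute is typically used in a <meta> tag inside the document's <head> section. It should appear early in the <head>, because the declaration must fall within the first 1024 bytes of the document. UTF-8 is the recommended character encoding because it can represent every Unicode character, which covers nearly all written human languages.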
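To illustrate why the declaration matters, the following Python sketch saves a word as UTF-8 and then reads the same bytes back as if the page had declared Windows-1252. Python is used here only to show the byte-level behavior; it is not part of HTML. The mismatch produces garbled text, often called mojibake.

```python
# Simulate a page saved as UTF-8.
text = "café"
data = text.encode("utf-8")   # b'caf\xc3\xa9': "é" is stored as two bytes

# Decoding with the encoding the file was saved in restores the text.
correct = data.decode("utf-8")

# Decoding the same bytes as Windows-1252 ("cp1252" in Python) garbles "é".
wrong = data.decode("cp1252")

print(correct)  # café
print(wrong)    # cafÃ©
```

A browser that is given the wrong charset makes the same mistake when it renders the page.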

Syntax

index.html
<meta charset="character-set">

Example

Setting the character encoding to UTF-8 in HTML:


index.html
<!DOCTYPE html>
<html>

<head>
<meta charset="UTF-8">
<title>Meta Example</title>
</head>

<body>
<h1>My Website</h1>
<p>Some text...</p>
</body>

</html>

ASCII Character Set

ASCII was one of the earliest character encoding standards. It defines 128 characters, numbered 0–127:

  • Control characters (0–31 and 127), such as tab and line feed
  • Uppercase and lowercase English letters (A-Z, a-z)
  • Digits from 0 to 9
  • Space and symbols such as !, $, +, -, @, <, and >
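The 128-character limit can be checked directly. This Python sketch decodes every byte value from 0 to 127 as ASCII and shows that a byte outside that range is rejected.

```python
# ASCII defines exactly 128 codes, numbered 0 to 127.
ascii_table = bytes(range(128)).decode("ascii")
print(len(ascii_table))              # 128
print(ord("A"), ord("a"), ord("0"))  # 65 97 48

# A byte value of 128 or more is not ASCII, so decoding fails.
try:
    bytes([233]).decode("ascii")
except UnicodeDecodeError as err:
    print(err.reason)                # ordinal not in range(128)
```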

ANSI Character Set

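ANSI (Windows-1252) was the default encoding in early Western-language versions of Windows:

  • Matches ASCII for characters 0–127
  • Uses 128–159 for additional printable characters, such as €, ‰, and curly quotation marks
  • Matches ISO-8859-1 for characters 160–255, which have the same numbers as their Unicode code points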
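The ranges listed above can be checked with a short Python sketch. Python's codec for Windows-1252 is named "cp1252"; the sketch decodes single bytes with it and compares them with the Unicode characters of the same number.

```python
# Bytes 128-159 hold extra printable characters such as the euro sign.
print(bytes([0x80]).decode("cp1252"))        # €
print(bytes([0x93, 0x94]).decode("cp1252"))  # “”

# Bytes 0-127 match ASCII, and bytes 160-255 map to the Unicode
# characters with the same number (U+00A0 to U+00FF).
matches_ascii = all(bytes([b]).decode("cp1252") == chr(b) for b in range(128))
matches_upper = all(bytes([b]).decode("cp1252") == chr(b) for b in range(160, 256))
print(matches_ascii, matches_upper)          # True True
```

Five positions in the 128–159 range (0x81, 0x8D, 0x8F, 0x90, and 0x9D) are unassigned in Windows-1252, so Python's strict cp1252 decoder raises UnicodeDecodeError for them.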

To use ANSI in HTML:

index.html
<meta charset="Windows-1252">

ISO-8859-1 Character Set

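ISO-8859-1 (Latin-1) was the default encoding for HTML 4 and supports 256 characters. It shares most of its characters with ASCII and ANSI but differs from ANSI in the range 128–159:

  • Matches ASCII for characters 0–127
  • Assigns no printable characters to 128–159; that range is left for control codes
  • Matches ANSI for 160–255, and these characters have the same numbers as their Unicode code points

In practice, browsers that follow the WHATWG Encoding Standard decode pages labeled ISO-8859-1 as Windows-1252.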
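The difference between ISO-8859-1 and ANSI can be checked byte by byte. In this Python sketch, "latin-1" and "cp1252" are Python's codec names for ISO-8859-1 and Windows-1252, and cp1252_char is a hypothetical helper. The sketch shows that ISO-8859-1 maps every byte to the Unicode code point with the same number, and that the two encodings disagree only for bytes 128–159.

```python
def cp1252_char(b: int) -> str:
    """Hypothetical helper: decode one byte as Windows-1252.

    Bytes that Windows-1252 leaves unassigned become U+FFFD.
    """
    return bytes([b]).decode("cp1252", errors="replace")

# ISO-8859-1 ("latin-1" in Python) maps every byte to the same code point.
# Python maps bytes 128-159 to invisible C1 control characters.
latin1_identity = all(bytes([b]).decode("latin-1") == chr(b) for b in range(256))

# Byte values where Windows-1252 and ISO-8859-1 decode differently.
differences = [b for b in range(256) if cp1252_char(b) != chr(b)]

print(latin1_identity)                                    # True
print(differences[0], differences[-1], len(differences))  # 128 159 32
```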

Usage in HTML

HTML 4

index.html
<meta http-equiv="Content-Type" content="text/html;charset=ISO-8859-1">

HTML 5

index.html
<meta charset="ISO-8859-1">
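Both forms declare the same encoding. The Python sketch below is a simplified, hypothetical illustration: declared_charset and its regular expressions are assumptions made for this example, not the HTML Standard's full prescan algorithm, which also handles comments, other attribute orders, and more. It looks for either form in the first 1024 bytes of a document, the region in which the declaration must appear.

```python
import re
from typing import Optional

# Hypothetical patterns: they recognize only the two forms shown above,
# with the attributes in that order.
META_CHARSET = re.compile(rb'<meta\s+charset\s*=\s*["\']?([\w.:-]+)', re.IGNORECASE)
HTTP_EQUIV = re.compile(
    rb'<meta\s+http-equiv\s*=\s*["\']?content-type["\']?\s+'
    rb'content\s*=\s*["\'][^"\']*charset\s*=\s*([\w.:-]+)',
    re.IGNORECASE,
)

def declared_charset(document: bytes) -> Optional[str]:
    """Return the encoding label declared in the first 1024 bytes, if any."""
    head = document[:1024]
    for pattern in (META_CHARSET, HTTP_EQUIV):
        match = pattern.search(head)
        if match:
            return match.group(1).decode("ascii")
    return None

html4 = b'<meta http-equiv="Content-Type" content="text/html;charset=ISO-8859-1">'
html5 = b'<meta charset="ISO-8859-1">'
print(declared_charset(html4), declared_charset(html5))  # ISO-8859-1 ISO-8859-1
```

A declaration that starts after the first 1024 bytes is not found, which mirrors why the <meta> tag belongs near the top of the <head>.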

UTF-8 Character Set

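UTF-8 is a universal character encoding that can represent every character in the Unicode standard.

  • Matches ASCII for 0–127, using one byte per character
  • Code points 128–159 are control characters rather than printable text
  • Code points 160–255 are the same characters as in ANSI and ISO-8859-1, although UTF-8 stores each one as two bytes
  • Covers the rest of Unicode's more than one million code points, using up to four bytes per character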
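UTF-8's variable length is easy to observe. This Python sketch encodes a few sample characters and prints how many bytes each one needs.

```python
# UTF-8 uses one to four bytes per character, depending on the code point.
samples = ["A", "\u00e9", "\u20ac", "\U0001F600"]  # A, é, €, grinning-face emoji
lengths = {}
for ch in samples:
    encoded = ch.encode("utf-8")
    lengths[ch] = len(encoded)
    print(f"U+{ord(ch):04X}: {len(encoded)} byte(s): {encoded.hex(' ')}")
# U+0041: 1 byte(s): 41
# U+00E9: 2 byte(s): c3 a9
# U+20AC: 3 byte(s): e2 82 ac
# U+1F600: 4 byte(s): f0 9f 98 80

# Code point 233 ("é") is one byte in ISO-8859-1 but two bytes in UTF-8.
print("\u00e9".encode("latin-1"), "\u00e9".encode("utf-8"))  # b'\xe9' b'\xc3\xa9'
```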

Usage in HTML

index.html
<meta charset="UTF-8">

Common Character Encodings

  • UTF-8: The most widely used encoding on the web, able to represent every Unicode character. Example: <meta charset="UTF-8">.
  • ISO-8859-1 (Latin-1): Supports most Western European languages. Example: <meta charset="ISO-8859-1">.
  • Windows-1252: Similar to ISO-8859-1, with extra printable characters such as € in the 128–159 range. Example: <meta charset="Windows-1252">.
  • UTF-16: Rarely used on the web. Browsers recognize UTF-16 from a byte order mark (BOM) at the start of the file or from the HTTP Content-Type header; a <meta charset="UTF-16"> declaration is not honored.
  • ISO-8859-2: Supports Central and Eastern European languages. Example: <meta charset="ISO-8859-2">.
  • GBK: Used for Simplified Chinese characters. Example: <meta charset="GBK">.
  • Shift_JIS: Used for Japanese text. Example: <meta charset="Shift_JIS">.
  • EUC-KR: Used for the Korean language. Example: <meta charset="EUC-KR">.
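Each legacy encoding in this list covers a limited set of scripts, while UTF-8 covers all of Unicode. The Python sketch below uses a hypothetical helper, can_encode, to check which encodings can represent the Korean word 한국 ("Korea").

```python
def can_encode(text: str, encoding: str) -> bool:
    """Hypothetical helper: True if `text` can be represented in `encoding`."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True

korean = "한국"
for name in ["utf-8", "euc-kr", "shift_jis", "iso-8859-1"]:
    print(name, can_encode(korean, name))
# utf-8 True
# euc-kr True
# shift_jis False
# iso-8859-1 False
```

Here "euc-kr" and "shift_jis" are Python codec names that happen to match the HTML labels above; the set of labels a browser accepts comes from the WHATWG Encoding Standard, not from Python.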

Values

  • character-set
    • Specifies the encoding label, such as UTF-8 or ISO-8859-1. Labels are case-insensitive, so "utf-8" and "UTF-8" are equivalent.

Applies To

The charset attribute is used in the following HTML element:

  • <meta>

Character Set Comparison

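The table below highlights key differences between the character sets discussed above. For UTF-8, the numbers are Unicode code points: UTF-8 stores 0–127 as one byte and 128–255 as two bytes.

Num     | ASCII  | ANSI                    | ISO-8859-1 | UTF-8    | Description
--------|--------|-------------------------|------------|----------|--------------------------------------------
0       | NUL    | NUL                     | NUL        | NUL      | Null character
1       | SOH    | SOH                     | SOH        | SOH      | Start of Heading
2       | STX    | STX                     | STX        | STX      | Start of Text
3       | ETX    | ETX                     | ETX        | ETX      | End of Text
4       | EOT    | EOT                     | EOT        | EOT      | End of Transmission
5       | ENQ    | ENQ                     | ENQ        | ENQ      | Enquiry
6       | ACK    | ACK                     | ACK        | ACK      | Acknowledgment
7       | BEL    | BEL                     | BEL        | BEL      | Bell
8       | BS     | BS                      | BS         | BS       | Backspace
9       | TAB    | TAB                     | TAB        | TAB      | Horizontal Tab
10      | LF     | LF                      | LF         | LF       | Line Feed
32      | Space  | Space                   | Space      | Space    | Space
48-57   | 0-9    | 0-9                     | 0-9        | 0-9      | Digits
65-90   | A-Z    | A-Z                     | A-Z        | A-Z      | Uppercase Latin letters
97-122  | a-z    | a-z                     | a-z        | a-z      | Lowercase Latin letters
128-159 | (none) | Extra symbols (€, “, ”) | (unused)   | (unused) | Control characters (printable symbols in ANSI)
160     | (none) | NBSP                    | NBSP       | NBSP     | Non-breaking space
161-255 | (none) | Various                 | Various    | Various  | Extended characters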

Conclusion

The charset attribute is essential for defining a webpage’s character encoding, ensuring correct text display, particularly for non-ASCII characters. While older encodings like ASCII, ANSI, and ISO-8859-1 were once common, UTF-8 is now the standard due to its wide-ranging language support. Using <meta charset="UTF-8"> is strongly recommended for modern web development, as it ensures compatibility and consistent text rendering across various platforms.