Character Data Types
Working with Characters
The char type stores a single character code unit. sizeof(char) is 1 byte by definition, and a byte is at least 8 bits (CHAR_BIT ≥ 8).
Character literals are enclosed in single quotes (e.g., 'A', '9', '$') and may include escape sequences such as '\n' for newline.
The exact narrow character encoding (ASCII, UTF-8, etc.) is implementation-defined; don’t assume ASCII everywhere.
Character Examples
Examples of declaring and printing characters:
#include <iostream>
using namespace std;
int main() {
    char firstLetter = 'A';
    char digit = '9';
    char symbol = '$';
    cout << firstLetter << " " << digit << " " << symbol;
    return 0;
}
A 9 $
Escape Sequences
Common escapes include \n (newline), \t (tab), \a (bell), \' (single quote), \" (double quote), and \\ (backslash). You can also use hexadecimal (\x41) or octal (\101) escapes.
#include <iostream>
using namespace std;
int main() {
    char nl = '\n'; // newline
    char bell = '\a'; // bell/alert
    char Ahex = '\x41'; // 'A' in ASCII-compatible encodings
    cout << 'A' << bell << "Hello" << nl;
    cout << Ahex << "\n";
    return 0;
}
AHello
A
Encodings and ASCII
Many platforms use ASCII or a superset (like UTF-8) for the narrow execution character set, but the C++ standard does not require ASCII.
If you rely on numeric codes, prefer readable escape sequences (e.g., '\n' instead of 10).
In ASCII specifically, codes 0–31 and 127 (DEL) are control characters and 32–126 are printable.
#include <iostream>
using namespace std;
int main() {
    char newline = '\n'; // preferred over numeric 10
    char beep = '\a'; // preferred over numeric 7
    cout << "Alert" << beep << "New" << newline << "Line";
    return 0;
}
AlertNew
Line
Signedness and <cctype> Functions
char may be signed or unsigned depending on the implementation. When passing a char to classification functions like std::isalpha, cast to unsigned char to avoid undefined behavior for negative values.
#include <iostream>
#include <cctype>
using namespace std;
int main() {
    char ch = '\xDF'; // 'ß' in ISO-8859-1 (Latin-1); negative if char is signed
    bool alpha = std::isalpha(static_cast<unsigned char>(ch)); // cast avoids undefined behavior
    cout << boolalpha << alpha; // locale-dependent; false in the default "C" locale
    return 0;
}
false
Unicode and Wider Character Types
For Unicode text, prefer std::string (commonly UTF-8) or wider character types when appropriate:
- char8_t (C++20) for UTF-8 literals: u8"…"
- char16_t / char32_t for UTF-16/UTF-32: u"…", U"…"
- wchar_t has an implementation-defined width and encoding
Remember that a char stores a code unit, not necessarily a complete user-visible character (grapheme).
#include <string>
#include <iostream>
using namespace std;
int main() {
    const char8_t* u8 = u8"Hi"; // UTF-8 code units
    const char16_t* u16 = u"Hi"; // UTF-16
    const char32_t* u32 = U"Hi"; // UTF-32
    // Convert to std::u8string/std::u16string/std::u32string as needed
    std::u8string s8 = u8"Hello";
    cout << "Sizes: " << sizeof(char) << ", " << sizeof(char16_t) << ", " << sizeof(char32_t);
    return 0;
}
Sizes: 1, 2, 4