String Length
Getting String Length
You can determine the number of characters in a string using the length() or size() methods (they are equivalent in C++). Both run in O(1) time and return std::string::size_type (an unsigned type).
#include <iostream>
#include <string>

int main() {
    std::string str = "Hello";
    // length() and size() return the same value
    std::cout << "Length: " << str.length() << '\n';
    std::cout << "Size: " << str.size();
    return 0;
}
Length: 5
Size: 5
Empty Strings
You can check if a string has no characters using the empty() method, which returns true if the string is empty. When streamed to std::cout, a bool prints as 1 or 0 unless std::boolalpha is set.
#include <iostream>
#include <string>

int main() {
    std::string s1 = "Content";
    std::string s2; // default-constructed, so it holds no characters
    std::cout << "s1 empty? " << s1.empty() << '\n';
    std::cout << "s2 empty? " << s2.empty();
    return 0;
}
s1 empty? 0
s2 empty? 1
Length vs Capacity
A string tracks both its length (the number of code units it currently holds) and its capacity (the number of code units its allocated storage can hold before it must reallocate). Capacity may be larger than length to reduce reallocations when the string grows.
#include <iostream>
#include <string>

int main() {
    std::string str;
    std::cout << "Initial capacity: " << str.capacity() << '\n';

    str = "This is a longer string";
    std::cout << "Length: " << str.length() << '\n';
    std::cout << "Capacity: " << str.capacity() << '\n';

    // Tip: reserve to avoid repeated reallocations
    str.clear();
    str.reserve(100);
    std::cout << "After reserve(100), capacity: " << str.capacity() << '\n';
    return 0;
}
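Initial capacity: 15
Length: 23
Capacity: 30
After reserve(100), capacity: 100
The length is always 23. The capacity values depend on the standard library implementation; the figures shown are typical of GCC's libstdc++, and reserve(100) only guarantees a capacity of at least 100.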
Unicode Caveat (UTF-8)
In a narrow std::string, size()/length() count bytes, not human-visible characters. With UTF-8 text, multi-byte code points (e.g., emoji) make size larger than the perceived character count.
If you need code-point counts in UTF-8, you can count the bytes that are not continuation bytes; this is exact for well-formed UTF-8 and only an approximation for malformed input. For true user-perceived characters (grapheme clusters), a text library is required.
#include <cstddef>
#include <iostream>
#include <string>

// Count UTF-8 code points by counting bytes that are not 10xxxxxx (continuations)
std::size_t utf8_codepoints(const std::string& s) {
    std::size_t n = 0;
    for (unsigned char c : s) {
        if ((c & 0xC0) != 0x80) ++n; // not a continuation byte
    }
    return n;
}

int main() {
    // UTF-8 bytes of U+1F600 written as escapes, then 'a': 2 code points, 5 bytes.
    // (Since C++20 a u8"" literal is an array of char8_t and cannot initialize std::string.)
    std::string s = "\xF0\x9F\x98\x80" "a";
    std::cout << "bytes: " << s.size() << '\n';
    std::cout << "code points (approx): " << utf8_codepoints(s) << '\n';
    return 0;
}
bytes: 5
code points (approx): 2
Best Practices
- Prefer '\n' over std::endl to avoid needless flushing.
- Remember size() is unsigned; take care when subtracting or comparing with signed values.
- Use reserve() to pre-allocate when you know the final size ahead of time; shrink_to_fit() is only a non-binding request to reduce capacity.
- For international text, be explicit about encodings; consider std::u8string (C++20), std::u32string, or a Unicode library for advanced needs.