Last 30 Days
No notifications
A string is a sequence of characters stored as an array of characters. In most languages, strings are immutable — each modification creates a new string.
| Operation | Time |
| Access char by index | O(1) |
| Search substring | O(n×m) naive, O(n) KMP |
| Concatenation | O(n) |
| Comparison | O(min(n,m)) |
Strings *look* like just "arrays of characters", and in C they almost are. In every other modern language they're sneakier — immutable, often UTF-16 or UTF-8, with subtle costs hiding behind innocent-looking code like \s += c\ in a loop. Get the mental model right early and a lot of "weird" string problems become routine.
A string is a sequence of characters with a known length. Different languages store that sequence differently:
| Language | Storage | Mutable? | Encoding |
| C | char array + \'\\0'\ terminator | yes | ASCII / single-byte |
| C++ | \std::string\ (dynamic char array) | yes | ASCII / UTF-8 |
| Java | immutable \String\ wrapping \char[]\ | no | UTF-16 |
| Python | immutable \str\ object | no | Unicode (variable width) |
| JavaScript | immutable \String\ primitive | no | UTF-16 |
In C, \"hello"\ is literally \['h','e','l','l','o','\\0']\; \strlen\ walks until the \\\0\ and is therefore O(n). In Java/JS/Python the length is stored alongside the data, so \s.length\ is O(1).
The single biggest practical consequence of immutability:
\\\`js
let s = '';
for (let i = 0; i < n; i++) s += chars[i]; // O(n²) — not O(n)!
\\\`
Each \+=\ allocates a new string of length \i + 1\ and copies. n of those = O(n²). Use a builder: arrays + \join\ in JS/Python, \StringBuilder\ in Java/C#.
| Operation | Time | Why |
| --- | :---: | --- |
\s[i]\ | O(1) | direct index (in fixed-width encodings) |
\s.length\ | O(1) in JS/Java/Python; O(n) in C | length stored vs scanned |
\s + t\ | O(n + m) | new buffer, copy both |
\s.substring(i, j)\ | O(j − i) | copy that range |
\s.indexOf(p)\ | O(n × m) worst case | naive scan |
compare \s == t\ | O(min(n, m)) | char by char |
\s.split(",")\ | O(n) | one scan |
\s.replace(...)\ | O(n) | scan + rebuild |
| reverse | O(n) | new buffer (immutable langs) |
1. Concatenating in a loop with \+=\. As above — it's O(n²). Build an array, then \join\.
2. Comparing strings with \==\ in Java. \==\ compares references; \.equals()\ compares content. (\==\ is fine in JS/Python — they compare by value for strings.)
3. Assuming \s[i]\ is "the i-th character". Emojis and many CJK characters take 2 UTF-16 units in JS/Java. \"😀".length\ is 2. Use \Array.from(s)\ or codepoint iteration if you care about user-visible characters.
4. Forgetting case + locale. \"İ".toLowerCase()\ differs in Turkish vs English. For interview problems, \toLowerCase()\ is fine; for real apps, prefer locale-aware comparisons.
5. Using regex for trivial work. \/^abc/.test(s)\ is fine; complex backtracking patterns over user input is a denial-of-service vector (ReDoS).
6. Treating Python \str\ and \bytes\ as the same. \b"hello"\ is bytes; \"hello"\ is text. Mixing them throws.
The single most common string technique. For lowercase English, a 26-int array beats a hash map every time:
\\\`js
function isAnagram(a, b) {
if (a.length !== b.length) return false;
const c = new Array(26).fill(0);
for (let i = 0; i < a.length; i++) {
c[a.charCodeAt(i) - 97]++;
c[b.charCodeAt(i) - 97]--;
}
return c.every(x => x === 0);
}
\\\`
This pattern unlocks: anagram check, "first non-repeating character", "is one string a permutation of another", "minimum window to contain all of pattern's chars" (with sliding window).
\\`js
function isPalindrome(s) {
let l = 0, r = s.length - 1;
while (l < r) {
if (s[l] !== s[r]) return false;
l++; r--;
}
return true;
}
\\\``text
"longest substring without repeating characters"
"smallest window containing all chars of T"
"longest substring with at most k distinct chars"
`
These all use the same skeleton: expand \right\, when the window is invalid shrink \left\, record the best valid window. The window state is usually a frequency map.
Naive search is O(n × m): for each position in the text, try matching all of \pattern\.
KMP (Knuth-Morris-Pratt) runs in O(n + m) by precomputing an LPS (Longest Proper Prefix that's also a Suffix) table for the pattern, then never re-comparing matched characters. Worth knowing the *idea* even if you don't memorise the code: when you mismatch, the LPS table tells you how far you can safely jump in the pattern instead of restarting.
For \"ABCABD"\, LPS = \[0, 0, 0, 1, 2, 0]\.
Compare substrings in O(1) by comparing their hashes:
\hash(s[i+1..i+m]) = (hash(s[i..i+m-1]) − s[i]·b^(m-1)) · b + s[i+m]\
Useful for "find any of these k patterns in this text" (hash all patterns, slide one rolling hash) and for substring-equality problems on competitive programming. Real production code uses Boyer-Moore or platform-specific SIMD instead.
"👨👩👧"\ is *one* visible glyph, *three* code points (with two zero-width joiners), and *eight* UTF-16 code units. \s.length\ reports the last number — almost never what users mean.Array.from(s).length\ (counts code points) or the \Intl.Segmenter\ API (counts grapheme clusters).Three approaches, increasingly clever:
1. Brute force: O(n³) — every substring, check palindrome. 2. Expand around centres: O(n²) — for each of 2n−1 centres (chars and gaps), expand outward while equal. 3. Manacher's algorithm: O(n) — uses a clever symmetry argument so each character is visited a constant number of times.
For interviews, "expand around centres" is the sweet spot: short to write, fast enough.
A trie (prefix tree) stores strings character-by-character along edges. Look-ups, inserts, and prefix queries all run in O(L) where L is the string length. Used by:
Regex is genuinely useful (validation, parsing simple grammars, search-and-replace). Two things to know:
1. The "boring" features (literal characters, classes, anchors, quantifiers) are O(n) and safe.
2. Backreferences and certain alternation patterns can backtrack exponentially. \/^(a+)+$/.test("aaaaaaaaaaaaaaaaaaaaaaaaa!")\ can take seconds. Never run untrusted regex on user input, and never run user-controlled input through hand-written nested-quantifier regex.
Use the engine's documented "linear-time" mode where available (Go's RE2, Rust's \regex\ crate).
| Job | Better representation |
| Fixed-format binary data | \Uint8Array\ / \bytes\ |
| Big integer arithmetic | language-native bigint |
| Append millions of pieces | array + \join\ / \StringBuilder\ |
| Set membership for many small strings | hash set / trie |
| Comparing huge identical prefixes | rolling hash |
1. Write \isPalindrome(s)\ (two pointers). Then extend it to ignore non-alphanumerics and case.
2. Implement \anagram(a, b)\ using a 26-int array. Verify on \"listen" / "silent"\ and \"" / ""\.
3. Solve Longest Substring Without Repeating Characters (LC 3) with a sliding window + hash map.
4. Solve Group Anagrams (LC 49). Use the sorted-string of each word as the hash-map key. Discuss why this works in O((n + m) · m log m) where m is max word length.