Notifications

No notifications

/Phase 2

Strings

Strings

A string is a sequence of characters stored as an array of characters. In most languages, strings are immutable — each modification creates a new string.

Key Concepts

  • Immutability: In Python/Java/JS, strings cannot be modified in place
  • ASCII/Unicode: Characters map to numbers (A=65, a=97)
  • Null-terminated: In C, strings end with '\0'

Common String Operations

OperationTime
Access char by indexO(1)
Search substringO(n×m) naive, O(n) KMP
ConcatenationO(n)
ComparisonO(min(n,m))

Important Patterns

  • Palindrome: Reads same forward and backward
  • Anagram: Same characters, different order
  • Substring vs Subsequence: Contiguous vs non-contiguous
  • Pattern Matching: KMP, Rabin-Karp algorithms
  • Two Pointers: Palindrome check, reverse string

On this page

Detailed Theory

Strings *look* like just "arrays of characters", and in C they almost are. In every other modern language they're sneakier — immutable, often UTF-16 or UTF-8, with subtle costs hiding behind innocent-looking code like \s += c\ in a loop. Get the mental model right early and a lot of "weird" string problems become routine.

What a String Actually Is

A string is a sequence of characters with a known length. Different languages store that sequence differently:

LanguageStorageMutable?Encoding
Cchar array + \'\\0'\ terminatoryesASCII / single-byte
C++\std::string\ (dynamic char array)yesASCII / UTF-8
Javaimmutable \String\ wrapping \char[]\noUTF-16
Pythonimmutable \str\ objectnoUnicode (variable width)
JavaScriptimmutable \String\ primitivenoUTF-16

In C, \"hello"\ is literally \['h','e','l','l','o','\\0']\; \strlen\ walks until the \\\0\ and is therefore O(n). In Java/JS/Python the length is stored alongside the data, so \s.length\ is O(1).

The single biggest practical consequence of immutability:

\\\`js let s = ''; for (let i = 0; i < n; i++) s += chars[i]; // O(n²) — not O(n)! \\\`

Each \+=\ allocates a new string of length \i + 1\ and copies. n of those = O(n²). Use a builder: arrays + \join\ in JS/Python, \StringBuilder\ in Java/C#.

Daily String Operations and Their Real Costs

OperationTimeWhy
---:---:---
\s[i]\O(1)direct index (in fixed-width encodings)
\s.length\O(1) in JS/Java/Python; O(n) in Clength stored vs scanned
\s + t\O(n + m)new buffer, copy both
\s.substring(i, j)\O(j − i)copy that range
\s.indexOf(p)\O(n × m) worst casenaive scan
compare \s == t\O(min(n, m))char by char
\s.split(",")\O(n)one scan
\s.replace(...)\O(n)scan + rebuild
reverseO(n)new buffer (immutable langs)

Beginner Mistakes to Skip

1. Concatenating in a loop with \+=\. As above — it's O(n²). Build an array, then \join\. 2. Comparing strings with \==\ in Java. \==\ compares references; \.equals()\ compares content. (\==\ is fine in JS/Python — they compare by value for strings.) 3. Assuming \s[i]\ is "the i-th character". Emojis and many CJK characters take 2 UTF-16 units in JS/Java. \"😀".length\ is 2. Use \Array.from(s)\ or codepoint iteration if you care about user-visible characters. 4. Forgetting case + locale. \"İ".toLowerCase()\ differs in Turkish vs English. For interview problems, \toLowerCase()\ is fine; for real apps, prefer locale-aware comparisons. 5. Using regex for trivial work. \/^abc/.test(s)\ is fine; complex backtracking patterns over user input is a denial-of-service vector (ReDoS). 6. Treating Python \str\ and \bytes\ as the same. \b"hello"\ is bytes; \"hello"\ is text. Mixing them throws.

Intermediate: Frequency Counting

The single most common string technique. For lowercase English, a 26-int array beats a hash map every time:

\\\`js function isAnagram(a, b) { if (a.length !== b.length) return false; const c = new Array(26).fill(0); for (let i = 0; i < a.length; i++) { c[a.charCodeAt(i) - 97]++; c[b.charCodeAt(i) - 97]--; } return c.every(x => x === 0); } \\\`

This pattern unlocks: anagram check, "first non-repeating character", "is one string a permutation of another", "minimum window to contain all of pattern's chars" (with sliding window).

Intermediate: Two Pointers on Strings

  • Palindrome check: pointers from both ends, compare, walk inward.
  • Reverse words: reverse whole string, then reverse each word.
  • Valid palindrome ignoring punctuation/case: same as palindrome, but skip non-alphanumeric chars.
\\\`js function isPalindrome(s) { let l = 0, r = s.length - 1; while (l < r) { if (s[l] !== s[r]) return false; l++; r--; } return true; } \\\`

Intermediate: Sliding Window on Strings

`text "longest substring without repeating characters" "smallest window containing all chars of T" "longest substring with at most k distinct chars" `

These all use the same skeleton: expand \right\, when the window is invalid shrink \left\, record the best valid window. The window state is usually a frequency map.

Intermediate: Pattern Matching — Naive vs KMP

Naive search is O(n × m): for each position in the text, try matching all of \pattern\.

KMP (Knuth-Morris-Pratt) runs in O(n + m) by precomputing an LPS (Longest Proper Prefix that's also a Suffix) table for the pattern, then never re-comparing matched characters. Worth knowing the *idea* even if you don't memorise the code: when you mismatch, the LPS table tells you how far you can safely jump in the pattern instead of restarting.

For \"ABCABD"\, LPS = \[0, 0, 0, 1, 2, 0]\.

Intermediate: Rolling Hash (Rabin-Karp)

Compare substrings in O(1) by comparing their hashes:

\hash(s[i+1..i+m]) = (hash(s[i..i+m-1]) − s[i]·b^(m-1)) · b + s[i+m]\

Useful for "find any of these k patterns in this text" (hash all patterns, slide one rolling hash) and for substring-equality problems on competitive programming. Real production code uses Boyer-Moore or platform-specific SIMD instead.

Advanced: Encodings That Will Bite You

  • ASCII = 7 bits, 128 chars — English only.
  • UTF-8 = 1–4 bytes per code point, ASCII-compatible. The web standard.
  • UTF-16 = 2 or 4 bytes per code point. Used internally by JS, Java, Windows APIs.
  • Code unitcode pointgrapheme cluster. \"👨‍👩‍👧"\ is *one* visible glyph, *three* code points (with two zero-width joiners), and *eight* UTF-16 code units. \s.length\ reports the last number — almost never what users mean.
If you need user-visible character count in JS, use \Array.from(s).length\ (counts code points) or the \Intl.Segmenter\ API (counts grapheme clusters).

Advanced: Palindromic Substrings

Three approaches, increasingly clever:

1. Brute force: O(n³) — every substring, check palindrome. 2. Expand around centres: O(n²) — for each of 2n−1 centres (chars and gaps), expand outward while equal. 3. Manacher's algorithm: O(n) — uses a clever symmetry argument so each character is visited a constant number of times.

For interviews, "expand around centres" is the sweet spot: short to write, fast enough.

Advanced: Tries Are Strings' Best Friend

A trie (prefix tree) stores strings character-by-character along edges. Look-ups, inserts, and prefix queries all run in O(L) where L is the string length. Used by:

  • autocomplete / typeahead
  • spellcheckers
  • IP routing tables (binary trie)
  • "longest common prefix" queries
  • competitive-programming problems with up to ~10⁵ words
You'll cover them in depth in the Tries topic — but every time you see "many strings, prefix queries", think trie.

Advanced: Regex — Power and Footguns

Regex is genuinely useful (validation, parsing simple grammars, search-and-replace). Two things to know:

1. The "boring" features (literal characters, classes, anchors, quantifiers) are O(n) and safe. 2. Backreferences and certain alternation patterns can backtrack exponentially. \/^(a+)+$/.test("aaaaaaaaaaaaaaaaaaaaaaaaa!")\ can take seconds. Never run untrusted regex on user input, and never run user-controlled input through hand-written nested-quantifier regex.

Use the engine's documented "linear-time" mode where available (Go's RE2, Rust's \regex\ crate).

Advanced: When NOT to Use a String

JobBetter representation
Fixed-format binary data\Uint8Array\ / \bytes\
Big integer arithmeticlanguage-native bigint
Append millions of piecesarray + \join\ / \StringBuilder\
Set membership for many small stringshash set / trie
Comparing huge identical prefixesrolling hash

Practice Path

1. Write \isPalindrome(s)\ (two pointers). Then extend it to ignore non-alphanumerics and case. 2. Implement \anagram(a, b)\ using a 26-int array. Verify on \"listen" / "silent"\ and \"" / ""\. 3. Solve Longest Substring Without Repeating Characters (LC 3) with a sliding window + hash map. 4. Solve Group Anagrams (LC 49). Use the sorted-string of each word as the hash-map key. Discuss why this works in O((n + m) · m log m) where m is max word length.