String Operations in Python 🔤

Strings in Python are immutable sequences of Unicode characters. No string method modifies the original: methods that transform text, such as upper() or replace(), return a new string.

Creating Strings

s1 = "Hello"
s2 = 'World'
s3 = """Multi-line
string"""
raw = r"C:\Users\name"   # Raw string (no escapes)

Indexing & Slicing

Strings support the same indexing and slicing as lists:

s = "Python"
s[0]      # "P"
s[-1]     # "n"
s[1:4]    # "yth"
s[::-1]   # "nohtyP"
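
As a small sketch of how slices behave at the edges (the variable names here are illustrative, not part of the lesson): a slice that runs past the end is clamped, while a single out-of-range index raises IndexError.

```python
s = "Python"

first_three = s[:3]     # start defaults to 0
last_three = s[-3:]     # negative start counts from the end
every_other = s[::2]    # step of 2 takes indices 0, 2, 4
clamped = s[2:100]      # slice bounds past the end are clamped, no error

try:
    s[100]              # a single index past the end is an error
    index_failed = False
except IndexError:
    index_failed = True

print(first_three, last_three, every_other, clamped, index_failed)
```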

Essential String Methods

| Method | Description | Example |
|---|---|---|
| upper() | ALL CAPS | "hi".upper() → "HI" |
| lower() | all lowercase | "HI".lower() → "hi" |
| strip() | Remove leading/trailing whitespace | " hi ".strip() → "hi" |
| split(sep) | Split into list | "a,b,c".split(",") → ["a","b","c"] |
| join(list) | Join list into string | "-".join(["a","b"]) → "a-b" |
| replace(a,b) | Replace occurrences | "hello".replace("l","r") → "herro" |
| find(sub) | Find index (-1 if missing) | "hello".find("ll") → 2 |
| startswith() | Check prefix | "hello".startswith("he") → True |
| endswith() | Check suffix | "file.py".endswith(".py") → True |
| isdigit() | All digits? | "123".isdigit() → True |
| isalpha() | All letters? | "abc".isalpha() → True |
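
Several of these methods are usually combined. The following is a minimal sketch, assuming a made-up comma-separated input (`raw` and the other names are hypothetical), that splits, cleans, filters, and rejoins the fields.

```python
raw = " 10, 20, x, 30 "   # hypothetical messy input

# split on commas, then strip the whitespace around each field
fields = [field.strip() for field in raw.split(",")]

# keep only the fields that consist entirely of digits
numbers = [int(field) for field in fields if field.isdigit()]
total = sum(numbers)

# join the cleaned fields back into one string
rejoined = "-".join(fields)

print(fields, numbers, total, rejoined)
```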

String Formatting

name, age = "Alice", 30
# f-string (recommended, Python 3.6+)
print(f"{name} is {age} years old")
# .format()
print("{} is {} years old".format(name, age))
# % formatting (legacy)
print("%s is %d years old" % (name, age))

f-string Power Features

pi = 3.14159
print(f"{pi:.2f}")        # 3.14  (2 decimal places)
print(f"{1000000:,}")     # 1,000,000  (comma separator)
print(f"{'hi':>10}")      # "        hi"  (right-align)
print(f"{'hi':<10}")      # "hi        "  (left-align)

> Key insight: Strings are immutable — s.upper() doesn't change s, it returns a new string. Always assign the result: s = s.upper().
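
A short sketch of that point, using illustrative variable names: calling a method leaves the original untouched until the name is rebound to the result.

```python
s = "hello"

shouted = s.upper()      # returns a new string
unchanged = s            # s still refers to "hello"

s = s.upper()            # rebinding the name is how you "change" s
rebound = s

print(shouted, unchanged, rebound)
```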

Detailed Theory

Strings are how programs talk to humans — names, messages, files, JSON, HTML, logs. Python's strings are friendly on the outside (just text) and powerful underneath (Unicode-correct, immutable, full toolkit of methods).

What a String Actually Is

name = "Alice"
greeting = 'hello'
block = """line 1
line 2"""

A str is an immutable sequence of Unicode code points. "Immutable" means *every* operation that looks like a change actually returns a new string — the original is untouched.

Single and double quotes are equivalent; pick whichever avoids escaping. Triple quotes work the same way but also let the literal span multiple lines.

Basic Operations

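s, t = "Python", "!"   # sample values so the lines below run
len(s)                 # length → 6
s + t                  # concat → "Python!"
s * 3                  # repeat
s[0], s[-1]            # indexing
s[1:4], s[::-1]        # slicing, reverse
"py" in s              # substring test (case-sensitive) → False; "Py" in s → True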

Indexing returns a 1-character string — there's no separate "char" type.

Daily String Methods

s.lower(), s.upper(), s.title(), s.capitalize()
s.strip(), s.lstrip(), s.rstrip()           # remove whitespace
s.replace("old", "new")
s.split(","), s.rsplit(",", 1)               # to list
",".join(parts)                                # from list
s.startswith("http"), s.endswith(".csv")
s.find("x"), s.index("x")                    # -1 vs ValueError
s.isdigit(), s.isalpha(), s.isalnum()
s.count("a")
s.zfill(4), s.center(20, "-"), s.ljust(10)

Methods that transform text return a new string, so they chain naturally: raw.strip().lower().replace(" ", "-").
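
The chain above can be wrapped in a small helper. This is only a sketch: `slugify` is a hypothetical name, and it handles single spaces only, not punctuation or repeated whitespace.

```python
def slugify(raw: str) -> str:
    """Trim, lowercase, and replace spaces with hyphens."""
    # Each call returns a new string that the next method operates on.
    return raw.strip().lower().replace(" ", "-")


slug = slugify("  My First Post ")
print(slug)
```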

f-Strings — The Modern Format

from datetime import datetime

name, age = "Alice", 30
value = 1_000_000
f"Hello {name}, age {age}"
f"{age:03}"             # 030: pad to 3 with zeros
f"{3.14159:.2f}"        # 3.14
f"{value:,}"             # 1,000,000
f"{name=}"               # name='Alice' (Python 3.8+), great for debugging
f"{datetime.now():%Y-%m-%d}"   # current date as YYYY-MM-DD

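f-strings are fast, readable, and have been the standard since Python 3.6. Prefer them in new code; .format() is still useful when a template is stored separately from its values, and the logging module uses %-style placeholders.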

Beginner Mistakes to Skip

1. Modifying a string in place. s[0] = "A" raises TypeError. Build a new string instead.
2. Building strings with += in a loop. This can be O(n²). Collect parts in a list and call "".join(parts) at the end.
3. == vs is for strings. Always use == for value comparison; is checks identity.
4. Mixing bytes and strings. b"hi" + "there" raises TypeError. Decode the bytes to str (b.decode("utf-8")) or encode the str to bytes.
5. Forgetting raw strings for paths/regex. Use r"C:\Users\Alice" or r"\d+" to skip escape headaches.
6. Writing if s == "" instead of if not s. An empty string is falsy, so the latter is idiomatic.
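
Three of the mistakes above (in-place modification, mixing bytes with str, and the empty-string check) can be reproduced in a few lines. The names below, including `describe`, are illustrative only.

```python
s = "alice"
try:
    s[0] = "A"                 # strings do not support item assignment
except TypeError:
    s = "A" + s[1:]            # build a new string instead

data = b"hi"
try:
    combined = data + " there"                  # bytes + str is not allowed
except TypeError:
    combined = data.decode("utf-8") + " there"  # decode first, then concatenate


def describe(value: str) -> str:
    # An empty string is falsy, so "not value" is the idiomatic emptiness test.
    return "empty" if not value else "non-empty"


print(s, combined, describe(""), describe("x"))
```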

Intermediate: Splitting, Joining, Cleaning

# CSV-ish parsing
row = "alice,30,admin"
name, age, role = row.split(",")

# Re-emit
out = ",".join([name, age, role])

# Clean noisy input
text = "  Hello    World  "
" ".join(text.split())   # collapse whitespace → "Hello World"

split() with no argument splits on *any* whitespace and drops empties — a tidy way to clean messy text.
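
To make the difference concrete, here is a small comparison of split() with no argument against split(" "), using an assumed sample string.

```python
text = "  Hello   World  "   # sample input with repeated spaces

on_any_whitespace = text.split()      # runs of whitespace collapse, no empties
on_single_space = text.split(" ")     # every single space is a separator

# split() with no argument also treats tabs and newlines as whitespace
mixed = "a\tb\nc".split()

cleaned = " ".join(on_any_whitespace)

print(on_any_whitespace, on_single_space, mixed, cleaned)
```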

Intermediate: Searching & Replacing

s.find("x")     # -1 if absent
s.index("x")    # raises ValueError
s.replace("a", "b", 1)   # only first occurrence

For anything pattern-based, jump to re (regex). For simple cases, the built-ins are faster and clearer.
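
A brief sketch of the built-in search and replace calls on a sample word (the variable names are illustrative):

```python
s = "banana"

pos = s.find("na")        # index of the first match
missing = s.find("x")     # -1 when the substring is absent

try:
    s.index("x")          # same search, but absence is an error
    index_raised = False
except ValueError:
    index_raised = True

first_only = s.replace("a", "o", 1)   # count=1 replaces only the first match
everywhere = s.replace("a", "o")      # default replaces every match

print(pos, missing, index_raised, first_only, everywhere)
```

find() suits code that treats "not found" as a normal outcome; index() suits code where absence is a bug that should surface.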

Intermediate: Format Spec Mini-Language

Inside f-strings, after : you have a powerful mini-language:

f"{n:>10}"      # right-align in width 10
f"{n:<10}"      # left-align
f"{n:^10}"      # center
f"{n:08.2f}"    # 0-pad, width 8, 2 decimals: n = 3.14 → "00003.14"
f"{n:+,.2f}"    # sign + thousands separators: n = 1234.56 → "+1,234.56"
f"{n:.2%}"      # percent (multiplies by 100): n = 0.0314 → "3.14%"
f"{n:b} {n:o} {n:x}"   # binary / octal / hex (n must be an int)
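
As a worked sketch of these specs in combination, the snippet below lays out a small text table. The `items` data and all names are made up for illustration.

```python
items = [("Widget", 3, 4.5), ("Gadget", 12, 1234.5)]  # hypothetical sample data

# Column widths: name left-aligned in 10, quantity right-aligned in 5,
# price right-aligned in 12 with thousands separators and 2 decimals.
header = f"{'Item':<10}{'Qty':>5}{'Price':>12}"
rows = [f"{name:<10}{qty:>5}{price:>12,.2f}" for name, qty, price in items]
report = "\n".join([header, "-" * len(header), *rows])
print(report)

share = 0.0314   # a fraction; the % spec multiplies by 100
flags = 255      # the b / o / x specs need an int
summary = f"{share:.2%} | {flags:b} | {flags:o} | {flags:x}"
print(summary)
```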

Advanced: Unicode — Strings Are Code Points

len("café")          # 4
len("👋")            # 1 (one code point); JavaScript's .length gives 2 (UTF-16 code units)
ord("A"), chr(65)    # (65, 'A')
"café".encode("utf-8")   # b'caf\xc3\xa9'
b'caf\xc3\xa9'.decode("utf-8")   # 'café'

Key distinction:

  • str = sequence of Unicode code points (text).
  • bytes = raw 8-bit values (network, files, binary).
At I/O boundaries (sockets, files, APIs) you encode/decode with an explicit codec (utf-8 is almost always right).
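
The str/bytes boundary can be sketched as a round trip. The example assumes UTF-8 and uses an escape for é so the literal is unambiguous; the variable names are illustrative.

```python
text = "caf\u00e9"                 # "café" with a precomposed é (one code point)

data = text.encode("utf-8")        # str -> bytes on the way out
round_trip = data.decode("utf-8")  # bytes -> str on the way in

char_count = len(text)             # counts code points
byte_count = len(data)             # counts bytes; é takes two in UTF-8

try:
    data.decode("ascii")           # wrong codec: bytes above 0x7F are invalid
    wrong_codec_failed = False
except UnicodeDecodeError:
    wrong_codec_failed = True

# errors="replace" swaps each undecodable byte for U+FFFD instead of raising
lossy = data.decode("ascii", errors="replace")

print(char_count, byte_count, round_trip == text, wrong_codec_failed)
```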

Advanced: Performance — Concatenation & Interning

# Slow: O(n²)
s = ""
for w in words:
    s += w

# Fast: O(n)
s = "".join(words)

CPython interns some small string literals (reuses them from a cache), so "hi" is "hi" is often True. Don't rely on it; it's an implementation detail, and recent Python versions emit a SyntaxWarning when is is used with a literal.
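
To compare the two concatenation approaches, one option is a rough timeit measurement like the sketch below. The function names and the list size are assumptions, and the numbers will vary by machine; CPython can sometimes extend a string in place, so the gap may be smaller than the O(n²) bound suggests.

```python
import timeit

words = ["word"] * 2_000   # hypothetical workload; increase to see a larger gap


def concat_loop(parts):
    """Build the result with repeated += (may copy the string each time)."""
    result = ""
    for part in parts:
        result += part
    return result


def concat_join(parts):
    """Build the result in one pass with str.join."""
    return "".join(parts)


loop_time = timeit.timeit(lambda: concat_loop(words), number=20)
join_time = timeit.timeit(lambda: concat_join(words), number=20)

print(f"+= loop: {loop_time:.4f}s   join: {join_time:.4f}s")
```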

Advanced: Regular Expressions (re)

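import re

text = "too   many    spaces"      # sample inputs
email = "alice@example.com"

re.findall(r"\d+", "a12 b34")     # ['12', '34']
re.sub(r"\s+", " ", text)          # collapse whitespace
m = re.match(r"(\w+)@(\w+)", email)
if m:                              # re.match returns None when nothing matches
    m.group(1), m.group(2)         # ('alice', 'example')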

Use raw strings (r"\d+") so backslashes don't double up. For *fixed* checks, prefer startswith / in (faster, clearer); reach for regex when patterns vary.
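
A minimal sketch of both approaches side by side: a precompiled pattern for a varying shape, and plain string methods for a fixed check. `EMAIL_RE` and `parse_email` are hypothetical names, and the pattern is deliberately simplified; it is not a real email validator.

```python
import re

# Simplified illustration only: word characters, "@", word characters, ".", word characters.
EMAIL_RE = re.compile(r"(\w+)@(\w+)\.\w+")


def parse_email(address: str):
    """Return (user, domain) for a simple address, or None if it does not match."""
    match = EMAIL_RE.fullmatch(address)   # fullmatch requires the whole string to match
    if match is None:
        return None
    return match.group(1), match.group(2)


parsed = parse_email("alice@example.com")
rejected = parse_email("not-an-email")

# Fixed checks need no regex at all
url = "https://example.com/data.csv"
is_http = url.startswith("http")
is_csv = url.endswith(".csv")

print(parsed, rejected, is_http, is_csv)
```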

Advanced: Templating Beyond f-Strings

  • string.Template: safer for user-supplied templates. It only substitutes $name placeholders, whereas str.format() can reach attributes of its arguments and f-strings evaluate arbitrary expressions.
  • Jinja2: industry-standard for HTML/email/config templates.
  • textwrap: wrap/dedent multi-line strings cleanly.
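
The two standard-library options from the list can be sketched as follows; the template text and variable names are made up for illustration.

```python
import textwrap
from string import Template

# string.Template: only $name placeholders are substituted
template = Template("Hello, $name! You have $count new messages.")
message = template.substitute(name="Alice", count=3)

# safe_substitute leaves unknown placeholders in place instead of raising KeyError
partial = template.safe_substitute(name="Alice")

# textwrap.dedent strips the common leading indentation from a block
body = textwrap.dedent("""\
    line 1
    line 2
""")

print(message)
print(partial)
print(body, end="")
```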

Practice Path

1. Take " Hello, World! " and produce "hello-world" using only str methods (strip, lower, replace).
2. Format a float 1234.5678 to "$1,234.57" with an f-string.
3. Replace a slow s += word loop with "".join(...); time both on 50k words.
4. Use re.findall(r"\b\w+@\w+\.\w+\b", text) to extract emails from a paragraph.