Notifications

No notifications

/Phase 2

Dictionaries & Sets

Dictionaries & Sets in Python πŸ—‚οΈ

Dictionaries store data as key-value pairs. Keys must be unique and hashable. Since Python 3.7, dicts maintain insertion order.

Creating Dictionaries

person = {"name": "Alice", "age": 30, "city": "NYC"}
empty = {}
from_pairs = dict(name="Bob", age=25)

CRUD Operations

OperationSyntaxDescription
Created["key"] = valueAdd a new key-value pair
Readd["key"] or d.get("key")Access value (get returns None if missing)
Updated["key"] = new_valueOverwrite existing value
Deletedel d["key"] or d.pop("key")Remove a pair

Useful Dict Methods

MethodReturns
d.keys()All keys (view object)
d.values()All values (view object)
d.items()All (key, value) pairs
d.get(k, default)Value for key, or default
d.update(other)Merge another dict
d.setdefault(k, v)Get or set default

Dict Comprehension

squares = {x: x**2 for x in range(5)}
# {0: 0, 1: 1, 2: 4, 3: 9, 4: 16}

Sets

Sets are unordered collections of unique elements:

colors = {"red", "green", "blue"}
nums = set([1, 2, 2, 3])  # {1, 2, 3}

Set Operations

OperationSyntaxSymbol
Uniona.union(b)a \b
Intersectiona.intersection(b)a & b
Differencea.difference(b)a - b
Symmetric Diffa.symmetric_difference(b)a ^ b

frozenset is an immutable set β€” can be used as a dict key or inside other sets.

> Tip: Use dicts when you need fast lookups by key. Use sets when you need uniqueness or set-math operations.

On this page

Detailed Theory

Dictionaries and sets are Python's *hash-based* collections. A dict maps keys to values (think: a phonebook). A set holds unique values (think: tags on a post). Both give you O(1) average-case lookup β€” the secret sauce that makes most real-world Python code fast.

What a Dict Actually Is

user = {
    "name": "Alice",
    "age": 30,
    "role": "admin",
}
user["name"]              # "Alice"
user["email"] = "a@x.com"  # add
del user["age"]            # remove
"role" in user             # True (O(1))

A dict is an unordered… actually, insertion-ordered since Python 3.7. Iterating yields keys in the order they were added.

What a Set Actually Is

tags = {"python", "web", "python"}   # β†’ {"python", "web"}
tags.add("async")
tags.discard("web")
"python" in tags                       # O(1)

No duplicates, no ordering guarantee. Empty set: set() ({} is an empty dict).

Daily Dict Methods

d.get(k, default)         # safe lookup, doesn't raise
d.setdefault(k, default)  # get or insert
d.pop(k, default)
d.update(other)            # merge in
d.keys(), d.values(), d.items()
for k, v in d.items(): ...

Daily Set Methods

a | b      # union
a & b      # intersection
a - b      # difference
a ^ b      # symmetric difference
a <= b     # subset

Great for de-duping (set(items)) and fast membership filtering.

Beginner Mistakes to Skip

1. d["missing"] β†’ KeyError. Use d.get("missing") or d.get("missing", default). 2. Using a list as a key. Lists aren't hashable β€” they can mutate. Use a tuple instead. 3. {} for an empty set. That's an empty dict. Use set(). 4. Iterating then mutating. for k in d: del d[k] raises RuntimeError. Iterate list(d.keys()) or build a new dict. 5. d.has_key(k). Removed in Python 3. Use k in d. 6. Mistaking dict order for sortedness. Insertion order β‰  sorted order. Use sorted(d.items()) if you need that.

Intermediate: Comprehensions

lengths = {name: len(name) for name in names}
active  = {k: v for k, v in users.items() if v["active"]}
uniques = {x.lower() for x in tags}

Dict and set comprehensions are as expressive as list comprehensions β€” use them whenever you're transforming or filtering.

Intermediate: Iteration Patterns

for k in d:                  # keys
for v in d.values():
for k, v in d.items():        # canonical loop

sorted_by_value = sorted(d.items(), key=lambda kv: kv[1], reverse=True)

Intermediate: collections β€” The Power Tools

from collections import defaultdict, Counter, OrderedDict, ChainMap

buckets = defaultdict(list) for word in words: buckets[word[0]].append(word) # no manual key check

freq = Counter(words) # {word: count} freq.most_common(3) # top 3

  • defaultdict(factory) β€” missing keys auto-create a value (perfect for grouping).
  • Counter β€” multiset / histogram. Has most_common, supports + and -.
  • ChainMap β€” lookup across multiple dicts (config layering).

Intermediate: Merging Dicts

merged = {a, b}            # b overrides a
merged = a | b                 # 3.9+, same idea
a.update(b)                     # in-place

Advanced: Hash Tables β€” Why It's Fast

A dict stores entries in a contiguous array. When you insert key k:

1. Compute hash(k). 2. Take the hash modulo the table size to pick a slot. 3. If the slot is busy (collision), probe to the next free slot.

Lookup repeats steps 1–3 until it finds the key (or an empty slot, meaning "absent"). Average cost: O(1). Worst case (lots of collisions): O(n), but Python's hash randomisation makes that rare.

When the table fills past ~2/3, Python resizes (doubles) and rehashes everything β€” amortised O(1) writes.

Advanced: Why Keys Must Be Hashable

A key's hash is computed *once* and used to find its slot. If a key's hash could change after insertion, lookups would silently break. Hence:

  • Hashable: int, float, str, bool, tuple (of hashables), frozenset, immutable custom classes.
  • Not hashable: list, dict, set β€” they're mutable.
For a custom class, define __hash__ and __eq__ consistently β€” @dataclass(frozen=True) does this automatically.

Advanced: Complexity Cheat-Sheet

OpDict avgSet avg
Lookup / insert / deleteO(1)O(1)
IterationO(n)O(n)
Worst caseO(n)O(n)

Lists are O(n) for membership, dicts/sets are O(1). When in doubt, prefer a set/dict for lookups.

Advanced: Real-World Patterns

  • Group by: defaultdict(list).
  • Count: Counter.
  • De-dupe while preserving order: list(dict.fromkeys(items)).
  • Cache / memoise: functools.lru_cache (built on a dict).
  • Two-way map: keep two dicts in sync (or bidict from PyPI).
  • Set intersection for filtering: approved = wanted & allowed.

Practice Path

1. Convert a list of (key, value) tuples into a dict with a comprehension, then back to a list of tuples. 2. Use Counter to find the 5 most common words in a sentence. 3. Group a list of users by their country field using defaultdict(list). 4. Deduplicate a list while preserving order using dict.fromkeys; explain why this works.