Notifications

No notifications

File Handling in Python

Python provides built-in functions to read, write, and manage files. The open() function is your gateway to file I/O, and the with statement ensures files are properly closed.

Opening Files

file = open("data.txt", "r")    # Open for reading
content = file.read()
file.close()                     # Must close manually!

File Modes

ModeDescriptionCreates?Truncates?
"r"Read (default)NoNo
"w"WriteYesYes
"a"AppendYesNo
"x"Exclusive createYes (error if exists)No
"rb"Read binaryNoNo
"w+"Read + WriteYesYes

The with Statement (Context Manager)

Always use with — it automatically closes the file, even if an exception occurs:

with open("data.txt", "r") as f:
    content = f.read()
# File is auto-closed here

Reading Methods

MethodReturns
f.read()Entire file as a string
f.read(n)Next n characters
f.readline()Next line (with \n)
f.readlines()List of all lines
for line in fIterate line-by-line (memory efficient)

Writing Files

with open("output.txt", "w") as f:
    f.write("Line 1\n")
    f.writelines(["Line 2\n", "Line 3\n"])

Working with CSV

import csv
with open("data.csv") as f:
    reader = csv.DictReader(f)
    for row in reader:
        print(row["name"], row["score"])

Working with JSON

import json
with open("config.json") as f:
    data = json.load(f)       # Parse JSON → dict

with open("output.json", "w") as f: json.dump(data, f, indent=2) # Dict → JSON file

pathlib — Modern Path Handling

from pathlib import Path
p = Path("data") / "file.txt"
text = p.read_text()
p.write_text("new content")

> Tip: Always use with for file operations. For large files, iterate line-by-line instead of reading the whole file into memory.

On this page

Detailed Theory

Real programs read and write data: config files, logs, CSVs, JSON APIs, images, models. Python's file I/O is short and safe — once you internalise with open(...) and pick the right mode, the rest is just choosing the format.

What File I/O Actually Is

A file is a stream of bytes the OS lets you read/write. Python wraps that with a friendly object: open it, read or write, close it. The with block guarantees the close call — even on errors.

The Canonical Pattern

with open("notes.txt", "r", encoding="utf-8") as f:
    content = f.read()

with open("out.txt", "w", encoding="utf-8") as f: f.write("hello\n")

Always:

1. Use with (auto-closes). 2. Specify encoding="utf-8" for text — platform defaults bite you on Windows. 3. Pick the right mode.

File Modes

ModeMeans
"r"read text (default)
"w"write text, truncates existing file
"a"append text, creates if missing
"x"exclusive create, fails if exists
"rb" / "wb" / "ab"binary versions (raw bytes)
"r+"read and write (rare; usually pick one)

Reading Text

f.read()                # whole file as one string
f.read(1024)            # first 1024 chars
f.readline()            # one line (incl. \n)
f.readlines()           # list of all lines
for line in f:          # streams line-by-line, low memory
    process(line.rstrip())

Prefer the for line in f form for big files — it doesn't load everything at once.

Writing Text

with open("log.txt", "a", encoding="utf-8") as f:
    f.write("event 1\n")
    f.writelines(["event 2\n", "event 3\n"])  # no auto-newline

write/writelines don't add newlines — you handle them.

Beginner Mistakes to Skip

1. Forgetting with. open without with leaks file handles when an exception happens. 2. Using "w" when you meant "a". "w" truncates the file the moment you open it. Use "a" to append. 3. No encoding. On Windows, default is often cp1252; you'll get UnicodeDecodeError on non-ASCII text. Always pass encoding="utf-8". 4. Reading huge files with .read(). Loads the whole thing into RAM. Iterate line-by-line. 5. Mixing text and binary. Don't open an image with "r" — use "rb" for any non-text. 6. Hard-coded paths. Use pathlib.Path and join with / so your code works on Windows, mac, and Linux.

Intermediate: pathlib — Modern Paths

from pathlib import Path

base = Path("data") file = base / "users" / "alice.json" file.parent.mkdir(parents=True, exist_ok=True)

file.write_text('{"name": "alice"}', encoding="utf-8") text = file.read_text(encoding="utf-8")

for p in base.rglob("*.json"): print(p, p.stat().st_size)

Path gives you a friendly, OS-aware API: joining (/), exists(), mkdir(parents=True), rglob, stat, with_suffix. Prefer it over os.path in new code.

Intermediate: JSON

import json

data = json.loads('{"name": "Alice", "age": 30}') # str → dict text = json.dumps(data, indent=2, ensure_ascii=False) # dict → str

with open("users.json", "r", encoding="utf-8") as f: users = json.load(f) with open("users.json", "w", encoding="utf-8") as f: json.dump(users, f, indent=2, ensure_ascii=False)

Mnemonic: s = string (loads/dumps), no s = file (load/dump).

Intermediate: CSV

import csv

with open("users.csv", newline="", encoding="utf-8") as f: for row in csv.DictReader(f): print(row["name"], row["age"])

with open("out.csv", "w", newline="", encoding="utf-8") as f: w = csv.DictWriter(f, fieldnames=["name", "age"]) w.writeheader() w.writerows([{"name": "Alice", "age": 30}])

newline="" is the standard incantation — avoids double-newlines on Windows. For anything beyond trivial CSVs, reach for pandas (pd.read_csv).

Intermediate: Binary Files

with open("image.png", "rb") as f:
    data = f.read()        # bytes

with open("copy.png", "wb") as f: f.write(data)

Binary mode never decodes — you get raw bytes. Use it for images, PDFs, executables, anything non-text.

Advanced: Context Managers

The with statement isn't magic — it just calls __enter__ and __exit__ on whatever follows. You can build your own:

from contextlib import contextmanager

@contextmanager def chdir(path): import os old = os.getcwd() os.chdir(path) try: yield finally: os.chdir(old)

with chdir("/tmp"): ...

Pattern: acquire → yield → release. Used for files, locks, transactions, temporary directories, mocks in tests.

Advanced: Atomic Writes & Tempfiles

Naive write of a config file: open config.json, start writing, crash mid-way — file is corrupted. Atomic write:

import os, tempfile, json

def atomic_write_json(path, data): dir_ = os.path.dirname(path) or "." fd, tmp = tempfile.mkstemp(dir=dir_) try: with os.fdopen(fd, "w", encoding="utf-8") as f: json.dump(data, f) os.replace(tmp, path) # atomic rename finally: if os.path.exists(tmp): os.remove(tmp)

os.replace is atomic on the same filesystem — the file either has the old contents or the new, never half-written.

Advanced: Encoding Pitfalls & BOMs

  • Always utf-8 unless someone forces another codec on you.
  • Excel sometimes writes CSVs as utf-8-sig (UTF-8 with BOM). Read with encoding="utf-8-sig" to drop the BOM cleanly.
  • For unknown text files, chardet / charset-normalizer can guess.

Advanced: Performance & Streaming

  • Iterate line-by-line for large files (for line in f:).
  • Process in fixed-size chunks for binary (while chunk := f.read(64 * 1024):).
  • For tabular data (millions of rows), use pandas or Polars — they read in compiled code.
  • For zipped files, gzip.open, zipfile, tarfile open them directly without manual extract.

Practice Path

1. Read notes.txt line-by-line and print only lines starting with #. 2. Convert a list of dicts to JSON on disk, then read it back and assert equality. 3. Walk a directory recursively with Path.rglob("*.py") and print each file's line count. 4. Implement an atomic-write helper using tempfile.mkstemp + os.replace; intentionally crash before os.replace and confirm the original file is unchanged.