Matplotlib & Seaborn — Visualizing Data for Business Insights

Data visualization transforms raw numbers into actionable stories. Matplotlib is Python's foundational plotting library, while Seaborn builds on top of it with statistical graphics and beautiful defaults.

Matplotlib Core API

| Component | Purpose | Example |
| --- | --- | --- |
| plt.figure() | Create a new figure canvas | fig = plt.figure(figsize=(10, 6)) |
| plt.subplot() / fig.add_subplot() | Add a subplot (Axes) to a figure | ax = fig.add_subplot(1, 2, 1) |
| plt.plot() | Line plot | plt.plot(x, y, 'b-o') |
| plt.bar() | Bar chart | plt.bar(categories, values) |
| plt.scatter() | Scatter plot | plt.scatter(x, y, c=colors) |
| plt.hist() | Histogram | plt.hist(data, bins=20) |
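The plt.* functions in the table draw on whichever Axes is currently active. The minimal sketch below combines them in one figure. It calls the equivalent Axes methods on each fig.add_subplot panel, so it is explicit which panel gets which chart. All data is synthetic placeholder input.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)           # seeded so reruns draw the same data
x = np.arange(1, 7)                      # placeholder x-values
y = x ** 2                               # placeholder y-values

fig = plt.figure(figsize=(10, 6))        # the canvas

ax1 = fig.add_subplot(2, 2, 1)           # 2 rows, 2 columns, slot 1
ax1.plot(x, y, 'b-o')                    # blue line with circle markers
ax1.set_title('plot')

ax2 = fig.add_subplot(2, 2, 2)
ax2.bar(['A', 'B', 'C'], [3, 7, 5])      # categories on x, bar heights on y
ax2.set_title('bar')

ax3 = fig.add_subplot(2, 2, 3)
ax3.scatter(x, y, c=y)                   # colour each point by its y-value
ax3.set_title('scatter')

ax4 = fig.add_subplot(2, 2, 4)
ax4.hist(rng.normal(size=500), bins=20)  # 20 equal-width bins
ax4.set_title('hist')

fig.tight_layout()
```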

Seaborn — Statistical Visualization Made Easy

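```python
import seaborn as sns
sns.set_theme(style="whitegrid")

# df: a pandas DataFrame with these columns (assumed); give each chart its own figure
sns.barplot(data=df, x="region", y="sales", hue="category")
sns.heatmap(df.corr(numeric_only=True), annot=True, cmap="coolwarm")  # numeric columns only
sns.boxplot(data=df, x="department", y="salary")
sns.pairplot(df, hue="segment", diag_kind="kde")  # builds its own grid of plots
```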

Choosing the Right Chart

| Data question | Chart type | When to use |
| --- | --- | --- |
| Trend over time? | Line plot | Time series, stock prices |
| Compare categories? | Bar chart | Revenue by region |
| Distribution? | Histogram / Boxplot | Salary distribution |
| Relationship? | Scatter plot | Ad spend vs revenue |
| Composition? | Pie / Stacked bar | Market share breakdown |
| Correlation matrix? | Heatmap | Feature correlations |
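Most rows of the table can be read as rules of thumb keyed on column types. The sketch below applies those rules to pandas dtypes. suggest_chart is a hypothetical helper, not a library function.

```python
from typing import Optional

import pandas as pd
from pandas.api.types import is_datetime64_any_dtype, is_numeric_dtype


def suggest_chart(df: pd.DataFrame, x: str, y: Optional[str] = None) -> str:
    """Hypothetical helper: map column dtypes to the table's default chart."""
    x_col = df[x]
    if y is None:
        # A single column: show its distribution (numeric) or count its categories
        return "histogram" if is_numeric_dtype(x_col) else "bar"
    y_col = df[y]
    if not is_numeric_dtype(y_col):
        return "no default: rethink the question"
    if is_datetime64_any_dtype(x_col):
        return "line"            # trend over time
    if is_numeric_dtype(x_col):
        return "scatter"         # relationship between two numeric columns
    return "bar or boxplot"      # compare categories, or distribution per category
```

Composition and correlation questions depend on intent rather than on dtypes, so the sketch leaves them out.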

Customization Essentials

```python
import matplotlib.pyplot as plt

plt.title("Monthly Revenue", fontsize=16, fontweight="bold")
plt.xlabel("Month")
plt.ylabel("Revenue ($)")
plt.legend(loc="upper left")  # lists only series plotted with label=...
plt.grid(True, alpha=0.3)
plt.tight_layout()  # Prevent label overlap
```

Always label axes, add a title, and include a legend when using multiple series — your audience should understand the chart without explanation.
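The checklist is sketched below as a self-contained example in the object-oriented style. It plots two labelled series, so the legend has something to list. The channel names and values are placeholders.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt

months = ['Jan', 'Feb', 'Mar', 'Apr']     # placeholder data for illustration
online = [120, 135, 150, 160]
retail = [100, 98, 105, 110]

fig, ax = plt.subplots(figsize=(8, 4))
ax.plot(months, online, marker='o', label='Online')   # label= feeds the legend
ax.plot(months, retail, marker='s', label='Retail')
ax.set_title('Monthly Revenue', fontsize=16, fontweight='bold')
ax.set_xlabel('Month')
ax.set_ylabel('Revenue ($)')
ax.legend(loc='upper left')
ax.grid(True, alpha=0.3)
fig.tight_layout()
```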

Detailed Theory

Charts are how analysts *show* findings. Matplotlib is the foundational Python plotting library. Seaborn sits on top of it and adds statistical plots and nicer defaults. Together they cover the large majority of static charts you are likely to need.

What Matplotlib Actually Is

Matplotlib draws a Figure (the canvas), which contains one or more Axes (the individual plots). Inside each Axes are Artists: lines, bars, text and ticks. There are two ways to drive it:

  • pyplot (state machine): plt.plot(x, y) — quick, implicit "current axes".
  • Object-oriented (explicit): fig, ax = plt.subplots(); ax.plot(x, y) — what you should use for anything you'll re-edit.

```python
import matplotlib.pyplot as plt

# months and revenue: equal-length sequences you already have (assumed)
fig, ax = plt.subplots(figsize=(8, 4))
ax.plot(months, revenue, marker='o')
ax.set_title('Monthly Revenue')
ax.set_xlabel('Month')
ax.set_ylabel('Revenue (₹)')
fig.tight_layout()
plt.show()
```
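The sketch below makes the Figure, Axes and Artist hierarchy concrete, along with pyplot's implicit "current axes". It inspects the objects directly. It also shows how a later plt.plot call can land on a different figure than you expect.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
from matplotlib.lines import Line2D

fig, ax = plt.subplots()                    # one Figure holding one Axes
(line,) = ax.plot([1, 2, 3], [2, 4, 3])     # ax.plot returns a list of Line2D Artists

print(fig.axes == [ax])                     # → True: the Figure owns its Axes
print(isinstance(line, Line2D))             # → True: each drawn element is an Artist
print(line in ax.lines)                     # → True: the Axes owns its Artists

plt.plot([1, 2, 3], [3, 1, 2])              # pyplot draws on the current Axes...
print(len(ax.lines))                        # → 2: ...which is still ax

fig2, ax2 = plt.subplots()                  # a new figure becomes "current"
plt.plot([0, 1], [0, 1])                    # so this line lands on ax2, not ax
print(len(ax.lines), len(ax2.lines))        # → 2 1
```

The last two calls show why the explicit style is safer for code you will revisit. ax.plot always targets ax, whichever figure was created most recently.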

The Five Charts That Cover Most Reports

| Chart | Use it for |
| --- | --- |
| Line | trend over time |
| Bar | compare categories |
| Histogram | distribution of one numeric column |
| Box / violin | distribution across groups |
| Scatter | relationship between two numeric columns |

Master these before pies, radar, or 3D anything.
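Box and violin plots are the only two of the five not in the core API table. The minimal sketch below compares one numeric column across three groups with ax.boxplot and ax.violinplot. The group names and normally distributed values are synthetic.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(42)
groups = ['North', 'South', 'West']                  # hypothetical groups
samples = [rng.normal(loc=m, scale=10, size=200)     # synthetic values per group
           for m in (50, 60, 55)]

fig, (ax_box, ax_vio) = plt.subplots(1, 2, figsize=(10, 4), sharey=True)

ax_box.boxplot(samples)                   # median, quartiles, whiskers, outliers
ax_box.set_xticks([1, 2, 3], labels=groups)
ax_box.set_title('Box plot')
ax_box.set_ylabel('Order value')

ax_vio.violinplot(samples, showmedians=True)   # full density shape per group
ax_vio.set_xticks([1, 2, 3], labels=groups)
ax_vio.set_title('Violin plot')

fig.tight_layout()
```

The box plot summarises each group in five numbers. The violin shows the full shape, which matters when a group is skewed or has two peaks.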

Beginner Mistakes to Skip

1. Pie charts with 7+ slices. People judge angles poorly. Use a sorted bar chart instead.
2. Dual-axis charts that fake a correlation. They almost always mislead; separate panels are clearer.
3. Truncated y-axes that exaggerate tiny changes. Start bars at zero by default.
4. 3D charts. Cool-looking, hard to read. Skip them.
5. Red/green-only palettes. Roughly 8% of men have red-green colour blindness. Use viridis or cividis.
6. No title, no axis labels, no units. A chart without context is decoration.
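Mistakes 1 and 3 have a direct fix. The minimal sketch below uses placeholder shares for seven hypothetical brands. It replaces the pie with a sorted horizontal bar chart whose axis starts at zero.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt

# Placeholder shares for seven hypothetical brands (too many slices for a pie)
share = {'A': 12, 'B': 31, 'C': 7, 'D': 18, 'E': 5, 'F': 15, 'G': 12}

# Sort ascending: barh draws the first item at the bottom, so the largest ends on top
ranked = sorted(share.items(), key=lambda kv: kv[1])
names = [name for name, _ in ranked]
values = [value for _, value in ranked]

fig, ax = plt.subplots(figsize=(7, 4))
ax.barh(names, values, color='tab:blue')   # one colour: the order carries the message
ax.set_xlim(left=0)                        # bars start at zero, nothing truncated
ax.set_title('Market share by brand')
ax.set_xlabel('Share (%)')
fig.tight_layout()
```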

Intermediate: Subplots & Layout

```python
import matplotlib.pyplot as plt

fig, axes = plt.subplots(2, 2, figsize=(10, 6), sharex=True)
# Replace each ... with your own data arguments
axes[0, 0].plot(...)
axes[0, 1].bar(...)
axes[1, 0].hist(...)
axes[1, 1].scatter(...)
fig.suptitle('Quarterly snapshot')
fig.tight_layout()
```

sharex/sharey line up axes so the eye can compare across panels. tight_layout (or constrained_layout=True) prevents overlap.
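The sketch below is a runnable variant of the layout idea with synthetic monthly data. It uses two stacked panels that share the month axis. layout='constrained' (Matplotlib 3.5+) is the Figure-level way to request constrained layout instead of calling tight_layout afterwards.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np

months = np.arange(1, 13)              # month numbers 1..12
revenue = 100 + 5 * months             # placeholder series
orders = 40 + 2 * months               # placeholder series

fig, (ax_top, ax_bottom) = plt.subplots(2, 1, figsize=(8, 5), sharex=True,
                                        layout='constrained')
ax_top.plot(months, revenue, marker='o')
ax_top.set_ylabel('Revenue')
ax_bottom.bar(months, orders)
ax_bottom.set_ylabel('Orders')
ax_bottom.set_xlabel('Month')          # one x label serves both panels
fig.suptitle('Revenue and orders on a shared month axis')
```

With sharex=True, zooming or setting limits on one panel moves the other. Each month therefore stays vertically aligned across both charts.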

Intermediate: Seaborn — Stats Built In

```python
import seaborn as sns
sns.set_theme(style='whitegrid')

# df: a long-format pandas DataFrame with these columns (assumed); one chart per figure
sns.barplot(data=df, x='country', y='revenue', errorbar=('ci', 95))
sns.boxplot(data=df, x='category', y='price')
sns.histplot(df, x='age', hue='gender', kde=True)
sns.scatterplot(data=df, x='ad_spend', y='revenue', hue='channel')
sns.heatmap(df.corr(numeric_only=True), annot=True, cmap='coolwarm', center=0)
```

Seaborn works best with long-format DataFrames (one row per observation). It computes the statistics for you. barplot shows 95% confidence intervals by default, histplot adds a KDE curve when kde=True, and regplot fits a regression line.
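Wide tables, with one column per group, can be reshaped into long format with pandas' DataFrame.melt. A minimal sketch with placeholder numbers:

```python
import pandas as pd

# Wide format: one column per region (placeholder numbers)
wide = pd.DataFrame({
    'month': ['Jan', 'Feb', 'Mar'],
    'North': [120, 130, 125],
    'South': [90, 95, 110],
})

# Long format: one row per (month, region) observation
long_df = wide.melt(id_vars='month', var_name='region', value_name='revenue')
print(long_df)
```

Once the data is in long_df, a single call such as sns.lineplot(data=long_df, x='month', y='revenue', hue='region') draws one line per region without a loop.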

Intermediate: Styling That Looks Professional

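```python
import matplotlib.pyplot as plt

# Global settings: affect every figure created afterwards in this session
plt.rcParams.update({
    'figure.dpi': 110,
    'font.size': 11,
    'axes.spines.top': False,
    'axes.spines.right': False,
    'axes.grid': True,
    'grid.alpha': 0.3,
})
```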

Drop the top and right spines, soften the grid, and raise the DPI for crisp screenshots. Pick one font and one accent colour, and stick to them across the report.
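The rcParams.update call changes settings for the rest of the session. One way to keep a single house style without leaking it into other code is plt.rc_context. It applies a dict of settings only inside a with block and restores the previous values on exit. HOUSE_STYLE is a hypothetical name.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt

# Hypothetical house style, defined once and reused for every report figure
HOUSE_STYLE = {
    'figure.dpi': 110,
    'font.size': 11,
    'axes.spines.top': False,
    'axes.spines.right': False,
    'axes.grid': True,
    'grid.alpha': 0.3,
}

previous_top = plt.rcParams['axes.spines.top']

with plt.rc_context(HOUSE_STYLE):        # settings apply only inside this block
    fig, ax = plt.subplots()
    ax.plot([1, 2, 3], [3, 1, 2])

print(ax.spines['top'].get_visible())                   # False: applied when the Axes was created
print(plt.rcParams['axes.spines.top'] == previous_top)  # True: global settings restored
```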

Intermediate: Annotations Tell the Story

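```python
# Assumes ax already holds a line plot with '2026-01' and '2026-03' as x categories,
# and that target is a number
ax.axhline(target, color='gray', linestyle='--', label='Target')
ax.annotate('Pricing change',
            xy=('2026-03', 1200), xytext=('2026-01', 1500),
            arrowprops={'arrowstyle': '->'})
ax.legend(frameon=False, loc='upper left')
```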

A callout that turns raw lines into a *story* usually beats a clean but unlabelled chart.
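The annotation snippet assumes that ax and target already exist. The self-contained sketch below works under the same idea. It plots months by position, so the annotation coordinates are plain numbers and the tick labels carry the month names. The revenue values and target are placeholders.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt

months = ['2026-01', '2026-02', '2026-03', '2026-04']
revenue = [1500, 1400, 1200, 1350]          # placeholder values
target = 1300                               # hypothetical target
x = list(range(len(months)))                # positions 0..3 stand in for the months

fig, ax = plt.subplots(figsize=(8, 4))
ax.plot(x, revenue, marker='o', label='Revenue')
ax.set_xticks(x, labels=months)
ax.axhline(target, color='gray', linestyle='--', label='Target')
ax.annotate('Pricing change', xy=(2, 1200), xytext=(0.3, 1250),   # arrow points at the dip
            arrowprops={'arrowstyle': '->'})
ax.legend(frameon=False, loc='upper right')
ax.set_ylabel('Revenue')
fig.tight_layout()
```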

Advanced: Choosing the Right Colour Palette

  • Sequential (Blues, viridis) — ordered low→high (e.g. revenue heat).
  • Diverging (coolwarm, RdBu) — meaningful midpoint (profit vs loss, change vs baseline).
  • Categorical (Set2, tab10) — distinct unordered groups; cap at ~7 categories.
Always check palettes with a colour-blind simulator. viridis and cividis are perceptually uniform and designed to stay readable for common forms of colour blindness.
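The sketch below applies the three palette types in Matplotlib 3.5+ through the matplotlib.colormaps registry. The diverging case uses TwoSlopeNorm to pin the neutral midpoint of coolwarm to zero even when the data range is lopsided. The limits -200 and 500 are placeholders.

```python
import matplotlib
import numpy as np
from matplotlib.colors import TwoSlopeNorm

# Sequential: five evenly spaced colours from viridis, low -> high
seq = matplotlib.colormaps['viridis'](np.linspace(0, 1, 5))

# Categorical: tab10 holds ten distinct colours; take one per group
cat = matplotlib.colormaps['tab10'].colors[:4]

# Diverging: map zero to the neutral centre even though losses and profits are unequal
norm = TwoSlopeNorm(vmin=-200, vcenter=0, vmax=500)
div = matplotlib.colormaps['coolwarm'](norm([-200, 0, 500]))
print(norm(0))   # → 0.5: zero sits exactly at the colormap's midpoint
```

Seaborn accepts the same names, for example sns.color_palette('viridis', 5).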

Advanced: Performance for Large Data

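  • ax.scatter(x, y, alpha=0.1, rasterized=True): when saving to PDF or SVG, the heavy point cloud is embedded as a bitmap while lines and text stay vector, which keeps files small.
  • For roughly a million points or more, use datashader or Plotly with WebGL. datashader aggregates points into a pixel grid before rendering. Plotly's WebGL mode (render_mode='webgl' in plotly.express scatter) draws on the GPU in the browser.
  • Save final figures with fig.savefig('out.png', dpi=200, bbox_inches='tight') for slides; use .pdf or .svg for print.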
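The sketch below combines the rasterisation and saving tips on a synthetic 50,000-point cloud. It writes to in-memory buffers rather than the out.png file name, so it leaves nothing on disk. Swap in real paths in practice.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
x, y = rng.normal(size=(2, 50_000))          # synthetic heavy point cloud

fig, ax = plt.subplots(figsize=(6, 4))
ax.scatter(x, y, s=2, alpha=0.1, rasterized=True)   # points become one bitmap in PDF/SVG
ax.set_title('50,000 points')                        # text stays vector

pdf_buf, png_buf = io.BytesIO(), io.BytesIO()
# For vector formats, dpi sets the resolution of the rasterised parts only
fig.savefig(pdf_buf, format='pdf', dpi=200, bbox_inches='tight')
fig.savefig(png_buf, format='png', dpi=200, bbox_inches='tight')
```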

Advanced: When to Switch to Plotly

Matplotlib and Seaborn produce static images, which suit reports. Reach for Plotly when you need:

  • Hover tooltips (executives love them).
  • Range sliders and zoom.
  • Embedding in Streamlit / Dash / web apps.
The DataFrame stays the same; just import plotly.express as px and call px.line(df, x=..., y=..., color=...).

Advanced: Reproducible Figure Pipelines

Professional reports build figures in functions, not notebooks of copy-pasted cells:

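```python
import matplotlib.pyplot as plt

def plot_revenue(df, ax=None):
    # Draw on the Axes passed in, or fall back to the current one
    if ax is None:
        ax = plt.gca()
    ax.plot(df['month'], df['revenue'], marker='o')
    ax.set_title('Revenue')
    return ax
```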

Now the same function works in notebooks, dashboards, and CI screenshot tests.
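The sketch below illustrates that reuse. A copy of plot_revenue, using an explicit is None check, is drawn onto two panels of one figure. The figure is then rendered to an in-memory PNG as a basic smoke test. The regional data is a placeholder. Screenshot tests typically go further and compare the output against a stored baseline image.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import pandas as pd


def plot_revenue(df, ax=None):
    """Draw revenue by month on ax (or the current Axes) and return it."""
    if ax is None:
        ax = plt.gca()
    ax.plot(df['month'], df['revenue'], marker='o')
    ax.set_title('Revenue')
    return ax


# Placeholder data for two hypothetical regions
north = pd.DataFrame({'month': [1, 2, 3], 'revenue': [100, 120, 115]})
south = pd.DataFrame({'month': [1, 2, 3], 'revenue': [90, 85, 105]})

fig, (ax_n, ax_s) = plt.subplots(1, 2, figsize=(9, 3), sharey=True)
plot_revenue(north, ax=ax_n).set_xlabel('Month (North)')
plot_revenue(south, ax=ax_s).set_xlabel('Month (South)')

# Smoke test: the figure renders to PNG without raising
buf = io.BytesIO()
fig.savefig(buf, format='png')
```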

Practice Path

1. Take any CSV and plot one of each: line over time, bar by category, histogram, boxplot and scatter, all with titles, labels and units.
2. Build a 2×2 subplot "snapshot" page with a shared x-axis and tight_layout.
3. Replicate one of your Matplotlib charts in Seaborn using long-format data.
4. Add an annotation and a horizontal target line that turn the chart into a one-glance story.