CampusCrate is a student operating system for B.Tech students in India, combining opportunities, academic resources, communities, learning hubs, and career tools in one platform.

Who is CampusCrate built for?

CampusCrate is built for Indian engineering and B.Tech students who need a structured place to discover opportunities, access college resources, join societies, learn technical skills, and build their career profile.

What can students find on CampusCrate?

Students can find hackathons, internships, competitions, notes, test papers, cheatsheets, college communities, DSA and development learning tracks, roadmaps, profiles, and AI career tools.

Learn Code and Practice

Pandas — Data Analysis Powerhouse 🐼

Pandas provides two core data structures — Series (1D labeled array) and DataFrame (2D labeled table) — that make data cleaning, transformation, and analysis intuitive and fast.

Creating DataFrames

import pandas as pd
# From a dictionary
df = pd.DataFrame({
    "name": ["Alice", "Bob", "Charlie"],
    "age": [25, 30, 35],
    "salary": [70000, 80000, 90000]
})
# From a CSV file
df = pd.read_csv("data.csv")# From JSON
df = pd.read_json("data.json")

Selection Methods

Method	Syntax	Use Case
Column	`df["col"]` or `df.col`	Single column (Series)
Multiple cols	`df[["a", "b"]]`	Subset of columns
loc	`df.loc[row, col]`	Label-based selection
iloc	`df.iloc[row, col]`	Integer-position selection
Boolean	`df[df["age"] > 30]`	Filter rows by condition

Essential Operations

# Add a column
df["bonus"] = df["salary"] * 0.1
# Remove a column
df = df.drop(columns=["bonus"])
# Sort
df = df.sort_values("salary", ascending=False)# Group and aggregate
df.groupby("department")["salary"].mean()

Handling Missing Data

Method	Description
`df.isna()`	Detect NaN values
`df.dropna()`	Remove rows with NaN
`df.fillna(value)`	Replace NaN with value
`df.interpolate()`	Fill NaN by interpolation

Merging DataFrames

# SQL-style join
result = pd.merge(df1, df2, on="id", how="left")# Concatenate vertically
combined = pd.concat([df1, df2], ignore_index=True)

> Tip: Use df.describe() for a quick statistical summary. Use df.info() to inspect dtypes and memory usage.

Pandas is *the* tool for tabular data in Python — CSVs, Excel sheets, SQL exports, log files, scraped tables. If your data fits in RAM and looks like a spreadsheet, pandas is the answer. It's the workhorse of data analysis, ML data prep, and reporting.

What pandas Actually Is

Two core types:

Series — a 1-D labelled column (like a NumPy array + an index).
DataFrame — a 2-D labelled table: rows have an index, columns have names.

Under the hood: each column is (usually) a NumPy array, so column-wise math is NumPy-fast.

Creating & Loading

import pandas as pd
df = pd.DataFrame({
    "name":  ["Alice", "Bob", "Cara"],
    "age":   [30, 25, 35],
    "city":  ["NYC", "LA", "NYC"],
})df = pd.read_csv("users.csv")
df = pd.read_excel("book.xlsx", sheet_name="Sheet1")
df = pd.read_json("users.json")
df = pd.read_parquet("big.parquet")     # fast, compressed, typed

Writers mirror the readers: df.to_csv(...), to_parquet, to_json. For anything > a few hundred MB, parquet beats CSV every time.

Inspecting a DataFrame

df.head(), df.tail()
df.shape, df.columns, df.dtypes
df.info()                # types + memory
df.describe()             # numeric summary
df.isna().sum()           # nulls per column

First thing to do with any new dataset: info() + describe() + isna().sum(). Saves hours.

Selecting Data

df["age"]                  # one column → Series
df[["name", "age"]]        # multiple columns → DataFrame
df.loc[0]                   # row by label
df.iloc[0]                  # row by position
df.loc[df["age"] > 30]      # boolean mask
df.loc[df["city"] == "NYC", ["name", "age"]]

loc = label-based, iloc = integer-position-based. Mix them up and you'll get headaches; pick deliberately.

Beginner Mistakes to Skip

1. for i, row in df.iterrows(): for everything. Painfully slow. Vectorise with column ops or groupby. 2. Chained assignment like df[df.x > 0]["y"] = 1. Triggers SettingWithCopyWarning and silently does nothing. Use df.loc[df.x > 0, "y"] = 1. 3. Ignoring dtypes. A column read from CSV may be object (string) when you expect numbers. Convert with astype / pd.to_numeric / pd.to_datetime. 4. Using == to compare with NaN. x == NaN is always False. Use x.isna() / x.notna(). 5. Mixing loc and iloc mentally. Pick one per operation; never index by integer with loc. 6. Holding the whole CSV in memory "just in case". usecols=, dtype=, nrows= keep read_csv lean.

Intermediate: Cleaning

df.drop_duplicates()
df.dropna(subset=["email"])
df.fillna({"age": df["age"].median(), "city": "unknown"})
df["name"] = df["name"].str.strip().str.lower()
df["signup"] = pd.to_datetime(df["signup"], errors="coerce")
df = df.rename(columns={"old": "new"})

String methods live under .str, datetime methods under .dt. Vectorised, no loops.

Intermediate: Transforming

df["is_adult"] = df["age"] >= 18
df["name_len"] = df["name"].str.len()
df["bracket"]  = pd.cut(df["age"], bins=[0,18,40,100], labels=["kid","adult","senior"])

If you find yourself reaching for .apply(lambda row: ...), pause — there's almost always a vectorised version that's 10–100× faster.

Intermediate: GroupBy (Split-Apply-Combine)

df.groupby("city")["age"].mean()
df.groupby("city").agg(
    avg_age=("age", "mean"),
    n_users=("name", "count"),
    youngest=("age", "min"),
)df.groupby("city")["age"].transform("mean")   # broadcast back to rows

GroupBy is pandas' superpower. Almost every dashboard query is a groupby.

Intermediate: Joins (Merge)

merged = users.merge(orders, on="user_id", how="left")

how = "inner" / "left" / "right" / "outer". Same mental model as SQL JOIN. Validate with indicator=True to see which rows matched.

Advanced: Method Chaining & `pipe`

result = (
    df
    .query("age > 25")
    .assign(tax=lambda d: d["salary"] * 0.3)
    .groupby("department", as_index=False)["tax"].sum()
    .sort_values("tax", ascending=False)
)

Reads top-to-bottom like a pipeline. pipe(func) lets you slot custom functions into the chain. Resist mutating intermediate variables — chains are easier to refactor.

Advanced: Indexes & Time Series

df = df.set_index("signup").sort_index()
df.loc["2026-01"]                     # all of January
df.resample("W")["sales"].sum()       # weekly sums
df.rolling("7D")["sales"].mean()      # 7-day moving avg

A well-chosen index (especially a sorted datetime) unlocks slicing by date and resample/rolling windows that would be brutal in raw NumPy.

Advanced: Memory & Performance

Convert low-cardinality strings to category (df["city"] = df["city"].astype("category")) — huge memory + groupby wins.
Downcast numerics: pd.to_numeric(s, downcast="integer").
Read only needed columns: pd.read_csv(..., usecols=["a","b"]).
Avoid apply with Python functions; prefer column ops, groupby.agg, or NumPy ufuncs.
For multi-GB data: use Polars (lazy, columnar, extremely fast) or DuckDB (SQL on parquet); switch back to pandas only at the boundaries.

Advanced: When Not to Use pandas

Tiny scripts (< 100 rows) — a list of dicts is simpler.
Streaming / out-of-memory data — Polars, Dask, DuckDB.
Heavy SQL transforms — just run SQL.
Tensor maths / ML training — NumPy / PyTorch.

Pandas is for *labelled, in-memory, mixed-type tables*. Use it where it shines.

Practice Path

1. Load any CSV (titanic, iris, your own) and produce a mini report: info, describe, null counts, top 5 rows. 2. Compute a per-group mean using groupby(...).agg(...) and verify with a vectorised mask. 3. Convert a slow for-loop transformation into a method chain (assign + boolean masks). 4. Save a DataFrame to parquet, reload it, and confirm dtypes survive (csv would lose them).

Pandas for Data Analysis

Pandas — Data Analysis Powerhouse 🐼

Creating DataFrames

Selection Methods

Essential Operations

Handling Missing Data

Merging DataFrames

On this page

What pandas Actually Is

Creating & Loading

Inspecting a DataFrame

Selecting Data

Beginner Mistakes to Skip

Intermediate: Cleaning

Intermediate: Transforming

Intermediate: GroupBy (Split-Apply-Combine)

Intermediate: Joins (Merge)

Advanced: Method Chaining & `pipe`

Advanced: Indexes & Time Series

Advanced: Memory & Performance

Advanced: When Not to Use pandas

Practice Path

Pandas — Data Analysis Powerhouse 🐼

Creating DataFrames

Selection Methods

Essential Operations

Handling Missing Data

Merging DataFrames

On this page

What pandas Actually Is

Creating & Loading

Inspecting a DataFrame

Selecting Data

Beginner Mistakes to Skip

Intermediate: Cleaning

Intermediate: Transforming

Intermediate: GroupBy (Split-Apply-Combine)

Intermediate: Joins (Merge)

Advanced: Method Chaining & pipe

Advanced: Indexes & Time Series

Advanced: Memory & Performance

Advanced: When *Not* to Use pandas

Practice Path

Advanced: Method Chaining & `pipe`

Advanced: When Not to Use pandas