Statistics is just the language for talking about uncertain data. You will use it to describe what's typical, spot what's unusual, and decide whether a difference between two groups is real or just noise. You don't need a math degree — four or five ideas, applied carefully, cover most analyst work.

What Statistics Actually Is

Two halves you'll come back to forever:

Descriptive statistics — "summarise this dataset": mean, median, spread, distribution.
Inferential statistics — "draw a conclusion about the population from a sample": confidence intervals, hypothesis tests.

Descriptive answers *what is*; inferential answers *what's likely true beyond what we measured*.

Center: Mean vs Median vs Mode

Measure	What it tells you	Watch out for
Mean	Arithmetic average	Pulled hard by outliers
Median	Middle value when sorted	Robust to outliers
Mode	Most frequent value	Useful for categorical data

Example with income data: in a startup of 9 engineers and 1 founder, the founder's pay drags the mean salary well above what a typical employee earns. The median is the honest number.
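A minimal sketch of the startup example, using made-up salary figures (the numbers and units are illustrative assumptions, not real data):

```python
import statistics

# Hypothetical annual salaries for 9 engineers and 1 founder (illustrative values)
salaries = [12, 13, 13, 14, 14, 15, 15, 16, 17, 200]

mean_salary = statistics.mean(salaries)      # pulled upward by the single large value
median_salary = statistics.median(salaries)  # middle of the sorted list
modes = statistics.multimode(salaries)       # every value tied for most frequent

print(f"mean={mean_salary}, median={median_salary}, modes={modes}")
```

The one large value moves the mean far from where most salaries sit, while the median barely notices it.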

Spread: Variance, Std Dev, IQR

variance σ² = average squared distance from the mean
std dev  σ = sqrt(variance)              → same units as the data
IQR        = Q3 − Q1                     → robust spread, ignores tails

Rule of thumb on a roughly normal distribution: ~68% of values within 1σ, ~95% within 2σ, ~99.7% within 3σ (the empirical rule).
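The spread measures above can be sketched with Python's standard library. The data values are illustrative, and the quartiles use the `statistics.quantiles` default method; other tools may use a slightly different quartile convention.

```python
import statistics
from statistics import NormalDist

data = [4, 8, 15, 16, 23, 42]  # illustrative values only

variance = statistics.pvariance(data)  # average squared distance from the mean
std_dev = statistics.pstdev(data)      # square root of the variance, in data units
q1, q2, q3 = statistics.quantiles(data, n=4)  # quartile cut points
iqr = q3 - q1                          # spread of the middle half of the data

# Empirical rule check on a standard normal curve
std_normal = NormalDist(mu=0, sigma=1)
within_1_sigma = std_normal.cdf(1) - std_normal.cdf(-1)
within_2_sigma = std_normal.cdf(2) - std_normal.cdf(-2)

print(f"variance={variance:.2f}, std dev={std_dev:.2f}, IQR={iqr:.2f}")
print(f"within 1 sigma: {within_1_sigma:.4f}, within 2 sigma: {within_2_sigma:.4f}")
```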

The Normal Distribution (Bell Curve)

Many measurements — heights, test scores, errors — cluster near a mean and thin out at the extremes. A lot of statistical tools assume normality, so it's the first shape to recognise. Skewed data (income, page views) often needs a log transform before normal-style analysis works.
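As a rough illustration of the log transform, the sketch below simulates right-skewed values with log-normal draws (a stand-in chosen for illustration, not real page-view data) and compares the gap between mean and median before and after taking logs.

```python
import math
import random
import statistics

rng = random.Random(42)  # fixed seed so the run is repeatable

# Simulated right-skewed data: log-normal draws (an assumption for illustration)
raw = [rng.lognormvariate(0.0, 1.0) for _ in range(10_000)]
logged = [math.log(v) for v in raw]

# In skewed data the mean sits well above the median; in symmetric data they agree
raw_gap = statistics.fmean(raw) - statistics.median(raw)
log_gap = statistics.fmean(logged) - statistics.median(logged)

print(f"raw:    mean - median = {raw_gap:.3f}")
print(f"logged: mean - median = {log_gap:.3f}")
```

A gap near zero after the transform is a quick hint (not a proof) that the logged values are closer to symmetric.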

Beginner Mistakes to Skip

1. Reporting only the mean. Always pair it with a spread (std dev, IQR) and a sample size.
2. Confusing correlation with causation. "Ice cream sales correlate with drownings": both are caused by summer.
3. Tiny samples, big claims. A survey of 12 friends is not the country.
4. Cherry-picking the time window. Pick the window before looking at the result.
5. Treating p-values as truth. p < 0.05 is a convention, not a magic threshold.
6. Ignoring outliers without investigating them. They're sometimes the most interesting row in the dataset.

Intermediate: Probability Basics

Probability ranges from 0 (impossible) to 1 (certain).

P(A and B) = P(A) × P(B|A)        → multiply for both happening
P(A or B)  = P(A) + P(B) − P(A and B)
P(A|B)     = P(A and B) / P(B)   → conditional, the heart of Bayes

The most important everyday tool is conditional probability — "given that the user clicked the email, how likely are they to buy?".
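The email question can be worked through with simple counts. The campaign numbers below are hypothetical, chosen only to make the arithmetic easy to follow.

```python
# Hypothetical email campaign counts (illustrative, not real data)
recipients = 1000
clicked = 200            # event A: user clicked the email
bought = 50              # event B: user bought
clicked_and_bought = 30  # both A and B

p_click = clicked / recipients
p_buy = bought / recipients
p_click_and_buy = clicked_and_bought / recipients

# Conditional probability: P(buy | click) = P(click and buy) / P(click)
p_buy_given_click = p_click_and_buy / p_click

# Addition rule: P(click or buy) = P(click) + P(buy) - P(click and buy)
p_click_or_buy = p_click + p_buy - p_click_and_buy

print(f"P(buy | click) = {p_buy_given_click:.2f}")
print(f"P(click or buy) = {p_click_or_buy:.2f}")
```

Here the overall purchase rate is 5%, but among users who clicked it is three times higher, which is exactly the kind of gap conditional probability exposes.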

Intermediate: Sampling & The Central Limit Theorem

You rarely have the whole population. You have a *sample*. The CLT says:

> For almost any population shape (the variance must be finite), the distribution of the *sample mean* tends toward normal as sample size grows (n ≥ ~30 is the rule of thumb; heavily skewed data may need more).

This is why methods built on normality work even when the underlying data is messy — you're using means.
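A small simulation can make the CLT concrete. This sketch assumes an exponential population (strongly skewed, with mean 1 and standard deviation 1) purely for illustration; the sample size and number of repetitions are arbitrary choices.

```python
import random
import statistics

rng = random.Random(7)  # fixed seed so the run is repeatable
SAMPLE_SIZE = 30
N_SAMPLES = 2000

def draw_sample_mean():
    """Mean of one sample drawn from a skewed (exponential) population."""
    return statistics.fmean(rng.expovariate(1.0) for _ in range(SAMPLE_SIZE))

sample_means = [draw_sample_mean() for _ in range(N_SAMPLES)]

mean_of_means = statistics.fmean(sample_means)  # should sit near the population mean, 1.0
sd_of_means = statistics.stdev(sample_means)    # should sit near 1 / sqrt(30)

print(f"mean of sample means: {mean_of_means:.3f}")
print(f"std dev of sample means: {sd_of_means:.3f}")
```

Plotting a histogram of `sample_means` would show a roughly bell-shaped curve even though the individual draws are far from normal.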

Sampling methods worth knowing: simple random, stratified (proportional buckets), systematic (every kth), cluster (whole groups). Bad sampling → bad answer, no matter how clever the math.

Intermediate: Confidence Intervals

A 95% CI says: *if we repeated this experiment 100 times, ~95 of the intervals we built would contain the true value*.

95% CI = x̄ ± 1.96 × (σ / √n)        → z = 1.96 for 95%, 2.576 for 99%

Report intervals, not just point estimates. "Conversion lifted by 1.2% ± 0.4%" tells the reader far more than "+1.2%".
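The interval formula can be sketched with the standard library alone. Two assumptions are worth flagging: the sample is simulated, and the sample standard deviation stands in for σ, which is reasonable for larger samples (for small ones a t critical value would be the usual choice).

```python
import random
from math import sqrt
from statistics import NormalDist, mean, stdev

def mean_confidence_interval(sample, confidence=0.95):
    """Normal-approximation CI for the mean: x_bar +/- z * s / sqrt(n)."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    x_bar = mean(sample)
    standard_error = stdev(sample) / sqrt(len(sample))
    return x_bar - z * standard_error, x_bar + z * standard_error

# Simulated measurements (illustrative): 100 draws around 50 with std dev 10
rng = random.Random(1)
sample = [rng.gauss(50, 10) for _ in range(100)]

low, high = mean_confidence_interval(sample)
z_95 = NormalDist().inv_cdf(0.975)

print(f"95% CI for the mean: ({low:.2f}, {high:.2f}), z = {z_95:.3f}")
```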

Intermediate: Hypothesis Testing in Plain English

You want to know if a change is real. Set up two competing claims:

H₀ (null) — "there is no effect".
H₁ (alt) — "there is an effect".

Collect data, compute a test statistic, look up its p-value.

p-value < 0.05 → reject H₀ (data this extreme would be unlikely if there were no effect).
p-value ≥ 0.05 → fail to reject H₀ (cannot conclude an effect).

The p-value is not "probability the null is true" — it's "probability of seeing data this extreme *if* the null were true".
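One way to see that definition in action is a permutation test, which is a different technique from the t-tests usually used for this job but mirrors the definition closely: shuffle the group labels (which is what "no effect" would imply) and count how often the shuffled difference is at least as extreme as the observed one. The group values below are made up for illustration.

```python
import random
from statistics import fmean

def permutation_p_value(group_a, group_b, n_permutations=2000, seed=0):
    """Share of label shuffles whose mean difference is at least as extreme as observed."""
    rng = random.Random(seed)
    observed = abs(fmean(group_a) - fmean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # under H0 the labels are interchangeable
        diff = abs(fmean(pooled[:n_a]) - fmean(pooled[n_a:]))
        if diff >= observed:
            extreme += 1
    return extreme / n_permutations

# Hypothetical measurements for two groups (illustrative values)
control = [10, 12, 11, 13, 12, 11, 10, 12]
treatment = [15, 17, 16, 18, 16, 17, 15, 16]

p_value = permutation_p_value(control, treatment)
print(f"permutation p-value: {p_value:.4f}")
```

A small result says only that such a gap would be rare if the labels did not matter; it does not give the probability that H₀ is true.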

Intermediate: t-Tests — Three Flavours

Test	Question
One-sample	Does this group's mean differ from a known value?
Two-sample (independent)	Do two groups (A vs B) have different means?
Paired	Did the same units change after a treatment?

Use a t-test when comparing means with a small sample or an unknown population variance. For proportions ("% who clicked"), use a z-test for proportions instead.
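For the proportions case, a pooled two-proportion z-test can be sketched as below. The function name and the click counts are hypothetical, and the normal approximation assumes reasonably large groups.

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_z_test(success_a, n_a, success_b, n_b):
    """Two-sided pooled z-test for a difference between two proportions."""
    p_a = success_a / n_a
    p_b = success_b / n_b
    p_pool = (success_a + success_b) / (n_a + n_b)  # shared rate under H0
    standard_error = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / standard_error
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Hypothetical click counts: 500 of 10,000 in A versus 580 of 10,000 in B
z_stat, p_val = two_proportion_z_test(500, 10_000, 580, 10_000)
print(f"z = {z_stat:.2f}, p-value = {p_val:.4f}")
```

For comparing means rather than proportions, `scipy.stats.ttest_ind` (independent groups) and `scipy.stats.ttest_rel` (paired) cover the flavours in the table above.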

Advanced: Type I & Type II Errors, Power

Type I (α) — false positive. Saying there's an effect when there isn't. Usually capped at 5%.
Type II (β) — false negative. Missing a real effect.
Power = 1 − β — the chance you detect a real effect. Aim for 80%.

Small samples mean low power, and low power means you'll miss real wins. Always compute the required sample size before running an A/B test.
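A common normal-approximation formula for the per-group sample size when comparing two proportions is sketched below. The function name is hypothetical, the test is assumed two-sided, and the minimum detectable effect is treated as an absolute difference (for example 5.0% to 5.5%); a relative interpretation would give a different answer.

```python
from math import ceil
from statistics import NormalDist

def sample_size_per_group(baseline, mde_abs, alpha=0.05, power=0.80):
    """Approximate users needed per group to detect an absolute lift of mde_abs."""
    p1 = baseline
    p2 = baseline + mde_abs
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided Type I threshold
    z_beta = NormalDist().inv_cdf(power)           # matches power = 1 - beta
    variance_sum = p1 * (1 - p1) + p2 * (1 - p2)
    n = (z_alpha + z_beta) ** 2 * variance_sum / mde_abs ** 2
    return ceil(n)

# Illustrative inputs: 5% baseline conversion, 0.5 percentage point absolute lift
n_needed = sample_size_per_group(baseline=0.05, mde_abs=0.005)
print(f"users needed per group: {n_needed}")
```

Halving the detectable effect roughly quadruples the required sample, which is why small lifts are expensive to confirm.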

Advanced: Correlation, Regression, Causation

Pearson r — linear correlation, range −1 to +1. Sensitive to outliers.
Spearman ρ — rank-based, robust, captures monotonic relationships.
Linear regression (y = a + bx) — fits the best line, gives an interpretable slope.

A strong correlation never proves causation. To claim cause you need either an experiment (random assignment) or a causal-inference design (instrumental variables, diff-in-diff, RDD).
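The three measures above can be sketched in plain Python. The helper names are hypothetical, the rank helper assumes no tied values, and the data is synthetic: a monotonic but curved relationship, chosen to show where Pearson and Spearman disagree.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Linear correlation between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def ranks(values):
    """Rank from 1..n; assumes no ties (a simplification for this sketch)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result

def spearman_rho(xs, ys):
    """Pearson correlation computed on the ranks."""
    return pearson_r(ranks(xs), ranks(ys))

def fit_line(xs, ys):
    """Least-squares intercept a and slope b for y = a + b*x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return my - b * mx, b

xs = [1, 2, 3, 4, 5, 6]
ys = [x ** 3 for x in xs]  # monotonic but not linear

r = pearson_r(xs, ys)
rho = spearman_rho(xs, ys)
a, b = fit_line(xs, [2 + 3 * x for x in xs])  # exact line y = 2 + 3x
print(f"Pearson r = {r:.3f}, Spearman rho = {rho:.3f}, line: a = {a:.1f}, b = {b:.1f}")
```

Spearman reports a perfect monotonic relationship while Pearson falls short of 1, because the points do not lie on a straight line.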

Advanced: A/B Testing Pitfalls

Peeking — stopping the test when p drops below 0.05 inflates false positives. Decide n in advance.
Multiple comparisons — testing 20 metrics at α=0.05 → ~one false positive expected by chance. Use Bonferroni or FDR correction.
Sample-ratio mismatch — if you split 50/50 but your data shows 47/53, randomization is broken — stop and investigate.
Novelty effect — short tests over-credit shiny new things. Run for at least one full business cycle (often a week).
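Two of these pitfalls lend themselves to quick checks. The sketch below uses hypothetical helper names; the family-wise calculation assumes independent tests with no real effects, and the sample-ratio check is a chi-square goodness-of-fit test with one degree of freedom, written with the normal distribution because a chi-square value with one degree of freedom is a squared standard normal.

```python
from math import sqrt
from statistics import NormalDist

def familywise_error_rate(n_tests, alpha=0.05):
    """Chance of at least one false positive across independent tests with no real effects."""
    return 1 - (1 - alpha) ** n_tests

def bonferroni_alpha(n_tests, alpha=0.05):
    """Per-test threshold that keeps the family-wise rate at or below alpha."""
    return alpha / n_tests

def srm_p_value(n_a, n_b, expected_share_a=0.5):
    """p-value for the observed split versus the planned split (chi-square, 1 dof)."""
    total = n_a + n_b
    expected_a = total * expected_share_a
    expected_b = total - expected_a
    chi2 = (n_a - expected_a) ** 2 / expected_a + (n_b - expected_b) ** 2 / expected_b
    return 2 * (1 - NormalDist().cdf(sqrt(chi2)))

fwer_20 = familywise_error_rate(20)
per_test_alpha = bonferroni_alpha(20)
srm_small = srm_p_value(47, 53)      # 47/53 split with 100 users
srm_large = srm_p_value(4700, 5300)  # same split with 10,000 users

print(f"FWER for 20 tests: {fwer_20:.3f}, Bonferroni alpha: {per_test_alpha}")
print(f"SRM p-value, n=100: {srm_small:.3f}; n=10,000: {srm_large:.2e}")
```

The same 47/53 split is unremarkable with 100 users but a strong warning sign with 10,000, so the mismatch should always be judged against the sample size.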

Advanced: Bayesian Thinking (One Page)

Bayesian stats updates a prior belief with evidence to get a posterior:

posterior ∝ prior × likelihood

In A/B testing it answers "what's the probability B is better than A?" directly — no p-values, no awkward "fail to reject". Tools: PyMC, Stan; or just scipy.stats.beta for conversion-rate experiments.
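A minimal sketch of that question for conversion rates, using only the standard library instead of the tools named above. It assumes a uniform Beta(1, 1) prior on each rate, which makes each posterior a Beta distribution, and estimates the probability by Monte Carlo draws; the function name and the conversion counts are hypothetical.

```python
import random

def prob_b_beats_a(conv_a, n_a, conv_b, n_b, draws=10_000, seed=0):
    """Monte Carlo estimate of P(rate_B > rate_A) under Beta(1, 1) priors."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(draws):
        # Posterior for each rate: Beta(1 + conversions, 1 + non-conversions)
        rate_a = rng.betavariate(1 + conv_a, 1 + n_a - conv_a)
        rate_b = rng.betavariate(1 + conv_b, 1 + n_b - conv_b)
        if rate_b > rate_a:
            wins += 1
    return wins / draws

# Hypothetical experiment: 50 of 1,000 convert in A, 65 of 1,000 in B
p_b_better = prob_b_beats_a(50, 1000, 65, 1000)
print(f"P(B is better than A) is roughly {p_b_better:.2f}")
```

With identical data in both arms the estimate hovers near 0.5, which is a useful sanity check before trusting the number on real experiments.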

Practice Path

1. For a real dataset, compute mean / median / std dev / IQR; plot a histogram and a boxplot.
2. Build a 95% confidence interval for the mean using scipy.stats (or Excel CONFIDENCE.NORM) and explain it in one sentence.
3. Run an independent two-sample t-test on two groups, report the p-value and effect size.
4. Calculate the sample size you'd need for an A/B test with baseline 5%, MDE 0.5%, power 80%.

Measure	Formula	What It Tells You
Mean	$\bar{x} = \frac{\sum x_i}{n}$	Average value (sensitive to outliers)
Median	Middle value when sorted	Central tendency (robust to outliers)
Mode	Most frequent value	Most common observation
Range	Max − Min	Spread of data
Variance	$\sigma^2 = \frac{\sum(x_i - \bar{x})^2}{n}$	Average squared deviation
Std Dev	$\sigma = \sqrt{\text{Variance}}$	Spread in original units
