Strings in C

Words Are Just Arrays of Characters 🔤

C has no built-in string type. A "string" in C is simply a char array that ends with a special null terminator '\0'.

char name[6] = {'R', 'a', 'h', 'u', 'l', '\0'};
char name2[] = "Rahul";   // same thing — compiler adds '\0' for you

Memory layout:

[ 'R' ][ 'a' ][ 'h' ][ 'u' ][ 'l' ][ '\0' ]
  0     1      2      3      4      5

The '\0' is not the digit character '0' (which has ASCII value 48). It is the byte with value 0, and it tells every C string function "the string ends here".
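To make the layout above concrete, here is a minimal sketch that lists the numeric value of every byte in a string, terminator included. The helper name format_bytes is hypothetical (it is not a standard function), and the values in the usage note assume an ASCII-compatible character set.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: writes the decimal value of every byte of s,
   including the terminating '\0', into out (capacity cap). */
static void format_bytes(const char *s, char *out, size_t cap)
{
    size_t used = 0;
    size_t n = strlen(s) + 1;          /* +1 so the terminator is listed too */

    if (cap == 0) return;
    out[0] = '\0';
    for (size_t i = 0; i < n && used < cap; i++) {
        /* snprintf never writes more than cap - used bytes here */
        int w = snprintf(out + used, cap - used, "%s%d",
                         i ? " " : "", (unsigned char)s[i]);
        if (w < 0) break;              /* output error: stop early */
        used += (size_t)w;
    }
}
```

Calling format_bytes("Rahul", out, sizeof out) with a 64-byte buffer should leave "82 97 104 117 108 0" in out on an ASCII system: the last value is 0, not 48.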

Three Ways to Make a String

char a[] = "Hello";          // size auto = 6 (5 chars + '\0')
char b[20] = "Hi";           // 20 bytes: 'H', 'i', '\0', the rest zero-filled
char *c = "World";           // pointer to read-only string literal

> ⚠️ Strings created with char *c = "..." are read-only. Modifying one, for example with c[0] = 'X';, is undefined behaviour (often a crash).

Reading a String

char buf[100];
scanf("%s", buf);            // ⚠️ unsafe: can overflow
scanf("%99s", buf);          // ✅ caps input length
fgets(buf, sizeof(buf), stdin);  // ✅ best: reads whole line including spaces

Common String Functions (<string.h>)

| Function | Purpose |
| --- | --- |
| strlen(s) | Length (excludes '\0') |
| strcpy(dst, src) | Copy string |
| strcat(dst, src) | Append src to dst |
| strcmp(a, b) | Compare; 0 = equal, <0 = a<b, >0 = a>b |
| strchr(s, c) | Find first occurrence of char c |
| strstr(haystack, needle) | Find substring |

Detailed Theory

Strings in C are deceptively simple — and that simplicity is the source of an entire category of security bugs that have plagued software for 50 years.

Why the Null Terminator?

Look at strlen:

size_t strlen(const char *s) {
    size_t n = 0;
    while (s[n] != '\0') n++;   // count bytes until we hit '\0'
    return n;
}

Without '\0', strlen would have no way to know where the string ends. Every standard C string function relies on this terminator.

If you forget to put '\0' at the end of a char array, strlen will keep walking past your array, possibly reading hundreds of bytes that aren't yours, until it happens to find a zero byte. This is a buffer over-read, the same class of bug as Heartbleed (which was triggered by trusting an attacker-supplied length rather than by a missing terminator).
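One way to guard against a missing terminator is to search for '\0' only within the bytes you actually own. The sketch below uses the standard memchr for that; the helper name bounded_len and its "return cap when nothing is found" convention are assumptions made for illustration.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns the string length if a '\0' exists within
   the first cap bytes of buf, or cap if no terminator was found. */
static size_t bounded_len(const char *buf, size_t cap)
{
    const char *end = memchr(buf, '\0', cap);   /* never looks past cap bytes */
    return end ? (size_t)(end - buf) : cap;
}
```

A result equal to cap signals that the buffer is not a valid C string and must not be passed to strlen, printf("%s"), or similar functions.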

"Hello" Is Actually 6 Bytes

char s[] = "Hello";
printf("%zu\n", sizeof(s));    // 6 — 'H' 'e' 'l' 'l' 'o' '\0'
printf("%zu\n", strlen(s));    // 5 — strlen does NOT count '\0'

Always remember: a string of N characters needs N + 1 bytes of storage.
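The N + 1 rule matters most when you size a buffer yourself. As a rough sketch, the hypothetical helper dup_string below makes a heap copy of a string and allocates strlen(s) + 1 bytes so the terminator fits.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a heap copy of s, or NULL if malloc fails.
   The caller is responsible for calling free() on the result. */
static char *dup_string(const char *s)
{
    size_t n = strlen(s);          /* N visible characters */
    char *copy = malloc(n + 1);    /* N + 1 bytes: room for '\0' */
    if (copy == NULL) return NULL;
    memcpy(copy, s, n + 1);        /* copies the terminator as well */
    return copy;
}
```

Allocating only n bytes here would compile fine and then write one byte past the allocation, which is a classic off-by-one bug.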

char* vs char[] — Subtle but Critical

char *p = "Hello";       // p points to a string literal in READ-ONLY memory
char  q[] = "Hello";     // q is a writable copy on the stack

p[0] = 'X';              // ❌ undefined behaviour, usually crashes
q[0] = 'X';              // ✅ fine, q is now "Xello"

|  | char *p = "Hi" | char q[] = "Hi" |
| --- | --- | --- |
| Where stored | Read-only data segment | Stack (writable) |
| Modifiable? | ❌ No | ✅ Yes |
| Can be reassigned? | ✅ p = "Bye"; | ❌ Array name is constant |
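In practice, a common habit is to declare pointers to literals as const char * so the compiler rejects accidental writes, and to copy into an array when you need to modify the text. The sketch below illustrates that pattern; with_first_char is a hypothetical helper, not a library function.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copies src (which may live in read-only memory) into
   the caller's writable buffer dst and replaces the first character.
   Returns 0 on success, -1 if dst is too small for src plus its '\0'. */
static int with_first_char(const char *src, char first, char *dst, size_t cap)
{
    size_t n = strlen(src);
    if (n + 1 > cap) return -1;        /* copy would not fit */
    memcpy(dst, src, n + 1);           /* writable copy, terminator included */
    if (dst[0] != '\0') dst[0] = first;
    return 0;
}
```

With const char *p = "Hello"; and char q[8];, calling with_first_char(p, 'X', q, sizeof q) leaves "Xello" in q while the literal behind p is untouched, and p can still be pointed at another literal afterwards.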

strcpy Is a Footgun

char small[5];
strcpy(small, "Hello, World");   // ❌ writes 13 bytes into 5: buffer overflow!

strcpy doesn't check the destination size. It will happily write past the end, corrupting whatever comes after — possibly your return address, which is how attackers used to take over programs.

Safer alternatives:

char dst[10];
snprintf(dst, sizeof(dst), "%s", src);   // ✅ truncates safely
strncpy(dst, src, sizeof(dst) - 1);      // ⚠️ doesn't always null-terminate
dst[sizeof(dst) - 1] = '\0';             //    so add this

Modern advice: prefer snprintf over strcpy and strcat for any production code.
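snprintf also tells you when it had to truncate: it returns the length the full result would have needed. A minimal sketch of using that return value follows; the wrapper name copy_checked and its 0 / -1 convention are assumptions for this example.

```c
#include <stdio.h>

/* Hypothetical helper: copies src into dst (capacity cap).
   Returns 0 on success, -1 if the text was truncated or an error occurred. */
static int copy_checked(char *dst, size_t cap, const char *src)
{
    int needed = snprintf(dst, cap, "%s", src);  /* length src needs, excluding '\0' */
    if (needed < 0) return -1;                   /* output error */
    return (size_t)needed < cap ? 0 : -1;        /* needed >= cap means truncation */
}
```

Even in the truncated case dst stays a valid, terminated string as long as cap is at least 1, so the caller can decide whether to reject the input or carry on with the shortened text.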

scanf("%s") Is Also a Footgun

char name[10];
scanf("%s", name);   // ❌ if user types 50 chars, BOOM

Use a width specifier or fgets:

scanf("%9s", name);                       // ✅ reads at most 9 chars
fgets(name, sizeof(name), stdin);         // ✅ reads a full line

fgets includes the trailing '\n' if the line fits — strip it:

size_t len = strlen(name);
if (len > 0 && name[len-1] == '\n') name[len-1] = '\0';
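The read-then-strip steps can be wrapped into one function. This is a sketch under assumptions: read_line is a hypothetical name, and it uses strcspn (from <string.h>) as an alternative way to find the newline.

```c
#include <limits.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: reads one line from in into buf (capacity cap) and
   removes the trailing '\n' if present. Returns 1 on success, 0 on EOF/error. */
static int read_line(FILE *in, char *buf, size_t cap)
{
    if (cap == 0 || cap > INT_MAX) return 0;        /* fgets takes an int size */
    if (fgets(buf, (int)cap, in) == NULL) return 0;
    buf[strcspn(buf, "\n")] = '\0';   /* index of first '\n', or of '\0' if none */
    return 1;
}
```

Typical use would be char name[10]; if (read_line(stdin, name, sizeof name)) { ... }. If the line is longer than the buffer, the remainder stays in the stream and is returned by the next call.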

strcmp — Not What You Expect

strcmp("apple", "banana")   // negative (apple < banana alphabetically)
strcmp("hi",    "hi")       // 0
strcmp("zoo",   "ant")      // positive

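The C standard only guarantees the sign of the result: negative, zero, or positive. Many implementations return the difference between the first pair of differing bytes, but do not rely on the exact value. Never use == to compare strings, because that compares pointers: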

if (a == b)              // ❌ compares ADDRESSES
if (strcmp(a, b) == 0)   // ✅ compares CONTENT
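Two small wrappers can make the intent of a comparison explicit. Both names below (same_text and compare_sign) are hypothetical; the second one reduces strcmp's result to -1, 0, or 1, since portable code should rely only on the sign.

```c
#include <string.h>

/* Hypothetical helper: 1 if both strings hold the same characters, else 0. */
static int same_text(const char *a, const char *b)
{
    return strcmp(a, b) == 0;
}

/* Hypothetical helper: reduces strcmp's result to -1, 0, or 1. */
static int compare_sign(const char *a, const char *b)
{
    int r = strcmp(a, b);
    return (r > 0) - (r < 0);
}
```

Given char a[] = "hi"; and char b[] = "hi";, the two arrays live at different addresses, so a == b is false, while same_text(a, b) is true.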

String Concatenation Costs

char buf[100] = "Hello";
strcat(buf, ", ");
strcat(buf, "World");
strcat(buf, "!");

Each strcat walks the string from the start to find '\0' — so building a long string with many strcats is O(n²). For heavy concatenation, track a position pointer or use snprintf once.
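Here is one possible shape for the "track a position" approach: remember how many bytes are already in the buffer and write each new piece at that offset, so nothing is rescanned. The helper append_at is hypothetical, and the 0 / -1 return convention is an assumption.

```c
#include <stdio.h>

/* Hypothetical helper: appends piece at offset *len in buf (capacity cap)
   and advances *len. Returns 0 on success, -1 if the piece does not fit. */
static int append_at(char *buf, size_t cap, size_t *len, const char *piece)
{
    if (*len >= cap) return -1;
    size_t room = cap - *len;                     /* bytes left, incl. '\0' */
    int w = snprintf(buf + *len, room, "%s", piece);
    if (w < 0 || (size_t)w >= room) {
        buf[*len] = '\0';                         /* undo any partial write */
        return -1;
    }
    *len += (size_t)w;                            /* next append starts here */
    return 0;
}
```

Start with buf[0] = '\0' and size_t len = 0, then call append_at once per piece. When all the pieces are known up front, a single call such as snprintf(buf, sizeof buf, "%s, %s!", "Hello", "World") does the same job in one pass.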

Common Operations Cheat-Sheet

char s[] = "Hello, World";

strlen(s);              // 12
strchr(s, 'W');         // pointer to "World"
strstr(s, "World");     // pointer to "World"
strcmp(s, "Hello");     // > 0
char copy[20];
strcpy(copy, s);

/* Convert case manually */
for (int i = 0; s[i]; i++) {
    if (s[i] >= 'a' && s[i] <= 'z')
        s[i] -= 32;     // → uppercase
}

/* Reverse in place */
int n = strlen(s);
for (int i = 0, j = n - 1; i < j; i++, j--) {
    char t = s[i];
    s[i] = s[j];
    s[j] = t;
}
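The manual case conversion above assumes ASCII, where lowercase and uppercase letters are 32 apart. As an alternative sketch, the standard toupper from <ctype.h> avoids that assumption; the wrapper name upper_in_place is hypothetical.

```c
#include <ctype.h>

/* Hypothetical helper: converts s to uppercase in place using toupper.
   The cast to unsigned char avoids undefined behaviour for negative chars. */
static void upper_in_place(char *s)
{
    for (; *s != '\0'; s++)
        *s = (char)toupper((unsigned char)*s);
}
```

Like the manual loop, this needs a writable array such as char s[] = "Hello, World";, never a pointer to a string literal.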

Bug Summary

| Mistake | Result |
| --- | --- |
| Forgot '\0' | Functions read past your array |
| Wrote to char * literal | Crash or undefined |
| Used strcpy to small buffer | Buffer overflow |
| Compared with == | Compared addresses, not text |
| Used scanf("%s") without width | Stack smash if input too long |

Strings are the place where C beginners learn (the hard way) why memory safety matters.