Linked Lists in C: An Introduction

Published in

Star Gazers

19 min readMar 7, 2021

Disclaimer: I’m just a mere mortal (Web Development Student). Please don’t take anything in here for granted. If anyone with more experience in this topic has any suggestions for improving this piece, I’d love to hear from you!

Overview of C as a programming language

If you’re a beginner coming from another higher level programming language such as Python or JavaScript, you may be wondering what the heck C is. Simply put, C is a widely used general purpose programming language developed in the early 1970s by American computer scientist Dennis M. Ritchie at Bell Laboratories.

Unlike a language such as JavaScript, C isn’t really a “High level” programming language with lots of abstractions and things happening under the hood for you.

What does “High level” and “Low level” mean in the context of programming languages though?

Lower-Level languages:

Provides little to no abstraction.
These languages are more readable by machines, and are further.
All memory management has to be conducted by the programmer, the language does not do this automatically for you.
An example of ‘true’ low level languages are Assembly and Machine Code.

High-Level languages:

Feature abstraction. This makes them much more readable for humans and overall, much easier to use as a result.
All memory management is performed by the language automatically, generally the programmer does not have to worry about this, as this paradigm is abstracted away.
JavaScript and Python are examples of higher level languages.

It’s debated if C is considered a ‘true’ low level language these days, as it sits in a bit of a grey area. C clearly features some abstractions over a language such as Assembly. For example, C allows you to manually manipulate memory in your computer, which is generally understood to be a feature of lower level languages, however it still features some abstractions.

A good example of a potential abstraction in C is that you have the ability to import libraries such as <string.h> In C, strings technically don’t exist as a datatype, as a string is technically an array of characters. When imported, the <string.h> library abstracts away this low level detail of working with strings in C, allowing you to use strings in C more like you would in another higher level language such as JavaScript.

It’s important to understand these distinctions between higher and lower level programming languages before we start thinking about how linked lists work in C.

How C interacts with a computers hardware & memory

As previously mentioned, in order to understand how linked lists work in a language such as C, one must first understand how C interacts with your computer’s hardware, and more specifically, how it interacts with the memory in your PC.

Wait hang on Jacob! What the heck do you mean by “Hardware”? Essentially a computer’s hardware can be broken down into the following, all of which a language like C has access to manipulate and perform actions on:

CPU: This is the ‘brain’ of your PC. It’s responsible for processing all information from programs run by your computer.
HDD/SSD: This is essentially a storage device responsible for storing permanent data. This data can come in many different forms, but is essentially anything saved or installed on your computer: for example, computer programs, family photos or your operating system
RAM: This is the important one! Random Access Memory, or RAM, is hardware in your computer whose job is to temporarily store on-the-fly information created by programs and to do so in a way that makes this data immediately accessible. Unlike a HDD or SSD, RAM is fast-access but always temporary!

Ram in your computer! Source: Lecture 4 — CS50x

There’s more hardware in a real computer than just these 3 components, but for the purpose of this article I have limited this to the 3 above options to keep this within the scope what's ultimately going on with linked list in C.

I highlighted RAM being important in context to C. The little black chips inside of the picture above is where information is stored when we are running software, such as a C program we have written.

We can think of these black chips as being divided into individual bytes. While obviously not drawn to scale, we can arbitrarily claim that each square in this above diagram is an individual byte in your computer’s memory.

Let’s say we are writing a C program. Inside of our program:

This will essentially reserve 1 byte of memory in your C program to store a character we assign to a variable, inside of our computers memory.

Other datatypes in C take up the following amount of space in your computers memory:

char: 1 byte
int: 2 bytes
float: 4 bytes
double: 8 bytes

The main takeaway here is to remember that in C, all of your variables which contain your data types are stored in your computers ram like in the pictures above, this is important to remember for the next section as we take a deeper dive into how pointers work in C.

Pointers in C

Now that we understand how C can interact with your computers hardware and more specifically, how it can store data in RAM at your programs runtime, we can now start to dive into what pointers in a language like C are and how they work.

Pointers can be tough concept for beginners to wrap their head around. The one thing to keep in mind with pointers in a language in C is that a pointer is a variable in your computers memory, whose value is the address of another variable. I cannot stress this point enough and just to be clear.

A pointer is a variable whose value is the address of another variable.

Okay, so what do I actually mean by this? Say we have decided to create an integer into our computers memory:

When we compile and run this program using the make command from CS50's library, we would expect to see the following output:

~/ $ make int
clang -ggdb3 -O0 -std=c11 -Wall -Werror -Wextra -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wshadow int.c -lcrypt -lcs50 -lm -o int
~/ $ ./int
50

Cool! If we go back to out picture, we have also essentially completed the following in our computers memory in this program on runtime:

Remember that RAM is only temporary memory, so as soon as our program is finished running, these 4 bytes in RAM are relinquished.

Okay awesome. So we understand this integer we created in our computers memory in our C application at runtime has reserved 4 bytes in RAM, we printed out the integer stored inside of this variable, then terminated the program, thus freeing up our memory. Lets take a look at the following code example below however to walk you through:

When we compile and run this C program, we should see the following output in our terminal:

~/ $ make int
clang -ggdb3 -O0 -std=c11 -Wall -Werror -Wextra -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wshadow int.c -lcrypt -lcs50 -lm -o int
~/ $ ./int
0x7ffce9ba2728
~/ $

“Whoa whoa whoa, what the heck is 0x7ffce9ba2728? That’s completely cryptic!”. This is a pretty fair judgement to make the first time you come across something like this. But let’s try break it down into something which hopefully makes a bit more sense.

Before I dive into how this works I want to reiterate because I cannot stress this enough.

A pointer is a variable whose value is the address of another variable.

For the the following code:

int n = 50;

It would have reserved 4 bytes in our computers RAM to store our integer variable like so:

int *p = &n;

Here, when we say int *p, we are simply declaring a pointer variable which will exist somewhere in our computers memory:

int *p = &n;

Now you may be asking what are we assigning to this pointer variable? Well by assigning &n to our pointer variable int *p, we are essentially telling C to allocate the physical memory address of int n as the value of our int *p pointer variable.

Our pointer variable “pointing” to the memory address of where our integer is located.

Finally, when we said the following:

printf(“%p\n”, p);

We were telling C to print the memory address of where we initially set our pointer variable to point to. Which in our case, was our int n variable in our computers memory.

Notice that there’s a pretty clear distinction here. We told C to print the memory location our int n variable it sitting at in our computers memory and not the value of what's inside the variable at that memory location. Printing a plain old memory address seems a bit cryptic and pointless right? All we generally care about are the vales stored inside of our variables. So how would we go about printing out the value of what's inside of our int n variable instead of the memory address?

Instead of saying:

printf(“%p\n”, p);

We can do the following instead:

printf(“%d\n”, *p);

By modifying the script with this simple change:

We should now be able to see the following in our terminal:

~/ $ make int
clang -ggdb3 -O0 -std=c11 -Wall -Werror -Wextra -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wshadow int.c -lcrypt -lcs50 -lm -o int
~/ $ ./int
50

The technical term for this idea of accessing a value inside of a variable, which we have a pointer in our computers memory pointing to is called dereferencing.

A good thing to also remember is that when we use a pointer to dereference another variable, we don’t just have the ability to simply access data from another variable somewhere in memory, but we can also manipulate the values of these variables too.

For a more detailed walkthrough of how this dereferencing works, I highly recommend checking out this video from Stanford:

It might feel like we’re kind of just jumping through hoops at the moment not really accomplishing anything yet. However I promise that this pattern of using pointers will become more relevant later on when we start to discuss how to implement linked lists in C, and why they must be implemented by using pointers.

Arrays in C cannot be resized

If you’re coming from the word of JavaScript, you might be shocked to discover that arrays in C are completely static.

When you create an array in C, you have to allocate a block of contiguous memory for the array to exist inside of our computers memory. The array is a simple block of memory. The size of this array depends on how many items we are storing in the array and the datatypes of those items.

For example, if we declared an array of 4 integers and tried to print out the values at each index like so:

We would expect to see the following output in our terminal:

~/ $ make array
clang -ggdb3 -O0 -std=c11 -Wall -Werror -Wextra -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wshadow array.c -lcrypt -lcs50 -lm -o array
~/ $ ./array
19 10 8

This array of 3 integers would occupy our computers memory something like this:

Remember the array of integers will take up 12 bytes inside of our computers RAM, due to our scores array containing 3 integers, all of which are 4 bytes each in size.

Arrays like this in C are great if you’re dealing with a situation where you know the size of the array containing your data won’t need to change. They’re also great for accessing things really fast (accessing an element in an array happens in constant time or O(1) . But what happens if we run into a scenario where we need to dynamically increase or decrease the size of the array that we are working with? JavaScript offers you the ability to do so on arrays by using methods such as Array.prototype.push() and Array.prototype.pop() but this is impossible to do in a language like C.

In order to get around these ground rules in a lower level language such as C, we’re going to have to implement a data structure such as a linked list to get around this. Linked lists have the nifty ability which allows them to grow and shrink on demand.

Inserting a new element into arrays in C is expensive and difficult

Something else that is worth remembering is that in a language such as C, inserting a new element into an array of elements is expensive because:

More room in the existing array must be created for a new element we wish to add.
On top of this, if we’re inserting a new element into an existing array say right in the middle, we are forced to shift around all of our existing elements in the array to new indexes. Which is slow!

For example, say that we have a sorted array of integers like so in our C program:

int scores[3] = {1, 3, 4};

Currently the integers in our array are located at the following indexes:

1: index[0]
3: index[1]
4: index[2]

If we wanted to insert the integer 2 into our array, while also maintaining our sorted order, we’d have to move both of our numbers inside index [1] and index [2]. Not only is this expensive to do but can also difficult to pull off. You may have already guessed, but we will later use a linked list to improve this ease of insertion and deletion.

Dynamic Memory allocation in C

Okay, so we currently understand:

How pointers work in C.
C as a structured language. Because of this, Arrays in C are fixed in size. Once the size of an array is declared, you cannot change it.
We want to implement a linked list to get around this problem above.

In order to implement a linked list in C, we first need to understand how memory allocation in C works.

It’s important to understand how memory management works in a language like C, and potential traps to avoid with it. These traps can potentially yield with disastrous effects like crashing you computer if you’re not careful.

To allocate memory dynamically, we can utilize library functions such as malloc(), calloc(), realloc() and free(). These functions are defined in the <stdlib.h> header file.

So we’ve already seen one way to use pointers where we:

Declare a variable somewhere in memory in our computer.
We declare a second pointer variable which points to our first variable.

This does have its flaws though. It requires us to Know exactly how much memory we are going to need the moment that our C program is compiled and we need to be able to name or identify all of the variables we might encounter in our program.

What if our program will run for a really long time, accepting numerous amounts of user data dynamically and we can’t estimate how many units of memory in our computer we’ll need in order to anticipate and store this in our program at runtime.

Dynamic memory allocation allows us to get around this problem. It achieves this by once again using pointers! In this context pointers will allow us to:

Dynamically allocate memory to in our program at runtime as it is needed.
We need to remember that when we are dynamically allocating memory in our program, it lives in a pool of memory known as the heap.
Also worth remembering is that any variables that we have declared in our program with a name generally lives in a pool of memory known as the stack.

For a more detailed walkthrough of how this dynamic memory allocation works, I highly recommend checking out this video from Stanford

Alright enough theory. Lets actually look at an examples in C of how to statically obtain an integer in vs dynamically obtain an integer in C as mentioned in the video above.

// statically obtain an integer
int x;// dynamically obtain an integer
int *px = malloc(sizeof(int));

In the first example, we are simply declaring a variable called x in our computers memory statically and throwing it onto the stack.

In the second example, we are going to dynamically obtain 4 bytes of memory by calling sizeof(int) from the heap, we then call malloc() to have our program allocate this memory and assign a pointer to it called *px

Just like before with our pointers, we can deference with our int *px pointer to access that memory we just allocated dynamically.

There’s some potential problems with using dynamically allocated memory however:

Dynamically-allocated memory is not automatically returned to the system to be used later when the function who created said memory has completed execution.
If you’re not careful, forgetting to return memory back to the system when you’re finished with it can trigger a memory leak in your system. This can affect your systems performance or in the worst case scenario, cause your entire operating system to crash.
Always remember that when you are finished working with some sort of dynamically-allocated memory you must free() it!

Here’s a classic example of what not to do when working with dynamic memory allocation:

In this above example, we are accidently creating an infinite loop where on each passthrough of the loop, we are allocating 4 bytes of memory to new variable without freeing it. A pattern such as this will eventually cause our entire operating system to crash, as we will continue to dynamically allocate memory forever until we run out:

If you weren’t already, congratulations! You’re now in the exclusive club of people who can also appreciate the following joke:

Okay. So to fix this problem we’ll probably want to do something more like this:

Here we are:

Declaring an array of 3 pointer variables, which will anticipate to eventually be pointed to something of memory with a datatype of int.
Looping through to a set number we know to avoid an infinite loop.
On each passthrough of the first loop, we are dynamically allocating enough memory in our computer to store an integer datatype in it and then we are pointing each pointer being looped through in our array to each corresponding “block” of memory .
We are then using the free() method to responsibly do our garbage collection, ensuring that we are not polluting our computers memory when we are done.

Now that we understand how pointers and dynamic memory allocation works in C, lets bring it all together and implement our first simple linked list!

Implementing a simple Linked List in C

Okay. So how do we actually go about implementing a linked list in C? Like an array, a linked list is a way to store a collection of elements. Also like an array, these could be characters, integers, whatever. Much like the tree data structure, every element in a linked list is stored in the form of a node.

A node is represented like this:

Each node is comprised of two sub-elements or parts:

The data part: Stores the relevant data inside of an element.
The next part: Stores the pointer which will “link” to the next node in our linked list.

When a completed linked list is formed:

Every node points to the next node in sequential order.
The first node in the linked list is used as a reference for traversing the linked list. This is called the head. The head is a reference to the first node in the linked list.
The last node in the linked list, sometimes referred to as the tail always points to null, indicating the end of the linked list has been reached.
If a list is empty, the head is a null reference.

Okay lets actually try to implement this idea into code now. In order to do this, we’ll need to implement this using structures and pointers. Our final simple implementation of a linked list in C will look something like this:

This script will output the following linked list into your terminal:

~/ $ make linkedlist
clang -ggdb3 -O0 -std=c11 -Wall -Werror -Wextra -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wshadow linkedlist2.c -lcrypt -lcs50 -lm -o linkedlist2
~/ $ ./linkedlist2
Numbers in linked list:
2
4
5

Whoa, whoa, whoa. This is a lot to take in! Don’t worry. I’m going to break it down step by step.

First up, we are declaring our own custom struct for each node in our linked list:

typedef struct node
{
    int number;
    struct node *next;
}
node;

By doing this, we are creating a struct which will build the following representation of a single node in our linked list in code:

If you’re not familiar with structs in C, I recommend reading up on them from the following:

C struct

In this tutorial, you'll learn about struct types in C Programming. You will learn to define and use structures with…

www.programiz.com

Now that we have figured out how to implement a single node in our linked list, we can now think about how to fill each node with our desired data and then ‘link’ our nodes together.

We first end up calling the main() function inside of our C program to achieve this. Inside of this function:

We’re essentially going to use 3 pointers in this function to implement this linked list:

node *list is a pointer that will keep track of the HEAD of our linked list
node *tmp is a pointer that will keep track of the current node being traversed in the linked list as we build it
node *n is a pointer which will point to the struct in our computers memory to a new node struct every time we need it.

node *list = NULL;

We create a pointer variable using the node datatype we previously created in our struct and also initially set the value it is pointing to to be NULL. This indicates that it is currently pointing to nothing in memory. This will also act as the head of our linked list.

node *n = malloc(sizeof(node));

Then we call malloc(), reserving 4 bytes in memory into our computers RAM while also assigning a pointer called to it called n (for node). This pointer expects to be pointed somewhere in our computers memory where the datatype of what its pointing to is of our custom node datatype.

if (n != NULL)
{        
  n->number = 2;        
  n->next = NULL;    
}

We then perform a safety check to ensure our n pointer is not pointing to a NULL value in memory, avoiding a potential segmentation fault.

If we’re safe, we can then deference the node pointer we declared before. In our computers memory, we set the following properties of the struct our pointer is pointing to the following:

n->number: 2
n->next: null

list = n;

We then tell our list pointer to point to the the same area in memory our node *n pointer has been pointing to, the first node in our linked list.

We also set a *tmp pointer to point to the same node our list pointer has just been set to point to.

 n = malloc(sizeof(node)); if (n != NULL)
 {
   n->number = 4;
   n->next = NULL;
 }

We call malloc() again, reserving 4 more bytes in memory into our computers RAM while also re-assigning our n pointer to the new node in memory.

You may have noticed by this point, our new node is completely orphaned. Our tmp pointers->next property hasn’t been updated to anything yet.

 while (tmp->next != NULL)
 {
   tmp = tmp->next;
 }

Here we need to implement a while loop, which will wait until the tmp pointers -> property is not a null value. When this condition evaluates to true, we then set the temp pointers next property to point to the new node we just added into our linked list

We pretty much repeat this process for every new node that we want to hardcode into our linked list.

tmp->next = n;
print_list(list);
return 0;

Once we reach the end of the nodes we wish to create in our linked list, our temp pointer will be sitting on node 4. As a last step we need to set this temp pointers -> property to point to the final node our n pointer has allocated in our computers memory, completing.

So far I haven’t mentioned how to dynamically insert, delete and free() nodes from a linked list after you have created it. As you can tell, this is quite a deep topic and will require a part two to dive further into these concepts.

However if you happen to be impatient and cannot wait. In the meantime feel free to check out an implementation of a linked list which is able to correctly handle the creation, insertion and deletion of nodes in a linked list more dynamically.

jallen2034/linked-lists-in-c

Implemented a simple command line program in C to add and delete nodes in a linked list…

github.com

This implementation is even more complex than the one above so do not feel discouraged if its confusing at first. I’ll have to go over it in more detail in part two!

If you’re also interested in learning more about this topic and much more, I’d also recommend visiting the following references from Harvard’s CS50:

https://cs50.harvard.edu/x/2021/

CS50 is a fantastic and completely free Massive Open Online Course for any beginners seeking to improve their CS skills. I can’t recommend it enough.

This article was also written using resources from the following:

Happy coding!