Mon Jun 05

Breaking Down Race Conditions

Concurrent programming has become increasingly prevalent in modern software development, enabling applications to perform multiple tasks simultaneously and efficiently utilize available computing resources. However, with concurrency comes the potential for subtle bugs known as race conditions.

What is a race condition?

A race condition occurs when multiple components or threads of a program concurrently attempt to access and modify shared data, leading to unpredictable and unintended outcomes. This situation can arise due to the non-deterministic nature of thread execution, where the order and timing of operations are not guaranteed.

When multiple threads access shared data simultaneously, they can interfere with each other’s operations, causing data corruption, inconsistent states, or even system crashes. The result is often unexpected and difficult to reproduce, making race conditions challenging to detect and debug. While the concept of a race condition may seem harmless, the consequences can be severe and disastrous.
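To make the hazard concrete, here is a minimal Python sketch (the variable names are my own) that replays an unlucky interleaving by hand instead of relying on the scheduler: both "threads" read the shared value before either writes, so one update is silently lost.

```python
# Simulate two threads incrementing a shared counter, with the
# unlucky interleaving spelled out step by step.
counter = 0

# Both threads read the current value first...
read_by_thread_a = counter   # thread A sees 0
read_by_thread_b = counter   # thread B also sees 0

# ...then each writes back its own increment.
counter = read_by_thread_a + 1   # thread A writes 1
counter = read_by_thread_b + 1   # thread B overwrites with 1

# Two increments ran, but one update was lost.
print(counter)  # prints 1, not the expected 2
```

With real threads the same interleaving can happen at any time, which is exactly why the bug is so hard to reproduce.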

Process and thread

Before going further, we need to first understand the concepts of processes and threads.

Processes: A process is like a container that provides all the necessary resources for executing a program. When a process is started, it begins with at least one thread called the primary thread. However, it can create additional threads from any of its existing threads, allowing for concurrent execution of tasks within the process.

Threads: A thread is an individual unit of execution within a process. Think of it as a small worker within the larger process. All threads within a process share the same memory space and system resources.
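As a small illustration (the function and variable names are invented for this sketch), a worker thread started with Python's threading module can modify data that the main thread then observes, because both live in the same address space:

```python
import threading

shared_data = []  # lives in the process's memory, visible to every thread

def worker():
    # The worker appends to the very same list object the main thread owns.
    shared_data.append("written by worker thread")

t = threading.Thread(target=worker)
t.start()
t.join()  # wait for the worker to finish

print(shared_data)  # the main thread sees the worker's change
```

Two separate processes could not share a plain list like this; they would need explicit inter-process communication.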

Differences between processes and threads

When it comes to processes and threads, there’s an important difference to understand. In modern computer operating systems, processes have their own allocated memory segments, which prevent one process from accessing another process’s data. This protection ensures that processes don’t overwrite each other’s information.

However, threads work differently. Threads are created within a process and share the same address space, so any thread can read and write the same variables. Suppose two threads both need to update the same shared variable.

The problem is that an update is not a single step: each thread reads the current value, computes a new one, and writes it back. If both threads read the value before either writes, the second write silently overwrites the first, and one update is lost. This "lost update" is a classic source of data inconsistency.

To solve this, operating systems and threading libraries provide a "mutual exclusion lock," or mutex for short. Before modifying shared data, a thread must acquire the mutex; if another thread already holds it, the acquiring thread has to wait until the mutex is released. This ensures that only one thread can modify the data at a time, preventing conflicts and data corruption.
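In Python this idea maps onto threading.Lock. The sketch below (iteration counts chosen arbitrarily) lets two threads increment a shared counter while holding the lock around each read-modify-write, so no update can be lost:

```python
import threading

counter = 0
lock = threading.Lock()  # the mutex guarding `counter`

def increment(n):
    global counter
    for _ in range(n):
        with lock:        # acquire; wait here if another thread holds it
            counter += 1  # the read-modify-write is now mutually exclusive
        # the lock is released automatically when the `with` block exits

threads = [threading.Thread(target=increment, args=(100_000,)) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)  # 200000: every increment survived
```

The `with lock:` form is preferred over manual `acquire()`/`release()` because the lock is released even if the protected code raises an exception.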

In multi-processing, where different processes run independently, this issue doesn’t arise since they have separate memory segments. However, in multi-threading, where threads share memory, this protection is necessary to prevent threads from interfering with each other’s data.

The purpose of multi-threading is to improve efficiency. CPUs can work very quickly when the data they need is already in their registers. But if the data is in a different location, such as RAM or storage, it takes a relatively long time for the CPU to fetch it. During this waiting time, the CPU can be utilized for other tasks. Multi-threading allows programs to perform useful work while waiting for slower operations like memory or storage access to complete.
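This overlap can be sketched with two simulated I/O operations (time.sleep stands in for a blocking disk or network read; the half-second durations are arbitrary). Run on separate threads, the waits overlap instead of adding up:

```python
import threading
import time

def slow_io():
    # Stand-in for a blocking operation such as a disk or network read.
    time.sleep(0.5)

start = time.perf_counter()

# Run two "I/O" operations on separate threads; while one waits,
# the other can make progress instead of the program sitting idle.
threads = [threading.Thread(target=slow_io) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()

elapsed = time.perf_counter() - start
print(f"elapsed: {elapsed:.2f}s")  # close to 0.5s, not the sequential 1.0s
```

Note this helps I/O-bound work; CPU-bound work needs separate cores (or processes) to actually run in parallel.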

It’s important to note that a single processor core capable of simultaneous multi-threading, marketed by Intel as hyper-threading, still shares one set of execution resources between its hardware threads. Threads run truly in parallel only when scheduled on different cores, where each core handles a separate thread independently.

How do race conditions happen?

In programming, there is a term called concurrency which refers to the ability of a program to execute multiple tasks in overlapping intervals. It is a fundamental concept in modern software development, enabling improved performance and responsiveness by allowing different parts of a program to run concurrently.

When multiple threads or processes are executing concurrently, they may interact and access shared resources, such as variables or data structures, simultaneously. This concurrent access introduces the possibility of race conditions.

A race condition occurs when the behavior of a program depends on the relative timing or interleaving of concurrent operations. It arises when two or more threads or processes attempt to access and modify shared data concurrently, without proper synchronization or coordination.

The problem with race conditions is that the outcome of such concurrent access becomes unpredictable and depends on the particular scheduling and timing of the threads or processes involved. This unpredictability can lead to incorrect results, data corruption, and even system failures.

In summary, both threads and processes can be involved in race conditions when accessing shared resources. However, threads are more prone to race conditions because they share the same memory space, while processes have their own isolated memory space but can still experience race conditions if they interact with shared resources through inter-process communication.

For example, if two threads try to update a shared variable simultaneously without proper synchronization, the final value of the variable may become unpredictable and depend on the execution order of the threads. Similarly, if multiple processes access or modify shared resources concurrently without proper synchronization, race conditions can still occur.
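The shared-variable case can be tried with real threads (iteration counts are arbitrary; on CPython the global interpreter lock often, though not always, masks the race, so how many updates are lost varies between runs and interpreter versions):

```python
import threading

counter = 0

def unsafe_increment(n):
    global counter
    for _ in range(n):
        # Not atomic: the read, add, and write can interleave
        # with the other thread's operations.
        counter += 1

threads = [threading.Thread(target=unsafe_increment, args=(100_000,)) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Depending on scheduling, updates may be lost, so the final
# value can be anywhere up to 200000.
print(counter)
```

That the result cannot be predicted from the source code alone is precisely what "depends on the execution order" means.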

“Time-of-check” to “time-of-use” attack

A “time of check to time of use” (TOCTOU) attack is a type of security vulnerability that occurs when there is a change in the state of a resource between the time it is checked and the time it is used. The vulnerability arises from a race condition, where the attacker takes advantage of the gap between two operations to manipulate the system.

Imagine a scenario like this. Suppose there is a banking application that allows users to transfer funds from one account to another. The application checks if the user has sufficient funds before processing the transaction.

In normal circumstances, the application follows these steps:
1. The application checks the user’s account balance to ensure the user has enough funds.
2. If it is sufficient, the amount the user requested to transfer is deducted from the account balance.
3. After the transaction is completed, the user’s account balance is updated.

In the database it looks like this:
1. The user’s account balance is read with SELECT to validate that the amount in the user’s account is sufficient.
2. If sufficient, the transferred amount is deducted from the balance.
3. When the transaction is completed, the existing row is updated with UPDATE and SET.

In isolation, nothing is wrong with this process. However, it does not account for concurrency, which a user (or attacker) can exploit to manipulate the steps above.

For example, suppose an attacker has $1000 in their account and wants to exploit a race condition in the bank application. Instead of one transfer, they submit three simultaneously: $500, $700, and $800. Because the requests run at the same time, each process SELECTs the same $1000 balance and deducts its own amount from it, computing $500, $300, and $200 respectively. When the balances are written back, we cannot know which write will “win the race” and become the final value. Although only one of the three values is kept, all three transfers are executed, and that is the core of the problem.
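The attack can be sketched deterministically in Python by replaying the interleaving by hand (the snapshot structure and names are invented for illustration): every transfer passes the balance check against the same stale snapshot before any of them writes back.

```python
balance = 1000
transfers = [500, 700, 800]

# Time-of-check: every concurrent request SELECTs the same balance.
snapshots = {amount: balance for amount in transfers}

executed = []
for amount in transfers:
    # Each request validates against its stale snapshot...
    if snapshots[amount] >= amount:
        executed.append(amount)
        # ...and computes the new balance from that snapshot. The last
        # UPDATE to land "wins the race" and overwrites the others.
        balance = snapshots[amount] - amount

print(executed)      # [500, 700, 800] -- all three transfers go through
print(balance)       # 200, from the last write (1000 - 800)
print(sum(executed)) # $2000 moved out of a $1000 account
```

A real exploit would fire the three HTTP requests at once and rely on the scheduler to produce this interleaving; the simulation just makes the outcome repeatable.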

By exploiting this race condition, the attacker manipulates the system to their advantage: every check passes because each transfer is validated in the window between the balance check and the balance update. TOCTOU attacks are particularly dangerous in scenarios involving critical resources, such as financial transactions or access control systems.

Conclusion

By understanding the complexities and risks associated with race conditions, developers can proactively design and implement concurrent systems with appropriate synchronization strategies. In upcoming posts, we will learn how to prevent race conditions, and possibly how to exploit them as well.