Sat Apr 29

Identifying and Understanding Insecure Deserialization Vulnerabilities

Serialization and deserialization are crucial concepts in any programming language that enable the transfer of complex data structures between systems. Simply put, serialization involves converting an object into a format that can be easily stored or transmitted, while deserialization reverses this process by reconstructing the object from the serialized data.

Deserialization is frequently overlooked as a source of security vulnerabilities, despite its central role in moving complex data structures between systems. Insecure deserialization can leave applications open to attacks, including remote code execution, which can have serious consequences for the system and its users.

Understanding the serialization and deserialization concept

An easy-to-grasp example is the save feature in a game. Players are often allowed to save the current state. While the game is running, its data lives in Random Access Memory (RAM), and the save feature converts this data into a binary format that can be stored on a hard disk. This is necessary because once the game stops running, everything stored in RAM is gone. Writing the state to disk preserves it, so it can be restored later and you can pick up where you left off instead of starting all over again.

Serialization is the process of converting objects into sequential byte streams. The object can be very complex, and serialization flattens it into a much simpler form for storage. In the previous example, serialization is what happens when the game writes its data to the hard disk. The objects are preserved along with their attributes and assigned values, so the current state persists.

Deserialization is the reverse of this process. Instead of storing the objects, the program reads the byte stream back and reconstructs fully functional objects from it. Although the example here is a game, serialization is used across programming languages and throughout web development, including the front end. This process is critical to the transfer of complex data structures between systems and is a fundamental aspect of web development.
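As a rough illustration of the save-game example, the sketch below uses Python's pickle module to turn an in-memory state into bytes, write the bytes to disk, and rebuild the state from them. The game_state dictionary and the savegame.bin file name are invented for this illustration.

```python
import os
import pickle
import tempfile

# Hypothetical in-memory game state (it lives in RAM while the program runs).
game_state = {"level": 3, "health": 72, "inventory": ["sword", "potion"]}

# Serialization: convert the object into a byte stream.
saved_bytes = pickle.dumps(game_state)

with tempfile.TemporaryDirectory() as save_dir:
    save_path = os.path.join(save_dir, "savegame.bin")

    # Persist the byte stream so it would survive the program exiting.
    with open(save_path, "wb") as save_file:
        save_file.write(saved_bytes)

    # Deserialization: read the bytes back and rebuild an equivalent object.
    with open(save_path, "rb") as save_file:
        restored_state = pickle.loads(save_file.read())

print(restored_state == game_state)  # True
print(restored_state is game_state)  # False: it is a reconstructed copy
```

The restored object is equal to the original but is a new object, which is exactly what "reconstructing" means here.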

Serialization in many languages

Several programming languages, such as Ruby, PHP, Java, and Python, have native support for serialization, though how it works can be quite complex depending on the language. Different languages may also use different terms for it. For example, Ruby uses the term “marshaling” while Python uses “pickling”; both refer to the same thing, serialization. The names come from the tooling: Ruby handles this process with its built-in “Marshal” module (Go's encoding packages similarly name their functions Marshal), while Python uses the “pickle” module.

What is insecure deserialization?

Some websites serialize user data. A problem arises if an attacker is able to manipulate this data and have it deserialized by the website or system.
In certain scenarios, deserializing the manipulated data can result in privilege escalation, potentially even leading to complete system takeover.

Insecure deserialization occurs when a website deserializes data that a malicious actor can control, that is, when it creates objects from data it should not trust. The attacker is not limited to the class the website expected: the data can describe an object of any class available to the website. That’s why insecure deserialization is also referred to as an “object injection” vulnerability.
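To see why this is called object injection, here is a deliberately harmless sketch using Python's pickle module. The class name NotASession is made up, and the payload only evaluates a piece of arithmetic; the point is that the serialized data, not the application, decides what gets constructed and called during loading.

```python
import pickle


class NotASession:
    """Attacker-defined class; __reduce__ tells pickle how to 'rebuild' it."""

    def __reduce__(self):
        # On load, pickle calls eval("6 * 7"). A real attacker would
        # choose something far more harmful than arithmetic.
        return (eval, ("6 * 7",))


# The attacker crafts the payload on their own machine...
malicious_bytes = pickle.dumps(NotASession())

# ...and the vulnerable application deserializes it, expecting a session object.
result = pickle.loads(malicious_bytes)

print(type(result).__name__, result)  # int 42
```

The application asked for an object and instead ran an expression chosen by whoever produced the bytes. For this reason pickle.loads should never be called on data from an untrusted source.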

Why do we need serialization?

Serialization is an important concept in software development for various reasons. Firstly, it enables persistence, which allows the object state to be stored in a database, memory or file for retrieval later. This is particularly useful when working with large and complex objects that need to be kept for future use.

Another benefit of serialization is replication, which involves obtaining a copy of objects by converting them into byte streams. This can be useful for tasks such as creating backups or transferring data between systems.

Serialization also plays a critical role in communication between different systems. By serializing objects, they can be easily transferred or sent over a network, allowing for seamless data exchange and sharing.

Finally, serialization can also be used for caching purposes. Building up an object can be a time-consuming process, but a serialized copy can be stored and loaded back instead of rebuilding the object from scratch, which can significantly reduce the time required. This is particularly useful for frequently accessed objects where caching can improve performance.

How does insecure deserialization occur?

Despite these benefits, web developers are sometimes unaware of how damaging this process can be if it is not implemented correctly. The problem arises when the developer allows user input to be deserialized when, ideally, it should not be deserialized at all.

They might think that validation and a few checks are sufficient to sanitize the input. Unfortunately, this approach is often ineffective, because it is virtually impossible to implement checks that account for every possible input. Even worse, the checks may happen after the deserialization has already been executed, which is too late. It is a common misconception that deserialization is always secure and reliable. Even when the data is in a binary format, an attacker can still exploit it.
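The ordering problem can be sketched as follows: any check has to run on the raw bytes before they are deserialized. This example assumes the application itself produced the data earlier and holds a secret key, so it can attach and verify an HMAC tag. SECRET_KEY, sign, and load_verified are hypothetical names, and the approach only protects data the application signed itself.

```python
import hashlib
import hmac
import pickle

SECRET_KEY = b"server-side-secret"  # hypothetical; keep real keys out of source code
TAG_LENGTH = 32                     # size of an HMAC-SHA256 digest in bytes


def sign(payload: bytes) -> bytes:
    """Prefix the payload with an HMAC-SHA256 tag."""
    tag = hmac.new(SECRET_KEY, payload, hashlib.sha256).digest()
    return tag + payload


def load_verified(blob: bytes):
    """Check the tag first; deserialize only if the bytes are untampered."""
    tag, payload = blob[:TAG_LENGTH], blob[TAG_LENGTH:]
    expected = hmac.new(SECRET_KEY, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("signature mismatch: refusing to deserialize")
    return pickle.loads(payload)


blob = sign(pickle.dumps({"user": "alice", "admin": False}))
print(load_verified(blob))  # {'user': 'alice', 'admin': False}

# Flip one bit of the payload, as an attacker editing the data might.
tampered = blob[:-1] + bytes([blob[-1] ^ 0x01])
try:
    load_verified(tampered)
except ValueError as error:
    print(error)  # signature mismatch: refusing to deserialize
```

The tampered bytes are rejected without ever reaching pickle.loads. If the key leaks, the protection is gone, so this is a mitigation rather than a reason to deserialize user-controlled data freely.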

Another problem comes from the number of dependencies involved in a modern website. These dependencies are the building blocks of a fully functional website. For example, many PHP- or Java-based websites depend on libraries or frameworks that have known vulnerabilities (which we will discuss in another article).

Because most modern websites rely heavily on these libraries, it is difficult to ensure that all of them are safe and secure. A single website can have hundreds or even thousands of dependencies. Attackers may take advantage of code in these dependencies to make the website run malicious code. In other words, they use the dependencies as an attack vector.

How dangerous is insecure deserialization?

If the application can be made to execute arbitrary code, that code runs with the same privileges as the application itself. The consequences of this kind of attack can be devastating, leading to unauthorized access to sensitive data, denial of service, or complete system takeover.
Therefore, it helps to take the attacker's perspective and learn to identify the serialization format a website uses.

Identifying serialization format

How an insecure deserialization vulnerability is exploited depends on the language the website uses. Before going deeper into this, let’s see how to identify the serialization format.

PHP serialization format

Serialization in PHP uses a human-readable string format in which letters represent the data type (e.g. s for “string”) and numbers represent the length of the value. Consider the following code, which defines a Fruit class with name and color attributes.

<?php
class Fruit {
  public $name;
  public $color;

  function __construct($name, $color) {
    $this->name = $name;
    $this->color = $color;
  }

  function __destruct() {
    $data = serialize("This ($this->name) is ($this->color)");
    echo $data;
  }
}

$apple = new Fruit("Apple","Red");
$banana = new Fruit("Banana","Yellow");

?>

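PHP uses serialize() for this purpose. Note that this example serializes a formatted string inside each destructor rather than the object itself, and the destructors run when the script ends. The output of the above code will look like this.

s:25:"This (Banana) is (Yellow)";s:21:"This (Apple) is (Red)";

Serializing the object itself, for example with serialize($apple), would instead produce an object record such as O:5:"Fruit":2:{s:4:"name";s:5:"Apple";s:5:"color";s:3:"Red";}, where O marks an object, 5 is the length of the class name, and 2 is the number of properties.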
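To make the s:<length>:"<value>"; layout concrete, here is a small Python sketch that walks the two string records in the output of the Fruit example and checks that each declared length matches its value. parse_php_strings is a hypothetical helper that handles only string records, not the full PHP format.

```python
def parse_php_strings(data: bytes) -> list:
    """Parse a run of PHP string records of the form s:<length>:"<bytes>";"""
    values = []
    pos = 0
    while pos < len(data):
        if data[pos:pos + 2] != b"s:":
            raise ValueError(f"expected a string record at offset {pos}")
        colon = data.index(b":", pos + 2)  # end of the length field
        length = int(data[pos + 2:colon].decode("ascii"))
        start = colon + 2                  # skip the colon and opening quote
        end = start + length               # PHP counts bytes, not characters
        if data[colon + 1:start] != b'"' or data[end:end + 2] != b'";':
            raise ValueError(f"declared length {length} does not match the data")
        values.append(data[start:end])
        pos = end + 2                      # skip the closing quote and semicolon
    return values


output = b's:25:"This (Banana) is (Yellow)";s:21:"This (Apple) is (Red)";'
print(parse_php_strings(output))
# [b'This (Banana) is (Yellow)', b'This (Apple) is (Red)']
```

Because the length is stated explicitly, an attacker who edits a value also has to update its length prefix, otherwise parsing fails.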

Java serialization format

While PHP uses a readable string for serialization, some languages, like Java, use binary serialization formats. For example, Java source code that serializes two objects looks like this.

import java.io.Serializable;
import java.io.FileOutputStream;
import java.io.ObjectOutputStream;
import java.io.IOException;
public class SerializeExample {
  public static void main(String[] args) {

    // Create an object to serialize
    Fruit apple = new Fruit("Apple", "Red");
    Fruit banana = new Fruit("Banana", "Yellow");

    // Serialize the object
    try {
      FileOutputStream fileOut = new FileOutputStream("fruit.ser");
      ObjectOutputStream out = new ObjectOutputStream(fileOut);
      out.writeObject(apple);
      out.writeObject(banana);
      out.close();
      fileOut.close();
      System.out.println("Serialized data is saved in fruit.ser");
    } catch (IOException i) {
      i.printStackTrace();
    }
  }
}

class Fruit implements Serializable {
   String name;
   String color;

   public Fruit(String name, String color) {
       this.name = name;
       this.color = color;
   }
}

The serialized output of the above example is saved in a file called fruit.ser. According to the specification, a Java serialization stream always starts with the bytes ac ed (the magic number 0xACED), followed by the stream version 00 05. To confirm this, you can run the code and inspect the bytes of fruit.ser with the following command in your terminal.

$ xxd fruit.ser

The above command prints a hex dump like this one.

00000000: aced 0005 7372 0005 4672 7569 743c f57d ....sr..Fruit<.}
00000010: 9a8a 8e2f 1e02 0002 4c00 0563 6f6c 6f72 .../....L..color
00000020: 7400 124c 6a61 7661 2f6c 616e 672f 5374 t..Ljava/lang/St
00000030: 7269 6e67 3b4c 0004 6e61 6d65 7100 7e00 ring;L..nameq.~.
00000040: 0178 7074 0003 5265 6474 0005 4170 706c .xpt..Redt..Appl
00000050: 6573 7100 7e00 0074 0006 5965 6c6c 6f77 esq.~..t..Yellow
00000060: 7400 0642 616e 616e 61                     t..Banana
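The first row of this dump can be decoded by hand. The Python sketch below does so, assuming the field layout of the Java serialization stream: a 2-byte magic number, a 2-byte version, the TC_OBJECT and TC_CLASSDESC type codes, and then a length-prefixed class name. The variable names are chosen for this illustration.

```python
import struct

# First 16 bytes of fruit.ser, copied from the hex dump.
first_row = bytes.fromhex("aced 0005 7372 0005 4672 7569 743c f57d")

# Stream header: magic number and version, both big-endian unsigned shorts.
magic, version = struct.unpack(">HH", first_row[:4])

# Type codes: 0x73 is TC_OBJECT, 0x72 is TC_CLASSDESC.
tc_object, tc_classdesc = first_row[4], first_row[5]

# The class name is prefixed with its length as a big-endian unsigned short.
(name_length,) = struct.unpack(">H", first_row[6:8])
class_name = first_row[8:8 + name_length].decode("utf-8")

print(hex(magic), version, hex(tc_object), hex(tc_classdesc), class_name)
# 0xaced 5 0x73 0x72 Fruit
```

Even without the source code, the stream reveals the class name, which is useful information for anyone inspecting serialized data in transit.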

Ruby serialization format

As previously mentioned, Ruby uses a built-in module called “Marshal”. The serialization code in Ruby looks like this.

class Fruit
  attr_reader :name, :color

  def initialize(name, color)
    @name = name
    @color = color
  end
end

apple = Fruit.new("Apple", "Red")
banana = Fruit.new("Banana", "Yellow")

serialized_apple = Marshal.dump(apple)
serialized_banana = Marshal.dump(banana)

# p prints the escaped form of the binary string; puts would write raw bytes
p serialized_apple
p serialized_banana

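The serialized objects from the above code will look like this. Each one starts with \x04\b, the Marshal format version (4.8), followed by o for an object and then the class name.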

"\x04\bo:\nFruit\a:\n@nameI\"\nApple\x06:\x06ET:\v@colorI\"\bRed\x06;\aT"
"\x04\bo:\nFruit\a:\n@nameI\"\vBanana\x06:\x06ET:\v@colorI\"\vYellow\x06;\aT"

Once you can identify the serialization format in different languages, you can try exploiting these vulnerabilities, which we will cover later, as the topic deserves a whole article of its own.
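The markers from the three formats above can be collected into a small detection helper. The Python sketch below is a heuristic only: guess_format and PHP_RECORD are hypothetical names, the PHP pattern simply looks for a type letter, a colon, and a digit, and real traffic is often base64- or URL-encoded and would need decoding first.

```python
import re

# Heuristic for PHP: a type letter, a colon, then a digit (s:25:, O:5:, i:4;).
PHP_RECORD = re.compile(rb"^[abdiOs]:[0-9]")


def guess_format(data: bytes) -> str:
    """Guess the serialization format from the leading bytes."""
    if data.startswith(b"\xac\xed\x00\x05"):  # Java magic number and version
        return "java"
    if data.startswith(b"\x04\x08"):          # Ruby Marshal version 4.8
        return "ruby-marshal"
    if PHP_RECORD.match(data):
        return "php"
    return "unknown"


# Samples taken from the PHP, Java, and Ruby examples in this article.
php_sample = b's:21:"This (Apple) is (Red)";'
java_sample = bytes.fromhex("aced 0005 7372 0005 4672 7569 74")
ruby_sample = b"\x04\bo:\nFruit\a:\n@nameI\"\nApple\x06:\x06ET:\v@colorI\"\bRed\x06;\aT"

for sample in (php_sample, java_sample, ruby_sample, b"plain text"):
    print(guess_format(sample))
```

A match only indicates that the data looks serialized; whether it is actually deserialized by the website still has to be confirmed separately.
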

Summary

The concepts of serialization and deserialization are not too complex to understand. However, some developers are not aware of how dangerous deserialization can be, especially on a website that almost anyone can access. For security reasons, it is better to avoid deserializing user-controlled data at all. Unfortunately, the dependencies that a website uses can introduce the same vulnerabilities, which are far more difficult to handle. Therefore, if you are a developer and need to deserialize data, always validate it before the deserialization is executed.