Just what the heck is a “Buffer Overflow” anyway?!

[Let me start this with a disclaimer; a warning; or maybe a promise: This is designed to be an accessible series that describes common software vulnerabilities, their effects, and potential mitigations. I’m writing this for myself as much as for others in the hope that it will simplify some of the concepts, code, and terms of art that come up in software and systems development. Frankly, I’m probably going to mess this up in some way (I just know it) so when I do, please feel free to correct me (nicely) and I will publish both a correction and probably a mea culpa.]

I promised (maybe threatened is more accurate) on Twitter recently [yes, it was over ~~6 months~~ two years ago. “Recently” is a relative term and can be excused by a global pandemic] to start publishing a series of articles on common software vulnerabilities, what they look like, and how to (potentially) mitigate them. I figure I’ll start at one of the easiest to explain and (hopefully) understand: Buffer Overflows.

But first…

Having a shared language helps us to understand what we’re all talking about, so let’s get a couple of basic definitions and such out of the way. I promise this will be the only time we have to look at this; all future posts will just refer back to here.
A “vulnerability” as defined by the Common Vulnerabilities and Exposures (CVE) list, which is sponsored by the U.S. Department of Homeland Security (DHS) and its Cybersecurity and Infrastructure Security Agency (CISA) and run by MITRE (already an acronym), is

“…a weakness in the computational logic (e.g., code) found in software and hardware components that, when exploited, results in a negative impact to confidentiality, integrity, or availability.”
http://cve.mitre.org/about/terminology.html
[emphasis added]

whereas an “exposure” is

“…a system configuration issue or a mistake in software that allows access to information or capabilities that can be used by a hacker as a stepping-stone into a system or network.”
http://cve.mitre.org/about/terminology.html

The National Institute of Standards and Technology (NIST) maintains a database of CVEs called the National Vulnerability Database (NVD).

And finally the Common Weakness Enumeration (CWE) is a community-developed list of common software security weaknesses that is, again, sponsored by DHS and CISA and maintained by MITRE. CWE entries describe particular types of vulnerabilities, how they may commonly appear / happen in code, and potential mitigations.

Gee, thanks for the acronym list. I’ll remember that forever, but back to the task at hand: what is a buffer overflow?

At its most basic, a Buffer Overflow is simply what its name says it is: a program has allocated some space in memory (a buffer) to store data, but it is presented with more data than that buffer can hold, and when it tries to put the data into the buffer, the excess overflows to somewhere else. Here’s a way to visualize this: Think of a one gallon bucket (sigh. Ok, a 3.78 liter bucket if you must). If you try to fill it with two gallons, the bucket can’t hold that much and the water overflows and gets your floor all wet. Simple, right?

This is a very common error in coding. A search of the NVD for vulnerabilities related to CWE-119 shows that they account for about 11% of the total disclosed vulnerabilities. CWE-119 is the “Improper Restriction of Operations within the Bounds of a Memory Buffer” weakness, which is the parent type of a bunch of different types of buffer overruns. Note the key phrase above: “total disclosed vulnerabilities.” Odds are there are more instances out there that no one has exposed / reported on yet.

CWE-119 “**Improper Restriction of Operations within the Bounds of a Memory Buffer**” entries vs total entries in the NVD

But…so what?

You may be thinking to yourself, “So? Some data didn’t go into the right place. I mean, that sucks and all, but it’s just an error…” And most of the time you might be correct. The data could just overflow and get put in the wrong place, and the program could return an invalid result, crash, or something. However, any computational error like this is a security concern because it could affect the confidentiality, integrity, or availability of data. At its most basic exploitation, the data returned by an application could just be flat out wrong (integrity), or the overflow could cause the application to crash (availability), or could reveal data that it shouldn’t (confidentiality). Let’s explore this a bit more to see what it could mean in practice. Let’s look at a stack-based buffer overflow (not to be confused with a “stack overflow,” which means running out of stack space). This is really only one type of buffer overflow, but it’s one that I think is fairly easy to understand.

Shall we look at code?

Let’s take a look at some very simple code to see what happens when a buffer overrun occurs and then what could happen if we exploited it.

/***********************************************************************
* Original Author:	Falken, Stephen W.
* File Creation Date:	06/22/1973
* Development Group:	W.O.P.R. Ops.
* Description:		War Operation Planned Response (W.O.P.R.) system login control.
**********************************************************************/

#include <cstdlib>	// system()
#include <cstring>	// strcmp()
#include <iostream>
using namespace std;

int main(void)
{
	int authorized = 0;	//Is the user authorized? Default is 0 or "no"
	char cUsername[7];	// Holds the username entered by the user.


	{

		system("CLS");
	start:
		//Prompt for a logon
		cout << "LOGON: ";
		// No length limit on this read (as compiled before C++20), so long input overruns cUsername.
		cin >> cUsername;
		// Let's make sure I can get back in...
		if (strcmp(cUsername, "Joshua") == 0)
		{
			// It me
			authorized = 1;
		}

		if (authorized != 0) // Check if the user is authorized
		{
			// greet
			cout << "GREETINGS PROFESSOR FALKEN\nHOW ARE YOU FEELING TODAY?\n";
			int waitFlag=0;
			do {
				cin >> waitFlag;
			} while (waitFlag != 1);
		}
		else
		{
			// The user is not authorized
			cout << "IDENTIFICATION NOT RECOGNIZED BY SYSTEM\n--CONNECTION TERMINATED--\n\n";

		}
		goto start;
	}
}

The W.O.P.R. login code above is pretty simple. We allocate a couple of values for the system to use, and then ask the user to give us some information. In this case we’re allocating a 7 char buffer for the username, and an integer to store the result of our authentication check. If the user enters the correct login value (“Joshua” in this case) the value of “authorized” will change from 0 to 1. The system will then check the value of “authorized” and if it’s not equal to 0 [zero], the application will continue. Otherwise, the code will reject the login and loop back to start again. But what happens if the user enters more than the username buffer can hold? Well, that’s where it gets interesting.
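
As a rough sketch of the arithmetic behind that question: a C-style string needs one byte per character plus one more for the terminating null. The helper names `bytesNeeded` and `fitsInBuffer` and the constant `kUsernameCapacity` are made up for illustration and are not part of the W.O.P.R. code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Size of the username buffer in the login code above.
constexpr std::size_t kUsernameCapacity = 7;

// Bytes a C-style string occupies: its characters plus the terminating '\0'.
std::size_t bytesNeeded(const char* text)
{
	assert(text != nullptr);
	return std::strlen(text) + 1;
}

// True when the string, terminator included, fits in `capacity` bytes.
bool fitsInBuffer(const char* text, std::size_t capacity)
{
	return bytesNeeded(text) <= capacity;
}
```

With these helpers, `fitsInBuffer("Joshua", kUsernameCapacity)` is true (six characters plus the null is exactly 7 bytes), while `fitsInBuffer("aaaaaaa", kUsernameCapacity)` is false because seven characters need 8 bytes.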

Let’s take a look at what happens when we run the above code and purposefully overload the buffer:

A command shell showing various attempts to log in to an application

The first entry is a single “a” character. Since it’s well within our buffer size of 7, the rest of the code runs as expected and the user’s login is denied (because a single letter “a” != “Joshua”). The next line uses seven “a” characters, and since that’s still not enough data to reach the authorized value, the code runs as expected (strictly speaking, the terminating null character already lands one byte past the end of the buffer). The next line uses 8 “a” characters, and while that is larger than our buffer size, it hasn’t yet bled into the authorized value. Finally, we enter 9 “a” characters, and we see that we’re now “authenticated”.
Effectively what happened here was we overfilled the username bucket and “spilled” into the value of authorized. If we look at the value that was assigned to authorized, we’d see that instead of still being “0” [zero] it now has a value of “97” (which is the ASCII value of “a”). Since the code that is checking for authorization is simply looking to see if authorized is not equal to zero, we’re now magically “authorized” (because anything that is not zero is considered “good”). If we look at what the memory layout looks like it makes it a little simpler to understand:
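
One way to picture that layout is the sketch below. It is an assumption reconstructed from the results above (nine characters were needed to change authorized), not a dump of the real stack: the `LoginFrame` struct, its `padding` member, and the `authorizedAfterInput` helper are all hypothetical, and a real compiler is free to arrange local variables differently.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical stand-in for the two locals in main(), pinned to one layout.
struct LoginFrame
{
	char cUsername[7];	// offsets 0-6
	char padding;		// offset 7, keeps `authorized` aligned
	int authorized;		// offset 8 onward
};

static_assert(offsetof(LoginFrame, authorized) == 8,
	"this sketch assumes authorized starts 8 bytes after cUsername");

// Copies `input` plus its terminating '\0' over the frame, starting at
// cUsername[0] and ignoring cUsername's size, the way an unbounded read
// would. Returns the resulting value of `authorized`.
int authorizedAfterInput(const char* input)
{
	LoginFrame frame{};	// zero-initialized, so authorized starts at 0
	const std::size_t bytes = std::strlen(input) + 1;
	assert(bytes <= sizeof(frame));	// keep the demo inside the struct
	std::memcpy(&frame, input, bytes);
	return frame.authorized;
}
```

Under these assumptions, eight “a” characters still leave `authorized` at 0: the extra “a” lands in the padding byte and the terminating null overwrites a byte of `authorized` that was already zero. Nine characters push an “a” into `authorized` itself and make it non-zero (97 on a little-endian machine such as x86).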

As you can see, this is a fairly trivial error, but it’s one that developers make quite often and one that has been known about for quite a while (in fact, it happens so often that the compiler in Visual Studio had a BUNCH of different options that I had to disable to make the buffer overflow work at all. Heck, I don’t even think I got them all).
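
For comparison, here is a minimal sketch of one common way to sidestep this particular mistake in C++: read into a `std::string`, which grows to fit whatever is typed, and derive the result directly from the comparison. The function names `readAndCheckLogin` and `checkLoginLine` are hypothetical, and this is an illustration of the idea, not a drop-in replacement for the login loop above.

```cpp
#include <iostream>
#include <sstream>
#include <string>

// Reads one whitespace-delimited username and reports whether it matches.
// There is no fixed-size buffer here for the input to overrun.
bool readAndCheckLogin(std::istream& in)
{
	std::string username;
	if (!(in >> username))
	{
		return false;	// nothing could be read
	}
	return username == "Joshua";
}

// Convenience wrapper for checking a line of text that is already in memory.
bool checkLoginLine(const std::string& line)
{
	std::istringstream in(line);
	return readAndCheckLogin(in);
}
```

Called as `readAndCheckLogin(std::cin)`, this accepts “Joshua” and rejects nine “a” characters, since the extra input has nowhere to spill.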

What if we were to not just overwrite the value of “authorized” though? Could we maybe even get this application to execute code that it normally wouldn’t? For example, what if we were to go beyond the value of “authorized” in memory and pass in some byte code or “shell” code? Could we get the system to execute it? The short answer is “yes”. Yes, we certainly could.

I hope you found this little blog post interesting. If you’d like more like this, or have particular areas you’d like to know more about, let me know. I will (really) try to get these out more often in the future.