MIT course "Computer Systems Security". Lecture 3: "Buffer overflow: exploits and protection", part 1

Massachusetts Institute of Technology. Lecture course #6.858, "Computer Systems Security". Nickolai Zeldovich, James Mickens. 2014


Computer Systems Security is a course on the development and implementation of secure computer systems. Lectures cover threat models, attacks that compromise security, and defense techniques based on recent scientific work. Topics include operating system (OS) security, capabilities, information flow control, language-based security, network protocols, hardware protection, and security in web applications.

Lecture 1: "Introduction: threat models" Part 1 / Part 2 / Part 3
Lecture 2: "Control of hacker attacks" Part 1 / Part 2 / Part 3
Lecture 3: "Buffer overflow: exploits and protection" Part 1 / Part 2 / Part 3

Welcome to the lecture on buffer overflow exploits. Today we finish the discussion of Baggy bounds and then move on to other buffer overflow defenses.



Next we will talk about the assigned paper for today's lecture, which is devoted to Blind Return Oriented Programming (BROP). This is an exploit technique that works even when the attacker does not possess the target binary. These exploits can defeat stack canaries on 64-bit systems. So, if you were like me when I first read this paper, you should feel like you are watching a Christopher Nolan movie. It just blows your mind!

We are going to consider how the gadgets involved actually work. I hope that by the end of the lecture you will be able to understand all the clever techniques described in the paper. But first, as I said, we'll finish the discussion of Baggy bounds. Consider a very simple example.

Suppose we allocate 44 bytes and assign the result to a pointer p. Suppose also that the slot size is 16 bytes.

What happens with this malloc call? You already know that the Baggy bounds system will pad the allocation up to a power of two. So for our 44-byte request, 64 bytes of memory will be allocated. The slot size is 16 bytes, so the allocation covers 64/16 = 4 slots, and each of the corresponding 4 entries in the bounds table will store the log2 of the allocation size (here 6, since 2^6 = 64).
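The padding step described above can be sketched as follows. This is a minimal illustration, not the paper's code; the helper names and the "allocations are at least one slot" choice are assumptions:

```c
#include <assert.h>
#include <stddef.h>

#define SLOT_SIZE 16

/* Round a requested size up to the next power of two, as Baggy bounds
   does before allocating; assume every allocation is at least one slot. */
static size_t round_up_pow2(size_t n) {
    size_t p = SLOT_SIZE;
    while (p < n) p <<= 1;
    return p;
}

/* The bounds table stores log2 of the padded size for each slot. */
static unsigned log2_size(size_t padded) {
    unsigned l = 0;
    while (((size_t)1 << l) < padded) l++;
    return l;
}
```

For malloc(44) this yields a padded size of 64, a log of 6, and 64/16 = 4 slot entries each holding the value 6.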

Next, we assign another pointer, char * q = p + 60. This value looks out of bounds, because we asked for only 44 bytes and here the offset is 60 bytes. But Baggy bounds padded the allocation to 64 bytes, so nothing terrible happens, although the programmer should not have done this.

Now let's assume that the next thing we do is assign another pointer, char * r = q + 16. This will actually cause an error, because the offset becomes 60 + 16 = 76, which is 12 bytes past the 4-slot allocation (4 × 16 = 64 bytes) that the Baggy bounds system set up. And this excess is indeed more than half a slot.



If you remember, in this case the Baggy bounds instrumentation raises an error immediately, which crashes the program and stops it right there.

So let's imagine that we only have two lines:

char * p = malloc (44)
char * q = p + 60

And there is no third line with the code. Instead, we do this:

char * s = q + 8

In this case the pointer will have an offset of 60 + 8 = 68 bytes, which is 4 bytes past the bound assigned by Baggy bounds. This does not cause an immediate error, although the value is out of bounds. What happens instead is that the high-order bit of the pointer gets set. So if someone subsequently tries to dereference it, the fault occurs at that point.



And the last thing we do is assign another pointer:

char * t = s - 32

In effect, we have returned the pointer to within bounds. s was out of bounds, but t is back inside the originally allocated region we created for the pointer. Therefore t will no longer have its high-order bit set, and it can be safely dereferenced.



Audience: how does the program know that r is more than half a slot out of bounds?

Professor Mickens: notice that when we created r, the compiler emitted instrumentation that runs on all these pointer operations. So we can tell where q points, and we know it is within the Baggy bounds. Therefore, when we perform the q + 16 operation, the Baggy bounds instrumentation knows where the original value came from. And then, if an offset past that original allocation occurs, Baggy bounds can easily determine whether the magnitude of the offset is greater than half the slot size.

In principle, whenever you perform pointer operations, the instrumentation checks whether the result has exceeded the allocated size or not. At some point you have a pointer within the Baggy bounds, and then something happens that pushes it past them. So exactly when that happens, we find out that something fishy is going on in our code.

Hope this is understandable. It was a very brief overview of the homework, but I hope that you can easily understand it.

So, we have a pointer that looks like this:

char * p = malloc(256), then we compute the pointer char * q = p + 256, after which we try to dereference this pointer.

So what happens here? Well, note that 256 is already a power of two, so it fits the Baggy bounds exactly. Therefore, when we add 256 bytes, we step exactly one past the end of the Baggy bounds. As in the previous example, the assignment itself is fine, but it causes the high-order bit of q to be set. So when we try to dereference it, everything explodes and we have to call our insurance agent. Clear?
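A minimal sketch of the tagging itself, assuming bit 63 is used as the OOB flag (an illustrative choice, not the paper's exact encoding):

```c
#include <assert.h>
#include <stdint.h>

#define OOB_FLAG ((uintptr_t)1 << 63)

/* p + 256 lies exactly at the bound of a 256-byte allocation: only 0
   bytes past the end, well under half a slot, so Baggy bounds tags the
   pointer instead of aborting. The fault comes only on dereference. */
static uintptr_t tag_oob(uintptr_t p) { return p | OOB_FLAG; }
static int       is_oob(uintptr_t p)  { return (p & OOB_FLAG) != 0; }
static uintptr_t untag(uintptr_t p)   { return p & ~OOB_FLAG; }
```

A dereference check would simply refuse (or fault on) any pointer for which is_oob() returns true.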



From these two examples you can understand how the Baggy bounds system works. As I mentioned in the last lecture, you do not really need to instrument every pointer operation if static code analysis can prove that a certain set of pointer operations is safe. I will postpone further discussion of static analysis, but suffice it to say that you do not always need to perform these checks, since some operations can be proven safe ahead of time.

Another issue raised on Piazza is how the Baggy bounds system stays compatible with older, uninstrumented libraries. The idea is that when Baggy bounds initializes the bounds table, it sets every entry to 31. Therefore, when we read the bounds table, each entry initially represents a bound of 2^31 bytes. By initializing the entries to 31, we assume that each pointer may cover the maximum possible size of 2^31. Let me give you a very simple example to clarify this.

Suppose we have the memory space that we use for the heap, consisting of two regions. At the top we have heap memory allocated by uninstrumented code, and below it heap memory allocated by instrumented code. So what will Baggy bounds do? As you remember, the system has the concept of a slot, 16 bytes in size, and the bounds-table entries for both regions start out initialized to 31.

However, when the instrumented code runs, it uses the Baggy bounds algorithm to set the appropriate values in its rows of the table.



When a pointer comes from the top region of the memory space, its bound is always the maximum possible 2^31. This means that Baggy bounds will never decide that a pointer operation that "came" from an uninstrumented library has gone out of bounds.

The idea is that in the instrumented code we always perform these comparisons on pointers, but if we set the bounds-table entries for pointers from uninstrumented code to 2^31, then we never get a spurious dereference error for them. That is, the Baggy bounds table entries interoperate cleanly with the entries for the older, uninstrumented libraries.
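The compatibility trick above can be sketched as follows. Table layout, sizes, and helper names are illustrative assumptions:

```c
#include <assert.h>
#include <stddef.h>

#define SLOT_SIZE   16
#define DEFAULT_LOG 31   /* default entry: bound of 2^31 bytes */
#define NUM_SLOTS   64

static unsigned char bounds_table[NUM_SLOTS];

/* Every entry starts at 31, so pointers from uninstrumented code
   always appear to be within bounds. */
static void init_table(void) {
    for (int i = 0; i < NUM_SLOTS; i++)
        bounds_table[i] = DEFAULT_LOG;
}

/* Instrumented allocations overwrite their slots with the real log2. */
static void record_alloc(int slot, size_t padded, unsigned log_size) {
    for (size_t i = 0; i < padded / SLOT_SIZE; i++)
        bounds_table[slot + i] = (unsigned char)log_size;
}

static size_t bound_for(int slot) {
    return (size_t)1 << bounds_table[slot];
}
```

After init_table(), a 64-byte instrumented allocation recorded at slot 4 covers slots 4 through 7 with a bound of 64, while untouched slots keep the effectively unlimited 2^31 bound.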

This means we have a system that is good because it does not crash the program when uninstrumented libraries are used, but it also has a problem. The problem is that we can never determine the bounds of pointers generated by uninstrumented code, because we will never set the high-order bit when such a pointer strays outside its allocation. Thus we cannot actually ensure memory safety for operations performed by uninstrumented code. There is also trouble when we pass a pointer that has gone out of bounds from instrumented code to uninstrumented code. In that case something unimaginable can happen: a pointer coming from the instrumented code has its high-order bit set to 1, so it looks like it has a gigantic address.

We know that if we keep this pointer within the instrumented code, the flag gets cleared at the point where the pointer returns in bounds. But if we hand this huge address to uninstrumented code, it can do something unimaginable with it. It might even bring the pointer back within bounds, but we will never get the chance to clear the high-order bit. So we still have problems even with the scheme shown here.

Audience: if we have instrumented code for allocating memory, does it use the same malloc function that the uninstrumented code uses?

Professor: this is a slightly delicate matter. In the case shown here it works out, since we have two memory regions, each obeying the rules established for it. But in general it depends on the runtime of the programming language involved. In C++, for example, you can supply your own allocator. So it depends on the specifics of the code.

Audience: how can the allocator check whether the default 31 bound is set or not?

Professor: at the lower levels the allocation algorithms work such that when you ask the system for more memory, the break pointer moves up. So if you have several allocators, they all try to allocate memory, and each has its own chunk of memory that it reserves for itself. So in real life memory may be more fragmented than it looks at a high level.

So, everything we reviewed above concerned the operation of Baggy bounds on 32-bit systems. Consider what happens on 64-bit systems. There you can actually get rid of the bounds table, because you can store the bounds information in the pointer itself.

Consider what a regular pointer looks like in the Baggy bounds system. It consists of 3 parts: the first, zero part gets 21 bits; 5 bits are allocated for the size, which is the log2 of the allocation size; and the remaining 38 are the normal address bits.



The reason this does not limit the address space your program can use is that the operating system and/or hardware already prevent applications from using most of the high-order bits sitting in the first two parts of the pointer. So, as it turns out, we do not significantly reduce the address space available to applications. This is what a regular pointer looks like.

What happens when one of these pointers goes out of bounds? Well, on a 32-bit system, all we could do was set the high-order bit and hope the pointer never strayed more than half a slot. But now that we have all this extra address space, we can put the out-of-bounds (OOB) offset directly into the pointer. So we can do what the figure shows: divide the pointer into 4 parts and redistribute its bits.

Thus, we can take 13 bits for the OOB offset, that is, record how far this OOB pointer is from where it should be. Then, as before, the actual object size is recorded in the 5 size bits, and the remainder of the zero part shrinks to 21 − 13 = 8 bits. Then our 38-bit address part follows. In this example you can see the benefit of 64-bit systems.
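Packing that layout into one 64-bit word can be sketched with shifts and masks. The field positions (offset in the top 13 bits, then 5 size bits, 8 zero bits, 38 address bits) are an illustrative choice consistent with the lecture's description, not necessarily the paper's exact encoding:

```c
#include <assert.h>
#include <stdint.h>

/* 13-bit OOB offset | 5-bit log2 size | 8 zero bits | 38-bit address */
static uint64_t pack(uint64_t offset13, uint64_t size5, uint64_t addr38) {
    return (offset13 << 51) | (size5 << 46)
         | (addr38 & (((uint64_t)1 << 38) - 1));
}

static uint64_t get_offset(uint64_t p) { return p >> 51; }
static uint64_t get_size(uint64_t p)   { return (p >> 46) & 0x1f; }
static uint64_t get_addr(uint64_t p)   { return p & (((uint64_t)1 << 38) - 1); }
```

The point of the design is that the whole thing still fits in one machine word, so it can be loaded, stored, and passed around exactly like an ordinary pointer.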



Note that in both cases the pointer keeps the usual size of a regular pointer, 64 bits, and its layout is straightforward. This is good, because with "fat" pointers we would need multiple words to represent each pointer.
Also note that it is easy to interoperate with uninstrumented code here, because it sees and uses the same pointer size as usual. We can put these pointers in a struct, for example, and the size of that struct stays the same. So it is very good to be able to work in the 64-bit world.

Audience: why, in the second case, is the offset located in front of the size, unlike in the previous layout, and what happens if the offset is large?

Professor: I think in some cases there are boundary problems we have to work around. For example, a problem would arise if the offset needed more bits. But in principle I don't think there is a reason why you couldn't rearrange some of these fields, unless certain strict conditions I'm not thinking of right now dictate the size of the zero part; otherwise there could be problems with the hardware.

So, you can still trigger a buffer overflow under the Baggy bounds system, since the approaches above do not solve all problems, right? One problem you may encounter is uninstrumented code, because we cannot detect anything going wrong inside uninstrumented code. You may also encounter memory vulnerabilities arising from dynamic memory allocation. If you remember, in the previous lecture we looked at those strange malloc/free pointer bugs, and Baggy bounds cannot prevent such things.

We also discussed in the last lecture the fact that code pointers have no bounds associated with them. Suppose we have a structure with a memory buffer at the bottom and a function pointer above it, and a buffer overflow occurs that still stays within the Baggy bounds. Then you can overwrite that function pointer, and if the program later uses it, control can be transferred to malicious code in attacker-controlled memory. In this case bounds will not help us, because no bounds are associated with these function pointers.
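The layout problem just described can be demonstrated concretely. The struct below is a hypothetical example: the buffer and the function pointer live in one object, so a write that stays within the object's bounds can still clobber the pointer:

```c
#include <assert.h>
#include <string.h>

/* One Baggy bounds object containing both data and a code pointer. */
struct handler {
    char buf[16];
    void (*callback)(void);
};

static void safe(void) {}
static void evil(void) {}   /* stand-in for attacker-controlled code */

/* Returns 1 if an overflow of buf, staying inside the struct,
   overwrote the function pointer. */
static int overflow_demo(void) {
    struct handler h;
    h.callback = safe;

    /* Build a payload: 16 filler bytes, then a replacement pointer. */
    unsigned char payload[sizeof h.buf + sizeof h.callback];
    memset(payload, 'A', sizeof h.buf);
    void (*fp)(void) = evil;
    memcpy(payload + sizeof h.buf, &fp, sizeof fp);

    /* This write never leaves the struct, so bounds checks pass... */
    memcpy(&h, payload, sizeof payload);

    /* ...yet the code pointer has been redirected. */
    return h.callback == evil;
}
```

Bounds on the struct as a whole are satisfied throughout, which is exactly why per-object bounds checking cannot protect the embedded function pointer.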

So, what is the price of using the Baggy bounds system? In fact, this price has four components.



The first is space. If you use fat pointers, you obviously make the pointers larger. If you use the Baggy bounds scheme we just talked about, you must keep the bounds table, and the slot size controls how large that table grows relative to the memory it covers.

In addition, you increase the load on the CPU, which must execute all this pointer instrumentation. Since bounds checks run on each, or almost each, pointer operation, they will slow down your program.

There is also the problem of false alarms. We have already discussed that a program may generate pointers that go out of bounds but never try to dereference them. Strictly speaking, this is not a problem, yet Baggy bounds flags these "outrageous" pointers if they stray beyond half the slot size, at least in the 32-bit scheme.

What you will see with most security tools is that false alarms reduce the likelihood that people will use them. Because in practice, we all like to think we care about security, but what really excites people? They want to upload their silly photos to Facebook, they want pages to load faster, and so on. Thus, if you really want your security tools to be adopted, they must have essentially zero false alarms. Attempting to catch every vulnerability usually produces false alarms that annoy either the developers or the users.

There is also overhead because you need compiler support, since you have to add all the instrumentation to the system, the pointer checks, and so on and so forth.

So, for using the Baggy bounds system we pay a price consisting of extra space, increased CPU load, false alarms, and the need for compiler support.

This completes the Baggy bounds discussion.

Now we can think of two other buffer overflow mitigation strategies. In fact, they are much easier to explain and understand.

One of these approaches is called non-executable memory. Its basic idea is that the paging hardware maintains three permission bits, R, W, and X (read, write, and execute), for each page you have in memory, so the program can read a page, write to it, or execute it only as those bits allow.

[Most of the remainder of this part of the transcript was lost in extraction; only fragments survive. They indicate that the discussion covered the "W xor X" policy, under which no page may be both writable and executable at the same time; the complication this creates for just-in-time (JIT) compilers, such as JavaScript engines that translate "hot" code into x86 instructions at runtime and therefore need pages that are first writable (W) and only later executable (X); and a further defense illustrated with examples using GDB.]

27:10

Continued:

MIT course "Computer Systems Security". Lecture 3: "Buffer overflow: exploits and protection", part 2


Full version of the course is available here .


Source: https://habr.com/ru/post/416839/

