MIT course "Computer Systems Security". Lecture 2: "Control of hacker attacks", part 2

Massachusetts Institute of Technology. Lecture course # 6.858. "Security of computer systems". Nikolai Zeldovich, James Mykens. year 2014

Computer Systems Security is a course on the development and implementation of secure computer systems. Lectures cover threat models, attacks that compromise security, and security methods based on the latest scientific work. Topics include operating system (OS) security, capabilities, information flow control, language security, network protocols, hardware protection and security in web applications.

Lecture 1: "Introduction: threat models" Part 1 / Part 2 / Part 3
Lecture 2: "Control of hacker attacks" Part 1 / Part 2 / Part 3

So, we have a buffer over which we place a canary. Above is the saved value of the break point pointer saved EBP and the return address is placed above it. If you remember, the overflow comes from the bottom up, so before you get to the return address, it will first destroy the canary.

Audience: why will it affect the canary?

Professor: because it is assumed that the attacker does not know how to arbitrarily “jump” in the memory. Traditional memory overflow attacks begin with a hacker examining the buffer size limit, after which the overflow begins from the bottom line. But you are right - if an attacker can get directly to the line of the return address, no canary will help us. However, in the case of a traditional buffer overflow attack, everything should happen exactly this way - from the bottom up.

Thus, the main idea of using canaries is that we allow a malicious exploit to overflow a memory buffer. We have a runtime code that, when it returns from a function, checks the canary to make sure that it has the correct value.

Audience: Can an attacker rewrite the return address and change the canary? How can he verify that it has been modified, but continues to perform its function?

Professor: yes, it can. Thus, you must have some kind of code fragment that will actually check this before the function returns. That is, in this case, it is necessary to have the support of the compiler, which will actually expand the calling convention . So that part of the return sequence occurs before we consider the validity of this value, to ensure that the canary has not been destroyed. Only after that can we think of something else.

Audience: Can't an attacker know or guess what the meaning of the canary is?

Professor: that's exactly what I'm going to tell you! What is the problem with this circuit? What if, for example, we put A in each program? Or a whole branch of 4 values of A? Of course, any hacker can find out the size of the buffer, its capacity, and thus determine the position of the canary in any system. Therefore, we can use different types of values that we put in our canary to prevent this.

There is one thing you can do with our canary. It will be a very funny canary type, which uses C program functions and handles special characters, the so-called deterministic canary type.

Imagine that you used the symbol 0 for the canary. The binary value of zero is the zero byte, the zero character in ASCII. A value of -1 means return to the previous position, and so on. Many functions stop or change when they encounter characters or values such as 0, CR, LF, -1. Imagine that you, as a hacker, use a certain line management function to climb up the buffer, hit the 0 character in the canary and the process stops! If you use the carriage return-oriented function -1, which is often used as a line terminator, the process will also stop. So -1 is another magic sign.

There is one more thing that can be used in the "canary" - these are random values that are difficult to guess to the attacker. The strength of a random value is based on how difficult it is for an attacker to guess. For example, if an attacker understands that there are only 3 bits of entropy on your system, then he can use a brute-force attack. Therefore, the possibility of using random numbers to protect against attacks is quite limited.

Audience: it usually happens that I read from a different buffer and write what I read into this buffer of a given stack. In this situation, it seems that the random value of the canary is useless, because I read the data from another buffer and know where the canary is. I have another buffer that I control and which I never check. And in this buffer I can put quite a lot of what I want to put. I don't need a random canary because I can safely rewrite it. So I don’t see how it really works - in the scenario you proposed, when the function stops while reading data from the buffer.

Professor: I understand your question - you mean that we use a deterministic "canary", but do not use one of the functions of the standard library, which can be "deceived" by our symbols 0, CR, LF, -1. Then yes, in the situation described by you, the canary is not needed.

The idea is that you can fill this buffer with bytes from anywhere, but anything that allows you to guess these values or get them randomly will result in defeat.

Audience: Is it possible to use something like the number of seconds or milliseconds as random numbers and use them in a canary?

Professor: data calls do not contain such a large number of accidents as you think. Because the program has journal entries or a function that you can call to find out the time when the program was loaded, and other such things. But in general, you are right - in practice, if you can use a hardware device, usually low-level, with the best system timings, this kind of approach can work.

Audience: even if we have time to view the logs about the beginning of the buffer overflow, it is still important how much we will refuse the request. And if we cannot get control over how long a computer request to the server takes, then it is doubtful that one can guess the exact time deterministically.

Professor: quite right, I have already said that evil lies in the details, this is exactly the case. In other words, if you have some way to, for example, determine the type of the timing channel, you may find that the amount of entropy, or the number of accidents, does not fill the whole timestamp, but much less. Therefore, an attacker can determine the hour and minute when you did this, but not a second.

Audience: to record trying to minimize your own randomness is a bad idea?

Professor: quite right!

Audience: usually we just have to use everything that our systems support, right?

Professor: Yes, it is. It's like inventing your own cryptosystem, which is another popular thing that our graduates sometimes want to do. But we are not the NSA, we are not mathematicians, so it usually does not work out. So you're absolutely right about that.

But even if you use system randomness, you can still get less bits of entropy than you expect. I will give you an example of phase randomization of addresses. It is on this principle that the stack canaries approach works . Since we are engaged in computer security, you are probably wondering when the canaries fail to cope with their task and if there are ways to fail the canary.

One such method is attacking by rewriting function pointers. Because if the blow is applied to the function pointer, the canary cannot do anything.

Suppose you have a code like int * ptr ... .. that triggers a pointer, no matter how, then you have a char buf [128] buffer, the gets (buf) function, and at the very bottom is a pointer that is assigned some value : * ptr = 5 .

I note that we did not attempt to attack the return address of the function containing this code. As you can see, if the buffer is full, the pointer address located above it will be damaged. If an attacker can damage this pointer, then he can then assign a 5 to one of the addresses he controls. Everyone can see that the "canary" will not help here? Because we do not attack the way in which the function returns.

Audience: Can the pointer be below the buffer?

Professor: maybe, but the order of specific variables depends on many different things, on the way the compiler arranges the content, on the size of the hardware column, and so on. But you are right, if the buffer overflow goes up, and the pointer is located below in front of the buffer, then the overflow will not be able to damage it.

Audience: why can't you associate the canary with the canary function, as you did with the return address?

Professor: this is an interesting point! You can do these things. In fact, you can try to imagine a compiler that, whenever it has a pointer, always tries to add an addition to some things. However, checking all these things is too expensive. Because every time you want to use any pointer or call any function, you must have a code that will check if this canary is correct. Basically, you could do something like that, but does that make sense? We see that “canaries” do not help in this situation.

And one more thing that we discussed earlier is that if the attacker can guess the chance, then, in principle, the random "canaries" will not work. Creating safety resources based on chance is a separate, very complex topic, so we will not go into it.

Audience: so the canary contains less bits than the return address? Because otherwise, you could not just remember this address and check if it has changed?

Professor: let's see. You talk about this scheme when the canary is located above the buffer, and you mean that the system cannot be secure if you can’t look at the return address and see if it has been changed.

Yes and no. Note that if a buffer overflow attacks, everything above it will be rewritten, so this can still cause problems. But basically, if these things were unchangeable in some sense, then you could do something like this. But the problem is that in many cases, the manipulation of the return address is quite a complicated thing. Because you can imagine that a special function can be called from different places, and so on. In this case, we run a little ahead, and if there is time at the end of the lecture, we will return to this.

These are the situations in which the canary can fail. There are other places where failure is possible, for example, when attacking malloc and free functions. The malloc function allocates a block of memory of a certain size in bytes and returns a pointer to the beginning of the block. The contents of the allocated memory block is not initialized; it remains with undefined values. And the free function frees up memory allocated dynamically earlier.

This is a unique attack in the style of S. See what happens here. Imagine that you have two pointers here, p and q, for which we allocate 1,024 bytes of memory to each of these pointers using malloc . Suppose we are doing a strcpy function for p from some kind of buffer error that is controlled by an attacker. This is where the overflow occurs. And then we run the free q and free p commands. This is pretty simple code, right?

We have 2 pointers for which we have allocated memory, we use one of them for a particular function, a buffer overflow occurs, and we release the memory of both pointers.

Suppose that the memory lines of both objects p and q are located in the memory space next to each other. This can happen bad things, is not it? Because the strcpy function is used to copy the contents of str2 to str1 . The str2 argument must be a pointer to a string ending in zero, and strcpy returns a pointer to str1 . If str1 and str2 overlap, then the behavior of the strcpy function is undefined.

Therefore, the strycpy function that processes memory p can also affect the memory allocated for q . And it can cause problems.

It is possible that you did something of the sort in your own code unintentionally when using some kind of weird type of pointers. And everything seems to be working, but when you need to call the free function, this is what happens. And the attacker can take advantage of it, I will explain why this is happening.

Imagine that inside the implementation of the free and malloc functions, the selected block looks like this.

Let's assume that there are visible application data at the top of the block, and below we have the size of the variable. This size is not what the application directly sees, but a kind of “accounting”, which is either free or malloc , so that you know the size of the allocated memory buffer. Near the selected block is a free block. Suppose that a free block has some metadata that looks like this: on top of us is the block size, below there is free space, the pointer “back” is located even lower, and below it is the “forward” pointer. And at the very bottom of the block size is shown again.

Why do we have 2 pointers here? Because the memory allocation system in this case uses a doubly linked list to track how the free blocks are related to each other. Therefore, when you select a free block, you exclude it from this doubly linked list. And then, when you free it, you will do some arithmetic for the pointer and put these things in order. Then you add it to this linked list, right?

Whenever you hear about pointer arithmetic, you should think that this is your "canary". Because there will be a lot of problems. Let me remind you that we have a buffer overflow p . If we assume that p and q are next to each other, or very close to the memory space, then eventually it may happen that this buffer overflow can overwrite some size data for the selected pointer q - this is the bottom of our dedicated block. If you continue to follow my thought from the very beginning, your imagination will tell you where everything starts to go wrong. Indeed, in essence, what ultimately happens with these operations is free q and free p - they look at this metadata in a dedicated block in order to do all the necessary manipulations with the pointer.

That is, at some point in the execution, the free functions are going to get some kind of pointer based on the size value: p = get.free.block (size) , and size is what the attacker controls because he overflowed the buffer, correctly ?

He did a bunch of arithmetic calculations, looked at the back function and pointers of this block, and now he is going to do something like update the “back” pointer and the “forward” pointer — these are the two bottom lines.

But in reality it should not bother you. This is just a sample of the code that takes place in this case. But the fact is, because of the size rewritten by the hacker, he now controls this pointer, which passes through the free function. And because of this, the two states available here in the bottom lines are actually updates of pointers. And since the attacker was able to control this p , he actually controls these two pointers. This is where an attack can occur.

Therefore, when free works and trying to do something like a union of these two blocks, you have a double-linked list. Because if you have two blocks that collide with each other and both are free, you want to combine them into one big block.

But if we control the size, this means that we control the whole process of the four lines shown above. This means that if we understand how overflow works, then we can write data to memory in the way we choose. As I said, such things often happen with your own code if you are not smart with a pointer. When you make some mistake with double freeing like free q and free p or something else, your function crashes. Because you messed up with the metadata that reside in each of these highlighted blocks, and at some point this calculation will indicate a kind of “garbage” value, after which you will be “dead”. But if you are an attacker, you can choose this value and use it to your advantage.

Let's now move on to a different approach to prevent buffer overflow attacks. This approach is to check the boundaries. The purpose of checking boundaries is to make sure that when you use a particular pointer, it will only refer to what is a memory object. And this pointer is within the acceptable limits of this memory object. This is the main idea of verification. It is actually quite simple - at a high level. Again, in the case of C, it is very difficult to understand such things. They say that this actually means: that the pointer is within or outside the boundaries, is it valid or invalid?

For example, suppose you have two parts of the code — here I’ll write them down. You declare an array of characters of 1024 bytes and declare a pointer, and then you get the address of one of the elements in X: char X [1024] , char * y = & x [108].

Does it make sense? Is this a good idea? Hard to say. , , . , , - .
- , , , . . , , , . , , . , , .

, , struct union . , . : integer , struct , int .

, union , . , integer , struct , .

, , - : int p: & (u,s,k) , : u, s, k.

, , , , . , , union integer , struct . , , , . .

p' , p , p' , .

, , . , , union . , - - union , , , . , , X. , , , , . , , . .

, . , p p' , . .

? Electric fencing – . , , , , .

, - , . , , . , . , , , .

- C C++, , , . - , , - . , . , «» — , , , . , , .

, guard page – ! , .

59:00

Continued:

MIT course "Computer Systems Security". Lecture 2: "Control of hacker attacks", part 2

Full version of the course is available here .

Thank you for staying with us. Do you like our articles? Want to see more interesting materials? Support us by placing an order or recommending to friends, 30% discount for Habr users on a unique analogue of the entry-level servers that we invented for you: The whole truth about VPS (KVM) E5-2650 v4 (6 Cores) 10GB DDR4 240GB SSD 1Gbps from $ 20 or how to share the server? (Options are available with RAID1 and RAID10, up to 24 cores and up to 40GB DDR4).

Dell R730xd 2 times cheaper? Only we have 2 x Intel Dodeca-Core Xeon E5-2650v4 128GB DDR4 6x480GB SSD 1Gbps 100 TV from $ 249 in the Netherlands and the USA! Read about How to build an infrastructure building. class c using servers Dell R730xd E5-2650 v4 worth 9000 euros for a penny?

Source: https://habr.com/ru/post/416047/

All Articles

MIT course "Computer Systems Security". Lecture 2: "Control of hacker attacks", part 2

Massachusetts Institute of Technology. Lecture course # 6.858. "Security of computer systems". Nikolai Zeldovich, James Mykens. year 2014

More articles: