Introduction
For the new project, I needed to extract the level data from the classic 1985 video game
Super Mario Bros (SMB) . More specifically, I wanted to extract the background graphics of each level of the game without an interface, moving sprites, etc.
Of course, I could simply glue the images from the game and, perhaps, automate the process using machine vision techniques. But it seemed to me more interesting is the method described below, which allows to investigate those elements of the levels that cannot be obtained with the help of screenshots.
At the first stage of the project, we will learn 6502 assembly language and an emulator written in Python. The full source code is posted
here .
Source Code Analysis
Reverse engineering of any program is much easier if there is its source code, and we have SMB sources in the form of
17 thousand lines of assembler code 6502 (NES processor) , published by doppelganger. Since Nintendo has not released the official release of the source code, the code was created by disassembling the SMB machine code, with a painful decoding of the meaning of each part, adding comments and meaningful symbolic names.
Having performed a quick file search, I found something similar to the levels data we need:
;level 1-1
L_GroundArea6:
.db $50, $21
.db $07, $81, $47, $24, $57, $00, $63, $01, $77, $01
.db $c9, $71, $68, $f2, $e7, $73, $97, $fb, $06, $83
.db $5c, $01, $d7, $22, $e7, $00, $03, $a7, $6c, $02
.db $b3, $22, $e3, $01, $e7, $07, $47, $a0, $57, $06
.db $a7, $01, $d3, $00, $d7, $01, $07, $81, $67, $20
.db $93, $22, $03, $a3, $1c, $61, $17, $21, $6f, $33
.db $c7, $63, $d8, $62, $e9, $61, $fa, $60, $4f, $b3
.db $87, $63, $9c, $01, $b7, $63, $c8, $62, $d9, $61
.db $ea, $60, $39, $f1, $87, $21, $a7, $01, $b7, $20
.db $39, $f1, $5f, $38, $6d, $c1, $af, $26
.db $fd
If you are unfamiliar with assembler, then I will explain: all this simply means "insert such a set of bytes into the compiled program, and then allow other parts of the program to refer to it using the
L_GroundArea6
symbol
L_GroundArea6
" You can take this fragment as an array, in which each element is a byte.
The first thing to notice is that the amount of data is very small (about 100 bytes). Therefore, we exclude all types of coding, allowing arbitrary placement of blocks at the level. After a bit of searching, I found that this data is being read (after several indirect addressing operations) in the
AreaParserCore . This sub-procedure in turn causes many other sub-procedures, ultimately calling specific sub-procedures for each type of objects allowed in the scene (for example,
StaircaseObject
,
VerticalPipe
,
RowOfBricks
) for more than 40 objects:
Abbreviated Call AreaParserCore
for AreaParserCore
The procedure writes to the
MetatileBuffer
: a 13-byte memory section, which is one column of blocks in a level, each byte of which denotes a separate block. Metatile (metatile) is a 16x16 block from which SMB game backgrounds are composed:
Level with rectangles described around metatailsThey are called metatails, because each consists of four 8x8-pixel tiles, but more on this below.
The fact that the decoder works with predefined objects explains the small size of the level: the level data should refer only to the types of objects and their location, for example, “place the pipe at (20, 16), a number of blocks at (10, 5), ... ". However, this means that it takes a lot of code to turn raw level data into meta files.
Porting this amount of code to create your own level unpacker would take too much time, so let's try a different approach.
py65emu
If we had an interface between Python and 6502 assembly language, we could call the
AreaParserCore
sub-
AreaParserCore
for each level column, and then use more comprehensible Python to convert the block information into the desired image.
Py65emu , a laconic 6502 emulator with Python interface, appears on the scene. Here's how the same memory configuration is configured in py65emu as in NES:
from py65emu.cpu import CPU from py65emu.mmu import MMU
After that, we can execute individual instructions using the
cpu.step()
method, examine the memory using
mmu.read()
, examine the machine registers using
cpu.ra
,
cpu.r.pc
, etc. In addition, we can write to memory using
mmu.write()
.
It is worth noting that this is just an NES processor emulator: it does not emulate other hardware, such as the PPU (Picture Processing Unit), so it cannot be used to emulate the entire game. However, it should be enough to call the parsing sub-procedure, because it does not use any other hardware devices except the CPU and memory.
The plan is to configure the CPU as shown above, and then for each level column initialize the memory partitions with the input values required for the
AreaParserCore
, call the
AreaParserCore
, and then read back the column data. After completing these operations, we use Python to assemble the result into the finished image.
But before that, we need to compile the assembler listing into machine code.
x816
As stated in the source code, the assembler is compiled using x816. The x816 is the 6502 assembler for MS-DOS used by the
homebrew community for NES and ROM hackers. It works great in
DOSBox .
Along with the ROM program, which is necessary for py65emu, the x816 assembler creates a symbol file that binds characters to their location in memory in the address space of the CPU. Here is a fragment of the file:
AREAPARSERCORE = $0093FC ; <> 37884, statement #3154
AREAPARSERTASKCONTROL = $0086E6 ; <> 34534, statement #1570
AREAPARSERTASKHANDLER = $0092B0 ; <> 37552, statement #3035
AREAPARSERTASKNUM = $00071F ; <> 1823, statement #141
AREAPARSERTASKS = $0092C8 ; <> 37576, statement #3048
Here we see that the
AreaParserCore
function in the source code can be accessed at
0x93fc
.
For convenience, I have written a symbol file parser that matches character names and addresses:
sym_file = SymbolFile('SMBDIS.SYM') print("0x{:x}".format(sym_file['AREAPARSERCORE']))
Sub-procedures
As stated in the plan above, we want to learn how to call the Python
AreaParserCore
.
To understand the mechanics of a subroutine, let's examine the short subroutine and the corresponding challenge:
WritePPUReg1: sta PPU_CTRL_REG1 ; A 1 PPU sta Mirror_PPU_CTRL_REG1 ; rts ... jsr WritePPUReg1
The
jsr
instruction (jump to subroutine, “go to sub procedure”) adds the PC register to the stack and assigns it the value of the address referenced by
WritePPUReg1
. The PC register tells the processor the address of the next instruction to load, so that the next instruction executed after the
jsr
instruction is the first line of
WritePPUReg1
.
At the end of the subroutine, the
rts
(return from subroutine, “return from subroutine”) instruction is executed. This command removes the stored value from the stack and stores it in the PC register, which causes the CPU to execute the instruction following the
jsr
call.
A remarkable property of subprocedures lies in the fact that it is possible to create embedded calls, that is, calls to subprocedures inside sub procedures. Return addresses will be added to the stack and retrieved in the correct order, in the same way as it happens with function calls in high-level languages.
Here is the code for executing sub-procedures from Python:
def execute_subroutine(cpu, addr): s_before = cpu.rs cpu.JSR(addr) while cpu.rs != s_before: cpu.step() execute_subroutine(cpu, sym_file['AREAPARSERCORE'])
The code saves the current value of the stack pointer register (
s
), emulates the
jsr
call, and then executes instructions until the stack returns to its original height, which only happens after the first subroutine returns. This will be useful, because now we have a way to directly call sub-procedures 6502 from Python.
However, we have forgotten something: how to pass the input values for this sub-procedure? We need to tell the procedure what level we want to render and which column we need to parse.
Unlike functions in high-level languages, sub-procedures in 6502 assembly cannot receive explicitly specified input data. Instead, the input data is transferred by specifying memory locations somewhere before the call, which are then read inside the sub-procedure call. If we consider the size of the
AreaParserCore
, then the reverse development of the necessary input data by simply studying the source code will be very complex and error-prone.
Valgrind for NES?
To find a way to determine the input values of
AreaParserCore
, I used the
memcheck tool for Valgrind as an example. Memcheck recognizes operations to access uninitialized memory by storing "shadow" memory in parallel with each piece of real allocated memory. The shadow memory records whether it was written to the corresponding real memory. If the program reads to the address to which it never performed the recording, then an uninitialized memory error is output. We can run the
AreaParserCore
with such a tool that tells us which input data you need to set before calling the subroutine.
In fact, writing a simple memcheck version for py65emu is very easy:
def format_addr(addr): try: symbol_name = sym_file.lookup_address(addr) s = "0x{:04x} ({}):".format(addr, symbol_name) except KeyError: s = "0x{:04x}:".format(addr) return s class MemCheckMMU(MMU): def __init__(self, *args, **kwargs): super().__init__(*args, **kwargs) self._uninitialized = array.array('B', [1] * 2048) def read(self, addr): val = super().read(addr) if addr < 2048: if self._uninitialized[addr]: print("Uninitialized read! {}".format(format_addr(addr))) return val def write(self, addr, val): super().write(addr, val) if addr < 2048: self._uninitialized[addr] = 0
Here we wrapped the py65emu memory management unit (MMU). This class contains an array of
_uninitialized
, the elements of which tell us whether there has ever been an entry in the corresponding byte of the emulated RAM. In the event of an uninitialized read, the address of the invalid read operation and the name of the corresponding character are displayed.
Here are the results that the MMU gives in a wrapper when calling
execute_subroutine(sym_file['AREAPARSERCORE'])
:
Uninitialized read! 0x0728 (BACKLOADINGFLAG):
Uninitialized read! 0x0742 (BACKGROUNDSCENERY):
Uninitialized read! 0x0741 (FOREGROUNDSCENERY):
Uninitialized read! 0x074e (AREATYPE):
Uninitialized read! 0x075f (WORLDNUMBER):
Uninitialized read! 0x0743 (CLOUDTYPEOVERRIDE):
Uninitialized read! 0x0727 (TERRAINCONTROL):
Uninitialized read! 0x0743 (CLOUDTYPEOVERRIDE):
Uninitialized read! 0x074e (AREATYPE):
...
Looking through the code, you can see that many of these values are set by the
InitializeArea
, so let's run the script again, calling this function first. Repeating this process, we come to the following sequence of calls, which requires specifying only the world number and the area number:
mmu.write(sym_file['WORLDNUMBER'], 0)
The code writes the first 48 World 1-1 columns to
metatile_data
, using the
IncrementColumnPos
metatile_data
to increment the internal variables needed to track the current column.
But the contents of
metatile_data
superimposed on screenshots from the game (bytes with a value of 0 are not shown):
Obviously,
metatile_data
clearly corresponds to the background information.
Metatail Graphics
(To see the final result, you can go directly to the section "We connect everything".)
Now let's see how to turn the resulting number of metatails into real images. The steps below are invented by analyzing the sources and reading the documentation from the awesome
Nesdev Wiki .
To understand how to render each meta-file, we first need to talk about the NES color palettes. The NES PPU of the console can generally render 64 different colors, but the black color is duplicated several times (see
Nesdev for
details ):
Each Mario level can use only 10 of these 64 colors for the background, divided into 4 four-color palettes; The first color is always the same. Here are four palettes for World 1-1:
Let's now consider an example of a metatay number, represented in binary form. Here is the number of the cracked stones metatail tile, which is a World Level 1-1 level ground:
The palette index tells us which palette to use when rendering meta-mail (in our case, palette 1). The palette index is also an index of the following two arrays:
MetatileGraphics_Low:
.db <Palette0_MTiles, <Palette1_MTiles, <Palette2_MTiles, <Palette3_MTiles
MetatileGraphics_High:
.db >Palette0_MTiles, >Palette1_MTiles, >Palette2_MTiles, >Palette3_MTiles
The combination of these two arrays gives us a 16-bit address, which in our example points to
Palette1_Mtiles
:
Palette1_MTiles:
.db $a2, $a2, $a3, $a3 ;vertical rope
.db $99, $24, $99, $24 ;horizontal rope
.db $24, $a2, $3e, $3f ;left pulley
.db $5b, $5c, $24, $a3 ;right pulley
.db $24, $24, $24, $24 ;blank used for balance rope
.db $9d, $47, $9e, $47 ;castle top
.db $47, $47, $27, $27 ;castle window left
.db $47, $47, $47, $47 ;castle brick wall
.db $27, $27, $47, $47 ;castle window right
.db $a9, $47, $aa, $47 ;castle top w/ brick
.db $9b, $27, $9c, $27 ;entrance top
.db $27, $27, $27, $27 ;entrance bottom
.db $52, $52, $52, $52 ;green ledge stump
.db $80, $a0, $81, $a1 ;fence
.db $be, $be, $bf, $bf ;tree trunk
.db $75, $ba, $76, $bb ;mushroom stump top
.db $ba, $ba, $bb, $bb ;mushroom stump bottom
.db $45, $47, $45, $47 ;breakable brick w/ line
.db $47, $47, $47, $47 ;breakable brick
.db $45, $47, $45, $47 ;breakable brick (not used)
.db $b4, $b6, $b5, $b7 ;cracked rock terrain <--- This is the 20th line
.db $45, $47, $45, $47 ;brick with line (power-up)
.db $45, $47, $45, $47 ;brick with line (vine)
.db $45, $47, $45, $47 ;brick with line (star)
.db $45, $47, $45, $47 ;brick with line (coins)
...
When multiplying the metatail index by 4, it becomes the index of this array. The data is formatted in 4 entries per line, so our example of metatail refers to the twentieth line, marked with the comment
cracked rock terrain
.
Four entries of this line are in fact identifiers of tiles: each meta-file consists of four tiles of 8x8 pixels in the following order - upper left, lower left, upper right and lower right. These identifiers are transmitted directly to the PPU of the NES console. The identifier refers to 16 bytes of data in the CHR-ROM console, and each entry starts at
0x1000 + 16 * < >
:
0x1000 + 16 * 0xb4: 0b01111111 0x1000 + 16 * 0xb5: 0b11011110
0x1001 + 16 * 0xb4: 0b10000000 0x1001 + 16 * 0xb5: 0b01100001
0x1002 + 16 * 0xb4: 0b10000000 0x1002 + 16 * 0xb5: 0b01100001
0x1003 + 16 * 0xb4: 0b10000000 0x1003 + 16 * 0xb5: 0b01100001
0x1004 + 16 * 0xb4: 0b10000000 0x1004 + 16 * 0xb5: 0b01110001
0x1005 + 16 * 0xb4: 0b10000000 0x1005 + 16 * 0xb5: 0b01011110
0x1006 + 16 * 0xb4: 0b10000000 0x1006 + 16 * 0xb5: 0b01111111
0x1007 + 16 * 0xb4: 0b10000000 0x1007 + 16 * 0xb5: 0b01100001
0x1008 + 16 * 0xb4: 0b10000000 0x1008 + 16 * 0xb5: 0b01100001
0x1009 + 16 * 0xb4: 0b01111111 0x1009 + 16 * 0xb5: 0b11011111
0x100a + 16 * 0xb4: 0b01111111 0x100a + 16 * 0xb5: 0b11011111
0x100b + 16 * 0xb4: 0b01111111 0x100b + 16 * 0xb5: 0b11011111
0x100c + 16 * 0xb4: 0b01111111 0x100c + 16 * 0xb5: 0b11011111
0x100d + 16 * 0xb4: 0b01111111 0x100d + 16 * 0xb5: 0b11111111
0x100e + 16 * 0xb4: 0b01111111 0x100e + 16 * 0xb5: 0b11000001
0x100f + 16 * 0xb4: 0b01111111 0x100f + 16 * 0xb5: 0b11011111
0x1000 + 16 * 0xb6: 0b10000000 0x1000 + 16 * 0xb7: 0b01100001
0x1001 + 16 * 0xb6: 0b10000000 0x1001 + 16 * 0xb7: 0b01100001
0x1002 + 16 * 0xb6: 0b11000000 0x1002 + 16 * 0xb7: 0b11000001
0x1003 + 16 * 0xb6: 0b11110000 0x1003 + 16 * 0xb7: 0b11000001
0x1004 + 16 * 0xb6: 0b10111111 0x1004 + 16 * 0xb7: 0b10000001
0x1005 + 16 * 0xb6: 0b10001111 0x1005 + 16 * 0xb7: 0b10000001
0x1006 + 16 * 0xb6: 0b10000001 0x1006 + 16 * 0xb7: 0b10000011
0x1007 + 16 * 0xb6: 0b01111110 0x1007 + 16 * 0xb7: 0b11111110
0x1008 + 16 * 0xb6: 0b01111111 0x1008 + 16 * 0xb7: 0b11011111
0x1009 + 16 * 0xb6: 0b01111111 0x1009 + 16 * 0xb7: 0b11011111
0x100a + 16 * 0xb6: 0b11111111 0x100a + 16 * 0xb7: 0b10111111
0x100b + 16 * 0xb6: 0b00111111 0x100b + 16 * 0xb7: 0b10111111
0x100c + 16 * 0xb6: 0b01001111 0x100c + 16 * 0xb7: 0b01111111
0x100d + 16 * 0xb6: 0b01110001 0x100d + 16 * 0xb7: 0b01111111
0x100e + 16 * 0xb6: 0b01111111 0x100e + 16 * 0xb7: 0b01111111
0x100f + 16 * 0xb6: 0b11111111 0x100f + 16 * 0xb7: 0b01111111
CHR-ROM is a read-only memory fragment that only PPU can access. It is separated from the PRG-ROM, which stores the program code. Therefore, the above data is not in the source code and they need to be obtained from the ROM dump of the game.
16 bytes for each tile is a 2-bit 8x8 tile: the first bit is the first 8 bytes, and the second is the second 8 bytes:
21111111 13211112
12222222 23122223
12222222 23122223
12222222 23122223
12222222 23132223
12222222 23233332
12222222 23111113
12222222 23122223
12222222 23122223
12222222 23122223
33222222 31222223
11332222 31222223
12113333 12222223
12221113 12222223
12222223 12222233
23333332 13333332
We bind this data to palette 1:
... and combine the pieces:
Finally we got a rendered tile.
We put everything together
Repeating this procedure for each metatail, we get a fully rendered level.
And because of this, we were able to extract SMB graphics using Python!