Your First Program

Your First Register

The CPU thinks in very simple terms. It moves data around, changes data, makes decisions based on data, and takes action based on data. Most of the time, this data is stored in registers.

Simply put, registers are containers for data. The CPU can put data into registers, move data between registers, and so on. These registers, at a hardware level, are implemented using very expensive chips, crammed into shockingly microscopic spaces, and accessed at a frequency where even physical concepts such as the speed of light impact their performance. Hence, the number of registers that a CPU can have is extremely constrained. Different CPU architectures have different numbers of registers, different names for these registers, and so on, but typically, there are between 10 and 20 “general purpose” registers that program code can use for any reason, and up to a few dozen other ones that are used for special purposes.

In x86’s modern incarnation, x86_64, programs have access to 16 general purpose registers. In this challenge, we will learn about our first one: rax. Hi, Rax!

rax, a single x86 register, is a tiny piece of the massively complex design of the x86 CPU, but this is where we’ll start. Like the other registers, rax is a container for a small amount of data. You move data into rax with the mov instruction. Instructions are specified as an operator (in this case, mov), and operands, which represent additional data (in this case, it will be the specification of rax as a destination, and the value we will want to store there).

For example, if you wanted to store the value 1337 into rax, the x86 Assembly would look like:

mov rax, 1337

You can see a few things:

  1. The destination (rax) is specified before the source (the value 1337).
  2. The operands are separated by a comma.
  3. It is really simple!
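
To make the operator/operand structure concrete, here is a small Python sketch. It is not part of the challenge, and parse_instruction is a hypothetical helper name chosen for illustration. It splits one Intel-syntax line into its operator and its comma-separated operands, destination first:

```python
def parse_instruction(line):
    """Split one Intel-syntax line into (operator, operands)."""
    # The operator is the first word; everything after it is the operand list.
    operator, _, rest = line.strip().partition(" ")
    # Operands are separated by commas; the destination comes first.
    operands = [op.strip() for op in rest.split(",") if op.strip()]
    return operator, operands


operator, operands = parse_instruction("mov rax, 1337")
destination, source = operands
print(operator, destination, source)
```

A real assembler does far more than this (it validates operands and encodes the instruction into bytes), so treat this only as a picture of how the line is laid out.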

In this challenge, you will write your first assembly. You must move the value 60 into rax. Write your program in a file with a .s extension, such as rax-challenge.s (while not mandatory, .s is the typical extension for assembly files), and pass it as an argument to the /challenge/check file (e.g., /challenge/check rax-challenge.s). You can use either your favorite text editor or the text editor in pwn.college’s VSCode Workspace to implement your .s file!


ERRATA: If you’ve seen x86 assembly before, there is a chance that you’ve seen a slightly different dialect of it. The dialect used in pwn.college is “Intel Syntax”, which is the correct way to write x86 assembly (as a reminder, Intel created x86). Some courses incorrectly teach the use of “AT&T Syntax”, causing enormous amounts of confusion. We’ll touch on this slightly in the next module and then, hopefully, never have to think about AT&T Syntax again.

SOLVE

Write a file named challenge-rax.s:

mov rax, 60

Then pass the file as an argument to /challenge/check:

hacker@your-first-program~your-first-register:~$ /challenge/check challenge-rax.s

Checking the assembly code...
... YES! Great job!


Congratulations, you have written your first program!
Now let's see what happens when you run it:

hacker@your-first-program~your-first-register:/home/hacker$ /tmp/your-program
Segmentation fault (core dumped)
hacker@your-first-program~your-first-register:/home/hacker$

... uh oh. The program crashed! We'll go into more details about
what a Segmentation Fault is later, but in this case, the program
crashed because, after the CPU moved the value 60 into rax, it was
never instructed to stop execution. With no further instructions
to execute, and no directive to stop, it crashed.

In the next level, we'll learn about how to stop program execution.
For now, here is your flag for your first (crashing) program!


Here is your flag!
pwn.college{cUtbn87tpKT_U5T8TwasdaBJCLG.dNDN4UDLxYDNzgzW}

Your First Syscall

So, your first program crashed… Don’t worry, it happens! In this challenge, you’ll learn how to make your program cleanly exit instead of crashing.

Starting your program and cleanly stopping it are actions handled by your computer’s Operating System. The operating system manages the existence of programs and interactions between the programs, your hardware, the network environment, and so on.

Your programs “interact” with the CPU using assembly instructions such as the mov instruction you wrote earlier. Similarly, your programs interact with the operating system (via the CPU, of course) using the syscall (System Call) instruction.

Like how you might use a phone call to interact with a local restaurant to order food, programs use system calls to request the operating system to carry out actions on the program’s behalf. As a bit of an overgeneralization, anything your program does that doesn’t involve performing computation on data is done with a system call.

There are a lot of different system calls your program can invoke. For example, Linux has around 330 different ones, though this number changes over time as syscalls are added and deprecated. Each system call is indicated by a syscall number, counting upwards from 0, and your program invokes a specific syscall by moving its syscall number into the rax register and invoking the syscall instruction. For example, if we wanted to invoke syscall 42 (a syscall that you’ll learn about sometime later!), we would write two instructions:

mov rax, 42
syscall

Very cool, and super easy!

In this challenge, we’ll learn our first syscall: exit. The exit syscall causes a program to exit. By explicitly exiting, we can avoid the crash we ran into with our previous program!

Now, the syscall number of exit is 60. Go and write your first program: it should move 60 into rax, then invoke syscall to cleanly exit!
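
As a rough mental model, the following Python sketch imitates a CPU that steps through a list of instructions and an operating system that inspects rax when syscall is invoked. Everything here is a toy: run and SYS_EXIT are hypothetical names, only mov and syscall are modeled, and running out of instructions is simply reported as a crash, which simplifies what the real hardware does.

```python
SYS_EXIT = 60  # syscall number of exit, as given in the text


def run(program):
    """Execute a list of (operator, operands) tuples on a toy CPU."""
    registers = {"rax": 0}
    for operator, operands in program:
        if operator == "mov":
            destination, value = operands
            registers[destination] = value
        elif operator == "syscall":
            # The operating system looks at rax to see which service is requested.
            if registers["rax"] == SYS_EXIT:
                return "exited cleanly"
            # Other syscall numbers are not modeled in this sketch.
    # No instructions left, and the program never asked to stop.
    return "crashed"


first_program = [("mov", ("rax", 60))]
second_program = [("mov", ("rax", 60)), ("syscall", ())]
print(run(first_program))
print(run(second_program))
```

The first program only sets rax and never asks the operating system to stop it, which mirrors the crash from the previous challenge. The second adds syscall and exits cleanly.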

SOLVE

Write a file named syscall.s:

mov rax, 60 
syscall

Then pass this file as an argument to /challenge/check:

hacker@your-first-program~your-first-syscall:~$ /challenge/check syscall.s

Checking the assembly code...
... YES! Great job!

Okay, now you have written your first COMPLETE program!
All it'll do is exit, but it'll do so cleanly, and we can
build from there!

Let's see what happens when you run it:

hacker@your-first-program~your-first-syscall:/home/hacker$ /tmp/your-program
hacker@your-first-program~your-first-syscall:/home/hacker$

Neat! Your program exited cleanly! Let's push on to make things
more interesting! Take this with you:


Here is your flag!
pwn.college{Mqj-rsiBNsMv6T0GBYsXWPhP62P.dhjN4UDLxYDNzgzW}

Exit Codes

As you might know, every program exits with an exit code as it terminates. This is done by passing a parameter to the exit system call.

Similarly to how a system call number (e.g., 60 for exit) is specified in the rax register, parameters are also passed to the syscall through registers. System calls can take multiple parameters, though exit takes only one: the exit code. The first parameter to a system call is passed via another register: rdi. rdi is what we will focus on in this challenge.

In this challenge, you must make your program exit with the exit code of 42. Thus, your program will need three instructions:

  1. Set your program’s exit code (move it into rdi).
  2. Set the system call number of the exit syscall (mov rax, 60).
  3. syscall!

Now, go and do it!
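
To see what an exit code looks like from the outside, here is a short Python sketch of a parent process launching a child and reading the code the child exited with. The child is a one-line Python program standing in for the assembly program (an assumption made purely so the example is self-contained); subprocess.run and returncode are standard-library features.

```python
import subprocess
import sys

# The child process stands in for the assembly program: it exits with code 42.
child = [sys.executable, "-c", "import sys; sys.exit(42)"]
result = subprocess.run(child)

# The parent receives the child's exit code once the child has terminated.
exit_code = result.returncode
print(exit_code)
```

Whatever launches your program can read its exit code in a similar way, which is how the checker can tell whether you exited with 42.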

SOLVE

Write a file named exit.s:

mov rdi, 42
mov rax, 60
syscall

Then pass this file as an argument to /challenge/check:

hacker@your-first-program~exit-codes:~$ /challenge/check exit.s

Checking the assembly code...
... YES! Great job!

Let's check what your exit code is! It should be 42 to succeed!

Go go go!

hacker@your-first-program~exit-codes:/home/hacker$ /tmp/your-program
hacker@your-first-program~exit-codes:/home/hacker$ echo $?
42
hacker@your-first-program~exit-codes:/home/hacker$

Neat! Your program exited with the correct error code! But in this level,
we built the executable for you. Next, you'll learn how to build the executable
yourself, and then you'll be ready to walk the path of Assembly!

For now, take this with you:



Here is your flag!
pwn.college{YdHeQunaY_uuE437HUd5XSJaXX8.dBzN4UDLxYDNzgzW}

Building Executables

So you’ve written your first program? But until now, we’ve handled the actual building of it into an executable that your CPU can actually run. In this challenge, you will build it!

To build an executable binary, you need to:

  1. Write your assembly in a file (often with a .S or .s extension; we’ll use program.s in this example).
  2. Assemble your assembly file into an object file (using the as command).
  3. Link one or more object files into a final executable binary (using the ld command)!

Let’s take this step by step:

Writing assembly.
The assembly file contains, well, your assembly code. For the previous level, this might be:

hacker@dojo:~$ cat program.s
mov rdi, 42
mov rax, 60
syscall
hacker@dojo:~$

But it needs to contain just a tad more info. We mentioned that we’re using the Intel assembly syntax in this course, and we’ll need to let the assembler know that. You do this by prepending a directive to the beginning of your assembly code, as such:

hacker@dojo:~$ cat program.s
.intel_syntax noprefix
mov rdi, 42
mov rax, 60
syscall
hacker@dojo:~$

.intel_syntax noprefix tells the assembler that you will be using Intel assembly syntax, and specifically the variant of it where you don’t have to add extra prefixes (such as % in front of register names) to your operands. It is a directive to the assembler rather than an x86 instruction (like mov and syscall), so it doesn’t end up in our final executable binary or run on the CPU. We’ll talk about other directives later, but for now, we’ll let the assembler figure it out!

Assembling Assembly Code into Object Files.
Next, we’ll assemble the code. This is done using the assembler, as, like so:

hacker@dojo:~$ ls
program.s
hacker@dojo:~$ cat program.s
.intel_syntax noprefix
mov rdi, 42
mov rax, 60
syscall
hacker@dojo:~$ as -o program.o program.s
hacker@dojo:~$ ls
program.o program.s
hacker@dojo:~$

Here, the as tool reads in program.s, assembles it into binary code, and outputs an object file called program.o. This object file has actual assembled binary code, but it is not yet ready to be run. First, we need to link it.

Linking Object Files into an Executable.
In a typical development workflow, source code is compiled and assembly is assembled to object files, and there are typically many of these (generally, each source code file in a program compiles into its own object file). These are then linked together into a single executable. Even if there is only one file, we still need to link it, to prepare the final executable. This is done with the ld (stemming from the term “link editor”) command, like so:

hacker@dojo:~$ ls
program.o program.s
hacker@dojo:~$ ld -o program program.o
ld: warning: cannot find entry symbol _start; defaulting to 0000000000401000
hacker@dojo:~$ ls
program.o program.s program
hacker@dojo:~$

This creates a program file that we can then run! Here it is:

hacker@dojo:~$ ./program
hacker@dojo:~$ echo $?
42
hacker@dojo:~$

In the shell, $? holds the exit code of the last executed command.

Neat! Now you can build programs. In this challenge, go ahead and run through these steps yourself. Build your executable, and pass it to /challenge/check for the flag!
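
If you would rather script the two build steps than type them each time, a minimal Python sketch might look like the following. The helper names build_commands and build are hypothetical. The as and ld invocations are the same ones shown above, and the sketch assumes the source file name has an extension such as .s.

```python
import subprocess


def build_commands(source="program.s"):
    """Return the assemble and link commands for one assembly source file."""
    # Strip the extension: program.s -> program
    stem = source.rsplit(".", 1)[0]
    assemble = ["as", "-o", stem + ".o", source]
    link = ["ld", "-o", stem, stem + ".o"]
    return [assemble, link]


def build(source="program.s"):
    """Run both steps in order, stopping if either one fails."""
    for command in build_commands(source):
        subprocess.run(command, check=True)


commands = build_commands("program.s")
for command in commands:
    print(" ".join(command))
```

As written, the script only prints the two commands. Calling build("program.s") would actually run them, which requires as and ld to be installed and program.s to exist in the current directory.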


_start?
The attentive learner might have noticed that ld prints a warning about entry symbol _start. The _start symbol is, essentially, a note to ld about where in your program execution should begin when the ELF is executed. The warning states that, absent a specified _start, execution will start right at the beginning of the code. This is just fine for us!

If you want to silence the warning, you can specify the _start symbol in your code, like so:

hacker@dojo:~$ cat program.s
.intel_syntax noprefix
.global _start
_start:
mov rdi, 42
mov rax, 60
syscall
hacker@dojo:~$ as -o program.o program.s
hacker@dojo:~$ ld -o program program.o
hacker@dojo:~$ ./program
hacker@dojo:~$ echo $?
42
hacker@dojo:~$

There are two extra lines here. The second, _start:, adds a label called _start, pointing to the beginning of your code. The first, .global _start, directs as to make the _start label globally visible at the linker level, instead of just locally visible at the object file level. As ld is the linker, this directive is necessary for the _start label to be seen.

For all the challenges in this dojo, starting execution at the beginning of the file is just fine, but if you don’t want to see those warnings pop up, now you know how to prevent them!

SOLVE

Write a file named program.s:

.intel_syntax noprefix
.global _start
_start:
mov rdi, 42
mov rax, 60
syscall

Then assemble it, link it, and pass the resulting executable to /challenge/check:

hacker@your-first-program~building-executables:~$ as -o program.o program.s
hacker@your-first-program~building-executables:~$ ld -o program program.o
/nix/store/mkvc0lnnpmi604rqsjdlv1pmhr638nbd-binutils-2.44/bin/ld: warning: cannot find entry symbol _start; defaulting to 0000000000401000
hacker@your-first-program~building-executables:~$ /challenge/check program

Checking the assembly code...
... YES! Great job!

Let's check what your exit code is! It should be 42 to succeed!

Go go go!

hacker@your-first-program~building-executables:/home/hacker$ /tmp/your-program
hacker@your-first-program~building-executables:/home/hacker$ echo $?
42
hacker@your-first-program~building-executables:/home/hacker$

Neat! Your program exited with the correct error code! But what
if it hadn't? Next, we'll learn about some simple debugging.
For now, take this with you:



Here is your flag!
pwn.college{0vbOlzRbGC9IK07lJvhdoxlMMxA.QXwcjMwEDLxYDNzgzW}

Moving Between Registers

Okay, let’s learn about one more register: rsi! Like rdi, rsi is a place you can park some data. For example:

mov rsi, 42

Of course, you can also move data around between registers! Watch:

mov rsi, 42
mov rdi, rsi

Just like the first line there moves 42 into rsi, the second line moves the value in rsi to rdi. Here, we have to mention one complication: by move, we really mean set. After the snippet above, both rsi and rdi will be 42. It’s a mystery as to why the name mov was chosen rather than something reasonable like set (even very knowledgeable people resort to wild speculation when asked), but it was, and here we are.
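
The copy semantics can be sketched in a few lines of Python. This is a toy model, not how the CPU is implemented: the registers are a plain dictionary, and the helper mov is a hypothetical function that accepts either a register name or a literal number as its source.

```python
def mov(registers, destination, source):
    """Set destination to the source value; the source is left unchanged."""
    if isinstance(source, str):
        # The source names a register: read its current value.
        value = registers[source]
    else:
        # The source is an immediate (a literal number).
        value = source
    registers[destination] = value


registers = {"rsi": 0, "rdi": 0}
mov(registers, "rsi", 42)     # mov rsi, 42
mov(registers, "rdi", "rsi")  # mov rdi, rsi
print(registers)
```

After both calls, the dictionary holds 42 under both rsi and rdi: the value was copied, and nothing was removed from rsi.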

Anyways, on to the challenge! In this challenge, we will store a secret value in the rsi register, and your program must exit with that value as the return code. Since exit uses the value stored in rdi as the return code, you’ll need to move the secret value in rsi into rdi. Run /challenge/check and pass it your code for the flag! /challenge/check will set the secret value in rsi before running your code. Good luck!

SOLVE

Write a file named secret.s:

.intel_syntax noprefix
.global _start
_start:
mov rdi, rsi
mov rax, 60
syscall

Then pass this file as an argument to /challenge/check:

hacker@your-first-program~moving-between-registers:~$ /challenge/check secret.s

Checking the assembly code...
... YES! Great job!

Let's check what your exit code is! It should be our secret
value stored in register rsi (value 50) to succeed!

hacker@your-first-program~moving-between-registers:/home/hacker$ /tmp/your-program
hacker@your-first-program~moving-between-registers:/home/hacker$ echo $?
50
hacker@your-first-program~moving-between-registers:/home/hacker$

Neat! Your program passed the tests! Great job!

Here is your flag!
pwn.college{0wbajGmvVBLHtY_ZMbiX0hHXlRe.dlDN1YDLxYDNzgzW}

Your First Program
http://example.com/Your First Program/
Author: briteny-pwn
Published: March 6, 2025
License