Going from a binary file to a running process

This blog post is heavily inspired by the series making-our-own-executable-packer, written by Amos on fasterthanli.me (he is also on Twitter and Mastodon).

Introduction

In this post I’m not going to simply copy-and-paste his code. I want to deeply understand what’s going on, so I’m going to add support for AArch64. I’m going to focus on the parts I didn’t know or had trouble with. All the assembly will differ from the original post, because it’s arm64 assembly and not x86_64.

PS: I’m writing this before finishing the project, so I might fail along the way.

What’s the background?

I won’t be as detailed as the original post, so I recommend reading his series first, because it is good. If you want a thorough read, that’s where you’ll find it. If you don’t, you’ll just read me struggling to understand what’s going on. 😅 I’ll allow myself to use external tools, and I assume some knowledge of memory (the stack, the heap, registers, and the fact that allocated memory is mapped into pages with some permissions). That’s all; I’ll explain the rest along the way.

The code structure

Just like Amos, I’m using Rust and nom to parse the ELF file (it’s heavily inspired, I warned you).

I’m using the same code structure as Amos, i.e. a library (delf) that parses the ELF file and a binary (elk) that runs it. So we first need to understand the structure of an ELF file, and then know what to do with each section.

Here is the structure of the code.

text
./delf:
    Cargo.toml
    src
        lib.rs
        parse.rs

./elk:
    Cargo.toml
    src
        main.rs
    

Let’s create the project structure.

bash
cargo new --lib delf
cargo new elk

We can now add the dependencies to the Cargo.toml of the elk binary. Here is what it looks like (cat elk/Cargo.toml):

toml
[package]
name = "elk"
version = "0.1.0"
edition = "2021"

# See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html

[dependencies]
delf = { path = "../delf" }
mmap = "0.1"
region = "3.0.0"

And here are the dependencies of the delf library (cat delf/Cargo.toml):

toml
[package]
name = "delf"
version = "0.1.0"
edition = "2021"

[dependencies]
derive-try-from-primitive = "1.0.0"
derive_more = "0.99.17"
enumflags2 = "0.7.5"
nom = "7.1.1"

OK, we are good to go!

Before we start

Because GCC and other full-featured compilers pull a lot of tricks to make our lives easier, we’ll start by writing the simplest possible binary directly in assembly, and then we will try to parse it and run it (hopefully).

asm
//hello_label.asm
.text 			// code section
.globl _start
_start:

	adr x1, msg	// load the address of the message in x1
	mov x0, 1 	// stdout has file descriptor 1
	mov x2, 13	// size of the buffer: "Hello World.\n" is 13 bytes
	mov x8, 64 	// sys_write() is at index 64 in kernel functions table
	svc #0 		// generate kernel call sys_write(stdout, msg, len);

	mov x0, 123 	// exit code
	mov x8, 93 	// sys_exit() is at index 93 in kernel functions table
	svc #0 		// generate kernel call sys_exit(123);

.data			// data section
msg: 
    .ascii "Hello World.\n" // .ascii emits the bytes as-is, without a trailing NUL

If you have never written assembly, here is a quick explanation of what’s going on:

  • .text starts the code section
  • .globl _start makes the _start symbol visible to the linker; ld uses _start as the default entry point, i.e. the first instruction executed when the program starts
  • _start: defines a label, a name for an address we can jump to (a bit like a function name). Here it marks the entry point of the program.
  • .data starts the data section, where the msg label and its string live

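Arm’s Procedure Call Standard (AAPCS64) says that arguments are passed (up to 8 of them!) in registers x0 to x7; you can read more about the calling convention there. Linux system calls on arm64 follow a similar scheme: the arguments go in x0, x1, …, the system call number goes in x8, and svc #0 traps into the kernel. The rest is explained in the comments of the assembly code.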

Now we can compile the assembly and run it using the following commands:

bash
as hello_label.asm -o hello_label.o && ld hello_label.o -o hello_label && rm hello_label.o
./hello_label 

We see the “Hello World.” output, so it seems to work, but let’s check the exit code to make sure everything really went well:

bash
echo $? # outputs 123, because $? holds the exit code of the last command

Let’s try to parse it now !

Parsing an ELF file

Before starting, we need to know what’s in an ELF file. Basically, an ELF file is composed of a header, which begins with the magic number [0x7f, 0x45, 0x4c, 0x46] (0x7f followed by the ASCII letters "ELF") and describes the sections of the file, followed by the sections themselves. The header also contains some info about the ELF, such as which architecture it should run on. For example, you can list the targets that rustc (the Rust compiler) supports with rustc --print target-list.
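To make this concrete before bringing nom in, here is a small std-only sketch that checks the magic number and reads the e_machine field, which is where the header records the target architecture. It assumes a little-endian file, where e_machine is a 16-bit value at offset 18; the constants 62 (EM_X86_64) and 183 (EM_AARCH64) come from the ELF specification, and the helper name elf_machine is made up for this example.

```rust
/// The 4 magic bytes every ELF file starts with: 0x7f followed by "ELF".
const MAGIC_ELF: [u8; 4] = [0x7f, 0x45, 0x4c, 0x46];

/// Hypothetical helper: returns an architecture name, or None if the bytes
/// are not an ELF file (or are too short to contain e_machine).
/// Assumes a little-endian file, where e_machine is a u16 at offset 18.
fn elf_machine(bytes: &[u8]) -> Option<&'static str> {
    if bytes.len() < 20 || bytes[..4] != MAGIC_ELF {
        return None;
    }
    let machine = u16::from_le_bytes([bytes[18], bytes[19]]);
    Some(match machine {
        62 => "x86_64",   // EM_X86_64
        183 => "aarch64", // EM_AARCH64
        _ => "other",
    })
}

fn main() {
    // Inspect the file given as first argument, e.g. `cargo run -- ./hello_label`.
    if let Some(path) = std::env::args().nth(1) {
        let bytes = std::fs::read(&path).expect("could not read file");
        println!("{:?}", elf_machine(&bytes));
    }
}
```

Run against the hello_label binary built above on an arm64 machine, this should report aarch64.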

Let’s define the input type for the parser.

rust
//copied from Amos's code
// delf/src/parse.rs
pub type Input<'a> = &'a [u8];
pub type Result<'a, O> = nom::IResult<Input<'a>, O, nom::error::VerboseError<Input<'a>>>;
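To see what this alias means, it helps to remember that nom’s IResult is itself an alias for Result<(I, O), nom::Err<E>>: a parser takes the input and, on success, hands back the remaining input together with the parsed value. The sketch below mirrors that convention with std only, replacing nom’s error type with a plain String as a simplifying assumption; le_u16 is a hypothetical hand-written parser, not nom’s own function of the same name.

```rust
/// Same input type as in parse.rs: a borrowed slice of the file's bytes.
pub type Input<'a> = &'a [u8];

/// Simplified stand-in for nom's IResult: on success we get back
/// (remaining input, parsed value). nom wraps its error in nom::Err;
/// here a String is enough.
pub type Result<'a, O> = std::result::Result<(Input<'a>, O), String>;

/// Hypothetical parser in that style: reads a little-endian u16.
fn le_u16(i: Input) -> Result<u16> {
    if i.len() < 2 {
        return Err(format!("need 2 bytes, got {}", i.len()));
    }
    let (bytes, rest) = i.split_at(2);
    Ok((rest, u16::from_le_bytes([bytes[0], bytes[1]])))
}

fn main() {
    let input: &[u8] = &[0xb7, 0x00, 0xff];
    let (rest, value) = le_u16(input).unwrap();
    println!("{} {:?}", value, rest); // 183 [255]
}
```

Chaining parsers then simply means feeding the rest returned by one parser into the next one.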

The Result type is really horrible, but at least we don’t have to write it every time! Now let’s create a struct to hold our ELF data.

rust
//copied from Amos's code
// delf/src/lib.rs
pub struct File {}
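The struct is empty for now. For reference, this is the layout of the fixed-size 64-bit ELF header that it will gradually describe, transcribed from the Elf64_Ehdr definition in the ELF specification (also found in elf.h). The struct name Elf64Header is my own choice; the field names follow the spec.

```rust
/// Layout of the 64-bit ELF file header (Elf64_Ehdr), field by field.
/// #[repr(C)] keeps the fields in this order with C layout rules,
/// which for these fields introduces no padding.
#[repr(C)]
#[allow(dead_code)]
struct Elf64Header {
    e_ident: [u8; 16], // magic, class, endianness, version, OS ABI, padding
    e_type: u16,       // file type: 2 = executable, 3 = shared object (also PIE)
    e_machine: u16,    // target architecture, e.g. 183 = AArch64
    e_version: u32,    // always 1
    e_entry: u64,      // virtual address of the entry point (our _start)
    e_phoff: u64,      // file offset of the program header table
    e_shoff: u64,      // file offset of the section header table
    e_flags: u32,      // architecture-specific flags
    e_ehsize: u16,     // size of this header
    e_phentsize: u16,  // size of one program header entry
    e_phnum: u16,      // number of program header entries
    e_shentsize: u16,  // size of one section header entry
    e_shnum: u16,      // number of section header entries
    e_shstrndx: u16,   // index of the section holding the section names
}

fn main() {
    // 16 + 2 + 2 + 4 + 3 * 8 + 4 + 6 * 2 = 64 bytes
    println!("{}", std::mem::size_of::<Elf64Header>());
}
```

Whether we read these fields with nom or by hand, parsing them one by one (rather than casting raw bytes to this struct) keeps endianness and alignment explicit.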

Let’s start parsing!

rust
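// delf/src/lib.rs
mod parse;

use nom::bytes::complete::tag;

// skipped struct def

impl File {
    const MAGIC_ELF: &'static [u8] = &[0x7f, 0x45, 0x4c, 0x46];

    pub fn parse(i: parse::Input) -> parse::Result<Self> {
        // tag returns (remaining input, matched bytes); we only keep the rest
        let (i, _) = tag(Self::MAGIC_ELF)(i)?;
        Ok((i, Self {}))
    }
}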

OK, we know the magic number, but not much more, to be honest. We will improvise along the way.

tag is a nom combinator that checks whether the input starts with the given bytes.
As the doc shows, tag(...) returns a function (the actual parser). We call that function with the input and get back either the remaining bytes after the tag (together with the matched bytes), or an error, which we propagate using the ? operator.
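The "function that returns a parser" idea is easier to see in a stripped-down version. Below is a simplified, hypothetical re-implementation of the idea behind tag, written with std only; it is not nom’s actual code, it only accepts a &'static [u8] tag, and it uses a String as the error type.

```rust
/// Simplified, hypothetical version of the idea behind nom's tag (not nom's
/// code): calling tag(expected) parses nothing yet, it returns a closure
/// that checks the input later.
fn tag(expected: &'static [u8]) -> impl Fn(&[u8]) -> Result<(&[u8], &[u8]), String> {
    move |input| {
        if input.starts_with(expected) {
            // Split off the matched prefix and return it nom-style:
            // (remaining input, matched bytes).
            let (matched, rest) = input.split_at(expected.len());
            Ok((rest, matched))
        } else {
            Err(format!("expected tag {:x?}", expected))
        }
    }
}

fn main() {
    let magic = tag(b"\x7fELF");
    let input: &[u8] = b"\x7fELF\x02\x01";
    let (rest, matched) = magic(input).unwrap();
    println!("matched {:x?}, rest {:x?}", matched, rest);
}
```

Because the parser is just a value, it can be stored, passed around, and combined with others, which is exactly what combinators like tuple and context rely on.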

We can parse the part of the header that doesn’t change (the first 16 identification bytes) using the following code:

rust
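// delf/src/lib.rs
use nom::{
    branch::alt,
    bytes::complete::{tag, take},
    error::context,
    sequence::tuple,
};

// inside File::parse
let (i, _) = tuple((
    context("Magic", tag(Self::MAGIC_ELF)),
    context("Class", tag(&[0x2])),      // 2 = 64-bit
    context("Endianness", tag(&[0x1])), // 1 = little-endian
    context("Version", tag(&[0x1])),    // always 1
    context("OS ABI", alt((tag(&[0x0]), tag(&[0x3])))), // 0 = System V, 3 = Linux
    context("Padding", take(8_usize)),  // ABI version + 7 padding bytes
))(i)?;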
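For comparison, here is a hypothetical std-only equivalent of that tuple, which makes each check explicit. It validates the same 16 bytes (magic, class 2, endianness 1, version 1, OS ABI 0 or 3, then 8 skipped bytes) and returns the bytes that follow; the function name parse_ident and the String errors reusing the context labels are my own choices, not part of delf.

```rust
/// Hypothetical std-only equivalent of the nom tuple: validates the
/// 16-byte identification block and returns the bytes that follow it.
fn parse_ident(i: &[u8]) -> Result<&[u8], String> {
    if i.len() < 16 {
        return Err(format!("ident needs 16 bytes, got {}", i.len()));
    }
    let (ident, rest) = i.split_at(16);
    if !ident.starts_with(b"\x7fELF") {
        return Err("Magic".to_string());
    }
    if ident[4] != 2 {
        return Err("Class".to_string()); // 2 = 64-bit
    }
    if ident[5] != 1 {
        return Err("Endianness".to_string()); // 1 = little-endian
    }
    if ident[6] != 1 {
        return Err("Version".to_string());
    }
    if ident[7] != 0 && ident[7] != 3 {
        return Err("OS ABI".to_string()); // 0 = System V, 3 = Linux
    }
    // ident[8..16] (ABI version + padding) is skipped, like take(8_usize).
    Ok(rest)
}

fn main() {
    let mut header = vec![0x7f, b'E', b'L', b'F', 2, 1, 1, 0];
    header.extend_from_slice(&[0; 8]); // ABI version + padding
    header.push(0x02); // first byte of the next field
    println!("{:?}", parse_ident(&header)); // Ok([2])
}
```

The nom version gets the same behavior with less boilerplate, and context attaches the failing field name to the error for free.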