This is just me documenting my journey through radare2 (rasm) plugin creation (as someone who has yet to actually contribute to radare2): a list of notes taken while I was developing the plugin.

Idea: Let’s implement a disassembler and an assembler for some machine code that I created for a CTF RE challenge, called SPARCle. It’s very minimal (17 different instructions) with 4-byte opcodes, so it should be easy to support.

Finding the files/directories we’ll work from

A bit of looking around resulted in this list:

  • libr/anal/p/anal_sparc_gnu.c: defines the registers, mapping to r2 conditionals, opcodes, …
  • libr/anal/d/cc-sparc-32.sdb.txt: contains mapping between arguments and the registers they go into.
  • libr/asm/arch/sparc/gnu/sparc-dis.c: contains functions to create a hash table out of the opcodes and to compare two opcodes, plus print_insn_sparc(), which prints one instruction from MEMADDR on INFO->STREAM
  • libr/asm/arch/sparc/gnu/sparc-opc.c: contains all the instruction names together with lookup names and lookup value functions to decode the instructions (sparc_encode_prefetch(), sparc_decode_prefetch(), sparc_decode_membar(), sparc_encode_membar(), sparc_encode_asi(), sparc_decode_asi(), …)
  • libr/asm/d/sparc.sdb.txt: contains a lookup table for the meaning of instruction names (“bn=branch never”). Probably for one of those verbose functions in r2.
  • libr/asm/p/asm_sparc_gnu.c: contains functions such as disassemble(), print_address(), memory_error_func() and buf_fprintf()
  • libr/include/r_asm.h: contains a list of pointers to RAsmPlugin structs. We’ll need to add ours to this header file. “extern RAsmPlugin r_asm_plugin_sparc_gnu;”
  • libr/include/r_anal_ex.h: defines enums for all kinds of stuff that CPU architectures do, for example an enum with types for binary operations (shift left, xor, and, not, …), an enum for conditionals (equals, not equal, greater than, less than, …), code operations (jmp, call, ret, leave, …), memory operations (load from mem, load from ref, …). But also mappings to their string representations (bool_str == “bool”, cond_eq_str == “==”, …)
  • libr/include/r_anal.h: also contains a lot of enums, plus plugin pointers for RAnalPlugin. “extern RAnalPlugin r_anal_plugin_sparc_gnu;“.

I’ll probably have to create a RAnalPlugin and a RAsmPlugin.

For the RAsmPlugin, we find the struct we need to implement in libr/include/r_asm.h

typedef struct r_asm_plugin_t {
	const char *name;
	const char *arch;
	const char *author;
	const char *version;
	const char *cpus;
	const char *desc;
	const char *license;
	void *user; // user data pointer
	int bits;
	int endian;
	bool (*init)(void *user);
	bool (*fini)(void *user);
	int (*disassemble)(RAsm *a, RAsmOp *op, const ut8 *buf, int len);
	int (*assemble)(RAsm *a, RAsmOp *op, const char *buf);
	RAsmModifyCallback modify;
	int (*set_subarch)(RAsm *a, const char *buf);
	char *(*mnemonics)(RAsm *a, int id, bool json);
	const char *features;
} RAsmPlugin;

Always visit https://github.com/radare/radare2/blob/master/libr/include/r_asm.h for the latest definition.

We now have to create our own C file which implements this functionality. We do this in libr/asm/p (“p” stands for plugin afaict):

#include <stdio.h>
#include <string.h>
#include <r_types.h>
#include <r_lib.h>
#include <r_asm.h>

static int disassemble(RAsm *a, RAsmOp *op, const ut8 *buf, int len) {
        return 0;
}

static int assemble(RAsm *a, RAsmOp *op, const char *buf) {
        return 0;
}

RAsmPlugin r_asm_plugin_sparcle = {
        .name = "sparcle",    
        .arch = "sparcle",    
        .author = "koffiedrinker",                                                                                                  
        .version = "0.1",     
        .desc = "SPARCle architecture",
        .bits = 8 | 16 | 32,  
        .endian = R_SYS_ENDIAN_LITTLE | R_SYS_ENDIAN_BIG, 
        .init = NULL, 
        .fini = NULL, 
        .disassemble = &disassemble,  
        .assemble = &assemble,
};


#ifndef CORELIB
RLibStruct radare_plugin = {
        .type = R_LIB_TYPE_ASM,
        .data = &r_asm_plugin_sparcle,
        .version = R2_VERSION
};
#endif

And we also create a Makefile in this directory, called libr/asm/p/sparcle.mk. Again, I copied from an easy architecture (Brainfuck) and changed the name everywhere…

OBJ_SPARCLE=asm_sparcle.o
TARGET_SPARCLE=asm_sparcle.${EXT_SO}
STATIC_OBJ+=${OBJ_SPARCLE}
ifeq ($(WITHPIC),1)
ALL_TARGETS+=${TARGET_SPARCLE}
${TARGET_SPARCLE}: ${OBJ_SPARCLE}
	${CC} $(call libname,asm_sparcle) ${LDFLAGS} ${CFLAGS} -o ${TARGET_SPARCLE} ${OBJ_SPARCLE}
endif

Add it to the main Makefile in the directory (libr/asm/p/Makefile):

ARCHS+=sparcle.mk

Now we just add our plugin as an external dependency in libr/include/r_asm.h (at the bottom):

extern RAsmPlugin r_asm_plugin_sparcle;

And then in plugins.def.cfg, we add our architecture to the static variable:

asm.sparcle

Compile and run:

[koffiedrinker@ctf radare2]$ ./configure
[koffiedrinker@ctf radare2]$ make
[koffiedrinker@ctf radare2]$ ./binr/radare2/radare2 -a sparcle -
Cannot set bits 64 to 'sparcle'
Cannot set bits 64 to 'sparcle'
 -- See you at the defcon CTF
[0x00000000]>

The error is probably because our architecture didn’t include 64 in its bits variable of RAsmPlugin. (I added it after this error and now everything is fine :P )
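
My guess at what is going on, sketched as standalone C: if bits is treated as a mask of supported widths, a requested width is only accepted when it shares a bit with that mask. The helper name supports_bits is made up for illustration, and whether radare2 performs exactly this test is an assumption on my part.

```c
#include <stdbool.h>

/* Hypothetical helper: true when the requested width is in the plugin's mask.
 * Assumption: radare2 checks something similar; this is not its actual code. */
static bool supports_bits(int plugin_bits, int requested) {
        return (plugin_bits & requested) != 0;
}
```

With the mask 8 | 16 | 32 a request for 64 fails, and after adding 64 to the mask it passes, which would match the error going away.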

Time to start implementing the disassemble function. Since I have never worked with the radare2 code before, the first thing to figure out is what we get as input and what we have to do (and what integer we have to return as output).

From the Brainfuck implementation, I gather that we get some bytes as input in the buf argument and then strcpy() a string with the disassembly (e.g. if the input is 0x90, the string “nop”) into op->buf_asm.

The Chip8 implementation uses r2 handy functions like r_read_be16() to read 16 bits (big-endian) and then goes from there. It uses snprintf(op->buf_asm, R_ASM_BUFSIZE, "nop") instead of just strcpy(). It also shows how to handle/show jump instructions depending on the value of the code: snprintf (op->buf_asm, R_ASM_BUFSIZE, "jp 0x%03x", nnn);

By searching for r_read_be16, I found all (?) handy functions like this:

  • r_read_le8
  • r_read_le16
  • r_read_le32
  • r_read_le64
  • r_read_be8
  • r_read_be16
  • r_read_be32
  • r_read_be64

Koffiedrinker from the future here: They’re defined in r_endian.h, and there are more than the ones listed above.
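
To make clear what these helpers do, here is a standalone sketch of a 16-bit little-endian and big-endian read. The names my_read_le16 and my_read_be16 are made up so they don't clash with the real ones in r_endian.h, and the real implementations may well look different.

```c
#include <stdint.h>

/* Hypothetical stand-ins for r_read_le16()/r_read_be16(); not the r2 code. */
static uint16_t my_read_le16(const uint8_t *p) {
        /* least significant byte comes first */
        return (uint16_t)(p[0] | (p[1] << 8));
}

static uint16_t my_read_be16(const uint8_t *p) {
        /* most significant byte comes first */
        return (uint16_t)((p[0] << 8) | p[1]);
}
```

For the two bytes 44 00, the little-endian read gives 0x0044 and the big-endian read gives 0x4400.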

RAsmOp is defined as:

typedef struct r_asm_op_t {
	int size; // instruction size
	int bitsize; // instruction size in bits (or 0 if fits in 8bit bytes)
	int payload; // size of payload (opsize = (size-payload))
	// But this is pretty slow..so maybe we should add some accessors
	ut8  buf[R_ASM_BUFSIZE + 1];
	char buf_asm[R_ASM_BUFSIZE + 1];
	char buf_hex[R_ASM_BUFSIZE + 1];
	RBuffer *buf_inc;
} RAsmOp;

Not sure why the implementations don’t set the other values.

The first argument passed to disassemble is of type RAsm. The definition is this:

typedef struct r_asm_t {
	char *cpu;
	int bits;
	int big_endian;
	int syntax;
	ut64 pc;
	void *user;
	_RAsmPlugin *cur;
	_RAsmPlugin *acur;
	RList *plugins;
	RBinBind binb;
	RParse *ifilter;
	RParse *ofilter;
	Sdb *pair;
	RSyscall *syscall;
	RNum *num;
	char *features;
	int invhex; // invalid instructions displayed in hex
	int pcalign;
	int dataalign;
	int bitshift;
	bool immdisp; // Display immediates with # symbol (for arm stuff).
	SdbHash *flags;
	int seggrn;
} RAsm;

So it basically contains the architecture environment. I can see how having easy access to these variables might help with some complex architectures (with support for different endianness, bits, …)

The fourth parameter (int len) probably contains the length of the buf parameter (since a lot of architectures will have opcodes with nullbytes in them).

Setting a print debugging environment

To get a hang of disassembling, I’m gonna add a handy function in my disassembler that’ll write to a logfile my debugging output (I’m stupid like this but I really like printf() debugging :) ).

/* dp, short for debug printing */
static void dp(char* str) {
        char* debugfile = "/tmp/r2_sparcle.log";
        FILE* fp = fopen(debugfile, "a");
        if(fp == NULL)
                return;

        fputs(str, fp); /* not fprintf(fp, str): str is data, not a format string */
        fclose(fp);
}

static int disassemble(RAsm *a, RAsmOp *op, const ut8 *buf, int len) {
        char debugprintbuf[256];
        dp("disassemble function called!\n");
        snprintf(debugprintbuf, 256, "\tCPU: %s, Bits: %d, Big Endian: %d\n", a->cpu, a->bits, a->big_endian);
        dp(debugprintbuf);
        return 0;
}

(Fugly code, I know. That’s how I roll.)
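
A variadic variant would save the snprintf-into-a-buffer dance at every call site. This is only a sketch of that idea: dpf is a hypothetical name, and the log path is passed in as a parameter just to keep the example self-contained.

```c
#include <stdarg.h>
#include <stdio.h>

/* dpf: like dp(), but takes a printf-style format and arguments. */
static int dpf(const char *path, const char *fmt, ...) {
        FILE *fp = fopen(path, "a");
        if (fp == NULL)
                return -1;

        va_list ap;
        va_start(ap, fmt);
        int n = vfprintf(fp, fmt, ap); /* forwards the variadic arguments */
        va_end(ap);
        fclose(fp);
        return n; /* characters written, or a negative value on error */
}
```

A call would then look like dpf("/tmp/r2_sparcle.log", "\tBits: %d\n", a->bits); with no intermediate buffer.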

When we compile and start r2 (r2 -a sparcle -b 32 -) and then perform pd 1, we get into our logfile:

[koffiedrinker@ctf tmp]$ tail -f r2_sparcle.log
disassemble function called!
        CPU: x86, Bits: 32, Big Endian: 0
disassemble function called!
        CPU: x86, Bits: 32, Big Endian: 0
disassemble function called!
        CPU: x86, Bits: 32, Big Endian: 0
disassemble function called!
        CPU: x86, Bits: 32, Big Endian: 0

I have no idea why the function is called four times when I only do pd 1. When I print out the buf pointer and length argument, I get:

        Buffer (buf): 0x5584e70f9040, Length (len): 16
        Buffer (buf): 0x5584e70fa410, Length (len): 16
        Buffer (buf): 0x5584e70fa410, Length (len): 16
        Buffer (buf): 0x5584e70fa410, Length (len): 16

Which doesn’t really make a lot of sense (why 3 times the same buffer and same size?).

Koffiedrinker from the future here: The disassemble function returns the number of bytes successfully disassembled. Because we were returning zero, radare2 did some (unknown) stuff. If we return 4 at the end of the function and do pd 1, we nicely consume four bytes (and only get called once!). If we do pd 10 we get:

        Buffer (buf): 0x558d9ca720d0, Length (len): 160
        Buffer (buf): 0x558d9ca720d4, Length (len): 156
        Buffer (buf): 0x558d9ca720d8, Length (len): 152
        Buffer (buf): 0x558d9ca720dc, Length (len): 148
        Buffer (buf): 0x558d9ca720e0, Length (len): 144
        Buffer (buf): 0x558d9ca720e4, Length (len): 140
        Buffer (buf): 0x558d9ca720e8, Length (len): 136
        Buffer (buf): 0x558d9ca720ec, Length (len): 132
        Buffer (buf): 0x558d9ca720f0, Length (len): 128
        Buffer (buf): 0x558d9ca720f4, Length (len): 124

Seems like expected behaviour!
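
The pattern in that log can be modelled with a toy loop: call the disassembler, advance by however many bytes it says it consumed, repeat. This is not how radare2 implements pd (I haven't read that code); toy_pd, toy_disassemble and disasm_fn are invented names, and the sketch only mirrors the behaviour seen above.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical callback type: returns the number of bytes consumed. */
typedef int (*disasm_fn)(const uint8_t *buf, int len);

/* Toy disassembler for a fixed 4-byte instruction set. */
static int toy_disassemble(const uint8_t *buf, int len) {
        (void)buf;
        return len < 4 ? 0 : 4;
}

/* Call fn up to count times, advancing by whatever it reports as consumed.
 * Returns how many instructions were decoded. */
static int toy_pd(disasm_fn fn, const uint8_t *buf, int len, int count) {
        int off = 0, done = 0;
        while (done < count) {
                int n = fn(buf + off, len - off);
                if (n <= 0)
                        break; /* nothing consumed: stop instead of spinning */
                printf("offset %d, remaining %d\n", off, len - off);
                off += n;
                done++;
        }
        return done;
}
```

With a 160-byte buffer and a count of 10, the remaining length goes 160, 156, 152, … just like in the log. It also shows why returning 0 is a problem for the caller: it has no idea how far to advance.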

Implementing SPARCle

This is actually a lot less work than I thought, since creating a disassembler is just making a nice string of each opcode.

SPARCle itself is very easy: the opcodes are always 4 bytes long and there are only 17 different instructions.

A very simple implementation is thus:

static int re_read_le24(char* str) {
        int r = 0;
        r += (str[2] << 16);
        r += (str[1] << 8);
        r += str[0];
        return r;
}

static int disassemble(RAsm *a, RAsmOp *op, const ut8 *buf, int len) {
        char debugprintbuf[256];
        //snprintf(debugprintbuf, 256, "\tCPU: %s, Bits: %d, Big Endian: %d\n", a->cpu, a->bits, a->big_endian);
        snprintf(debugprintbuf, 256, "\tBuffer (buf): %p, Length (len): %d\n", buf, len);
        dp(debugprintbuf);

        /* An instruction is always 4 bytes. Return if we don't get four bytes to read */
        if(len < 4) {
                return 0;
        }
        op->size = 4; /* Set opcode size to 4 bytes */

        /* FETCH */
        char opcode[4];
        memcpy(opcode, buf, 4);

        /* DECODE */
        switch(opcode[0]) {
                case 0x00:
                        snprintf(op->buf_asm, R_ASM_BUFSIZE, "exit %d", re_read_le24(opcode+1));
                        break;
                default:
                        snprintf(op->buf_asm, R_ASM_BUFSIZE, "Crash and burn.");
                        break;
        }

        return 4;
}

We just take an opcode, parse it and return the correct meaning of the bytes.

In radare2:

[koffiedrinker@ctf radare2]$ radare2 -a sparcle -b 32 -
 -- Press any key to continue ...
[0x00000000]> pd 10
            0x00000000      00000000       exit 0
            0x00000004      00000000       exit 0
            0x00000008      00000000       exit 0
            0x0000000c      00000000       exit 0
            0x00000010      00000000       exit 0
            0x00000014      00000000       exit 0
            0x00000018      00000000       exit 0
            0x0000001c      00000000       exit 0
            0x00000020      00000000       exit 0
            0x00000024      00000000       exit 0
[0x00000000]> wx 00000100
[0x00000000]> pd 10
            0x00000000      00000100       exit 256
            0x00000004      00000000       exit 0
            0x00000008      00000000       exit 0
            0x0000000c      00000000       exit 0
            0x00000010      00000000       exit 0
            0x00000014      00000000       exit 0
            0x00000018      00000000       exit 0
            0x0000001c      00000000       exit 0
            0x00000020      00000000       exit 0
            0x00000024      00000000       exit 0

Implementing all the opcodes:

	/* DECODE */
	switch(opcode[0]) {
		case 0x00:
			snprintf(op->buf_asm, R_ASM_BUFSIZE, "exit %d", r_read_le24(opcode+1));
			break;
		case 0x01:
			snprintf(op->buf_asm, R_ASM_BUFSIZE, "or r%d, r%d, r%d", opcode[1], opcode[2], opcode[3]);
			break;
		case 0x02:
			snprintf(op->buf_asm, R_ASM_BUFSIZE, "mov mem[0x%04x], r%d", r_read_le16(opcode+1), opcode[3]);
			break;
		case 0x03:
			snprintf(op->buf_asm, R_ASM_BUFSIZE, "mov r%d, mem[0x%04x]", opcode[1], r_read_le16(opcode+2));
			break;
		case 0x04:
			snprintf(op->buf_asm, R_ASM_BUFSIZE, "mov 0x%04x, r%d", r_read_le16(opcode+1), opcode[3]);
			break;
		case 0x05:
			snprintf(op->buf_asm, R_ASM_BUFSIZE, "and r%d, r%d, r%d", opcode[1], opcode[2], opcode[3]);
			break;
		case 0x06:
			snprintf(op->buf_asm, R_ASM_BUFSIZE, "shl r%d, %d, r%d", opcode[1], opcode[2], opcode[3]);
			break;
		case 0x07:
			snprintf(op->buf_asm, R_ASM_BUFSIZE, "shr r%d, %d, r%d", opcode[1], opcode[2], opcode[3]);
			break;
		case 0x08:
			snprintf(op->buf_asm, R_ASM_BUFSIZE, "add r%d, r%d, r%d", opcode[1], opcode[2], opcode[3]);
			break;
		case 0x09:
			snprintf(op->buf_asm, R_ASM_BUFSIZE, "cmp r%d, r%d, r%d", opcode[1], opcode[2], opcode[3]);
			break;
		case 0x0a:
			snprintf(op->buf_asm, R_ASM_BUFSIZE, "je r%d, 0x%04x", opcode[1], r_read_le16(opcode+2));
			break;
		case 0x0b:
			snprintf(op->buf_asm, R_ASM_BUFSIZE, "jne r%d, 0x%04x", opcode[1], r_read_le16(opcode+2));
			break;
		case 0x0c:
			snprintf(op->buf_asm, R_ASM_BUFSIZE, "jl r%d, 0x%04x", opcode[1], r_read_le16(opcode+2));
			break;
		case 0x0d:
			snprintf(op->buf_asm, R_ASM_BUFSIZE, "jg r%d, 0x%04x", opcode[1], r_read_le16(opcode+2));
			break;
		case 0x0e:
			snprintf(op->buf_asm, R_ASM_BUFSIZE, "mov mem[r%d], r%d", opcode[1], opcode[2]);
			break;
		case 0x0f:
			snprintf(op->buf_asm, R_ASM_BUFSIZE, "mov r%d, mem[r%d]", opcode[1], opcode[2]);
			break;
		case 0x10:
			snprintf(op->buf_asm, R_ASM_BUFSIZE, "shl r%d, r%d, r%d", opcode[1], opcode[2], opcode[3]);
			break;
		default:
			snprintf(op->buf_asm, R_ASM_BUFSIZE, "Crash and burn.");
			break;
	}

I also had to fix an implementation error in the r_read functions that I implemented, where a char value of 0xff would become -1 (0xffffffff) when cast to an integer.
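
For the record, this is the kind of thing that bit me, as a standalone sketch. The function names are made up for this example, and it assumes a platform where char is signed and two's complement (as on x86).

```c
#include <stdint.h>

/* What goes wrong: a signed char holding 0xff widens to -1, not 255. */
static int widen_signed(signed char c) {
        return c;
}

/* Fix: read through an unsigned byte type before shifting and combining. */
static uint32_t read_le24(const uint8_t *p) {
        return (uint32_t)p[0] | ((uint32_t)p[1] << 8) | ((uint32_t)p[2] << 16);
}
```

With unsigned bytes, the operand bytes 00 01 00 from the wx example above come out as 256, and a 0xff byte stays 255 instead of smearing ones over the upper bits.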

Eventually I ended up with this.

Implementing an assembler

From the Brainfuck assembler, I gather that I need to write to op->buf the machine code that is represented by the string passed to it. But I still don’t know exactly what I’m getting passed. Let’s figure that out with our debug printing…

static int assemble(RAsm *a, RAsmOp *op, const char *buf) {
	dp("Assemble() called!\n");
	char pbuf[256];
	snprintf(pbuf, 256, "Buf: %s\n", buf);
	dp(pbuf);
	return 0;
}

[koffiedrinker@ctf radare2]$ r2 -a sparcle -b 32 -
 -- Mess with the best, Die like the rest
[0x00000000]> pa hahaha; hehehe; sow
00000000
[0x00000001]> q
[koffiedrinker@ctf radare2]$ rasm2 -a sparcle "jmp hue"
00000000
[koffiedrinker@ctf radare2]$ rasm2 -a sparcle "jmp hue; jmp hie; jmp hah"                                                           
000000000000000000000000

Gives:

Assemble() called!
Buf: hahaha
Assemble() called!
Buf: hahaha
Assemble() called!
Buf: hahaha
Assemble() called!
Buf: jmp hue
Assemble() called!
Buf: jmp hue
Assemble() called!
Buf: jmp hue
Assemble() called!
Buf: jmp hue
Assemble() called!
Buf: jmp hie
Assemble() called!
Buf: jmp hah
Assemble() called!
Buf: jmp hue
Assemble() called!
Buf: jmp hie
Assemble() called!
Buf: jmp hah
Assemble() called!
Buf: jmp hue
Assemble() called!
Buf: jmp hie
Assemble() called!
Buf: jmp hah

So we get passed one instruction at a time. Great!

Since it’s just implementing an assembler from here on, I’ll just give this link to the code with the function implemented. It was a real pain, I can tell you. It can assemble all the instructions, but you need to make sure that anything within brackets is written without a space.

As an example:

[koffiedrinker@ctf radare2]$ rasm2 -a sparcle 'mov 0, r2 ; mov mem[r2], r3 ; mov 0, r4 ; cmp r4, r3, r4 ; je r4, 68 ; or r3, r0, r4 ; shl r3, 4, r3 ; add r3, r4, r3 ; shl r3, 3, r3 ; add, r3, r4, r3 ; mov 255, r4 ; and r3, r4, r3 ; mov r3, mem[r2] ; mov 1, r4 ; add r2, r4, r2 ; cmp r0, r0, r4 ; je r4, 4 ; exit 0 ; '
040000020e02030004000004090403040a044400010300040603040308030403060303030803040304ff0004050304030f0302000401000408020402090000040a04040000000000
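
To give an idea of the approach without the pain, here is a cut-down sketch that handles four instruction forms with sscanf. It is not the assembler I ended up with: toy_assemble is a hypothetical name, operands are decimal only, and the encodings are simply read off the disassembler switch and the example output above.

```c
#include <stdint.h>
#include <stdio.h>

/* Assemble one instruction into out[4]; returns 4 on success, 0 if unknown.
 * The trailing %c is only there to detect junk after the last operand. */
static int toy_assemble(const char *s, uint8_t out[4]) {
        unsigned a, b, c;
        char extra;

        if (sscanf(s, " exit %u %c", &a, &extra) == 1 && a <= 0xffffff) {
                out[0] = 0x00;                  /* exit imm24 (little-endian) */
                out[1] = a & 0xff;
                out[2] = (a >> 8) & 0xff;
                out[3] = (a >> 16) & 0xff;
                return 4;
        }
        if (sscanf(s, " mov %u , r%u %c", &a, &b, &extra) == 2 && a <= 0xffff && b <= 0xff) {
                out[0] = 0x04;                  /* mov imm16, reg */
                out[1] = a & 0xff;
                out[2] = (a >> 8) & 0xff;
                out[3] = b;
                return 4;
        }
        if (sscanf(s, " je r%u , %u %c", &a, &b, &extra) == 2 && a <= 0xff && b <= 0xffff) {
                out[0] = 0x0a;                  /* je reg, addr16 */
                out[1] = a;
                out[2] = b & 0xff;
                out[3] = (b >> 8) & 0xff;
                return 4;
        }
        if (sscanf(s, " add r%u , r%u , r%u %c", &a, &b, &c, &extra) == 3
                        && a <= 0xff && b <= 0xff && c <= 0xff) {
                out[0] = 0x08;                  /* add reg, reg, reg */
                out[1] = a;
                out[2] = b;
                out[3] = c;
                return 4;
        }
        return 0; /* unknown instruction or malformed operands */
}
```

Feeding it "je r4, 68" gives the bytes 0a 04 44 00, the same as the fifth instruction in the rasm2 output above. Returning the number of bytes written mirrors what the disassembler does with the number of bytes consumed.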

Now back to the fun stuff!

How can I get fancy lines from jumps to their location?

Since our disassembler function is just taking hex bytes and converting them to a readable string, radare2 doesn’t actually understand the meaning behind these strings. A peek into the file list above suggests that meaning is added in the anal(ysis) plugin. For example, take a look at the Chip8 implementation:


static int chip8_anop(RAnal *anal, RAnalOp *op, ut64 addr, const ut8 *data, int len) {
	memset (op, '\0', sizeof (RAnalOp));
	ut16 opcode = r_read_be16 (data);
	uint8_t nibble = opcode & 0x0F;
	uint16_t nnn = opcode & 0x0FFF;
	uint8_t kk = opcode & 0xFF;
	op->size = 2;
	op->addr = addr;
	switch (opcode & 0xF000) {

	case 0x1000:
		op->type = R_ANAL_OP_TYPE_JMP;
		op->jump = nnn;
		op->fail = addr + op->size;
		break;

	//...
	case 0x8000:
		switch (nibble) {
		case 0x0:
			op->type = R_ANAL_OP_TYPE_UNK;
			break;
		case 0x1:
			op->type = R_ANAL_OP_TYPE_OR;
			break;
		case 0x2:
			op->type = R_ANAL_OP_TYPE_AND;
			break;
		case 0x3:
			op->type = R_ANAL_OP_TYPE_XOR;
			break;
		case 0x4:
			op->type = R_ANAL_OP_TYPE_ADD;
			break;
		//...
		}
	}
	// ...
}

RAnalPlugin r_anal_plugin_chip8 = {
	.name = "chip8",
	.desc = "CHIP8 analysis plugin",
	.license = "LGPL3",
	.arch = "chip8",
	.bits = 32,
	.op = &chip8_anop,
};
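
Translating that pattern to SPARCle, the jump lines should come from filling in the jump target and the fall-through address for the four conditional jumps. Below is a standalone sketch using a mock struct instead of the real RAnalOp. MockOp, the MOCK_TYPE_* values and mock_sparcle_op are all invented for this example, and it assumes the 16-bit operand is an absolute address (which fits the assembler example earlier, where je r4, 68 targets the final exit at offset 68).

```c
#include <stdint.h>

/* Mock of the few RAnalOp fields used here; not the real r2 struct. */
typedef struct {
        int type;
        int size;
        uint64_t addr, jump, fail;
} MockOp;

enum { MOCK_TYPE_UNK = 0, MOCK_TYPE_CJMP = 1 }; /* placeholder values */

static int mock_sparcle_op(MockOp *op, uint64_t addr, const uint8_t *data, int len) {
        if (len < 4)
                return 0;
        op->type = MOCK_TYPE_UNK;
        op->size = 4;
        op->addr = addr;
        op->jump = op->fail = 0;
        if (data[0] >= 0x0a && data[0] <= 0x0d) { /* je, jne, jl, jg */
                op->type = MOCK_TYPE_CJMP;
                /* 16-bit little-endian operand, assumed to be an absolute target */
                op->jump = (uint64_t)(data[2] | (data[3] << 8));
                /* where execution continues when the branch is not taken */
                op->fail = addr + op->size;
        }
        return op->size;
}
```

In a real plugin the type would presumably be R_ANAL_OP_TYPE_CJMP and the fields would live in RAnalOp, like in the Chip8 code above.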

Let’s take a look at the RAnalPlugin definition:

typedef struct r_anal_plugin_t {
	char *name;
	char *desc;
	char *license;
	char *arch;
	char *author;
	char *version;
	int bits;
	int esil; // can do esil or not
	int fileformat_type;
	int custom_fn_anal;
	int (*init)(void *user);
	int (*fini)(void *user);
	int (*reset_counter) (RAnal *anal, ut64 start_addr);
	int (*archinfo)(RAnal *anal, int query);
	ut8* (*anal_mask)(RAnal *anal, int size, const ut8 *data, ut64 at);

	// legacy r_anal_functions
	RAnalOpCallback op;
	RAnalBbCallback bb;
	RAnalFnCallback fcn;

	// overide the default analysis function in r_core_anal_fcn
	RAnalAnalyzeFunctions analyze_fns;

	// parse elements from a buffer
	RAnalOpFromBuffer op_from_buffer;
	RAnalBbFromBuffer bb_from_buffer;
	RAnalFnFromBuffer fn_from_buffer;

	// analysis algorithm to use instead of the default
	// r_anal_ex_recursive_descent when using perform_analysis from
	// RAnalEx stuffs
	RAnalExAnalysisAlgorithm analysis_algorithm;
	// order in which these call backs are
	// used with the recursive descent disassembler
	// analysis
	// 0) Before performing any analysis is start, opportunity to do any pre analysis.
	// in the current function
	RAnalExCallback pre_anal;
	// 1) Before any ops are bbs are created
	RAnalExCallback pre_anal_fn_cb;
	// 2) Just Before an op is created.
	// if current_op is set in state, then an op in the main alg wont be processed
	RAnalExCallback pre_anal_op_cb;
	// 3) After a op is created.
	// the current_op in state is used to fix-up the state of op before creating a bb
	RAnalExCallback post_anal_op_cb;
	// 4) Before a bb is created.
	// if current_op is set in state, then an op in the main alg wont be processed
	RAnalExCallback pre_anal_bb_cb;
	// 5) After a bb is created.
	// the current_bb in state is used to fix-up the state of before performing analysis
	// with the current bb
	RAnalExCallback post_anal_bb_cb;
	// 6) After processing is bb and cb is completed, opportunity to do any post analysis.
	// in the current function
	RAnalExCallback post_anal_fn_cb;

	// 6) After bb in a node is completed, opportunity to do any post analysis.
	// in the current function
	RAnalExCallback post_anal;

	RAnalExCallback revisit_bb_anal;

	// command extension to directly call any analysis functions
	RAnalCmdExt cmd_ext;

	RAnalRegProfCallback set_reg_profile;
	RAnalRegProfGetCallback get_reg_profile;
	RAnalFPBBCallback fingerprint_bb;
	RAnalFPFcnCallback fingerprint_fcn;
	RAnalDiffBBCallback diff_bb;
	RAnalDiffFcnCallback diff_fcn;
	RAnalDiffEvalCallback diff_eval;

	RAnalIsValidOffsetCB is_valid_offset;

	RAnalEsilCB esil_init; // initialize esil-related stuff
	RAnalEsilLoopCB esil_post_loop;	//cycle-counting, firing interrupts, ...
	RAnalEsilInterruptCB esil_intr; // interrupts
	RAnalEsilTrapCB esil_trap; // traps / exceptions
	RAnalEsilCB esil_fini; // deinitialize
} RAnalPlugin;

That was more than I expected.

It also seems that the op function (that Chip8 uses) is actually a legacy function, according to the struct comments. The old way seems to provide a callback whenever an opcode is found (this is the opcode we define in RAsmPlugin, we pass length and a string to a RAsmOp) with RAnalOpCallback, a basic block callback and a function callback.

The new way is a more in depth way of being involved in the analysis.

  • RAnalExAnalysisAlgorithm analysis_algorithm: If you want to replace the r_anal_ex_recursive_descent implementation.
  • RAnalExCallback pre_anal: Before any analysis is done.
  • RAnalExCallback pre_anal_fn_cb: Before Analysis of “function” in current block? Comments say that this is before there are any basic blocks at all… Need to see what’s passed to it to really understand it.
  • RAnalExCallback pre_anal_op_cb: Before Analysis of “operation/opcode” in current block, so just before an op is created. If you set the state variable in (current_)op, then the main algorithm won’t do anything with it.
  • RAnalExCallback post_anal_op_cb: After an opcode is created. Use the state variable in (current_)op.
  • RAnalExCallback pre_anal_bb_cb: Before Analysis of basic blocks (so before they’re created) (State in current_bb)
  • RAnalExCallback post_anal_bb_cb: After a basic block was created, see state variable again I guess?
  • RAnalExCallback post_anal_fn_cb: After a “function” (I think) was analyzed (so when it hits a ret or something and the end of a function is reached?).
  • RAnalExCallback post_anal: After the analysis loop (I think)
  • RAnalExCallback revisit_bb_anal: In case you need another go at each basic block?

We can also overwrite the “function finder” with analyze_fns. There’s some stuff about ESIL (I’ll ignore it for now) and then there are three functions to overwrite from buffers: RAnalOpFromBuffer, RAnalBbFromBuffer and RAnalFnFromBuffer.

There’s also this list of functions:

  • RAnalRegProfCallback set_reg_profile
  • RAnalRegProfGetCallback get_reg_profile
  • RAnalFPBBCallback fingerprint_bb
  • RAnalFPFcnCallback fingerprint_fcn
  • RAnalDiffBBCallback diff_bb
  • RAnalDiffFcnCallback diff_fcn
  • RAnalDiffEvalCallback diff_eval

But we don’t care about them right now. Let’s try and use post_anal_op_cb and see whether we can update the opcode that was created with the meaning.

/* radare - Apache 2.0 - Copyright 2010-2015 - pancake and koffiedrinker */

#include <stdio.h>
#include <string.h>
#include <r_types.h>
#include <r_lib.h>
#include <r_asm.h>
#include <r_anal.h>
#include <r_anal_ex.h>
#include <r_cons.h>

/* dp, short for debug printing */
static void dp(char* str) {
        char* debugfile = "/tmp/r2_sparcle.log";
        FILE* fp = fopen(debugfile, "a");
        if(fp == NULL)
                return;

        fputs(str, fp); /* not fprintf(fp, str): str is data, not a format string */
        fclose(fp);
}

static int sparcle_op(RAnal *anal, RAnalState *state, ut64 addr) {
        dp("Anal Sparcle says hi!\n");
        return 0; /* Return value doesn't matter since r2 only calls it and never checks the result. */
}

RAnalPlugin r_anal_plugin_sparcle = {
        .name = "sparcle",
        .desc = "sparcle analysis plugin",
        .license = "Apache",
        .arch = "sparcle",
        .bits = 8 | 16 | 32 | 64,
        .post_anal_op_cb = &sparcle_op,
        0
};

#ifndef CORELIB
RLibStruct radare_plugin = {
        .type = R_LIB_TYPE_ANAL,
        .data = &r_anal_plugin_sparcle,
        .version = R2_VERSION
};
#endif

Should work (a).

As with the RAsmPlugin, we need to create a build file, add our build file to the Makefile, and add the plugin to r_anal.h and to the plugin list.

Build file libr/anal/p/sparcle.mk (copied from Brainfuck build file, once again :) ):

OBJ_SPARCLE=anal_sparcle.o

STATIC_OBJ+=${OBJ_SPARCLE}
TARGET_SPARCLE=anal_sparcle.${EXT_SO}

ALL_TARGETS+=${TARGET_SPARCLE}

${TARGET_SPARCLE}: ${OBJ_SPARCLE}
	${CC} $(call libname,anal_sparcle) ${LDFLAGS} ${CFLAGS} -o anal_sparcle.${EXT_SO} ${OBJ_SPARCLE}

Add it to libr/anal/p/Makefile:

ARCHS=null.mk ppc_gnu.mk ppc_cs.mk arm_gnu.mk avr.mk xap.mk dalvik.mk sh.mk ebc.mk gb.mk malbolge.mk ws.mk h8300.mk cr16.mk v850.mk \
msp430.mk sparc_gnu.mk sparc_cs.mk x86_cs.mk cris.mk 6502.mk snes.mk riscv.mk vax.mk xtensa.mk rsp.mk sparcle.mk

Add a line in libr/include/r_anal.h:

extern RAnalPlugin r_anal_plugin_sparcle;

And in the plugins.def.cfg file:

anal.sparcle

Compile and run :)

make mrproper
./configure
make
sudo make install

And when we run it, exactly nothing happens. Not even if I run “aa”, “aaa” or “aaaa”. Hmpf.

static void r_anal_ex_perform_post_anal_op_cb(RAnal *anal, RAnalState *state, ut64 addr) {
        printf("Is this all just a big lie?\n");
        if (anal->cur && anal->cur->post_anal_op_cb) {
                printf("It seems not.\n");
                anal->cur->post_anal_op_cb (anal, state, addr);
        }
}

So let’s backtrack how this function is used in the Java analysis plugin:

  • r_anal_ex_perform_post_anal_op_cb is called in r_anal_ex_analysis_driver.
  • r_anal_ex_analysis_driver is called in r_anal_ex_perform_analysis.
  • r_anal_ex_perform_analysis is called in Java Analysis plugin by handle_bb_cf_recursive_descent (four times), java_post_anal_linear_sweep (one time) and analyze_method (one time).
  • analyze_method is called from analyze_from_code_buffer.
  • java_post_anal_linear_sweep is passed to the ls RAnalPlugin struct (.post_anal).
  • handle_bb_cf_recursive_descent is called from java_recursive_descent.
  • java_recursive_descent is passed to the (normal) RAnalPlugin struct (.post_anal_bb_cb)
  • analyze_from_code_buffer is called from analyze_from_code_attr and java_analyze_fns_from_buffer.
  • java_analyze_fns_from_buffer is called from java_analyze_fns which is called from both RAnalPlugin structs (.analyze_fns)
  • analyze_from_code_attr is called from java_analyze_fns which is called from both RAnalPlugin structs (.analyze_fns).

Because that’s quite a lot of code to go through, I asked on IRC whether I should actually use these new functions. The answer was (as I expected) that they were made for the Java plugin and were not really expected to be used.

So good news, we can just use .op.

This means our code becomes:

/* typedef int (*RAnalOpCallback)(RAnal *a, RAnalOp *op, ut64 addr, const ut8 *data, int len); */
static int sparcle_op(RAnal *anal, RAnalOp *op, ut64 addr, const ut8 *data, int len) {
        dp("Anal Sparcle says hi!\n");
        char pbuf[256];
        snprintf(pbuf, 256, "Data length: %d, Data: %x, Op: %p\n", len, data, op);
        dp(pbuf);
        return 0;
}

RAnalPlugin r_anal_plugin_sparcle = {
        .name = "sparcle",
        .desc = "sparcle analysis plugin",
        .license = "Apache",
        .arch = "sparcle",
        .bits = 8 | 16 | 32 | 64,
        .op = &sparcle_op,
        0
};

which gives:

....
Anal Sparcle says hi!
Data length: 1024, Data: 1424ef10, Op: 0x7ffd96f03b00
Anal Sparcle says hi!
Data length: 256, Data: 141060d0, Op: 0x7ffd96f036e0
Anal Sparcle says hi!
Data length: 256, Data: 141060d0, Op: 0x7ffd96f036e0
Anal Sparcle says hi!
Data length: 4096, Data: 1424dd90, Op: 0x7ffd96f04660

Since this function more or less expects us to disassemble the bytes, we will copy the code that we made in the disassemble function and clean it up a bit. We then just use the op types defined in r_anal.h.

typedef enum {
	R_ANAL_OP_TYPE_COND  = 0x80000000, // TODO must be moved to prefix?
	//TODO: MOVE TO PREFIX .. it is used by anal_ex.. must be updated
	R_ANAL_OP_TYPE_REP   = 0x40000000, /* repeats next instruction N times */
	R_ANAL_OP_TYPE_MEM   = 0x20000000, // TODO must be moved to prefix?
	R_ANAL_OP_TYPE_REG   = 0x10000000, // operand is a register
	R_ANAL_OP_TYPE_IND   = 0x08000000, // operand is indirect
	R_ANAL_OP_TYPE_NULL  = 0,
	R_ANAL_OP_TYPE_JMP   = 1,  /* mandatory jump */
	R_ANAL_OP_TYPE_UJMP  = 2,  /* unknown jump (register or so) */
	R_ANAL_OP_TYPE_RJMP  = R_ANAL_OP_TYPE_REG | R_ANAL_OP_TYPE_UJMP,
	R_ANAL_OP_TYPE_IJMP  = R_ANAL_OP_TYPE_IND | R_ANAL_OP_TYPE_UJMP,
	R_ANAL_OP_TYPE_IRJMP = R_ANAL_OP_TYPE_IND | R_ANAL_OP_TYPE_REG | R_ANAL_OP_TYPE_UJMP,
	R_ANAL_OP_TYPE_CJMP  = R_ANAL_OP_TYPE_COND | R_ANAL_OP_TYPE_JMP,  /* conditional jump */
	R_ANAL_OP_TYPE_MJMP  = R_ANAL_OP_TYPE_MEM | R_ANAL_OP_TYPE_JMP,  /* conditional jump */
	R_ANAL_OP_TYPE_UCJMP = R_ANAL_OP_TYPE_COND | R_ANAL_OP_TYPE_UJMP, /* conditional unknown jump */
	R_ANAL_OP_TYPE_CALL  = 3,  /* call to subroutine (branch+link) */
	R_ANAL_OP_TYPE_UCALL = 4, /* unknown call (register or so) */
	R_ANAL_OP_TYPE_RCALL = R_ANAL_OP_TYPE_REG | R_ANAL_OP_TYPE_UCALL,
	R_ANAL_OP_TYPE_ICALL = R_ANAL_OP_TYPE_IND | R_ANAL_OP_TYPE_UCALL,
	R_ANAL_OP_TYPE_IRCALL= R_ANAL_OP_TYPE_IND | R_ANAL_OP_TYPE_REG | R_ANAL_OP_TYPE_UCALL,
	R_ANAL_OP_TYPE_CCALL = R_ANAL_OP_TYPE_COND | R_ANAL_OP_TYPE_CALL, /* conditional call to subroutine */
	R_ANAL_OP_TYPE_UCCALL= R_ANAL_OP_TYPE_COND | R_ANAL_OP_TYPE_UCALL, /* conditional unknown call */
	R_ANAL_OP_TYPE_RET   = 5, /* returns from subroutine */
	R_ANAL_OP_TYPE_CRET  = R_ANAL_OP_TYPE_COND | R_ANAL_OP_TYPE_RET, /* conditional return from subroutine */
	R_ANAL_OP_TYPE_ILL   = 6,  /* illegal instruction // trap */
	R_ANAL_OP_TYPE_UNK   = 7, /* unknown opcode type */
	R_ANAL_OP_TYPE_NOP   = 8, /* does nothing */
	R_ANAL_OP_TYPE_MOV   = 9, /* register move */
	R_ANAL_OP_TYPE_CMOV  = 9 | R_ANAL_OP_TYPE_COND, /* conditional move */
	R_ANAL_OP_TYPE_TRAP  = 10, /* it's a trap! */
	R_ANAL_OP_TYPE_SWI   = 11,  /* syscall, software interrupt */
	R_ANAL_OP_TYPE_UPUSH = 12, /* unknown push of data into stack */
	R_ANAL_OP_TYPE_PUSH  = 13,  /* push value into stack */
	R_ANAL_OP_TYPE_POP   = 14,   /* pop value from stack to register */
	R_ANAL_OP_TYPE_CMP   = 15,  /* compare something */
	R_ANAL_OP_TYPE_ACMP  = 16,  /* compare via and */
	R_ANAL_OP_TYPE_ADD   = 17,
	R_ANAL_OP_TYPE_SUB   = 18,
	R_ANAL_OP_TYPE_IO    = 19,
	R_ANAL_OP_TYPE_MUL   = 20,
	R_ANAL_OP_TYPE_DIV   = 21,
	R_ANAL_OP_TYPE_SHR   = 22,
	R_ANAL_OP_TYPE_SHL   = 23,
	R_ANAL_OP_TYPE_SAL   = 24,
	R_ANAL_OP_TYPE_SAR   = 25,
	R_ANAL_OP_TYPE_OR    = 26,
	R_ANAL_OP_TYPE_AND   = 27,
	R_ANAL_OP_TYPE_XOR   = 28,
	R_ANAL_OP_TYPE_NOR   = 29,
	R_ANAL_OP_TYPE_NOT   = 30,
	R_ANAL_OP_TYPE_STORE = 31,  /* store from register to memory */
	R_ANAL_OP_TYPE_LOAD  = 32,  /* load from memory to register */
	R_ANAL_OP_TYPE_LEA   = 33, /* TODO add ulea */
	R_ANAL_OP_TYPE_LEAVE = 34,
	R_ANAL_OP_TYPE_ROR   = 35,
	R_ANAL_OP_TYPE_ROL   = 36,
	R_ANAL_OP_TYPE_XCHG  = 37,
	R_ANAL_OP_TYPE_MOD   = 38,
	R_ANAL_OP_TYPE_SWITCH = 39,
	R_ANAL_OP_TYPE_CASE = 40,
	R_ANAL_OP_TYPE_LENGTH = 41,
	R_ANAL_OP_TYPE_CAST = 42,
	R_ANAL_OP_TYPE_NEW = 43,
	R_ANAL_OP_TYPE_ABS = 44,
	R_ANAL_OP_TYPE_CPL = 45,	/* complement */
	R_ANAL_OP_TYPE_CRYPTO = 46,
	R_ANAL_OP_TYPE_SYNC = 47,
	//R_ANAL_OP_TYPE_DEBUG = 43, // monitor/trace/breakpoint
#if 0
	R_ANAL_OP_TYPE_PRIV = 40, /* priviledged instruction */
	R_ANAL_OP_TYPE_FPU = 41, /* floating point stuff */
#endif
} _RAnalOpType;

Giving us:

static int sparcle_op(RAnal *anal, RAnalOp *op, ut64 addr, const ut8 *data, int len) {
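	char pbuf[256];
	/* addr is a ut64, so print it with r2's portable 64-bit format macro rather than %ld */
	snprintf(pbuf, sizeof(pbuf), "Addr: %"PFMT64u"\n", addr);
	dp(pbuf);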

	/* An instruction is always 4 bytes. Return if we don't get four bytes to read */
	if(len < 4) {
		return 0;
	}
	op->size = 4; /* Set opcode size to 4 bytes */

	/* FETCH */
	ut8 opcode[4]; /* unsigned bytes, so values >= 0x80 are never sign-extended */
	memcpy(opcode, data, 4);

	/* DECODE */
	switch(opcode[0]) {
		case 0x00:
			/* snprintf(op->buf_asm, R_ASM_BUFSIZE, "exit %d", r_read_le24(opcode+1)); */
			break;
		case 0x01:
			op->type = R_ANAL_OP_TYPE_OR;
			break;
		case 0x02:
			op->type = R_ANAL_OP_TYPE_LOAD;
			break;
		case 0x03:
			op->type = R_ANAL_OP_TYPE_STORE;
			break;
		case 0x04:
			/* snprintf(op->buf_asm, R_ASM_BUFSIZE, "mov 0x%04x, r%d", r_read_le16(opcode+1), opcode[3]); */
			break;
		case 0x05:
			op->type = R_ANAL_OP_TYPE_AND;
			break;
		case 0x06:
			op->type = R_ANAL_OP_TYPE_SHL;
			break;
		case 0x07:
			op->type = R_ANAL_OP_TYPE_SHR;
			break;
		case 0x08:
			op->type = R_ANAL_OP_TYPE_ADD;
			break;
		case 0x09:
			op->type = R_ANAL_OP_TYPE_CMP;
			break;
		case 0x0a:
		case 0x0b:
		case 0x0c:
		case 0x0d:
			/* All four jump opcodes keep their 16-bit target in bytes 2-3 */
			op->type = R_ANAL_OP_TYPE_JMP;
			op->fail = addr + 4; /* fall-through: the next instruction starts four bytes later */
			op->jump = (ut64) r_read_le16(opcode+2);
			break;
		case 0x0e:
			op->type = R_ANAL_OP_TYPE_LOAD;
			break;
		case 0x0f:
			op->type = R_ANAL_OP_TYPE_STORE;
			break;
		case 0x10:
			op->type = R_ANAL_OP_TYPE_SHL;
			break;
		default:
			break;
	}

	return 4;
}

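The only gotcha here was that I initially used R_ANAL_OP_TYPE_UJMP as the type. That one means "unknown jump" (the target is not known statically), so r2 does not draw any arrows for it. Since the target is encoded right in the instruction, use R_ANAL_OP_TYPE_JMP and fill in op->jump and op->fail.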
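To make the jump/fail part concrete, here is a minimal standalone sketch of the same decoding with no r2 headers involved. The struct SparcleBranch and the helpers read_le16() and sparcle_decode_branch() are names I made up for illustration (they only mimic op->jump, op->fail and r_read_le16()), and I am assuming, as in the switch above, that opcodes 0x0a through 0x0d are the jumps and that bytes 2-3 hold a little-endian absolute target.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the two RAnalOp fields filled in for jumps. */
typedef struct {
	uint64_t jump; /* address reached when the jump is taken */
	uint64_t fail; /* address of the fall-through instruction */
} SparcleBranch;

/* Local little-endian 16-bit read (plays the role of r_read_le16). */
static uint16_t read_le16(const uint8_t *p) {
	return (uint16_t)(p[0] | (p[1] << 8));
}

/* Returns 1 and fills *out when insn is one of the jump opcodes
 * (assumed to be 0x0a..0x0d), otherwise returns 0 and leaves *out alone. */
static int sparcle_decode_branch(uint64_t addr, const uint8_t *insn,
                                 size_t len, SparcleBranch *out) {
	if (len < 4 || insn[0] < 0x0a || insn[0] > 0x0d) {
		return 0;
	}
	out->jump = read_le16(insn + 2); /* target lives in bytes 2-3 */
	out->fail = addr + 4;            /* every instruction is 4 bytes */
	return 1;
}
```

Feeding it the bytes 0a 04 44 00 at address 0x10 gives jump = 0x44 and fail = 0x14, which are the two edges r2 needs to draw an arrow and continue analysis on the next instruction.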

Compile and run:

[koffiedrinker@ctf radare2]$ r2 -A -a sparcle -b 32 sparcle_machinecode.bin
[x] Analyze all flags starting with sym. and entry0 (aa)
[ ]
[Value from 0x00000000 to 0x00000049
aav: 0x00000000-0x00000049 in 0x0-0x49
[x] Analyze function calls (aac)
[x] Analyze len bytes of instructions for references (aar)
[x] Use -AA or aaaa to perform additional experimental analysis.
[x] Constructing a function name for fcn.* and sym.func.* functions (aan)
 -- You haxor! Me jane?
[0x00000000]> pd 20
/ (fcn) fcn.00000000 28
|   fcn.00000000 ();
|           0x00000000      04000002       mov 0x0000, r2
|           ; CODE XREF from 0x00000040 (fcn.00000000 + 64)
|       .-> 0x00000004      0e020300       mov mem[r2], r3
|       :   0x00000008      04000004       mov 0x0000, r4
|       :   0x0000000c      09040304       cmp r4, r3, r4
|      ,==< 0x00000010      0a044400       je r4, 0x0044
       |:   0x00000014      01030004       or r3, r0, r4
       |:   0x00000018      06030403       shl r3, 4, r3
       |:   0x0000001c      08030403       add r3, r4, r3
       |:   0x00000020      06030303       shl r3, 3, r3
       |:   0x00000024      08030403       add r3, r4, r3
       |:   0x00000028      04ff0004       mov 0x00ff, r4
       |:   0x0000002c      05030403       and r3, r4, r3
       |:   0x00000030      0f030200       mov r3, mem[r2]
       |:   0x00000034      04010004       mov 0x0001, r4
       |:   0x00000038      08020402       add r2, r4, r2
       |:   0x0000003c      09000004       cmp r0, r0, r4
       |`=< 0x00000040      0a040400       je r4, 0x0004               ; fcn.00000000+0x4
|      |    ; CODE XREF from 0x00000010 (fcn.00000000)
|      `--> 0x00000044      00000000       exit 0
\           0x00000048      a0ffffff       Crash and burn.
            0x0000004c      ffffffff       Crash and burn.
[0x00000000]>

And we have arrows and XREFs!
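As a sanity check on those XREFs, the following sketch walks the same 72 bytes (0x00 through 0x47, copied from the listing above) and collects every jump as a from/to pair. It is not how r2 does it internally; SparcleXref and sparcle_collect_xrefs() are hypothetical names, and I am assuming the code is loaded at address 0 and that opcodes 0x0a through 0x0d are the jumps.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record for one code cross-reference. */
typedef struct {
	uint64_t from; /* address of the jump instruction */
	uint64_t to;   /* address it jumps to */
} SparcleXref;

/* The bytes from the pd listing, one instruction per row. */
static const uint8_t sparcle_program[] = {
	0x04, 0x00, 0x00, 0x02,  0x0e, 0x02, 0x03, 0x00,
	0x04, 0x00, 0x00, 0x04,  0x09, 0x04, 0x03, 0x04,
	0x0a, 0x04, 0x44, 0x00,  0x01, 0x03, 0x00, 0x04,
	0x06, 0x03, 0x04, 0x03,  0x08, 0x03, 0x04, 0x03,
	0x06, 0x03, 0x03, 0x03,  0x08, 0x03, 0x04, 0x03,
	0x04, 0xff, 0x00, 0x04,  0x05, 0x03, 0x04, 0x03,
	0x0f, 0x03, 0x02, 0x00,  0x04, 0x01, 0x00, 0x04,
	0x08, 0x02, 0x04, 0x02,  0x09, 0x00, 0x00, 0x04,
	0x0a, 0x04, 0x04, 0x00,  0x00, 0x00, 0x00, 0x00,
};

/* Scans code in 4-byte steps and stores up to max xrefs in out.
 * Returns the number of entries written. */
static size_t sparcle_collect_xrefs(const uint8_t *code, size_t len,
                                    SparcleXref *out, size_t max) {
	size_t n = 0;
	for (size_t off = 0; off + 4 <= len && n < max; off += 4) {
		const uint8_t *insn = code + off;
		if (insn[0] < 0x0a || insn[0] > 0x0d) {
			continue; /* not a jump opcode */
		}
		out[n].from = off; /* base address assumed to be 0 */
		out[n].to = (uint64_t)(insn[2] | (insn[3] << 8));
		n++;
	}
	return n;
}
```

Running sparcle_collect_xrefs() over sparcle_program yields two entries, 0x10 to 0x44 and 0x40 to 0x04, matching the two CODE XREF comments in the r2 output.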

The RAnalPlugin source can be found here.

Sources

http://radare.today/posts/extending-r2-with-new-plugins/: Of course the official radare blog is a great resource for any kind of information related to radare2. This post documents how to write a disassembly plugin but it’s a bit light on the details.