Hide And Seek - CSCG2022

Category: Reverse Engineering
Difficulty: Easy
Author: lion

You may be wondering why this is one of the challenges that I submitted a writeup for.
Well, I may or may not have overcomplicated it, *quite a bit*.

(Don't mind the missing icons, fonts being fonts...)

What was supposed to be a simple challenge, where an strace command was enough, became an adventure for me which lead to me learning a bit more than intended.

Recon

We are given a zip file, hide_and_seek.zip, containing the executable hide_and_seek.
Executing it gives us the following output:

    Hello!
    Here is your flag:
    https://www.youtube.com/watch?v=oHg5SJYRHA0

If you dared to visit the link, you'll know that it is, in fact, not the flag.
But what you may not know is that this first execution of the binary could lead to some confusion, which we'll get to later.

We, at first, try the usual, strings, strace (incorrectly), ltrace, whatever you have, but none yield results.
It's a simple challenge, but not *that* simple.
So, onto some actual reversing!

Looking at the binary in Ghidra, it does a few things.
First, the binary is stripped, but that shouldn't be a problem.
Usually, you can identify the main function by looking at the entry function.
In it you can find a call to __libc_start_main, and the first argument to that call is the main function we are looking for.

This is a decompilation Ghidra offers us.
We can see that the main function first gets some random data via the getrandom function, and then enters a loop.
This loop repeats for 10 times, and each time it calls fork() followed by the function I for now named do_something.

What fork does here is duplicate the current process, so to say.
It creates another thread, which then calls do_something.
So do_something gets executed 10 times, each of them running at the same time, because they are seperate threads.
This should already raise suspicion, if we run the function 10 times, why do we only see the output once?
Let's say we acknowledged this suspicion, and, unlike me, followed the intended route from here on out.

So yes, something is definitely weird here, only one of the 10 threads get to print.
This could lead to the conclusion that the thread printing the flag has his output suppressed.
This theory gets supported by the fact that in the strings output from earlier we can see a reference to /dev/null, the magical linux file that makes things disappear.
But if we try running strace correctly, to catch the syscalls of all of the threads, we encounter a problem.
None of them print the flag.

So, is the theory busted?
Actually, no!
Looking at the strace output, we can see something rather interesting.

The binary first calls unlink, but on itself?
unlink is the system call that deletes files, "unlinks" them in the filesystem's file table if you want to be precise.
And right after, we reopen, and recreate, the file we just deleted.
So, the binary edits itself when you run it.
Which in turn means that the binary we have now isn't the same binary we downloaded, which we can confirm by comparing their hashes.

Knowing this we can try running the correct strace which accounts for every thread on the original binary, to try and find the flag.
In this case, the command is strace -f -s 100 ./hide_and_seek 2>&1 | grep CSCG.
-f to follow the threads, -s 100 to make strace show more of any strings it finds (for example the flag).
The last bit, 2>&1, is to redirect stderr to stdout, as strace outputs its tracing findings into stderr, which the | grep does not see.

And sure enough, when you run this against the original, never before run binary, you will get the flag.

This is where the intended solution ends, but this is not how I solved it.
To know how my slightly weirder way worked out we have to go back to the start, back in Ghidra.
I did realise that the binary modifies itself, but not that one of the threads actually printed the correct flag.
So from here on out I never ran the binary again, and tried to work out how the flag could be hidden.

On my search I came across 2 functions, both supposedly printing the flag.
One of them however didn't print a flag, rather it just printed the hardcoded link we see above.

This is a snippet of the function, and as we can see, this cannot print the flag as it just prints hardcoded strings.
The other function though looked like this.

This one prints a variable as the flag, so this must be the right function!
Though we see something that worried at least me when I looked at it.
On line 56 we, once again, get random data into the buffer here called local_148.
I looked at this, and on my hunt for the first blood thought "I'm not gonna reverse this, let's do it with angr."
Now, I can imagine that this idea might feel completely out of the blue, and nonsensical looking at this function, but I had a plan.

First, what even is angr?
It's a tool with which you can spare some reversing work, if you know how it works.
It basically takes a binary, and a desired output, and then tries to find out how to run the binary to get the output you want.
The problem here is that the "input" here is the getrandom() function, which is not something you, I, or angr can control.
But I had a solution for this problem.
I had the brilliant idea to "just mod libc to make getrandom() == read()"
What I mean by that is take the current libc from which every standard function is executed, and "replace" getrandom() with read().
While getrandom() outputs completely random data, read() asks the user for data.
This way I and angr can 100% control the random data the binary gets.

To get started with this task, I first coded together a simple getrandom() function which passes its arguments to read().

All we do here is move around the arguments to fit, and then call the read() syscall.
Since I plan to replace a libc function, we can't use libc functions in here, as we cannot dynamically link libc into itself.
After this we compile the wrapper, extract the opcodes, find the opcodes for getrandom() inside of libc with objdump, and use this python script to seamlessly replace it.


import binascii

WRITE = binascii.unhexlify("4889f24889fe48c7c00000000048c7c7000000000f05c3909090909090")
OLD = binascii.unhexlify("f30f1efa648b04251800000085c07510b83e0100000f05483d00f0ffff")

with open("./libc.so.6", "rb") as fd:
  fc = fd.read()

nfc = fc.replace(OLD, WRITE)
with open("./cool-libc.so.6", "wb") as fd:
    fd.write(nfc)
    

This gives us the new and improved cool-libc.so.6.
I proceeded to play around with it a bit to see what I could do with it, and here some interesting results in python.

The first 31 A's (plus a newline for 32 bytes) are possible for some initialization purposes, but the os.urandom(4) call after we fully control.
This confirms this weird new libc works, awesome!
We can now make hide_and_seek depend on it with patchelf, and then try to run it with angr.

The following is the angr script I used to try and find the random values needed for it to give us the flag.


import angr
import sys

p = angr.Project("./hide_and_seek")

initial_state = p.factory.entry_state()
simgr = p.factory.simgr(initial_state)
  
simgr.explore(find=0x40161d)
result = simgr.found[0]
print(result.posix.dumps(sys.stdin.fileno()))
    

All this simple angr project does is use the binary, and try to find its way to the address 0x40161d.
That is the address at which the function prints the supposed real flag.
Then once we find this state, the script tells us which random values were required to reach it.

I ran this, and waited.
And waited.
And waited...
And when the day was over, it still did not find the correct solution.
Usually when angr takes this long, you shouldn't use angr.
So in the end, the cool libc patching solution didn't pan out, and I had to pivot back to reversing.

That doesn't mean my real solution was the boring strace though, I still didn't notice how simple this challenge actually was.
So, back to the flag function in ghidra once again.

On further inspection, it looks like the function checks if the random data it got can be "hashed"/"encrypted" by some custom hash function, and that this hash is the same as the one located at DAT_001020e0.
We, I'll say hash from now on, hash the random bytes we got on line 61/62, and check it against the data segment on line 63.
And so, yet another idea floated in my mind.
As this hashing function checks the input byte-by-byte, bruteforcing is possible here.
We just have to reimplement the function on line 61/62 in python, and try every character at every position to get the flag.
So I did just that.


import string

raw_hash_list = b'DC\x86\x86\r\xdbb\x00h\xbb\x9a\xb4\xd9\xd1\xbb\xd2D\x16T\x94\x9d\
\xf5\x0b\xffH5\x97fa\xda\xa4\xecq\xfa\x19\xa2\xf0\xde\x82\xaa\xf1\x9b\xa8KpM\xdf\xf2\
\x81\xf5\xacW18\xa6\x16q\x05\xbe\xa6\x84\x98s\xab\xfd\x94z\x1a\x96\x00\xf1t!<lY\x16\x90\
\x07\xe6+\xa2p\xcd\xcd$$d>H3^9pxD(\xcc}\xe61\xe9\xf6\x001\xe94c\x88\xa0\xfeNy\n\x88\xd87b\
\x96\xdc-\x1f\xaby\xfd"T\xa8\xbd~\xfd\x00$+=\x9bY\x88\xe7\xc1\xb1@\xaa\x05\xd3\xa0q\xf7N\
\xfb%\xf9\xc3\xbf\xe8o9\x1f\xad#\x95\xeb\x15U\x04\xa0U\x9d\xf5\x15\xe5\xb7\x00\x00\x00\x00'
    
hash_list = []

for i in range(0, len(raw_hash_list), 4):
    num = (raw_hash_list[i+0] << 0) 
    num |= (raw_hash_list[i+1] << 8) 
    num |= (raw_hash_list[i+2] << 16) 
    num |= (raw_hash_list[i+3] << 24)  
    hash_list.append(num)
    
chars = string.printable
    
def hash(char, i):
    char = ord(char)
    char |= char << 0x8
    char |= char << 0x10
    char |= char << 0x18
    return (char * i + 1) % 0x100000000
    
lasthash = 0x10001
i = 0

flag = ""

while True:
    for char in chars:
        curr = hash(char, lasthash)
        if curr != hash_list[i]:
            continue
        lasthash = curr
        flag += char
        print(flag)
        i += 1
        break
    

In the first few lines we define the DAT_001020e0 buffer, copied out of Ghidra.
The next few lines, the for loop, converts this raw buffer into a list of 32bit integers (e.g. b"\x41\x41\x41\x41\x42\x42\x42\x42" => [0x41414141, 0x42424242])
And finally, we try every printable character, and see if the hash is correct for each one of them.
If its correct, add the character to the flag, and move on to the next character.
And this, finally, actually gets us the flag!

While I did overcomplicate the challenge quite a bit, I think this solution is wayy more fun, and the small excourse in libc patching was worth it too.
I hope you enjoyed reading this writeup, and following through on my thought process in this challenge!

~sw1tchbl4d3, 19/05/2022 (dd/mm/yyyy)