OSx86

This page is obviously very outdated, but the hack is an informative read, so I’m keeping it up for informational purposes.

 

I have OSx86 running on a spare laptop. Unfortunately, for the longest time, the system would crash with a kernel panic after a few minutes of use, rendering it pretty much unusable. I finally got fed up and dove into the kernel, and this is just a little info on how that went.

Diving into the XNU kernel was a lot harder than any of my previous work because I couldn’t work from source. The XNU kernel is open source, but there are a number of things implemented in the full OS X XNU kernel that are not in the Darwin source version. So there isn’t a 1:1 correspondence: you can refer to the source code for reference, but you can’t fix the problem by editing the sources and rebuilding them. Thus, this had to be done the hard way: hex editing the binary, the way a lot of software crackers break copy protection on games and applications.

I won’t bother typing out the full panic printout, but there are two numbers that give key pointers as to where the crash is happening. The first is in the line “panic(cpu 0 caller 0x0019BEB2) Unresolved kernel trap (CPU 0, Type 0=divide)”. This says that there was a divide error, meaning either a divide by zero or a divide overflow. And 0x0019BEB2 is the address of the code that is screwing up; in this case, disassembly shows that 0x0019BEB2 belongs to the function “tsc_to_nanoseconds”. This at least tells us that the error is happening somewhere in the realtime clock code. The other key number is “EIP: 0x0019a2d6”. EIP is the instruction pointer, which points at the exact assembly instruction that’s screwing up.

Fortunately (or unfortunately, depending on how you look at it), I wasn’t the only one with this issue, and someone else had already disassembled the kernel and made a fix for his computer. That solution, though, was pretty much just a hackish disabling of an entire section of code. Not to devalue his work, but the fix didn’t work on my computer. So it required more work.

He posted the disassembly of the code section, and it looked something like this:


__text:0019A2CD push ebx
__text:0019A2CE mov ebx, eax
__text:0019A2D0 mov edx, [eax+8]
__text:0019A2D3 mov eax, [eax+4]
__text:0019A2D6 div ecx ; This is where it fails (divide by zero)
__text:0019A2D8 xchg eax, ebx
__text:0019A2D9 mov eax, [eax]
__text:0019A2DB div ecx
__text:0019A2DD xchg ebx, edx
__text:0019A2DF pop ebx

For the longest time, we were under the impression that the error was a divide by zero. I tried a number of fixes, but they didn’t work. That’s when it occurred to me that it was not a divide by zero, but a divide overflow. And this can be verified by math:

The div instruction on x86, when it’s given a doubleword argument (which it is in this case, ecx is a 32-bit or doubleword register), takes the upper half of the dividend from edx, the lower half from eax, and the divisor is the argument you gave to div. The quotient gets stored in eax, and the remainder is in edx. So, further information from my panic:
EAX: 0xfffffff4
ECX: 0x0263b969
EDX: 0x02faf07f
If you combine edx and eax for the dividend, the full number is 0x02faf07ffffffff4, or 214748364799999988 in decimal. The divisor is 0x0263b969, or 40089961 in decimal. If you divide those out you get 0x13f482c5d, or 5356661853 in decimal. That hex number has 9 digits with a leading 1, so it needs 33 bits (8 full hex digits at 4 bits each, plus one more bit). That’s too large for the 32-bit registers in x86 hardware. Hence, divide overflow.
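As a sanity check on the arithmetic above, here is a small Python sketch that models the 32-bit div. The function name div32 is made up for this illustration; it is not anything from the kernel. It also shows a handy shortcut: the quotient overflows exactly when edx is greater than or equal to the divisor.

```python
def div32(edx, eax, ecx):
    """Model x86 `div r/m32`: divide the 64-bit value EDX:EAX by a 32-bit divisor.

    Returns (quotient, remainder). Raises ZeroDivisionError for a zero divisor
    and OverflowError when the quotient does not fit in 32 bits. On real
    hardware both cases raise the same divide error exception.
    """
    if ecx == 0:
        raise ZeroDivisionError("divide by zero")
    dividend = (edx << 32) | eax
    quotient, remainder = divmod(dividend, ecx)
    if quotient > 0xFFFFFFFF:
        raise OverflowError(
            f"quotient {quotient:#x} needs {quotient.bit_length()} bits"
        )
    return quotient, remainder


# Register values taken from the panic printout
EAX, ECX, EDX = 0xFFFFFFF4, 0x0263B969, 0x02FAF07F

try:
    div32(EDX, EAX, ECX)
except OverflowError as err:
    print("divide overflow:", err)

# Shortcut: the quotient overflows exactly when the high half >= divisor
print("edx >= ecx:", EDX >= ECX)
```

With the values from my panic, edx is larger than ecx, so the overflow could have been spotted without doing the long division at all.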

Now the question is, how to fix this? The one main problem is space. When you reverse engineer an already-compiled binary, all the addresses of the functions and jumps are already hard-coded into the binary. This means that you cannot move any code that something else jumps to or calls. So you either have to change instructions but keep the same total size, or you have to replace some of the extra “nop” instructions that exist as a result of function alignment (where the compiler adds extra padding to align subroutines on certain addresses to make fetching them smoother for the processor).

As some more info, here’s a larger disassembly of the entire section of the function (this one was done with a different disassembler, so the addresses are a little different):


000892CD 53 push ebx
000892CE 89C3 mov ebx,eax
000892D0 8B5008 mov edx,[eax+0x8]
000892D3 8B4004 mov eax,[eax+0x4]
000892D6 F7F1 div ecx
000892D8 93 xchg eax,ebx
000892D9 8B00 mov eax,[eax]
000892DB F7F1 div ecx
000892DD 87DA xchg ebx,edx
000892DF 5B pop ebx
000892E0 8945C0 mov [ebp-0x40],eax
000892E3 8955C4 mov [ebp-0x3c],edx
000892E6 8945D0 mov [ebp-0x30],eax
000892E9 8955D4 mov [ebp-0x2c],edx
000892EC 8B45C0 mov eax,[ebp-0x40]
000892EF 8B55C4 mov edx,[ebp-0x3c]
000892F2 83C43C add esp,byte +0x3c
000892F5 5B pop ebx
000892F6 5E pop esi
000892F7 5F pop edi
000892F8 5D pop ebp
000892F9 C3 ret
000892FA 90 nop
000892FB 90 nop
000892FC 55 push ebp

The very last line, the “push ebp” at 0x000892FC, is the start of the next subroutine, so that has to stay where it is. Notice the two nops above it? That’s the extra padding from function alignment. However, 16 bits is not enough for any reasonable instruction that would fix this bug (each “nop” is the single byte 90 in hex, and there are two of them, so that is 2 bytes, or 16 bits). So at this point it was time to go and see if we could optimize and strip out unnecessary instructions.
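Counting the padding by eye is easy here, but the same check can be sketched in Python. The helper name padding_before is made up for this example, and it assumes you already know where the next subroutine starts; a 90 byte can also show up inside an operand, so a blind scan for 90s elsewhere in the binary would not be reliable.

```python
NOP = 0x90


def padding_before(code: bytes, next_start: int) -> int:
    """Count consecutive NOP bytes that end right before offset next_start."""
    count = 0
    while count < next_start and code[next_start - 1 - count] == NOP:
        count += 1
    return count


# Tail of the listing: pop ebp; ret; nop; nop; push ebp (next subroutine)
tail = bytes.fromhex("5D C3 90 90 55")
free = padding_before(tail, next_start=4)
print(free, "bytes =", free * 8, "bits")
```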

Okay, now look at that assembly again. See anything redundant? If not, here’s a closer look:

000892E0 8945C0 mov [ebp-0x40],eax ; Copy the contents of eax into [ebp-0x40]
000892E3 8955C4 mov [ebp-0x3c],edx ; Copy the contents of edx into [ebp-0x3c]
000892E6 8945D0 mov [ebp-0x30],eax ; Copy the contents of eax into [ebp-0x30]
000892E9 8955D4 mov [ebp-0x2c],edx ; Copy the contents of edx into [ebp-0x2c]
000892EC 8B45C0 mov eax,[ebp-0x40] ; Copy [ebp-0x40] into eax, or 0x000892E0 in reverse
000892EF 8B55C4 mov edx,[ebp-0x3c] ; Copy [ebp-0x3c] into edx, or 0x000892E3 in reverse

So we copy stuff from eax and edx into [ebp-0x40] and [ebp-0x3c] respectively, then we copy it back without changing it. So the instructions at 0x000892EC and 0x000892EF are completely unnecessary. Since those are 3-byte (24-bit) instructions, that means we have an extra 48 bits to work with. Score!

So if we strip out those two and slide the instructions above them, from the div at 0x0019a2d6 onward, down by six bytes, we have some extra space above the div to fix the number before it divides and crashes. There was really only one instruction that I could come up with that would fix it: doing an and of edx against a mask, essentially cutting off the most significant bits. With the mask below, the top seven bits of edx are cleared, which for the value in my panic drops the highest set bit and leaves edx smaller than the divisor in ecx. It has the _potential_ to break things, but at this point I didn’t really care. I just wanted it to work. So the instruction to and edx against the mask would be something like:

and edx,0x1ffffff

And in raw hexadecimal, this converts to “81 e2 ff ff ff 01”. 81 e2 is the “and edx” part, and ff ff ff 01 is 0x01ffffff written in the reversed little endian format. So, after that, the final code looked like this:
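For anyone who wants to double-check the byte order, here is a minimal Python sketch of that encoding. The function name encode_and_edx_imm32 is invented for this example, and it only handles this one instruction form. It also shows what the mask does to the edx value from my panic.

```python
import struct


def encode_and_edx_imm32(mask: int) -> bytes:
    """Encode `and edx, imm32`: opcode 81, ModRM E2, then the mask little-endian."""
    return bytes([0x81, 0xE2]) + struct.pack("<I", mask)


patch = encode_and_edx_imm32(0x01FFFFFF)
print(patch.hex(" "))  # 81 e2 ff ff ff 01

# Effect of the mask on the register values from the panic
EDX, ECX = 0x02FAF07F, 0x0263B969
masked = EDX & 0x01FFFFFF
print(hex(masked), "smaller than ecx:", masked < ECX)
```

The patch is 6 bytes long, which is exactly the space freed by the two redundant 3-byte loads, and the masked edx is now below ecx, so the first div can no longer overflow for these values.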

000892CD 53 push ebx
000892CE 89C3 mov ebx,eax
000892D0 8B5008 mov edx,[eax+0x8]
000892D3 8B4004 mov eax,[eax+0x4]
000892D6 81E2FFFFFF01 and edx,0x1ffffff
000892DC F7F1 div ecx
000892DE 93 xchg eax,ebx
000892DF 8B00 mov eax,[eax]
000892E1 F7F1 div ecx
000892E3 87DA xchg ebx,edx
000892E5 5B pop ebx
000892E6 8945C0 mov [ebp-0x40],eax
000892E9 8955C4 mov [ebp-0x3c],edx
000892EC 8945D0 mov [ebp-0x30],eax
000892EF 8955D4 mov [ebp-0x2c],edx
000892F2 83C43C add esp,byte +0x3c
000892F5 5B pop ebx
000892F6 5E pop esi
000892F7 5F pop edi
000892F8 5D pop ebp
000892F9 C3 ret
000892FA 90 nop
000892FB 90 nop
000892FC 55 push ebp

The number is trimmed, the result of the div is no longer too large, and OSx86 does not crash anymore.
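One last check worth doing before writing bytes into a kernel is that the patched code is exactly as long as the original, so the next subroutine still starts at 0x000892FC. This is a rough Python sketch using the instruction bytes from the two listings above; the variable and function names are just for illustration.

```python
START = 0x000892CD

# Instruction bytes copied from the original listing, one entry per instruction
original = [
    "53", "89C3", "8B5008", "8B4004", "F7F1", "93", "8B00", "F7F1", "87DA", "5B",
    "8945C0", "8955C4", "8945D0", "8955D4", "8B45C0", "8B55C4",
    "83C43C", "5B", "5E", "5F", "5D", "C3", "90", "90",
]

# Patched listing: the 6-byte and is inserted, the two redundant loads are removed
patched = [
    "53", "89C3", "8B5008", "8B4004", "81E2FFFFFF01", "F7F1", "93", "8B00",
    "F7F1", "87DA", "5B",
    "8945C0", "8955C4", "8945D0", "8955D4",
    "83C43C", "5B", "5E", "5F", "5D", "C3", "90", "90",
]


def size(instructions):
    """Total length in bytes of a list of hex-encoded instructions."""
    return sum(len(bytes.fromhex(i)) for i in instructions)


print(size(original), size(patched))
print(hex(START + size(patched)))  # where the next subroutine begins
```

Both versions come out to the same length, so nothing after the patched region moves.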
