- tried to show how to look at relevant addresses on your computer
- tried to explain how current OSses still let memory regions clash
- and talked about using existing processor tech to mitigate the problem.
I first came on these ideas twenty, maybe thirty years ago, when trying to figure out what made the M6809 such a magical microprocessor. (That's a subject for another day.) The M6809 was (and still is) an 8-bit microprocessor with a 16-bit address bus. That means it can address 64Kbytes of memory.
Motorola specified a memory management part called the 6829 which supported designs up to 1 megabyte of memory. It was essentially a block of fast RAM that would be used to translate the upper bits of memory, plus a latch that would select which parts of the RAM would be used to translate the upper bits of the address bus, something like this:
(This is from memory, and not really complete. Hopefully, it's enough to get the concept from.)
Memory management control would provide functions like write protect and read protect, so you could keep the CPU from overwriting program code and set parts of the address space up as guard pages.
With 32 bits and more of address, memory management doesn't quite work this simply, but this is enough to get some confidence that memory management can actually and meaningfully be done.
Now, if you are familiar with the 8086, you may wonder what the difference between the 8086's segment registers and this would be. This kind of scheme provides fairly complete control over the memory map.
The 8086 segment registers only moved 64k address windows around in the physical map, and provided no read or write control. Very simple, but no real management. The 80286 provided write protect and such, but the granularity was still abysmal, mostly based on guesses about the usage of certain registers, guesses which sort of worked with some constrained C programming language run-time models. And these guesses were frozen into silicon before they were tested. It should go without saying, that such guesses miss the mark for huge segments of the industry, but Intel's salescrew has always been trained in the art of smooth-talking.
(Intel are not the only badguys in the industry, they are just the ones who played this particular role.)
Now I knew about both of these approaches, and I knew about the split stack in Forth. And it occurred to me that, if a 6829-like MMU could talk to the CPU, and select a different task latch on accesses through the return stack pointer (S in the 6809), you could make it completely impossible to crash the return stack and overwrite return addresses.
I'm not talking about guard pages, I'm talking about the return addresses just simply can't be accesses by any means except call and return. They're outside the range of addresses that application code can generate.
Of course, the OS kernel can access the stack regions by mapping them in, but we expect the OS to behave itself.
(We would provide system calls to allow an application to have the OS adjust a return address when such is necessary.
Also, since we are redesigning the CPU, we might add instructions for exceptional return states, but I would really rather not do that. It seems redundant, since split stacks make multiple return values so much easier.)
Another thing that occurred to me is that the stack regions could be mapped to separate RAM from the main memory. This would allow calls and returns that would take no more time than regular branches or jumps.
At this point in my imaginings, I'm thinking about serious redesign of the CPU. So I thought about adding one more stack register to the 6809, a dedicated call/return stack. It would never be indexed, so it would be a very simple bit of circuitry. That would free up registers for other use, including additional stacks and such.
(Well, if we allow frame pointers to be pushed with the instruction pointers, and provide instructions for walking the stack, there would be one kind of indexing -- an instruction to fetch a frame pointer at a specific level above the current one. I'll explain how this would work, to aid understanding what's going on here:
There would be a couple of bits in the processor status area, which the OS would set before calling the application startup. The application must not be allowed to modify these bits, but, since the application must be able to confirm that the frames are present, it should be able to read them.
These bits would tell the processor which stack pointers to save with the instruction pointer on calls. The return instructions would have a bit field to determine whether to restore or discard each saved frame pointer.
"Walking the stack" would be simply a load of a specified saved frame pointer at a specified level of calling routine.
In the example shown here, the instruction GETFP sees from the status register that both LP and SP are being recorded, and multiplies the index argument by 3, then adds 2 to point to LP, checks against the return stack base register, and loads LP0 into X.
But GETFP SP,3,Y after pointing into stack that isn't there, checks the return stack base register and refuses to load the frame pointer that isn't there.
Another flag in the status register might select between generating an exception on failure and recording the failure in a status bit.
Could we do such a thing with the 68000 or other 32-bit CPUs? Add a dedicated call/return stack and free the existing stack pointer for use as a parameter stack in a split-stack architecture? 64-bit CPUs?
But if we intend to completely separate the return addresses, we have to add at least one bit of physical address, or we have to treat at least one of the existing address bits (the highest bit) in a special way. I think I'd personally want to lean towards adding a physical address bit, even for the 64-bit CPUs, to keep the protection simple. But, of course, there are interesting possibilities with keeping the physical and logical addresses the same size, but filtering the high bit in user mode.
And that would provide us with a new kind of level-1 cache -- 8, 16, or maybe 32 entries of spill-fill cache attached to the call/return stack, operating in parallel with a (modified) generalized level-1 cache. The interface between memory management and cache would need a bit of redesign, of course.
I'm not sure it would mix well with register renaming. At bare minimum, the call/return stack pointer would have to be completely separate from the rename-able registers.
Would this require rewriting a lot of software?
Some, but mostly just the programming language compilers would need to be worked on.
And most of the rewrite would focus on simplification of code designed to work around the bottleneck of having the return addresses mixed in with parameters.
There you have a way to completely protect the return addresses on stack.
What about other regions of memory? Can we separate them meaningfully?
That kind of thing is already being done in software used on real mainframes, so, yes. But it does have a much larger impact on existing software and on run-time speed, and it is not as simply accomplished.
But that's actually a question I want to visit when I start ranting about the ideal processor that I want to design but will probably never get a chance to. Later.