# How Xtensa register windows work

There is a paucity of introductory material on this subject, and
Zephyr plays some tricks here that require understanding the base
layer.

## Hardware

When register windows are configured in the CPU, there are either 32
or 64 "real" registers in hardware, with 16 visible at one time.
Registers are grouped and rotated in units of 4, so there are 8 or 16
such "quads" (my term, not Tensilica's) in hardware of which 4 are
visible as A0-A15.

The first quad (A0-A3) is pointed to by a special register called
WINDOWBASE.  The register file is cyclic, so for example if NREGS==64
and WINDOWBASE is 15, quads 15, 0, 1, and 2 will be visible as
(respectively) A0-A3, A4-A7, A8-A11, and A12-A15.

There is a ROTW instruction that can be used to manually rotate the
window by an immediate number of quads that is added to WINDOWBASE.
Positive rotations "move" high registers into low registers
(i.e. after "ROTW 1" the register that used to be called A4 is now
A0).

There are CALL4/CALL8/CALL12 instructions to effect rotated calls
which rotate registers upward (i.e. "hiding" low registers from the
callee) by 1, 2 or 3 quads.  These do not rotate the window
themselves.  Instead they place the rotation amount in two places
(yes, two; see below): the 2-bit CALLINC field of the PS register, and
the top two bits of the return address placed in A0.

There is an ENTRY instruction that does the rotation.  It adds CALLINC
to WINDOWBASE, at the same time copying the old (now hidden) stack
pointer in A1 into the "new" A1 in the rotated frame, subtracting an
immediate offset from it to make space for the new frame.

There is a RETW instruction that undoes the rotation.  It reads the
top two bits from the return address in A0 and subtracts that value
from WINDOWBASE before returning.  This is why the CALLINC bits went
in two places.
They have to be stored on the stack across potentially
many calls, so they need to be GPR data that lives in registers and
can be spilled.  But ENTRY isn't specified to assume a particular
return address format and is used immediately, so it makes more sense
for it to use processor state instead.

Note that we still don't know how to detect when the register file has
wrapped around and needs to be spilled or filled.  To do this there is
a WINDOWSTART register used to detect which register quads are in use.
The name "start" is somewhat confusing; this is not a pointer.
WINDOWSTART stores a bitmask with one bit per hardware quad (so it's 8
or 16 bits wide).  The bit in WINDOWSTART corresponding to WINDOWBASE
will be set by the ENTRY instruction, and remain set after rotations
until cleared by a function return (by RETW, see below).  Other bits
stay zero.  So there is one set bit in WINDOWSTART corresponding to
each call frame that is live in hardware registers, and it will be
followed by 0, 1 or 2 zero bits that tell you how "big" (how many
quads of registers) that frame is.

So the CPU executing RETW checks to make sure that the register quad
being brought into A0-A3 (i.e. the new WINDOWBASE) has a set bit
indicating it's valid.  If it does not, the registers must have been
spilled and the CPU traps to an exception handler to fill them.

Likewise, the processor can tell if a high register is "owned" by
another call by seeing if there is a one in WINDOWSTART between that
register's quad and WINDOWBASE.  If there is, the CPU traps to a spill
handler to spill one frame.  Note that a frame might be only four
registers, but it's possible to hit registers 12 out from WINDOWBASE,
so it's actually possible to trap again when the instruction restarts
to spill a second quad, and even a third time at maximum.
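
The bookkeeping described so far is easier to follow with something to
poke at.  What follows is a toy Python model, not anything from Zephyr
or Tensilica: the `WindowModel` class and all of its method names are
invented for illustration.  It tracks only WINDOWBASE, WINDOWSTART,
CALLINC and the top two bits of A0; it skips the stack pointer copy
done by ENTRY, and it merely reports an underflow instead of trapping.

```python
class WindowModel:
    """Toy model of the window registers (hypothetical, not ISA-accurate)."""

    def __init__(self, nregs=64):
        assert nregs in (32, 64)
        self.nregs = nregs
        self.nquads = nregs // 4
        self.regs = [0] * nregs      # the "real" hardware registers
        self.windowbase = 0
        self.windowstart = 1 << 0    # the initial frame lives in quad 0
        self.callinc = 0             # stands in for PS.CALLINC

    def _phys(self, k):
        # Physical index of the register currently visible as A<k>.
        return (self.windowbase * 4 + k) % self.nregs

    def visible_quads(self):
        # Quads currently mapped to A0-A3, A4-A7, A8-A11, A12-A15.
        return [(self.windowbase + i) % self.nquads for i in range(4)]

    def rotw(self, n):
        # Manual rotation: add n quads to WINDOWBASE, wrapping around.
        self.windowbase = (self.windowbase + n) % self.nquads

    def owned_by_other(self, k):
        # True if a WINDOWSTART bit is set between WINDOWBASE and A<k>'s
        # quad, i.e. touching A<k> would need a spill first.
        for i in range(1, k // 4 + 1):
            quad = (self.windowbase + i) % self.nquads
            if (self.windowstart >> quad) & 1:
                return True
        return False

    def calln(self, n, return_pc):
        # CALL4/8/12 (n = 1/2/3): no rotation yet.  The amount goes into
        # CALLINC and into the top two bits of the callee's future A0.
        assert n in (1, 2, 3)
        self.callinc = n
        self.regs[self._phys(4 * n)] = (n << 30) | (return_pc & 0x3FFFFFFF)

    def entry(self):
        # ENTRY: rotate by CALLINC and mark the new frame as live.
        self.windowbase = (self.windowbase + self.callinc) % self.nquads
        self.windowstart |= 1 << self.windowbase

    def retw(self):
        # RETW: recover the rotation from A0 and check the caller's bit.
        n = self.regs[self._phys(0)] >> 30
        caller = (self.windowbase - n) % self.nquads
        if not (self.windowstart >> caller) & 1:
            return "underflow"       # caller was spilled; needs a fill
        self.windowstart &= ~(1 << self.windowbase)
        self.windowbase = caller
        return "ok"
```

With `nregs=32`, three CALL8/ENTRY pairs from the initial frame leave
WINDOWBASE at 6 with frames at quads 0, 2, 4 and 6.  At that point
`owned_by_other(4)` is false (quad 7 is free) but `owned_by_other(8)`
is true, because A8-A11 has wrapped around onto quad 0.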

Finally: note that hardware checks the two bits of WINDOWSTART after
the frame bit to detect how many quads are represented by the one
frame.  So there are six separate exception handlers to spill/fill
1/2/3 quads of registers.

## Software & ABI

The advantage of the scheme above is that it allows the registers to
be spilled naturally into the stack by using the stack pointers
embedded in the register file.  But the hardware design assumes and to
some extent enforces a fairly complicated stack layout to make that
work:

The spill area for a single frame's A0-A3 registers is not in its own
stack frame.  It lies in the 16 bytes below its CALLEE's stack
pointer.  This is so that the callee (and exception handlers invoked
on its behalf) can see its caller's potentially-spilled stack pointer
register (A1) on the stack and be able to walk back up on return.
Other architectures do this too by e.g. pushing the incoming stack
pointer onto the stack as a standard "frame pointer" defined in the
platform ABI.  Xtensa wraps this together with the natural spill area
for register windows.

By convention spill regions always store the lowest numbered register
in the lowest address.

The spill area for a frame's A4-A11 registers may or may not exist
depending on whether the call was made with CALL8/CALL12.  It is legal
to write a function using only A0-A3 and CALL4 calls and ignore higher
registers.  But if those 0-2 register quads are in use, they appear at
the top of the stack frame, immediately below the parent call's A0-A3
spill area.

There is no spill area for A12-A15.  Those registers are always
caller-save.  When using CALLn, you always need to overlap 4 registers
to provide arguments and take a return value.
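
To make the layout concrete, here is a small sketch that computes where
each register of one frame would be spilled, following my reading of
the rules above.  The function `spill_slots` and its parameters are
hypothetical names, not part of any ABI or Zephyr API.  It assumes
4-byte registers (16 bytes per quad), that `callee_sp` is the stack
pointer of the function this frame called, and that `caller_sp` is the
stack pointer of the function that called this frame, which is also
the top of this frame's own stack region.

```python
def spill_slots(caller_sp, callee_sp, quads):
    """Map register names to spill addresses for a single frame.

    quads is 1, 2 or 3 depending on whether this frame made its call
    with CALL4, CALL8 or CALL12.  Purely illustrative.
    """
    assert quads in (1, 2, 3)

    # A0-A3 go in the 16 bytes just below the callee's stack pointer,
    # lowest numbered register at the lowest address.
    slots = {f"A{i}": callee_sp - 16 + 4 * i for i in range(4)}

    # A4 and up (if any) sit at the top of this frame, directly under
    # the 16-byte A0-A3 area found just below the caller's stack pointer.
    extra = 4 * (quads - 1)
    base = caller_sp - 16 - 4 * extra
    for i in range(extra):
        slots[f"A{4 + i}"] = base + 4 * i

    # A12-A15 never get a slot: they are caller-save.
    return slots
```

For example, `spill_slots(0x1000, 0x0F80, 2)` puts A0-A3 at
0xF70-0xF7C and A4-A7 at 0xFE0-0xFEC, while passing `quads=1` yields
only the four A0-A3 slots.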