How to reference the subroutine? I haven't played with this yet, but I'll try to address your question.
The documentation for the BT815 (which I'm using) describes CALL's dest parameter this way:
- dest
The offset of the destination address from RAM_DL which the display command is to be switched to. BT815/6 has the stack to store the return address. To come back to the next command of source address, the RETURN command can help.
The valid range is from 0 to 8191.
So, when you make the call, you need to include the offset from RAM_DL to the beginning of the routine in the command. That offset depends on where you place the routine when building the display list. My only caution is that you might need to be careful about placing the routine at the end of your list. From my own testing, it seems as though RAM_DL is a window into one of two independent display lists, and the swap might just alternate the roles of those lists: one is being constructed while the other is actively being displayed.
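A minimal sketch of what encoding that call could look like in C, if you build the list on the host side. The opcode values (CALL 0x1D, JUMP 0x1E, RETURN 0x24 in bits 31..24) follow the FT8xx/BT81x display list tables; the function names are hypothetical. The quoted text doesn't say whether dest counts bytes or 4-byte entries, so this sketch assumes 4-byte entries. If your offset starts out as a byte address (for example a value read from REG_CMD_DL, which is also assumed here to be a byte offset), divide it by 4 first. Please verify both assumptions on real hardware.

```c
#include <assert.h>
#include <stdint.h>

/* Opcodes occupy bits 31..24 of each 32-bit display list command. */
#define DL_OP_CALL   0x1DUL
#define DL_OP_JUMP   0x1EUL
#define DL_OP_RETURN 0x24UL

/* CALL(dest): dest is packed into the low 16 bits.
   ASSUMPTION: dest counts 4-byte entries from the start of RAM_DL. */
static uint32_t dl_call(uint32_t dest)
{
    return (DL_OP_CALL << 24) | (dest & 0xFFFFUL);
}

/* JUMP(dest): same encoding as CALL, but no return address is pushed. */
static uint32_t dl_jump(uint32_t dest)
{
    return (DL_OP_JUMP << 24) | (dest & 0xFFFFUL);
}

/* RETURN takes no parameter. */
static uint32_t dl_return(void)
{
    return DL_OP_RETURN << 24;
}

/* Hypothetical helper: convert a byte offset into RAM_DL into an entry
   index, under the same 4-byte-entry assumption. */
static uint32_t dl_dest_from_byte_offset(uint32_t byte_offset)
{
    assert(byte_offset % 4 == 0); /* each command is 4 bytes */
    return byte_offset / 4;
}
```

For example, under these assumptions a routine that starts 40 bytes into RAM_DL would be called with `dl_call(dl_dest_from_byte_offset(40))`, which encodes to 0x1D00000A.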
If that's true, then your routine should be fine anywhere in RAM_DL, maybe even after the DISPLAY command. If it isn't, then your routine may need to appear before DISPLAY. In that case, you may be better off placing the routines near the beginning of RAM_DL and skipping over them with the JUMP command. That way you know ahead of time exactly where each routine sits. If you build the display list with the coprocessor, you can read REG_CMD_DL to find the next location in RAM_DL that will be written to.
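Here is a rough sketch of the "routines first, JUMP over them" layout, built in a host-side buffer that would then be written to RAM_DL. `dl_buf`, `dl_emit`, `build_with_routine` and `dl_trace` are hypothetical names. `dl_trace` is only a software model of how CALL, JUMP and RETURN are described to behave, not the chip itself, and its 4-level stack is an arbitrary limit chosen for the model. The sketch also assumes dest counts 4-byte entries and that RAM_DL holds 8 KB (2048 entries).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DL_CALL(dest) ((0x1DUL << 24) | ((uint32_t)(dest) & 0xFFFFUL))
#define DL_JUMP(dest) ((0x1EUL << 24) | ((uint32_t)(dest) & 0xFFFFUL))
#define DL_RETURN()   (0x24UL << 24)
#define DL_DISPLAY()  0x00000000UL

#define DL_CAPACITY 2048 /* 8 KB of RAM_DL / 4 bytes per entry */

typedef struct {
    uint32_t cmd[DL_CAPACITY];
    size_t len; /* index of the next free entry */
} dl_buf;

/* Append one command and return the index it was written to. */
static size_t dl_emit(dl_buf *dl, uint32_t cmd)
{
    assert(dl->len < DL_CAPACITY);
    dl->cmd[dl->len] = cmd;
    return dl->len++;
}

/* Layout produced:
     [0]       JUMP -> first entry after the routine
     [1..k]    routine body, then RETURN
     [k+1..]   left for the caller's main list, which CALLs index 1
   Returns the routine's entry index (always 1 with this layout). */
static size_t build_with_routine(dl_buf *dl, const uint32_t *body, size_t body_len)
{
    dl->len = 0;
    size_t jump_at = dl_emit(dl, DL_JUMP(0)); /* placeholder target */
    size_t routine_at = dl->len;
    for (size_t i = 0; i < body_len; i++)
        dl_emit(dl, body[i]);
    dl_emit(dl, DL_RETURN());
    dl->cmd[jump_at] = DL_JUMP(dl->len);      /* patch: skip the routine */
    return routine_at;
}

/* Software model: walk the list, following JUMP/CALL/RETURN, and copy
   every other command into out[] until DISPLAY. Returns the count. */
static size_t dl_trace(const dl_buf *dl, uint32_t *out, size_t out_cap)
{
    size_t stack[4];
    size_t sp = 0, pc = 0, n = 0;
    while (pc < dl->len) {
        uint32_t c = dl->cmd[pc];
        uint32_t op = c >> 24;
        if (c == DL_DISPLAY())
            break;
        if (op == 0x1E) {                  /* JUMP */
            pc = c & 0xFFFFUL;
        } else if (op == 0x1D) {           /* CALL: push return address */
            assert(sp < 4);
            stack[sp++] = pc + 1;
            pc = c & 0xFFFFUL;
        } else if (op == 0x24) {           /* RETURN: pop */
            assert(sp > 0);
            pc = stack[--sp];
        } else {                           /* ordinary drawing/state command */
            assert(n < out_cap);
            out[n++] = c;
            pc++;
        }
    }
    return n;
}
```

After `build_with_routine`, you append the main list with `dl_emit`, using `DL_CALL(routine_at)` wherever the routine should run, and finish with `DL_DISPLAY()`. Because the routine always lands at index 1, its offset is known before anything else is written.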
The only other note I'd add is that you will need to place these routines in RAM_DL every time you build a new display list. A common method with these screens is to put a fixed portion of the screen content (say, background and state setup) at the beginning of the list, followed by dynamic content that may change from frame to frame. The routines fit naturally into that fixed portion; just be sure to JUMP past them so they only run when you call them.
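A sketch of that fixed-prefix approach: the background setup and the routine live in a constant template that is copied to the start of every new list, so the routine's index (`ROUTINE_AT`) is a compile-time constant and only the dynamic part changes per frame. The template contents and all names are hypothetical placeholders. The encodings assume the FT8xx/BT81x command layouts (CLEAR_COLOR_RGB 0x02, COLOR_RGB 0x04, CLEAR 0x26) and that dest counts 4-byte entries. This also assumes you write RAM_DL directly rather than through the coprocessor; the actual transfer and list swap are left to whatever host driver you use.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DL_CALL(dest) ((0x1DUL << 24) | ((uint32_t)(dest) & 0xFFFFUL))
#define DL_JUMP(dest) ((0x1EUL << 24) | ((uint32_t)(dest) & 0xFFFFUL))
#define DL_RETURN()   (0x24UL << 24)
#define DL_DISPLAY()  0x00000000UL
#define DL_CLEAR_COLOR_RGB(r, g, b) \
    ((0x02UL << 24) | ((uint32_t)(r) << 16) | ((uint32_t)(g) << 8) | (uint32_t)(b))
#define DL_COLOR_RGB(r, g, b) \
    ((0x04UL << 24) | ((uint32_t)(r) << 16) | ((uint32_t)(g) << 8) | (uint32_t)(b))
#define DL_CLEAR(c, s, t) \
    ((0x26UL << 24) | ((uint32_t)(c) << 2) | ((uint32_t)(s) << 1) | (uint32_t)(t))

/* Fixed portion, identical every frame, so its indices never move. */
enum { ROUTINE_AT = 3, PREFIX_LEN = 5 };
static const uint32_t fixed_prefix[PREFIX_LEN] = {
    DL_CLEAR_COLOR_RGB(0, 0, 32), /* [0] background colour */
    DL_CLEAR(1, 1, 1),            /* [1] clear colour, stencil and tag */
    DL_JUMP(PREFIX_LEN),          /* [2] skip over the routine */
    DL_COLOR_RGB(255, 255, 0),    /* [3] routine body (placeholder) */
    DL_RETURN(),                  /* [4] end of routine */
};

/* Rebuild a complete list for one frame: copy the fixed prefix, append
   this frame's dynamic commands, then terminate with DISPLAY.
   Returns the number of entries used. */
static size_t build_frame(uint32_t *dl, size_t cap,
                          const uint32_t *dynamic, size_t dyn_len)
{
    assert(PREFIX_LEN + dyn_len + 1 <= cap); /* prefix + dynamic + DISPLAY */
    memcpy(dl, fixed_prefix, sizeof fixed_prefix);
    memcpy(dl + PREFIX_LEN, dynamic, dyn_len * sizeof *dynamic);
    size_t n = PREFIX_LEN + dyn_len;
    dl[n++] = DL_DISPLAY();
    return n;
}
```

The dynamic array for a frame can then contain `DL_CALL(ROUTINE_AT)` wherever it needs the routine, without recomputing any offsets.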
I'm not sure if this helps, but I'm interested in exploring display list subroutines myself (eventually). Just keep in mind that the available DL memory is only one of the limits on what you can get this chip to do. Every scanline drawn to the display has to execute the entire display list, and how long that takes depends on both the clock speed and the complexity of the list. At best, complex lists will slow down the frame rate; at worst, you may wind up with hard-to-diagnose graphic glitches.
Keep us posted!