Here are some things that could be done, should be done, whatever.
The MMC/SD code worked in the earlier versions for the lpc2106, using the SPI interface. That code may still work with this version of Riscy Pygness, but it needs to be checked. (It uses the mode that ignores CRCs. I would like to add CRC support, so if someone would like to work out the algorithm for calculating the CRCs, I would be glad to add it.)
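As a possible starting point for the CRC work, here is a rough sketch in C of the two checksums that SD cards use: a 7-bit CRC on each command (polynomial x^7 + x^3 + 1) and a 16-bit CRC on each data block (polynomial x^16 + x^12 + x^5 + 1, the CCITT polynomial with an initial value of 0). The function names are made up for illustration and are not part of Riscy Pygness. It is a plain bit-at-a-time version, not tuned for speed, and it would still need to be translated to Forth or CODE words.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper names; nothing here comes from the Riscy Pygness sources. */

/* CRC7 over the command bytes: polynomial x^7 + x^3 + 1 (0x09), initial value 0.
   Bit 7 of crc is used only as the carry out of the 7-bit register. */
uint8_t sd_crc7(const uint8_t *data, size_t len)
{
    uint8_t crc = 0;
    for (size_t i = 0; i < len; i++) {
        uint8_t byte = data[i];
        for (int bit = 0; bit < 8; bit++) {
            crc <<= 1;
            if ((byte ^ crc) & 0x80)   /* incoming bit differs from carry-out */
                crc ^= 0x09;
            byte <<= 1;
        }
    }
    return crc & 0x7F;
}

/* Last byte of a 6-byte command frame: the CRC7 in bits 7..1, end bit 1 in bit 0. */
uint8_t sd_command_crc_byte(const uint8_t *cmd, size_t len)
{
    return (uint8_t)((sd_crc7(cmd, len) << 1) | 1);
}

/* CRC16 over a data block: polynomial 0x1021, initial value 0, MSB first. */
uint16_t sd_crc16(const uint8_t *data, size_t len)
{
    uint16_t crc = 0;
    for (size_t i = 0; i < len; i++) {
        crc ^= (uint16_t)((uint16_t)data[i] << 8);
        for (int bit = 0; bit < 8; bit++) {
            if (crc & 0x8000)
                crc = (uint16_t)((crc << 1) ^ 0x1021);
            else
                crc = (uint16_t)(crc << 1);
        }
    }
    return crc;
}
```

A quick sanity check is CMD0 with a zero argument (bytes 0x40 0x00 0x00 0x00 0x00), whose final frame byte should come out as 0x95, the value normally hard-coded when a card is first switched into SPI mode. A table-driven version would be faster but costs 256 or 512 bytes of table, so the bitwise form may be the better fit on the smaller parts.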
The lpc2378 and lpc2368 have hardware support for MMC/SD, including the CRC calculations. Code needs to be written to take advantage of this.
The current version (16-bit tokens with lookup table) is a good compromise. It allows access to the entire address space and yet is quite compact. For anything that is not fast enough, we can drop down to CODE words. Still, we could have alternate threading implementations for special cases: