I use the GNU ARM toolchain to supply the assembler (along with the linker, debugger, and so on) rather than taking the classic approach of writing the assembler in Forth.
There are tradeoffs. The siren song is "the work has already been done for you; just reuse it."
Still, in a few cases I have found it very convenient to be able to use the GNU debugger to trace and correct certain primitives. I have not needed it often, but it has been a comfort when I did. (Using a Forth-style assembler instead of the GNU assembler would not necessarily preclude using gdb.)
The full GNU ARM toolchain (including gcc, the C compiler) is not needed; the binutils package alone is enough, and it is quite easy to install. Binutils supplies the assembler, the linker, and various object-file tools. You can install it as a precompiled binary package (a number of websites supply precompiled ARM toolchains for both Linux and Microsoft Windows), or you can compile binutils yourself. To use the GNU debugger, you also need the gdb package.
Note that the GNU tools can be given a prefix when you compile them, so that there is no conflict between the ARM tools and those for your native Linux system. For example, I use "arm-elf-" as the prefix, so my ARM assembler is named arm-elf-as, while my native x86 assembler is named simply as.
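To make the prefix convention concrete, here is a minimal Python sketch of a build helper that composes command lines for the prefixed cross tools. It assumes binutils was built with the "arm-elf-" prefix described above; the helper functions and the file names in the usage line (primitives.s, kernel.elf, the linker script name) are hypothetical illustrations, not part of Riscy Pygness. By default it only prints the commands; pass dry_run=False to actually run them.

```python
import shlex
import subprocess
from typing import List, Optional

# Assumption: the cross tools were built with the "arm-elf-" prefix, as in
# the text. Change PREFIX to whatever prefix your own binutils build used.
PREFIX = "arm-elf-"


def tool(name: str, prefix: str = PREFIX) -> str:
    """Return the prefixed program name, e.g. 'as' -> 'arm-elf-as'."""
    return prefix + name


def assemble_cmd(source: str, obj: str, debug: bool = True) -> List[str]:
    """Command line to assemble one source file with the cross assembler.

    -g asks GNU as to emit debugging information, so gdb can step
    through the primitives at the source level.
    """
    cmd = [tool("as")]
    if debug:
        cmd.append("-g")
    cmd += ["-o", obj, source]
    return cmd


def link_cmd(objs: List[str], output: str,
             script: Optional[str] = None) -> List[str]:
    """Command line to link object files with the cross linker.

    -T names a linker script (the target's memory layout); bare-metal
    ARM targets usually need one, but it is optional here.
    """
    cmd = [tool("ld")]
    if script is not None:
        cmd += ["-T", script]
    cmd += ["-o", output] + objs
    return cmd


def build(source: str, obj: str, output: str,
          script: Optional[str] = None,
          dry_run: bool = True) -> List[List[str]]:
    """Assemble one source file and link it; return the commands used."""
    steps = [assemble_cmd(source, obj), link_cmd([obj], output, script)]
    for cmd in steps:
        print(shlex.join(cmd))
        if not dry_run:
            # check=True raises CalledProcessError if a tool fails.
            subprocess.run(cmd, check=True)
    return steps


if __name__ == "__main__":
    # Hypothetical file names; nothing is executed in dry-run mode.
    build("primitives.s", "primitives.o", "kernel.elf", script="target.ld")
```

Because the native tools carry no prefix, tool("as", prefix="") yields plain as, which is exactly why the prefixed and native toolchains can coexist. If gdb was built with the same prefix, tool("gdb") would name the cross debugger (arm-elf-gdb) in the same way.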
Also note that the assembler is used only for the primitives. Some Forth implementations cram their high-level definitions into the assembly source in a highly unreadable fashion, but Riscy Pygness expresses its high-level definitions in straightforward Forth.