I had been pondering doing more or less the same thing for the 6502 (6510).
It was always the dilemma of whether to pull the CPU out of a C64 and replace it like this, do it as a bus mastering cartridge, or replace the RAM.
I have been leaning towards the cartridge plan to avoid the requirement of doing machine surgery. If you get the RP2350 to pretend to be the RAM then the video hardware could read directly out of it which makes all sorts of shenanigans possible (every line a BADLINE).
At some point it would look like just plugging a VIC-II and a SID into a board with the RP2350, though. The cartridge approach means you have to do transfers across into the computer's RAM, but you could also write to hardware registers every CPU cycle, which would enable some potentially new modes that would not be entirely dissimilar to every line a BADLINE.
Right now I'm mucking around with getting the RP2350 to output video constructed a scanline at a time, using as little CPU as possible. I got three layers of tiles and two layers of sprites, each with different pixel formats, working yesterday. Quite pleased with that. The CPU calculates a handful of values per scanline, but fetching tilemap data, then tile data, then conversion to pixel values, transparency and palette lookup are all DMA and PIO. It does 1, 2, 4, and 8 bits per pixel, and each tile/sprite/imagebuffer layer has an independent 24-bit palette.
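To make that data flow concrete, here is a rough Python model of one scanline: tilemap lookup, tile row fetch, unpacking to palette indices, then transparency and palette lookup per layer. It is only a sketch. On the real chip these stages are DMA and PIO, not loops, and the 8x8 tile size, MSB-first pixel packing, index 0 as transparent, and all function names are assumptions made for illustration.

```python
def unpack_pixels(data: bytes, bpp: int) -> list[int]:
    """Split packed bytes into palette indices (assumed MSB-first packing)."""
    assert bpp in (1, 2, 4, 8)
    mask = (1 << bpp) - 1
    out = []
    for byte in data:
        # Walk the byte from its top bits down to its bottom bits.
        for shift in range(8 - bpp, -1, -bpp):
            out.append((byte >> shift) & mask)
    return out


def fetch_tile_row(tilemap_row, tiles, y_in_tile: int, bpp: int) -> bytes:
    """Tilemap lookup followed by tile data lookup for one scanline.

    Assumes 8x8 tiles, so one tile row is exactly `bpp` bytes long.
    """
    row_bytes = bpp
    start = y_in_tile * row_bytes
    out = bytearray()
    for tile_id in tilemap_row:
        out += tiles[tile_id][start:start + row_bytes]
    return bytes(out)


def compose_scanline(layers, width: int, backdrop: int = 0x000000) -> list[int]:
    """Composite layers back to front into 24-bit RGB values.

    Each layer is (packed_row_bytes, bpp, palette). Index 0 is treated as
    transparent (an assumption), so it leaves the pixel underneath alone.
    """
    line = [backdrop] * width
    for data, bpp, palette in layers:
        for x, index in enumerate(unpack_pixels(data, bpp)[:width]):
            if index != 0:
                line[x] = palette[index]
    return line
```

Each layer carries its own bit depth and its own palette, which is all that "independent 24-bit palettes" needs: the compositor only ever sees RGB values after the lookup.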
The one thing that makes a modern computer faster than an 80s computer is cache. Without cache, your computer has to go to the memory bus to fetch every instruction and memory read or write, and your system will wait to get the bytes back before it takes any action. You end up at the performance level of an 80s computer.
So you replace the CPU with a faster one with built-in cache. CPU ends up with its own private copy of the RAM and ROM sitting in its cache. But that's not the end.
Computers have a memory map, memory bank switching, memory-mapped IO, and other things to consider. The CPU with its cache has to be kept in sync with the actual memory map of the system. Both the CPU and any memory mapping hardware need to be kept in sync with each other. Memory-mapped IO reads and writes need to go to the actual memory bus at native bus speed.
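A toy model of that routing, as a sketch only: ordinary RAM reads are served from the fast CPU's private copy, every write goes through to the host bus so the host's mapping hardware sees it too, and anything inside the I/O window always goes to the real bus. The I/O window, the bank register address, and the enable bit are hypothetical values picked for illustration, not any real machine's map.

```python
IO_START, IO_END = 0xD000, 0xDFFF   # hypothetical device register window
BANK_REG = 0x0001                   # hypothetical bank-select register
IO_ENABLE_BIT = 0x04                # hypothetical "map I/O in" bit


class HostBus:
    """Stand-in for the original machine's bus; counts every slow access."""

    def __init__(self):
        self.ram = bytearray(0x10000)
        self.ram[BANK_REG] = IO_ENABLE_BIT
        self.io_enabled = True
        self.cycles = 0
        self.counter = 0  # pretend device register that changes on each read

    def _is_io(self, addr):
        return self.io_enabled and IO_START <= addr <= IO_END

    def read(self, addr):
        self.cycles += 1
        if self._is_io(addr):
            self.counter = (self.counter + 1) & 0xFF
            return self.counter
        return self.ram[addr]

    def write(self, addr, value):
        self.cycles += 1
        if addr == BANK_REG:
            self.io_enabled = bool(value & IO_ENABLE_BIT)
        if not self._is_io(addr):
            self.ram[addr] = value & 0xFF


class FastCPU:
    """Replacement CPU with a private copy of whatever RAM it has touched."""

    def __init__(self, bus):
        self.bus = bus
        self.shadow = {}
        self.io_enabled = True  # must mirror the host's mapping state

    def is_io(self, addr):
        return self.io_enabled and IO_START <= addr <= IO_END

    def read(self, addr):
        if self.is_io(addr):
            return self.bus.read(addr)      # never cache device registers
        if addr not in self.shadow:
            self.shadow[addr] = self.bus.read(addr)
        return self.shadow[addr]

    def write(self, addr, value):
        if addr == BANK_REG:
            # Track the bank switch locally so both sides agree on the map.
            self.io_enabled = bool(value & IO_ENABLE_BIT)
        if not self.is_io(addr):
            self.shadow[addr] = value & 0xFF
        self.bus.write(addr, value)         # write-through keeps the host in sync
```

Repeated reads of plain RAM cost one bus cycle in total, while every read of the I/O window costs a bus cycle each time, which is the "native bus speed" constraint in miniature.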
Then you're left with the issue of other devices that need to access the RAM. This requires flushing the cache so those devices see the CPU's writes, and invalidating it so the CPU's reads see what those devices wrote.
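A minimal sketch of why both operations are needed, assuming a write-back cache tracked per byte (real caches work in lines, and the class and method names here are made up for illustration):

```python
class WriteBackCache:
    """Byte-granular write-back cache in front of RAM shared with other devices."""

    def __init__(self, ram: bytearray):
        self.ram = ram
        self.lines = {}      # addr -> cached value
        self.dirty = set()   # addresses written by the CPU but not yet in RAM

    def read(self, addr: int) -> int:
        if addr not in self.lines:
            self.lines[addr] = self.ram[addr]
        return self.lines[addr]

    def write(self, addr: int, value: int) -> None:
        self.lines[addr] = value & 0xFF
        self.dirty.add(addr)

    def flush(self, start: int, length: int) -> None:
        """Push dirty bytes out so another device reading RAM sees them."""
        for addr in range(start, start + length):
            if addr in self.dirty:
                self.ram[addr] = self.lines[addr]
                self.dirty.discard(addr)

    def invalidate(self, start: int, length: int) -> None:
        """Drop cached bytes so the next CPU read refetches what a device wrote."""
        for addr in range(start, start + length):
            self.lines.pop(addr, None)
            self.dirty.discard(addr)
```

Without the flush, a device reading shared RAM gets the old bytes; without the invalidate, the CPU keeps reading its own stale copy after a device has written.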
This is less of a “CPU replacement” and more of a bus-level participant.
Once you control the bus cycle-accurately, the CPU abstraction kind of disappears.
You’re effectively redefining the whole machine behavior from the outside.
Hot tip: Ignore the RP2350 design sheet and use a standard 1.2V LDO to provide the internal vCore. You save having to use that weird inductor and can clock it at 300 MHz much more reliably at 1.2V.
Oh wow. Enhancements for the Sharp MZ line! Wonderful. I spent a lot of time with those machines in the 1980s and own a few. Being able to emulate the Sharp MZ-80K's (https://blog.jgc.org/2009/08/in-which-i-switch-on-30-year-ol...) MZ80FD would be cool.
https://eaw.app/pico6502/
"and palette lookup are all DMA and PIO"
PIO is a revelation.