This seems like something that *might* be solved by using multithreading or multiprocessing, but it really depends on how the timings for the signals are generated.
If the generated timings have to be highly accurate (i.e., realtime), then nothing short of a dedicated coprocessor is going to help, because the timing process will literally take over the processor until it has done its thing.
If they are not, then you can write two programs, one to do the microcontroller stuff and one to drive the display, and have them communicate through a block of shared memory.
Before you dismiss that out of hand, DarkShader works in precisely this way and it works well.
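
To make the two-program idea concrete, here is a minimal sketch in Python using the standard library's `multiprocessing.shared_memory` module. It is only an illustration under assumptions: the frame layout (a sequence number plus one reading) and the names `publish`, `read_latest`, and `demo` are made up for this example, and nothing here is taken from how DarkShader does it. For brevity both sides run in one process; in practice `publish` would live in the controller program and `read_latest` in the display program, each attaching to the block by name.

```python
import struct
from multiprocessing import shared_memory

# Hypothetical frame layout (an assumption for this sketch):
# uint32 sequence number followed by a float64 reading.
FRAME = struct.Struct("<Id")


def publish(shm_name, seq, value):
    """Controller side: attach to the block by name and overwrite the latest frame."""
    shm = shared_memory.SharedMemory(name=shm_name)
    try:
        FRAME.pack_into(shm.buf, 0, seq, value)
    finally:
        shm.close()


def read_latest(shm_name):
    """Display side: attach to the block by name and return (seq, value)."""
    shm = shared_memory.SharedMemory(name=shm_name)
    try:
        return FRAME.unpack_from(shm.buf, 0)
    finally:
        shm.close()


def demo():
    # One side creates the block; its name is what the two programs share.
    owner = shared_memory.SharedMemory(create=True, size=FRAME.size)
    try:
        for seq, value in enumerate([0.5, 1.5, 2.5], start=1):
            publish(owner.name, seq, value)
        return read_latest(owner.name)
    finally:
        owner.close()
        owner.unlink()  # release the block once nobody needs it


if __name__ == "__main__":
    print(demo())
```

Note that this sketch does no locking. If the display can read while the controller is mid-write, you would need some guard (a lock, or checking the sequence number before and after the read) to avoid picking up a half-written frame.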