How are you recording the voice? Are you doing it yourself?
For each sound I would record a few slightly different examples with the sound in different parts of the words. For example, if we recorded the G "guh" sound I could say things like: "Gas", "Grass", "Glass", "Aggregate", "Long"
Then we can edit them, cutting away the rest of the word around the G, and see if there is a difference in the sound. "Long" is definitely different (not sure if this is a separate recognised phoneme), but there may also be differences between the other G sounds, in which case you could break them into sub-phonemes for a more fluid pronunciation.
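If you wanted to do the cutting and comparing in code rather than by ear, here is a rough sketch of the idea. It assumes each recording is already loaded as a plain list of sample values and that you have marked by hand where the G starts and ends in each word. The function names are made up for illustration, and comparing raw waveforms like this is crude; a real comparison would more likely look at the frequency content.

```python
import math

def cut_segment(samples, sample_rate, start_s, end_s):
    """Return the samples between start_s and end_s (both in seconds)."""
    start = max(0, int(round(start_s * sample_rate)))
    end = min(len(samples), int(round(end_s * sample_rate)))
    return samples[start:end]

def rms(samples):
    """Root-mean-square level of a segment (0.0 for an empty one)."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def similarity(a, b):
    """Cosine similarity of two segments over the shorter length.

    Close to 1.0 means the waveforms have the same shape; close to 0.0
    means they have little in common.
    """
    n = min(len(a), len(b))
    a, b = a[:n], b[:n]
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0
```

You would call `cut_segment` once per word ("Gas", "Grass", "Glass" and so on) with the hand-marked times, then run `similarity` on each pair of cut-out G sounds. Pairs that score noticeably lower than the rest would be the candidates for separate sub-phonemes.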
I phoned up my phone company today and the generated voice on the other end was quite impressive. It sounded fluid even when reading back my name, which I had spelled out to it. I like to talk like a robot to those things; it seems to help and it's funny!
Do oranges know what colour they are?