That's no longer true for any high-speed processor (incl x86). The cache is generally large enough that only the least recently used cache lines will be dumped.
The x86 series includes branch prediction, so it will 'guess' at whether the jump will be taken or not, and fill the pipeline with instructions from the most likely path. If it finds that it guessed wrong, the pipeline will stall, be flushed, and restart processing instructions from the other path.
In addition, fetching new instructions from memory into cache is relatively expensive compared to re-executing the same instructions from the cache. Unrolling makes the code larger, so it generally forces more instructions to be fetched into cache.
x86 is very good at branch prediction, so generally, looped code will be faster than unrolled code, assuming that the loop is relatively small.
Basically, don't worry too much about loops - they aren't slow, and your second pseudo-code will most likely be faster (depending on the actual content of 'do_stuff').
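To make the comparison concrete, here is a rough sketch in C of the two shapes being discussed. The 'do_stuff' here is a made-up stand-in (it just adds a value to a total), and the function names are mine, so treat it as an illustration rather than a benchmark of your actual code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for 'do_stuff': add one element to a running total. */
static long do_stuff(long total, int value)
{
    return total + value;
}

/* Looped form: a small body that is re-executed from cache, with one
   backward branch that is taken on every iteration except the last. */
long sum_looped(const int *values, size_t count)
{
    assert(values != NULL || count == 0);
    long total = 0;
    for (size_t i = 0; i < count; ++i)
        total = do_stuff(total, values[i]);
    return total;
}

/* Manually unrolled by four: fewer branches, but more instruction bytes
   that have to be fetched into cache. */
long sum_unrolled4(const int *values, size_t count)
{
    assert(values != NULL || count == 0);
    long total = 0;
    size_t i = 0;
    for (; i + 4 <= count; i += 4) {
        total = do_stuff(total, values[i]);
        total = do_stuff(total, values[i + 1]);
        total = do_stuff(total, values[i + 2]);
        total = do_stuff(total, values[i + 3]);
    }
    for (; i < count; ++i)  /* leftover elements when count is not a multiple of 4 */
        total = do_stuff(total, values[i]);
    return total;
}
```

Both functions return the same result for any input. Which one is faster depends on what 'do_stuff' really does, and an optimising compiler may well unroll the first form itself, so measure before assuming either way.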
[EDIT]
@James H,
Forgot to answer your post ...
Firstly, DBPro builds lists of objects every frame that must be considered for rendering. If you have excluded your object or hidden it, then it won't be added to these lists, reducing the cost of rebuilding them. This saving is minimal though, unless you have increased the number of objects, which could cause these lists to be expanded in size.
Secondly, DBPro sorts these lists in various ways - sorting is expensive, no matter how good your sort algorithm is. In certain rendering modes, DBPro will sort these lists every time a camera is rendered, and in others, it may sort these lists when you change textures etc.
Thirdly, I happen to know that the OBJECT IN SCREEN code was updated in the last release for improved accuracy and uses different code than the rendering code - whether the new code is actually faster I don't know, but I believe the rendering code is a little 'looser' in what it considers to be inside the frustum.
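To illustrate the first two points, here is a minimal sketch in C of a per-frame list rebuild followed by a sort. This is not DBPro's actual code: the Object struct, its fields, the choice of texture as the sort key, and the function names are all assumptions made up for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical object record; not DBPro's real data structure. */
typedef struct {
    int id;
    int texture;   /* sort key assumed for this sketch */
    int hidden;    /* non-zero if the object is hidden */
    int excluded;  /* non-zero if the object is excluded */
} Object;

/* Order by texture, then by id so the result is deterministic. */
static int by_texture(const void *a, const void *b)
{
    const Object *x = *(const Object *const *)a;
    const Object *y = *(const Object *const *)b;
    if (x->texture != y->texture)
        return (x->texture > y->texture) - (x->texture < y->texture);
    return (x->id > y->id) - (x->id < y->id);
}

/* Rebuild the list for one frame: skip hidden or excluded objects, then
   sort what is left. 'out' must have room for 'count' pointers.
   Returns the number of entries written to 'out'. */
size_t build_render_list(const Object *objects, size_t count,
                         const Object **out)
{
    assert(out != NULL || count == 0);
    size_t n = 0;
    for (size_t i = 0; i < count; ++i)
        if (!objects[i].hidden && !objects[i].excluded)
            out[n++] = &objects[i];
    if (n > 1)
        qsort(out, n, sizeof out[0], by_texture);
    return n;
}
```

The point of the sketch is that every hidden or excluded object makes the list shorter, so both the rebuild loop and the sort have less work to do each frame.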