you guys read what Carmack has said about the nv30 vs R300 argument?
it's a very interesting read ... it's pretty valid considering he pitted the NV30 (the GeForce FX 5800) against the R300 (the Radeon 9700)
he said the NV30 was marginally faster than the R300, and that it could take a bigger lead by relying not on the ARB2 path but on the NV30-specific instruction set. The R300 also performed slightly better in the R200 path than it did in ARB2...
Carmack attributed the fact that the NV30 was actually slower in some instances to the switching between the FP16 and FP32 precision modes.
The R300 only has FP24 at its disposal so it never has to make that choice, whereas on the NV30 the decision about which precision to use can slow things down - particularly when it falls back to FP32 (which he deems kind of unnecessary because the visual benefits can't really be seen - which is true: most people won't be able to tell the difference between 64-bit and 128-bit operations, and the only REAL time to use the 128-bit FP32 is when calculating high-precision things such as vertex positions for collision and the like).
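just to illustrate what that precision juggling looks like from the developer's side, here's a rough Cg-style sketch (my own made-up example, not anything from Carmack or Doom - the names and maths are invented): on the NV30 path you tag the colour maths as half (FP16) and keep only the range-sensitive stuff like positions at float (FP32), while the R300 just runs the whole thing at FP24 no matter what you write:

// hypothetical fragment program, purely to show the FP16/FP32 split on NV30
half4 main( float3 worldPos   : TEXCOORD0,   // position maths wants the full FP32 range
            half2  uv         : TEXCOORD1,   // colour maths is fine at FP16
            uniform sampler2D diffuseMap,
            uniform float3    lightPos,
            uniform half3     lightColour ) : COLOR
{
    // attenuation comes from the FP32 inputs
    float3 toLight = lightPos - worldPos;
    float  atten   = saturate( 1.0 - dot( toLight, toLight ) * 0.01 );

    // everything below can stay at half precision with no visible difference
    half4 albedo = (half4)tex2D( diffuseMap, uv );
    return albedo * half4( lightColour, 1 ) * (half)atten;
}

on the FX that kind of tagging is where the speed comes back; on the Radeon the half type just gets treated as full precision and everything runs at FP24 anyway.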
But what he reckons is the main problem at the moment is that the nVidia drivers are still too young and aren't properly optimised yet - when nVidia can handle the fragment programs better on the GPU, no doubt we'll see one hell of a speed boost over the coming months.
i think the quad-byte precision (FP32) that the GeForceFX gives, as opposed to the triple-byte (FP24) that the Radeons use, is the major difference.
Although ATi's decision to just fix their precision at one more standardised rate has its downsides - a) it makes them harder to develop effects for, because sometimes they lack precision and sometimes you're forced to carry precision that doesn't need to be there, and b) you can't mix and match precision modes.
there's also the fact that the Radeons have a pretty small instruction count limit ... one major reason i've personally chosen the FX5900 for a current project is that i can do everything i want in a single pass and speed it up with precision optimisation.
You can put the same code onto the Radeon 9800 Pro and it simply won't be able to hack it; once you hit the instruction limit you have to fragment the program across an extra pass.
Where my rendering engine takes 8 passes on the GeForceFX, the limitation on the Radeon meant i had to take 16-18 passes for the same shaders, and also had to structure them differently to compensate (roughly along the lines of the sketch below).
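roughly what that fragmenting looks like (again a simplified made-up sketch of the idea, not the actual shaders from my engine - if memory serves, the R300 caps ARB fragment programs at around 64 arithmetic + 32 texture instructions, whereas the NV30 goes up to about 1024): a per-light program that fits in one pass on the FX gets cut in two on the Radeon, with the second pass added on top via additive blending (glBlendFunc(GL_ONE, GL_ONE)):

// pass 1: diffuse term only - small enough for the Radeon's instruction budget
half4 lightDiffusePass( half2 uv  : TEXCOORD0,
                        half3 N   : TEXCOORD1,   // interpolated normal
                        half3 L   : TEXCOORD2,   // vector to the light
                        uniform sampler2D diffuseMap ) : COLOR
{
    half4 albedo = (half4)tex2D( diffuseMap, uv );
    half  NdotL  = max( dot( normalize( N ), normalize( L ) ), 0.0 );
    return albedo * NdotL;
}

// pass 2: specular term, drawn over the top with additive blending so the
// framebuffer ends up holding diffuse + specular - same image as the
// single-pass FX version, but all the geometry gets submitted twice
half4 lightSpecularPass( half2 uv  : TEXCOORD0,
                         half3 N   : TEXCOORD1,
                         half3 H   : TEXCOORD2,  // half-angle vector
                         uniform sampler2D specularMap ) : COLOR
{
    half4 specCol = (half4)tex2D( specularMap, uv );
    half  NdotH   = max( dot( normalize( N ), normalize( H ) ), 0.0 );
    return specCol * (half)pow( NdotH, 16.0 );
}

double the passes means double the vertex work and double the framebuffer traffic for the same image, which is where the speed goes.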
This literally cuts the speed in half if i want the same level of detail in-game ... to me the whole argument of Radeon vs GeForceFX is very much like asking which processor is better,
a PPC G4 at 3.0GHz or an Intel Pentium 4 3.0GHz with HT
they're two totally different designs and you can't honestly say which is better from a developer's point of view ... and although, yes, shaders do have to run to a specification to make them more compatible all round - as i've said time and time again, how they're implemented by the different companies varies quite dramatically.
Shader implementation isn't like the implementation of MMX on the Pentium processor line - or adherence to the OpenGL specifications - these guys are creating totally different setups.
i mean, that's something to think about at least; the interview is on beyond3d.com
although power-wise the FX5900 Ultra doesn't technically beat the FX5800 Ultra in everyday tasks ... as we get more and more into the realm of shaders, that's when you see the new technology truly shine.
they also have a review there of the Creative Blaster 5 FX5900 Ultra, which is the card i currently own, and its benchmarks are pretty much spot on for my 1.82GHz Pentium 4 (although it's weird how the AoD benchmark results differ greatly between the Blaster 5 review and the 5900 Ultra benchmarked against the Radeons)