That's what I thought - but the asm code isn't expanded in SM3 as far as I can tell. In fact, I don't see how it could be, since I can make nIters very large, which would produce an enormous unrolled loop.
This is what the MS DX9 documentation says:
Quote: "Dependent Read Limit
There are no dependent read limits.
Texture Instruction Limit
There is no limit on texture instructions.
Instruction Count
Each pixel shader is allowed anywhere from 512 up to the number of slots in MaxPixelShader30InstructionSlots (not more than 32768). The number of instructions run can be much higher because of the looping support. MaxPShaderInstructionsExecuted should be at least 2^16.
"
So something else is going on - unless the documentation is wrong (which, unfortunately, it is in places).
Edit There are at least three other possibilities: DarkShader is incorrectly flagging a warning as an error;
a bug in the shader compiler;
a bug that I can't see in my shader code.
Edit2 We can probably rule out a DarkShader error - FX Composer gives exactly the same error and says
"loop does not appear to terminate in a timely manner (1024 iterations)"
even when I set nIters to just 4! Here are the code fragments I tested:
Works:
for (int i = 0; i < nIters; i++)
{
    tmp = x*x - y*y + x0;
    y = 2.0*abs(x)*abs(y) + y0;
    x = tmp;
    if ((x*x + y*y) > 4.0) // apply colour when iterations have escaped the circle radius 2
    {
        steps = i;
        i = nIters + 1; // force exit from the loop
        escaped = 1;
    }
}
result.rgb = tex2D(fractalSample, float2 (0, steps/nIters)).rgb * escaped;
Out.Col = result;
Doesn't work:
for (int i = 0; i < nIters; i++)
{
    tmp = x*x - y*y + x0;
    y = 2.0*abs(x)*abs(y) + y0;
    x = tmp;
    if ((x*x + y*y) > 4.0) // apply colour when iterations have escaped the circle radius 2
    {
        steps = i;
        i = nIters + 1; // force exit from the loop
        result.rgb = tex2D(fractalSample, float2 (0, steps/nIters)).rgb;
    }
}
Out.Col = result;
The evidence is pointing to an obscure undocumented limitation or a compiler bug. What do you think?
Edit3 Done some delving on the MS site and it seems that the problem is an obscurely documented feature of the "if" construct. Here is what it says in the remarks on the "if Statement" page. Note especially the reference to tex2D in the penultimate paragraph:
Quote: "Remarks
When the compiler uses the branch method for compiling an if statement it will generate code that will evaluate only one side of the if statement depending on the given condition. For example, in the if statement:
[branch] if(x)
{
    x = sqrt(x);
}
The if statement has an implicit else block, which is equivalent to x = x. Because we have told the compiler to use the branch method with the preceding branch attribute, the compiled code will evaluate x and execute only the side that should be executed; if x is zero, then it will execute the else side, and if it is non-zero it will execute the then side.
Conversely, if the flatten attribute is used, then the compiled code will evaluate both sides of the if statement and choose between the two resulting values using the original value of x. Here is an example of a usage of the flatten attribute:
[flatten] if(x)
{
    x = sqrt(x);
}
There are certain cases where using the branch or flatten attributes may generate a compile error. The branch attribute may fail if either side of the if statement contains a gradient function, such as tex2D. The flatten attribute may fail if either side of the if statement contains a stream append statement or any other statement that has side-effects.
An if statement can also use an optional else block. If the if expression is true, the code in the statement block associated with the if statement is processed. Otherwise, the statement block associated with the optional else block is processed.
"
The relevant page on the site is the "if Statement" page.
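Given that remark about gradient functions, one way to keep the lookup inside the if might be to sample with an explicit mip level, since tex2Dlod does not ask for a screen-space gradient. This is only a minimal sketch based on the failing fragment and I haven't compiled it. The declarations, the FractalPS name and the mapping from uv to x0/y0 are my own assumptions, not the original shader:

```hlsl
// Assumed declarations: the names follow the fragments in the post, the types are guesses.
sampler2D fractalSample;
int nIters = 64;

float4 FractalPS(float2 uv : TEXCOORD0) : COLOR0
{
    // Hypothetical mapping of the texture coordinate onto the complex plane.
    float x0 = uv.x * 3.0 - 2.0;
    float y0 = uv.y * 2.0 - 1.0;
    float x = 0.0;
    float y = 0.0;
    float tmp;
    float4 result = float4(0.0, 0.0, 0.0, 1.0);

    for (int i = 0; i < nIters; i++)
    {
        tmp = x*x - y*y + x0;
        y = 2.0*abs(x)*abs(y) + y0;
        x = tmp;
        if ((x*x + y*y) > 4.0)
        {
            // Cast to float so the ratio is not an integer division.
            float v = (float)i / (float)nIters;
            // tex2Dlod takes the mip level in .w (0 here), so no gradient is requested.
            result.rgb = tex2Dlod(fractalSample, float4(0.0, v, 0.0, 0.0)).rgb;
            i = nIters + 1; // force exit from the loop, as in the original fragments
        }
    }
    return result;
}
```

Always reading mip level 0 should be harmless for a colour-ramp lookup like this one, but it would give up mipmapping on an ordinary texture.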
Edit4 I think I've found the relevant part of the documentation. The reason seems subtle but makes sense when you read it all. The following is an extract from the relevant page of the MS DX9 SDK docs. You'll find it by searching for the heading:
Quote: "Interaction of Per-Pixel Flow Control With Screen Gradients
The pixel shader instruction set includes several instructions that produce or use gradients of quantities with respect to screen space x and y. The most common use for gradients is to compute level-of-detail calculations for texture sampling, and in the case of anisotropic filtering, selecting samples along the axis of anisotropy. Typically, hardware implementations run the pixel shader on multiple pixels simultaneously (such as a 2x2 grid), so that gradients of quantities computed in the shader can be reasonably approximated as deltas of the values at the same point of execution in adjacent pixels.
When flow control is present in a shader, the result of a gradient calculation requested inside a given branch path is ambiguous when adjacent pixels may execute separate flow control paths. Therefore, it is deemed illegal to use any pixel shader operation that requests a gradient calculation to occur at a location that is inside a flow control construct which could vary across pixels for a given primitive being rasterized.
"
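If I read that correctly, the gradients only have to be requested outside the varying flow control; they can still be used inside it. A hedged sketch of that idea follows, with ddx/ddy taken before the branch and handed to tex2Dgrad inside it. The names baseSample, MaskedPS and the mask input are purely illustrative and don't come from my shader, and I haven't compiled this:

```hlsl
// Hypothetical names throughout: baseSample, MaskedPS and mask are illustrative only.
sampler2D baseSample;

float4 MaskedPS(float2 uv : TEXCOORD0, float mask : TEXCOORD1) : COLOR0
{
    // Gradients are requested here, before any flow control, where
    // neighbouring pixels are still executing the same instructions.
    float2 duvdx = ddx(uv);
    float2 duvdy = ddy(uv);

    float4 col = float4(0.0, 0.0, 0.0, 1.0);
    if (mask > 0.5) // this condition may differ from pixel to pixel
    {
        // tex2Dgrad uses the supplied gradients instead of requesting new ones.
        col = tex2Dgrad(baseSample, uv, duvdx, duvdy);
    }
    return col;
}
```

This doesn't map directly onto the fractal loop, because there the lookup coordinate is only known once the loop has escaped, which is presumably why moving the tex2D call after the loop works.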
Sorry about all the edits but it was probably worth it in the end.