There are a lot of coding practices that will optimize your code.
Some have only a small performance gain, but if the code is repeated a lot then the optimizations can add up.
Faster Object Loading
Use DBPro's native format, .dbo. Here is the link to a converter written by Rob K for 3DS/X to .DBO:
http://forum.thegamecreators.com/?m=forum_view&t=22599&b=5
If you are using 3DWS and have multiple textures, be sure to
lightmap the terrain first, otherwise the textures wont show up when converted.
My Performance Gain:
7 Skyboxes + 1 terrain, .X FORMAT: Loaded in ~50 seconds
7 Skyboxes + 1 terrain, .DBO FORMAT: Loaded in ~0.17 seconds
TIPS & LITTLE TRICKS
Loop Unroll
The benefit is that the loop doesn't have to check to see if
it should terminate every iteration. Instead, it checks every 5
commands. This is good if you have a loop that just repeats one
command over and over. This also isn't limited to just 5 commands. So
long as the amount of command you unroll is equal to your step,
and your total number of iterations divides evenly by your step, you
can unroll the loop. (Creating and Deleting objects is an extremely bad example, because no matter what, Windows has to allocate space to create the objects. If you had already created 50,000 or so objects, then it would be useful to implement loop unrolling with them. Otherwise, just do it the old fashioned way)
Use this:
for x=1 to 100 step 5
delete object x
delete object x+1
delete object x+2
delete object x+3
delete object x+4
next x
Instead of:
for x=1 to 100
delete object x
next x
My Performance Gain:
Rolled loop, make 50,000 objects: 23.775 seconds
Rolled loop, delete 50,000 objects: 59.296 seconds
Unrolled loop, make 50,000 objects: 4.848 seconds
Unrolled loop, delete 50,000 objects: 56.991 seconds
Pull out continually computated quantities
If a quantity is computed inside a loop during every iteration,
and its value is the same for each iteration, it can vastly
improve efficiency to hoist it outside the loop and compute its
value just once before the loop begins.
Use:
x=y+z
t1=x*x
for n=1 to 100
a=6*n+t1
next n
Instead of:
for n=1 to 100
x=y+z
a=6*n+x*x
next n
My Performance Gains:
Quantity inside loop, 10 million iterations: 0.480 Seconds
Quantity outside loop, 10 million iterations: 0.254 seconds
Quantity inside loop, 100 million iterations: 3.680 seconds
Quantity outside loop, 100 million iterations: 2.208 seconds
Quantity inside loop, 1 billion iterations: 40.768 seconds
Quantity outside loop, 1 billion iterations: 21.152 seconds
Use For/Next instead of Do/Loop for your main loop.
For/Next loops run faster than Do/Loop loops for some reason.
For your main loop, you could set the ending value absurdly high,
or have the value reset periodically, or have many loops nested.
In terms of speed, here are the loops, from fastest to slowest:
For/Next
Gosub (can be a loop)
While/Endwhile
Goto (can be a loop)
Repeat/Until
Do/Loop
Instead of:
Do
(code)
Loop
Use:
For x=1 to (some high number)
(code)
next x
My Performance Gains:
(y is reset to 0 before each loop)
F/N Loop, inc y until y=10,000: 0.000 seconds
Gosub Loop, inc y until y=10,000: 0.000 seconds
W/EW Loop, inc y until y=10,000: 3.198 seconds
Goto Loop, inc y until y=10,000: 3.287 seconds
R/U Loop, inc y until y=10,000: 3.271 seconds
D/L Loop, inc y until y=10,000: 3.226 seconds
F/N Loop, inc y until y=20,000: 0.000 seconds
Gosub Loop, inc y until y=20,000: 0.000 seconds
W/EW Loop, inc y until y=20,000: 6.531 seconds
Goto Loop, inc y until y=20,000: 6.610 seconds
R/U Loop, inc y until y=20,000: 8 seconds
D/L Loop, inc y until y=20,000: 6.920 seconds
F/N Loop, inc y until y=30,000: 0.000 seconds
Gosub Loop, inc y until y=30,000: 0.000 seconds
W/EW Loop, inc y until y=30,000: 9.737 seconds
Goto Loop, inc y until y=30,000: 13.648 seconds
R/U Loop, inc y until y=30,000: 11.283 seconds
D/L Loop, inc y until y=30,000: 10.173 seconds
F/N Loop, inc y until y=400,000: 0.003 seconds
Gosub Loop, inc y until y=400,000: 0.004 seconds
F/N Loop, inc y until y=1,000,000: 0.008 seconds
Gosub Loop, inc y until y=1,000,000: 0.0011 seconds
Loop Unswitching
It moves a conditional inside a loop outside of it
('p' in this case) by duplicating the loop's body, and placing
a version of it inside each of the if and else clauses of the conditional.
Use:
if p=1
for n=1 to 1000
x=x+y
y=0
next f
else
for n=1 to 1000
x=x+y
next f
endif
Instead of:
for n=1 to 1000
x=x+y
if p=1 then y=0
next n
My Performance Gains:
No Unswitching, p=0, 10,000 iterations: 0.017 seconds
No Unswitching, p=1, 10,000 iterations: 0.025 seconds
Unswitching, p=0, 10,000 iterations: 0.014 seconds
Unswitching, p=1, 10,000 iterations: 0.019 seconds
No Unswitching, p=0, 100,000 iterations: 0.192 seconds
No Unswitching, p=1, 100,000 iterations: 0.213 seconds
Unswitching, p=0, 100,000 iterations: 0.128 seconds
Unswitching, p=1, 100,000 iterations: 0.160 seconds
No Unswitching, p=0, 1 million iterations: 18.56 seconds
No Unswitching, p=1, 1 million iterations: 20.8 seconds
Unswitching, p=0, 1 million iterations: 16 seconds
Unswitching, p=1, 1 million iterations: 19.2 seconds
Partial Redundancy Elimination
An expression is called partially redundant if the value computed
by the expression is already available on some but not all paths
through a program to that expression.
An expression is fully redundant if the value computed by the
expression is available on all paths through the program to that expression.
Instead of:
if a=1
y=x+4
else
t=x+4
endif
z=x+4
Use:
if a=1
y=x+4
t=y
else
t=x+4
endif
z=t
That way, you only perform one calculation, regardless of whether
the condition is true or not.
My Performance Gains:
No Elimination, a=0, 1 million iterations: 0.032 seconds
No Elimination, a=1, 1 million iterations: 0.046 seconds
Elimination, a=0, 1 million iterations: 0.000 seconds
Elimination, a=1, 1 million iterations: 0.000 seconds
No Elimination, a=0, 10 million iterations: 0.206 seconds
No Elimination, a=1, 10 million iterations: 0.224 seconds
Elimination, a=0, 10 million iterations: 0.160 seconds
Elimination, a=1, 10 million iterations: 0.192 seconds
No Elimination, a=0, 100 million iterations: 1.856 seconds
No Elimination, a=1, 100 million iterations: 2.048 seconds
Elimination, a=0, 100 million iterations: 1.728 seconds
Elimination, a=1, 100 million iterations: 2.048 seconds
Common Subexpression Elimination
Find instances of identical expressions (ie, they all
evaluate to the same value) and replace them with a single
variable holding the computed value.
Use:
temp=b*c
a=temp+g
d=temp*d
Instead of:
a=b*c+g
d=b*c+g
You can also simplify the code, for even further optimization:
Instead of:
a=30
b=9-a/5
c=b*4
if c>10
c=c-10
c*(60/a)
endif
You can simplify it to:
a=30
b=3
c=3*4
if c>10
c=c-10
c=c*2
endif
Then to:
c=12
if 12>10
c=2
c=c*2
endif
And if you know that the condition is always true, you can simplify that to:
c=4
My Performance Gains:
Not Eliminated, 1 million iterations: 0.064 seconds
Eliminated, 1 million iterations: 0.032 seconds
Not Eliminated, 10 million iterations: 0.224 seconds
Eliminated, 10 million iterations: 0.192 seconds
Not Eliminated, 100 million iterations: 2.592 seconds
Eliminated, 100 million iterations: 1.984 seconds
Not Eliminated, 1 billion iterations: 38.208 seconds
Eliminated, 1 billion iterations: 24.416 seconds
General Tips:
Try to use numbers instead of variables where possible, ex:
load object "Thing.x",12
instead of:
load object "thing.x",num
===================================================
Try to break down computations into their simplest form, so that the compiler has less work to do when running.
===================================================
Delete images/objects/bitmaps/etc. that you are no longer using. It saves memory (RAM).
===================================================
Use only types that you need, for instance use integer instead of floats if you aren't going to be dealing with decimals.
===================================================
Use #CONSTANT if your variables aren't going to be changing in value.
===================================================
Recreating your algorithm in a more efficient manner is by far one of the best ways to increase your program's efficiency.
===================================================
Contributions by IanM:
How to really optimize
1. Pick a better algorithm.
Yes, it's difficult - sometimes you actually have to think about the problem and it's not always easy to come up with new ideas, but it will give you better results than micro-optimisations will almost 100% of the time.
2. Make it right before making it faster.
Debugging code is a harder job than writing code. Optimisation usually makes code less clear to read, and unclear code is harder to debug.
3. Measure.
Become religious about it. Measure several times, PROVE where the problem is, make the changes, then measure it again. PROVE that it is faster in the expected way. In any place that you've made the code less readable, put a comment in to explain what you've done and why, so that if you need to debug in future, you'll at least have half a chance.
Array access is relatively slow.
If you are accessing an array element more than twice in a loop, put it into a variable temporarily while you are working on it.
#constant TIMER_RES 1000000
#constant RUN_COUNT 1000000
sync on
sync rate 0
sync
sync
print "Starting"
sync
dim a(10) as integer
for i = 0 to 10
a(i) = i
next
StartTime = hitimer( TIMER_RES )
for i = 1 to RUN_COUNT
for j = 0 to 10
a(j) = a(j) * a(j)
next
next
FinishTime = hitimer( TIMER_RES )
print "Standard Array Access: "; FinishTime - StartTime
StartTime = hitimer( TIMER_RES )
for i = 1 to RUN_COUNT
for j = 0 to 10
x = a(j)
x = x * x
a(j) = x
next
next
FinishTime = hitimer( TIMER_RES )
print "Alternate Array Access: "; FinishTime - StartTime
sync
wait key
end
The win gets bigger the more array accesses you remove in this way.
Shortcut evaluation
DBPro has not implemented short-cut evaluation. However, for an IF statement, and where you are carrying out AND/&& evaluations, you can fairly easily simulate it.
if x >= StartX and x <= EndX and Y >= StartY and y <= EndY
` do something
endif
` can be replaced with
if x >= StartX then if x <= EndX then if y >= StartY then if y <= EndY
` do something
endif
In the first IF statement, every part of the expression is evaluated before the result is checked - that's 7 operations.
In the second IF statement, the first part is evaluated, and only if true does it go on to evaluate the second, and only if that's true does it evaluate the third, and only if that's true does it evaluate the fourth - that's a minimum of 1 operation and a maximum of 4 operations. Even in the worst case where every evaluation matches, it provides better results than the first IF statement.
#constant TIMER_RES 1000000
#constant RUN_COUNT 1000000
sync on
sync rate 0
sync
sync
print "Starting"
sync
StartX = 100
EndX = 200
StartY = 100
EndY = 200
x = 150
y = 150
StartTime = hitimer( TIMER_RES )
for i = 1 to RUN_COUNT
if x >= StartX and x <= EndX and Y >= StartY and y <= EndY
` do something
endif
next
FinishTime = hitimer( TIMER_RES )
print "Standard AND evaluation: "; FinishTime - StartTime
StartTime = hitimer( TIMER_RES )
for i = 1 to RUN_COUNT
if x >= StartX then if x <= EndX then if Y >= StartY then if y <= EndY
` do something
endif
next
FinishTime = hitimer( TIMER_RES )
print "Shortcut AND evaluation: "; FinishTime - StartTime
sync
wait key
end
Change the values of x and y to check it out (change them to zero for example).
FOR loop evaluations
Don't use functions (either your own or plug-ins) to provide the value for either the top of the loop, or the step - if you must do something like this, always put the results into a variable.
The reason for doing this is that DBPro can call these functions multiple times each time around the loop.
Run this and see how many times each of the 'Get' functions is called:
` Don't do this:
for i = GetStart() to GetEnd() step GetStep()
print "Value of i = "; i
next
print ""
` Do this:
Start = GetStart()
Finish = GetEnd()
StepSize = GetStep()
for i = Start to Finish step StepSize
print "Value of i = "; i
next
wait key
end
function GetStart()
print "Called GetStart()"
endfunction 1
function GetEnd()
print "Called GetEnd()"
endfunction 2
function GetStep()
print "Called GetStep()"
endfunction 1
If your functions are 'expensive' then a lot of time could be lost by them being called multiple times.
Don't code it yourself
If it's already in a plug-in that you have or can afford, use it.
Obviously, code it if you want to figure out how to do something, but once you've done that, put your code to one side and use the plug-in. Unless the plug-in code is especially inefficient, there's no way you can match its speed, even for a simple function. (As a simple example, see the recent 'Highest of two values' in the Code Snippets board)
Avoid type conversions, especially the hidden ones
Passing an integer to a function or command that accepts a float value will cause the compiler to introduce a hidden conversion. In fact, passing a value of any type to a function/command that expects another type will introduce a conversion.
This can happen almost anywhere. For example, all of the object positions, rotation and scaling commands accept float arguments. If you are repositioning or rotating these objects a lot, and using integers to do so, then you are wasting cycles.
In addition, DBPro does not do any type conversion during runtime. For example, if you have a command 'XAngle# = XAngle# + 1', the '1' is an integer value that will be converted at runtime and then added to the variable. This will be marginally slower than the correct 'XAngle# = XAngle# + 1.0'.
So you now know that the compiler treats numbers without a decimal point as an integer, and those with one as a float - basically that 1.0 <> 1.
Did you also know that if you use a hex number (0x12345678), that it will be treated as a dword, and that conversion from an integer to a dword and vice versa will also cost you cycles?
#constant TIMER_RES 1000000
#constant RUN_COUNT 1000000
sync on
sync rate 0
sync
sync
print "Starting"
sync
AnInteger as integer = 1
AFloat as float = 1.0
ADword as dword = 0x1
StartTime = hitimer( TIMER_RES )
for i = 1 to RUN_COUNT
AcceptFloat( AnInteger )
next
FinishTime = hitimer( TIMER_RES )
print "Passing an integer to a float function: "; FinishTime - StartTime
StartTime = hitimer( TIMER_RES )
for i = 1 to RUN_COUNT
AcceptFloat( AFloat )
next
FinishTime = hitimer( TIMER_RES )
print "Passing a float to a float function: "; FinishTime - StartTime
StartTime = hitimer( TIMER_RES )
for i = 1 to RUN_COUNT
AcceptDword( AnInteger )
next
FinishTime = hitimer( TIMER_RES )
print "Passing an integer to a dword function: "; FinishTime - StartTime
StartTime = hitimer( TIMER_RES )
for i = 1 to RUN_COUNT
AcceptDword( ADword )
next
FinishTime = hitimer( TIMER_RES )
print "Passing a dword to a dword function: "; FinishTime - StartTime
sync
wait key
end
function AcceptFloat(a as float)
endfunction
function AcceptDword(a as dword)
endfunction
Apart from the four basic types (integer, float, dword and string) which can be populated without type conversions, the remaining types (boolean, byte, word, double integer and double float) cannot be populated without a type conversion or other indirect means, and because of this, any constant values used in calculation alongside these types will be inherently a little less efficient.
Avoid repeated memory allocation
One more that BatVink raised with arrays, but where he could actually have gone much further:
There are lots of places where either you or DBPro will allocate chunks of memory to carry out the actions you require. The more you can minimise this within your game 'action' loops the faster your game will run.
- Don't load new media when you can clone it.
- Don't clone an object when you can instance it.
- Don't grow arrays when you can presize them.
- Don't grow arrays by 1's when you can grow them by 10's, 100's or 1000's.
- Don't create memblocks when you can reuse an existing one.
- And less obvious, when manipulating strings, do it in as few steps as possible - every new or temporary string is a memory allocation.
eg:
a$ = mid$(x$,3) + mid$(x$,4) + mid$(x$,5)
The above statement allocates 4 temporary strings and one final string.
mid$(x$,3) ==> T1
mid$(x$,4) ==> T2
T1 + T2 ==> T3
mid$(x$,5) => T4
T3 + T4 => a$
For strings in particular: Always use plug-ins where you can.
As an exception to this advice, always use the TEXT command when joining or outputting multiple strings on a single line - for some reason, DBPro is especially inefficient when using PRINT for this:
#constant TIMER_RES 1000000
#constant RUN_COUNT 1000
sync on
sync rate 0
sync
sync
print "Starting"
sync
StartTime = hitimer( TIMER_RES )
for i = 1 to RUN_COUNT
set cursor 200,100
print "Hello "; i; " xxxxxxxx "; i
next
FinishTime = hitimer( TIMER_RES )
print "Print: "; FinishTime - StartTime
StartTime = hitimer( TIMER_RES )
for i = 1 to RUN_COUNT
text 200, 120, "Hello " + str$(i) + " xxxxxxxx " + str$(i)
next
FinishTime = hitimer( TIMER_RES )
print "Text: "; FinishTime - StartTime
sync
wait key
end
Cache effects when accessing arrays
The cache in your processor can have a significant effect on the speed of your array access if care is not taken. For the cache, accessing memory locations in order is more efficient that accessing them out of order or randomly.
However, in DBPro, even if you think you're accessing them in order on a multi-dimensional array, you may not be. If you have a 2D array, the array item a(1,1) is NOT next to a(1,2) like it would be in most languages, but is next to a(2,1).
For instance, the following array defined as ARRAY(4,3):
A11 A12 A13
A21 A22 A23
A31 A32 A33
A41 A42 A43
Would be stored in a DBPro array in the following order:
A11 A21 A31 A41 A12 A22 A32 A42 A13 A23 A33 A43
(yes, I know I excluded the 0 item in each index)
That means that it's actually faster to run through every item in a multidimensional array by incrementing the indexes at the start of the list first rather than the end of the list as is normally done.
Note that you'll only see this effect if the array plus whatever other stuff your program is doing is large enough that it swamps your cache.
#constant TIMER_RES 1000000
#constant RUN_COUNT 10
#constant ARRAY_SIZE 1000
sync on
sync rate 0
sync
sync
print "Preparing"
dim a(ARRAY_SIZE, ARRAY_SIZE) as integer
for x = 0 to ARRAY_SIZE
for y = 0 to ARRAY_SIZE
a(x, y) = 0
next
next
print "Starting"
sync
StartTime = hitimer( TIMER_RES )
for i = 1 to RUN_COUNT
for Major = 0 to ARRAY_SIZE
for Minor = 0 to ARRAY_SIZE
a(Major, Minor) = 1
next
next
next
FinishTime = hitimer( TIMER_RES )
print "Minor/Major order: "; FinishTime - StartTime
StartTime = hitimer( TIMER_RES )
for i = 1 to RUN_COUNT
for Minor = 0 to ARRAY_SIZE
for Major = 0 to ARRAY_SIZE
a(Major, Minor) = 1
next
next
next
FinishTime = hitimer( TIMER_RES )
print "Major/Minor order: "; FinishTime - StartTime
sync
wait key
end
If you don't see a difference, increase the ARRAY_SIZE constant.
===================================================
Contributions by Diggsey:
Use IanM's matrix1 plugins whenever you can
When comparing the distances of objects, compare the squared distance instead of the actual distance (removes the need for a costly square root).
When you do need to calculate the actual distance, create a vector of the desired dimensions, and use the vector length commands to find its length instead of calling 'sqrt' directly (which for some reason is slow beyond belief...)
Precalculate random numbers, the 'rnd' command is very slow.
Whenever you need to store arrays of multiple values, use a single array and a UDT, instead of multiple arrays. Especially when you are going to be resizing these arrays.
If there is a very costly operation, store the result in a variable, and then keep a flag as to whether the result is valid. Whenever you know that the result will be invalid, unset the flag. To get the result, check if the flag is set, and only recalculate the result if it is not, otherwise you can use the aready-calculated value. I'm using this a lot in my game engine for the calculations of inverse matrices and world matrices for nodes in the scene graph.
Certain code formations which can cause the compiler to use literally hundreds of ASM instructions where only a few are used with only minor changes to the code. Just look in the ASM dump to see examples of this.
===================================================================
Contribution by Van B
If you're done with a sprite, delete it - it doesn't have the same performance impact as when deleting objects, and a lot of hidden sprites will eat your frame rate like it's candy. So don't hide, just delete, and rely on the SPRITE Spr,x,y,imb command to place the sprites you do need.
===================================================================
If anyone has any other code optimizations that they know of that I didn't cover (WHICH I KNOW THERE IS, SO SPEAK UP), it would be most helpful for you to post them, then I will add them to the list with credit going to you. (Special thanks to IanM)