Simple Ways to Optimize Code - GameCreators Forum

Digger412

Joined: 12th Jun 2007

Posted: 30th Aug 2009 11:55 Edited at: 2nd Sep 2009 14:44

There are a lot of coding practices that will optimize your code.
Some give only a small performance gain on their own, but if the code is repeated a lot, those small gains can add up.

Faster Object Loading

Use DBPro's native format, .dbo. Here is a link to a converter, written by Rob K, that converts 3DS/X files to .DBO:
http://forum.thegamecreators.com/?m=forum_view&t=22599&b=5

If you are using 3DWS and have multiple textures, be sure to lightmap the terrain first; otherwise the textures won't show up when converted.

My Performance Gain:
7 Skyboxes + 1 terrain, .X FORMAT: Loaded in ~50 seconds
7 Skyboxes + 1 terrain, .DBO FORMAT: Loaded in ~0.17 seconds

TIPS & LITTLE TRICKS
Loop Unroll

The benefit is that the loop doesn't have to check whether it should
terminate on every iteration; instead, it checks once every 5 commands.
This is good if you have a loop that just repeats one command over and
over. You also aren't limited to 5 commands: as long as the number of
commands you unroll equals your step, and your total number of
iterations divides evenly by your step, you can unroll the loop.
(Creating and deleting objects is an extremely bad example, because no matter what, Windows has to allocate space to create the objects. If you had already created 50,000 or so objects, then it would be useful to implement loop unrolling with them. Otherwise, just do it the old-fashioned way.)

Use this:
for x=1 to 100 step 5
	delete object x
	delete object x+1
	delete object x+2
	delete object x+3
	delete object x+4
next x

Instead of:
for x=1 to 100
	delete object x
next x

My Performance Gain:
Rolled loop, make 50,000 objects: 23.775 seconds
Rolled loop, delete 50,000 objects: 59.296 seconds
Unrolled loop, make 50,000 objects: 4.848 seconds
Unrolled loop, delete 50,000 objects: 56.991 seconds
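For comparison, here is a minimal C sketch of the same transformation on a loop that does very little work per step, which is where unrolling can actually pay off (the function names are hypothetical). It goes slightly beyond the rule above: a short remainder loop handles counts that do not divide evenly by the step, which is a common extension. Optimizing C compilers often unroll loops like this on their own, so it is worth measuring before doing it by hand.

```c
#include <stddef.h>

/* Plain ("rolled") loop: the loop condition is tested once per element. */
long sum_rolled(const int *a, size_t n)
{
    long total = 0;
    for (size_t i = 0; i < n; i++)
        total += a[i];
    return total;
}

/* Unrolled by 5: the loop condition is tested once per five elements. */
long sum_unrolled(const int *a, size_t n)
{
    long total = 0;
    size_t i = 0;
    /* The step (5) equals the number of statements in the body. */
    for (; i + 5 <= n; i += 5) {
        total += a[i];
        total += a[i + 1];
        total += a[i + 2];
        total += a[i + 3];
        total += a[i + 4];
    }
    /* Remainder loop: handles the 0-4 leftover elements when n is not a multiple of 5. */
    for (; i < n; i++)
        total += a[i];
    return total;
}
```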

Pull out continually computed quantities

If a quantity is computed inside a loop during every iteration, 
and its value is the same for each iteration, it can vastly 
improve efficiency to hoist it outside the loop and compute its 
value just once before the loop begins.

Use:
x=y+z
t1=x*x
for n=1 to 100
	a=6*n+t1
next n

Instead of:
for n=1 to 100
	x=y+z
	a=6*n+x*x
next n

My Performance Gains:
Quantity inside loop, 10 million iterations: 0.480 seconds
Quantity outside loop, 10 million iterations: 0.254 seconds

Quantity inside loop, 100 million iterations: 3.680 seconds
Quantity outside loop, 100 million iterations: 2.208 seconds

Quantity inside loop, 1 billion iterations: 40.768 seconds
Quantity outside loop, 1 billion iterations: 21.152 seconds
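A minimal C sketch of the same hoisting, assuming the loop accumulates its results into `total` (the post's loop just overwrites `a`, which would leave nothing to check). `sum_naive` and `sum_hoisted` are hypothetical names. Optimizing C compilers usually perform this "loop-invariant code motion" automatically; the timings above suggest DBPro did not at the time.

```c
/* Before: y + z and its square are recomputed on every iteration,
   even though they never change inside the loop. */
long sum_naive(int count, long y, long z)
{
    long total = 0;
    for (int n = 1; n <= count; n++) {
        long x = y + z;
        total += 6 * n + x * x;
    }
    return total;
}

/* After: the loop-invariant part is computed once, before the loop starts. */
long sum_hoisted(int count, long y, long z)
{
    long x = y + z;
    long t1 = x * x;
    long total = 0;
    for (int n = 1; n <= count; n++)
        total += 6 * n + t1;
    return total;
}
```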

Use For/Next instead of Do/Loop for your main loop.

For/Next loops run faster than Do/Loop loops for some reason. 
For your main loop, you could set the ending value absurdly high, 
or have the value reset periodically, or have many loops nested. 
In terms of speed, here are the loops, from fastest to slowest:
For/Next
Gosub (can be a loop)
While/Endwhile
Goto (can be a loop)
Repeat/Until
Do/Loop

Instead of:
Do
(code)
Loop

Use:
For x=1 to (some high number)
(code)
next x

My Performance Gains:
(y is reset to 0 before each loop)

F/N Loop, inc y until y=10,000: 0.000 seconds
Gosub Loop, inc y until y=10,000: 0.000 seconds
W/EW Loop, inc y until y=10,000: 3.198 seconds
Goto Loop, inc y until y=10,000: 3.287 seconds
R/U Loop, inc y until y=10,000: 3.271 seconds
D/L Loop, inc y until y=10,000: 3.226 seconds

F/N Loop, inc y until y=20,000: 0.000 seconds
Gosub Loop, inc y until y=20,000: 0.000 seconds
W/EW Loop, inc y until y=20,000: 6.531 seconds
Goto Loop, inc y until y=20,000: 6.610 seconds
R/U Loop, inc y until y=20,000: 8 seconds
D/L Loop, inc y until y=20,000: 6.920 seconds

F/N Loop, inc y until y=30,000: 0.000 seconds
Gosub Loop, inc y until y=30,000: 0.000 seconds
W/EW Loop, inc y until y=30,000: 9.737 seconds
Goto Loop, inc y until y=30,000: 13.648 seconds
R/U Loop, inc y until y=30,000: 11.283 seconds
D/L Loop, inc y until y=30,000: 10.173 seconds

F/N Loop, inc y until y=400,000: 0.003 seconds
Gosub Loop, inc y until y=400,000: 0.004 seconds

F/N Loop, inc y until y=1,000,000: 0.008 seconds
Gosub Loop, inc y until y=1,000,000: 0.0011 seconds

Loop Unswitching

Loop unswitching moves a conditional that sits inside a loop (the test
on 'p' in this case) outside of it by duplicating the loop's body and
placing a version of it inside each of the if and else clauses of the conditional.

Use:
if p=1
	for n=1 to 1000
		x=x+y
		y=0
	next n
else
	for n=1 to 1000
		x=x+y
	next n
endif

Instead of:
for n=1 to 1000
	x=x+y
	if p=1 then y=0
next n

My Performance Gains:
No Unswitching, p=0, 10,000 iterations: 0.017 seconds
No Unswitching, p=1, 10,000 iterations: 0.025 seconds
Unswitching, p=0, 10,000 iterations: 0.014 seconds
Unswitching, p=1, 10,000 iterations: 0.019 seconds

No Unswitching, p=0, 100,000 iterations: 0.192 seconds
No Unswitching, p=1, 100,000 iterations: 0.213 seconds
Unswitching, p=0, 100,000 iterations: 0.128 seconds
Unswitching, p=1, 100,000 iterations: 0.160 seconds

No Unswitching, p=0, 1 million iterations: 18.56 seconds
No Unswitching, p=1, 1 million iterations: 20.8 seconds
Unswitching, p=0, 1 million iterations: 16 seconds
Unswitching, p=1, 1 million iterations: 19.2 seconds
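The same unswitching in C, as a sketch with hypothetical function names. Both versions return the final `x`, so it is easy to check that they agree for either value of `p`.

```c
/* Before: p is tested on every iteration, although it never changes. */
long run_switched(int p, int count, long x, long y)
{
    for (int n = 1; n <= count; n++) {
        x = x + y;
        if (p == 1)
            y = 0;
    }
    return x;
}

/* After: p is tested once, and each branch gets its own copy of the loop. */
long run_unswitched(int p, int count, long x, long y)
{
    if (p == 1) {
        for (int n = 1; n <= count; n++) {
            x = x + y;
            y = 0;
        }
    } else {
        for (int n = 1; n <= count; n++)
            x = x + y;
    }
    return x;
}
```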

Partial Redundancy Elimination

An expression is called partially redundant if the value computed 
by the expression is already available on some but not all paths 
through a program to that expression.

An expression is fully redundant if the value computed by the 
expression is available on all paths through the program to that expression.

Instead of:
if a=1
	y=x+4
else
	t=x+4
endif
z=x+4

Use:
if a=1
	y=x+4
	t=y
else
	t=x+4
endif
z=t

That way, you only perform one calculation, regardless of whether 
the condition is true or not.

My Performance Gains:
No Elimination, a=0, 1 million iterations: 0.032 seconds
No Elimination, a=1, 1 million iterations:  0.046 seconds
Elimination, a=0, 1 million iterations: 0.000 seconds
Elimination, a=1, 1 million iterations: 0.000 seconds

No Elimination, a=0, 10 million iterations: 0.206 seconds
No Elimination, a=1, 10 million iterations: 0.224 seconds
Elimination, a=0, 10 million iterations: 0.160 seconds
Elimination, a=1, 10 million iterations: 0.192 seconds

No Elimination, a=0, 100 million iterations: 1.856 seconds
No Elimination, a=1, 100 million iterations: 2.048 seconds
Elimination, a=0, 100 million iterations: 1.728 seconds
Elimination, a=1, 100 million iterations: 2.048 seconds
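A C sketch of the partial redundancy example, with the three variables gathered in a hypothetical `struct pre_vars`. Note one small difference: the rewritten version also assigns `t` on the `a=1` path, which is harmless as long as nothing else relies on `t` keeping its old value there.

```c
/* Hypothetical holder for the y, t and z variables of the example. */
struct pre_vars { int y, t, z; };

/* Before: whichever branch runs computes x + 4, and then z computes it again. */
struct pre_vars pre_before(int a, int x)
{
    struct pre_vars v = {0, 0, 0};
    if (a == 1)
        v.y = x + 4;
    else
        v.t = x + 4;
    v.z = x + 4;          /* redundant: the value already exists on both paths */
    return v;
}

/* After: each path computes x + 4 exactly once and leaves it in t for z to reuse. */
struct pre_vars pre_after(int a, int x)
{
    struct pre_vars v = {0, 0, 0};
    if (a == 1) {
        v.y = x + 4;
        v.t = v.y;        /* copy instead of recomputing */
    } else {
        v.t = x + 4;
    }
    v.z = v.t;
    return v;
}
```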

Common Subexpression Elimination

Find instances of identical expressions (i.e., expressions that all
evaluate to the same value) and replace them with a single
variable holding the computed value.

Use:
temp=b*c
a=temp+g
d=temp*d

Instead of:
a=b*c+g
d=b*c*d

You can also simplify the code by working out constant values ahead of time (constant folding), for even further optimization:
Instead of:
a=30
b=9-a/5
c=b*4

if c>10
	c=c-10
	c=c*(60/a)
endif

You can simplify it to:
a=30
b=3
c=3*4
if c>10
	c=c-10
	c=c*2
endif

Then to:
c=12
if 12>10
	c=2
	c=c*2
endif

And if you know that the condition is always true, you can simplify that to:
c=4

My Performance Gains:
Not Eliminated, 1 million iterations: 0.064 seconds
Eliminated, 1 million iterations: 0.032 seconds

Not Eliminated, 10 million iterations: 0.224 seconds
Eliminated, 10 million iterations: 0.192 seconds

Not Eliminated, 100 million iterations: 2.592 seconds
Eliminated, 100 million iterations: 1.984 seconds

Not Eliminated, 1 billion iterations: 38.208 seconds
Eliminated, 1 billion iterations: 24.416 seconds
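A C sketch of both ideas, assuming a hypothetical extra input `e` so that the second expression genuinely shares `b*c` with the first. `folded_before` follows the a/b/c walk-through above step by step, and `folded_after` is what remains once every value is known ahead of time. Optimizing C compilers typically do both transformations for you.

```c
/* Common subexpression elimination: b * c is computed once and reused. */
void cse_before(int b, int c, int g, int e, int *a, int *d)
{
    *a = b * c + g;
    *d = b * c * e;           /* b * c computed a second time */
}

void cse_after(int b, int c, int g, int e, int *a, int *d)
{
    int temp = b * c;         /* shared subexpression, computed once */
    *a = temp + g;
    *d = temp * e;
}

/* Constant folding: the a/b/c walk-through, before and after. */
int folded_before(void)
{
    int a = 30;
    int b = 9 - a / 5;        /* 9 - 6 = 3 */
    int c = b * 4;            /* 12 */
    if (c > 10) {             /* always true, because c is 12 */
        c = c - 10;           /* 2 */
        c = c * (60 / a);     /* 2 * 2 = 4 */
    }
    return c;
}

int folded_after(void)
{
    return 4;                 /* every value above was known ahead of time */
}
```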

General Tips:

Try to use numbers instead of variables where possible, e.g.:
load object "thing.x",12
instead of:
load object "thing.x",num
===================================================
Try to break down computations into their simplest form, so that your program has less work to do when running.
===================================================
Delete images/objects/bitmaps/etc. that you are no longer using. It saves memory (RAM).
===================================================
Use only the types that you need; for instance, use integers instead of floats if you aren't going to be dealing with decimals.
===================================================
Use #CONSTANT for values that aren't going to change.
===================================================
Recreating your algorithm in a more efficient manner is by far one of the best ways to increase your program's efficiency.

===================================================

Contributions by IanM:

How to really optimize
1. Pick a better algorithm.
Yes, it's difficult - sometimes you actually have to think about the problem and it's not always easy to come up with new ideas, but it will give you better results than micro-optimisations will almost 100% of the time.

2. Make it right before making it faster.
Debugging code is a harder job than writing code. Optimisation usually makes code less clear to read, and unclear code is harder to debug.

3. Measure.
Become religious about it. Measure several times, PROVE where the problem is, make the changes, then measure it again. PROVE that it is faster in the expected way. In any place that you've made the code less readable, put a comment in to explain what you've done and why, so that if you need to debug in future, you'll at least have half a chance.

Array access is relatively slow.
If you are accessing an array element more than twice in a loop, put it into a variable temporarily while you are working on it.

#constant TIMER_RES     1000000
#constant RUN_COUNT    1000000

sync on
sync rate 0
sync
sync

print "Starting"
sync

dim a(10) as integer

for i = 0 to 10
   a(i) = i
next

StartTime = hitimer( TIMER_RES )
for i = 1 to RUN_COUNT
   for j = 0 to 10
      a(j) = a(j) * a(j)
   next
next
FinishTime = hitimer( TIMER_RES )

print "Standard Array Access: "; FinishTime - StartTime

StartTime = hitimer( TIMER_RES )
for i = 1 to RUN_COUNT
   for j = 0 to 10
      x = a(j)
      x = x * x
      a(j) = x
   next
next
FinishTime = hitimer( TIMER_RES )

print "Alternate Array Access: "; FinishTime - StartTime

sync
wait key
end

The win gets bigger the more array accesses you remove in this way.

Shortcut evaluation
DBPro has not implemented short-cut evaluation. However, for an IF statement, and where you are carrying out AND/&& evaluations, you can fairly easily simulate it.

if x >= StartX and x <= EndX and Y >= StartY and y <= EndY
   ` do something
endif

` can be replaced with
if x >= StartX then if x <= EndX then if y >= StartY then if y <= EndY
   ` do something
endif

In the first IF statement, every part of the expression is evaluated before the result is checked - that's 7 operations.

In the second IF statement, the first part is evaluated, and only if true does it go on to evaluate the second, and only if that's true does it evaluate the third, and only if that's true does it evaluate the fourth - that's a minimum of 1 operation and a maximum of 4 operations. Even in the worst case where every evaluation matches, it provides better results than the first IF statement.

#constant TIMER_RES     1000000
#constant RUN_COUNT    1000000

sync on
sync rate 0
sync
sync

print "Starting"
sync

StartX = 100
EndX = 200
StartY = 100
EndY = 200
x = 150
y = 150

StartTime = hitimer( TIMER_RES )
for i = 1 to RUN_COUNT
   if x >= StartX and x <= EndX and Y >= StartY and y <= EndY
      ` do something
   endif
next
FinishTime = hitimer( TIMER_RES )

print "Standard AND evaluation: "; FinishTime - StartTime

StartTime = hitimer( TIMER_RES )
for i = 1 to RUN_COUNT
   if x >= StartX then if x <= EndX then if Y >= StartY then if y <= EndY
      ` do something
   endif
next
FinishTime = hitimer( TIMER_RES )

print "Shortcut AND evaluation: "; FinishTime - StartTime

sync
wait key
end

Change the values of x and y to check it out (change them to zero for example).
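C's `&&` already short-circuits, which makes it a convenient place to see the difference described above. In this sketch (hypothetical names throughout), `in_rect_full` uses bitwise `&` on the comparison results, so all four comparisons always run, mimicking an AND without short-circuiting; `in_rect_short` uses `&&` and stops at the first false comparison. The `comparisons` counter records how many comparisons actually ran.

```c
#include <stdbool.h>

/* Counts how many comparisons were actually evaluated. */
static int comparisons = 0;

static bool cmp_ge(int a, int b) { comparisons++; return a >= b; }
static bool cmp_le(int a, int b) { comparisons++; return a <= b; }

/* Bitwise & evaluates every operand: always 4 comparisons. */
bool in_rect_full(int x, int y, int sx, int ex, int sy, int ey)
{
    return cmp_ge(x, sx) & cmp_le(x, ex) & cmp_ge(y, sy) & cmp_le(y, ey);
}

/* && stops at the first false operand: between 1 and 4 comparisons. */
bool in_rect_short(int x, int y, int sx, int ex, int sy, int ey)
{
    return cmp_ge(x, sx) && cmp_le(x, ex) && cmp_ge(y, sy) && cmp_le(y, ey);
}
```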

FOR loop evaluations
Don't use functions (either your own or plug-ins) to provide the value for either the top of the loop (the TO value) or the step - if you must do something like this, always put the results into a variable first.
The reason for doing this is that DBPro can call these functions multiple times each time around the loop.

Run this and see how many times each of the 'Get' functions is called:

` Don't do this:
for i = GetStart() to GetEnd() step GetStep()
   print "Value of i = "; i
next
print ""

` Do this:
Start = GetStart()
Finish = GetEnd()
StepSize = GetStep()

for i = Start to Finish step StepSize
   print "Value of i = "; i
next

wait key
end


function GetStart()
   print "Called GetStart()"
endfunction 1

function GetEnd()
   print "Called GetEnd()"
endfunction 2

function GetStep()
   print "Called GetStep()"
endfunction 1

If your functions are 'expensive' then a lot of time could be lost by them being called multiple times.
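In C the condition of a `for` statement is evaluated before every iteration by definition, so a function call placed there runs every time around the loop, which is the same cost being warned about here. A sketch with a hypothetical `get_end` that counts its own calls: with an end value of 100, the first version calls it 101 times and the second just once. The compiler cannot hoist the call for you in this case, because the function has a side effect (the counter).

```c
/* Counts calls to get_end, standing in for an "expensive" function. */
static int end_calls = 0;

static int get_end(void)
{
    end_calls++;
    return 100;
}

/* get_end() is part of the loop condition, so it runs before every iteration. */
long sum_calling_each_time(void)
{
    long total = 0;
    for (int i = 1; i <= get_end(); i++)
        total += i;
    return total;
}

/* The bound is fetched once into a variable before the loop. */
long sum_cached_bound(void)
{
    long total = 0;
    int finish = get_end();
    for (int i = 1; i <= finish; i++)
        total += i;
    return total;
}
```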

Don't code it yourself
If it's already in a plug-in that you have or can afford, use it.

Obviously, code it if you want to figure out how to do something, but once you've done that, put your code to one side and use the plug-in. Unless the plug-in code is especially inefficient, there's no way you can match its speed, even for a simple function. (As a simple example, see the recent 'Highest of two values' in the Code Snippets board.)

Avoid type conversions, especially the hidden ones
Passing an integer to a function or command that accepts a float value will cause the compiler to introduce a hidden conversion. In fact, passing a value of any type to a function/command that expects another type will introduce a conversion.

This can happen almost anywhere. For example, all of the object positions, rotation and scaling commands accept float arguments. If you are repositioning or rotating these objects a lot, and using integers to do so, then you are wasting cycles.

In addition, DBPro does not convert constant values at compile time. For example, in 'XAngle# = XAngle# + 1', the '1' is an integer value that will be converted to a float at runtime and then added to the variable. This will be marginally slower than the correct 'XAngle# = XAngle# + 1.0'.

So you now know that the compiler treats numbers without a decimal point as an integer, and those with one as a float - basically that 1.0 <> 1.

Did you also know that if you use a hex number (0x12345678), that it will be treated as a dword, and that conversion from an integer to a dword and vice versa will also cost you cycles?

#constant TIMER_RES     1000000
#constant RUN_COUNT    1000000

sync on
sync rate 0
sync
sync

print "Starting"
sync

AnInteger as integer = 1
AFloat as float = 1.0
ADword as dword = 0x1

StartTime = hitimer( TIMER_RES )
for i = 1 to RUN_COUNT
   AcceptFloat( AnInteger )
next
FinishTime = hitimer( TIMER_RES )

print "Passing an integer to a float function: "; FinishTime - StartTime

StartTime = hitimer( TIMER_RES )
for i = 1 to RUN_COUNT
   AcceptFloat( AFloat )
next
FinishTime = hitimer( TIMER_RES )

print "Passing a float to a float function: "; FinishTime - StartTime

StartTime = hitimer( TIMER_RES )
for i = 1 to RUN_COUNT
   AcceptDword( AnInteger )
next
FinishTime = hitimer( TIMER_RES )

print "Passing an integer to a dword function: "; FinishTime - StartTime

StartTime = hitimer( TIMER_RES )
for i = 1 to RUN_COUNT
   AcceptDword( ADword )
next
FinishTime = hitimer( TIMER_RES )

print "Passing a dword to a dword function: "; FinishTime - StartTime

sync
wait key
end

function AcceptFloat(a as float)
endfunction

function AcceptDword(a as dword)
endfunction

Apart from the four basic types (integer, float, dword and string), which can be populated without type conversions, the remaining types (boolean, byte, word, double integer and double float) cannot be populated without a type conversion or other indirect means. Because of this, any constant values used in calculations alongside these types will be inherently a little less efficient.

Avoid repeated memory allocation
One more that BatVink raised with arrays, but where he could actually have gone much further:

There are lots of places where either you or DBPro will allocate chunks of memory to carry out the actions you require. The more you can minimise this within your game 'action' loops the faster your game will run.

- Don't load new media when you can clone it.
- Don't clone an object when you can instance it.
- Don't grow arrays when you can presize them.
- Don't grow arrays by 1's when you can grow them by 10's, 100's or 1000's.
- Don't create memblocks when you can reuse an existing one.
- And less obvious, when manipulating strings, do it in as few steps as possible - every new or temporary string is a memory allocation.

e.g.:
a$ = mid$(x$,3) + mid$(x$,4) + mid$(x$,5)

The above statement allocates 4 temporary strings and one final string:
mid$(x$,3) ==> T1
mid$(x$,4) ==> T2
T1 + T2 ==> T3
mid$(x$,5) ==> T4
T3 + T4 ==> a$
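The array-growth advice in the list above can be sketched in C with a hypothetical growable array (`struct int_vec`, `vec_push` and `count_reallocs` are illustrative names, not a library API). `count_reallocs` pushes `n` values and reports how many times the memory block had to be (re)allocated; each `realloc` may have to move and copy the whole array. Growing by 1 costs one allocation per element, growing in chunks of 1000 costs far fewer, and presizing to the final size costs just one.

```c
#include <stdlib.h>

/* Growable int array; reallocs counts how often memory was (re)allocated. */
struct int_vec {
    int *data;
    size_t len, cap;
    int reallocs;
};

/* Append one value, growing the capacity by `chunk` elements when full.
   Returns 0 on success, -1 if the allocation failed. */
int vec_push(struct int_vec *v, int value, size_t chunk)
{
    if (v->len == v->cap) {
        size_t new_cap = v->cap + chunk;
        int *p = realloc(v->data, new_cap * sizeof *p);
        if (p == NULL)
            return -1;
        v->data = p;
        v->cap = new_cap;
        v->reallocs++;
    }
    v->data[v->len++] = value;
    return 0;
}

/* Push n values, growing by `chunk` at a time, and report the allocation count. */
int count_reallocs(size_t n, size_t chunk)
{
    struct int_vec v = {NULL, 0, 0, 0};
    for (size_t i = 0; i < n; i++)
        if (vec_push(&v, (int)i, chunk) != 0)
            break;
    free(v.data);
    return v.reallocs;
}
```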

For strings in particular: Always use plug-ins where you can.

As an exception to this advice, always use the TEXT command when joining or outputting multiple strings on a single line - for some reason, DBPro is especially inefficient when using PRINT for this:

#constant TIMER_RES     1000000
#constant RUN_COUNT    1000

sync on
sync rate 0
sync
sync

print "Starting"
sync

StartTime = hitimer( TIMER_RES )
for i = 1 to RUN_COUNT
   set cursor 200,100
   print "Hello "; i; " xxxxxxxx "; i
next
FinishTime = hitimer( TIMER_RES )

print "Print: "; FinishTime - StartTime

StartTime = hitimer( TIMER_RES )
for i = 1 to RUN_COUNT
   text 200, 120, "Hello " + str$(i) + " xxxxxxxx " + str$(i)
next
FinishTime = hitimer( TIMER_RES )

print "Text: "; FinishTime - StartTime

sync
wait key
end

Cache effects when accessing arrays
The cache in your processor can have a significant effect on the speed of your array access if care is not taken. For the cache, accessing memory locations in order is more efficient than accessing them out of order or randomly.

However, in DBPro, even if you think you're accessing them in order on a multi-dimensional array, you may not be. If you have a 2D array, the array item a(1,1) is NOT next to a(1,2) like it would be in most languages, but is next to a(2,1).

For instance, the following array defined as ARRAY(4,3):
A11 A12 A13
A21 A22 A23
A31 A32 A33
A41 A42 A43

Would be stored in a DBPro array in the following order:
A11 A21 A31 A41 A12 A22 A32 A42 A13 A23 A33 A43

(yes, I know I excluded the 0 item in each index)

That means that it's actually faster to run through every item in a multidimensional array by incrementing the indexes at the start of the list first rather than the end of the list as is normally done.

Note that you'll only see this effect if the array plus whatever other stuff your program is doing is large enough that it swamps your cache.

#constant TIMER_RES     1000000
#constant RUN_COUNT     10
#constant ARRAY_SIZE    1000
sync on
sync rate 0
sync
sync

print "Preparing"
dim a(ARRAY_SIZE, ARRAY_SIZE) as integer
for x = 0 to ARRAY_SIZE
   for y = 0 to ARRAY_SIZE
      a(x, y) = 0
   next
next

print "Starting"
sync

StartTime = hitimer( TIMER_RES )
for i = 1 to RUN_COUNT
   for Major = 0 to ARRAY_SIZE
      for Minor = 0 to ARRAY_SIZE
         a(Major, Minor) = 1
      next
   next
next
FinishTime = hitimer( TIMER_RES )

print "Minor/Major order: "; FinishTime - StartTime

StartTime = hitimer( TIMER_RES )
for i = 1 to RUN_COUNT
   for Minor = 0 to ARRAY_SIZE
      for Major = 0 to ARRAY_SIZE
         a(Major, Minor) = 1
      next
   next
next
FinishTime = hitimer( TIMER_RES )

print "Major/Minor order: "; FinishTime - StartTime

sync
wait key
end

If you don't see a difference, increase the ARRAY_SIZE constant.
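A small C sketch of the two layouts, using 0-based indexes (so A11 in the table above is `(0, 0)`). C itself stores arrays row-major, meaning the last index changes fastest; `col_major_offset` models the first-index-fastest order described above for DBPro, and `fill_col_major` walks that layout sequentially by putting the first index in the inner loop. The function names are hypothetical.

```c
#include <stddef.h>

/* C's own layout: the last index (j) changes fastest. */
size_t row_major_offset(size_t i, size_t j, size_t cols)
{
    return i * cols + j;
}

/* First-index-fastest layout: (i, j) sits next to (i + 1, j). */
size_t col_major_offset(size_t i, size_t j, size_t rows)
{
    return j * rows + i;
}

/* Sequential walk over a column-major array: the FIRST index goes in the inner loop. */
void fill_col_major(int *a, size_t rows, size_t cols, int value)
{
    for (size_t j = 0; j < cols; j++)        /* second index: outer loop */
        for (size_t i = 0; i < rows; i++)    /* first index: inner loop */
            a[col_major_offset(i, j, rows)] = value;
}
```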

===================================================

Contributions by Diggsey:
Use IanM's Matrix1 plug-ins whenever you can

When comparing the distances of objects, compare the squared distance instead of the actual distance (removes the need for a costly square root).
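Because the square root is increasing for non-negative values, comparing squared distances gives the same answer as comparing the distances themselves. A minimal C sketch, assuming a hypothetical `struct vec3` point type; note that a range test has to square the range as well.

```c
/* Hypothetical 3D point type. */
struct vec3 { float x, y, z; };

/* Squared distance between two points: no square root needed. */
float dist_sq(struct vec3 a, struct vec3 b)
{
    float dx = a.x - b.x, dy = a.y - b.y, dz = a.z - b.z;
    return dx * dx + dy * dy + dz * dz;
}

/* For a non-negative range, distance <= range exactly when
   squared distance <= range * range. */
int within_range(struct vec3 a, struct vec3 b, float range)
{
    return dist_sq(a, b) <= range * range;
}

/* 1 if p is closer to a than to b, comparing squared distances only. */
int closer_to_first(struct vec3 p, struct vec3 a, struct vec3 b)
{
    return dist_sq(p, a) < dist_sq(p, b);
}
```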

When you do need to calculate the actual distance, create a vector of the desired dimensions, and use the vector length commands to find its length instead of calling 'sqrt' directly (which for some reason is slow beyond belief...)

Precalculate random numbers, the 'rnd' command is very slow.

Whenever you need to store arrays of multiple values, use a single array and a UDT, instead of multiple arrays. Especially when you are going to be resizing these arrays.

If there is a very costly operation, store the result in a variable, and then keep a flag as to whether the result is valid. Whenever you know that the result will be invalid, unset the flag. To get the result, check if the flag is set, and only recalculate the result if it is not; otherwise you can use the already-calculated value. I'm using this a lot in my game engine for the calculations of inverse matrices and world matrices for nodes in the scene graph.
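A minimal C sketch of this caching pattern, assuming a hypothetical `struct node` where a single multiplication stands in for an expensive calculation such as building a world matrix. Setting an input clears the `valid` flag, and the getter recomputes only when the flag is clear; `recomputations` counts how often the expensive step actually ran.

```c
/* Hypothetical scene node with one cached derived value. */
struct node {
    double local_x, scale;
    double cached_world_x;  /* last computed result */
    int valid;              /* 1 while cached_world_x is up to date */
    int recomputations;     /* how many times the expensive step ran */
};

/* Changing an input invalidates the cached result. */
void node_set_local_x(struct node *n, double x)
{
    n->local_x = x;
    n->valid = 0;
}

/* Stand-in for a costly calculation. */
static double expensive_world_x(const struct node *n)
{
    return n->local_x * n->scale;
}

/* Recompute only when the flag says the cached value is stale. */
double node_world_x(struct node *n)
{
    if (!n->valid) {
        n->cached_world_x = expensive_world_x(n);
        n->valid = 1;
        n->recomputations++;
    }
    return n->cached_world_x;
}
```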

Certain code constructs can cause the compiler to generate literally hundreds of ASM instructions where, with only minor changes to the code, only a few would be needed. Look in the ASM dump to see examples of this.

===================================================================

Contribution by Van B
If you're done with a sprite, delete it - it doesn't have the same performance impact as deleting objects, and a lot of hidden sprites will eat your frame rate like it's candy. So don't hide, just delete, and rely on the SPRITE Spr,x,y,img command to place the sprites you do need.

===================================================================
If anyone knows of other code optimizations that I didn't cover (WHICH I KNOW THERE ARE, SO SPEAK UP), it would be most helpful if you posted them; I will add them to the list with credit going to you. (Special thanks to IanM)

Green Gandalf

VIP Member

Joined: 3rd Jan 2005

Posted: 30th Aug 2009 14:48

Some interesting results there - but also some surprising ones, for example, why should an unrolled make object loop be so much faster? I'd have thought the processing saving was negligible compared to the effort involved in making an object.

Perhaps there's something else going on? Or perhaps the timing function has been misused?

Van B

Moderator

Joined: 8th Oct 2002

Location: Sunnyvale

Posted: 30th Aug 2009 14:55

That has to be the most surprising one, the unrolled object creation loop.

Some really nice tips, excellently presented

Pincho Paxton

Joined: 8th Dec 2002

Posted: 30th Aug 2009 16:17 Edited at: 30th Aug 2009 16:47

Well I think that the For Next loop is the most surprising one. Do Loop does no check, so there must be something wrong with the C code there.

EDIT: Just wondering now if Goto might be faster, you don't have to reset it.

IanM

Retired Moderator

Joined: 11th Sep 2002

Location: In my moon base

Posted: 30th Aug 2009 16:53

Argh! I hate these kinds of threads - post a list of micro-optimisations that are difficult to prove either true or false.
Then someone has to come along and be Mr Nasty to set things right.

So I guess that I'll be Mr Nasty for today

Quote: "Faster Object Loading"

Phew! This one is true. I'm not going to bother backing it up with a code proof, though the OP should have.

Quote: "Loop Unroll"

This is a 'sometimes true' optimisation. Unfortunately the OP picked object creation and deletion for the example...

Quote: "why should an unrolled make object loop be so much faster?
Perhaps there's something else going on?"

It isn't, and yes there is.

Loop unrolling for objects is not true - it only appears to be true if you don't take certain precautions. I'll let the comments in the code speak for themselves:

#constant OBJECT_COUNT  50000
#constant TIMER_RES     1000
sync on
sync rate 0
backdrop off

sync
sync

print "Preparing..."
sync

` DBPro maintains a dynamically sized list of objects
` Pre-size it before timing starts
make object cube OBJECT_COUNT, 1
delete object OBJECT_COUNT

` Windows assigns memory to the process only when needed.
` So make sure that we have enough space for 50000 objects to eliminate the effects of that
for i = 1 to OBJECT_COUNT
   make object cube i, 1
next
for i = 1 to OBJECT_COUNT
   delete object i
next

print "Executing timing runs"
sync

StartTime = hitimer( TIMER_RES )
for i = 1 to OBJECT_COUNT
   make object cube i, 1
next
FinishTime = hitimer( TIMER_RES )

print "Rolled create: "; FinishTime - StartTime
` No sync - don't want the object rendered

StartTime = hitimer( TIMER_RES )
for i = 1 to OBJECT_COUNT
   delete object i
next
FinishTime = hitimer( TIMER_RES )

print "Rolled delete: "; FinishTime - StartTime
sync

StartTime = hitimer( TIMER_RES )
for i = 1 to OBJECT_COUNT step 5
   make object cube i, 1
   make object cube i+1, 1
   make object cube i+2, 1
   make object cube i+3, 1
   make object cube i+4, 1
next
FinishTime = hitimer( TIMER_RES )

print "Unrolled create: "; FinishTime - StartTime
` No sync - don't want the object rendered

StartTime = hitimer( TIMER_RES )
for i = 1 to OBJECT_COUNT step 5
   delete object i
   delete object i+1
   delete object i+2
   delete object i+3
   delete object i+4
next
FinishTime = hitimer( TIMER_RES )

print "Unrolled delete: "; FinishTime - StartTime
sync

wait key
end

... when the playing field is level, there is almost no difference.
However, as it can be true sometimes, read on to some of the notes below.
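The first post's rule only allows unrolling when the iteration count divides evenly by the step. The usual way round that restriction is a short remainder loop. Below is a minimal Python sketch of the idea (Python stands in for DBPro here, and `process_rolled`, `process_unrolled` and `UNROLL` are hypothetical names). It only demonstrates that both versions do identical work; as the benchmark above shows, any speed-up has to be measured rather than assumed.

```python
UNROLL = 5  # must equal the number of calls written out in the loop body


def process_rolled(items, action):
    """One loop test per item."""
    for item in items:
        action(item)


def process_unrolled(items, action):
    """Loop test runs once per UNROLL items; a short remainder loop
    handles counts that don't divide evenly by UNROLL."""
    n = len(items)
    main_end = n - n % UNROLL        # largest multiple of UNROLL <= n
    i = 0
    while i < main_end:
        action(items[i])
        action(items[i + 1])
        action(items[i + 2])
        action(items[i + 3])
        action(items[i + 4])
        i += UNROLL
    for j in range(main_end, n):     # 0 to UNROLL-1 leftover items
        action(items[j])
```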

Quote: "Pull out continually computed quantities"

Yes, this is true. However, if you find yourself pulling parts of calculations out of your loops, perhaps the time would be better spent looking at the algorithm instead. A recent example was in this thread (http://forum.thegamecreators.com/?m=forum_view&t=156974&b=1), where the calculations were removed altogether in my post, and made entirely redundant in Green Gandalf's last post.
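For reference, here is the hoisting idea as a minimal Python sketch (Python standing in for DBPro; the function names are hypothetical, and the loop accumulates into `total` only so the versions can be compared). It mirrors Digger412's benchmark: `y+z` and its square are computed once outside the loop. The third function is the kind of algorithmic change meant above: the sum of `6*i` for `i` from 1 to `n` is `3*n*(n+1)`, so the loop disappears entirely.

```python
def not_hoisted(y, z, n):
    total = 0
    for i in range(1, n + 1):
        x = y + z                    # recomputed every pass, though it never changes
        total += 6 * i + x * x
    return total


def hoisted(y, z, n):
    x = y + z                        # computed once, before the loop
    t1 = x * x
    total = 0
    for i in range(1, n + 1):
        total += 6 * i + t1
    return total


def closed_form(y, z, n):
    # Better algorithm: 6*1 + 6*2 + ... + 6*n = 3*n*(n+1), so no loop at all
    x = y + z
    return 3 * n * (n + 1) + n * x * x
```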

Quote: "Use For/Next instead of Do/Loop for your main loop."

Sorry, but the code posted is nonsense.
If you use WHILE/ENDWHILE or REPEAT/UNTIL or DO/LOOP to simulate a FOR/NEXT loop, of course the FOR/NEXT loop is going to be faster - it's designed for that purpose, while the other loop types aren't!

If you're going to use the structured loop commands for purposes they weren't designed for, then you may as well rip the whole lot out and replace them with GOTOs, as you've already missed the point.

Quote: "Loop Unswitching"

Yes, this is true; however, it's one of those optimisations that you should do at the end, once your code is working correctly, with 'before' and 'after' timings, and it'll only have a reasonable effect on large loops anyway.
Basically, you'll get more gain by spending your time picking a better algorithm, instead of making your code a little less readable.
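A minimal Python sketch of loop unswitching (Python standing in for DBPro; function names hypothetical), following the benchmark Digger412 posted: the test on `p` never changes inside the loop, so it is moved outside and each branch gets its own loop. The code gets longer, which is the readability cost mentioned above.

```python
def not_unswitched(p, n):
    x, y = 0, 20
    for _ in range(n):
        x += y
        if p == 1:              # tested every pass, though p never changes
            y = 0
    return x


def unswitched(p, n):
    x, y = 0, 20
    if p == 1:                  # tested once; each branch gets its own loop
        for _ in range(n):
            x += y
            y = 0
    else:
        for _ in range(n):
            x += y
    return x
```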

Quote: "Partial Redundancy Elimination"

Yes, this is true and related to 'Pull out ...' above. Also see 'Loop unswitching'.
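As a minimal Python sketch of the PRE example Digger412 benchmarked (Python standing in for DBPro; function names hypothetical): `x+4` is already held in `y` or `t` on every path, so `z` can reuse it instead of recomputing it. `t` acts as the temporary here, so only `y` and `z` are expected to match between the two versions.

```python
def without_pre(a, x):
    y = t = None                 # None marks "never assigned on this path"
    if a == 1:
        y = x + 4
    else:
        t = x + 4
    z = x + 4                    # recomputed, though x+4 is already in y or t
    return y, t, z


def with_pre(a, x):
    y = None
    if a == 1:
        y = x + 4
        t = y                    # copy the value this path already computed
    else:
        t = x + 4
    z = t                        # no recomputation on either path
    return y, t, z
```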

Quote: "Common Subexpression Elimination"

Again, see above.
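A minimal Python sketch of common subexpression elimination (Python standing in for DBPro; function names hypothetical): `b*c+g` is evaluated once and reused. Digger412's benchmark only factors out `b*c`; this version takes the whole expression, which is the same idea carried one step further.

```python
def no_cse(b, c, g):
    a = b * c + g
    d = b * c + g                # the same expression evaluated a second time
    return a, d


def with_cse(b, c, g):
    temp = b * c + g             # evaluated once...
    a = temp
    d = temp                     # ...and reused
    return a, d
```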

Quote: "Try to use numbers instead of variables where possible"

Minimal gain.
There is a school of thought that says you should not have any numerical constant in your code except for 1's and 0's - I'm not sure I'd go that far, but I'd certainly lean in that direction. DBPro has the #CONSTANT command anyway, so use named constants instead of magic numbers.

Quote: "Only initialize & give values to variables when you need them, not at the beginning of the program. It saves memory (RAM)."

For anything except strings and arrays, this is meaningless as far as memory is concerned. Forgetting to initialise variables causes a large number of the bugs you will ever see - deliberately doing this is tantamount to inviting those bugs in the door.

For arrays and strings, the reasons are different, but the answer is the same - in a game you aren't worried so much about memory usage as about speed. If you force Windows to go away and locate a new chunk of memory for your game process mid-game, you are going to spoil the flow of your program and potentially introduce large, noticeable delays. Basically, you should only worry about it when memory usage actually becomes a real problem.

How to really optimise.
1. Pick a better algorithm.
Yes, it's difficult - sometimes you actually have to think about the problem and it's not always easy to come up with new ideas, but it will give you better results than micro-optimisations will almost 100% of the time.

2. Make it right before making it faster.
Debugging code is a harder job than writing code. Optimisation usually makes code less clear to read, and unclear code is harder to debug.

3. Measure.
Become religious about it. Measure several times, PROVE where the problem is, make the changes, then measure it again. PROVE that it is faster in the expected way. In any place that you've made the code less readable, put a comment in to explain what you've done and why, so that if you need to debug in future, you'll at least have half a chance.

Finally, if someone offers optimisation tips, GET PROOF, GET THE CODE. Don't believe them until they prove it, while trying not to be Mr Nasty if you can.
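A minimal Python sketch of the "measure" step, under stated assumptions: Python stands in for DBPro, `compare` is a hypothetical helper, and reporting the minimum is one common choice (the median is another). It follows the precautions in the benchmark above: every variant is run once untimed first, so one-off costs such as memory handed out on first use don't land on whichever variant happens to run first, and the timed runs alternate between variants so background noise affects them all similarly.

```python
import time


def compare(variants, repeats=10):
    """Time each zero-argument callable in `variants` (a name -> callable dict)
    `repeats` times, alternating between them, and return the best time in
    seconds for each name."""
    # Warm-up: run everything once, untimed
    for fn in variants.values():
        fn()
    best = {name: float("inf") for name in variants}
    for _ in range(repeats):
        for name, fn in variants.items():   # alternate: A, B, A, B, ...
            start = time.perf_counter()
            fn()
            elapsed = time.perf_counter() - start
            best[name] = min(best[name], elapsed)
    return best


if __name__ == "__main__":
    data = list(range(10_000))
    timings = compare({
        "generator sum": lambda: sum(x for x in data),
        "builtin sum": lambda: sum(data),
    })
    for name, seconds in timings.items():
        print(f"{name}: {seconds * 1000:.3f} ms")
```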

Utility plug-ins collection (updated 07/04/09) and
http://www.matrix1.demon.co.uk

Green Gandalf

VIP Member

19

Years of Service

Joined: 3rd Jan 2005

Playing: Malevolence:Sword of Ahkranox, Skyrim, Civ6.

Posted: 30th Aug 2009 19:19

For someone who hates these threads he certainly spent a lot of time on it didn't he?

Ah! That's why he hates them of course.

Thanks IanM for bringing us all down to earth (as usual).

The one thing I'm not sure about is the named constant thing. The MS DX9 SDK docs are a good example of what I mean. You often find things like "D3DFiltering can be set to DX3DthingyONE or DX3DthingyTWO, etc". You then have to wade through pages and pages, first to find that DX3DthingyONE means use DX3DthingyFILTERINGA, and yet more pages to find that the thingyFILTERING you actually wanted was DX3DthingyFILTERINGZ which is obtained by setting D3DFiltering to 26. Phew!

Plain numbers are simpler sometimes - but I can see you then can have the reverse problem since "26" can mean different things in different contexts.

IanM

Retired Moderator

21

Years of Service

Joined: 11th Sep 2002

Location: In my moon base

Posted: 30th Aug 2009 19:47

Quote: "For someone who hates these threads he certainly spent a lot of time on it didn't he?"

On the positive side though, I get to be Mr Nasty for a day

Quote: "Plain numbers are simpler sometimes - but I can see you then can have the reverse problem since "26" can mean different things in different contexts."

Yep, you see '8' in your code a few weeks after entering it ... what does it mean? Or, you see D3DBLEND_INVDESTALPHA ... you see immediately what that means.

As it's a constant, it is replaced during compile with '8', making it no slower at runtime, and only marginally slower during compilation, and a heck of a lot faster to understand.

My point in this post is that the 'entity' that is going to spend the most time on the code is you, which means that making the code more readable and understandable to you is more important than almost any other part of the coding process.

Green Gandalf

VIP Member

19

Years of Service

Joined: 3rd Jan 2005

Playing: Malevolence:Sword of Ahkranox, Skyrim, Civ6.

Posted: 30th Aug 2009 20:05

I agree about the readability thing and something along those lines is essential.

The problem is when constants like D3DBLEND_INVDESTALPHA are not pre-defined in your own code - you have to spend ages wading through a manual or something just to find that you need to define that variable to be 6 or whatever. Of course, the DX SDK is designed for professionals who probably use software which defines all these constants for you.

On the other side of the coin I've often tried to debug someone else's code which contains things with "meaningful" names like TERRAIN_X_SIZE. To track down the bug you might need to know what the actual value is. This can involve a lot of to-ing and fro-ing between different parts of the code. Getting the balance right isn't always easy.

Quote: "My point in this post is that the 'entity' that is going to spend the most time on the code is you, which means that making the code more readable and understandable to you is more important than almost any other part of the coding process."

Agree wholeheartedly.

Diggsey

18

Years of Service

Joined: 24th Apr 2006

Location: On this web page.

Posted: 30th Aug 2009 20:37 Edited at: 30th Aug 2009 20:37

Quote: "Yep, you see '8' in your code a few weeks after entering it ... what does it mean? Or, you see D3DBLEND_INVDESTALPHA ... you see immediately what that means."

Ironic really, seeing as the small amount of DBPro's code I have seen used numbers for all the DirectX commands instead of the named values.

@GG
Many of those values are in enums, so you can simply go to the function's definition, right-click on the enum name, and click 'go to definition' on that too, and you will have the list of values.

For the #defines you can go to the function's definition, and in the comment above the function it will tell you the name of the define, and may even give you a possible value. You can go to that value's definition to see the list.

Alternatively, you can click inside the function name and press F1, and it will tell you the parameters in the help files and give you links to the lists of possible values for each one, with a short explanation of what each means.

New DBPro IDE + integrated debugger!

Digger412

17

Years of Service

Joined: 12th Jun 2007

Location:

Posted: 30th Aug 2009 22:06 Edited at: 3rd Sep 2009 15:05

@GG - IanM showed why that happened; I didn't know about that. Apparently, my rolled loops assigned the memory, so the unrolled ones didn't have to assign it when their turn came. Anyway, at the end of this post I'll add the code that I used for each test.

@Van B - Thanks, but I think that IanM brought us back to reality on that one.

@Pincho - I don't know, but I wouldn't go so far as to use goto in relation to the main loop, spaghetti code is much worse than extended code

@ IanM - You have a right to be Mr Nasty, you know more about how DBPro works than I do by far.

Quote: "Argh! I hate these kind of threads - post a list of micro-optimisations that are difficult to prove either true or false."

I can prove each one of them true, except for the loop unrolling one, since you told me about the Windows memory assignment, and certain cases of PRE.

Quote: "Yes, this is true. However if you are in the situation of pulling parts of calculations out of your loops, perhaps the time would be better spent looking at the algorithm instead. A recent example was in this thread (http://forum.thegamecreators.com/?m=forum_view&t=156974&b=1), where the calculations were removed altogether in my post, and made entirely redundant in Green Gandalf's last post."

Yes, but where constant calculations are used inside a loop and can be pulled out, doing so is better than leaving them in. It won't give the same performance gains as writing a better algorithm, but it does improve the code's performance.

Quote: "Sorry, but the code posted is nonsense."

The code that I posted was the benchmark that I used to test each loop against the others. Unlike the other optimizations, it wasn't meant as code for you to use. The performance gains were the results of those loops in that benchmark.

Quote: "Basically, you'll get more gain by spending your time picking a better algorithm, instead of making your code a little less readable."

I agree that you'll see a bigger gain with a better algorithm; this is just a smaller step in a faster direction. Also, if people commented their code thoroughly (which I didn't here), it would be just as readable.

Quote: "Minimal gain.
There is a school of thought that says you should not have any numerical constant in your code except for 1's and 0's - I'm not sure I'd go that far, but I'd certainly lean in that direction. DBPro has the #CONSTANT command anyway, so use named constants instead of magic numbers."

It is still an optimization, if only minimal. Most of the optimizations that I posted are just simple, little things that you can do to squeeze out another millisecond or two. Also, do you mind if I add the #CONSTANT to my first post, under General Tips?

Quote: "For anything except strings and arrays, this is meaningless as far as memory is concerned. Forgetting to initialise variables causes a large number of the bugs you will ever see - Deliberately doing this is tantamount to inviting those bugs in the door."

Okay, seeing that it doesn't matter where you initialise variables, I'll take that one off the General Tips.

Also, IanM, would you mind if I added your ways to optimize up at the top? I'd like to be able to have a comprehensive list one day.

My Benchmark Codes:

Faster Object Loading:

+ Code Snippet

a#=timer()
Load object "Skybox1.X",1
Load object "Skybox2.X",2
Load object "Skybox3.X",3
Load object "Skybox4.X",4
Load object "Skybox5.X",5
Load object "Skybox7.X",6
Load object "Skybox8.X",7
Load object "Terrain1.X",8
b#=timer()

c#=timer()
Load object "Skybox1.DBO",9
Load object "Skybox2.DBO",10
Load object "Skybox3.DBO",11
Load object "Skybox4.DBO",12
Load object "Skybox5.DBO",13
Load object "Skybox7.DBO",14
Load object "Skybox8.DBO",15
Load object "Terrain1.DBO",16
d#=timer()

e#=b#-a#
f#=d#-c#

print ".X LOADING: "+str$(e#)
print ".DBO LOADING: "+str$(f#)
wait mouse

Loop Unroll (though IanM has found why the unrolled version appeared so much faster):

+ Code Snippet

for avg=1 to 5
a#=timer()
for x=1 to 50000
   make object cube x,5
next x
b#=timer()

print "Rolled make ("+str$(avg)+"): "+str$(b#-a#)

c#=timer()
for x=1 to 50000
   delete object x
next x
d#=timer()

print "Rolled delete ("+str$(avg)+"): "+str$(d#-c#)

e#=timer()
for x=1 to 50000 step 5
   make object cube x,5
   make object cube x+1,5
   make object cube x+2,5
   make object cube x+3,5
   make object cube x+4,5
next x
f#=timer()

print "Unrolled make ("+str$(avg)+"): "+str$(f#-e#)

g#=timer()
for x=1 to 50000 step 5
   delete object x
   delete object x+1
   delete object x+2
   delete object x+3
   delete object x+4
next x
h#=timer()

print "Unrolled delete ("+str$(avg)+"): "+str$(h#-g#)

i#=i#+(b#-a#)
j#=j#+(d#-c#)
k#=k#+(f#-e#)
l#=l#+(h#-g#)
next avg

set cursor 0,0
print "Rolled make: "+str$(i#/5.0)
print "Rolled delete: "+str$(j#/5.0)
print "Unrolled make: "+str$(k#/5.0)
print "Unrolled delete: "+str$(l#/5.0)
wait key

Pull out continually computed quantities

+ Code Snippet

y=20
z=30

for avg=1 to 50
a#=timer()
for n=1 to 10000000
   x=y+z
   a=6*n+x*x
next n
b#=timer()

c#=timer()
x=y+z
t1=x*x
for n=1 to 10000000
   a=6*n+t1
next n
d#=timer()

e#=e#+(b#-a#)
f#=f#+(d#-c#)
next avg

e#=e#/50.0
f#=f#/50.0

print "Not pulled out: "+str$(e#)
print "Pulled out: "+str$(f#)
wait mouse

Use For/Next instead of Do/Loop for your main loop.

+ Code Snippet

t=timer()
for x=1 to 20000000
  inc y
next x
s=timer()
print "for next:";s-t

y=0
t=timer()
do
  inc y
  if y=20000000 then exit
loop
s=timer()
print "do loop:";s-t

y=0
t=timer()
repeat
  inc y
until y=20000000
s=timer()
print "repeat until:";s-t

y=0
t=timer()
while y<20000000
  inc y
endwhile
s=timer()
print "while endwhile:";s-t

Loop Unswitching

+ Code Snippet

for avg=1 to 50

p=0
y=20
a#=timer()
for n=1 to 100000
   x=x+y
   if p=1 then y=0
next n
b#=timer()

p=1
y=20
c#=timer()
for n=1 to 100000
   x=x+y
   if p=1 then y=0
next n
d#=timer()

p=0
y=20
e#=timer()
if p=1
   for n=1 to 100000
      x=x+y
      y=0
   next n
else
   for n=1 to 100000
      x=x+y
   next n
endif
f#=timer()

p=1
y=20
g#=timer()
if p=1
   for n=1 to 100000
      x=x+y
      y=0
   next n
else
   for n=1 to 100000
      x=x+y
   next n
endif
h#=timer()

i#=i#+(b#-a#)
j#=j#+(d#-c#)
k#=k#+(f#-e#)
l#=l#+(h#-g#)
next avg

i#=i#/50.0
j#=j#/50.0
k#=k#/50.0
l#=l#/50.0

print "Not Unswitched, p=0: "+str$(i#)
print "Not Unswitched, p=1: "+str$(j#)
print "Unswitched, p=0: "+str$(k#)
print "Unswitched, p=1: "+str$(l#)
wait mouse

Partial Redundancy Elimination

+ Code Snippet

for avg=1 to 50

a=0
x=20
a#=timer()
for n=1 to 1000000000
   if a=1
      y=x+4
   else
      t=x+4
   endif
   z=x+4
next n
b#=timer()

a=1
x=20
c#=timer()
for n=1 to 1000000000
   if a=1
      y=x+4
   else
      t=x+4
   endif
   z=x+4
next n
d#=timer()

a=0
x=20
e#=timer()
for n=1 to 1000000000
   if a=1
      y=x+4
      t=y
   else
      t=x+4
   endif
   z=t
next n
f#=timer()

a=1
x=20
g#=timer()
for n=1 to 1000000000
   if a=1
      y=x+4
      t=y
   else
      t=x+4
   endif
   z=t
next n
h#=timer()

i#=i#+(b#-a#)
j#=j#+(d#-c#)
k#=k#+(f#-e#)
l#=l#+(h#-g#)
next avg

i#=i#/50.0
j#=j#/50.0
k#=k#/50.0
l#=l#/50.0

print "No PRE, a=0: "+str$(i#)
print "No PRE, a=1: "+str$(j#)
print "PRE, a=0: "+str$(k#)
print "PRE, a=1: "+str$(l#)
wait mouse

Common Subexpression Elimination

+ Code Snippet

for avg=1 to 100
b=10
c=20
g=30
a1#=timer()
for x=1 to 100000
a=b*c+g
d=b*c+g
next x
b1#=timer()

b=10
c=20
g=30
temp=b*c
c1#=timer()
for x=1 to 100000
a=temp+g
d=temp+g
next x
d1#=timer()

e#=e#+(b1#-a1#)
f#=f#+(d1#-c1#)
next avg

print "No Elimination: "+str$(e#/100)
print "Elimination: "+str$(f#/100)
wait mouse

Pincho Paxton

21

Years of Service

Joined: 8th Dec 2002

Location:

Posted: 30th Aug 2009 22:36

Quote: "@Pincho - I don't know, but I wouldn't go so far as to use goto in relation to the main loop, spaghetti code is much worse than extended code "

A Goto as a Do Loop is not spaghetti code. You only have a single instance of it, the same as a single instance of a Do Loop. But using code to take over from the Do Loop is improper anyway, so I don't know why you used the odd smiley at the end.

Green Gandalf

VIP Member

19

Years of Service

Joined: 3rd Jan 2005

Playing: Malevolence:Sword of Ahkranox, Skyrim, Civ6.

Posted: 30th Aug 2009 22:39

Quote: "@GG
Many of those values are in enums, so you can simply go to the functions definition, right click on the enum name, and click 'go to definition' on that too, and you will have the list of values.

For the #defines you can go to the function's definition, and in the comment above the function it will tell you the name of the define, and may even give you a possible value. You can go to that value's definition to see the list.

Alternatively, you can click inside the function name, and press F1 and it will tell you the parameters in the help files, and give you links to the lists of possible values for each one, with a short explanation of what each means"

I know - but it can still be a lot of work to find out something very simple. In some cases it has taken me half an hour to find out what I want to know - and in others the docs are incomplete. Try finding out what values all the bytes, dwords, etc, need to be in a volume or cubemap DDS file for example. I resorted to taking apart some images byte by byte to get the last few right.

I think my pet hate is the MS documentation - everything you need to know is there (usually) but isn't easily located unless you've been using it 24 hours a day for 6 months.

IanM

Retired Moderator

21

Years of Service

Joined: 11th Sep 2002

Location: In my moon base

Posted: 30th Aug 2009 23:34

For the amount of stuff in MSDN, I find it amazing that you can actually (eventually) find what you need. TBH, I think MS should be congratulated on actually making it navigable.

@Digger412,
Most of the problems I have with your optimisations come down to this: if you have to go there, you've already lost the war. On a simple cost/benefit measure, weighing the time spent on those small optimisations against the minimal speedups you get from them, you have to do an awful lot of work, when you could spend the same amount of time elsewhere (on algorithms, or optimising media into DBO format etc) and get a much bigger payoff.

I guess it can't hurt too much though, so I will contribute.

Array access is relatively slow.
If you are accessing an array element more than twice in a loop, put it into a variable temporarily while you are working on it.

+ Code Snippet

#constant TIMER_RES     1000000
#constant RUN_COUNT    1000000

sync on
sync rate 0
sync
sync

print "Starting"
sync

dim a(10) as integer

for i = 0 to 10
   a(i) = i
next

StartTime = hitimer( TIMER_RES )
for i = 1 to RUN_COUNT
   for j = 0 to 10
      a(j) = a(j) * a(j)
   next
next
FinishTime = hitimer( TIMER_RES )

print "Standard Array Access: "; FinishTime - StartTime

StartTime = hitimer( TIMER_RES )
for i = 1 to RUN_COUNT
   for j = 0 to 10
      x = a(j)
      x = x * x
      a(j) = x
   next
next
FinishTime = hitimer( TIMER_RES )

print "Alternate Array Access: "; FinishTime - StartTime

sync
wait key
end

The win gets bigger the more array accesses you remove in this way.
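To make the saving concrete, here is a minimal Python sketch (Python standing in for DBPro; `CountingList` and the function names are hypothetical) that counts element accesses instead of timing them: `a(j) = a(j) * a(j)` costs two reads and a write per element, while the cached version costs one read and one write. The timing gap in Python will differ from DBPro's, so treat this as an illustration of the access count only.

```python
class CountingList(list):
    """A list that counts element reads and writes (for illustration only)."""

    def __init__(self, *args):
        super().__init__(*args)
        self.reads = 0
        self.writes = 0

    def __getitem__(self, i):
        self.reads += 1
        return super().__getitem__(i)

    def __setitem__(self, i, value):
        self.writes += 1
        super().__setitem__(i, value)


def square_in_place_indexed(a):
    for j in range(len(a)):
        a[j] = a[j] * a[j]       # two reads and one write per element


def square_in_place_cached(a):
    for j in range(len(a)):
        x = a[j]                 # one read into a plain variable...
        a[j] = x * x             # ...then one write
```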

Shortcut evaluation
DBPro has not implemented short-cut evaluation. However, for an IF statement, and where you are carrying out AND/&& evaluations, you can fairly easily simulate it.

+ Code Snippet

if x >= StartX and x <= EndX and Y >= StartY and y <= EndY
   ` do something
endif

` can be replaced with
if x >= StartX then if x <= EndX then if y >= StartY then if y <= EndY
   ` do something
endif

In the first IF statement, every part of the expression is evaluated before the result is checked - that's 7 operations.

In the second IF statement, the first part is evaluated, and only if true does it go on to evaluate the second, and only if that's true does it evaluate the third, and only if that's true does it evaluate the fourth - that's a minimum of 1 operation and a maximum of 4 operations. Even in the worst case where every evaluation matches, it provides better results than the first IF statement.

+ Code Snippet

#constant TIMER_RES     1000000
#constant RUN_COUNT    1000000

sync on
sync rate 0
sync
sync

print "Starting"
sync

StartX = 100
EndX = 200
StartY = 100
EndY = 200
x = 150
y = 150

StartTime = hitimer( TIMER_RES )
for i = 1 to RUN_COUNT
   if x >= StartX and x <= EndX and Y >= StartY and y <= EndY
      ` do something
   endif
next
FinishTime = hitimer( TIMER_RES )

print "Standard AND evaluation: "; FinishTime - StartTime

StartTime = hitimer( TIMER_RES )
for i = 1 to RUN_COUNT
   if x >= StartX then if x <= EndX then if Y >= StartY then if y <= EndY
      ` do something
   endif
next
FinishTime = hitimer( TIMER_RES )

print "Shortcut AND evaluation: "; FinishTime - StartTime

sync
wait key
end

Change the values of x and y to check it out (change them to zero for example).
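A minimal Python sketch that logs which comparisons actually get evaluated (Python standing in for DBPro; all names are hypothetical). Python's own `and` already short-circuits, so the eager version builds a list of all four results first to mimic the behaviour described above. One caveat follows directly: if any comparison had a side effect (say, a function call that changes state), the two forms would stop being interchangeable.

```python
def make_check(log):
    """Return a helper that records the name of each comparison it is given."""
    def check(name, result):
        log.append(name)
        return result
    return check


def eager_and(x, y, start_x, end_x, start_y, end_y, check):
    # Mimics a non-short-circuit AND: all four comparisons run before the test
    results = [check("x>=StartX", x >= start_x), check("x<=EndX", x <= end_x),
               check("y>=StartY", y >= start_y), check("y<=EndY", y <= end_y)]
    return all(results)


def nested_if(x, y, start_x, end_x, start_y, end_y, check):
    # Mimics the "then if" chain: stops at the first comparison that fails
    if check("x>=StartX", x >= start_x):
        if check("x<=EndX", x <= end_x):
            if check("y>=StartY", y >= start_y):
                if check("y<=EndY", y <= end_y):
                    return True
    return False
```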

FOR loop evaluations
Don't use functions (either your own or plug-ins) to provide the value for either the top of the loop, or the step - if you must do something like this, always put the results into a variable.
The reason for doing this is that DBPro can call these functions multiple times each time around the loop.

Run this and see how many times each of the 'Get' functions is called:

+ Code Snippet

` Don't do this:
for i = GetStart() to GetEnd() step GetStep()
   print "Value of i = "; i
next
print ""

` Do this:
Start = GetStart()
Finish = GetEnd()
StepSize = GetStep()

for i = Start to Finish step StepSize
   print "Value of i = "; i
next

wait key
end


function GetStart()
   print "Called GetStart()"
endfunction 1

function GetEnd()
   print "Called GetEnd()"
endfunction 2

function GetStep()
   print "Called GetStep()"
endfunction 1

If your functions are 'expensive' then a lot of time could be lost by them being called multiple times.
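Python's `for i in range(get_start(), get_end())` evaluates its bounds only once, so this minimal sketch (Python standing in for DBPro; all names hypothetical) uses a `while` loop to model a bound that is re-evaluated on every pass, as described above, against one that is cached in a variable first. It illustrates the call counts, not DBPro's actual generated code.

```python
def count_calls(fn):
    """Wrap fn and count how many times it is called."""
    def wrapper():
        wrapper.calls += 1
        return fn()
    wrapper.calls = 0
    return wrapper


def loop_reevaluating(get_end):
    """The condition, and so get_end(), is re-evaluated on every pass."""
    iterations = 0
    i = 1
    while i <= get_end():
        iterations += 1
        i += 1
    return iterations


def loop_cached(get_end):
    """get_end() is called once and its result kept in a variable."""
    finish = get_end()
    iterations = 0
    i = 1
    while i <= finish:
        iterations += 1
        i += 1
    return iterations
```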

Don't code it yourself
If it's already in a plug-in that you have or can afford, use it.

Obviously, code it if you want to figure out how to do something, but once you've done that, put your code to one side and use the plug-in. Unless the plug-in code is especially inefficient, there's no way you can match its speed, even for a simple function. (As a simple example, see the recent 'Highest of two values' in the Code Snippets board)

sladeiw

15

Years of Service

Joined: 16th May 2009

Location: UK

Posted: 31st Aug 2009 00:00 Edited at: 31st Aug 2009 00:01

I never thought of doing the shortcut evaluation like that; it's much better. I was doing it over multiple lines, something like

+ Code Snippet

if a=b
  if b=c
    if c=d
      ` do something
    endif
  endif
endif

which is much more messy!

A nice feature for an IDE would be an option to automatically optimise all the `and` evaluations into `then if`. Maybe this would cause problems though?

I know from experience that this particular optimization can make a big difference, especially when you're checking lots of slow variables like point() values etc.

As for optimizations, it's true that a better algorithm can give you a much bigger boost than tricks. You stand a much better chance of a major optimization (Generally a complete re-write!) if you haven't made your code more confusing by using strange structuring.

Digger412

17

Years of Service

Joined: 12th Jun 2007

Location:

Posted: 31st Aug 2009 00:24 Edited at: 31st Aug 2009 01:05

@IanM - Thanks. I know that most of them are like little "last minute" add-ins to squeeze out that little bit more. My goal in making this thread was to get community contributions; everyone could benefit from knowing a more efficient way of doing something. Also, I hope that this won't be limited to just coding practices (or the Core Commands), but will extend into things like more efficient sprite usage, faster file access, or even faster 3D math functions. It should stay on optimization topics, and not drift into other things such as "Look what I can do! *shows an optimized drawing program*".

On the array access code, you gain by storing a(j) into x in the alternative method so you wouldn't have to call it 3 times, compared to calling a(j) 3 different times in the first, right?

@Pincho - Oh, okay, you meant like:
main_loop:
(other code)
goto main_loop

Hmm...it would definitely bear scrutiny...I'm experimenting...now.

EDIT:
@Pincho - I wonder if gosub would be faster?:
main_loop:
(other code)
gosub main_loop

EDIT2:
Okay, gosub is just SLIGHTLY slower than for/next, but it would most definitely be more stable, since it doesn't need an ending value, or one you'd have to change to ensure that the loop wouldn't end.

In this code:

+ Code Snippet

t=timer()
for x=1 to 1000000
  inc y
  if y=1000000 then exit
next x
print "for next:";timer()-t

y=0
t=timer()
main_loop2:
inc y
if y<1000000 then gosub main_loop2
print "gosub:";timer()-t

wait key

The for/next finishes in 9 milliseconds, gosub in 10 milliseconds. I didn't even test the other loops; goto (faster than all except for/next and gosub) gave out at only 50k iterations.

Diggsey

18

Years of Service

Joined: 24th Apr 2006

Location: On this web page.

Posted: 31st Aug 2009 01:12

My speed tips:
Use IanM's matrix1 plugins whenever you can

When comparing the distances of objects, compare the squared distance instead of the actual distance (removes the need for a costly square root).
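A minimal Python sketch of the squared-distance trick (Python standing in for DBPro; function names hypothetical). Because the square root only ever increases as its input increases, ordering by squared distance picks the same nearest point, and a range check can compare against `radius * radius` instead.

```python
import math


def squared_distance(a, b):
    return sum((p - q) ** 2 for p, q in zip(a, b))


def nearest_by_distance(origin, points):
    # Takes a square root for every candidate
    return min(points, key=lambda p: math.sqrt(squared_distance(p, origin)))


def nearest_by_squared_distance(origin, points):
    # sqrt is increasing for non-negative inputs, so ordering by squared
    # distance picks the same point with no square roots at all
    return min(points, key=lambda p: squared_distance(p, origin))


def within_range(origin, point, radius):
    # Compare against radius squared instead of square-rooting the distance
    return squared_distance(point, origin) <= radius * radius
```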

When you do need to calculate the actual distance, create a vector of the desired dimensions, and use the vector length commands to find its length instead of calling 'sqrt' directly (which for some reason is slow beyond belief...)

Precalculate random numbers, the 'rnd' command is very slow.
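A minimal Python sketch of precalculated random numbers (Python standing in for DBPro; `RandomTable` and `draw` are hypothetical names). All the random calls happen once up front; the trade-off is that the sequence repeats every `size` values, so the table needs to be large enough that the repetition isn't noticeable.

```python
import random


class RandomTable:
    """Hypothetical helper: generates `size` random integers in [low, high]
    once, then hands them out in order, wrapping round at the end."""

    def __init__(self, size, low, high, seed=None):
        rng = random.Random(seed)           # private generator, optional fixed seed
        self.values = [rng.randint(low, high) for _ in range(size)]
        self.index = 0

    def draw(self):
        value = self.values[self.index]     # a list lookup instead of a fresh random call
        self.index = (self.index + 1) % len(self.values)
        return value
```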

Whenever you need to store arrays of multiple values, use a single array and a UDT, instead of multiple arrays. Especially when you are going to be resizing these arrays.

If there is a very costly operation, store the result in a variable, and then keep a flag as to whether the result is valid. Whenever you know that the result will be invalid, unset the flag. To get the result, check if the flag is set, and only recalculate the result if it is not; otherwise you can use the already-calculated value. I'm using this a lot in my game engine for the calculations of inverse matrices and world matrices for nodes in the scene graph.
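A minimal Python sketch of the cached-result-plus-flag pattern described above (Python standing in for DBPro; `Node` and `_expensive` are hypothetical, the latter a placeholder for something like an inverse-matrix calculation). The only rule is that anything which changes the inputs must clear the flag.

```python
class Node:
    """Hypothetical scene-graph node that caches an expensive result."""

    def __init__(self, value):
        self._value = value
        self._cached = None
        self._valid = False          # the "result is valid" flag
        self.computations = 0        # for illustration: how often we really computed

    def set_value(self, value):
        self._value = value
        self._valid = False          # input changed, so the cached result is stale

    def result(self):
        if not self._valid:          # recalculate only when the flag is unset
            self._cached = self._expensive(self._value)
            self._valid = True
            self.computations += 1
        return self._cached

    @staticmethod
    def _expensive(v):
        # Placeholder for a costly calculation
        return sum(i * i for i in range(v))
```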

I can't think of any off the top of my head, but there are certain code formations that cause the compiler to generate literally hundreds of ASM instructions, where only a minor change to the code would need just a few. Just look in the ASM dump to see examples of this.

BatVink

Moderator

21

Years of Service

Joined: 4th Apr 2003

Location: Gods own County, UK

Posted: 31st Aug 2009 01:22 Edited at: 31st Aug 2009 01:26

Quote: "Shortcut evaluation"

Thank you! I knew about the problem, but didn't realise this was the solution.

Quote: "Use only types that you need, for instance use integer instead of floats if you aren't going to be dealing with decimals."

I thought integers were cast into floats before being used. I may be wrong, just a rumour I heard.

Quote: "Use numbers instead of variables"

As far as evidence goes, I don't have any on the PC platform. However, in midrange and mainframe terms, this isn't the best way to go. Each hard-coded value requires memory and the access overhead associated with it. But using a variable to represent common numbers reduces this significantly. For example, if you use the number 1 one thousand times, it requires that many memory locations. If you use a variable representing 1, it needs one location, and can be held in a more accessible entity. This is common practice in business applications.

I must admit, I have fallen foul of erroneous results a few times. It's easily done, and I'm always aware that there are many other factors that could be skewing the results. When benchmarking my own stuff, I tend to repeat the test 10 times or so alternately, to level out the playing field. So that would be performing a test 1 million times, for each method, ten times over, for example - 20 million repetitions.

Quote: "Whenever you need to store arrays of multiple values, use a single array and a UDT, instead of multiple arrays. Especially when you are going to be resizing these arrays."

...although avoiding resizing is a much better way to go. I set my arrays to a size that I think will be adequate. I only ever increase their size by 10% at a time if I break their capacity, never by single elements.
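A minimal Python sketch of growing spare capacity by 10% at a time (Python standing in for DBPro; `GrowableArray` is a hypothetical helper). Python lists already over-allocate internally, so this only shows the bookkeeping: from an initial size of 100, reaching 1,000 items takes 25 resizes, against 900 if the array grew one element at a time.

```python
class GrowableArray:
    """Hypothetical helper: fixed-size storage with spare capacity that grows
    by about 10% when full, rather than by one element per append."""

    def __init__(self, initial_capacity):
        self._data = [None] * initial_capacity   # sized up front to the expected need
        self.count = 0
        self.resizes = 0

    @property
    def capacity(self):
        return len(self._data)

    def append(self, item):
        if self.count == len(self._data):
            extra = max(1, len(self._data) // 10)   # +10%, but always at least one slot
            self._data.extend([None] * extra)
            self.resizes += 1
        self._data[self.count] = item
        self.count += 1

    def __getitem__(self, i):
        if not 0 <= i < self.count:
            raise IndexError(i)
        return self._data[i]
```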

Diggsey

18

Years of Service

Joined: 24th Apr 2006

Location: On this web page.

Posted: 31st Aug 2009 01:33

Quote: "I thought integers were cast into floats before being used. I may be wrong, just a rumour I heard."

That is true when integer variables are passed to functions which only accept floats, but for everything else, integers are faster.

Digger412

17

Years of Service

Joined: 12th Jun 2007

Location:

Posted: 31st Aug 2009 01:39 Edited at: 31st Aug 2009 01:59

Quote: "I thought integers were cast into floats before being used. I may be wrong, just a rumour I heard."

I don't know, but I do know that if you divide 2 integers, you get an integer, and not a float.
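A small DBPro sketch of the integer division point. The variable names are illustrative, and the comments describe the expected behaviour rather than the exact printed output, since the way floats are printed may differ between versions.

```dbpro
rem Dividing one integer by another gives an integer: the fraction is dropped
rem before the result is stored, even though the target is a float variable.
a = 7
b = 2
whole# = a / b
print "a / b   = "; whole#

rem Making one operand a float should make it a float division instead.
exact# = a / 2.0
print "a / 2.0 = "; exact#
wait key
```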

Quote: "As far as evidence goes, I don't have any on the PC platform. However, in midrange and mainframe terms, this isn't the best way to go. Each hard-coded value requires memory and the access overhead associated with it. But using a variable to represent common numbers reduces this significantly. For example, if you use the number 1, one thousand times, it requires that many memory locations. If you use a variable representing 1, it needs one location, and can be held in a more accessible entity. This is common practice in business applications."

Hmm... I'll test that one. I have been wondering about the access times for variables versus hard-coded numbers too, and I should be able to find out the memory allocation as well.

What I normally do is wrap everything in a 'for avg=1 to 10' loop (or 50, or 500) that repeats all of the tests, so I can do all of my averaging in one big go. Also, since I have Texture Maker running, making a huge skybox, it's easier just to do it all in the program at once.

EDIT: Updated the "Use Gosub instead of Do/Loop for your main loop" section. Gosub and F/N beat Do/Loop, Repeat/Until, While/Endwhile and Goto by an EXTREMELY long shot. Here is the code; can others confirm?

redo:
y=0
t=timer()
for x=1 to 10000
  inc y
  if y=10000 then exit
next x
print "for next: ";timer()-t

y=0
t=timer()
main:
  inc y
  if y<10000 then gosub main
print "gosub: ";timer()-t

y=0
t=timer()
main2:
  inc y
  if y<10000 then goto main2
print "goto: ";timer()-t

y=0
t=timer()
do
  inc y
  if y=10000 then exit
loop
print "do loop: ";timer()-t

y=0
t=timer()
repeat
  inc y
  if y=10000 then exit
until y=-1
print "repeat until: ";timer()-t

y=0
t=timer()
while y>-1
  inc y
  if y=10000 then exit
endwhile
print "while endwhile: ";timer()-t
wait mouse
goto redo

Green Gandalf

VIP Member

19

Years of Service

Joined: 3rd Jan 2005

Playing: Malevolence:Sword of Ahkranox, Skyrim, Civ6.

Posted: 31st Aug 2009 03:01

Don't you run out of stack with your gosub method? Your code crashes if you run the loop too many times.

I still have a vague recollection that there is something else going on that IanM hasn't mentioned - something to do with Windows I/O perhaps, or perhaps for/next loops skip certain Windows checks? I can't recall what they were, though.

Digger412

17

Years of Service

Joined: 12th Jun 2007

Posted: 31st Aug 2009 03:15

Oh, I didn't know that... Well, I have plenty of things to test tonight, then.

Diggsey

18

Years of Service

Joined: 24th Apr 2006

Location: On this web page.

Posted: 31st Aug 2009 03:29 Edited at: 31st Aug 2009 03:29

@GG
For..Next loops don't check whether the escape key has been pressed, so as long as you call 'sync' (which I think also does the check!) it's OK to use a For..Next loop as the main loop.
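A minimal sketch of a For..Next main loop along the lines described here. Because it isn't certain that SYNC performs the escape-key check, the sketch tests escapekey() explicitly; the upper limit of 1000000 frames and the `frameNumber` name are arbitrary placeholders.

```dbpro
rem Sketch of a For..Next main loop with an explicit escape-key check.
sync on
sync rate 60
for frameNumber = 1 to 1000000
   rem Game logic and drawing would go here.
   if escapekey() = 1 then exit
   sync
next frameNumber
end
```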

Libervurto

17

Years of Service

Joined: 30th Jun 2006

Location: On Toast

Posted: 31st Aug 2009 03:31

Interesting stuff.
Do you guys still use memblocks?
Are they faster than arrays?
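One rough way to answer that for your own setup is to time the same writes into an array and into a memblock. This DBPro sketch uses only the standard memblock commands; the element count and memblock number 1 are arbitrary, and it says nothing about IanM's banks plug-in, whose commands aren't shown in this thread.

```dbpro
rem Rough timing sketch: write the same integers into an array and into a memblock.
itemTotal = 1000000
dim numbers(itemTotal) as integer
rem Each DWORD takes 4 bytes; byte offsets 0 to itemTotal * 4 need (itemTotal + 1) * 4 bytes
make memblock 1, (itemTotal + 1) * 4

t = timer()
for i = 0 to itemTotal
   numbers(i) = i
next i
print "array writes:    "; timer() - t

t = timer()
for i = 0 to itemTotal
   write memblock dword 1, i * 4, i
next i
print "memblock writes: "; timer() - t

delete memblock 1
wait key
```

Note that the memblock loop passes integers to a command that writes dwords, which, going by IanM's later point about integer/dword conversions, may add a small cost of its own.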

TGC Forum - converting error messages into sarcasm since 2002.

Diggsey

18

Years of Service

Joined: 24th Apr 2006

Location: On this web page.

Posted: 31st Aug 2009 03:33

I prefer IanM's 'banks', or raw memory, myself.

Digger412

17

Years of Service

Joined: 12th Jun 2007

Posted: 31st Aug 2009 04:16

@Diggsey - If you put in "if escapekey()=1 then end", it's no problem =)

Diggsey

18

Years of Service

Joined: 24th Apr 2006

Location: On this web page.

Posted: 31st Aug 2009 13:21

Yes, but then it is probably about the same speed as the other loops!

Digger412

17

Years of Service

Joined: 12th Jun 2007

Posted: 31st Aug 2009 14:33

Nope, F/N and Gosub still hold their own, and Gosub actually comes out on top!

t=timer()
for x=1 to 10000
  inc y
  if escapekey()=1 then end
  if y=10000 then exit
next x
print "for next:";timer()-t

y=0
t=timer()
main:
  inc y
  if escapekey()=1 then end
  if y<10000 then gosub main
print "gosub:";timer()-t

y=0
t=timer()
main2:
  inc y
  if escapekey()=1 then end
  if y<10000 then goto main2
print "goto:";timer()-t

y=0
t=timer()
do
  inc y
  if y=10000 then exit
loop
print "do loop:";timer()-t

y=0
t=timer()
repeat
  inc y
  if y=10000 then exit
until y=-1
print "repeat until:";timer()-t

y=0
t=timer()
while y>-1
  inc y
  if y=10000 then exit
endwhile
print "while endwhile:";timer()-t
wait mouse

Diggsey

18

Years of Service

Joined: 24th Apr 2006

Location: On this web page.

Posted: 31st Aug 2009 14:37

@Digger412

GG has already told you that you SHOULD NOT use gosub to make a loop! Try removing the 'if y<10000 then' condition, so that 'gosub main' runs every time, and see what happens! The program will just crash with a stack overflow error...

Benjamin

21

Years of Service

Joined: 24th Nov 2002

Location: France

Posted: 31st Aug 2009 14:50

When a GOSUB is performed the return address is placed on the stack, so if you are constantly using it without returning you are constantly adding to the stack. Consequently your program will go BOOM! if you do this enough.
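For comparison, this is a sketch of GOSUB used the way it is meant to be: the subroutine always reaches RETURN, so each call's return address is removed again and the stack does not grow however many times the loop runs. The `IncrementY` label is a hypothetical name chosen for the example.

```dbpro
rem Each GOSUB pushes a return address and each RETURN removes it again,
rem so calling the subroutine from inside a loop keeps the stack size constant.
y = 0
t = timer()
for x = 1 to 10000
   gosub IncrementY
next x
print "gosub + return inside for/next: "; timer() - t
wait key
end

rem Hypothetical subroutine used only for this example
IncrementY:
   inc y
return
```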

Green Gandalf

VIP Member

19

Years of Service

Joined: 3rd Jan 2005

Playing: Malevolence:Sword of Ahkranox, Skyrim, Civ6.

Posted: 1st Sep 2009 01:31

Quote: "Consequently your program will go BOOM! if you do this enough."

Have I missed an important update? When I tried that I merely got a crash and no BOOM!

Dammit! Just realised - I had sound switched off.

Benjamin

21

Years of Service

Joined: 24th Nov 2002

Location: France

Posted: 1st Sep 2009 02:07

Quote: "Have I missed an important update? When I tried that I merely got a crash and no BOOM!"

It's only supported on NVIDIA hardware.

Digger412

17

Years of Service

Joined: 12th Jun 2007

Posted: 1st Sep 2009 03:20

@GG, Diggsey and Benjamin - I know. I wasn't going to use it in a program; I just wanted to test it for benchmarking purposes (for fun, you could say). However, would it be safe to use F/N loops as the main loop? I know it isn't the usual coding practice, but I'm having fun testing these out.

Van B

Moderator

21

Years of Service

Joined: 8th Oct 2002

Location: Sunnyvale

Posted: 1st Sep 2009 16:27

Just a little one from me - if you're done with a sprite, delete it. Deleting sprites doesn't have the same performance impact as deleting objects, and a lot of hidden sprites will eat your frame rate like it's candy. So don't hide, just delete, and rely on the SPRITE Spr,x,y,img command to place the sprites you do need.
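A DBPro sketch of the delete-instead-of-hide advice: a single "bullet" sprite moves across the screen and is deleted once it leaves it. The placeholder image, the sprite and image numbers and the movement speed are all assumptions made so that the example runs on its own.

```dbpro
rem Build a 16x16 placeholder image (image 1) so the sketch is self-contained.
ink rgb(255, 255, 0), 0
box 0, 0, 16, 16
get image 1, 0, 0, 16, 16
cls

sync on
sync rate 60
bulletX = 0
bulletAlive = 1
do
   if bulletAlive = 1
      inc bulletX, 4
      if bulletX > screen width()
         rem Finished with this sprite: delete it rather than hide it
         delete sprite 1
         bulletAlive = 0
      else
         rem SPRITE creates sprite 1 on first use and repositions it after that
         sprite 1, bulletX, 100, 1
      endif
   endif
   sync
loop
```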

Health, Ammo, and bacon and eggs!

IanM

Retired Moderator

21

Years of Service

Joined: 11th Sep 2002

Location: In my moon base

Posted: 1st Sep 2009 20:32

Can I just say that if I see anyone in these forums replace loop commands with a GOTO or a GOSUB, I will never ever speak to them again!

In fact, I may just make it a condition of using my plug-ins!

Here's another one which reiterates what BatVink was saying and Diggsey supported, but which seems to have been passed over:

Avoid type conversions, especially the hidden ones
Passing an integer to a function or command that accepts a float value will cause the compiler to introduce a hidden conversion. In fact, passing a value of any type to a function/command that expects another type will introduce a conversion.

This can happen almost anywhere. For example, all of the object positioning, rotation and scaling commands accept float arguments. If you are repositioning or rotating these objects a lot, and using integers to do so, then you are wasting cycles.
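A sketch of the float-argument point: the object's position is kept in a float variable (`x#`) and the literals are written with decimal points, so POSITION OBJECT receives floats directly. The cube, camera placement and step size are arbitrary choices for illustration.

```dbpro
rem Keep the position in a float variable and write literals with a decimal point,
rem so POSITION OBJECT (which takes floats) is given floats directly.
sync on
sync rate 60
autocam off
make object cube 1, 10.0
position camera 0.0, 0.0, -100.0
x# = -50.0
do
   x# = x# + 0.5
   if x# > 50.0 then x# = -50.0
   position object 1, x#, 0.0, 0.0
   sync
loop
```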

In addition, DBPro does not perform these conversions at compile time. For example, in the statement 'XAngle# = XAngle# + 1', the '1' is an integer value that will be converted to a float at runtime and then added to the variable. This will be marginally slower than the correct 'XAngle# = XAngle# + 1.0'.
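To see the literal-type difference for yourself, a timing sketch along the lines of the benchmarks earlier in this thread could look like this. The repetition count is arbitrary, and since the gain is described as marginal, the difference may be hard to separate from timer noise on a single run.

```dbpro
rem Compare adding an integer literal to a float with adding a float literal.
reps = 1000000

xAngle# = 0.0
t = timer()
for i = 1 to reps
   xAngle# = xAngle# + 1
next i
print "xAngle# + 1   : "; timer() - t

xAngle# = 0.0
t = timer()
for i = 1 to reps
   xAngle# = xAngle# + 1.0
next i
print "xAngle# + 1.0 : "; timer() - t
wait key
```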

So you now know that the compiler treats numbers without a decimal point as an integer, and those with one as a float - basically that 1.0 <> 1.

Did you also know that a hex number (0x12345678) will be treated as a dword, and that converting from an integer to a dword, and vice versa, will also cost you cycles?
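A small sketch of the dword point, assuming the behaviour described above: the hex literal is stored in a variable declared as a dword, and that variable is passed to INK, which takes its colour values as dwords. The variable name and colours are illustrative.

```dbpro
rem A hex literal such as 0xFF00FF00 is a dword, so store it in a dword variable.
global textColour as dword
textColour = 0xFF00FF00
rem INK takes its colour values as dwords; the background is also written in hex
rem so that it is a dword literal rather than the integer 0.
ink textColour, 0x00000000
print "This text should be drawn in green."
wait key
```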