AppGameKit Classic Chat / Fastest way to draw pixels?

Xaron
10
Years of Service
User Offline
Joined: 3rd May 2014
Location: Germany
Posted: 19th Aug 2017 21:25
Hey there,

what would be the fastest way to draw lots of pixels?

Currently I create a memblock once and just poke into it using SetMemblockByte (4 times for rgba), and then do this in the loop (as I change the content every frame):

CreateImageFromMemblock( imgId, memId )
DeleteSprite( sprId )
CreateSprite( sprId, imgId )

Is there a faster way of doing that?
nz0
AGK Developer
17
Years of Service
User Offline
Joined: 13th Jun 2007
Location: Cheshire,UK
Posted: 20th Aug 2017 01:44
This isn't a very fast way, because creating images and manipulating memblocks in the main loop isn't recommended.

Firstly, why do you need to draw lots of pixels every frame? How many, and why?

If it's for a star field, then there are other ways.
If it's for some other reason, then you'd have to explain so we can help with the most appropriate method.
Xaron
10
Years of Service
User Offline
Joined: 3rd May 2014
Location: Germany
Posted: 20th Aug 2017 07:20
It's for a sonar view so probably hard to avoid. Maybe it can be done on shader side.

It's a bit odd that such simple things are about as slow as they were 20 years ago.
Xaron
10
Years of Service
User Offline
Joined: 3rd May 2014
Location: Germany
Posted: 20th Aug 2017 12:18
I did a quick test. Filling a 512*512 texture with random pixels (so roughly 260k pixels) takes about 200ms on my older MacBook Air (2011, Core i7) using AppGameKit, whereas Monkey 2 takes 40ms and Monkey 1 only 10ms.

I have to admit that this is a very special task I have to perform, but I never thought there would be such a huge difference.

And don't get me wrong, I don't want to promote other stuff here, I'm just wondering.
Scraggle
Moderator
21
Years of Service
User Offline
Joined: 10th Jul 2003
Location: Yorkshire
Posted: 20th Aug 2017 12:33 Edited at: 20th Aug 2017 12:33
I haven't tested this, but logically I would say you could cut the memblock-writing time by a factor of four by changing SetMemblockByte (4 times for rgba) to SetMemblockInt() with a single integer representing the colour.
Of course the actual bottleneck is probably the image/sprite creation and not the memblock writing, but again you can reduce that by changing this:

to this:


Obviously, you'll have to create the sprite initially but you don't have to delete it and create it again every frame as long as the image remains the same size.
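To illustrate the single-integer suggestion above, here is a standalone C++ sketch (not AGK code). It models the memblock as a plain byte vector and assumes the pixel bytes are stored in R, G, B, A order with integers written little-endian, so one 32-bit write lands the same four bytes as four byte writes. `packRgba`, `writeBytes` and `writeInt` are hypothetical helper names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Pack four 8-bit channels into one 32-bit value.
// Assumption: bytes are stored as R, G, B, A and integers are written
// little-endian, so R sits in the lowest byte.
inline std::uint32_t packRgba(std::uint8_t r, std::uint8_t g,
                              std::uint8_t b, std::uint8_t a) {
    return static_cast<std::uint32_t>(r)
         | (static_cast<std::uint32_t>(g) << 8)
         | (static_cast<std::uint32_t>(b) << 16)
         | (static_cast<std::uint32_t>(a) << 24);
}

// Four separate byte writes (the SetMemblockByte-style approach).
inline void writeBytes(std::vector<std::uint8_t>& buf, std::size_t offset,
                       std::uint8_t r, std::uint8_t g,
                       std::uint8_t b, std::uint8_t a) {
    assert(offset + 4 <= buf.size());
    buf[offset + 0] = r;
    buf[offset + 1] = g;
    buf[offset + 2] = b;
    buf[offset + 3] = a;
}

// One logical 32-bit write (the SetMemblockInt-style approach), spelled
// out byte by byte here so the sketch does not depend on host byte order.
inline void writeInt(std::vector<std::uint8_t>& buf, std::size_t offset,
                     std::uint32_t value) {
    assert(offset + 4 <= buf.size());
    for (std::size_t i = 0; i < 4; ++i) {
        buf[offset + i] = static_cast<std::uint8_t>((value >> (8 * i)) & 0xFFu);
    }
}
```

In an interpreted loop the gain would come from issuing one command per pixel instead of four, not from the byte copying itself.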
Xaron
10
Years of Service
User Offline
Joined: 3rd May 2014
Location: Germany
Posted: 20th Aug 2017 13:05 Edited at: 20th Aug 2017 13:06
No, actually the image and sprite creation doesn't take any measurable time at all. Setting ints instead of four bytes doesn't change the speed either, but thanks very much for your help!

BTW I must delete the sprite before creating a new one with the same ID.
Scraggle
Moderator
21
Years of Service
User Offline
Joined: 10th Jul 2003
Location: Yorkshire
Posted: 20th Aug 2017 13:10
Quote: "Setting ints instead of four bytes doesn't change the speed either"

That makes no sense at all. Could you show the code you are using?
Quote: "BTW I must delete the sprite before creating a new one with the same ID"

Correct. That's why I suggested not deleting and creating a new one but just changing the image.
nz0
AGK Developer
17
Years of Service
User Offline
Joined: 13th Jun 2007
Location: Cheshire,UK
Posted: 20th Aug 2017 13:16 Edited at: 20th Aug 2017 13:17
Setting the value isn't going to be where the bulk of the slowdown is; it's the part that transfers the memblock to the image. You won't escape this slowdown.
There's a general severe overhead for processing a large loop (some thousands of iterations) of anything in AppGameKit during the main loop.
Xaron
10
Years of Service
User Offline
Joined: 3rd May 2014
Location: Germany
Posted: 20th Aug 2017 14:30 Edited at: 20th Aug 2017 14:32
I did some further tests. Indeed, creating the image takes only 4ms, while filling the texture takes up to 440ms.



I've tested 3 possibilities:

1) Using DrawBox:
On my system it takes 320ms for drawing

2) Using SetMemblockByte 4 times:
On my system it takes 440ms for drawing

3) Using SetMemblockInt:
On my system it takes 280ms for drawing

So setting integers instead of bytes is indeed faster, though not even twice as fast.

Actually I'm totally surprised that poking into a mem block is that slow...

edit:

4) Using SetMemblockInt directly in the loop without calling a function:
This takes about 150ms, so the function calls eat quite some time. But still 150ms is awful.
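To put the timings above in per-pixel terms, here is a small C++ helper. It only does the arithmetic on the reported numbers; `nsPerPixel` is a hypothetical name. For a 512*512 texture (262,144 pixels), 440ms works out to roughly 1680ns per pixel, 280ms to roughly 1070ns, and 150ms to roughly 570ns.

```cpp
#include <cassert>

// Average cost per pixel, in nanoseconds, of filling a width x height
// texture in fillMs milliseconds.
inline double nsPerPixel(double fillMs, int width, int height) {
    assert(width > 0 && height > 0);
    const double pixels = static_cast<double>(width) * height;
    return fillMs * 1.0e6 / pixels;  // 1 ms = 1,000,000 ns
}
```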
Xaron
10
Years of Service
User Offline
Joined: 3rd May 2014
Location: Germany
Posted: 20th Aug 2017 15:11 Edited at: 20th Aug 2017 15:13
Did another test with C++ (Tier 2) and wow, it only takes 20ms now! So that Basic stuff seems to be good until you do some heavy stuff.
Markus
Valued Member
20
Years of Service
User Offline
Joined: 10th Apr 2004
Location: Germany
Posted: 20th Aug 2017 22:46
I agree with you, this SetMemblockInt seems to be slow.
AGK (Steam) V2017.08.16 : Windows 10 Pro 64 Bit : AMD (17.7.2) Radeon R7 265 : Mac mini OS Sierra (10.12.2)
Kevin Picone
22
Years of Service
User Offline
Joined: 27th Aug 2002
Location: Australia
Posted: 21st Aug 2017 04:15
AppGameKit BASIC is compiled down to a custom VM, so it's effectively a binary-level interpreter. As such, any redundancy within inner loops will have a significant impact on performance, in particular for the more granular operations like pixel access.

In the test case above, the pixel routine is a user function, so there are 512*512 function calls' worth of overhead on the VM, which is dead time for an operation as simple as this. Inlining will trim a lot of that VM overhead and improve the performance of the operation. The goal is to reduce the number of operations per pixel, which reduces the number of times the AppGameKit VM has to look up your instructions, which you can assume is about a 10 to 1 ratio.

So, starting with a bare-bones loop like this:
Quote: "

// rgba is assumed to hold the colour already packed into one integer
For y = 0 To 511
    For x = 0 To 511
        XDrawPixel2( 1, x, y, 512, rgba )
    Next x
Next y

// Writes one packed pixel; the pixel data starts after 12 bytes
Function XDrawPixel2( memId As Integer, x As Integer, y As Integer, w As Integer, rgba As Integer )
    Local dst As Integer
    dst = 12+4*(y*w+x)
    SetMemblockInt( memId, dst, rgba )
EndFunction

"



Here it computes the target address as if the access were random, when it's actually linear in the test. So the function can be stripped and inlined:



Quote: "

For y = 0 To 511
    // compute the row offset outside the inner loop
    dst = 12+4*(y*w)
    For x = 0 To 511
        SetMemblockInt( memId, dst + (x*4), rgba )
    Next x
Next y

"



So the innermost loop is down to approximately 5 operations per pixel (dunno how many instructions that would be on the AppGameKit VM).

Depending upon how the For/Next loops are iterated in the AppGameKit VM, then you might be able to strip out some more.


Quote: "

WidthBY4 = (w*4) - 1
For y = 0 To 511
    // compute the row offset outside the inner loop
    dst = 12+4*(y*w)
    For x = 0 To WidthBY4 Step 4
        SetMemblockInt( memId, dst + x, rgba )
    Next x
Next y

"



or possibly

Quote: "

WidthBY4 = (w*4) - 1
For y = 0 To 511
    // compute the row offset outside the inner loop
    dst = 12+4*(y*w)
    // compute the end-of-row offset in the outer loop
    DstEnd = dst + WidthBY4
    // x is now the byte offset itself
    For x = dst To DstEnd Step 4
        SetMemblockInt( memId, x, rgba )
    Next x
Next y

"


It just depends on how the For/Next Step is implemented in the AppGameKit VM.
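The same transformation can be sketched in standalone C++ (not AGK code). The memblock is modelled as a byte vector with an assumed 12-byte header followed by 4 bytes per pixel; `fillPerPixel` recomputes the full offset for every pixel, while `fillHoisted` computes the row offset once per row and steps a running offset by 4. All names here are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr std::size_t kHeaderBytes = 12;    // assumed header size before pixel data
constexpr std::size_t kBytesPerPixel = 4;   // one packed 32-bit colour per pixel

// Store a 32-bit value little-endian at byte offset off.
inline void storeLE32(std::vector<std::uint8_t>& buf, std::size_t off,
                      std::uint32_t v) {
    for (std::size_t i = 0; i < 4; ++i) {
        buf[off + i] = static_cast<std::uint8_t>((v >> (8 * i)) & 0xFFu);
    }
}

// Naive version: full offset arithmetic for every pixel.
inline void fillPerPixel(std::vector<std::uint8_t>& buf, std::size_t w,
                         std::size_t h, std::uint32_t rgba) {
    assert(buf.size() >= kHeaderBytes + w * h * kBytesPerPixel);
    for (std::size_t y = 0; y < h; ++y) {
        for (std::size_t x = 0; x < w; ++x) {
            storeLE32(buf, kHeaderBytes + kBytesPerPixel * (y * w + x), rgba);
        }
    }
}

// Hoisted version: row start and row end are computed in the outer loop,
// and the inner loop variable is the byte offset itself.
inline void fillHoisted(std::vector<std::uint8_t>& buf, std::size_t w,
                        std::size_t h, std::uint32_t rgba) {
    assert(buf.size() >= kHeaderBytes + w * h * kBytesPerPixel);
    for (std::size_t y = 0; y < h; ++y) {
        std::size_t off = kHeaderBytes + kBytesPerPixel * y * w;
        const std::size_t rowEnd = off + kBytesPerPixel * w;
        for (; off < rowEnd; off += kBytesPerPixel) {
            storeLE32(buf, off, rgba);
        }
    }
}
```

A C++ compiler will often do this hoisting on its own; in an interpreted VM every removed operation is a dispatch saved, which is why doing it by hand matters there.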

Even so, it still might not be possible to brute-force it in a single frame. Often you can split such tasks up over 2 or more frames to reduce the fixed overhead, or have pre-computed chunks and copy the memory directly.
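Splitting the fill over several frames can be sketched like this in standalone C++ (not AGK code; the 12-byte header layout and all names are assumptions). Each call to the hypothetical `fillRows` writes at most `rowsPerFrame` rows and returns the row to continue from on the next frame.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr std::size_t kHeaderBytes = 12;    // assumed header size before pixel data
constexpr std::size_t kBytesPerPixel = 4;   // one packed 32-bit colour per pixel

// Store a 32-bit value little-endian at byte offset off.
inline void storeLE32(std::vector<std::uint8_t>& buf, std::size_t off,
                      std::uint32_t v) {
    for (std::size_t i = 0; i < 4; ++i) {
        buf[off + i] = static_cast<std::uint8_t>((v >> (8 * i)) & 0xFFu);
    }
}

// Fill up to rowsPerFrame rows starting at firstRow. Returns the row to
// resume from next frame, or 0 once the last row has been written.
inline std::size_t fillRows(std::vector<std::uint8_t>& buf, std::size_t w,
                            std::size_t h, std::size_t firstRow,
                            std::size_t rowsPerFrame, std::uint32_t rgba) {
    assert(firstRow < h);
    assert(buf.size() >= kHeaderBytes + w * h * kBytesPerPixel);
    const std::size_t lastRow = std::min(h, firstRow + rowsPerFrame);
    for (std::size_t y = firstRow; y < lastRow; ++y) {
        std::size_t off = kHeaderBytes + kBytesPerPixel * y * w;
        const std::size_t rowEnd = off + kBytesPerPixel * w;
        for (; off < rowEnd; off += kBytesPerPixel) {
            storeLE32(buf, off, rgba);
        }
    }
    return lastRow == h ? 0 : lastRow;
}
```

The caller keeps the returned row between frames, so a 512-row texture with `rowsPerFrame` set to 64 would be fully refreshed every 8 frames.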




PlayBASIC To HTML5/WEB - Convert PlayBASIC To Machine Code
MikeMax
AGK Academic Backer
12
Years of Service
User Offline
Joined: 13th Dec 2011
Location: Paris
Posted: 21st Aug 2017 07:44 Edited at: 21st Aug 2017 07:47
+1 for Kevin. People often forget to optimize code (even small pieces of code) because of the performance of today's machines. Especially for an interpreter, each CPU tick is important!

(For example: a multiplication is often faster than a division... doing a *0.5 is faster than /2.0!)
--------------------------------
Join us on dedicated AppGameKit WeeKChat :
https://week.chat/room/AppGameKit
Scraggle
Moderator
21
Years of Service
User Offline
Joined: 10th Jul 2003
Location: Yorkshire
Posted: 21st Aug 2017 08:54 Edited at: 21st Aug 2017 08:55
Quote: "doing a *0.5 is faster than /2.0"


In C++ this is definitely true and bit-shifting is faster still.
However, AppGameKit doesn't seem to agree:

The following code performs a "divide by 2" by multiplication, division and bit-shifting:


The result:


It seems that in AppGameKit division is fastest, followed by bit-shifting, and multiplication is slowest!

Oddly, changing the division to 2.0 (instead of 2) makes division the slowest. A decent compiler should recognise that the result should be an integer and discard the float before the calculation.
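The benchmark code and its results are not preserved in this copy of the thread. As a stand-in, the three ways of halving an integer being compared can be sketched in C++. This is not the original benchmark and says nothing about timings inside the AGK VM; it only shows that the three forms agree for non-negative integers. The function names are hypothetical.

```cpp
#include <cassert>

// Integer division by 2.
inline int halfByDivision(int v) { return v / 2; }

// Right shift by one bit; restricted to non-negative input in this sketch.
inline int halfByShift(int v) {
    assert(v >= 0);
    return v >> 1;
}

// Float multiplication by 0.5, truncated back to an integer.
inline int halfByMultiply(int v) { return static_cast<int>(v * 0.5); }
```

The multiplication variant goes through a floating-point value and a conversion back to integer, which is one plausible reason an interpreter could make it the slowest of the three.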

Preben
AGK Studio Developer
20
Years of Service
User Offline
Joined: 30th Jun 2004
Location:
Posted: 21st Aug 2017 10:41 Edited at: 21st Aug 2017 10:42
Just for fun I tried using SetRenderToImage and DrawSprite, but that was even slower.

A sonar updates slowly, so why not:

Use sonerlines to set what fps you want. I get 60 fps on my old Samsung S2 using sonerlines=9.

Sorry, the code formatting was lost in copy/paste?

It's not really a faster way to do it, but it looks great with the "updating sonar line".
best regards Preben Eriksen,
MikeMax
AGK Academic Backer
12
Years of Service
User Offline
Joined: 13th Dec 2011
Location: Paris
Posted: 21st Aug 2017 12:36 Edited at: 21st Aug 2017 12:39
Scraggle wrote: "The following code performs a "divide by 2" by multiplication, division and bit-shifting:"


Using your exact code, it seems the results differ a little between division and bit-shifting (CPU specifications and conditions, I guess):



In conclusion... multiplication in AppGameKit should be avoided, lol. Very strange. Maybe Paul has an explanation (just out of curiosity).
MikeMax
AGK Academic Backer
12
Years of Service
User Offline
Joined: 13th Dec 2011
Location: Paris
Posted: 21st Aug 2017 13:25
In all situations, you should prefer to divide or multiply by an INT if you can...

For example :

a*2
is faster than
a / 0.5

but

a / 2
is faster than
a * 0.5

So the slowdown is essentially due to floats (which seems logical, in fact).
Xaron
10
Years of Service
User Offline
Joined: 3rd May 2014
Location: Germany
Posted: 21st Aug 2017 14:10
Nah, that's not logical at all. Today's CPUs do floating-point computation at the same speed as integer computation. I grew up with the C64 and Amiga, and wow, there I needed quite some optimizations. Nowadays one shouldn't have to care anymore thanks to pretty good compilers, but yeah, this interpreted stuff can be weird.

Thanks so much for this interesting discussion!
MikeMax
AGK Academic Backer
12
Years of Service
User Offline
Joined: 13th Dec 2011
Location: Paris
Posted: 21st Aug 2017 14:28 Edited at: 21st Aug 2017 14:30
That's not the case on a PC and multiple Android phones and tablets, according to my tests (with AppGameKit Tier 1). And do not forget that AppGameKit doesn't only run on x86 CPUs!
Kevin Picone
22
Years of Service
User Offline
Joined: 27th Aug 2002
Location: Australia
Posted: 21st Aug 2017 14:54

Unfortunately, modern superscalar CPUs prefer to do things in parallel, most of which isn't really possible in a VM. So by design it's doing a small operation, stalling and hitting memory, and the memory hit causes another stall.


Markus
Valued Member
20
Years of Service
User Offline
Joined: 10th Apr 2004
Location: Germany
Posted: 21st Aug 2017 16:23
An idea:
how about using a pixel array in a shader and setting the values there with SetShaderConstantArrayByName?
The sonar sprite would then have a texture image modified by the shader, meaning the shader builds the image from the array.
The shader array data (the sonar wave) would be updated within an AGK function.
I don't know whether it is faster than the memblock commands or not.
Xaron
10
Years of Service
User Offline
Joined: 3rd May 2014
Location: Germany
Posted: 21st Aug 2017 19:21
Yes, using shaders is an option. Using Tier 2 it's now fast enough for my needs for now, especially as I don't need a high resolution texture anyway. Thanks for all your help, much appreciated. I'm going to post kind of a devlog at some point.
