AppGameKit Studio Chat / Array Performance???

Author
Message
Raven
19
Years of Service
User Offline
Joined: 23rd Mar 2005
Location: Hertfordshire, England
Posted: 1st Apr 2020 04:38
While I know something has been mentioned before about Array Performance in AppGameKit BASIC, I wasn't quite aware of just HOW bad it is.

AppGameKit Studio v2020.03.27


DarkBASIC Professional (v1.07.GG)


What's interesting is putting both of these to 1 Element results in AppGameKit Studio getting ~2800 FPS (0.357ms / Loop) and DarkBASIC Professional getting ~3500 FPS (0.286ms / Loop)

Now for the moment let's ignore the fact that somehow DBP (which is Single Threaded) running on DirectX 9.0c (also Single Threaded) is outperforming AGK-S (which is Multi-Threaded) running on Vulkan (which is not only also Multi-Threaded, but has dramatically lower Driver Overhead).
Rather what's more important is the sheer drop in performance going to the above code.

Keep in mind that both are essentially using identical Memory Space. I'm using a Nested Structured Type in DBP as it doesn't support Arrays in Structures, but otherwise it's more or less identical.

DBP drops to 2300 FPS (0.435ms), while AGK-S drops to 145 FPS (6.897ms)

That's with the same 10,000 Elements... if we up it to 100,000 Elements:
DBP drops to 500 FPS (2.000ms), while AGK-S drops to 14.3 FPS (69.930ms)
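
To put the figures above on a common footing, here is a small Python sketch of the arithmetic only. The function names are my own and have nothing to do with AGK or DBP; it simply converts the quoted FPS values to milliseconds per loop and works out how much longer each loop takes compared with the 1 Element test.

```python
def frame_ms(fps):
    """Milliseconds per loop for a given frames-per-second figure."""
    return 1000.0 / fps


def slowdown(fps_before, fps_after):
    """How many times longer each loop takes after the change."""
    return frame_ms(fps_after) / frame_ms(fps_before)


# FPS figures quoted in the post: (1 element, 10,000 elements, 100,000 elements)
results = {
    "DBP": (3500, 2300, 500),
    "AGK-S": (2800, 145, 14.3),
}

for name, (one, ten_k, hundred_k) in results.items():
    # Frame time at each dataset size
    print(f"{name}: {frame_ms(one):.3f}ms -> {frame_ms(ten_k):.3f}ms -> {frame_ms(hundred_k):.3f}ms")
    # Slowdown relative to the 1 element baseline
    print(f"  x{slowdown(one, ten_k):.1f} at 10,000, x{slowdown(one, hundred_k):.1f} at 100,000")
```

On these numbers DBP slows by roughly 1.5x and 7x, while AGK-S slows by roughly 19x and 196x, which is the gap being complained about.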

Within my Software 3D Engine (a converted tutorial), I had assumed that the pipeline itself was the slowdown... but I've since checked the processing time, and the cost per Triangle scales with the number of Triangles.
Keep in mind that this actually shouldn't be the case... it shouldn't be slower to process each Triangle when there are more of them, it should just take longer to process ALL of them.
What I was seeing however is that with the 105 Triangle Model it was taking an avg. 0.038ms / Triangle, while with the 6319 Triangle Model it was taking an avg. 1.21ms / Triangle.
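
As a rough illustration of why those two averages look wrong, the sketch below (plain Python, with names I've made up for the purpose) compares the total time you would expect if the per-Triangle cost stayed constant against the total implied by the quoted averages. It takes the figures in the post at face value and assumes nothing about how they were measured.

```python
def total_ms(triangles, ms_per_triangle):
    """Total processing time if every triangle costs the same."""
    return triangles * ms_per_triangle


small_tris, small_cost = 105, 0.038    # quoted for the small model
large_tris, large_cost = 6319, 1.21    # quoted for the large model

small_total = total_ms(small_tris, small_cost)
# What the large model should cost if per-triangle time stayed at 0.038ms
linear_large_total = total_ms(large_tris, small_cost)
# What the quoted 1.21ms / triangle average actually implies
measured_large_total = total_ms(large_tris, large_cost)
# How much more expensive each individual triangle became
per_triangle_growth = large_cost / small_cost

print(f"small model total:      {small_total:.2f}ms")
print(f"large model (linear):   {linear_large_total:.2f}ms")
print(f"large model (measured): {measured_large_total:.2f}ms")
print(f"per-triangle cost grew by x{per_triangle_growth:.1f}")
```

With linear scaling the larger model would take about 240ms in total; the quoted average implies about 7646ms, with each Triangle costing roughly 32 times more than in the small model.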

Remember, each Triangle has the exact same amount of Processing occurring; all that's changing is the Number, and that shouldn't affect the speed of the For...Next Loop that processes them.
(I did change the For...Next to a Repeat...Until, just to see if I could gain any performance that way... technically I saw a small rise in the Avg., but nothing statistically significant.)
As a note, I also disabled Debug / Error Logging (or rather set it to ignore)... but still nothing really helped.

It's just a little baffling to me that DBP is capable of Array Performance that's an order of magnitude or more BETTER than AGK-S.
I was seriously not expecting it to be such a massive difference.
blink0k
Moderator
11
Years of Service
User Offline
Joined: 22nd Feb 2013
Location: the land of oz
Posted: 1st Apr 2020 08:35
AGK is interpreted and DB is machine code, is that right? If so that would explain a lot
SFSW
21
Years of Service
User Offline
Joined: 9th Oct 2002
Location:
Posted: 2nd Apr 2020 06:05
blink0k is correct, imo. Thread count and rendering API have little to do with array performance (or much of any data set/type for that matter). What you are observing likely has more to do with the interpreted nature of AGK/S vs the machine code compiled structure of DBPro.

What I've had to do to compensate is break things down into smaller chunks. So splitting up excessive nested loops into smaller pieces, changing only values that need to be changed rather than panning through an entire array set. Likewise, breaking up large arrays themselves into smaller pieces that can be managed individually much faster with much smaller loops or arrays.

Performance limitations can require some creative thinking to work around large loop/array limitations and in some cases, there's little that can be done as you may need to sort through a lot of array values at times, so you may have to reserve some operations as side/background operations while the rest of your code continues to run.
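
The two workarounds described above, touching only the values that changed and spreading a big pass over several frames, can be sketched roughly as follows. This is Python rather than AGK BASIC, and `update_dirty`, `sliced_pass` and the chunk size are hypothetical names and values chosen purely for illustration, not anything from AGK.

```python
def update_dirty(values, dirty, update):
    """Apply update() only to the indices flagged dirty, then clear the flags."""
    touched = len(dirty)
    for i in sorted(dirty):
        values[i] = update(values[i])
    dirty.clear()
    return touched


def sliced_pass(values, update, chunk_size):
    """Generator: each next() processes one chunk, i.e. one frame's worth of work."""
    for start in range(0, len(values), chunk_size):
        end = min(start + chunk_size, len(values))
        for i in range(start, end):
            values[i] = update(values[i])
        yield end  # number of elements completed so far


values = [0] * 10_000

# 1) Only three elements changed, so only three are visited
dirty = {3, 42, 9_999}
touched = update_dirty(values, dirty, lambda v: v + 1)

# 2) A full pass spread over several frames, 1,000 elements at a time
frames = 0
for done in sliced_pass(values, lambda v: v * 2, chunk_size=1_000):
    frames += 1  # a real program would draw and sync the frame here

print(touched, frames, values[3], values[0])
```

Neither approach makes an individual array access any cheaper; they only reduce how many accesses have to happen within a single frame.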
veronikachuhalova
4
Years of Service
User Offline
Joined: 2nd Apr 2020
Location:
Posted: 2nd Apr 2020 11:50
Thank you for the info
How do you like to spend time in quarantine?
I began to write texts in order to develop
Do you think this is a good activity?
Raven
19
Years of Service
User Offline
Joined: 23rd Mar 2005
Location: Hertfordshire, England
Posted: 2nd Apr 2020 23:53
Quote: "blink0k is correct, imo. Thread count and rendering API have little to do with array performance (or much of any data set/type for that matter). What you are observing likely has more to do with the interpreted nature of AGK/S vs the machine code compiled structure of DBPro."


While DBP is a Compiled Language, it isn't Machine Code Compiled.
Instead the Virtual Machine is Compiled and Optimised, along with the Bytecode being Optimised... which allowed for better performance over the Embedded VM and Parsed Bytecode of DBC., but still the overall performance wasn't as good as Native Machine Code.
As for how that differs from AppGameKit... I'm not convinced it is actually Interpreted.
If it was, then it wouldn't strictly need to generate Bytecode, as that's something that Virtual Machine Languages do... instead it could just perform Just-in-Time Parsing, and work as a Real-Time Scripting Language.

In any case... you might find the result of this interesting:

AGK-S v2020.3.27


DBP v1.07.GG


Now I perhaps should've noted that the Performance Metrics I'm getting are from an AMD Ryzen 5 1600 at 3.2/3.5GHz (Stock), and it was a 1st Run 1st Gen Ryzen... so it doesn't overclock particularly well (I think I can squeeze 3.72GHz out of it on a good day).
Still, in any case... we are talking about 6 Cores and 12 Threads, which Windows 10 sees as Physical and Logical Processors.

DBP will recognise and utilise 4 Cores for Logical Processes, but Optimisation is Single Core.
So you only really gain performance from Additional Cores via DirectX Operations (which I'm not using)... so for all intents and purposes it's Single Threaded, and it can't use SMT/HTT.
And I can actually track this via the Task Manager - Performance Tab.

Running the DBP Code essentially just uses Core 0 and nothing else.
Whereas this is where things get a little "Weird" with AppGameKit.
Core 3 sees a rise in utilisation, while Core 8 sees the same massive leap in utilisation as Core 0 does when using DBP... the Core 3 rise in utilisation is about 30% of what Core 8 has, and it's in line with the runtime; meaning there is "Some" offloading occurring, but not much.

Why I say this is "Weird" is because of how AMD "Cores" work.
See, with Intel on a 6 Core 12 Thread CPU... Core 0 = Physical Core, Core 1 = HyperThread... and so we can just say that the Even Numbers are Physical Cores while the Odd Numbers are Logical Cores.
With AMD on the other hand, Core 0 - 5 are the Physical Cores, while Core 6 - 11 are Logical Cores.

This means that AppGameKit is more or less just running on SMT almost exclusively.
In fact if I had to guess, it's running on Core 3 with SMT... and the SMT actually has the bulk of the workload being pushed to it.
Now SMT, unlike Hyper-Threading, is essentially 2 General Purpose Process Pipelines (ALX) operating beside the ALU; these are "General Purpose" because they can be used by either the SIMD or the ALU as an "Extra" Processing Pipeline, depending on which has more priority.
Windows Threading doesn't exactly use these well, hence why AMD recommend their CPU SDK on GPU Open.

Still, in essence this does mean you have approx. 50% of the Throughput of the Physical Core, which as noted is where AppGameKit is pushing most of its Logical Processing.
At least in my case. I'm not sure if that's different for Intel Processors... and Bulldozer just has Split Physical GPU (2 ALU per SIMD with a Shared I/O Pipeline; hence why it sucks at Floating-Point, because it's far too easy to bottleneck it); so I'm sure they behave as you'd expect them to.
Mind, it doesn't stop this being weird Threading Behaviour.

It still ends up being more-or-less "Single Threaded", it just isn't using the Lowest Thread.
And it ALWAYS uses those Processor "Cores"... despite others having lower utilisation, so it's not like it's even testing for and using "Unutilised" Cores.
I'm going to experiment further, but I'm starting to get a good picture as to why AppGameKit has relatively terrible performance for Larger Datasets.

Remember my point here wasn't the concern that AGK/DBP both drop performance with Larger Datasets... rather it stems from how AGK-S *SUBSTANTIALLY* drops performance.
This new information doesn't really help in terms of improving performance, as the Dataset that I want to use for the other project is required every frame, so breaking it down into smaller loops isn't going to help.
Instead I think this highlights what TGC could do to dramatically improve performance, as they're clearly under-utilising the Hardware Available... and I'd never have even thought to check had this not been an issue.

DavidAGK
AGK Developer
10
Years of Service
User Offline
Joined: 1st Jan 2014
Location:
Posted: 4th Apr 2020 10:57
I'm a firm believer that any regression in performance should be treated like a bug. For games you really want to squeeze every drop of performance out.
Kevin Picone
21
Years of Service
User Offline
Joined: 27th Aug 2002
Location: Australia
Posted: 5th Apr 2020 16:48

Quote: "
While DBP is a Compiled Language, it isn't Machine Code Compiled.
Instead the Virtual Machine is Compiled and Optimised, along with the Bytecode being Optimised
"


The original DB was running on a VM, but in DBPRO they did implement native code. It's not very efficient code, but it is native. You can check what it generates: the assembly output used to be stored in the temp folder every time you build.

PlayBASIC To HTML5/WEB - Convert PlayBASIC To Machine Code
Ortu
DBPro Master
16
Years of Service
User Offline
Joined: 21st Nov 2007
Location: Austin, TX
Posted: 10th Apr 2020 04:41
This is one of the reasons I like AGKsharp so much. Keep all your logic and data handling in C# and let AppGameKit focus only on the media and graphics. It is a really good balance between performance, features, and ease of use.
http://games.joshkirklin.com/sulium

A single player RPG featuring a branching, player driven storyline of meaningful choices and multiple endings alongside challenging active combat and intelligent AI.
