PlayBASIC V1.64N2 Round Up (April/May)
The upgrade is basically complete, apart from another merry-go-round pass over the documentation. A lot has been happening with the compiler's optimization in this version, revolving around type caching.
The following is cut'n'pasted from the WIP blog and covers some of the key points from the development of the new version. For example code, pictures etc., read the actual blog.
Beta #8 - Global Type Caching
Moving on from the expression type caching of previous PlayBASIC betas, we head into the wild unknown wilderness that is global type caching. In beta 7, caching is activated during assignment-styled expressions and then turned off again, to avoid potential logical issues at runtime. Most of the logical stuff is fairly predictable, where caching could potentially result in runtime errors; it's the unforeseen stuff that worries me about this feature most of all. As such, caching is disabled by default, which reverts the parser to its original behavior. To enable it, we set bit 1 in OptExpressions; bit 0 toggles the standard optimization (which is on by default), so to enable both modes we set OptExpressions to an integer value of 3.
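In code, enabling both modes looks something like this (a minimal sketch, assuming OptExpressions is used as a compile-time directive as described above):

```
` Bit 0 (value 1) = standard expression optimization (on by default)
` Bit 1 (value 2) = global type caching (off by default)
` Setting both bits gives 1 + 2 = 3
OptExpressions 3
```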
Currently, I'm slowly picking through the parser, tagging the various situations where the cache should be flushed or disallowed completely. I'd like to say it's all working wonderfully well, but it isn't. Small stuff seems to work OK, but the bigger apps virtually all die at this point. Thus we'll focus on the small stuff for now - like the 'Cached Type Reading' benchmark from above.
The post above shows the results for PB V1.64N and V1.64N2 beta 7 running the benchmark; below we've got today's results. At first glance you might not see the big deal, but beta 8 can run the complete fourth test in the time V1.64N can only run three of them. What's more interesting is that the fourth test is benching 3D typed array accesses. Now granted, the test code is set out to cache well, but I didn't think it'd work that well. To put the test in terms of bandwidth, the demo is pushing around half a gigabyte per second, not too bad for a 10 year old runtime on almost 7 year old hardware.
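The actual benchmark code isn't reproduced here, but the general shape of a 3D typed-array test like the one described might look something like this (type and field names are invented for illustration, not taken from the real test):

```
Type Cell
	value
EndType

Dim Grid(15, 15, 15) As Cell

` Serialized reads/writes to the same structure on consecutive
` statements are exactly what the type cache can short-cut.
For z = 0 To 15
	For y = 0 To 15
		For x = 0 To 15
			Grid(z, y, x) = New Cell
			Grid(z, y, x).value = Grid(z, y, x).value + 1
		Next
	Next
Next
```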
Beta #9 - Disassembler
Work continued on the upgrade over the weekend, but with a shift in focus geared more towards testing than anything else. I guess the only new addition is a simple parser to convert string-based Vector Stack expressions into an operations list for the resolve stack function, making the process a lot easier. Unlike the PlayBASIC compiler, the vector stack operation lists can be processed at runtime, giving a bit of leeway in how an operation is solved.
Anyway, in terms of testing, my focus has been on trying to validate where and when the compiler is seeing cachable type operations. For some reason, some programs with heavy type usage return little or no successful caches, even though the code was heavily serialized. After a lot of head scratching I ended up coming to the conclusion that it'd be much easier to see exactly where and when it's failing if I could look at a disassembly of the byte code. Much older versions of PlayBASIC had something similar built in, but it was removed long ago. Luckily I had most of the material just sitting around.
The disassembler can currently decode about 50% of the PB core instruction set (I don't need all of it). The initial version just showed the opcode and parameter data, so you didn't get anything even resembling PlayBASIC code back. But yesterday I added some 'clean up' routines that try to reconstruct the type of expression that created each opcode, making the output a lot more BASIC looking and easier to follow. Some logic you can basically reconstruct with pretty high accuracy; other bits you can't, as the information just doesn't exist, or it requires multiple passes to try and guess the original logic.
The tool is currently working well enough for me to get an overview of just what the compiler is producing in various situations, and it's already revealing some rather interesting results. There seems to be the odd occasion where certain expressions generate extra MOVE operations, where they could be short-cut. The odd extra operation isn't a big deal in code that's executed infrequently, but the weight is magnified in brute force situations. So it's well worth ripping them out.
Another oddity appears when initializing the function scope, where it seems to be writing some bogus data; dunno what's up with that. Perhaps the best thing about it so far is that it reveals a few situations where merging some common operations could reduce the VM overhead further. Some of that I think could be done a lot more easily from a dedicated optimization pre-processing pass, though, but it's well worth putting on the ideas board.
PlayBASIC V1.64N2 Beta #10 - Inner Workings Of Type Caching
Testing is going pretty well thus far; the more code I look over the disassembly outputs from, the more little potential tweaks appear. I think most of the problems I've been having with caching relate to array indexes where the index has changed between accesses. The parser isn't smart enough to trap those situations as yet. As a whole, the compiler takes a pretty nervous approach to caching, which you can see when picking over the code it's producing for various blocks. If I suss out the problem areas, then it should be possible to relax it a bit more, which in turn will bring further improvements.
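To illustrate the index problem (hypothetical code, names invented): consecutive accesses to the same element cache well, but once the index changes between accesses, a cached pointer would point at the wrong element, and that's the case the parser has to trap:

```
` Cache-friendly: the same element is accessed on every term,
` so Enemies(i) only needs to be resolved once.
Enemies(i).x# = Enemies(i).x# + Enemies(i).speed#

` Cache hazard: the index changes between accesses, so a pointer
` cached from the first line must not be reused on the third.
Enemies(i).x# = 100
i = i + 1
Enemies(i).x# = 200   ` different element - the cache must be flushed
```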
The disassembler tool is working well. There's still LOTS of stuff it doesn't have a clue about, but for what I'm interested in (the core logic) it works well enough now. So well that I've dressed it up a little more since yesterday. Now it allows us to see reconstructed BASIC-styled versions of the instruction set. For some stuff there's no obvious BASIC translation, but it's a lot more readable and easier to browse now than it was before.
PlayBASIC V1.64N2 Beta #11c - Smoothing Out Type Caching
Yesterday's Sprite/Map reshuffling passes ended up going a lot faster than I'd expected. It's not a lot of hands-on coding work; it's just one of those things that often leads to broken instructions if you're not careful, of which there were a few, but it all went pretty smoothly for once. Which leads me back for another round at the type caching problem, which thankfully I've had a few more ideas about since the last pass.
In previous versions, the type cache optimizer is intentionally set up to be rather over-protective. The moment it spots a user or built-in function call within an expression, it turns caching off, regardless of whether the function/command damages the previously cached data or not. To combat this, the command tables now have a computed safety flag. The process is a bit like a shotgun really: any function/command that accepts passed arrays as a parameter is set to being potentially unsafe, and the rest are tagged as safe. Now, yeah, I know that just because a function is being passed an array doesn't necessarily mean our cache data will be broken, but better safe than sorry.
Now that's OK for built-in commands/functions, but for user functions there's really no easy way to accurately guess whether the function destroys our cache data or not, so those calls disable caching until after the call. The reason this is such an issue is that we're caching a raw pointer to something previously known to exist, a type in this case. If we allow caching across user functions, and that function alters a previously cached type pointer, then we're in for a world of hurt. The best case scenario is that it'll pop a runtime error, but I can't check at runtime where the cache data comes from. So if it's been corrupted unknowingly, then any following accesses might just end up reading/writing from the wrong structure of the same type, or some different structure completely, potentially resulting in a horrible death. It's possible to solve, but way too much work for the return.
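A contrived sketch of the failure case (type and function names are invented): if a pointer to Enemies(1) cached before the call survived across it, the write after the call would hit the freed structure rather than the new one:

```
Type Enemy
	x#
EndType

Dim Enemies(10) As Enemy

Enemies(1) = New Enemy
Enemies(1).x# = 50       ` a pointer to Enemies(1) could be cached here

RespawnEnemy(1)          ` user function: the compiler can't see inside it

Enemies(1).x# = 60       ` a cached pointer would now be stale, which is
                         ` why caching is disabled across user calls

Function RespawnEnemy(index)
	Enemies(index) = Null       ` frees the old structure...
	Enemies(index) = New Enemy  ` ...and allocates a replacement
EndFunction
```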
Even so, what we've got now is already able to chomp out slabs of bogus accesses from some of the bigger example programs, such as YAAC and THESIUS XIII. The tweaked optimizer spots around 100 caches in the YAAC demo (Yet Another Asteroids Clone) and almost 400 in the THESIUS demo, making around a 5-6K saving in the byte code alone. Saving a few K is not a big deal in itself; it's just that that's 5K fewer memory accesses the runtime now has to do in order to do the same thing. So provided the opts are in routines that are heavily used, we're gaining some more free performance. The downside is that the THESIUS demo is still a little crash-happy when caching is enabled globally. YAAC works fine, but it's definitely not 100% at this time.
Anyway, I've also been able to address the changing variable issue mentioned earlier; now it's just a matter of tracking down those last few annoying issues.
PlayBASIC V1.64N2 Beta #12b - Mmmmm Caching
After soooooo looooong tinkering with compilers/runtimes, there's not much in the PlayBASIC world that actually surprises me. Often a lot of work goes into the smallest of changes; sometimes we win some handy double digit performance back, or expand something to make it that little bit more flexible, but such changes are generally pretty marginal. So imagine the look on my face tonight when simply re-blocking the cache instructions in the runtime won us back more performance than I'd even hoped for. So much that I initially assumed it was broken. Couldn't possibly be down in the low teens... But it is.
The test I'm using is the same basic type benchmark as always (shown back here). The test just goes through and does a bunch of reads/writes to 1D/2D/3D typed arrays, where each test is performed 25,000 times. Now, in previous versions of V1.64N2, caching has shaved off between 17% and a staggering 30% of the V1.64N speed, which I was more than happy with, to be honest. But today's edition of V1.64N2 beta 12b is conservatively between 15-20% faster again. The 3D array test is basically twice as fast.
Now even though the benchmark code is basically presenting a perfect situation for the optimizer to chew through the expressions, it'll certainly be able to improve the performance of your real world programs also. Type usage is frequently serialized in programs, so if you have a loop that runs through and processes your game characters each frame (list or array), and there are multiple lines where the same structure is accessed (read/write), then those operations may be anywhere from 25% to 50% faster. The amount of time won back really comes down to how frequently the loop is executed.
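As a rough sketch of the kind of per-frame loop that benefits (the structure and field names here are made up for illustration):

```
` Typical per-frame character update: the same structure is read
` and written on several consecutive lines, so the type cache can
` resolve Ship(lp) once and reuse the pointer for each access.
For lp = 0 To MaxShips
	Ship(lp).x# = Ship(lp).x# + Ship(lp).dx#
	Ship(lp).y# = Ship(lp).y# + Ship(lp).dy#
	Ship(lp).frame = Ship(lp).frame + 1
Next
```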
Attached we have the results from the test running on V1.64N, V1.64N2 Beta 11 and V1.64N2 Beta 12b. Speaks for itself really.
WIP Blog
Been in documentation mode, but I have been knocking up some more usage examples. Simple stuff like Sliding Collision and Vertical Sorting.
PlayBASIC V1.64N2 Development Blog