PBFX V1.73c - Basic Control Changes VM2.
Working my way through the 'call' control-change operation level opcodes, making the odd change as the opportunities arise. I've been able to improve the performance of the Repeat/Until, While/EndWhile and Do/DecLoop structures under brute-force conditions, as well as certain decision operations. These small improvements mean today's edition runs the standard test in 1.4 seconds. And you know what that means. Which I find kind of funny, but I wouldn't get too excited about it.
So far, it's looking like VM2 can run pure Vm2 code around twice as fast as Vm1 (1.64 edition). I'm sure there will be situations where it's much quicker and some cases where it's slower. But ya get that!
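To illustrate why a structure like Do/DecLoop can be made faster at the opcode level, here's a minimal sketch of a bytecode dispatch loop. The opcode names (DEC, JNZ, DECLOOP) are invented for illustration and are not PBFX's real instruction set; the point is simply that a fused decrement-and-branch opcode halves the number of dispatches per loop iteration compared to doing the same work as two separate opcodes.

```python
# Toy dispatch loop: a fused decrement-and-branch opcode (in the spirit of
# Do/DecLoop) needs fewer dispatches per iteration than separate opcodes.
# All opcode names here are hypothetical, not PBFX's actual instruction set.

def run(program):
    """Execute a tiny program; return (final counter, number of dispatches)."""
    counter, pc, dispatches = 10, 0, 0
    while pc < len(program):
        op, arg = program[pc]
        dispatches += 1
        if op == "DEC":            # counter = counter - 1
            counter -= 1
            pc += 1
        elif op == "JNZ":          # jump to arg while counter != 0
            pc = arg if counter != 0 else pc + 1
        elif op == "DECLOOP":      # fused: decrement AND branch in one opcode
            counter -= 1
            pc = arg if counter != 0 else pc + 1
        else:
            pc += 1
    return counter, dispatches

# Generic form: two dispatches per iteration (DEC, then JNZ).
generic = [("DEC", None), ("JNZ", 0)]
# Fused form: one dispatch per iteration.
fused = [("DECLOOP", 0)]

print(run(generic))  # (0, 20) -> 10 iterations, 2 dispatches each
print(run(fused))    # (0, 10) -> 10 iterations, 1 dispatch each
```

Each dispatch carries fixed interpreter overhead (fetch, decode, branch), so cutting dispatches per iteration is where these small wins come from.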
PBFX V1.73e - String Operations On VM2.
Today's little chore has been to move all the string opcodes from the Vm1 side to Vm2. While I'm only about half way through it atm, the results have been pretty pleasing thus far, with savings being made in terms of both memory and runtime performance. The string engine is already one of the fastest around, and we've made it even quicker with the new runtime: today's edition can chew through a 100,000 string compare about 2 to 4 milliseconds faster than yesterday's edition. These operations also take almost half the number of bytes to represent each instruction. Packing the opcodes tighter means the compiled byte code is smaller, plus it helps in terms of efficiency, namely in prefetch, which x86 CPUs suck at!
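The size win from tighter packing can be sketched with two hypothetical instruction encodings. The field layouts below are invented for illustration (PBFX's real encodings aren't described here), but the arithmetic holds: halve the bytes per instruction and twice as many instructions fit in each cache line, which is exactly where hardware prefetch earns its keep.

```python
import struct

# Hypothetical encodings for a string-compare style instruction:
# "wide"    = 32-bit opcode + two 32-bit operand slots  -> 12 bytes
# "compact" = 16-bit opcode + two 16-bit operand indices -> 6 bytes

def encode_wide(opcode, a, b):
    return struct.pack("<iii", opcode, a, b)      # 12 bytes per instruction

def encode_compact(opcode, a, b):
    return struct.pack("<hhh", opcode, a, b)      # 6 bytes per instruction

# Byte-code size for 100,000 such instructions under each scheme:
wide_size = 100_000 * len(encode_wide(42, 1, 2))
compact_size = 100_000 * len(encode_compact(42, 1, 2))

print(wide_size, compact_size)  # 1200000 600000 -> half the bytes, so more
                                # opcodes per cache line / prefetch burst
```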
While today's changes have only been focused upon the string instructions, today's edition runs the standard test faster again, at
1.375 seconds. Although I think that's more of a cache artifact than a real performance difference, it's interesting nonetheless.
PBFX V1.73f - Pointers and Cleaning VM2.
The last few days I've been reworking/moving the pointer instruction set to make it more Vm2 friendly. They're not entirely running on Vm2 yet (in fact probably only 10% are today), but I've been picking through the Vm1 instructions, patching all the hand-'optimized' opcodes I can stomach, to make sure they're using the Vm2 data tables. This will make a lot of those previously misbehaving opcodes function again. It's important to make sure those things are working (as best I can during the translation) so everything isn't broken all at once; finding problems in those situations is nigh on impossible. So you can think of 1.73F as something of a mild stabilizer in the migration history.
The next question is where to from here? Well, that's a good question. Currently there are still another 4 or 5 opcode levels (groups of instructions) to hook up/move, some of which are co-dependent and some of which I'd planned on changing completely. However, what I generally find is that while it's easy to make most translations on the fly, that's only really OK for a few operations at a time. Changing the whole opcode model in one big hit doesn't tend to go well. So I think the best idea is to keep the translation simple, use it to clean everything up first, then, if need be, look at changing particular levels of the instructions later.
One area where I've a few ideas for changes relates mainly to how the VM stack and functions work. What I have in mind would not only allow translation of code functions to native (or at least parts of the calling process), but things like methods also (shock horror, OO, yawn). This gets tricky because the VM can run a bunch of different types of function calls transparently to the user, and doing so isn't that easy. Which is further complicated when moving down the compiled-on-demand (machine code) road. While COD (compile on demand) is really a VM3-level operation, I'm torn about the best way to address it. Converting the VM2 opcodes to native machine code is for the most part trivial. The problem is that it's only really going to be beneficial for demanding functions.
Converting code like this is useless:
For MyImage = 0 To 100
    Images(MyImage) = LoadNewImage(filename$)
Next
If you're wondering why, well, the load-image function (which is already machine code) will take 99.99999% of the loop's processing time. Translating such control loops to pure machine code will make the code bigger in memory and add pre-processing time at start-up, with no performance gain. So why bother?
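A back-of-envelope version of that argument, with made-up but plausible numbers (the per-iteration costs below are assumptions for illustration, not measured PBFX figures): interpreting the loop control costs tens of nanoseconds per iteration, while a native image load costs milliseconds, so compiling the loop itself to machine code saves a fraction that's lost in the noise.

```python
# Assumed costs, per iteration -- illustrative only, not measured values:
LOOP_OVERHEAD_NS = 50         # interpreted For/Next dispatch overhead
NATIVE_OVERHEAD_NS = 5        # same loop control compiled to machine code
LOAD_IMAGE_NS = 2_000_000     # native LoadNewImage call (~2 ms)
ITERATIONS = 101              # For MyImage = 0 To 100

interpreted = ITERATIONS * (LOOP_OVERHEAD_NS + LOAD_IMAGE_NS)
native = ITERATIONS * (NATIVE_OVERHEAD_NS + LOAD_IMAGE_NS)
saving = (interpreted - native) / interpreted

# The loop body dwarfs the loop control, so the saving is negligible:
print(f"saving from compiling the loop: {saving:.6%}")
```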
Ideally, a better approach would be to give the user control over the translation of the key parts of the VM code. So code that needs to be fast can be, and code that doesn't won't eat up unnecessary memory and runtime compilation. This actually fits in better with the whole COD model. The same idea could be applied to things like threading (multi-core).
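One way that user-controlled translation could look, sketched as a toy registry: functions the user explicitly flags get queued for native translation, everything else stays as plain bytecode. The names here (`hot`, `translation_plan`) are invented for illustration and aren't PBFX syntax or a committed design.

```python
# Hypothetical selective compile-on-demand: only user-flagged functions are
# translated to native code; the rest stay as VM bytecode. Names invented.

hot_functions = set()

def hot(func):
    """Mark a function as worth translating to native code."""
    hot_functions.add(func.__name__)
    return func

def translation_plan(all_functions):
    """Decide, per function, whether to spend memory/startup time compiling it."""
    return {name: ("native" if name in hot_functions else "bytecode")
            for name in all_functions}

@hot
def particle_update(): ...   # tight math loop: worth compiling

def load_level(): ...        # I/O-bound: translating this buys nothing

plan = translation_plan(["particle_update", "load_level"])
print(plan)  # {'particle_update': 'native', 'load_level': 'bytecode'}
```

The appeal is that the cost of native translation is opt-in, which is the same shape as the COD model described above.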
PBFX V1.73g - Stacks / Scope VM2.
The past few coding sessions, I've been working on changing how PB applications use the runtime stack. There's a fair bit involved, so it's been slow progress, but it's finally starting to work again. The current alpha can happily change scopes and return. The implementation at this point is basically a half-way house between how VM1 works and how I need future VMs to work. But for now it's working, and it's more efficient than it was, with VM2 being about 50% faster when calling and returning from a function (over 10,000 calls).
Ie.

T = Timer()
For LP = 0 To Tests
    SomeFunctionCalc(10, LP)
Next
tt1# = tt1# + (Timer() - T)

Print "Test #1 Average Time:" + Str$(tt1# / frames)
This doesn't affect calling a PSUB, btw. Why? Because they're not functions. Atm the execution of Psubs is actually slower than in 1.64, as not all of the required instruction set has been ported from Vm1 to Vm2, so in order to call/return from a sub, the runtime has to change contexts. It should be faster again when running purely on VM2.
PBFX V1.73h - Assignments and complex data structures.
Ok, so after 10 or so days of conversion we're about half way there, but there are still a few big areas left to translate. The main one now is the array/data-structure instruction set. A lot of this is built into the Vm1 instruction set simply to make it as fast as it can be, so it's a big mess. What I want to do is 'functionize' as much of the common parts as possible. However, I don't really want to rewrite it all just for the sake of cleaning it up. Speed vs cleanliness... hmm, we'll see I guess.
One change that I'll definitely be making is to how the assignment opcodes work. No matter what program you write, everything you make will contain data assignments, and lots of them. The Vm1 instruction set uses a couple of high-level opcodes to perform such moves all in one hit. While it works pretty well, it's awfully messy and not very expansion friendly, since most of the code is 'hand opt'd'. So what I have in mind is to break the big operations down into smaller, simpler opcodes. This should help clean up the code dramatically and make dropping in new functionality somewhat easier. Got to be careful though, as using too many opcodes can create too much VM overhead, which can affect performance dramatically.
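The trade-off can be sketched in miniature: one fat, hand-optimised assignment opcode does everything in a single dispatch, while the same assignment broken into small opcodes is cleaner and easier to extend, but costs extra dispatches. The opcode names and shapes below are invented for illustration, not the Vm1/Vm2 instruction sets.

```python
# Fat "superinstruction" vs a chain of small opcodes for the same assignment.
# Hypothetical opcodes; both forms store 7 at address base+offset.

memory = {}

def run(program):
    """Execute a tiny program; return the number of dispatches it took."""
    acc, addr, dispatches = None, None, 0
    for op, *args in program:
        dispatches += 1
        if op == "MOVE":            # fat opcode: resolve address + store, one hit
            base, offset, value = args
            memory[base + offset] = value
        elif op == "LOAD_CONST":    # small opcodes: one simple job each
            acc = args[0]
        elif op == "CALC_ADDR":
            addr = args[0] + args[1]
        elif op == "STORE":
            memory[addr] = acc
    return dispatches

fat = [("MOVE", 100, 4, 7)]
small = [("LOAD_CONST", 7), ("CALC_ADDR", 100, 4), ("STORE",)]

print(run(fat), memory[104])    # 1 dispatch
print(run(small), memory[104])  # 3 dispatches -- cleaner, but more VM overhead
```

Same result either way; the question is whether the cleanliness and extensibility of the small opcodes is worth the extra per-instruction dispatch cost, which is exactly the balance described above.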
While on the subject of performance, I'm pretty happy with how things are progressing thus far. Most of the changes are giving PBFX some solid bang for the buck. I was somewhat concerned that VM2 might not be able to crush Vm1 performance as it once did, since a lot of the Vm2 design ideas have migrated back into Vm1 anyway. PB has been using the Vm2 memory/string manager for a couple of years (from about PB V1.50 onwards), which gives it a massive boost over the older editions.
Anyway, everything is looking firmly on target for getting our 50% or so performance boost across the board. Considering that such changes can give Vm2 comparable, and even better, performance than some compiled-to-machine-code BASICs, that's nothing to sneeze at. The mind boggles at what a native-code version will do..