Here is the shader code. I'll clean up the DBPro code and post a demo ASAP.
The code I hinted at for the objects uses arrays in the shader (float4 EmitterIDInstance[50], plus other arrays holding emitter lifetime and random seeds), which gives me 50x4 = 200 unique emitters to work with.
That's the first part. Then I have to keep track of the object, and of which vertex of that object the shader is working on, since I use the same shader for all the emitters.
The DBPro code only sets parameters on emitter creation; some are also set when an emitter object is reused. The emitter object is created in a memblock, and all possible UV stages are added via the FVF.
When a new emitter is assigned, I store the emitter's object number as one of the UV coordinate values, along with the other parameters for that emitter. 8 UV stages give 16 parameters, but I needed more, so some are packed.
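As a rough illustration of that packing, here is a Python sketch that mimics the GPU's 32-bit floats on the CPU. It assumes the DBPro side stores high * 32768 + low in one UV component, which is what the shader's UnpackFirst (% 32768) and UnpackSecond (/ 32768) imply; pack_pair, to_float32 and the unpack helpers are hypothetical names. A 32-bit float has a 24-bit significand, so the packed value is only exact while it stays below 2^24, i.e. while the high value is at most 511; above that, odd low values cannot be stored exactly, which may be part of the "fluctuating" first value mentioned in the shader comments. The sketch floors the second value; the shader's UnpackSecond does not, so its result also carries a fraction of low / 32768.

```python
import math
import struct


def to_float32(x: float) -> float:
    """Round a Python float (a double) to the nearest 32-bit float, as a vertex buffer stores it."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def pack_pair(high: int, low: int) -> float:
    """Hypothetical packer: store high * 32768 + low in one float32 UV component."""
    assert 0 <= high <= 32767 and 0 <= low <= 32767
    return to_float32(float(high * 32768 + low))


def unpack_first(packed: float) -> float:
    """Mirror of the shader's UnpackFirst: packed % 32768 (fmod semantics)."""
    return math.fmod(packed, 32768.0)


def unpack_second_floored(packed: float) -> int:
    """UnpackSecond with floor() added, so low / 32768 does not leak into the result."""
    return math.floor(packed / 32768.0)


# Below 2**24 the packed value is exact: 511 * 32768 + 32767 == 2**24 - 1.
exact = pack_pair(511, 32767)
print(unpack_second_floored(exact), unpack_first(exact))   # 511 32767.0

# Above 2**24 a float32 can only hold even integers, so an odd low value is lost.
lossy = pack_pair(1000, 123)
print(unpack_second_floored(lossy), unpack_first(lossy))
```

If exact low values matter, keeping the high value at or below 511 (or packing fewer bits into each UV component) avoids the rounding.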
The DBPro code has to set the shader array that tracks the object number; this also happens when an object is reused. It then sets the Vector4 and passes it to the shader's object ID tracking array (EmitterIDInstance):
null = Get Object Effect(ActiveObject ,-1)
Set Vector4 EmitterPointerVector,Emitter(IndexToChange).ControllerID,Emitter(IndexToChange+1).ControllerID,Emitter(IndexToChange+2).ControllerID,Emitter(IndexToChange+3).ControllerID
Set Effect Constant Vector Element "EmitterIDInstance", floor(IndexPos) ,EmitterPointerVector
(You have to repopulate all four floats in the float4; for the other three floats it just reassigns their previous values.)
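The Set Effect Constant Vector Element call takes a whole Vector4, so changing one emitter slot means writing its three neighbours back as well. A minimal host-side sketch of that read-modify-write, in Python rather than DBPro and assuming the program keeps a CPU-side copy of the EmitterIDInstance contents (set_emitter_id and the list-of-lists layout are hypothetical):

```python
from typing import List

Vec4 = List[float]


def set_emitter_id(constants: List[Vec4], slot: int, emitter_id: float) -> None:
    """Change one emitter slot inside an array of float4 constants.

    Slot i lives in element i // 4, component i % 4. Because a whole float4 is
    uploaded at once, the other three components are copied from the CPU-side
    copy and written back unchanged.
    """
    element, component = divmod(slot, 4)
    updated = list(constants[element])   # start from all four current values
    updated[component] = emitter_id      # change only the target component
    constants[element] = updated         # stands in for the whole-float4 upload


emitter_ids = [[0.0, 0.0, 0.0, 0.0] for _ in range(50)]   # 50 float4 = 200 slots
set_emitter_id(emitter_ids, 4, 12.0)    # slot 4 -> element 1, component 0
set_emitter_id(emitter_ids, 5, 17.0)    # slot 5 -> element 1, component 1
print(emitter_ids[1])                   # [12.0, 17.0, 0.0, 0.0]
```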
The "for" loop in the shader then compares the emitter ID of the vertex being processed (stored in the vertex UV data) with the tracked IDs to find the correct object/emitter, and then transforms the particle/vertex according to lifetime, movement and so on.
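For checking the slot bookkeeping on the CPU, this Python sketch mirrors the shader's lookup loop: slot i is read from element i / 4, component i % 4, and the matching slot selects the emitter's start time from the parallel NowTimeInstance-style array. The function name and the None return for an unregistered ID are assumptions of the sketch; the shader has no equivalent of None, so an unmatched vertex falls through with whatever Ui and Li hold before the loop.

```python
from typing import List, Optional, Tuple

Vec4 = List[float]


def find_emitter_slot(emitter_ids: List[Vec4], emitters_running: int,
                      vertex_emitter_id: float) -> Optional[Tuple[int, int]]:
    """Scan the first emitters_running slots for this vertex's emitter ID (UV0.x).

    Returns (element, component) of the matching slot, or None if no slot matches.
    Like the shader loop, it does not stop early, so the last match wins.
    """
    match = None
    for i in range(emitters_running):
        element, component = divmod(i, 4)
        if emitter_ids[element][component] == vertex_emitter_id:
            match = (element, component)
    return match


emitter_ids = [[0.0] * 4 for _ in range(50)]
now_times = [[0.0] * 4 for _ in range(50)]
emitter_ids[1][2] = 42.0    # slot 6 is assigned to emitter object 42
now_times[1][2] = 12.5      # start time of that emitter

slot = find_emitter_slot(emitter_ids, 8, 42.0)
if slot is not None:
    ui, li = slot
    print(slot, now_times[ui][li])   # (1, 2) 12.5
```

The sketch assumes unused slots hold an ID that no emitter object uses (0 here); otherwise a vertex could match a stale slot.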
Here's the shader code. I'd love to see if any optimizations can be done:
// DEParticles Point Sprites v0.5
// Duke E, 2 July 2010.
// Using ideas and code from:
// - Green Gandalf's Vertex Particles demo
// - Dark Coder's Point Sprites demo
// Particles are created from all vertices in specially prepared
// objects, which the shader renders as "Point Sprites".
// Compared to using Quads we save ~three times the polygons.
// All emitter calculations and animation are done on the GFX card, except for
// positioning the "emitter objects" and changing the shader constants on reuse. Re-randomization makes it possible to reuse previously created emitter objects.
// The shader requires DEParticlesPS.dbp to initialize and handle the emitters.
// Requires VS-PS model 2.0
float4x4 WorldView : WorldView;
float4x4 projection : Projection;
float seconds; // Use for CPU code based timing;
// float seconds : time; // Use for shader based timing;
float4 NowTimeInstance[50]; // Max number of Emitters the shader will handle(?!)
float4 RandomSeedInstance[50]; // emitter instance randomseed
float4 EmitterScaleInstance[50];
float4 EmitterIDInstance[50];
float HalfScreenPixelsY = 1024; // Set this to the screen height when initializing the shader. Used to calculate the PSprite sizes.
// why can't DBP output VIEWPORTPIXELSIZE? "float2 ScreenSize : VIEWPORTPIXELSIZE;".
int EmittersRunning = 1;
const float AtlasTilesX = 4;
const float AtlasTilesY = 4;
float TimeScale = 0.0;
texture ParticleDiffuse< string ResourceName = ""; >;
sampler ParticleDiffuseSample = sampler_state
{ texture = (ParticleDiffuse);
addressU = clamp;
addressV = clamp;
magFilter = linear; //linear;
minFilter = linear; //linear;
mipFilter = linear; //linear;
};
struct VSInput
{
// NOTE EmitterID has to be the same for all UV0.x in the object.
float4 Movement : POSITION; // MoveX, MoveY, MoveZ & (StartX ,StartY ,StartZ * StartPosScale )
float PSize : PSIZE; // ParticleStartSize
// float2 x y
float2 EmitterID_EmitterLifetime : TEXCOORD0; // UV0 = EmitterID, EmitterLifetime
float2 IncSize_JitterL_PLifeTimeH : TEXCOORD1; // UV1 = IncreaseSize, ParticleLifeTime&Jitter
float2 IncSizeRnd_JitterL_RollSpdH : TEXCOORD2; // UV2 = IncreaseSizeJitter, ParticleRollSpeed&ParticleRollSpeedJitter
float2 MoveRndX_PFadeInH_FadeOutL : TEXCOORD3; // UV3 = MoveJitterX, ParticleStartFadeIn&ParticleStartFadeOut
float2 MoveRndY_AlphaInH_AtlasSpdL : TEXCOORD4; // UV4 = MoveJitterY, AlphaStart&AtlasAnimSpeed
float2 MoveRndZ_StartRotL_GravityH : TEXCOORD5; // UV5 = MoveJitterZ, StartRotAngle&Gravity
float2 rnd1 : TEXCOORD6; // UV6 = Rand1x, Rand1y
float2 rnd2 : TEXCOORD7; // UV7 = Rand2x, Rand2y
};
struct VSOutput
{
float4 pos : POSITION;
float PSize : PSIZE;
float4 AlpXAnSpYTimeZ : COLOR; // Alpha and Atlas tile animation data;
float4 rotmatrix : COLOR1; // Rotation matrix to PS
float2 UV : TEXCOORD0;
};
struct PSInput
{
float4 AlpXAnSpYTimeZ : COLOR;
float4 rotationmatrix : COLOR1;
float2 UV : TEXCOORD0;
};
struct PSOutput
{ float4 Colour : COLOR;
};
float LCGRandomDirection(float RndSeedMod, float InitRandomSeed)
{
// function returns -1.0 to 1.0
return (InitRandomSeed*RndSeedMod) % 1.0;
}
float LCGRandom(float RndSeedMod, float InitRandomSeed)
{
// function returns 0.0 to 1.0
return abs((InitRandomSeed*RndSeedMod) % 1.0);
}
float4 GetParticleRotationMatrix(float Angle)
{
float x = radians(Angle) ; //1R=57.2957D
float c = cos(x);
float s = sin(x);
float4 rotationMatrix = float4(c, -s, s, c);
rotationMatrix *= 0.5;
rotationMatrix += 0.5;
return rotationMatrix;
}
float2 TileAnimation(float2 UVIn, float XDim, float YDim, float Speed, float Time)
{
float t = frac((seconds+Time)*(ceil(0xFFFF*Speed)-0xFFF)*0.01*TimeScale);// frac(seconds*(ceil(0xFFFF*Speed)-0xFFF)*0.01*TimeScale);
float2 scale = 1.0 / float2( XDim, YDim );
int index = int(XDim*YDim*t); // frame number, 0 .. XDim*YDim-1
float2 UV = UVIn + float2( index % XDim, floor( index / XDim ) ); // column, row (tiles are numbered row by row)
UV *= scale;
return UV;
}
float UnpackSecond(float InValue)
{
// Unpack second packed value
return InValue / 32768.0f; //0x10000;
}
float UnpackFirst(float InValue)
{
// Unpack first packed value , note this value is somewhat fluctuating due to float conversion
return InValue % 32768.0f;// 0x10000;
}
// ----- VS PS
VSOutput VShaderPersistent(VSInput In)
{
// ----- Persistent emitters, will live EmitterLifetime, if EmitterLifetime is zero it will live indefinitely.
VSOutput Out;
int i;
float NowTime =0.0; // we assume the computer has been on for more than zero seconds; may need to do a wraparound in CPU code to detect a maxed-out double integer and reset the values for the emitter if that happens.
int Ui = 0; // defaults used if no emitter slot matches this vertex
int Li = 0;
for (i = 0; i < EmittersRunning; i++) {
// determine the EmitterID this vertex belongs to, then unpack the Starttime and random seed of the emitter and apply it to the vertex transform.
if ((EmitterIDInstance[i/4][i%4]) == (In.EmitterID_EmitterLifetime.x)){
Ui = i / 4;
Li = i % 4;
NowTime = NowTimeInstance[Ui][Li];
}
}
// Get the emitter objects EmitterLifetime data
float EmitterLifeTime = In.EmitterID_EmitterLifetime.y;
// unpack packed emitter "Starttime" data for this vertex as injected to the shader.
// Yay! The EmitterLifeTime conditional actually compiles, but does it improve "idle" performance on a limited-time emitter?
if ((EmitterLifeTime/TimeScale) + NowTime > seconds ) {
Out.AlpXAnSpYTimeZ = 0.0;
// Unpack all UVin data in to the emitter variables.
float RndSeed = RandomSeedInstance[Ui][Li];
float Scale = EmitterScaleInstance[Ui][Li];
float StartPosScaleH = 0;
float StartPosScaleV = 0;
float ParticleLifeTime = UnpackSecond(In.IncSize_JitterL_PLifeTimeH.y)*0.001;
float ParticleLifeTimeJitter = UnpackFirst(In.IncSize_JitterL_PLifeTimeH.y)*0.001;
float ParticleRollSpeed = (UnpackSecond(In.IncSizeRnd_JitterL_RollSpdH.y)*0.1)-1.0;
float AlignTowardsMovement = ParticleRollSpeed; // if ParticleRollSpeed is negative we set the Align.
ParticleRollSpeed +=1.0;
float ParticleRollSpeedJitter = UnpackFirst(In.IncSizeRnd_JitterL_RollSpdH.y)*0.1;
float StartFadeIn = UnpackFirst(In.MoveRndX_PFadeInH_FadeOutL.y)*0.001;
float StartFadeOut = UnpackSecond(In.MoveRndX_PFadeInH_FadeOutL.y)*0.001;
float AlphaStart = UnpackFirst(In.MoveRndY_AlphaInH_AtlasSpdL.y)/0x7D00;
float AtlasAnimSpeed = UnpackSecond(In.MoveRndY_AlphaInH_AtlasSpdL.y);
float StartRotationAngle = floor(UnpackFirst(In.MoveRndZ_StartRotL_GravityH.y)-0x2000);
float Gravity = UnpackSecond(In.MoveRndZ_StartRotL_GravityH.y)-0x4000;
float4 Rnd = float4(In.rnd1.x,In.rnd1.y,In.rnd2.x,In.rnd2.y);
// Particles spawn and live ParticleLifeTime, no automatic respawn.
float VertLifeJitter= (ParticleLifeTimeJitter*LCGRandom(Rnd.w,RndSeed));
float t = ((seconds-NowTime)/ ParticleLifeTime*TimeScale);
float TimeScaleMod = t*(ParticleLifeTime-VertLifeJitter);
float StartTimeLine = smoothstep(NowTime, NowTime + StartFadeIn, seconds); // 1;//(seconds-NowTime) / ParticleLifeTime ; //
float FadeInTimeLine = smoothstep( 0.0,StartFadeIn/ParticleLifeTime, t); //saturate((seconds-NowTime) / StartFadeIn*TimeScale) ;
float FadeTimeLine = 1.0-smoothstep(1.0-(VertLifeJitter/ParticleLifeTime)-(StartFadeOut/ParticleLifeTime),1.0-(VertLifeJitter/ParticleLifeTime),t) ;//1.0-smoothstep(1.0-((StartFadeOut+VertLifeJitter)/(ParticleLifeTime)),1.0, t);
Out.AlpXAnSpYTimeZ.x = AlphaStart*FadeTimeLine*FadeInTimeLine;
// X,Y,Z movement/positional vectors of the particles; In.Movement (the POSITION input) is used as movement data for the particles.
float4 WorldPosition = float4(
(StartPosScaleH*LCGRandomDirection(Rnd.x,RndSeed))+( ((In.Movement.x+(In.MoveRndX_PFadeInH_FadeOutL.x*LCGRandomDirection(Rnd.x,RndSeed)))* t)) * Scale
,(StartPosScaleV*LCGRandomDirection(Rnd.y,RndSeed))+( ((In.Movement.y+(Gravity*t)+(In.MoveRndY_AlphaInH_AtlasSpdL.x*LCGRandomDirection(Rnd.y,RndSeed)))* t)) * Scale
,(StartPosScaleH*LCGRandomDirection(Rnd.z,RndSeed))+( ((In.Movement.z+(In.MoveRndZ_StartRotL_GravityH.x*LCGRandomDirection(Rnd.z,RndSeed)))* t))* Scale
,1.0);
Out.pos = mul( mul( WorldPosition , WorldView ), projection );
float AlterSize = In.PSize + ((In.IncSize_JitterL_PLifeTimeH.x + (In.IncSizeRnd_JitterL_RollSpdH.x * LCGRandom(Rnd.w,RndSeed))) * t);
Out.PSize = AlterSize * Scale * projection._m11 * HalfScreenPixelsY / Out.pos.w;
// Rotation
float rot=0.0;
if (AlignTowardsMovement<0.0) {
// Align the texture's top (north) towards the movement direction (or facing away from the origin).
float4 EmitterPosition = float4(0.0,0.0,0.0,1.0);
float4 OldPos = mul( mul( EmitterPosition, WorldView ), projection ) ;
OldPos.zw = 0;
rot = ((degrees(atan2(Out.pos.x-OldPos.x, Out.pos.y-OldPos.y))));
// spread from viewpoint center !!! rot = (degrees(atan2((Out.pos.x),(Out.pos.y)))+180)/360.0; // saturate((ParticleRollSpeedJitter*LCGRandomDirection(In.random.x)) + StartRotationAngle); //(ParticleRollSpeedJitter*LCGRandomDirection(In.random.x))+
}
if (AlignTowardsMovement>0.0) {
rot += ((StartRotationAngle)*LCGRandomDirection(Rnd.w,RndSeed))+
(((ParticleRollSpeed) + ((ParticleRollSpeedJitter)*LCGRandomDirection(Rnd.w,RndSeed)))*TimeScaleMod);
// rot += !any(rot)*.001; // can't allow a 0 angle, so add a thousandth of a revolution. -- old and not used, but a nice way to do it.
}
Out.rotmatrix = GetParticleRotationMatrix(rot);
// Set atlas animation
Out.AlpXAnSpYTimeZ.y = 0.0;
if (floor(AtlasAnimSpeed-0xFFF)) { // if atlas animation unpack the anim speed.
Out.AlpXAnSpYTimeZ.y = AtlasAnimSpeed/0xFFFF;
}
Out.AlpXAnSpYTimeZ.z=LCGRandom(Rnd.w,RndSeed)*TimeScaleMod; // randomize the time of the atlas animation, otherwise the particles animate in unison, which does not look good.
Out.UV=0.0;
} else {
// EmitterLifetime is up, idle... (the DBPro code also checks when time is up and excludes the particle object for max performance).
Out.pos = 0;
Out.PSize=0;
Out.AlpXAnSpYTimeZ = 0.0;
Out.rotmatrix = 0.0;
Out.UV=0.0;
}
return Out;
}
PSOutput PShader_Rotate_Tiled(PSInput In)
{
PSOutput Out;
// Could add other effects such as lighting and maybe spherical billboarding.
// No performance gain?.... clip(any(In.AlpXAnSpYTimeZ.x)-1);
float2 UV =In.UV; // Point sprite UVs are generated by the hardware. The UV output by the VS is not the one we get here, but we still have to output a UV from the VS (?).
// In.rotationmatrix is a float4 rotation matrix calculated in the VS; we can't do the full rotation in the VS as the UV is not available until it reaches the PS.
UV = mul(UV-0.5, float2x2(In.rotationmatrix * 2 - 1));
// Tiled atlas animation. I really want to do this in the VS; losing 1/4 of the performance doing it here :(
if (any(In.AlpXAnSpYTimeZ.y)) {
// not on atlas, will bleed over to the next tile on the edges ---- UV *= sqrt(2); // shrink to fit a quarter rotated texture to the tile.
UV += 0.5; // adjust to hit the center of the tile.
UV = TileAnimation(UV,AtlasTilesX,AtlasTilesY,In.AlpXAnSpYTimeZ.y,In.AlpXAnSpYTimeZ.z) ;
} else {
UV *= sqrt(2); // shrink to fit a quarter rotated texture to the tile.
UV += 0.5; // adjust to hit the center of the tile.
}
Out.Colour = tex2D(ParticleDiffuseSample, UV);
Out.Colour.a = Out.Colour.a*In.AlpXAnSpYTimeZ.x; // adjust the particle textures own alpha with the timeline calculated one.
return Out;
}
technique DEParticlesRT // Rotation & Tiling
{ pass p1
{
// Here we can set the max point sizes of a particle; many large, alpha-blended, tiled particles tax the fillrate / memory bandwidth heavily.
FILLMODE = point;
POINTSPRITEENABLE = true;
POINTSCALEENABLE = true;
PointSize_Min = 1.0;
PointSize_Max = 499.0;
POINTSCALE_A = 0.0f;
POINTSCALE_B = 0.0f;
POINTSCALE_C = 1000.0f;
Zwriteenable = false;
ZEnable = true;
Cullmode = none;
AlphaBlendEnable = true;
// Normal alpha-blended transparency
DestBlend = INVSRCALPHA;
SrcBlend = SRCALPHA;
// DBPro Ghosted type particles (additive)
//DestBlend = One;
//SrcBlend = SRCALPHA;
VertexShader = compile vs_2_0 VShaderPersistent();
PixelShader = compile ps_2_0 PShader_Rotate_Tiled();
}
}
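The frame selection in TileAnimation can be checked on the CPU. This Python sketch (atlas_tile_uv is a hypothetical name) takes the animation phase t directly instead of deriving it from seconds, assumes the atlas is laid out row by row with tiles_x tiles per row, and derives the frame count from the atlas size rather than a literal 16. Under that layout the row index is frame // tiles_x; dividing by the tile count in Y gives the same answer for the square 4x4 atlas used here, but would pick rows past the end of a non-square atlas such as 8x2.

```python
from typing import Tuple


def atlas_tile_uv(uv: Tuple[float, float], tiles_x: int, tiles_y: int,
                  t: float) -> Tuple[float, float]:
    """Map a 0..1 sprite UV into the atlas tile for animation phase t (0 <= t < 1).

    Frames are numbered row by row: column = frame % tiles_x, row = frame // tiles_x.
    """
    frame = int(tiles_x * tiles_y * t)   # 0 .. tiles_x * tiles_y - 1
    column = frame % tiles_x
    row = frame // tiles_x
    return ((uv[0] + column) / tiles_x, (uv[1] + row) / tiles_y)


# 4x4 atlas (as in the shader), phase 0.4 -> frame 6 -> column 2, row 1.
print(atlas_tile_uv((0.5, 0.5), 4, 4, 0.4))   # (0.625, 0.375)

# 8x2 atlas, phase 0.5 -> frame 8 -> column 0, row 1.
# Dividing by tiles_y instead would give row 4, past the last row.
print(atlas_tile_uv((0.5, 0.5), 8, 2, 0.5))   # (0.0625, 0.75)
```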
Quote: "The downside is that means more work for the PS - unless you move to SM3 which allows some texture lookups in the vertex shader. SM3 could be useful for this particular problem"
Yeah, I feel it's going to be hard to use texture lookups for the parameters, as vertex texture fetch needs VS model 3.0. Maybe my obsession with keeping it at VS 2.0 is misplaced.
Quote: "I usually find the PS is the bottle-neck."
I try to use as little of the PS as possible because of the bottleneck. I would like to have all of the rotation matrix transformation in the VS (I did so in my first version, which was based on your Vertex Particles demo), but with Point Sprites that is not possible, so I have to do half in the VS and half in the PS.
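The half-and-half split is easier to see on the CPU. This Python sketch mirrors GetParticleRotationMatrix and the first lines of the pixel shader: the vertex side remaps (cos, -sin, sin, cos) from -1..1 into 0..1, since COLOR outputs in shader model 2.0 are clamped to that range (which appears to be why the shader remaps them), and the pixel side unpacks with * 2 - 1 and rotates the hardware-generated point sprite UV about the sprite centre. The function names are hypothetical, and the sketch leaves out the sqrt(2) shrink the shader applies before re-centering.

```python
import math
from typing import Tuple

Vec4 = Tuple[float, float, float, float]


def pack_rotation(angle_deg: float) -> Vec4:
    """Like GetParticleRotationMatrix: (c, -s, s, c) remapped from -1..1 to 0..1."""
    x = math.radians(angle_deg)
    c, s = math.cos(x), math.sin(x)
    return (c * 0.5 + 0.5, -s * 0.5 + 0.5, s * 0.5 + 0.5, c * 0.5 + 0.5)


def rotate_sprite_uv(uv: Tuple[float, float], packed: Vec4) -> Tuple[float, float]:
    """Like the pixel shader: unpack with * 2 - 1, rotate about the sprite centre.

    HLSL mul(rowvector, float2x2(m00, m01, m10, m11)) gives
    (u * m00 + v * m10, u * m01 + v * m11).
    """
    m00, m01, m10, m11 = (p * 2.0 - 1.0 for p in packed)
    u, v = uv[0] - 0.5, uv[1] - 0.5
    return (u * m00 + v * m10 + 0.5, u * m01 + v * m11 + 0.5)


packed = pack_rotation(90.0)
print(rotate_sprite_uv((1.0, 0.5), packed))   # approximately (0.5, 0.0)
```

Because the packed values travel through COLOR1, their precision in the pixel shader may be lower than full float precision, so small angle errors are to be expected.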
I made a video for now. It's my Small gnomes game that I have been working on for two years; dunno if it will ever be completed, as I keep working on the details too much and changing things.
Anyway, it's got a lot of particles and explosions. Every explosion is made up of seven separate emitters: blast concussion, fireball, flash, fire sparks, smoke trails, sprawling cloud and residual smoke. Oh, and a dirt cloud if it hits the ground.
I think I saw it top almost 4k particles at one point. My old CPU-only particle system would never have been able to run this, especially as I converted the game from Dark Physics to Newton physics, which takes a lot of CPU in itself.
Regards
Duke