In recent decades the performance of GPUs (in graphics cards) has increased immensely compared to CPUs, with the result that significant speed-ups can now be obtained by moving onto the GPU processing that would normally be done on the CPU. I’ve been working on implementing general purpose GPU programming in DBPro to greatly improve the performance of calculations, and I have succeeded. It wasn’t that hard. This tutorial will introduce the concept using simplified perlin noise texture generation, with a speed boost of >20x!
About the GPU
Before we get to the nitty gritty I guess we should say a word or two about the GPU and its capabilities, since not all processing will run better on the GPU than the CPU. The key to understanding the GPU is that, not surprisingly, it is designed for processing graphics, so vertexes, faces and pixels; however, since this involves a lot of vector and matrix mathematics it is also adept at really hard sums. This maths geekiness is hardwired into the GPU, since its registers are designed with four-dimensional vectors in mind (r,g,b,a). Processes involving heavy vector or matrix operations will work considerably faster on the GPU than on the CPU.
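As a toy illustration of what this means in shader code (this snippet is just a sketch of the idea, not part of the perlin shader later in the tutorial), a single HLSL statement works on a whole four-component register at once:
//Each of these lines is a single operation on all four components at once
float4 a = float4(1.0, 2.0, 3.0, 4.0);
float4 b = float4(0.5, 0.5, 0.5, 0.5);
float4 c = a * b + b;     //component-wise multiply and add
float  d = dot(a, b);     //four multiplies and three adds in one instruction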
GPU performance is also boosted by parallel processing, which means the same program runs on several processing units at one time. Graphics cards have more parallel processors than CPUs: a GeForce 6, for example, has six processors working in parallel on vertex data and sixteen processors working on pixels. We can, however, get these processors to do other stuff if we pretend that stuff is images. We can then get the GPU to do number crunching for us super fast.
What GPUs are not good at is communicating. One consequence of parallel processing is that there are significant restrictions on where and when data can be read or written in GPU programs. This makes GPU programming somewhat fiddly, since everything has its place in GPU code.
GPU Programs
We’re all familiar with GPU programs: they’re called shaders. This tutorial will not cover shader programming in too much detail; there are many tutorials out there on the subject. To start using the GPU to speed up DBPro you’ll need to know how to code in High Level Shader Language (HLSL).
We do need to know a few things about shader programming before we start. Perhaps the most important is that there are two main parts to a shader: (a) the vertex shader, and (b) the pixel shader. These are just two small pieces of code which look like functions, but are really more like separate programs.
The vertex shader performs operations on vertexes, the most important being the position and normals of each vertex, both in the world and relative to the camera (i.e. their projection). There is much cool stuff you can do in the vertex shader to manipulate model meshes; however, for most GP-GPU work the vertex shader will simply pass our data on to the pixel shader, which does most of the work. The vertex shader runs before the pixel shader, once for every vertex in a mesh, and each instance runs in parallel on one of the vertex processors. Importantly for us, vertex shaders cannot read texture data (or can only do so on the most modern graphics cards, and even then they are slow at it).
The pixel shader performs operations on screen pixels. One instance of the pixel shader runs for every pixel, in parallel on the pixel processors. Instances of the pixel shader are effectively generated by the vertex shader, since it creates faces from vertexes which are then turned into pixels by the rasterizer. You can think of this process like spawning: the vertex shader makes projections of faces that spawn an instance of the pixel shader for every pixel that should be in the triangular face. The pixel shader then calculates the colour of that one pixel and passes it out to graphics memory for display. The key to understanding the pixel shader is that it has no control over where its output is written; that has already been determined by the vertex shader and the rasterizer.
Finally we should mention the last part of every shader, the technique. This piece of code specifies the conditions under which the vertex and pixel shaders operate (e.g. back face culling, whether to enable z write), and which vertex and pixel shaders are run (we can have several in a single shader). The technique also allows us a rudimentary form of program control, using passes to run data through several vertex and pixel shaders sequentially. Texture data can also be transferred from one pass to another using a RENDERCOLORTARGET.
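To give a rough idea of the shape of a multi-pass technique (the shader names here are invented for illustration, and the annotations that bind the render target to a pass are omitted, so treat this as a sketch rather than a drop-in example):
//Intermediate texture: pass 0 renders into it, pass 1 reads it back
texture firstPassResult : RENDERCOLORTARGET;
technique TwoPassExample
{
    pass Pass0   //writes its result into firstPassResult rather than the screen
    {
        VertexShader = compile vs_2_0 FirstVS();
        PixelShader  = compile ps_2_0 FirstPS();
    }
    pass Pass1   //samples firstPassResult and produces the final image
    {
        VertexShader = compile vs_2_0 SecondVS();
        PixelShader  = compile ps_2_0 SecondPS();
    }
}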
Programming the GPU: In a nutshell
Although the GPU is designed and set up for processing graphics, we can trick it into doing all sorts of useful stuff. All we need to do is present it with data in the form of a texture, write pixel shader code that does the really hard sums we want done, then let it pass the results back to us as an output texture. Since a texture is pretty much an array of data, what we are doing is passing it a structured list of numbers and getting back a resulting structured list of numbers, which we can then read and use however we like.
When tricking the GPU into painting our fence for us (i.e. the Tom Sawyer method of coding) we have to remember the limitations of the gullible boy doing the painting. The data we pass in will be in a texture; the larger the texture, the slower it is passed to the graphics card and back. The texture also has limited depth, so each channel of a pixel stores a number from 0 to 255. We can, however, split our data up into the 4 channels (red, green, blue and alpha), which means we can pass a much larger range of data in a texture if we stop thinking of it as a picture and start thinking of it as a limited array. The same, of course, is true of the data when it comes out of the shader.
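As a sketch of the "limited array" idea (not something the perlin example below needs), here is how a pixel shader might rebuild a 16-bit value that had been packed into two 8-bit channels on the DBPro side; it reuses the inputTextureSample sampler declared in the shader later in the tutorial:
//Each channel arrives in the shader as a float in the range 0-1
float4 UnpackPS(float2 coords : TEXCOORD0) : COLOR
{
    float4 texel = tex2D(inputTextureSample, coords);
    float value = texel.r*255.0 + texel.g*255.0*256.0;   //0-65535 spread across two channels
    return value / 65535.0;   //scale back to 0-1 so it can be written out as a colour
}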
As well as read/write issues, GPUs have other limitations resulting from their parallel nature, of which program flow is perhaps the most different from CPU programming. Each pixel shader can be thought of as the inner code of a loop that runs over all pixels in the projected image (since the code is run for every pixel). Adding program flow commands to pixel shaders, however, causes significant drops in performance, since the parallel processors must process commands in unison, and so logic statements such as IF THEN are problematic. Loops are also a problem in pixel shaders unless their length is fixed at compile time; using a global to define the total number of loop iterations will, for example, prevent a shader from compiling.
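For example, a loop whose count is a literal is fine because the compiler can unroll it, whereas one driven by a global usually won't compile under ps_2_0 (again a sketch, reusing the inputTextureSample sampler from the perlin shader below):
float4 FixedLoopPS(float2 coords : TEXCOORD0) : COLOR
{
    //The loop count is a literal, so the ps_2_0 compiler simply unrolls it
    float4 sum = 0;
    for (int i = 0; i < 4; i++)
    {
        sum += tex2D(inputTextureSample, coords / pow(2, i)) / 4;
    }
    //Replacing the 4 with a global such as numLoops set from DBPro will
    //typically stop the shader compiling for ps_2_0.
    return sum;
}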
Another awkward limitation is the restricted number of instructions (lines of code) that can be included in older shader models, which are the most widely compatible, although the use of functions allows additional instructions to be squeezed in.
To make the GPU do work for us in DBPro, all we need is a way to pass texture data to a shader and get texture data back from it.
You can read much more about general purpose GPU programming and concepts in this:
http://http.developer.nvidia.com/GPUGems2/gpugems2_frontmatter.html
GPU Example: Generating Perlin Noise
Perlin noise is commonly used in generating terrains, clouds and random textures, but is computationally expensive to generate in DBPro. We can, for example, use memblocks to construct, scale and smooth the various layers of noise we need to combine to generate a perlin noise texture. This method is, however, very slow for textures larger than 256x256 pixels and not appropriate for generating perlin noise in realtime. In this example we’ll generate some simplified perlin noise using a shader, grab the texture generated and save it as a file. Since this is really only a demonstration of the concept of GP-GPU, the algorithm I use in the shader to generate the noise is rather simple: it suffers from linear smoothing artifacts, it is not seamless, and it has a fixed number of octaves. You’ll find more complex shaders out there that will do a better job of generating perlin noise. The great thing about this technique is that you can just swap the guts of the shader over if you want it to do something else.
Here is the DBPro function that I’ve used to generate perlin noise. What this function does is generate a random noise image using memblock commands and load this into an image. This noise image will be used as the input texture to the perlin noise shader, by creating a plane object and applying the texture to it with the texture object command. We then load the shader using load effect and apply it to the object we’ve just created with set object effect. Grabbing the output of the shader is a little more tricky. The simplest solution is to use the camera to view the final texture on the object and copy this to a bitmap. To do this we must set the aspect, distance and range of the camera so that it captures the image without distortion (i.e. using a small-angle, near-isometric camera) and captures every pixel of the image; with a 2 degree field of view the half-angle is 1 degree, so our unit-sized plane exactly fills the view at a distance of 0.5/tan(1°) ≈ 28.6 units. We can then grab an image from the bitmap using get image. You’d probably want to reset the camera aspect and range back to their original values after you’ve done this…but I haven’t bothered.
Function makePerlinWithGPU(seed, startingRes)
Randomize Seed
Local color as Dword
dt=timer()
//Generate a single noise image at the highest resolution.
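//memperlin is assumed to be a global or #constant image/memblock ID defined elsewhere in the program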
memblockID = memperlin
if bitmap exist(1) then delete bitmap 1
create bitmap 1, startingres, startingres
set current bitmap 0
if memblock exist(memblockID) then delete memblock memblockID
make memblock from bitmap memblockID, 1
for x = 0 to startingres-1
    for y = 0 to startingres-1
        r=rnd(255)
        color = rgb(r,r,r)
        Write Memblock Dword memblockID, 12+(x + y*startingres)*4, color
    next
next
make image from memblock memperlin, memblockID
delete memblock memblockID
//Make an object and apply our shader
if object exist(10) then delete object 10
make object plane 10, 1, 1
texture object 10, memperlin
convert object fvf 10, 530 ` XYZ+NORMAL+TEX2
if effect exist(2) then delete effect 2
load effect "shaders\perlinnoise.fx", 2, 0
set object effect 10, 2
//Set up isometric square camera and calculate view distance and range to return full image
dist# = 1.0/(2.0*tan(1))
position camera 0, 0, dist#
set camera range dist#*0.75, dist#*1.25
set camera aspect 1.0
set camera fov 2
point camera 0, 0, 0
create bitmap 2, startingres, startingres
sync
get image memperlin+1, 0, 0, startingres, startingres, 3
set current bitmap 0
endfunction
The shader perlinnoise.fx is shown below. There are a couple of points to notice. The input texture is grabbed from the model. All the vertex shader does is project the model to the current view, generating one pixel shader instance per pixel of our input texture, which ensures the pixel shader code processes every pixel of the input and produces one pixel of output in the same position. The pixel shader is very simple: it takes the input texture, samples it at a series of scales and combines the samples with weights. The smoothing necessary to generate perlin noise is achieved by the interpolation of the tex2D function and is bilinear. This is about the simplest implementation of perlin noise; it generates a texture that has some of the artifacts caused by bilinear smoothing and isn’t seamless, since we aren’t using repeated boundary conditions when smoothing the image. However, the beauty of the technique is that the shader is independent of the DBPro code, so we can easily swap in a different shader.
//Generates Simplified Perlin Noise
float4x4 WorldViewProj : WorldViewProjection;
int numOctaves=6;
texture inputTexture
<
    string ResourceName = "";
>;

sampler2D inputTextureSample = sampler_state {
    Texture = <inputTexture>;
    MinFilter = Linear;
    MagFilter = Linear;
    MipFilter = Linear;
    AddressU = Wrap;
    AddressV = Wrap;
};

struct inputData
{
    float4 Position : POSITION;
    float2 TextureCoords : TEXCOORD0;
};

struct outputData
{
    float4 Position : POSITION;
    float2 TextureCoords : TEXCOORD0;
};

//Vertex Shader passes on data to pixel shader including the texture coordinates of the texture
outputData PerlinVS(inputData IN)
{
    outputData OUT;
    float4 pos = mul( IN.Position, WorldViewProj );
    OUT.Position = pos;
    OUT.TextureCoords = IN.TextureCoords;
    return OUT;
}

//Pixel Shader combines weighted and scaled interpolated (smoothed) versions of texture
float4 PerlinPS(outputData IN): COLOR
{
    float4 perlin = tex2D(inputTextureSample , IN.TextureCoords/pow(2,numOctaves-1))/pow(2,numOctaves-5);
    perlin += tex2D(inputTextureSample , IN.TextureCoords/pow(2,numOctaves-2))/pow(2,numOctaves-4);
    perlin += tex2D(inputTextureSample , IN.TextureCoords/pow(2,numOctaves-3))/pow(2,numOctaves-3);
    perlin += tex2D(inputTextureSample , IN.TextureCoords/pow(2,numOctaves-4))/pow(2,numOctaves-2);
    perlin += tex2D(inputTextureSample , IN.TextureCoords/pow(2,numOctaves-5))/pow(2,numOctaves-1);
    perlin += tex2D(inputTextureSample , IN.TextureCoords)/pow(2,numOctaves);
    return perlin;
}

technique PerlinNoise
{
    pass Pass0
    {
        VertexShader = compile vs_2_0 PerlinVS();
        PixelShader = compile ps_2_0 PerlinPS();
    }
}
Performance
Okay...proof of pudding time...the graph below shows the relative performance of the memblock method (CPU) compared to the shader method (GPU) in generating perlin noise. You can see that at small sizes (64x64) the GPU method is twice the speed, but at large sizes this factor rises to around 20x faster. Ultimately there is a plateau at around 17x faster; however, much of this plateau is due to the speed of the DarkBASIC commands used to set up the noise function and create bitmaps. The GPU speed enhancement in actually doing the calculations is much larger than 20x at large sizes.
In terms of realtime generation of perlin noise, the memblock method in this test took ~100 millisecs to generate a 128x128 perlin noise texture, which is probably the limit for realtime generation. The GPU method can generate a 512x512 texture in the same time. Actually, most of the overhead here is still the memblock commands used to generate the random noise input image; if that were precomputed then even larger images could be generated in realtime.
For precomputation of textures at startup or during level loading the GPU can process larger textures: in this test a 1024x1024 texture took 450 millisecs to process, whilst a 2048x2048 texture took 2011 millisecs. Of course, the exact boost will depend on the system; these tests were run on a system with an i7 quad core CPU and an Nvidia GTX 675M graphics card.
I hope you agree, this is quite a mouthwatering pudding.
Other Applications
A huge range of image processing tasks would benefit from utilisation of the GPU, such as blur, sharpening, Gaussian smoothing etc., and would be significantly faster than even the built-in CPU techniques. GPU processing is not, however, restricted to graphics: any data can be packed into an image and processed by the graphics card, for example modelling of physics. The limitations of graphics cards do mean there are certain tasks that GPUs don’t do well, such as those involving many logical statements or requiring random read/write (e.g. AI pathfinding).
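For instance, a simple blur is just a different pixel shader dropped into the same DBPro framework. A minimal sketch of a 3x3 box blur (it reuses the outputData struct and inputTextureSample sampler from the perlin shader above; texelSize is assumed to be one over the texture width, either hard-coded or passed in from DBPro):
float texelSize = 1.0/512.0;   //assumes a 512x512 input texture

float4 BoxBlurPS(outputData IN) : COLOR
{
    //Average the 3x3 neighbourhood around the current pixel
    float4 sum = 0;
    for (int x = -1; x <= 1; x++)
    {
        for (int y = -1; y <= 1; y++)
        {
            sum += tex2D(inputTextureSample, IN.TextureCoords + float2(x, y)*texelSize);
        }
    }
    return sum / 9.0;
}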
Future Developments?
You’ll have noticed that the method used to output data from a shader into an image so it can be used by the CPU (i.e. your DBPro code) is fiddly and causes a major slow down. Camera effect shaders allow a texture to be passed back to an image; however, the input would have to come from the camera. It should be possible to make a new shader interface that simply inputs an image and outputs an image…say “image effect”…if The Game Creators could rewrite one of their DLLs to do that it could speed up GPU programs considerably.
Products
Forester Pro (tree and plant creator) - http://www.hptware.co.uk/forester.php
Medusa Pro (rock, cliff, cave creator) - http://www.hptware.co.uk/medusa.php
Mr Normal (normal map generator) - http://www.hptware.co.uk/mrnormal.php