Adding Support for Textures in Engine


Textures are n-dimensional arrays that can contain any information. Usually when talking about textures, we mean 2D arrays containing color, even though there are other types of textures such as height maps, normal maps, specular maps, and environment maps; I will be adding some of these to my engine in later stages. This post is about adding support for 2D textures containing color. Since textures are just arrays, they can be accessed like an array using coordinates. These coordinates are called Texture Coordinates, TEXCOORDs, or UVs (which is how we will describe them from now on). We get these UVs from mesh authoring programs such as Maya. I am adding these UVs to my mesh format and then exporting them as binary. We also specify which texture we want to use for a material in our material file. If no texture file is specified, we use the default texture, which is plain white.

Material =
{
	EffectLocation = "Effects/Effect1.Effectbinary",
	ConstantType = "float4",
	ConstantName = "Color",
	ConstantValue = { 0, 0, 0, 1 },
	TextureLocation = "Textures/water.jpg",
}
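Since a texture is just an array indexed by UVs, the basic lookup can be sketched in C++ (a minimal, hypothetical sketch; a real sampler also applies filtering and addressing modes, which is what the sampler state configures):

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

struct Texel { uint8_t r, g, b, a; };

// Point-sampled fetch: normalized UVs in [0,1] map to array indices.
// (Hypothetical helper, not engine code.)
Texel FetchTexel(const std::vector<Texel>& texels,
                 unsigned width, unsigned height, float u, float v)
{
    // Clamp so out-of-range UVs cannot index outside the array
    const float cu = std::clamp(u, 0.0f, 1.0f);
    const float cv = std::clamp(v, 0.0f, 1.0f);
    const unsigned x = std::min(static_cast<unsigned>(cu * width), width - 1);
    const unsigned y = std::min(static_cast<unsigned>(cv * height), height - 1);
    return texels[y * width + x];
}
```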

We then add support for textures in our input layout in C++, in the vertexLayoutShader, and in any other shaders that use textures.

auto& texcoordElement = layoutDescription[2];
texcoordElement.SemanticName = "TEXCOORD";
texcoordElement.SemanticIndex = 0; // (Semantics without an explicit index suffix can always use zero)
texcoordElement.Format = DXGI_FORMAT_R32G32_FLOAT; // two 32-bit floats: u and v
texcoordElement.InputSlot = 0;
texcoordElement.AlignedByteOffset = offsetof( eae6320::Graphics::VertexFormats::sMesh, u );
texcoordElement.InputSlotClass = D3D11_INPUT_PER_VERTEX_DATA;
texcoordElement.InstanceDataStepRate = 0; // (Must be zero for per-vertex data)

We also add a sampler state, which specifies how the texture is to be sampled. There can be a single sampler state for the entire application, or multiple sampler states that are bound as needed. I have only one right now, which I bind at the start of the application.

Once we add support for these in the application code, we then have to sample the texture in the shaders. We do that using the Sample function in HLSL, which takes the sampler state and the coordinates and returns the color at that coordinate. I am performing the sampling in my fragment shader using the function

SampleTexture2d(g_diffuseTexture, g_samplerState, i_textureData)

This expands to i_texture.Sample( i_samplerState, i_uv ).

I then multiply it with the vertex colors to get the final output color.

Meshes with textures on them.

Adding Diffuse lighting

What is Diffuse Light?
A diffuse reflection is the reflection of light by a surface such that the incident light is scattered at many angles rather than just one. Diffuse lighting occurs when the light is scattered by centers beneath the surface, as shown in the picture. A non-shiny surface has a very high diffuse value. The amount of diffuse light reflected by the surface is governed by Lambertian reflectance, given by Lambert's cosine law.

Most of the surfaces in the real world are diffuse by nature. The main exceptions include particularly shiny surfaces such as mirrors and metals.
Calculating Diffuse Light
I am using the following equation to calculate the diffuse light

(g_LightColor * (saturate(dot(normalize(g_LightRotation), normalize(i_normals)))) + ambientLight)

where g_LightColor is the color of the directional light and g_LightRotation is the forward direction of the light. These two values are passed from the application in the per-frame constant buffer. i_normals is the normal of the fragment; the normal of a surface is the vector perpendicular to that surface.
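As a CPU-side illustration, the Lambert term from the equation above can be sketched in C++ (hypothetical vector types; HLSL's saturate is written out as a clamp):

```cpp
#include <algorithm>
#include <cmath>

struct Vec3 { float x, y, z; };

float Dot(const Vec3& a, const Vec3& b) { return a.x*b.x + a.y*b.y + a.z*b.z; }

Vec3 Normalize(const Vec3& v)
{
    const float len = std::sqrt(Dot(v, v));
    return { v.x / len, v.y / len, v.z / len };
}

// Lambert's cosine law: intensity is proportional to the cosine of the
// angle between the light's direction and the surface normal, clamped
// to [0,1] so back-facing surfaces receive no light.
float DiffuseIntensity(const Vec3& lightDirection, const Vec3& normal)
{
    return std::clamp(Dot(Normalize(lightDirection), Normalize(normal)), 0.0f, 1.0f);
}
```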
In the video below, you can see that I am simulating a warning light beacon.

Creating shaders based on various object transforms

Before we start discussing the topic, here is a small background about shaders and how data is sent from C++ to GPU.

Shader: Shaders are special programs that reside on the GPU to create custom effects which are hard or impossible to recreate on the CPU side. There are two main types of shaders: the vertex shader and the fragment/pixel shader. As their names suggest, the vertex shader acts on each vertex of a triangle and the fragment shader acts on each fragment encompassed by that triangle.

The main output of the vertex shader is the position of a vertex with respect to window coordinates. We can also output any other data we like from the vertex shader, color being the most common. All outputs from the vertex shader are then passed on to the fragment shader, where they are interpolated between the vertices.
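The interpolation step can be illustrated with a small C++ sketch (a simplified model with hypothetical types; real hardware also applies perspective correction, omitted here):

```cpp
struct Color { float r, g, b, a; };

// Barycentric interpolation of a per-vertex attribute across a triangle.
// (w0, w1, w2) are the fragment's barycentric weights and sum to 1.
Color Interpolate(const Color& c0, const Color& c1, const Color& c2,
                  float w0, float w1, float w2)
{
    return { w0*c0.r + w1*c1.r + w2*c2.r,
             w0*c0.g + w1*c1.g + w2*c2.g,
             w0*c0.b + w1*c1.b + w2*c2.b,
             w0*c0.a + w1*c1.a + w2*c2.a };
}
```

A fragment sitting exactly on a vertex gets that vertex's color; a fragment between vertices gets a weighted blend.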

Fragment shaders run for each fragment inside the triangle and output a color, which is then displayed at the location output by the vertex shader. There are many other types of shaders such as geometry shaders, tessellation shaders, and mesh shaders, but we will only use vertex and fragment shaders for now.

When we send data from C++, the primary information we submit is the vertex data. Apart from this, we can also send data using constant buffers. Constant buffers contain data that is constant over a frame or a draw call. In our engine, we have two constant buffers: one is the per-frame constant buffer, and the other is the per-draw-call constant buffer, which look like the following.

struct sPerFrame
{
	Math::cMatrix_transformation g_transform_worldToCamera;
	Math::cMatrix_transformation g_transform_cameraToProjected;

	Math::sVector g_CameraPositionInWorld;
	float padding0;

	float g_elapsedSecondCount_systemTime = 0.0f;
	float g_elapsedSecondCount_simulationTime = 0.0f;
	float padding1[2]; // For float4 alignment
};

struct sPerDrawCall
{
	Math::cMatrix_transformation g_transform_localToWorld;
	Math::cMatrix_transformation g_transform_localToProjected;
};

To use the above constant buffers, we need to declare them in shaders as shown below. Shaders for Direct3D are written in HLSL while those for OpenGL are written in GLSL. Since we will be working only with Direct3D, we will use HLSL.

cbuffer g_constantBuffer_perFrame : register( b0 )
{
	float4x4 g_transform_worldToCamera;
	float4x4 g_transform_cameraToProjected;

	float3 g_CameraPositionInWorld;
	float g_padding0;

	float g_elapsedSecondCount_systemTime;
	float g_elapsedSecondCount_simulationTime;
	// For float4 alignment
	float2 g_padding1;
};

cbuffer g_constantBuffer_perDrawCall : register( b2 )
{
	float4x4 g_transform_localToWorld;
	float4x4 g_transform_localToProjected;
};

We use the above matrices in our shaders to create effects based on the position of the object relative to the world or camera. Since we also output more data than just position from the vertex shader, I created a struct which contains all the data that I pass from vertex shader and which can be easily accessed in the fragment shader. The struct can be modified as necessary to include additional data.

struct VS_OUTPUT
{
	float4 o_vertexPosition_projected : SV_POSITION;
	float4 o_vertexPosition_local : TEXCOORD1;
	float4 o_vertexColor_projected : COLOR;
	float4 o_vertexColor_local : TEXCOORD2;
};

1. Creating a shader that is independent of its position in the world:

To create a shader whose effects are independent of world position, we need to use the local position of the fragment. This is what I am doing in my fragment shader:

o_color = float4(
	floor( sin( i_VSInput.o_vertexPosition_local.x ) / cos( i_VSInput.o_vertexPosition_local.x ) ),
	floor( sin( i_VSInput.o_vertexPosition_local.y ) / cos( i_VSInput.o_vertexPosition_local.y ) ),
	floor( sin( i_VSInput.o_vertexPosition_local.z ) / cos( i_VSInput.o_vertexPosition_local.z ) ),
	1.0 ) * i_VSInput.o_vertexPosition_local;

This produces the following output:

2. Creating an effect that the object can move through: The fragment shader code is similar to the above, except that instead of the local vertex position we use the world position of the vertex.

3. Creating a grow-and-shrink effect: Until now we were only modifying the fragment shader, but to create a grow-and-shrink effect we need to change the positions of the vertices. We do this by creating a scaling matrix, modifying its scale value based on time, and multiplying it with the local position. The rest of the transformations remain the same.

float s = ( sin( g_elapsedSecondCount_simulationTime ) + 0.5 ) + 1;
float4x4 scale =
{
	s, 0, 0, 0,
	0, s, 0, 0,
	0, 0, s, 0,
	0, 0, 0, 1
};
// Transform the local vertex into world space
float4 vertexPosition_world;
float4 vertexPosition_local = float4( i_vertexPosition_local, 1.0 );
vertexPosition_local = mul( scale, vertexPosition_local );
vertexPosition_world = mul( g_transform_localToWorld, vertexPosition_local );

Even though this method works, it might not be the most efficient. So, instead of a matrix multiplication, we can use a scalar multiplication to get the same result.

float s = ( sin( g_elapsedSecondCount_simulationTime ) + 0.5 ) + 1;
// Transform the local vertex into world space
float4 vertexPosition_world;
float3 scaledLocalPosition = i_vertexPosition_local * s;
float4 vertexPosition_local = float4( scaledLocalPosition, 1.0 );
vertexPosition_world = mul( g_transform_localToWorld, vertexPosition_local );
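The claim that a uniform scaling matrix and a plain scalar multiply give the same result can be checked with a small C++ sketch (hypothetical row-major matrix types, not engine code):

```cpp
#include <array>

using Vec4 = std::array<float, 4>;
using Mat4 = std::array<Vec4, 4>;  // row-major

Vec4 Mul(const Mat4& m, const Vec4& v)
{
    Vec4 out{};
    for (int r = 0; r < 4; ++r)
        for (int c = 0; c < 4; ++c)
            out[r] += m[r][c] * v[c];
    return out;
}

// A uniform scale matrix with s on the diagonal (w stays 1) ...
Mat4 UniformScale(float s)
{
    return {{ { s, 0, 0, 0 },
              { 0, s, 0, 0 },
              { 0, 0, s, 0 },
              { 0, 0, 0, 1 } }};
}

// ... is equivalent to multiplying just the xyz components by the scalar.
Vec4 ScaleXYZ(const Vec4& v, float s)
{
    return { v[0]*s, v[1]*s, v[2]*s, v[3] };
}
```

The scalar version skips twelve useless multiply-adds against zeros, which is why it compiles to fewer instructions.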

4. Changing the effect on an object based on its proximity to the camera: To find the distance between the object and the camera, we take the length of the vector between the camera position and the world position of the vertex. I then lerp from red back to the current vertex color based on the distance, clamped against the far plane of the camera.

const float4 color = {1,0,0,1};
const float distance = length((g_CameraPositionInWorld - (vertexPosition_world).xyz));
output.o_vertexColor_projected = lerp(color, i_vertexColor_local, saturate(distance/100));
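For reference, the HLSL intrinsics used above behave like the following C++ equivalents (hypothetical stand-ins for illustration only):

```cpp
#include <algorithm>

// HLSL saturate: clamp to [0,1]
float Saturate(float x) { return std::clamp(x, 0.0f, 1.0f); }

// HLSL lerp: linear interpolation; returns a when t == 0 and b when t == 1
float Lerp(float a, float b, float t) { return a + t * (b - a); }
```

So with lerp(color, i_vertexColor_local, saturate(distance/100)), a fragment at the camera (distance 0) is fully red, and by distance 100 it has returned to its original vertex color.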

Real Time Rendering – Combining transforms.

For every object being rendered in the scene, we currently have three transformations: local space to world space, world space to camera space, and finally camera space to projected space (the 2D plane of the screen). All of these transformations are performed in the vertex shader. As you all know, the vertex shader is called for every vertex of the object. Even though modern GPUs are massively parallel, calculating these matrix products is still computationally intensive. Hence, instead of calculating them in the vertex shader, we calculate them in C++ and pass one single local-to-projected transform to the shader in a constant buffer, where we can then apply it and pass the result to the output. A constant buffer is one that does not change over a draw call. This reduces the number of instructions the vertex shader compiles to.



As you can see from the above slideshow, the instruction counts of the shaders before and after passing the local-to-projected transform differ considerably. If we are using some other transforms, such as world to camera or local to world, we can pass them in the constant buffers as well, since these transforms will not change throughout a frame or a draw call.

We calculate the local-to-projected transform by multiplying the local-to-world, world-to-camera, and camera-to-projected matrices together. The grouping in which we concatenate the matrices does not matter (matrix multiplication is associative), as long as the camera-to-projected transform, or any product containing it, stays on the left-hand side of the operation. This is because the camera-to-projected transform is not an affine transformation.

I am doing it the following way, since I want to save the local-to-camera transform in my per-draw-call constant buffer so that my specialty shaders from assignment 2 keep working; it is hard to recover the intermediate transforms from the local-to-projected transform alone, as that would require converting 2D projected space back into 3D space:

local to camera = world to camera * local to world;

local to projected = camera to projected * local to camera;
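Since matrix multiplication is associative but not commutative, the grouping above can vary while the left-to-right order must not. A minimal C++ sketch of the concatenation (hypothetical row-major matrix type, not the engine's cMatrix_transformation):

```cpp
#include <array>

using Mat4 = std::array<std::array<float, 4>, 4>;  // row-major

Mat4 Mul(const Mat4& a, const Mat4& b)
{
    Mat4 out{};
    for (int r = 0; r < 4; ++r)
        for (int c = 0; c < 4; ++c)
            for (int k = 0; k < 4; ++k)
                out[r][c] += a[r][k] * b[k][c];
    return out;
}

// Concatenate the three per-object transforms into one, keeping the
// (non-affine) camera-to-projected transform on the left:
// localToProjected = cameraToProjected * worldToCamera * localToWorld
Mat4 LocalToProjected(const Mat4& cameraToProjected,
                      const Mat4& worldToCamera,
                      const Mat4& localToWorld)
{
    // Either grouping gives the same matrix (associativity)
    return Mul(cameraToProjected, Mul(worldToCamera, localToWorld));
}
```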


Optimizing the render pipeline

The render pipeline refers to the order in which we submit data to render: we bind the shaders, draw the triangles, and finally swap the buffers. The more optimized the pipeline, the faster we can send data to the GPU, which results in higher frame rates and more responsive games. Currently, the engine binds an effect every time it draws an object (which is every draw call). This might not make any impact for small games like mine, but for any moderately sized game it starts to become a bottleneck.

So, the first thing we need to do is figure out a way of not binding effects for every draw call, i.e., we need to bind an effect once and then draw all the meshes that use that effect. We accomplish this by using render commands. Render commands are essentially unsigned integers that represent a rendering operation. There can be many types of render commands, such as one for drawing, one for transparency, etc. The render command we use in this assignment is for drawing an object.

I am representing a render command as an unsigned 64-bit integer, mainly because of the number of bits such a data type provides. As you will see below, we need as many bits as possible to effectively store the data for a particular command. When storing data in a render command, the order is important: the data with the highest priority is stored in the most significant bits (MSB) and the lowest priority in the least significant bits (LSB), so that when we sort, all similar render commands end up next to each other.

Since we need to bind an effect only once, it has the highest priority and hence goes in the MSB. We need to make sure that we have enough bits to store all the possible effects we could have in the game. I am using 7 bits for effects, which gives me 128 effects. The data I store for an effect is the index at which the effect is stored by its manager; I store meshes in the render command the same way. Since meshes are not that important for sorting order, I store them in the last 7 bits of my render command.

Also, when drawing objects, the objects that are closer to the camera should be drawn before objects that are farther away. So we calculate the z-depth from the camera to each object, scale it with respect to the number of bits we allocate for it, and store that value in the render command. This has the second highest priority after effects. My final render command structure looks like the following.

After storing a render command for each mesh, I use std::sort to sort everything in descending order. While iterating over the sorted commands, I bind the effect whenever it differs from the currently bound one, and then draw the mesh.
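The packing and sorting described above can be sketched in C++. The 7-bit effect and mesh fields match the post; the 16-bit depth field and the exact bit positions are assumptions for illustration:

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Pack a draw command: effect index in the most significant bits so that
// sorting groups all meshes sharing an effect together, then a scaled
// depth, then the mesh index in the least significant bits.
uint64_t MakeDrawCommand(uint8_t effectIndex,   // 7 bits: up to 128 effects
                         uint16_t scaledDepth,  // 16 bits of scaled z-depth (assumed width)
                         uint8_t meshIndex)     // 7 bits: up to 128 meshes
{
    return (static_cast<uint64_t>(effectIndex & 0x7F) << 57)
         | (static_cast<uint64_t>(scaledDepth) << 41)
         | static_cast<uint64_t>(meshIndex & 0x7F);
}

uint8_t GetEffectIndex(uint64_t command)
{
    return static_cast<uint8_t>((command >> 57) & 0x7F);
}
```

Because the effect occupies the topmost bits, sorting the integers (ascending here; descending works symmetrically) automatically clusters all draw commands that share an effect, with depth breaking ties within each cluster.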

Below you can see two videos captured from various angles to show how all meshes of one effect are drawn first based on the camera distance and then the others.

Below are the screenshots from graphics analyzer showing how one effect is bound first and then all the meshes are drawn before another effect is bound.

The orange rectangles highlight when effects are bound and green shows when meshes are drawn.


Optional Challenges:

The graphics analyzer highlights calls that set states which were already set in the previous frame and do not need to be set again, as shown below.

 ©John-Paul Ownby.

This happens because I am setting the vertex buffers on every draw call to DirectX. To optimize this, I now set these buffers only when I switch meshes. The output looks like this.