
Scriptable Render Pipeline - Custom Shaders


https://catlikecoding.com/unity/tutorials/scriptable-render-pipeline/custom-shaders/

1 custom unlit shader

although we have used the default unlit shader to test our pipeline, taking full advantage of a nontrivial custom pipeline requires the creation of custom shaders to work with it. so we are going to create a shader of our own, replacing unity's default unlit shader.

1.1 creating a shader

a shader asset can be created via one of the options in the Assets/Create/Shader menu. the unlit shader is most appropriate, but we are going to start fresh by deleting all the default code from the created shader file. name the asset Unlit.

the fundamentals of shader files are explained in rendering 2, shader fundamentals. give it a read if you are unfamiliar with writing shaders, so you know the basics. the minimum to get a working shader is to define a Shader block with a Properties block plus a SubShader block with a Pass block inside it. unity will turn that into a default white unlit shader. after the Shader keyword comes a string that will be used in the shader dropdown menu for materials. we will use My Pipeline/Unlit for it.

Shader "My Pipeline/Unlit" {
	
	Properties {}
	
	SubShader {
		
		Pass {}
	}
}

adjust the unlit opaque material so it uses our new shader, which will turn it white, if it wasn't already.

1.2 hlsl

to write our own shader, we have to put a program inside its Pass block. unity supports either cg or hlsl programs. while cg is used in the default shader and also in rendering 2, shader fundamentals, unity's new rendering pipeline shaders use hlsl, so we will use that for our pipeline too. that means that we have to put all our code in between an HLSLPROGRAM and an ENDHLSL statement.

		Pass {
			HLSLPROGRAM
			
			ENDHLSL
		}

what is the difference between cg and hlsl programs?

in practice, unity uses virtually the same syntax for both and takes care of converting to the appropriate shader code per build target. the biggest difference is that cg programs include some code by default, while hlsl programs do nothing implicitly, requiring us to explicitly include everything that we need. that is fine, because the old cg include files are weighed down by old and obsolete code. we will rely on the newer hlsl include files instead.

at minimum, a unity shader requires a vertex program and a fragment program function, each defined with a pragma compiler directive. we will use UnlitPassVertex for the vertex function and UnlitPassFragment for the other. but we will not put the code for these functions in the shader file directly. instead, we will put the hlsl code in a separate include file, which we will also name Unlit, but with the hlsl extension. put it in the same folder as Unlit.shader and then include it in the hlsl program, after the pragma directives.

			HLSLPROGRAM
			
			#pragma vertex UnlitPassVertex
			#pragma fragment UnlitPassFragment
			
			#include "Unlit.hlsl"
			
			ENDHLSL

unfortunately, unity does not have a convenient menu item for the creation of an hlsl include file asset. you will have to create it yourself, for example by duplicating the Unlit.shader file, changing its file extension to hlsl, and removing the shader code from it.

inside the include file, begin with an include guard to prevent duplicate code in case the file gets included more than once. while that should never happen, it is good practice to always do this for every include file.

#ifndef MYRP_UNLIT_INCLUDED
#define MYRP_UNLIT_INCLUDED

#endif // MYRP_UNLIT_INCLUDED

at minimum, we have to know the vertex position in the vertex program, which has to output a homogeneous clip-space position. so we will define an input and an output structure for the vertex program, both with a single float4 position.

#ifndef MYRP_UNLIT_INCLUDED
#define MYRP_UNLIT_INCLUDED

struct VertexInput {
	float4 pos : POSITION;
};

struct VertexOutput {
	float4 clipPos : SV_POSITION;
};

#endif // MYRP_UNLIT_INCLUDED

next, we will define the vertex program function, UnlitPassVertex. for now, we will directly use the object-space vertex position as the clip-space position. that is incorrect, but it is the quickest way to get a compiling shader. we will add the correct space conversion later.

struct VertexOutput {
	float4 clipPos : SV_POSITION;
};

VertexOutput UnlitPassVertex (VertexInput input) {
	VertexOutput output;
	output.clipPos = input.pos;
	return output;
}

#endif // MYRP_UNLIT_INCLUDED

we keep the default white color for now, so our fragment program function can simply return 1 as a float4. it receives the interpolated vertex output as its input, so add that as a parameter, even though we do not use it yet.

VertexOutput UnlitPassVertex (VertexInput input) {
	VertexOutput output;
	output.clipPos = input.pos;
	return output;
}

float4 UnlitPassFragment (VertexOutput input) : SV_TARGET {
	return 1;
}

#endif // MYRP_UNLIT_INCLUDED

should we use half or float?

most mobile gpus support both precision types, with half being more efficient. so if you are optimizing for mobiles, it makes sense to use half as much as possible. the rule is to use float for positions and texture coordinates only, and half for everything else, provided that the results are acceptable.

when not targeting mobile platforms, precision is not an issue, because the gpu always uses float, even if we write half. i will consistently use float in this tutorial.

there is also the fixed type, but it is only really supported by old hardware that you would not target with a modern app. it is usually equivalent to half.
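
as a hedged illustration of that rule, a mobile-oriented shader might declare something like the following sketch. the ExampleVertexOutput and ExamplePassFragment names are hypothetical, not part of our Unlit shader.

// hypothetical sketch: float for positions and texture coordinates,
// half for everything else.
struct ExampleVertexOutput {
	float4 clipPos : SV_POSITION; // positions stay float
	float2 uv : TEXCOORD0;        // texture coordinates stay float
	half3 normal : TEXCOORD1;     // directions are fine as half
};

half4 ExamplePassFragment (ExampleVertexOutput input) : SV_TARGET {
	half3 color = half3(0.5, 0.5, 0.5); // color math can be done in half
	return half4(color, 1.0);
}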

1.3 transformation matrices

at this point we have a compiling shader, although it does not produce sensible results yet. the next step is to convert the vertex position to the correct space. if we had a model-view-projection matrix then we could convert directly from object space to clip space, but unity does not create such a matrix for us. it does make the model matrix available, which we can use to convert from object space to world space. unity expects our shader to have a float4x4 unity_ObjectToWorld variable to store the matrix. as we are working with hlsl, we have to define that variable ourselves. then use it to convert to world space in the vertex function, and use that for its output.

float4x4 unity_ObjectToWorld;

struct VertexInput {
	float4 pos : POSITION;
};

struct VertexOutput {
	float4 clipPos : SV_POSITION;
};

VertexOutput UnlitPassVertex (VertexInput input) {
	VertexOutput output;
	float4 worldPos = mul(unity_ObjectToWorld, input.pos);
	output.clipPos = worldPos;
	return output;
}

next, we need to convert from world space to clip space. that is done with a view-projection matrix, which unity makes available via a float4x4 unity_MatrixVP variable. add it and then complete the conversion.

float4x4 unity_MatrixVP;
float4x4 unity_ObjectToWorld;

…

VertexOutput UnlitPassVertex (VertexInput input) {
	VertexOutput output;
	float4 worldPos = mul(unity_ObjectToWorld, input.pos);
	output.clipPos = mul(unity_MatrixVP, worldPos);
	return output;
}

our shader now works correctly. all objects that use the unlit material are once again visible, fully white. but our conversion is not as efficient as it could be, because it is performing a full matrix multiplication with a 4D position vector. the fourth component of the position is always 1. by making that explicit we make it possible for the compiler to optimize the computation.

float4 worldPos = mul(unity_ObjectToWorld, float4(input.pos.xyz, 1.0));

1.4 constant buffers

unity does not provide us with a model-view-projection matrix, because that way a matrix multiplication of the M and VP matrices can be avoided. besides that, the VP matrix can be reused for everything that gets drawn with the same camera during a frame. unity's shaders take advantage of that fact and put the matrices in different constant buffers. although we define them as variables, their data remains constant during the drawing of a single shape, and often longer than that. the VP matrix gets put in a per-frame buffer, while the M matrix gets put in a per-draw buffer.

while it is not strictly required to put shader variables in constant buffers, doing so makes it possible for all data in the same buffer to be changed more efficiently, at least when it is supported by the graphics api. opengl, for example, does not support this.

to be as efficient as possible, we will also make use of constant buffers. unity puts the VP matrix in a UnityPerFrame buffer and the M matrix in a UnityPerDraw buffer. there is more data that gets put in these buffers, but we do not need it yet, so there is no need to include it. a constant buffer is defined like a struct, except with the cbuffer keyword, and the variables remain accessible as before.

cbuffer UnityPerFrame {
	float4x4 unity_MatrixVP;
};

cbuffer UnityPerDraw {
	float4x4 unity_ObjectToWorld;
};

1.5 core library

because constant buffers do not benefit all platforms, unity's shaders rely on macros to only use them when needed. the CBUFFER_START macro with a name parameter is used instead of directly writing cbuffer, and an accompanying CBUFFER_END macro replaces the end of the buffer. let us use that approach as well.

CBUFFER_START(UnityPerFrame)
	float4x4 unity_MatrixVP;
CBUFFER_END

CBUFFER_START(UnityPerDraw)
	float4x4 unity_ObjectToWorld;
CBUFFER_END

that results in a compiler error, because those two macros are not defined. rather than figure out when it is appropriate to use constant buffers and define the macros ourselves, we will make use of unity's core library for render pipelines. it can be added to our project via the package manager window. switch to the all packages list and enable show preview packages under advanced, then select render-pipelines.core and install it. i am using version 4.6.0-preview, the highest version that works in unity 2018.3.

now we can include the common library functionality, which we can access via Packages/com.unity.render-pipelines.core/ShaderLibrary/Common.hlsl. it defines multiple useful functions and macros, along with the constant buffer macros, so include it before using them.

#include "Packages/com.unity.render-pipelines.core/ShaderLibrary/Common.hlsl"

CBUFFER_START(UnityPerFrame)
	float4x4 unity_MatrixVP;
CBUFFER_END

1.6 compilation target level

our shader works again, at least for most platforms. after including the library, our shader fails to compile for opengl es 2. that happens because, by default, unity uses a shader compiler for opengl es 2 that does not work with the core library. we can fix that by adding #pragma prefer_hlslcc gles to our shader, which is what unity does for its shaders in the lightweight render pipeline. however, instead of doing that, we simply will not support opengl es 2 at all, as it is only relevant when targeting old mobile devices. we do that by using the #pragma target directive to target shader level 3.5, instead of the default level, which is 2.5.

			#pragma target 3.5
			
			#pragma vertex UnlitPassVertex
			#pragma fragment UnlitPassFragment

1.7 folder structure

note that all the hlsl include files of the core library are located in a ShaderLibrary folder. let us do that too, so put Unlit.hlsl in a new ShaderLibrary folder inside MyPipeline. put the shader in a separate Shaders folder too.

to keep our shader intact while still relying on relative include paths, we will have to change our include statement from Unlit.hlsl to ../ShaderLibrary/Unlit.hlsl.

#include "../ShaderLibrary/Unlit.hlsl"

2 dynamic batching

now that we have a minimal custom shader, we can use it to further investigate how our pipeline renders things. a big question is how efficiently it can render. we will test that by filling the scene with a bunch of spheres that use our unlit material. you could use thousands, but a few dozen also gets the message across. they can have different transformations, but keep their scales uniform, meaning that each scale's X, Y, and Z components are always equal.
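
if you do not want to place and adjust them all by hand, a throwaway script along these lines could fill the scene. it is only a hypothetical helper, not part of the pipeline; the SceneFiller name, the count, and the prefab field are assumptions, with the prefab expected to be a sphere that uses our unlit material.

using UnityEngine;

// hypothetical helper: spawns prefab instances with random positions
// and rotations, keeping the scales uniform.
public class SceneFiller : MonoBehaviour {

	public GameObject prefab; // a sphere that uses the unlit material
	public int count = 50;

	void Awake () {
		for (int i = 0; i < count; i++) {
			Transform t = Instantiate(prefab).transform;
			t.localPosition = Random.insideUnitSphere * 10f;
			t.localRotation = Random.rotation;
			// uniform scale: equal X, Y, and Z components
			t.localScale = Vector3.one * Random.Range(0.5f, 1.5f);
		}
	}
}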

when investigating how the scene is drawn via the frame debugger, you will notice that every sphere requires its own separate draw call. that is not very efficient, as each draw call introduces overhead as the cpu and gpu need to communicate.

ideally, multiple spheres get drawn together with a single call. while that is possible, it currently does not happen. the frame debugger gives us a hint about it when you select one of the draw calls.

2.1 enabling batching

the frame debugger tells us that dynamic batching is not used, because it is either turned off or because depth sorting interferes with it. if you check the player settings, then you will see that indeed the dynamic batching option is disabled. however, enabling it has no effect. that is because the player setting applies to unity's default pipeline, not our custom one.

to enable dynamic batching for our pipeline, we have to indicate that it is allowed when drawing in MyPipeline.Render. the draw settings contain a flags field that we have to set to DrawRendererFlags.EnableDynamicBatching.

		var drawSettings = new DrawRendererSettings(
			camera, new ShaderPassName("SRPDefaultUnlit")
		);
		drawSettings.flags = DrawRendererFlags.EnableDynamicBatching;
		drawSettings.sorting.flags = SortFlags.CommonOpaque;

after that change we still do not get dynamic batching, but the reason has changed. dynamic batching means that unity merges objects together in a single mesh before they are drawn. that requires cpu time each frame, and to keep that in check it is limited to small meshes containing at most 300 vertices.

the sphere mesh is too big, but cubes are small and will work. so adjust all objects to use the cube mesh instead. you can select them all and adjust their mesh filter in one go.

2.2 colors

dynamic batching works for small meshes that are drawn with the same material, but when multiple materials are involved, things get more complicated. to illustrate this, we will make it possible to change the color of our unlit material. add a color property to its Properties block, named _Color, with Color as its label, using white as the default.

	Properties {
		_Color ("Color", Color) = (1, 1, 1, 1)
	}

now we can adjust the color of our material, but it does not affect what gets drawn yet. add a float4 _Color variable to our include file and return that instead of the fixed value in UnlitPassFragment. the color is defined per material, so it can be put in a constant buffer that only needs to change when materials are switched. we will name the buffer UnityPerMaterial, just like unity does.

CBUFFER_START(UnityPerDraw)
	float4x4 unity_ObjectToWorld;
CBUFFER_END

CBUFFER_START(UnityPerMaterial)
	float4 _Color;
CBUFFER_END

struct VertexInput {
	float4 pos : POSITION;
};

…

float4 UnlitPassFragment (VertexOutput input) : SV_TARGET {
	return _Color;
}

duplicate our material and set both to use different colors, so we can distinguish them. select a few objects and have them use the new material, so you end up with a mix.

dynamic batching still happens, but we end up with multiple batches. there will be at least one batch per material, because each requires different per-material data. but there will often be more batches, because unity prefers to group objects spatially to reduce overdraw.

2.3 optional batching

dynamic batching can be a benefit, but it can also end up not making much of a difference, or even slow things down. if your scene does not contain lots of small meshes that share the same material, it might make sense to disable dynamic batching, so unity does not have to figure out whether to use it each frame. we cannot rely on the player settings, so instead we add a toggle configuration option to MyPipelineAsset, which lets us configure dynamic batching via our pipeline asset in the editor.

	[SerializeField]
	bool dynamicBatching;

when the MyPipeline instance is created, we have to tell it whether to use dynamic batching or not. we will provide this information as an argument when invoking its constructor.

	protected override IRenderPipeline InternalCreatePipeline () {
		return new MyPipeline(dynamicBatching);
	}

to make that work, we can no longer rely on the default constructor of MyPipeline. give it a public constructor method, with a boolean parameter to control dynamic batching. we will set up the draw flags once in the constructor and keep track of them in a field.

	DrawRendererFlags drawFlags;

	public MyPipeline (bool dynamicBatching) {
		if (dynamicBatching) {
			drawFlags = DrawRendererFlags.EnableDynamicBatching;
		}
	}

copy the flags to the draw settings in Render.

drawSettings.flags = drawFlags;

note that when we toggle the dynamic batching option of our asset in the editor, the batching behavior of unity immediately changes. each time we adjust the asset a new pipeline instance gets created.

3 gpu instancing

dynamic batching is not the only way in which we can reduce the number of draw calls per frame. another approach is to use gpu instancing. in the case of instancing, the cpu tells the gpu to draw a specific mesh-material combination more than once via a single draw call. that makes it possible to group objects that use the same mesh and material, without having to construct a new mesh. that also removes the limit on the mesh size.
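
to get a feel for that idea outside of our pipeline, here is a hedged sketch that uses unity's built-in Graphics.DrawMeshInstanced API, which submits one mesh-material combination together with an array of transformation matrices in a single call. the InstancingDemo name and the field values are assumptions, and the material needs a shader that supports instancing, which our unlit shader does not yet.

using UnityEngine;

// illustrative sketch, separate from the pipeline: a single instanced
// call draws the same mesh-material combination many times.
public class InstancingDemo : MonoBehaviour {

	public Mesh mesh;
	public Material material; // needs a shader that supports instancing

	Matrix4x4[] matrices = new Matrix4x4[100];

	void Awake () {
		for (int i = 0; i < matrices.Length; i++) {
			matrices[i] = Matrix4x4.TRS(
				Random.insideUnitSphere * 10f, Random.rotation, Vector3.one
			);
		}
	}

	void Update () {
		// one draw call for all 100 instances
		Graphics.DrawMeshInstanced(mesh, 0, material, matrices);
	}
}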

3.1 optional instancing

gpu instancing is enabled by default, but we overrode that with our custom draw flags. let us make gpu instancing optional too, which makes it easy to compare the results with and without it. add another toggle to MyPipelineAsset and pass it to the constructor invocation.

	[SerializeField]
	bool instancing;
	
	protected override IRenderPipeline InternalCreatePipeline () {
		return new MyPipeline(dynamicBatching, instancing);
	}

in the MyPipeline constructor method, also set the flags for instancing, after doing so for dynamic batching. in this case the flags value is DrawRendererFlags.EnableInstancing, and we boolean-OR it into the flags, so both dynamic batching and instancing can be enabled at the same time. when they are both enabled, unity prefers instancing over batching.

	public MyPipeline (bool dynamicBatching, bool instancing) {
		if (dynamicBatching) {
			drawFlags = DrawRendererFlags.EnableDynamicBatching;
		}
		if (instancing) {
			drawFlags |= DrawRendererFlags.EnableInstancing;
		}
	}