Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove copy from hsMatrix2SIMD in Metal renderer #1647

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

colincornaby
Copy link
Contributor

hsMatrix2SIMD used to do a transpose of the input hsMatrix - but shaders and code were rewritten to make this unnecessary. hsMatrix2SIMD still does a vestigial memory copy. This commit removes that memory copy.

This changes the meaning of hsMatrix2SIMD - so functions that need to make a copy now do so manually. Otherwise a direct cast is made. hsMatrix2SIMD has also been made constexpr.

This should result in a small (~1%) performance improvement in the Metal renderer main loop.

Copy link
Member

@dpogue dpogue left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm wondering if it makes sense to return a matrix_float4x4& rather than a pointer, since we always seem to be immediately dereferencing it

@colincornaby colincornaby force-pushed the hsMatrix2SIMD-without-implicit-copy branch from b11d7cf to 14d89ad Compare January 1, 2025 22:51
@colincornaby
Copy link
Contributor Author

I've rebased this branch to include the Linux runner changes.

@dpogue - I'm still considering your question. No real thoughts yet - I think the underlying fMap in the matrix is a pointer type which is why it was a pointer here. Shame the matrix type includes extra data beyond the matrix, but using the fMap as the raw underlying matrix data is a small inconvenience.

@colincornaby colincornaby force-pushed the hsMatrix2SIMD-without-implicit-copy branch from 14d89ad to 618ccc2 Compare January 12, 2025 20:51
@colincornaby
Copy link
Contributor Author

I've edited this commit to use reinterpret_cast and to return a reference instead of a pointer.

hsMatrix2SIMD used to do a transpose of the input hsMatrix - but shaders and code were rewritten to make this unnecessary. hsMatrix2SIMD still does a vestigial memory copy. This commit removes that memory copy.

This changes the meaning of hsMatrix2SIMD - so functions that need to make a copy now do so manually. Otherwise a direct cast is made. hsMatrix2SIMD has also been made constexpr.

This should result in a small (~1%) performance improvement in the Metal renderer main loop.
@colincornaby colincornaby force-pushed the hsMatrix2SIMD-without-implicit-copy branch from 618ccc2 to 0aa4708 Compare January 12, 2025 21:00
matrix_float4x4* hsMatrix2SIMD(const hsMatrix44& src, matrix_float4x4* dst);
inline const matrix_float4x4& hsMatrix2SIMD(const hsMatrix44& src)
{
//reinterperate_cast not allowed in constexpr
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
//reinterperate_cast not allowed in constexpr
// reinterpret_cast not allowed in constexpr

inline const matrix_float4x4& hsMatrix2SIMD(const hsMatrix44& src)
{
//reinterperate_cast not allowed in constexpr
return *reinterpret_cast<const simd_float4x4 *>(src.fMap);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
return *reinterpret_cast<const simd_float4x4 *>(src.fMap);
return *reinterpret_cast<const simd_float4x4*>(src.fMap);

matrix_float4x4 mat;
hsMatrix2SIMD(drawable->GetPaletteMatrix(span.fBaseMatrix + 1), &mat);
fDevice.CurrentRenderCommandEncoder()->setVertexBytes(&mat, sizeof(matrix_float4x4), VertexShaderArgumentBlendMatrix1);
const matrix_float4x4 *mat = &hsMatrix2SIMD(drawable->GetPaletteMatrix(span.fBaseMatrix + 1));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
const matrix_float4x4 *mat = &hsMatrix2SIMD(drawable->GetPaletteMatrix(span.fBaseMatrix + 1));
const matrix_float4x4* mat = &hsMatrix2SIMD(drawable->GetPaletteMatrix(span.fBaseMatrix + 1));

@@ -1315,8 +1314,7 @@ void plMetalPipeline::IRenderProjection(const plRenderPrimFunc& render, plLightI
fCurrentRenderPassUniforms->fogColor = {0.f, 0.f, 0.f};
fCurrentRenderPassUniforms->diffuseCol = {1.f, 1.f, 1.f, 1.f};

matrix_float4x4 tXfm;
hsMatrix2SIMD(proj->GetTransform(), &tXfm);
const matrix_float4x4 &tXfm = hsMatrix2SIMD(proj->GetTransform());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
const matrix_float4x4 &tXfm = hsMatrix2SIMD(proj->GetTransform());
const matrix_float4x4& tXfm = hsMatrix2SIMD(proj->GetTransform());

if (weights[j]) {
float weight = weights[j];
if (weight) {
const simd_float4x4 simdMatrix = hsMatrix2SIMD(matrixPalette[indices & 0xFF]);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
const simd_float4x4 simdMatrix = hsMatrix2SIMD(matrixPalette[indices & 0xFF]);
const simd_float4x4& simdMatrix = hsMatrix2SIMD(matrixPalette[indices & 0xFF]);

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants