Posted: Thu Oct 13, 2016 9:47
Well, the performance of the 4col sprite drawers only really matters if a sprite is really close up to the camera. Otherwise it doesn't cover enough of the screen to make a difference. The wall and span drawers are doing most of the rendering in the majority of cases.
For more complex maps, e.g. kdizd, the performance problems are often due to the BSP part of the renderer not feeding the drawers with stuff to draw fast enough. GZDoom has that problem too, where it is often CPU bound.
Posted: Thu Oct 13, 2016 14:10
I've a feeling you're gonna hate me soon.
r_columnmethod == false == 1col method. >_>
Posted: Thu Oct 13, 2016 14:20
Lol at your lineup of former humans.
I must admit I'm pretty surprised to see the 1col drawer not only match, but even clearly beat the optimized 4col version.
Bit hard to argue against evidence like that, although I'll still insist that, theoretically, 8-byte or 16-byte reads/writes should be faster. Cache-lines are 16 bytes on modern Intel CPUs, which 4col matches perfectly. Maybe the setup costs, and rendering to a temp buffer first, are what make the 4col version lose out.
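To make the trade-off above concrete, here is a rough C++ sketch of the two approaches for a 32-bit frame buffer. It is not the actual ZDoom drawer code: the `Column` struct and the names `draw_1col` and `draw_4col` are made up for illustration, and the temp buffer layout is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical column description: a vertical strip of 32-bit texels.
struct Column {
    const uint32_t* texels; // source texels, one per row
    int count;              // number of rows to draw
};

// 1col method: write each pixel straight to the frame buffer,
// stepping down one row (pitch pixels) per iteration.
void draw_1col(uint32_t* dest, int pitch, const Column& col) {
    assert(col.count >= 0);
    for (int y = 0; y < col.count; ++y)
        dest[y * pitch] = col.texels[y];
}

// 4col method: gather four adjacent columns into a temp buffer first
// (the setup cost), then copy each row out as one 16-byte write.
void draw_4col(uint32_t* dest, int pitch, const Column cols[4]) {
    const int count = cols[0].count;
    assert(count >= 0);
    std::vector<uint32_t> temp(static_cast<std::size_t>(count) * 4);
    for (int c = 0; c < 4; ++c) {
        assert(cols[c].count == count); // all four columns must share a height
        for (int y = 0; y < count; ++y)
            temp[y * 4 + c] = cols[c].texels[y];
    }
    for (int y = 0; y < count; ++y)
        std::memcpy(dest + y * pitch, &temp[y * 4], 4 * sizeof(uint32_t));
}
```

Both functions produce the same pixels. The 4col path does fewer, wider stores to the frame buffer, but it touches every texel twice and needs four columns of equal height before it can start.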
Posted: Thu Oct 13, 2016 14:24
I won't argue against that. And if I had to guess, you're probably right about the setup part. Still, keep in mind that regardless of the drawer used you're still threading them out. The results themselves may well be different on an AMD CPU, due to the lack of the hyperthreading that Intel has and the different ways the cache and memory calls are used.
EDIT: Just tried on an AMD CPU. The results are 61 FPS with 4col and 65 FPS with 1col, doing a similar thing: just spawning zombiemen until they somewhat cover the screen.
Posted: Thu Oct 13, 2016 14:33
I don't know enough about CPU architecture myself to say for sure. It might also be because I told LLVM to partially unroll the inner drawer loop, giving the 1col version the ability to grab several rows in one go and do the math with SSE registers. I suck at this, so I'm betting LLVM is better at it than I am.
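For anyone wondering what "partially unrolled" means here, this is a hand-written C++ sketch of the idea, assuming a 4x unroll and a simple light-scaling step. The real drawers are generated through LLVM, so the `shade`, `draw_simple` and `draw_unrolled4` functions below are hypothetical stand-ins, not the actual code.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical shade step: scale each 8-bit colour channel by light (0..256),
// leaving the alpha byte untouched.
inline uint32_t shade(uint32_t texel, uint32_t light) {
    const uint32_t rb = (((texel & 0x00ff00ffu) * light) >> 8) & 0x00ff00ffu;
    const uint32_t g  = (((texel & 0x0000ff00u) * light) >> 8) & 0x0000ff00u;
    return (texel & 0xff000000u) | rb | g;
}

// Plain 1col inner loop: one row per iteration.
void draw_simple(uint32_t* dest, int pitch, const uint32_t* texels,
                 int count, uint32_t light) {
    assert(count >= 0);
    for (int y = 0; y < count; ++y)
        dest[y * pitch] = shade(texels[y], light);
}

// Same loop unrolled by four: the four shade() calls are independent,
// so a compiler is free to do that math together in SSE registers.
void draw_unrolled4(uint32_t* dest, int pitch, const uint32_t* texels,
                    int count, uint32_t light) {
    assert(count >= 0);
    int y = 0;
    for (; y + 4 <= count; y += 4) {
        const uint32_t p0 = shade(texels[y + 0], light);
        const uint32_t p1 = shade(texels[y + 1], light);
        const uint32_t p2 = shade(texels[y + 2], light);
        const uint32_t p3 = shade(texels[y + 3], light);
        dest[(y + 0) * pitch] = p0;
        dest[(y + 1) * pitch] = p1;
        dest[(y + 2) * pitch] = p2;
        dest[(y + 3) * pitch] = p3;
    }
    for (; y < count; ++y) // leftover rows when count is not a multiple of 4
        dest[y * pitch] = shade(texels[y], light);
}
```

The stores still go out one row at a time because the rows are a full pitch apart in memory. Only the per-pixel math gets batched, with no temp buffer and no need to pair up columns.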
Posted: Thu Oct 13, 2016 14:36
The multiple 1col draws are probably it - because I just tried it in regular ZDoom (palette mode) and the frame rate stayed exactly the same, even with zombiemen covering the screen.
Posted: Sun Oct 16, 2016 23:09
I think I'm going to default vid_used3d to "true" (for Windows only, if this ever gets implemented on Mac/Linux) until the live OpenGL switcher actually gets implemented. OpenGL is causing problems on enough graphics cards that having the Direct3D backend as the default seems saner.
Posted: Sun Oct 16, 2016 23:49
I agree. The Direct3D target is a lot more reliable on old low-end hardware.