Speed regression very large since 1.9.1 on AMD

grahf
Posts: 5
Joined: Thu Mar 23, 2017 23:23

Post by grahf » Thu Mar 23, 2017 23:37

So lately on large maps I've been having some big performance problems where I dip to 30fps or below. It's unusual as I don't remember this happening very often in the past. So today I did some tests of different gzDoom versions to see where the problems began.

My system: i5 3570k + R9 290 + 16gb RAM + Windows 7 64-bit

I tested only 32-bit builds, since I read somewhere that they can be slightly faster. The test is loading Water Spirit, entering the first level on UV and checking FPS at the starting position.

2.4.0 = 43fps
2.3.0 = 44fps
2.2.0 = 47fps
1.9.1 = 58fps

I tested with very basic settings, 1080p resolution, fake contrast off (this seems to eat a LOT of FPS, surprisingly), speed rendering, no fancy stuff like bloom/SSAO/Tonemap, no anisotropic filtering or multisampling. Also I tested without brightmaps or lights.pk3 in the folder.

I tested with Vsync off. I did notice some strange behaviour: when I turned around at the starting position, 2.2.0 through 2.4.0 hit 200fps, while 1.9.1 hit 1300fps. I tested some more, and it seems like there is some sort of 200fps cap in 2.2.0+?

Finally, I read something about the gl_renderbuffers console command and about starting gzdoom with -glversion 2, but neither made any difference on 2.4.0.

I realize AMD has some major deficiencies with gzdoom but it's disappointing that each version keeps getting slower, even on a fairly powerful system. Any ideas?

Graf Zahl
GZDoom Developer
Posts: 7148
Joined: Wed Jul 20, 2005 9:48
Location: Germany

Post by Graf Zahl » Fri Mar 24, 2017 0:04

1.9 uses completely different render code than the 2.x versions. On my Geforce the 2.0 versions are consistently faster, though.
About the 200 fps: yes, there is indeed a speed cap; this part is perfectly normal.

My guess is that the time is lost somewhere in the postprocessing pipeline, which is not completely inactive even if gl_renderbuffers is off.
I need some more detailed info about where the time is being lost, so please do the following:

First, get GZDoom 2.1.1, which is the 2.x companion version to 1.9.1 from here: https://gzdoom.drdteam.org/archive/bin/
Second, get both 32 and 64 bit builds for at least one of those versions. I have seen systems where this makes a difference.

Now bind a key to the 'bench' console command (for example, type 'bind b bench' in the console to use the 'b' key).
Start each version with the level and press the 'b' key.
This will write a benchmarks.txt file with some timing measurements of the renderer.
When you have done this for all versions, post the results here.
BTW, I get 55 fps on this map with a Geforce 550Ti with postprocessing on and 62 fps with postprocessing off, in 1920x1080 on a Core i7-3770, 3.4 GHz.

What I do not understand is how the fake contrast can have such an effect; it's just a different light level that's being set, nothing more. Also, what light mode are you using?

Post by grahf » Fri Mar 24, 2017 0:37

Thanks for the fast reply. This will be a long post because of the large files bench spits out, sorry...

2.1.1 64-bit
Map map01: "Aquarius",
x = -288.0000, y = -2312.0000, z = -199.0000, angle = 45.0000, pitch = 0.0000
Walls: 8742 (0 splits, 17 t-splits, 17432 vertices)
Flats: 975 (11299 primitives, 57742 vertices)
Sprites: 357, Decals=0, Portals: 1
W: Render=5.095, Setup=1.815, Clip=2.318
F: Render=3.732, Setup=0.080
S: Render=1.260, Setup=0.539
All=20.835, Render=10.914, Setup=8.549, BSP = 1.002, Portal=0.101, Drawcalls=7.603, Finish=1.238
DLight - Walls: 0 processed, 0 rendered - Flats: 0 processed, 0 rendered
Missing textures: 508 upper, 507 lower, 3.038 ms
46 fps

2.1.1 32-bit
Map map01: "Aquarius",
x = -287.9158, y = -2311.9158, z = -199.0000, angle = 45.0000, pitch = 0.0000
Walls: 8746 (0 splits, 17 t-splits, 17436 vertices)
Flats: 975 (11299 primitives, 57742 vertices)
Sprites: 357, Decals=0, Portals: 1
W: Render=5.506, Setup=1.902, Clip=2.072
F: Render=3.633, Setup=0.090
S: Render=1.330, Setup=0.385
All=20.921, Render=11.386, Setup=8.079, BSP = 1.137, Portal=0.148, Drawcalls=7.799, Finish=1.318
DLight - Walls: 0 processed, 0 rendered - Flats: 0 processed, 0 rendered
Missing textures: 508 upper, 507 lower, 2.719 ms
46 fps

So I don't see any big difference between 32 and 64 bit...

2.4.0 32-bit
Map map01: "Aquarius",
x = -288.0000, y = -2312.0000, z = -199.0000, angle = 45.0000, pitch = 0.0000
Walls: 8788 (0 splits, 17 t-splits, 17536 vertices)
Flats: 975 (11299 primitives, 57742 vertices)
Sprites: 362, Decals=0, Portals: 1
W: Render=6.391, Setup=2.321, Clip=2.391
F: Render=3.537, Setup=0.098
S: Render=1.342, Setup=0.509
All=23.647, Render=12.233, Setup=9.187, BSP = 1.127, Portal=0.135, Drawcalls=8.321, Finish=1.844
DLight - Walls: 0 processed, 0 rendered - Flats: 0 processed, 0 rendered
Missing textures: 508 upper, 507 lower, 2.986 ms
41 fps

2.3.0 32-bit
Map map01: "Aquarius",
x = -288.0000, y = -2312.0000, z = -199.0000, angle = 45.0000, pitch = 0.0000
Walls: 8742 (0 splits, 17 t-splits, 17432 vertices)
Flats: 975 (11299 primitives, 57742 vertices)
Sprites: 362, Decals=0, Portals: 1
W: Render=4.894, Setup=2.340, Clip=2.408
F: Render=4.004, Setup=0.118
S: Render=1.271, Setup=0.501
All=23.324, Render=11.131, Setup=9.639, BSP = 1.105, Portal=0.131, Drawcalls=7.687, Finish=2.286
DLight - Walls: 0 processed, 0 rendered - Flats: 0 processed, 0 rendered
Missing textures: 508 upper, 507 lower, 3.404 ms
43 fps

2.2.0 32-bit
Map map01: "Aquarius",
x = -288.0000, y = -2312.0000, z = -199.0000, angle = 45.0000, pitch = 0.5164
Walls: 8745 (0 splits, 17 t-splits, 17440 vertices)
Flats: 975 (11299 primitives, 57742 vertices)
Sprites: 357, Decals=0, Portals: 1
W: Render=3.417, Setup=2.170, Clip=2.421
F: Render=3.358, Setup=0.090
S: Render=1.123, Setup=0.444
All=22.437, Render=8.815, Setup=8.922, BSP = 1.106, Portal=0.149, Drawcalls=6.034, Finish=4.460
DLight - Walls: 0 processed, 0 rendered - Flats: 0 processed, 0 rendered
Missing textures: 508 upper, 507 lower, 2.930 ms
43 fps

1.9.1 32-bit
Map map01: "Aquarius",
x = -288.0000, y = -2312.0000, z = -199.0000, angle = 45.0000, pitch = 0.0000
Walls: 8742 (0 splits, 17 t-splits, 19140 vertices)
Flats: 975 (11299 primitives, 57742 vertices)
Sprites: 357, Decals=0, Portals: 1
W: Render=1.528, Split = 0.000, Setup=1.824, Clip=1.977
F: Render=3.101, Setup=0.095
S: Render=1.575, Setup=0.396
All=16.725, Render=7.940, Setup=7.789, BSP = 1.111, Portal=0.047, Finish=0.867
DLight - Walls: 0 processed, 0 rendered - Flats: 0 processed, 0 rendered
Missing textures: 508 upper, 507 lower, 2.609 ms
57 fps

Fake contrast has had a very large effect for me, although this map does not seem to be a good example. I'll try to find a better example of that and post bench results later tonight. I'm not sure what you mean by 'light mode'. Everything under OpenGL > lighting options is default. However, I am not loading brightmaps.pk3 or lights.pk3, so I think that means there are no dynamic lights?
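
For anyone who wants to compare these dumps without eyeballing them, here is a minimal Python sketch. It is not part of GZDoom, and the function names are made up for illustration. It parses the totals line of a dump and prints the per-field difference between 1.9.1 and 2.4.0. The three lines are copied from the output above, and the values are assumed to be milliseconds. Fields that one version does not report, like Drawcalls in 1.9.1, are skipped.

```python
import re

# Totals lines copied verbatim from the bench dumps in this post.
TOTALS = {
    "2.1.1 64-bit": "All=20.835, Render=10.914, Setup=8.549, BSP = 1.002, Portal=0.101, Drawcalls=7.603, Finish=1.238",
    "2.4.0 32-bit": "All=23.647, Render=12.233, Setup=9.187, BSP = 1.127, Portal=0.135, Drawcalls=8.321, Finish=1.844",
    "1.9.1 32-bit": "All=16.725, Render=7.940, Setup=7.789, BSP = 1.111, Portal=0.047, Finish=0.867",
}

def parse_totals(line):
    """Turn 'All=20.835, Render=10.914, ...' into {'All': 20.835, ...}.

    The pattern tolerates the spaces around '=' that the BSP field has.
    """
    return {key: float(value) for key, value in re.findall(r"(\w+)\s*=\s*([0-9.]+)", line)}

def delta_vs(baseline, other):
    """Per-field difference (other - baseline) for fields present in both."""
    return {key: round(other[key] - baseline[key], 3) for key in baseline if key in other}

parsed = {name: parse_totals(line) for name, line in TOTALS.items()}
diff = delta_vs(parsed["1.9.1 32-bit"], parsed["2.4.0 32-bit"])

for field, ms in diff.items():
    print(f"{field:8s} {ms:+.3f} ms")
```

A positive number means 2.4.0 spent more time in that field than 1.9.1 did.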

Post by grahf » Fri Mar 24, 2017 0:52

OK, I just quickly ran the Fake Contrast benchmark on 2.4.0 32-bit. This is on MAP23 of Ancient Aliens, not at the starting position, but I could post a savegame...

I have been very surprised: every time I can't quite maintain 60fps in gzdoom, I turn off Fake Contrast and almost certainly get 60fps.

Fake Contrast Off
Map map23: "Trinary Temple",
x = 1503.4712, y = -1762.8206, z = 297.0000, angle = -177.0776, pitch = -2.6422
Walls: 8169 (0 splits, 362 t-splits, 16676 vertices)
Flats: 671 (6426 primitives, 34467 vertices)
Sprites: 411, Decals=22, Portals: 1
W: Render=5.954, Setup=1.942, Clip=1.999
F: Render=1.952, Setup=0.100
S: Render=1.107, Setup=0.451
All=17.043, Render=10.126, Setup=5.494, BSP = 0.951, Portal=0.064, Drawcalls=6.742, Finish=1.133
DLight - Walls: 0 processed, 0 rendered - Flats: 0 processed, 0 rendered
Missing textures: 9 upper, 0 lower, 0.050 ms
51 fps

Fake Contrast Smooth
Map map23: "Trinary Temple",
x = 1503.4712, y = -1762.8206, z = 297.0000, angle = -177.0776, pitch = -2.5983
Walls: 8169 (0 splits, 362 t-splits, 16676 vertices)
Flats: 671 (6426 primitives, 34467 vertices)
Sprites: 408, Decals=22, Portals: 1
W: Render=6.844, Setup=2.217, Clip=2.022
F: Render=2.320, Setup=0.103
S: Render=1.257, Setup=0.462
All=19.036, Render=11.625, Setup=5.779, BSP = 0.932, Portal=0.064, Drawcalls=7.671, Finish=1.281
DLight - Walls: 0 processed, 0 rendered - Flats: 0 processed, 0 rendered
Missing textures: 9 upper, 0 lower, 0.043 ms
48 fps
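
To put the two dumps side by side, here is a small Python sketch (a rough illustration only, with made-up helper names) that converts the fps readings to frame times and shows the relative change in the fields that moved the most. The numbers are copied from the two dumps above and are assumed to be milliseconds, except fps.

```python
# Values copied from the "Fake Contrast Off" and "Fake Contrast Smooth" dumps.
off = {"W_Render": 5.954, "Drawcalls": 6.742, "All": 17.043, "fps": 51}
smooth = {"W_Render": 6.844, "Drawcalls": 7.671, "All": 19.036, "fps": 48}

def frame_time_ms(fps):
    """Average time per frame in milliseconds for a given frame rate."""
    return 1000.0 / fps

def pct_change(before, after):
    """Relative change from before to after, in percent."""
    return 100.0 * (after - before) / before

for key in ("W_Render", "Drawcalls", "All"):
    change = pct_change(off[key], smooth[key])
    print(f"{key:10s} {off[key]:7.3f} -> {smooth[key]:7.3f} ms ({change:+.1f}%)")

print(f"frame time {frame_time_ms(off['fps']):.2f} -> {frame_time_ms(smooth['fps']):.2f} ms")
```

This is a single sample per setting, so it only gives a rough idea of the size of the effect.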

Post by Graf Zahl » Fri Mar 24, 2017 1:23

The 1.9.1 results are somewhat puzzling, as they contradict all conventional wisdom about how to do 3D graphics. It's fairly obvious that with your driver the supposedly obsolete immediate mode is quite a bit faster than the modern buffer-based approach. Unfortunately that cannot be reverted, because the engine has moved too far past the point of no return, and even then it'd be a bad idea because on Macs the old version imposes some unacceptable restrictions.

The slowdown between 2.1 and 2.4 is due to some added complexity from the postprocessing pipeline and a few added features that bring a bit of processing overhead, which in cases like this map can cause a small frame rate drop.

Just to be sure that there isn't a hidden problem, can you also post the startup log you get when starting with '-logfile log.txt'?

Guest

Post by Guest » Fri Mar 24, 2017 3:47

So I take it you haven't heard reports before of much higher performance on AMD with 1.9.1? Just as an aside, has anyone ever contacted AMD about the poor performance in gzdoom? I know they have made some changes for emulator authors etc., so it's not just AAA stuff that gets attention.

Anyway, here is the logfile. BTW I needed to do +logfile log.txt to make it work.

Log started: Thu Mar 23 20:42:17 2017

M_LoadDefaults: Load system defaults.
Using program directory for storage
W_Init: Init WADfiles.
adding E:/games/gzdoom/gzdoom.pk3, 699 lumps
adding ./doom2.wad, 2919 lumps
adding watrsp.wad, 1768 lumps
Unknown command "#"
Unknown command "#"
Unknown command "#"
I_Init: Setting up machine state.
CPU speed: 3403 MHz
CPU Vendor ID: GenuineIntel
Name: Intel(R) Core(TM) i5-3570K CPU @ 3.40GHz
Family 6, Model 58, Stepping 9
Features: MMX SSE SSE2 SSE3 SSSE3 SSE4.1 SSE4.2
I_InitSound: Initializing FMOD
FMOD Sound System, copyright © Firelight Technologies Pty, Ltd., 1994-2009.
Loaded FMOD version 4.44.61
V_Init: allocate screen.
S_Init: Setting up sound.
ST_Init: Init startup screen.
Checking cmd-line parameters...
S_InitData: Load sound definitions.
G_ParseMapInfo: Load map definitions.
Texman.Init: Init texture manager.
ParseTeamInfo: Load team definitions.
LoadActors: Load actor definitions.
script parsing took 90.55 ms
R_Init: Init Doom refresh subsystem.
DecalLibrary: Load decals.
M_Init: Init menus.
P_Init: Init Playloop state.
ParseSBarInfo: Loading default status bar definition.
ParseSBarInfo: Loading custom status bar definition.
D_CheckNetGame: Checking network game status.
player 1 of 1 (1 nodes)
I_InitInput
I_StartupMouse
I_StartupKeyboard
I_StartupXInput
I_StartupRawPS2
I_StartupDirectInputJoystick
GL_VENDOR: ATI Technologies Inc.
GL_RENDERER: AMD Radeon R9 200 Series
GL_VERSION: 4.5.13469 Core Profile Context 21.19.519.2 (Core profile)
GL_SHADING_LANGUAGE_VERSION: 4.50
GL_EXTENSIONS: GL_AMDX_debug_output GL_AMD_blend_minmax_factor GL_AMD_conservative_depth GL_AMD_debug_output GL_AMD_depth_clamp_separate GL_AMD_draw_buffers_blend GL_AMD_framebuffer_sample_positions GL_AMD_gcn_shader GL_AMD_gpu_shader_int64 GL_AMD_interleaved_elements GL_AMD_multi_draw_indirect GL_AMD_name_gen_delete GL_AMD_occlusion_query_event GL_AMD_performance_monitor GL_AMD_pinned_memory GL_AMD_query_buffer_object GL_AMD_sample_positions GL_AMD_seamless_cubemap_per_texture GL_AMD_shader_atomic_counter_ops GL_AMD_shader_stencil_export GL_AMD_shader_stencil_value_export GL_AMD_shader_trace GL_AMD_shader_trinary_minmax GL_AMD_sparse_texture GL_AMD_sparse_texture_pool GL_AMD_stencil_operation_extended GL_AMD_texture_cube_map_array GL_AMD_texture_texture4 GL_AMD_transform_feedback3_lines_triangles GL_AMD_transform_feedback4 GL_AMD_vertex_shader_layer GL_AMD_vertex_shader_viewport_index GL_ARB_ES2_compatibility GL_ARB_ES3_1_compatibility GL_ARB_ES3_compatibility GL_ARB_arrays_of_arrays GL_ARB_base_instance GL_ARB_bindless_texture GL_ARB_blend_func_extended GL_ARB_buffer_storage GL_ARB_clear_buffer_object GL_ARB_clear_texture GL_ARB_clip_control GL_ARB_color_buffer_float GL_ARB_compressed_texture_pixel_storage GL_ARB_compute_shader GL_ARB_conditional_render_inverted GL_ARB_conservative_depth GL_ARB_copy_buffer GL_ARB_copy_image GL_ARB_cull_distance GL_ARB_debug_output GL_ARB_depth_buffer_float GL_ARB_depth_clamp GL_ARB_depth_texture GL_ARB_derivative_control GL_ARB_direct_state_access GL_ARB_draw_buffers GL_ARB_draw_buffers_blend GL_ARB_draw_elements_base_vertex GL_ARB_draw_indirect GL_ARB_draw_instanced GL_ARB_enhanced_layouts GL_ARB_explicit_attrib_location GL_ARB_explicit_uniform_location GL_ARB_fragment_coord_conventions GL_ARB_fragment_layer_viewport GL_ARB_fragment_program GL_ARB_fragment_program_shadow GL_ARB_fragment_shader GL_ARB_framebuffer_no_attachments GL_ARB_framebuffer_object GL_ARB_framebuffer_sRGB GL_ARB_geometry_shader4 GL_ARB_get_program_binary 
GL_ARB_get_texture_sub_image GL_ARB_gl_spirv GL_ARB_gpu_shader5 GL_ARB_gpu_shader_fp64 GL_ARB_half_float_pixel GL_ARB_half_float_vertex GL_ARB_imaging GL_ARB_indirect_parameters GL_ARB_instanced_arrays GL_ARB_internalformat_query GL_ARB_internalformat_query2 GL_ARB_invalidate_subdata GL_ARB_map_buffer_alignment GL_ARB_map_buffer_range GL_ARB_multi_bind GL_ARB_multi_draw_indirect GL_ARB_multisample GL_ARB_multitexture GL_ARB_occlusion_query GL_ARB_occlusion_query2 GL_ARB_pipeline_statistics_query GL_ARB_pixel_buffer_object GL_ARB_point_parameters GL_ARB_point_sprite GL_ARB_program_interface_query GL_ARB_provoking_vertex GL_ARB_query_buffer_object GL_ARB_robust_buffer_access_behavior GL_ARB_sample_shading GL_ARB_sampler_objects GL_ARB_seamless_cube_map GL_ARB_seamless_cubemap_per_texture GL_ARB_separate_shader_objects GL_ARB_shader_atomic_counters GL_ARB_shader_ballot GL_ARB_shader_bit_encoding GL_ARB_shader_draw_parameters GL_ARB_shader_group_vote GL_ARB_shader_image_load_store GL_ARB_shader_image_size GL_ARB_shader_objects GL_ARB_shader_precision GL_ARB_shader_stencil_export GL_ARB_shader_storage_buffer_object GL_ARB_shader_subroutine GL_ARB_shader_texture_image_samples GL_ARB_shader_texture_lod GL_ARB_shading_language_100 GL_ARB_shading_language_420pack GL_ARB_shading_language_packing GL_ARB_shadow GL_ARB_shadow_ambient GL_ARB_sparse_buffer GL_ARB_sparse_texture GL_ARB_stencil_texturing GL_ARB_sync GL_ARB_tessellation_shader GL_ARB_texture_barrier GL_ARB_texture_border_clamp GL_ARB_texture_buffer_object GL_ARB_texture_buffer_object_rgb32 GL_ARB_texture_buffer_range GL_ARB_texture_compression GL_ARB_texture_compression_bptc GL_ARB_texture_compression_rgtc GL_ARB_texture_cube_map GL_ARB_texture_cube_map_array GL_ARB_texture_env_add GL_ARB_texture_env_combine GL_ARB_texture_env_crossbar GL_ARB_texture_env_dot3 GL_ARB_texture_float GL_ARB_texture_gather GL_ARB_texture_mirror_clamp_to_edge GL_ARB_texture_mirrored_repeat GL_ARB_texture_multisample 
GL_ARB_texture_non_power_of_two GL_ARB_texture_query_levels GL_ARB_texture_query_lod GL_ARB_texture_rectangle GL_ARB_texture_rg GL_ARB_texture_rgb10_a2ui GL_ARB_texture_snorm GL_ARB_texture_stencil8 GL_ARB_texture_storage GL_ARB_texture_storage_multisample GL_ARB_texture_swizzle GL_ARB_texture_view GL_ARB_timer_query GL_ARB_transform_feedback2 GL_ARB_transform_feedback3 GL_ARB_transform_feedback_instanced GL_ARB_transform_feedback_overflow_query GL_ARB_transpose_matrix GL_ARB_uniform_buffer_object GL_ARB_vertex_array_bgra GL_ARB_vertex_array_object GL_ARB_vertex_attrib_64bit GL_ARB_vertex_attrib_binding GL_ARB_vertex_buffer_object GL_ARB_vertex_program GL_ARB_vertex_shader GL_ARB_vertex_type_10f_11f_11f_rev GL_ARB_vertex_type_2_10_10_10_rev GL_ARB_viewport_array GL_ARB_window_pos GL_ATI_draw_buffers GL_ATI_envmap_bumpmap GL_ATI_fragment_shader GL_ATI_separate_stencil GL_ATI_texture_compression_3dc GL_ATI_texture_env_combine3 GL_ATI_texture_float GL_ATI_texture_mirror_once GL_EXT_abgr GL_EXT_bgra GL_EXT_bindable_uniform GL_EXT_blend_color GL_EXT_blend_equation_separate GL_EXT_blend_func_separate GL_EXT_blend_minmax GL_EXT_blend_subtract GL_EXT_compiled_vertex_array GL_EXT_copy_buffer GL_EXT_copy_texture GL_EXT_depth_bounds_test GL_EXT_direct_state_access GL_EXT_draw_buffers2 GL_EXT_draw_instanced GL_EXT_draw_range_elements GL_EXT_fog_coord GL_EXT_framebuffer_blit GL_EXT_framebuffer_multisample GL_EXT_framebuffer_object GL_EXT_framebuffer_sRGB GL_EXT_geometry_shader4 GL_EXT_gpu_program_parameters GL_EXT_gpu_shader4 GL_EXT_histogram GL_EXT_multi_draw_arrays GL_EXT_packed_depth_stencil GL_EXT_packed_float GL_EXT_packed_pixels GL_EXT_pixel_buffer_object GL_EXT_point_parameters GL_EXT_polygon_offset_clamp GL_EXT_provoking_vertex GL_EXT_rescale_normal GL_EXT_secondary_color GL_EXT_separate_specular_color GL_EXT_shader_image_load_store GL_EXT_shader_integer_mix GL_EXT_shadow_funcs GL_EXT_stencil_wrap GL_EXT_subtexture GL_EXT_texgen_reflection GL_EXT_texture3D 
GL_EXT_texture_array GL_EXT_texture_buffer_object GL_EXT_texture_compression_bptc GL_EXT_texture_compression_latc GL_EXT_texture_compression_rgtc GL_EXT_texture_compression_s3tc GL_EXT_texture_cube_map GL_EXT_texture_edge_clamp GL_EXT_texture_env_add GL_EXT_texture_env_combine GL_EXT_texture_env_dot3 GL_EXT_texture_filter_anisotropic GL_EXT_texture_integer GL_EXT_texture_lod GL_EXT_texture_lod_bias GL_EXT_texture_mirror_clamp GL_EXT_texture_object GL_EXT_texture_rectangle GL_EXT_texture_sRGB GL_EXT_texture_sRGB_decode GL_EXT_texture_shared_exponent GL_EXT_texture_snorm GL_EXT_texture_storage GL_EXT_texture_swizzle GL_EXT_timer_query GL_EXT_transform_feedback GL_EXT_vertex_array GL_EXT_vertex_array_bgra GL_EXT_vertex_attrib_64bit GL_IBM_texture_mirrored_repeat GL_INTEL_fragment_shader_ordering GL_KHR_context_flush_control GL_KHR_debug GL_KHR_robust_buffer_access_behavior GL_KHR_robustness GL_KTX_buffer_region GL_NV_blend_square GL_NV_conditional_render GL_NV_copy_depth_to_color GL_NV_copy_image GL_NV_depth_buffer_float GL_NV_explicit_multisample GL_NV_float_buffer GL_NV_half_float GL_NV_primitive_restart GL_NV_texgen_reflection GL_NV_texture_barrier GL_OES_EGL_image GL_SGIS_generate_mipmap GL_SGIS_texture_edge_clamp GL_SGIS_texture_lod GL_SUN_multi_draw_arrays GL_WIN_swap_hint WGL_EXT_swap_control
Max. texture size: 16384
Max. texture units: 32
Max. varying: 128
Max. combined shader storage blocks: 64
Max. vertex shader storage blocks: 64
Resolution: 1920 x 1080

<------------------------------->

map01 - Aquarius

Post by grahf » Fri Mar 24, 2017 3:51

That last post was by me. Sorry, no idea how I got logged out and posted as Guest.

Blue Shadow
Global Moderator
Posts: 304
Joined: Sun Aug 29, 2010 6:09

Post by Blue Shadow » Fri Mar 24, 2017 4:04

No problem.

Post by Graf Zahl » Fri Mar 24, 2017 10:18

Guest wrote (Fri Mar 24, 2017 3:47):
"I know they have made some changes for emulator authors etc., so it's not just AAA stuff that gets attention."

That could be the reason why the classic method saw some improvement, because that is what emulators mostly use.
GZDoom 2.x seems to fall into a hole here: it uses modern rendering methods, but with small batches, and apparently that particular use case has not been optimized at all.

On NVidia the change in performance was quite significant between 1.9 and 2.1 with 2.1 performing 10-20% better on some maps.

Post by grahf » Fri Mar 24, 2017 20:17

Any idea how best to contact AMD about this type of thing? I mean, even though gzdoom is fairly niche, it's a pretty bad look for AMD to have a 550Ti outperform an R9 290. I'll write them an email. Maybe we can get some other AMD gzdoom users to do the same.

Secondly, any thoughts on the fake contrast benchmarks?

Post by Graf Zahl » Fri Mar 24, 2017 20:48

The fake contrast thing looks like the driver is somewhat sensitive to changing a vertex attribute outside the buffer. If I had such a card I could do some tests, but doing this remotely would be a bit too time consuming.

Concerning performance vs. my Geforce 550Ti, this has been a constant with ATI/AMD ever since I made comparisons.
If you look at your output, the 'Drawcalls' value is around 7 ms. On the same map, with NVidia's driver the same value is 0.3 ms! That's where all the time is lost. The same had already been the case on a Geforce 8600 vs. an ATI card of the same vintage.

The issue here is that both cards are totally underserved: they can never run at full load, because all the time goes to in-game processing and driver overhead.
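
As a back-of-the-envelope illustration of how much that Drawcalls value weighs, here is a short Python sketch using the 2.1.1 64-bit numbers posted earlier in the thread. It assumes the Drawcalls time is counted inside All (the dump does not say so explicitly) and that nothing else in the frame would change if the draw call cost dropped to the 0.3 ms quoted for NVidia, which is almost certainly too simple. The helper names are made up for illustration.

```python
# Numbers from the 2.1.1 64-bit dump earlier in the thread (ms).
ALL_MS = 20.835
DRAWCALLS_AMD_MS = 7.603
# The figure quoted above for NVidia's driver on the same map.
DRAWCALLS_NVIDIA_MS = 0.3

def share(part_ms, total_ms):
    """Fraction of the total taken by one component."""
    return part_ms / total_ms

def what_if_total(total_ms, current_part_ms, new_part_ms):
    """Total if one component cost new_part_ms instead, all else unchanged (an assumption)."""
    return total_ms - current_part_ms + new_part_ms

hypothetical = what_if_total(ALL_MS, DRAWCALLS_AMD_MS, DRAWCALLS_NVIDIA_MS)

print(f"Drawcalls share of All: {share(DRAWCALLS_AMD_MS, ALL_MS):.1%}")
print(f"All with 0.3 ms draw calls: {hypothetical:.3f} ms")
```

Treat the second number as an upper bound on what a cheaper draw call path could buy, not as a prediction.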
