"Christmas - the time to fix the computers of your loved ones" « Lord Wyrm

The HPET bug: What it is and what it isn't

mat 26.04.2018 - 11:30 208166 59 Thread rating
Posts

mat

Administrator
Legends never die
Registered: Aug 2003
Location: nö
Posts: 25420
I had a look at the changes to the QPC API function when 1809 was released and compared them to previous versions. There is only a small difference: a nicely solved integer multiplication that unifies the timestamp to 10 MHz for TSC (as QPC mode). My guess is that this implementation is actually faster by a few cycles, but I did not measure it.
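To illustrate the kind of integer scaling described here, a minimal Python sketch follows. It is not the Windows implementation: the fixed-point shift, the function names and the 3.6 GHz TSC frequency in the usage lines are assumptions made only for illustration.

```python
# Sketch: scale raw TSC ticks to a 10 MHz timeline with one multiply and one shift.
# Assumption: a 32-bit fixed-point scale factor; the real QPC code may differ.
SHIFT = 32
TARGET_HZ = 10_000_000  # 10 MHz, i.e. one tick every 100 ns


def make_scale(tsc_hz: int) -> int:
    """Precompute floor(TARGET_HZ * 2**SHIFT / tsc_hz) once at startup."""
    return (TARGET_HZ << SHIFT) // tsc_hz


def tsc_to_10mhz(tsc_ticks: int, scale: int) -> int:
    """Convert ticks using a multiplication and a shift, no division per call."""
    return (tsc_ticks * scale) >> SHIFT


def tsc_to_10mhz_exact(tsc_ticks: int, tsc_hz: int) -> int:
    """Reference conversion with a full integer division, for comparison."""
    return tsc_ticks * TARGET_HZ // tsc_hz


if __name__ == "__main__":
    tsc_hz = 3_600_000_000  # hypothetical TSC frequency
    scale = make_scale(tsc_hz)
    one_second = tsc_hz  # number of TSC ticks in one second
    print(tsc_to_10mhz(one_second, scale), tsc_to_10mhz_exact(one_second, tsc_hz))
```

Because the scale factor is rounded down, the fast path can land one 10 MHz tick (100 ns) below the exact result, which is the trade-off between speed and precision.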

These are parts of the code of QueryPerformanceCounter() for RS4 and prior on the left and RS5+ on the right. They don't show everything, but I apparently chose to screenshot only those lines of code, can't remember why. :p


You can also see here that both just rely on a call to RDTSCP, although there are multiple codepaths available for fencing on other platforms. This was on Intel Coffee Lake.

I also played around with their new way to figure out if there is any loss of precision due to the new QPC code. These are two measurements of 1 second each showing different calculation methods and their precision. Of course the integer multiplication is the fastest, although it's only precise to ~100 ns:


The additional latency shouldn't have anything to do with the QPC function itself. That said, there is much more going on behind the scenes that could interfere with DPC latency in general. I think more details are needed to look into this: screenshots from multiple TimerBench runs, DPC Latency Checker results and the full specs of the tested systems.

Edit: Sorry for the overshare, I wanted to post this stuff a while ago but never got around to it.

mat

Administrator
Legends never die
Registered: Aug 2003
Location: nö
Posts: 25420
TimerBench 1.4

timerbench-256_240633.png

  • Detection of iTSC QPC mode with 10 MHz (since Windows 10 RS5)
  • Fixed crashes when loading result files
  • Implemented multi-select for the result file dialog to load more than one result at once
  • You can start more than one measure process dialog now inside one TimerBench application
  • Measure process now additionally takes a process id to target a specific application for measurements
  • New icon! Let's go professional.
  • Updated all libraries to their latest version: HWiNFO 6.12, PresentMon 1.5.2, EasyHook 2.7.7097
  • All executable and DLL files are now signed with my SHA2 signature
  • Compiled with Visual Studio 2017 (statically linked MFC and VC)

Download: TimerBench 1.4 (172 MB, Self-extracting Exe, CRC32: 39a1a4c7)
Prerequisites: DirectX 11

LordGurciullo

Bloody Newbie
Registered: Jan 2020
Location: Los Angeles
Posts: 2
You're clearly a genius.

I've optimized my system to the max but I wanted to try the HPET off thing.

Is it going to help? I have a 9900k at 5ghz 4133 ram and a 2080 super overclocked.

What are your thoughts?

LordGurciullo

Bloody Newbie
Registered: Jan 2020
Location: Los Angeles
Posts: 2
I've tried every combination of everything and I haven't gotten any solid results in latencymon for reducing latency and no fps gains... Any ideas?

Marctraider

Bloody Newbie
Registered: Feb 2020
Location: Nowhere
Posts: 6
Hi. Great program, but I think there is a bug in it.

I suspect the program also counts frametimes that occur while the 3D window has just been activated and is still moving around the screen to place itself, causing wrong frametime output in the end result, varying from 25 to 60+ ms!

The whole time my frametimes are well below 1 ms at a steady 1000+ fps.

So the results are inaccurate anyway.

mat

Administrator
Legends never die
Registered: Aug 2003
Location: nö
Posts: 25420
You can check the frame times yourself by importing PresentMon.csv in the Results directory right after the test. You will find all the raw values there.

The important columns are "MsBetweenPresents" and "Dropped". The final frame time values are calculated from all frames that were not dropped, and the first 5% of the test is ignored due to loading anomalies in Unreal. That means your suspicion doesn't check out.
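For anyone who wants to redo that calculation by hand, here is a rough Python sketch of the filtering described above. It is not TimerBench's actual code. The function name is made up, and it assumes that a dropped frame is marked with "1" in the "Dropped" column and that the 5% warmup is cut by row count before dropped frames are removed.

```python
import csv
import io


def frame_times_ms(csv_text: str, warmup_percent: int = 5) -> list[float]:
    """Return MsBetweenPresents for all non-dropped frames after the warmup.

    Assumptions (not confirmed by the source): "Dropped" uses "1" for a
    dropped frame, and the warmup is the first N percent of the rows.
    """
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    skip = len(rows) * warmup_percent // 100  # rows ignored as warmup
    return [
        float(row["MsBetweenPresents"])
        for row in rows[skip:]
        if row["Dropped"].strip() != "1"
    ]
```

Calling it with the contents of PresentMon.csv (for example `frame_times_ms(open(path, newline="").read())`) gives a list of frame times that can then be averaged or plotted.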

This is for example a GTX 1080 Ti in windowed FullHD (just the first 16 seconds of the test):


mat

Administrator
Legends never die
Registered: Aug 2003
Location: nö
Posts: 25420
TimerBench 1.5

TimerBench - Intel Core i9-10980XE - Comparison of TSC and HPET

  • Added awesome graphs to visualize the results of the Game Test. Available data: Frametimes, GPU Load, GPU Frequency, GPU Memory Frequency, CPU Load and CPU Max Frequency (the fastest core)
  • Added the 99th Percentile average for frametimes to the result dialog. This shows the frametime that 99% of all frames captured during the Game Test were better than.
  • Changed warmup period of Game Test to 3 seconds (instead of 5% of the collected data). To show all ignored values that are not part of the final result, the warmup period is shown as a red line in the graphs.
  • Improved detection of the primary graphics card
  • Improved detection of SLI/Crossfire configurations. To keep things simple, the Game Test's sensor data will only capture the first card.
  • Latest HWiNFO version 6.23 integrated
  • Implemented with C++17, compiled with Visual Studio 2019 (statically linked MFC and VC)
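
The 3-second warmup and the 99th percentile from the list above can be sketched in a few lines of Python. This is only an illustration, not the code used in TimerBench: the function names are hypothetical, the warmup is measured by summing frame times, and the percentile uses the simple nearest-rank method, which may differ from what the tool does.

```python
def drop_warmup(frame_times_ms: list[float], warmup_s: float = 3.0) -> list[float]:
    """Discard frames until the accumulated frame time exceeds the warmup."""
    elapsed_ms = 0.0
    kept = []
    for frame_time in frame_times_ms:
        elapsed_ms += frame_time
        if elapsed_ms > warmup_s * 1000.0:
            kept.append(frame_time)
    return kept


def percentile_99(frame_times_ms: list[float]) -> float:
    """Nearest-rank 99th percentile: 99% of frames are at or below this value."""
    if not frame_times_ms:
        raise ValueError("no frame times given")
    ordered = sorted(frame_times_ms)
    rank = (99 * len(ordered) + 99) // 100  # ceil(0.99 * n) in integer math
    return ordered[rank - 1]
```

A lower 99th percentile means fewer slow frames, so it says more about stutter than the plain average does.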

Download: TimerBench 1.5 (173 MB, Self-extracting Exe, CRC32: 34EC2A27)
Prerequisites: DirectX 11

Marctraider

Bloody Newbie
Registered: Feb 2020
Location: Nowhere
Posts: 6
Awesome nice job! Btw I'm now consistently at 4/5ms frametimes!

But I like this graph idea. I tried importing the csv into CapFrameX, but it only shows a few seconds each time.

Marctraider

Bloody Newbie
Registered: Feb 2020
Location: Nowhere
Posts: 6
Btw. With this version I'm constantly getting 'Could not read frametime data'.

[16:06:08] Demo window found!
[16:06:08] Injecting QPC hook DLL: C:\Users\Administrator\Desktop\Tools\TimerBench 1.5\QPCHook32.dll
[16:06:08] Starting PresentMon with session: TimerBench2128
[16:06:08] Starting PresentMon trace session
[16:06:41] Stopping PresentMon trace session
[16:06:42] PresentMon successfully stopped
[16:06:42] Parsing result file: C:\Users\Administrator\Desktop\Tools\TimerBench 1.5\Results\PresentMon.csv
[16:06:42] File not found or no access: C:\Users\Administrator\Desktop\Tools\TimerBench 1.5\Results\PresentMon.csv

Marctraider

Bloody Newbie
Registered: Feb 2020
Location: Nowhere
Posts: 6
Nvm. Works after a reboot. I suspect that CapFrameX is conflicting with it, as they both use the PresentMon logger.

mat

Administrator
Legends never die
Registered: Aug 2003
Location: nö
Posts: 25420
Yeah, if the output file is opened with a tool, it is locked for further access. Copy it first to another location if you want to do some further analysis.
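
A small Python sketch of that "copy first" step, in case someone wants to script it. The function name and the target directory are hypothetical; only the file name PresentMon.csv comes from the log above.

```python
import shutil
from pathlib import Path


def copy_for_analysis(result_csv, target_dir) -> Path:
    """Copy the result file to another directory and return the new path.

    The copy can then be opened in other tools while the original stays
    untouched in the Results directory.
    """
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    return Path(shutil.copy2(result_csv, target / Path(result_csv).name))
```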

Lowprofile18

Bloody Newbie
Registered: Mar 2020
Location: USA
Posts: 3
https://i.imgur.com/dCdI4KI.png


Ran 2 tests on each in windowed and fullscreen. A bit confused by the results, which is better?

Also does adjusting dynamictick in bcdedit have an effect with these tests?

mat

Administrator
Legends never die
Registered: Aug 2003
Location: nö
Posts: 25420
Please upload the screenshot to this forum. I can't read a single number on my mobile.

Lowprofile18

Bloody Newbie
Registered: Mar 2020
Location: USA
Posts: 3
test_242868.png


Sorry about that, here it is

mat

Administrator
Legends never die
Registered: Aug 2003
Location: nö
Posts: 25420
All results are pretty good. Enabling HPET does not hinder your performance at all, so if you want, you can leave it enabled.

If there were a problem, it would look like this:


Check the frame times between the HPET and the TSC timer results, especially the 99th Percentile, which is a good measurement to see if your framerate was stable or had frequent stuttering (it means 99% of all frames you saw on your screen were better than x milliseconds):

Yours:

HPET: 5.16-5.57 ms
TSC: 5.36-5.55 ms

Flawed i9-10980XE:

HPET: 8.80 ms
TSC: 1.78 ms

That's a frametime about 5 times worse than without HPET enabled.

Dynamic Tick is something that can result in higher latencies because your system "ticks" only when it's needed, which can be slightly later than when it ticks at a fixed frequency (normally 64 times a second, sometimes even 1000 times a second). A tick is a wakeup for your OS to do certain things, like do some thread scheduling, increment some timers like your taskbar clock and so on.

You can disable it, but it might lead to higher power consumption, because your system has to do more work during idle.
On Windows 8 this feature was still buggy and led to a skewed taskbar clock. So it made sense to disable it back then. Today there shouldn't be any visible or even measurable advantage of a fixed tick frequency.
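
To put the tick frequencies mentioned above into numbers, here is a tiny Python sketch. It is plain arithmetic on the 64 Hz and 1000 Hz figures, and the function name is made up for this example.

```python
def tick_period_ms(ticks_per_second: float) -> float:
    """Time between two ticks at a fixed tick frequency, in milliseconds."""
    return 1000.0 / ticks_per_second


if __name__ == "__main__":
    for hz in (64, 1000):
        print(f"{hz} ticks/s -> one tick every {tick_period_ms(hz)} ms")
```

So at a fixed 64 ticks per second, work that waits for the next tick is delayed by at most 15.625 ms, and at 1000 ticks per second by at most 1 ms.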