Monday, 23 June 2014

Understanding Context Switches

Processes contain "threads" that do the actual work. Threads are scheduled onto the system's CPUs, and each thread runs on one CPU at a time, not on all available CPUs at once. A process can have multiple threads, but only one thread can run on a given CPU at a time. The amount of time a thread is allowed to run is called a "quantum". When that time is over, the system "switches" to the next thread in line (this is the normal case for a switch): a "context switch" has happened.
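To make the quantum idea concrete, here is a minimal sketch of a round-robin scheduler on a single CPU that counts how often the CPU moves from one thread to another. It is a simplified model, not how Windows actually schedules: the real scheduler is priority-based, and threads also give up the CPU when they block. The function name and parameters are made up for illustration.

```python
from collections import deque


def count_context_switches(work_ms, quantum_ms):
    """Simulate round-robin scheduling on one CPU and count the switches.

    work_ms: CPU time each thread still needs, in milliseconds.
    quantum_ms: longest time a thread may run before the next one is picked.
    """
    if quantum_ms <= 0:
        raise ValueError("quantum_ms must be positive")

    # Threads wait in a queue; the thread id is just its position in work_ms.
    ready = deque((tid, ms) for tid, ms in enumerate(work_ms) if ms > 0)
    switches = 0
    last_tid = None

    while ready:
        tid, remaining = ready.popleft()
        # A switch happens whenever the CPU moves to a different thread.
        if last_tid is not None and tid != last_tid:
            switches += 1
        last_tid = tid

        remaining -= min(quantum_ms, remaining)
        if remaining > 0:
            # Quantum expired before the work was done: back of the line.
            ready.append((tid, remaining))

    return switches
```

In this model, two threads that each need 30 ms of CPU time cause 5 switches with a 10 ms quantum, but only 1 switch with a 30 ms quantum. The same amount of work produces more switches when threads get less time per turn.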

If the performance counter shows a high context switch rate, threads have less time to do their work and system performance may degrade. At that point Citrix Resource Manager, or any other monitoring tool, will raise an alert to inform the administrator that something is wrong.

Two other definitions of context switches:


Microsoft


"The average rate per second at which the processor switches context among threads. A high rate can indicate that many threads are contending for processor time."

Windows Internals

"When Windows selects a new thread to run, it performs a context switch to it. A context switch is the procedure of saving the volatile state associated with a running thread, loading another thread’s volatile state, and starting the new thread’s execution."


Causes of High Context Switches

A common cause I have encountered is a page file that is too small, or one that is allowed to grow dynamically (initial and maximum size not set to the same value). Another thing to look at is the write cache of a (RAID) controller, which you might want to change using Microsoft's dskcache utility (or the vendor's tool). High context switch rates can also result from inefficient hardware or poorly designed applications.

Troubleshooting Context Switches

As always, there are different ways to troubleshoot such a problem, but the main goal is to find the process(es) generating the high context switch rate. Keep in mind that you might simply need better hardware.

Now, how can I find the number of context switches on the system? The answer is the Microsoft Performance Monitor (perfmon.msc), using the counters System\Context Switches/sec or Thread\Context Switches/sec.
With the per-thread counter, however, it is hard to figure out which process(es) are causing the high rate.

A better utility is Sysinternals Process Explorer. By default Process Explorer doesn't show context switches; enable them under View | Select Columns | Process Performance by checking "Context Switches" and "Context Switch Delta".

You should now see both columns in the main view of Process Explorer. The Context Switches column shows the cumulative number of context switches for each process. Sort by that column and look for a) high values and b) fast-growing values; both are good indicators of a high switch rate for the process(es).

Next, check the CSwitch Delta column for high values. It shows the context switches made per Process Explorer refresh interval (if the "update speed" is set to one second, you get context switches / sec). Once you have found the process(es), find out why they are generating those context switches.
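The delta logic can be sketched in a few lines: take two snapshots of the cumulative per-process counts, subtract, and divide by the time between them. This is only an illustration of the arithmetic, not how Process Explorer is implemented; the function name and the dictionary layout are assumptions, and the counts would have to come from whatever tool you use to collect them.

```python
def cswitch_rates(previous, current, interval_s):
    """Return (process, switches_per_second) pairs, busiest first.

    previous, current: dicts mapping a process name to its cumulative
    context switch count at two points in time.
    interval_s: seconds between the two snapshots.
    """
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")

    rates = []
    for name, total in current.items():
        # A process that started between the snapshots has no earlier
        # count, so it is treated as having a delta of zero this round.
        delta = total - previous.get(name, total)
        if delta < 0:
            # Counter went backwards, e.g. the process was restarted.
            continue
        rates.append((name, delta / interval_s))

    # Highest rate first, like sorting the delta column in descending order.
    return sorted(rates, key=lambda item: item[1], reverse=True)
```

For example, with counts of 500 and then 4500 for a process over a two-second interval, the rate is 2000 context switches / sec, which puts that process at the top of the list.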
Values for "bad" Context Switches / sec

The default context switches "red alert" value in Citrix Resource Manager is 14,000, but that is for a single CPU. The value is per CPU, so on a system with two CPUs you should change it to 28,000, with three CPUs to 42,000, and with four CPUs to 56,000. Still, these values are only rough suggestions; to find a "good" value you have to monitor your system over time.
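The per-CPU scaling is simple multiplication, sketched below. The constant is the 14,000 value quoted above; the function names are made up for illustration, and the result is only a starting point until you have a baseline for your own system.

```python
PER_CPU_ALERT = 14_000  # per-CPU "red alert" value quoted in the text


def alert_threshold(cpu_count, per_cpu=PER_CPU_ALERT):
    """Scale the per-CPU alert value to the number of CPUs in the system."""
    if cpu_count < 1:
        raise ValueError("cpu_count must be at least 1")
    return cpu_count * per_cpu


def is_red_alert(switches_per_sec, cpu_count, per_cpu=PER_CPU_ALERT):
    """True if the measured rate reaches the scaled threshold."""
    return switches_per_sec >= alert_threshold(cpu_count, per_cpu)
```

A quad CPU system gives alert_threshold(4) == 56000. If you script this, os.cpu_count() from the Python standard library is one way to get the number of logical CPUs.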
