Thermal Management for Intel® Xeon® Scalable Processors

000006710

05/10/2023

Thermal management overview

For the difference between boxed and tray processors, see: What Is the Difference Between Intel® Boxed and Tray Processors?

What is the thermal management solution?

The thermal management solution for Intel® Xeon® Scalable Processors used in 4-way or 8-way multiprocessing systems is specific to the motherboard and chassis manufacturer. Boxed Intel® Xeon® Scalable Processors are sold as a configured kit consisting of:

  • Thermal solution
  • Motherboard
  • Chassis
  • Power supply

For thermal management specifications, see the system manufacturer's documentation or the Intel® Xeon® Scalable Processors Datasheet. The Processor Wind Tunnel (PWT) is intended only for general-purpose servers (2U and above) using Intel® Xeon® Scalable Processors, not for the Intel Xeon Processor MP or the Intel Xeon Processor for 1U Rack Mount Servers.

Can you give me some thermal management basics?

Systems using Intel® Xeon® Scalable Processors require thermal management. This document assumes a general knowledge of and experience with system operation, integration, and thermal management. Integrators who follow the recommendations presented can provide their customers with more reliable systems and will see fewer customers returning with thermal management issues. (The term Boxed Intel® Xeon® Scalable Processors refers to processors packaged for use by system integrators.)

Thermal management in systems based on Intel® Xeon® Scalable Processors can affect both the performance and the noise level of the system. Intel® Xeon® Scalable Processors use the Thermal Monitor feature to protect the processor when the silicon would otherwise operate above its specification. In a properly designed system, the Thermal Monitor feature should never become active. The feature is intended to provide protection in unusual circumstances, such as higher-than-normal ambient air temperatures or the failure of a system thermal management component (such as a system fan). While the Thermal Monitor feature is active, system performance may drop below its normal peak level. It is therefore critical to design systems that keep internal ambient temperatures low enough to prevent the processors from entering the Thermal Monitor active state. Information on the Thermal Monitor feature can be found in the Intel® Xeon® Scalable Processors Datasheet.

Additionally, the Intel® Xeon® Scalable Processors heat sink uses an active duct solution called the Processor Wind Tunnel (PWT), which includes a high-quality fan that operates at a constant speed. The duct provides adequate airflow across the processor heat sink as long as the ambient temperature is kept below the maximum specification.

Operating processors at temperatures beyond their maximum specified operating temperature may shorten processor life and can cause unreliable operation. Meeting the processor's temperature specification is ultimately the responsibility of the system integrator. When building quality systems with Intel® Xeon® Scalable Processors, it is imperative to carefully consider the system's thermal management and to verify the design with thermal testing. This document details the specific thermal requirements of Intel® Xeon® Scalable Processors, and system integrators using these processors should become familiar with it.

What is proper thermal management?

Proper thermal management depends on two major elements: a heat sink properly mounted to the processor, and effective airflow through the system chassis. The ultimate goal of thermal management is to keep the processor at or below its maximum operating temperature.

Proper thermal management is achieved when heat is transferred from the processor to the system air, which is then vented out of the system. Boxed Intel® Xeon® Scalable Processors are shipped with a heat sink and the PWT, which together transfer processor heat to the system air effectively; the system integrator is responsible for ensuring adequate system airflow. Tray Intel® Xeon® Scalable Processors are shipped without a heat sink or PWT, so the system integrator is responsible for providing a suitable thermal solution as well as ensuring adequate system airflow.

Thermal management operations

How do I install the heat sink?

You must securely attach the heat sink (included with boxed Intel® Xeon® Scalable Processors) to the processor. Thermal interface material, applied during system integration, provides effective heat transfer from the processor to the fan heat sink.

Critical: Using the boxed processor without properly applying the included thermal interface material will void the boxed processor warranty and may cause damage to the processor. Be sure to follow the installation procedures documented in the boxed processor manual and the integration overview.

The fan on the Processor Wind Tunnel is a high-quality ball-bearing fan that moves a steady local stream of air across the heat sink. This air stream transfers heat from the heat sink to the air inside the system. However, moving heat to the system air is only half the task: sufficient system airflow is also needed to exhaust that air. Without a steady stream of air through the system, the fan heat sink will recirculate warm air and may not cool the processor adequately.

How do I manage system airflow?

The following are factors which determine system airflow:

  • Chassis Design
  • Chassis Size
  • Location of Chassis Air Intake and Exhaust Vents
  • Power Supply Fan Capacity and Venting
  • Location of the Processor Slot(s)
  • Placement of Add-in Cards and Cables

System integrators must ensure adequate airflow through the system to allow the heat sink to work effectively. Proper attention to airflow when selecting subassemblies and building systems is important for good thermal management and reliable system operation.

Integrators use two basic motherboard-chassis-power supply form factors for servers and workstations: ATX variations and the older Server AT form factor. Due to cooling and voltage considerations, Intel recommends the use of ATX form factor motherboards and chassis for the boxed Intel® Xeon® Scalable Processors.

Server AT form factor motherboards are not recommended because such designs are not standardized for effective thermal management. However, some chassis designed exclusively for Server AT form factor motherboards may yield efficient cooling.

The following is a list of guidelines to be used when integrating a system:

  • Chassis vents must be functional and not excessive in quantity: Integrators should be careful not to select chassis that contain only cosmetic vents. Cosmetic vents are designed to look as if they allow airflow, but little or no air actually flows through them. Chassis with excessive air vents should also be avoided, because air escapes through the nearest openings and very little flows over the processor and other components. In ATX chassis, I/O shields must be present; otherwise, the I/O opening may provide excessive venting.
     
  • Vents must be properly located: Systems must have properly located intake and exhaust vents. The best locations for air intakes allow air to enter the chassis and directly flow over the processor. Exhaust vents should be situated so that air flows on a path through the system, over various components, before exiting. The specific location of vents depends upon the chassis. For ATX systems, exhaust vents should be located both in the bottom front and bottom rear of the chassis. Also, for ATX systems, I/O shields must be present to allow the chassis to vent air as designed. Lack of an I/O shield may disrupt proper airflow or circulation within the chassis.
     
  • Power Supply Airflow Direction: It is important to choose a power supply that has a fan that exhausts air in the proper direction. Some power supplies have markings noting airflow direction.
     
  • Power Supply Fan Strength: PC power supplies contain a fan. For some chassis where the processor is running too warm, changing to a power supply with a stronger fan can greatly improve airflow.
     
  • Power Supply Venting: A lot of air flows through the power supply unit, which can be a significant restriction if not well vented. Choose a power supply unit with large vents. Wire finger guards for the power supply fan offer much less airflow resistance than openings stamped into the sheet metal casing of the power supply unit.
     
  • System Fan - Should It Be Used? Some chassis may contain a system fan (in addition to the power supply fan) to facilitate airflow. A system fan is typically used with passive heat sinks. In some situations, a system fan improves system cooling. Thermal testing both with a system fan and without the fan will reveal which configuration is best for a specific chassis.
     
  • System Fan Airflow Direction: When using a system fan, ensure that it draws air in the same direction as the overall system airflow. For example, a system fan in an ATX system should act as an exhaust fan, pulling air from within the system out through the rear or front chassis vents.
     
  • Protect Against Hot Spots: A system may have strong airflow but still contain hot spots, which are areas within the chassis that are significantly warmer than the rest of the chassis air. Hot spots can be created by an improperly positioned exhaust fan, or by adapter cards, cables, chassis brackets, or subassemblies that block airflow within the system. To avoid hot spots, place exhaust fans as needed, reposition full-length adapter cards or use half-length cards, re-route and tie cables, and ensure space is provided around and over the processor.

How do I perform thermal testing?

Differences in motherboards, power supplies, add-in peripherals and chassis all affect the operating temperature of systems and the processors that run them. Thermal testing is highly recommended when choosing a new supplier for motherboards or chassis, or when starting to use new products. Thermal testing can determine if a specific chassis-power supply-motherboard configuration provides adequate airflow for boxed Intel® Xeon® Scalable Processors. To begin determining the best thermal solution for your Intel® Xeon® Scalable Processors–based systems, contact your motherboard vendor for chassis and fan configuration recommendations.

Thermal sensor and thermal reference byte
Intel® Xeon® Scalable Processors have unique system management capabilities. One of these is the ability to monitor the processor's core temperature relative to a known maximum setting. The processor's Thermal Sensor outputs the current processor temperature and can be addressed via the System Management Bus (SMBus). A thermal byte (8 bits) of information can be read from the Thermal Sensor at any time; the thermal byte has a granularity of 1°C. This reading is then compared to the Thermal Reference Byte.

The Thermal Reference Byte is also available through the Processor Information ROM on the SMBus. This 8-bit number is recorded when the processor is manufactured. The Thermal Reference Byte contains a preprogrammed value that corresponds to the thermal sensor reading when the processor is stressed to its maximum thermal specification. Therefore, if the thermal byte reading from the Thermal Sensor ever exceeds the Thermal Reference Byte, the processor is running hotter than the specification allows.

Thermal testing can be done by stressing each processor in a fully configured system, reading each processor's Thermal Sensor, and comparing the reading with that processor's Thermal Reference Byte to determine whether it is running within thermal specifications. Software that can read information from the SMBus is needed to read both the Thermal Sensor and the Thermal Reference Byte.
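The per-processor comparison described above can be sketched in Python. This is a minimal sketch, assuming the Thermal Sensor readings and Thermal Reference Byte values have already been read off the SMBus by other software (the SMBus access itself is not shown). It also assumes both values can be compared as unsigned 8-bit integers in degrees Celsius; check the datasheet for the actual encoding. The function name `check_processors` and the example values are hypothetical.

```python
from typing import Dict, Tuple


def check_processors(readings: Dict[str, Tuple[int, int]]) -> Dict[str, bool]:
    """Return True for each processor whose Thermal Sensor reading does not
    exceed its own Thermal Reference Byte.

    readings maps a processor label to (thermal_byte, reference_byte).
    Assumption: both are unsigned 8-bit values with 1 degree C granularity.
    """
    results = {}
    for cpu, (thermal_byte, reference_byte) in readings.items():
        for value in (thermal_byte, reference_byte):
            if not 0 <= value <= 0xFF:
                raise ValueError(f"{cpu}: {value} is not an 8-bit value")
        # A reading above the reference byte means the processor is running
        # hotter than its maximum thermal specification allows.
        results[cpu] = thermal_byte <= reference_byte
    return results


if __name__ == "__main__":
    # Hypothetical values for a two-processor system.
    status = check_processors({"CPU0": (68, 72), "CPU1": (74, 72)})
    for cpu, ok in status.items():
        print(cpu, "within spec" if ok else "OVER SPEC")
```

Each processor is compared only with its own reference byte, because the reference value is recorded individually for each part at manufacture.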

Thermal test procedure
The procedure for thermal testing is as follows:

Note: If you are testing a system with a variable-speed system fan, you must run the test at the maximum operating room temperature you have specified for the system.
  1. To ensure maximum power consumption during the test, disable the system's automatic power-down modes or green features. These features are controlled either in the system BIOS or by operating system drivers.
     
  2. Set up a method to record the room temperature, using either an accurate thermometer or a thermocouple and thermal meter combination.
     
  3. Power up the workstation or server. If the system has been assembled properly, and the processor is properly installed and seated, the system boots into the intended operating system (OS).
     
  4. Start a thermally stressful application, one that keeps all processors at or near their maximum power dissipation.
     
  5. Allow the program to run for 40 minutes. This allows the entire system to heat up and stabilize. Record the Thermal Sensor reading for each processor once every 5 minutes for the next 20 minutes. Record the room temperature at the end of the 1-hour period.
  6. After recording the room temperature, power the system down, remove the chassis cover, and allow the system to cool for at least 15 minutes.
 

Using the highest of the four Thermal Sensor measurements for each processor, follow the procedure in the next section to verify the system's thermal management.
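The recording schedule in step 5 and the selection of the highest reading can be sketched as follows. This is a minimal Python sketch under stated assumptions: one reading is taken at the end of each 5-minute interval, that is, at 45, 50, 55 and 60 minutes after the stress program starts, which yields the four measurements mentioned above. `highest_readings` is a hypothetical helper, not part of any Intel tool, and the log values are made up.

```python
from typing import Dict, List

WARM_UP_MIN = 40    # run the stress program for 40 minutes before recording
INTERVAL_MIN = 5    # record once every 5 minutes...
WINDOW_MIN = 20     # ...for the next 20 minutes

# Assumed sample times (minutes after the stress program starts): 45, 50, 55, 60.
SAMPLE_TIMES_MIN = list(range(WARM_UP_MIN + INTERVAL_MIN,
                              WARM_UP_MIN + WINDOW_MIN + 1,
                              INTERVAL_MIN))


def highest_readings(log: Dict[str, List[int]]) -> Dict[str, int]:
    """Return the highest Thermal Sensor reading recorded for each processor."""
    highest = {}
    for cpu, samples in log.items():
        if len(samples) != len(SAMPLE_TIMES_MIN):
            raise ValueError(
                f"{cpu}: expected {len(SAMPLE_TIMES_MIN)} readings, got {len(samples)}")
        highest[cpu] = max(samples)
    return highest


if __name__ == "__main__":
    print(SAMPLE_TIMES_MIN)
    # Hypothetical log for a two-processor system.
    print(highest_readings({"CPU0": [66, 67, 67, 68], "CPU1": [70, 71, 70, 71]}))
```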

Calculation to verify a system's thermal management solution
This section explains how to determine whether a system can operate at the maximum operating temperature while keeping the processor within its maximum operating range. The result of this process shows whether the system airflow needs to be improved or the system's maximum operating temperature needs to be revised in order to produce a more reliable system.

The first step is to select a maximum operating room temperature for the system. A common value for systems where air conditioning is not available is 40°C. This temperature exceeds the maximum recommended external temperature for platforms based on Intel® Xeon® Scalable Processors, but it can be used if the chassis keeps the fan inlet temperature at or below the 45°C specification. A common value for systems where air conditioning is available is 35°C. Choose a value that suits your customer's environment, and write it on line A below.

Write the room temperature recorded after testing on line B below. Subtract line B from line A and write the result on line C. This difference compensates for the fact that the test was likely conducted in a room that is cooler than the system's maximum operating temperature.

A. _________ Maximum operating room temperature (°C), typically 35°C or 40°C

B. - _______ Room temperature at end of test (°C)

C. _________ Adjustment (line A minus line B)

Write the highest Thermal Sensor reading on line D below. Copy the number from line C to line E below. Add lines D and E and write the sum on line F. This number estimates the highest Thermal Sensor reading for the processor core when the system runs a similarly stressful application at its specified maximum operating room temperature. This value must not exceed the Thermal Reference Byte value. Write the Thermal Reference Byte reading on line G.

D. _________ Maximum reading from Thermal Sensor

E. + _______ Maximum operating temperature adjustment (from line C above)

F. _________ Maximum Thermal Sensor reading in a worst-case room ambient (line D plus line E)

G. _________ Thermal Reference Byte reading

Processors should not be run at temperatures higher than their maximum specified operating temperature, or failures may occur. Boxed processors remain within thermal specification as long as the Thermal Sensor reading never exceeds the Thermal Reference Byte.

If the value on line F is greater than the Thermal Reference Byte on line G, the processor core would exceed its maximum temperature and action is required: either the system airflow must be significantly improved, or the system's maximum operating room temperature must be lowered.

If the value on line F is less than or equal to the Thermal Reference Byte on line G, the system will keep the boxed processor within specification under similarly stressful conditions, even when the system operates in its warmest specified environment.

To summarize:
If the value on line F is greater than the Thermal Reference Byte, there are two options:

  1. Improve system airflow to lower the processor's fan inlet temperature (following the recommendations made earlier), and then retest the system.
     
  2. Choose a lower maximum operating room temperature for the system, bearing in mind the customer and the system's typical environment.

After implementing either option, repeat the calculation to verify the solution.
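The worksheet arithmetic (C = A - B, E = C, F = D + E, then compare F with G) can be sketched in Python. This is a minimal sketch; the class name `ThermalWorksheet` and the example numbers are hypothetical and only illustrate the calculation.

```python
from dataclasses import dataclass


@dataclass
class ThermalWorksheet:
    """Lines A, B, D and G of the worksheet; lines C, E and F are derived."""
    max_operating_room_c: float    # line A: chosen maximum operating room temperature
    room_at_end_of_test_c: float   # line B: room temperature recorded at end of test
    max_sensor_reading_c: float    # line D: highest Thermal Sensor reading
    thermal_reference_byte: int    # line G: Thermal Reference Byte reading

    @property
    def line_c(self) -> float:
        # C = A - B: how much warmer the worst-case room is than the test room
        return self.max_operating_room_c - self.room_at_end_of_test_c

    @property
    def line_f(self) -> float:
        # Line E is a copy of line C, so F = D + E = D + (A - B)
        return self.max_sensor_reading_c + self.line_c

    def within_spec(self) -> bool:
        # Pass when F does not exceed the Thermal Reference Byte (line G)
        return self.line_f <= self.thermal_reference_byte


if __name__ == "__main__":
    # Hypothetical numbers: 35 C target room, 22 C test room,
    # highest sensor reading 60, Thermal Reference Byte 75.
    ws = ThermalWorksheet(35, 22, 60, 75)
    print(ws.line_c, ws.line_f, ws.within_spec())
```

With these inputs, line C is 13 and line F is 73, which does not exceed 75, so the configuration passes. Raising line A to 40°C with the same test data gives F = 78, which fails and calls for one of the two options above.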

 

Testing hints
Use the following hints to avoid unnecessary thermal testing:

  1. When testing a system that supports more than one processor speed, test using the processor(s) that dissipate the most power, because they generate the most heat. By testing the warmest processor supported by the motherboard, you can avoid additional testing with processors that generate less heat in the same motherboard and chassis configuration.

    Power dissipation varies with processor speed and silicon stepping. To select the appropriate processor for system thermal testing, refer to Table 1 for the power dissipation of boxed Intel® Xeon® Scalable Processors. Boxed Intel® Xeon® Scalable Processors are marked with a five-character specification number (sSpec), usually beginning with the letter S.
  2. Thermal checkout with a new motherboard is not necessary if all of the following conditions are met:
    • The new motherboard is used with a previously tested chassis that worked with a similar motherboard
    • The previous test showed the configuration to provide adequate airflow
    • The processor is located in approximately the same place on both motherboards
    • A processor with the same or lower power dissipation will be used on the new motherboard
  3. Most systems are upgraded (additional RAM, adapter cards, drives, etc.) sometime during their life. Integrators should test systems with some expansion cards installed in order to simulate a system that has been upgraded. A thermal management solution that works well in a system that is heavily loaded does not need to be re-tested for lightly loaded configurations.
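Hint 2 is an all-or-nothing rule: a new motherboard can skip thermal checkout only when all four conditions hold. A minimal Python sketch of that rule follows; the `BoardSwap` record, its field names, and the example wattages are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class BoardSwap:
    """Facts about a proposed motherboard change (hypothetical record)."""
    chassis_tested_with_similar_board: bool  # chassis previously tested with a similar motherboard
    previous_test_adequate_airflow: bool     # that test showed adequate airflow
    processor_in_same_location: bool         # processor sits in about the same place on both boards
    new_processor_power_w: float             # power dissipation of the processor to be used
    tested_processor_power_w: float          # power dissipation of the processor in the earlier test


def thermal_retest_needed(swap: BoardSwap) -> bool:
    """Return True unless all four conditions in hint 2 are met."""
    conditions_met = (
        swap.chassis_tested_with_similar_board
        and swap.previous_test_adequate_airflow
        and swap.processor_in_same_location
        and swap.new_processor_power_w <= swap.tested_processor_power_w
    )
    return not conditions_met


if __name__ == "__main__":
    # Same chassis and layout, lower-power processor: no retest needed.
    print(thermal_retest_needed(BoardSwap(True, True, True, 77.0, 85.0)))
    # Higher-power processor than the one originally tested: retest.
    print(thermal_retest_needed(BoardSwap(True, True, True, 87.0, 85.0)))
```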

Thermal management specifications

What are the Intel® Xeon® Scalable Processors thermal specifications?

The Intel® Xeon® Scalable Processors Datasheet lists the power dissipation of Intel® Xeon® Scalable Processors at various operating frequencies, and Table 1 summarizes these values. Within a given stepping, a higher-frequency processor generally dissipates more power than a lower-frequency one, but power also varies with stepping. When building systems that will feature several operating frequencies, perform testing with the supported processor that has the highest thermal design power, which is usually the highest-frequency part. System integrators can also perform thermal testing using thermocouples to determine the temperature of the processor's integrated heat spreader (see the Intel® Xeon® Scalable Processors Datasheet for details).

Note: Because the PWT can be configured in either vacuum mode or pressure mode, take the duct inlet temperature at the inlet into the PWT, which may not be on the same side as the fan.

A simple measurement of the temperature of the air entering the fan heat sink can provide confidence in the system's thermal management. For Intel® Xeon® Scalable Processors, the measurement point is at the center of the fan hub, approximately 0.3 inches in front of the fan. Evaluating this data makes it possible to determine whether a system has sufficient thermal management for the boxed processor. The fan inlet temperature should not exceed 45°C under the maximum expected external ambient conditions (typically 35°C).
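One way to check a fan inlet measurement taken in a cooler test room against the 45°C limit is to project it to the maximum expected ambient. The following is a minimal Python sketch under a loudly stated assumption: that the inlet temperature rises degree-for-degree with room temperature, the same additive adjustment the worksheet applies on line C. The function names and example measurements are hypothetical.

```python
MAX_FAN_INLET_C = 45.0         # Table 1: maximum recommended fan inlet temperature
TYPICAL_MAX_AMBIENT_C = 35.0   # typical maximum expected external ambient


def projected_inlet_c(measured_inlet_c: float, room_during_test_c: float,
                      max_ambient_c: float = TYPICAL_MAX_AMBIENT_C) -> float:
    """Estimate the fan inlet temperature at the maximum expected ambient.

    Assumption: inlet temperature rises degree-for-degree with room
    temperature (the same additive adjustment as worksheet line C).
    """
    return measured_inlet_c + (max_ambient_c - room_during_test_c)


def fan_inlet_ok(measured_inlet_c: float, room_during_test_c: float,
                 max_ambient_c: float = TYPICAL_MAX_AMBIENT_C) -> bool:
    """True when the projected inlet temperature stays at or below 45 C."""
    projected = projected_inlet_c(measured_inlet_c, room_during_test_c, max_ambient_c)
    return projected <= MAX_FAN_INLET_C


if __name__ == "__main__":
    # Hypothetical measurement: 31 C at the fan hub in a 22 C room.
    print(projected_inlet_c(31.0, 22.0), fan_inlet_ok(31.0, 22.0))
```

In this example the projected inlet temperature is 44°C, just inside the limit; a 33°C reading in the same room would project to 46°C and fail.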

Table 1: Intel® Xeon® Scalable Processors Thermal Specifications (notes 1 and 3)

Processor Core Frequency (GHz) | Maximum Case Temperature (°C) | Maximum Recommended Fan Inlet Temperature (°C) | Processor Thermal Design Power (W)
1.40 | 69 | 45 | 56.0
1.50 | 70 | 45 | 59.2
1.70 | 73 | 45 | 65.8
1.80 (note 2) | 69 | 45 | 55.8
2.00 | 78 | 45 | 77.2
2.00 (note 2) | 70 | 45 | 58
2.20, B0 step (note 2) | 72 | 45 | 61
2.20, C1 step (note 2) | 75 | 45 | 61
2.40, B0 step (note 2) | 71 | 45 | 65
2.40, C1 step (note 2) | 74 | 45 | 65
2.40, M0 step (notes 2, 4) | 72 | 45 | 77
2.60 (note 2) | 74 | 45 | 71
2.66, C1 step (note 2) | 74 | 45 | 71
2.66, M0 step (note 2) | 72 | 45 | 77
2.80, C1 step (note 2) | 75 | 45 | 74
2.80, M0 step (notes 2, 4) | 72 | 45 | 77
3.00 (note 2) | 73 | 45 | 85
3.06, C1 step (note 2) | 73 | 45 | 85
3.06, M0 step (note 2) | 70 | 45 | 87
3.20, M0 step (notes 2, 4) | 71 | 45 | 92
 
Notes
  1. These specifications are from the Intel® Xeon® Scalable Processors Datasheet.
  2. This processor is a die shrink to 0.13-micron process technology.
  3. Processors with a 400 MHz Front Side Bus and with a 533 MHz Front Side Bus have identical thermal characteristics.
  4. These processors include versions with 1-MB integrated L3 (iL3) cache and, for the 3.2 GHz processor only, 2-MB iL3 cache.

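Because power dissipation depends on stepping as well as frequency, choosing the worst-case test processor is a maximum over thermal design power rather than over frequency. A minimal Python sketch using values transcribed from Table 1 follows; `Row` and `highest_power` are hypothetical names, and the list of processors a particular motherboard supports has to come from the board vendor.

```python
from typing import Iterable, NamedTuple


class Row(NamedTuple):
    frequency_ghz: float
    stepping: str        # "" where Table 1 gives no stepping
    die_shrink: bool     # True for rows marked with note 2
    max_case_c: int      # maximum case temperature (C)
    tdp_w: float         # processor thermal design power (W)


# Values transcribed from Table 1 (maximum recommended fan inlet is 45 C for every row).
TABLE_1 = [
    Row(1.40, "", False, 69, 56.0),
    Row(1.50, "", False, 70, 59.2),
    Row(1.70, "", False, 73, 65.8),
    Row(1.80, "", True, 69, 55.8),
    Row(2.00, "", False, 78, 77.2),
    Row(2.00, "", True, 70, 58.0),
    Row(2.20, "B0", True, 72, 61.0),
    Row(2.20, "C1", True, 75, 61.0),
    Row(2.40, "B0", True, 71, 65.0),
    Row(2.40, "C1", True, 74, 65.0),
    Row(2.40, "M0", True, 72, 77.0),
    Row(2.60, "", True, 74, 71.0),
    Row(2.66, "C1", True, 74, 71.0),
    Row(2.66, "M0", True, 72, 77.0),
    Row(2.80, "C1", True, 75, 74.0),
    Row(2.80, "M0", True, 72, 77.0),
    Row(3.00, "", True, 73, 85.0),
    Row(3.06, "C1", True, 73, 85.0),
    Row(3.06, "M0", True, 70, 87.0),
    Row(3.20, "M0", True, 71, 92.0),
]


def highest_power(rows: Iterable[Row]) -> Row:
    """Return the row with the highest thermal design power."""
    return max(rows, key=lambda row: row.tdp_w)


if __name__ == "__main__":
    print(highest_power(TABLE_1))
    # Among parts at or below 2.20 GHz, the worst case is not the fastest part.
    print(highest_power(r for r in TABLE_1 if r.frequency_ghz <= 2.20))
```

Across the whole table the worst case is the 3.20 GHz M0-step part at 92 W. Restricting the list to parts at or below 2.20 GHz shows why frequency alone is not enough: the 2.00 GHz part without note 2 (77.2 W) dissipates more than either 2.20 GHz part (61 W).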
What are the chassis recommendations?

System integrators must use an ATX chassis that has been specifically designed to support Intel® Xeon® Scalable Processors. Such chassis ship with proper mechanical and electrical support for the processor and offer improved thermal performance. Intel has tested chassis for use with Intel® Xeon® Scalable Processors using enabled third-party boards; chassis that pass this thermal testing give system integrators a starting point for deciding which chassis to evaluate.

 

Note: For demos on the LGA3647 Socket, review: