Reports from OSPM 2025, day one

Published: 2025-05-21

The seventh edition of the Power Management and Scheduling in the Linux Kernel (known as "OSPM") Summit took place on March 18-20, 2025. It was organized by Juri Lelli, Frauke Jäger, Tommaso Cucinotta, and Lorenzo Pieralisi, and was hosted by Linutronix at Alte Fabrik, Uhldingen-Mühlhofen, Germany. The event was sponsored by Linutronix, Arm, and the Scuola Superiore Sant'Anna in Pisa.

The following contains summaries of the sessions; each summary is written by the session presenter. A recording of the entire summit is available as a playlist on the RetisLab YouTube channel. Photos of the event can be found on this Google Photos page or as a 965MB zip archive. The full set of slides from the sessions is also available.

Scheduling for core asymmetry and shared resources
Speaker: Morten Rasmussen

The Linux scheduler currently represents CPU throughput (capacity) and efficiency through an energy model (EM), which was introduced with energy-aware scheduling (EAS). The EAS EM has been around for years and is quite simple relative to the complexity of modern systems. The EM, therefore, often fails to capture platform-specific constraints, leading to suboptimal performance and efficiency. The aim of the talk was to discuss potential ways forward to close the gap between the scheduler and power-management frameworks in the mainline kernel and the increasingly complex solutions developed by vendors out of tree.

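As a rough illustration of what such an EM contains, here is a minimal Python sketch, not kernel code and with all numbers invented, of the cost-table approach EAS uses: each performance state carries a precomputed cost value, so the estimated energy of placing utilization in a performance domain reduces to a table lookup plus a multiplication.

```python
# Minimal sketch of an EAS-style energy model (not kernel code; numbers invented).
SCHED_CAPACITY_SCALE = 1024

# One entry per performance state: (frequency in kHz, power in mW).
PERF_STATES = [(500_000, 100), (1_000_000, 300), (1_500_000, 700)]

def precompute_costs(states):
    # The kernel precomputes cost = power * f_max / f for each state, so
    # the energy estimate below no longer depends on the frequency itself.
    f_max = states[-1][0]
    return [(f, p, p * f_max // f) for (f, p) in states]

def estimate_energy(states, total_util, cpu_capacity):
    # Pick the lowest-frequency state able to serve the utilization, then
    # estimate energy as cost * util / SCHED_CAPACITY_SCALE.
    f_max = states[-1][0]
    required = f_max * total_util // cpu_capacity
    for f, _p, cost in states:
        if f >= required:
            break
    return cost * total_util // SCHED_CAPACITY_SCALE

states = precompute_costs(PERF_STATES)
print(estimate_energy(states, 400, SCHED_CAPACITY_SCALE))  # → 175
```

The real kernel code (see em_cpu_energy()) sums per-CPU utilization across a performance domain; the point here is only the shape of the computation.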
Previous studies have shown that power consumption and efficiency vary with the workload, which is not surprising, but the variation is significant and might be worth optimizing for. In mobile systems running Android, this and other aspects of optimization have been addressed by vendors with kernel modifications loaded in modules that attach functions to Android-specific vendor hooks. The number of vendor hooks in Android has grown to a level where most key scheduler mechanisms can be overridden by vendor modules.

A look at a common dynamic voltage and frequency scaling (DVFS) implementation for mobile systems and expected upcoming implementations of DVFS with shared compute resources illustrates how the current CPU-frequency and scheduler abstractions fall short. In the first example, a subset of CPUs shares a clock frequency with the interconnect and L3 cache, which means that L3 cache DVFS scaling impacts CPU efficiency and performance directly. In the second example, DVFS scaling is further complicated by having a shared compute unit used transparently by all CPUs in the system. Modeling the clock-frequency dependencies of both examples and taking these into account for task placement is nontrivial. Often, simple platform-specific heuristics can help, while coming up with a generic, future-proof EM and EAS policy is virtually impossible.

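To make the first example concrete, here is a toy Python model, with all constants invented, of a domain in which the CPUs and the L3 share one clock: a single busy CPU drags the shared clock, and therefore the L3's power, up for every CPU in the domain, which is exactly the kind of coupling the current EM does not express.

```python
# Toy model of a shared CPU/L3 clock domain (all constants invented).
def domain_power(cpu_utils, freq_ghz):
    cpu_dyn = sum(u * freq_ghz ** 2 for u in cpu_utils)  # per-CPU dynamic power
    l3_dyn = 50 * freq_ghz ** 2     # L3/interconnect runs at the same clock
    return cpu_dyn + l3_dyn

# One busy CPU forces the whole domain, L3 included, to a high frequency:
print(domain_power([800, 100, 100, 100], 2.0))  # → 4600.0
# Migrating that task elsewhere would let this domain clock down:
print(domain_power([100, 100, 100, 100], 1.0))  # → 450.0
```

A task-placement decision that looks at per-CPU power alone misses the shared L3 term entirely.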
Two potential paths forward to improve mainline Linux were discussed:

  • Improve EAS and EM to model the platform in more detail. This was considered mostly unrealistic.

  • Work on expanding the architecture-specific callbacks from the scheduler to give architectures more influence on task-placement decisions. The scheduler already has a few callbacks to prioritize CPUs for certain systems. This could be extended to provide platform-specific preferred CPUs.

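The scheduler's existing hooks of this kind include arch_asym_cpu_priority(), which x86 uses for ITMT. The extension discussed might look roughly like this sketch, in Python with an invented function name and policy, purely for illustration:

```python
# Purely illustrative sketch; the function name and policy are invented.
SCHED_CAPACITY_SCALE = 1024

def arch_preferred_cpu(task_util, prev_cpu):
    """Platform override steering small tasks to an efficient cluster (CPUs 0-3)."""
    if task_util < SCHED_CAPACITY_SCALE // 4:   # small task
        return prev_cpu % 4                     # prefer the efficient cluster
    return prev_cpu                             # no platform preference

print(arch_preferred_cpu(100, 6))  # → 2
print(arch_preferred_cpu(512, 6))  # → 6
```

The generic scheduler would keep its current behavior as the default; only platforms with something to say would override the callback.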
In the discussion, the gap between the mainline, Android, and sched_ext communities was acknowledged. Adding architecture-specific callbacks seemed the most promising approach, although concrete proposals need to be shared for discussion.

CAS/EAS on Intel client platforms

Speaker: Rafael J. Wysocki

Wysocki took the stage to discuss advances in implementing energy-aware scheduling on Intel hybrid chips. He started with a recap of two OSPM-summit 2024 presentations, one from himself and one from Ricardo Neri, that focused on the problems this effort was facing. Namely, implementing full scale invariance on Intel hybrid processors was problematic, but that hurdle has been overcome since last year's summit. There were also problems with creating energy models for those chips. The latter is still an issue but, fortunately, it can be addressed by using artificial energy models containing abstract cost values rather than any remotely realistic power numbers. He has been working on this problem recently.

Next, Wysocki described scale invariance on the processors in question. It has two parts, capacity invariance and frequency invariance, each of which is implemented with the help of a function returning values between zero and SCHED_CAPACITY_SCALE, inclusive. One of them returns the CPU capacity relative to the capacity of the most performant CPU in the system, and the other returns the CPU frequency relative to the one at which the full capacity is reached.

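A minimal sketch of those two factors, in Python with invented numbers; each is a fixed-point ratio against SCHED_CAPACITY_SCALE:

```python
SCHED_CAPACITY_SCALE = 1024

def capacity_scale(cpu_capacity, max_capacity):
    # This CPU's capacity relative to the most performant CPU in the system.
    return cpu_capacity * SCHED_CAPACITY_SCALE // max_capacity

def frequency_scale(curr_freq, full_capacity_freq):
    # Current frequency relative to the one where full capacity is reached.
    return min(SCHED_CAPACITY_SCALE,
               curr_freq * SCHED_CAPACITY_SCALE // full_capacity_freq)

# An E-core with half the capacity of the biggest core, running at 75% of
# its full-capacity frequency:
print(capacity_scale(512, 1024))              # → 512
print(frequency_scale(1_500_000, 2_000_000))  # → 768
```

The load tracking (PELT) signals are multiplied by both factors, so a task's utilization means the same thing regardless of which CPU, and at what frequency, it was measured on.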
Generally, on x86 processors, the frequency-invariance part is derived from the values of the APERF and MPERF performance counters; this has been the case for several years. MPERF counts at a certain reference frequency, while APERF counts at the current frequency of the CPU. Snapshot them at two points in time, compute the deltas, divide the APERF delta by the MPERF delta, and you get the average CPU frequency relative to the MPERF counting frequency. However, this frequency needs to be relative to the frequency at which the full CPU capacity is reached.

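The calculation just described can be sketched as follows; the counter snapshot values are invented:

```python
# Sketch of the APERF/MPERF ratio; counter snapshot values are invented.
def avg_freq_ratio(aperf0, mperf0, aperf1, mperf1):
    # MPERF ticks at a fixed reference frequency, APERF at the actual one,
    # so the ratio of the deltas is the average frequency over the interval,
    # relative to the reference frequency.
    return (aperf1 - aperf0) / (mperf1 - mperf0)

# The CPU averaged 1.5x the reference frequency between the two snapshots:
print(avg_freq_ratio(1000, 2000, 4000, 4000))  # → 1.5
```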
On hybrid Intel processors, the HWP (hardware P-state) interface is used and the frequency in question is proportional to the value of the HIGHEST_PERF field in the HWP capabilities (HWP_CAP) register, but that value is not in frequency units. It is an HWP performance level; the frequency corresponding to it can be obtained with the help of the

