
What is Arithmetic Intensity?

Arithmetic intensity is the ratio of computational operations (additions, multiplications and so on) an algorithm performs to the memory operations (loads and stores) it requires, typically expressed as floating-point operations (FLOPs) per byte of data moved. It indicates how well an algorithm can exploit available computational power versus how often it must wait on memory, which makes it a useful guide when optimizing performance for parallel computing and high-performance applications.

  • High arithmetic intensity means the program performs many calculations per byte of data transferred.
  • Low arithmetic intensity means the program spends more of its time moving data between memory and the CPU/GPU, which can become a bottleneck and slow performance.
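As a rough sketch, the ratio can be computed directly from operation and byte counts. The function below and the FP32 byte accounting for a vector add are illustrative, not taken from this article:

```python
def arithmetic_intensity(flops: float, bytes_moved: float) -> float:
    """FLOPs performed per byte moved to/from memory."""
    return flops / bytes_moved

# Elementwise vector add c[i] = a[i] + b[i] over n FP32 values:
# 1 FLOP per element; 8 bytes read + 4 bytes written per element.
n = 1_000_000
print(arithmetic_intensity(flops=n, bytes_moved=12 * n))  # ~0.083 FLOP/byte: low, memory-bound
```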

GPU Architecture and the Role of Arithmetic Intensity

A key factor in how much of a GPU's performance an application can actually realize is the arithmetic intensity of its workload.

GPUs are designed to process large-scale parallel tasks with a high degree of computational complexity. Modern GPUs have thousands of smaller compute cores, making them ideal for tasks that can be broken down into many independent computations. Therefore, their performance depends on the ability to keep these cores busy.

High arithmetic intensity ensures that these cores are not waiting around for data from memory but are continuously performing computations, improving the overall throughput and efficiency of the GPU.

Memory access is often a bottleneck, because data transfers between the GPU’s memory and its cores are relatively slow compared to the speed of the cores themselves. If an application has low arithmetic intensity, it will require frequent memory accesses, which can lead to a bottleneck. A high arithmetic intensity minimizes this problem because the amount of data transfer is reduced relative to the amount of work done per data element.

Key Factors Impacting Arithmetic Intensity

Arithmetic intensity is a key factor in optimizing performance, particularly in GPU-accelerated applications. Here are the key factors impacting arithmetic intensity:

Algorithm Structure – The design of the algorithm determines how much computation is required relative to data movement.

  • Algorithms with more intensive mathematical operations (e.g., matrix multiplication, Fourier transforms) typically have higher arithmetic intensity.
  • Algorithms that require a lot of data fetching and moving (e.g., data-intensive tasks like sorting or large-scale I/O operations) may have lower arithmetic intensity.
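To make the contrast concrete, here is a back-of-the-envelope comparison. The byte accounting assumes FP32 elements and that each array or matrix moves through memory exactly once, which is an idealization:

```python
def ai_elementwise_add(n, bytes_per_elem=4):
    # c = a + b: n FLOPs; 3n elements moved (two reads, one write)
    return n / (3 * n * bytes_per_elem)

def ai_matmul(m, k, n, bytes_per_elem=4):
    # C = A @ B: ~2*m*n*k FLOPs; ideally each matrix is moved once
    return (2 * m * n * k) / ((m * k + k * n + m * n) * bytes_per_elem)

print(ai_elementwise_add(1024 ** 2))  # ~0.083 FLOP/byte
print(ai_matmul(1024, 1024, 1024))    # ~170 FLOP/byte
```

The matrix multiplication reuses each element many times, which is why its intensity is orders of magnitude higher than the elementwise operation's.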

Memory Access Patterns – The way memory is accessed influences arithmetic intensity.

  • Efficient memory access patterns that minimize data fetching overhead (e.g., by making use of caches or coalescing memory accesses in GPUs) increase arithmetic intensity.
  • Disorganized memory access patterns or excessive random memory access can reduce the ratio of computation to memory transfer, lowering arithmetic intensity.
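A simplified model shows how strided access inflates memory traffic and so lowers effective intensity. The 64-byte line size and the assumption that a whole cache line is fetched per access are illustrative, not from this article:

```python
CACHE_LINE = 64  # bytes; a typical CPU cache line / GPU memory transaction size

def effective_bytes_read(n_elems, elem_bytes=4, stride_elems=1):
    """Bytes actually transferred when reading n_elems FP32 values,
    assuming whole cache lines are fetched (simplified model)."""
    if stride_elems * elem_bytes >= CACHE_LINE:
        return n_elems * CACHE_LINE   # one full line fetched per element used
    return n_elems * elem_bytes       # contiguous: every fetched byte is used

n, flops = 1_000_000, 1_000_000       # say, one FLOP per element read
print(flops / effective_bytes_read(n, stride_elems=1))   # 0.25 FLOP/byte
print(flops / effective_bytes_read(n, stride_elems=16))  # ~0.016 FLOP/byte
```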

Data Locality – Data locality refers to the use of data that is close to the processor. Temporal locality (accessing the same data multiple times in a short period) and spatial locality (accessing data that is physically close together in memory) help minimize memory latency, allowing for more computations to occur relative to memory transfers.

  • High locality tends to increase arithmetic intensity, while poor locality can significantly reduce it.

Processor Architecture – The characteristics of the processor or GPU, such as the number of computational units, the cache hierarchy and the memory bandwidth, determine how much arithmetic intensity a workload needs in order to keep the hardware busy.

  • Hardware with plentiful memory bandwidth relative to its compute power can keep its units fed even at relatively low arithmetic intensity.
  • When memory bandwidth (or the number of computational units) is the limiting resource, computations stall waiting for data and achieved throughput drops.
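One way to quantify this balance point is the hardware's "machine balance", the ridge point of the roofline model. The GPU specs below are hypothetical, chosen only for illustration:

```python
def machine_balance(peak_flops_per_s, mem_bw_bytes_per_s):
    """Arithmetic intensity (FLOP/byte) at which a workload can shift
    from memory-bound to compute-bound on this hardware."""
    return peak_flops_per_s / mem_bw_bytes_per_s

# Hypothetical GPU: 50 TFLOP/s FP32 peak, 1 TB/s memory bandwidth.
print(machine_balance(50e12, 1e12))  # 50.0 FLOP/byte
```

A workload whose arithmetic intensity falls below this ridge point cannot saturate the compute units, no matter how well it is tuned otherwise.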

Data Size and Problem Scale – The size of the data being processed influences arithmetic intensity.

  • Larger problems often allow each data element to be reused in more computations (matrix multiplication is a classic example), which can increase arithmetic intensity.

Compiler Optimizations – The ability of compilers to optimize code for memory access patterns, loop unrolling, vectorization and parallelism can affect arithmetic intensity.

  • Compiler optimizations that increase parallel execution or minimize unnecessary data movement can lead to higher arithmetic intensity, making better use of available computational resources.

Parallelism – The level of parallelism in a system (e.g., multi-core processors, GPUs, etc.) influences arithmetic intensity.

  • In highly parallel systems, many operations are carried out simultaneously, raising the rate of computation the memory system must sustain.
  • If the workload does not expose enough independent work to keep all computation units busy, achieved throughput drops.

Data Transfer Overhead – The time spent on moving data (and not calculations) impacts arithmetic intensity.

  • If the data transfer overhead (such as communication between CPU and GPU or between nodes in a distributed system) is high, it reduces arithmetic intensity.
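This effect is easy to account for: include the host-device traffic in the denominator. The kernel's FLOP and byte counts below are made-up numbers for illustration:

```python
def effective_ai(flops, device_bytes, transfer_bytes):
    """Arithmetic intensity once host<->device (e.g., PCIe) traffic is counted."""
    return flops / (device_bytes + transfer_bytes)

flops = 2e9             # hypothetical kernel: 2 GFLOP per launch
device_bytes = 40e6     # 40 MB read/written in GPU memory
transfer_bytes = 400e6  # 400 MB copied over PCIe per launch
print(flops / device_bytes)                               # 50.0 FLOP/byte on-device
print(effective_ai(flops, device_bytes, transfer_bytes))  # ~4.5 FLOP/byte end to end
```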

Importance of Measuring Arithmetic Intensity in GPU Workloads

Here’s why measuring arithmetic intensity is important:

Identifying the Root Cause of Bottlenecks 

  • A workload with low arithmetic intensity (i.e., more memory accesses than computations) tends to be memory-bound, meaning that memory bandwidth becomes the limiting factor in performance.
  • High arithmetic intensity indicates that the workload is compute-bound, meaning the processor is the limiting factor.
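The roofline model makes this classification explicit: attainable throughput is capped either by peak compute or by memory bandwidth times arithmetic intensity. The hardware numbers below are hypothetical:

```python
def attainable_flops(ai, peak_flops, mem_bw):
    """Roofline model: performance is capped by compute or by bandwidth * AI."""
    return min(peak_flops, mem_bw * ai)

def bound_by(ai, peak_flops, mem_bw):
    return "memory-bound" if mem_bw * ai < peak_flops else "compute-bound"

# Hypothetical GPU: 50 TFLOP/s peak, 1 TB/s bandwidth (ridge point: 50 FLOP/byte)
print(bound_by(0.08, 50e12, 1e12))   # memory-bound (e.g., a vector add)
print(bound_by(170.0, 50e12, 1e12))  # compute-bound (e.g., a large matmul)
```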

Identifying Inefficient Memory Access Patterns: Low arithmetic intensity can indicate inefficient memory access patterns. For example, workloads that access memory too frequently or in inefficient ways (like non-contiguous accesses) will incur high memory latency, which leads to slower performance.

Maximizing Throughput: Measuring arithmetic intensity helps developers adjust workloads to better align with the capabilities of the hardware and execute faster.

Balancing Compute and Memory Resources: Adjusting arithmetic intensity helps ensure that both compute and memory resources are fully utilized, enhancing overall cost-effectiveness.

Guiding Algorithm Design: Measuring the arithmetic intensity of different algorithms can help guide the choice of algorithms better suited to GPU processing, or motivate changes in algorithm structure that improve their arithmetic intensity.

Calculating Arithmetic Intensity for a Specific Model

Arithmetic intensity helps determine which GPU you need for your LLM. The formula for arithmetic intensity calculation is:

\[
AI = \frac{\text{FLOPs}}{\text{Bytes Transferred to/from Memory}}
\]

For the matrix multiplication

\[
C = A \times B
\]

where A is an m×k matrix, B is a k×n matrix and C is an m×n matrix, the operation performs roughly 2mnk floating-point operations (one multiply and one add per output term). In the ideal case where each matrix moves through memory exactly once:

\[
AI = \frac{2mnk}{\text{bytes per element} \times (mk + kn + mn)}
\]
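As an illustrative sketch of this calculation (assuming ~2mnk FLOPs, each matrix moved through memory once and FP16 elements; the layer dimensions are hypothetical), the same formula shows why batch size matters so much for LLM inference:

```python
def matmul_ai(m, k, n, bytes_per_elem=2):  # 2 bytes per element for FP16
    flops = 2 * m * n * k                  # one multiply + one add per output term
    bytes_moved = bytes_per_elem * (m * k + k * n + m * n)
    return flops / bytes_moved

# Hypothetical transformer projection with k = n = 4096:
print(matmul_ai(m=1, k=4096, n=4096))     # ~1 FLOP/byte: single-token decode, memory-bound
print(matmul_ai(m=2048, k=4096, n=4096))  # 1024 FLOP/byte: large batch/prefill, compute-bound
```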

 

Read more about how to use the arithmetic intensity formula and how arithmetic intensity relates to choosing your LLM in this blog.

Optimizing Arithmetic Intensity

Raising arithmetic intensity can improve the performance of your GPU workloads. Here’s how to optimize it:

    • Increase Computation per Data Load – Combine multiple loops operating on the same data to reduce memory accesses. If an operation is lightweight (e.g., simple arithmetic), recomputing values can be cheaper than fetching them from memory.
    • Optimize Data Access Patterns – Access memory in a contiguous manner to take advantage of cache lines. Reuse recently loaded data as much as possible.
    • Use Blocking and Tiling Techniques – Process small chunks of data that fit in cache before moving to the next block, reducing memory traffic.
    • Optimize Data Layout and Structures – Prefer a structure of arrays when elements are processed individually and an array of structures when an element’s fields are accessed together. Use padded or aligned memory where appropriate.
    • Leverage Vectorization and SIMD Instructions – Use Single Instruction Multiple Data (SIMD) to process multiple values simultaneously.
    • Reduce Redundant Memory Transfers – Preload data into cache before it is needed. Avoid unnecessary memory writes or loads by restructuring code logic.
    • Leverage High-Bandwidth Memory & Compute Hierarchies – Use GPU shared memory or register-level optimization for high-performance kernels. Optimize CPU-GPU memory transfers using pinned memory and unified memory models (e.g., CUDA’s Unified Memory).
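As one example of increasing computation per data load, fusing y = relu(a * x + b) into a single pass halves memory traffic relative to three separate passes. The byte accounting below assumes FP32 arrays and no caching between passes; it is an illustration, not a benchmark:

```python
def unfused_bytes(n, elem=4):
    # Three passes over length-n FP32 arrays, temporaries written to memory:
    #   t1 = a * x        reads a, x (2n); writes t1 (n)
    #   t2 = t1 + b       reads t1, b (2n); writes t2 (n)
    #   y  = max(t2, 0)   reads t2 (n); writes y (n)
    return 8 * n * elem

def fused_bytes(n, elem=4):
    # One fused pass: read a, x, b once each; write y once.
    return 4 * n * elem

n = 1_000_000
flops = 3 * n  # one multiply, one add, one max per element
print(flops / unfused_bytes(n))  # ~0.094 FLOP/byte
print(flops / fused_bytes(n))    # ~0.19 FLOP/byte: twice the intensity
```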

Discover how arithmetic intensity can help you choose and optimize your LLMs.

 
