Learn how to benchmark Ultralytics YOLO11, compare performance across devices, and explore different export formats to optimize speed, accuracy, and efficiency.
With the growing number of AI models available today, selecting the most suitable one for your specific AI application is essential to achieving accurate and reliable results. Each model varies in speed, accuracy, and overall performance. So, how can we determine which model is best suited for a given task? This is especially important for real-time systems such as autonomous vehicles, security solutions, and robotics, where rapid and dependable decision-making is critical.
Benchmarking helps answer this question by evaluating a model under different conditions. It provides insights into how well the model performs across various hardware setups and configurations, enabling more informed decision-making.
For instance, Ultralytics YOLO11 is a computer vision model that supports various visual data analysis tasks like object detection and instance segmentation. To fully understand its capabilities, you can benchmark its performance on different setups to see how it will handle real-world scenarios.
In this article, we’ll explore how to benchmark Ultralytics YOLO models like YOLO11, compare their performance across various hardware, and see how different export formats impact their speed and efficiency. Let’s get started!
When it comes to using a Vision AI model in a real-world application, how can you tell if it’ll be fast, accurate, and reliable enough? Benchmarking the model can provide insights to answer this. Model benchmarking is the process of testing and comparing different AI models to see which one performs best.
It involves setting a baseline for comparison, choosing the right performance measures (like accuracy or speed), and testing all models under the same conditions. The results help identify each model’s strengths and weaknesses, making it easier to decide which one is best suited for your specific AI solution. In particular, a benchmark dataset is often used to provide fair comparisons and assess how well a model performs in different real-world scenarios.
A clear example of why benchmarking is vital is in real-time applications like surveillance or robotics, where even slight delays can impact decision-making. Benchmarking helps evaluate whether a model can process images quickly while still delivering reliable predictions.
It also plays a key role in identifying performance bottlenecks. If a model runs slowly or uses excessive resources, benchmarking can reveal whether the issue stems from hardware limitations, model configurations, or export formats. These insights are crucial for selecting the most effective setup.
Model benchmarking, evaluation, and testing are popular AI terms that are often used together. While related, they are not the same and serve different purposes. Model testing checks how well a single model performs by running it on a test dataset and measuring factors like accuracy and speed. Meanwhile, model evaluation goes a step further by analyzing the results to understand the model’s strengths, weaknesses, and how well it works in real-world situations. Both focus on just one model at a time.
Model benchmarking, however, compares multiple models side by side using the same tests and datasets. It helps find out which model works best for a specific task by highlighting differences in accuracy, speed, and efficiency between them. While testing and evaluation focus on a single model, benchmarking helps pick the right one (or the best one) by comparing different options fairly.
Ultralytics YOLO11 is a reliable Vision AI model that is designed to perform various computer vision tasks accurately. It improves upon earlier YOLO model versions and is packed with features that can help solve real-world problems. For example, it can be used to detect objects, classify images, segment regions, track movements, and more. It can also be used in applications across many industries, from security to automation and analytics.
One of the key benefits of Ultralytics YOLO11 is how easy it is to use. With just a few lines of code, anyone can integrate it into their AI projects without dealing with complicated setups or needing advanced technical expertise.
It also works smoothly across different hardware, running efficiently on CPUs (Central Processing Units), GPUs (Graphics Processing Units), and other specialized AI accelerators. Whether deployed on edge devices or cloud servers, it delivers strong performance.
YOLO11 is available in various model sizes, each optimized for different tasks. Benchmarking helps determine which version best fits your specific needs. For instance, one key insight benchmarking can reveal is that smaller models, such as nano or small, tend to run faster but may trade off some accuracy.
Now that we’ve covered what benchmarking is and why it matters, let’s walk through how you can benchmark YOLO models like YOLO11 and evaluate their efficiency to gather valuable insights.
To get started, install the Ultralytics Python package by running "pip install ultralytics" in your terminal or command prompt. If you run into any issues during installation, check out our Common Issues Guide for troubleshooting tips.
Once the package is installed, you can easily benchmark YOLO11 with just a few lines of Python code:
from ultralytics.utils.benchmarks import benchmark

# Benchmark on GPU (device=0 selects the first GPU)
benchmark(model="yolo11n.pt", data="coco8.yaml", imgsz=640, half=False, device=0)
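If a GPU isn’t available, or you want to see how half precision (FP16) affects speed, you can adjust the device and half arguments. Here’s a quick sketch using the same model and dataset as above:

from ultralytics.utils.benchmarks import benchmark

# Benchmark on the CPU instead of a GPU
benchmark(model="yolo11n.pt", data="coco8.yaml", imgsz=640, half=False, device="cpu")

# Benchmark with half precision (FP16) on the first GPU; supported GPUs
# can run noticeably faster at FP16, sometimes with a small accuracy trade-off
benchmark(model="yolo11n.pt", data="coco8.yaml", imgsz=640, half=True, device=0)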
When you run the code shown above, it measures how fast the model processes images, how many frames it can handle per second (FPS), and how accurately it detects objects.
The mention of “coco8.yaml” in the code refers to a dataset configuration file for COCO8, a small eight-image sample of the full COCO (Common Objects in Context) dataset, often used for testing and experimentation.
If you're testing YOLO11 for a specific application, such as traffic monitoring or medical imaging, using a relevant dataset (e.g., a traffic dataset or medical dataset) will give more accurate insights. Benchmarking with COCO provides a general idea of performance, but for best results, you can choose a dataset that reflects your actual use case.
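For instance, here’s a minimal sketch of benchmarking against a custom dataset. Note that "my_traffic_dataset.yaml" is a hypothetical placeholder for your own dataset configuration file:

from ultralytics.utils.benchmarks import benchmark

# Benchmark against a custom dataset config
# ("my_traffic_dataset.yaml" is a hypothetical placeholder for your own config file)
benchmark(model="yolo11n.pt", data="my_traffic_dataset.yaml", imgsz=640, device=0)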
Once YOLO11 has been benchmarked, the next step is to interpret the results. The benchmark output includes a range of metrics that help evaluate how well YOLO11 performs in terms of accuracy and speed.
Here are some notable YOLO11 benchmarking metrics to look out for: mAP50-95 (mean average precision, which reflects detection accuracy), inference time per image in milliseconds, and FPS (frames per second, the model’s throughput).
Looking at the benchmark results alone only tells part of the story. To get a better understanding of performance, it’s helpful to compare different settings and hardware options, such as the device you run on (CPU vs. GPU), precision settings like FP16 (half precision), and the export format used for deployment.
The Ultralytics Python package allows you to convert YOLO11 models into different formats that run more efficiently on specific hardware, improving both speed and memory usage. Each export format is optimized for different devices.
For instance, the ONNX format can speed up performance across various environments, while OpenVINO improves efficiency on Intel hardware. Meanwhile, formats like CoreML and TF SavedModel are geared toward Apple devices and TensorFlow-based mobile applications, respectively.
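As a quick sketch, exporting to these formats takes one line each with the Ultralytics Python API (assuming the corresponding export dependencies are installed):

from ultralytics import YOLO

model = YOLO("yolo11n.pt")

# Export to formats optimized for different deployment targets
model.export(format="onnx")      # broad cross-platform support
model.export(format="openvino")  # optimized for Intel hardware
model.export(format="coreml")    # Apple devices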
Let’s take a look at how you can benchmark YOLO11 in a specific format. The code below benchmarks YOLO11 in the ONNX format, which is widely used for running AI models on both CPUs and GPUs.
from ultralytics.utils.benchmarks import benchmark
# Benchmark a specific export format (e.g., ONNX)
benchmark(model="yolo11n.pt", data="coco8.yaml", imgsz=640, format="onnx")
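To compare several formats side by side, you can loop over them. Here’s a minimal sketch, assuming the relevant runtimes (ONNX Runtime, OpenVINO, and TensorRT) are installed:

from ultralytics.utils.benchmarks import benchmark

# Benchmark several export formats under identical conditions
# ("engine" is the TensorRT format and requires an NVIDIA GPU)
for fmt in ["onnx", "openvino", "engine"]:
    benchmark(model="yolo11n.pt", data="coco8.yaml", imgsz=640, device=0, format=fmt)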
Beyond the benchmarking results, choosing the right format depends on your system’s specifications and deployment needs. For example, self-driving cars need fast object detection, and if you plan to use NVIDIA GPUs to accelerate performance, the TensorRT format is the ideal choice for running YOLO11.
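As a sketch, exporting YOLO11 to TensorRT looks like this (assuming an NVIDIA GPU with TensorRT installed):

from ultralytics import YOLO

# Export to TensorRT ("engine" format) for accelerated inference on NVIDIA GPUs
model = YOLO("yolo11n.pt")
model.export(format="engine")  # produces a yolo11n.engine file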
The Ultralytics Python package makes benchmarking YOLO11 easy by providing simple commands that can handle performance testing for you. With just a few steps, you can see how different setups affect the speed and accuracy of models, helping you make informed choices without needing deep technical expertise.
The right hardware and settings can also make a huge difference. Adjusting parameters like the model size and dataset lets you fine-tune YOLO11 for the best performance, whether you're running it on a high-end GPU or locally on an edge device.
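For example, here’s a simple sketch comparing model sizes on the same dataset and hardware (swap the device argument for your own setup):

from ultralytics.utils.benchmarks import benchmark

# Compare YOLO11 model sizes: nano, small, and medium
# (smaller models tend to be faster; larger models tend to be more accurate)
for weights in ["yolo11n.pt", "yolo11s.pt", "yolo11m.pt"]:
    benchmark(model=weights, data="coco8.yaml", imgsz=640, device=0)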
Connect with our community and explore cutting-edge AI projects on our GitHub repository. Learn about the impact of AI in agriculture and the role of computer vision in manufacturing through our solutions pages. Explore our licensing plans and start your AI journey now!