Programming xcore with SMP FreeRTOS

Multi-core architectures offer compelling advantages in performance scaling, energy efficiency and cost, and have become a mainstay of embedded system design over the past 10 years. Nonetheless, programming multi-core architectures still has a reputation for being time-consuming, complex and costly. In this blog post, we explore how our newly released extension to support symmetric multiprocessing (SMP) in FreeRTOS provides a familiar, fast and easy programming environment for multi-core processors and, in particular, preserves the unique flexibility and real-time capabilities of the xcore architecture.

Reducing the complexity of developing applications for multi-core processors has been a driving force at XMOS since its inception, in particular delivering the significant advantages of parallel execution to the application while preserving the hard real-time benefits that the xcore architecture provides for lower-level software functions. The release of SMP FreeRTOS marks a key milestone in that journey. So how does it work?

Programming an application for a multiprocessor environment using SMP FreeRTOS is remarkably like programming for a single processor environment using FreeRTOS. In fact, most applications written for single core FreeRTOS can run unmodified under SMP FreeRTOS and potentially achieve a boost in performance. However, there are some differences that the programmer must consider in order to fully utilize the multiple cores and to avoid pitfalls.

The first significant difference is that it is now possible for tasks at all priority levels to run simultaneously on any number of available cores per tile. Once the scheduler is started, FreeRTOS tasks are placed on allocated cores dynamically at runtime, rather than statically at compile time. All the usual FreeRTOS rules for task scheduling are followed, and tasks chosen to run are always those that are highest priority and ready. When there are more tasks of a single priority that are ready to run than the number of cores available, they are scheduled in a round robin fashion. 

This allows applications to split work across multiple tasks so that they run in parallel on multiple cores. This can be done, for example, to implement the stages of a pipeline where all stages run simultaneously. Work would not normally be split like this in an application running on single core FreeRTOS, as it does not necessarily provide any benefit.

Take for example the following simple program that creates 5 tasks at the same priority: 

#include <stdio.h>

#include "FreeRTOS.h"
#include "task.h"

#define TASK_COUNT 5

static int task_numbers[TASK_COUNT]; 

void demo_task(void *arg)
{
    int task_number = *((int *) arg);
    TickType_t initial_ticks = xTaskGetTickCount();
    unsigned int iterations = 0;

    while ((xTaskGetTickCount() - initial_ticks) < configTICK_RATE_HZ) {
        iterations++;
    }

    /* iterations is unsigned, and the tick rate is cast to match %u. */
    printf("Task %d's loop ran %u iterations in %u ticks\n",
           task_number, iterations, (unsigned int) configTICK_RATE_HZ);

    vTaskDelete(NULL);
}

int main(void)
{
    for (int i = 0; i < TASK_COUNT; i++) {
        task_numbers[i] = i + 1;

        xTaskCreate(demo_task,
                    "demo_task",
                    configMINIMAL_STACK_SIZE,
                    &task_numbers[i],
                    configMAX_PRIORITIES - 1,
                    NULL);
    }

    /* The demo tasks have been created - start the scheduler. */
    printf("Starting Scheduler\n");
    vTaskStartScheduler();

    /* Should not reach here! */
    for (;;);
}

Since the tasks do not block, single core FreeRTOS time slices them in a round robin fashion, resulting in each getting 1/5 of the CPU time. The output when run under single core FreeRTOS is: 

Starting Scheduler
Task 1's loop ran 3994133 iterations in 1000 ticks
Task 2's loop ran 4013686 iterations in 1000 ticks
Task 3's loop ran 4014371 iterations in 1000 ticks
Task 4's loop ran 4014371 iterations in 1000 ticks
Task 5's loop ran 4014371 iterations in 1000 ticks

However, when the exact same program is linked with and run under SMP FreeRTOS, each task is assigned to run on its own core. Each task therefore gets about 5x the run time it had under single core FreeRTOS, which matches the iteration counts rising from roughly 4.0 million to roughly 19.8 million per task. The output when run under SMP FreeRTOS is:

Starting Scheduler
Task 3's loop ran 19842031 iterations in 1000 ticks
Task 5's loop ran 19844158 iterations in 1000 ticks
Task 4's loop ran 19871861 iterations in 1000 ticks
Task 1's loop ran 19862185 iterations in 1000 ticks
Task 2's loop ran 19852108 iterations in 1000 ticks

On the other hand, in a single core environment a task will never be preempted by another task with a lower priority, which allows certain assumptions to be made. For example, a high priority task might update shared data without a lock on the assumption that no lower priority task can run until it blocks. Since tasks of different priority levels may run simultaneously under SMP FreeRTOS, such assumptions are no longer valid.

The second significant difference is with interrupt service routines (ISRs). In a single core environment, ISRs cannot run simultaneously either with each other or with application tasks. Both situations are of course possible in a multi-core environment. Therefore, there needs to be a way to ensure mutual exclusion for access to data structures that are shared between multiple ISRs, or between ISRs and tasks.

FreeRTOS already provides the macros taskENTER_CRITICAL_FROM_ISR() and taskEXIT_CRITICAL_FROM_ISR() for use with ports for architectures that support interrupt nesting. The SMP FreeRTOS port for xcore makes use of these, using an xcore hardware lock under the hood. They are used in ISRs around access to data that is shared with tasks and requires mutual exclusion. The corresponding task versions of these macros, taskENTER_CRITICAL() and taskEXIT_CRITICAL(), must be called by any tasks that also access this shared data. The task versions disable interrupts on the calling core in addition to obtaining the lock.
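
As a rough illustration of this pattern, the sketch below guards a counter that is shared between an ISR and a task. The names pending_events, example_isr() and take_pending_events() are hypothetical. The stand-in macros in the #ifndef branch are assumptions that exist only so the sketch compiles without the FreeRTOS headers; they perform no locking, and on a real target the FreeRTOS definitions are used instead.

```c
/* Use the real FreeRTOS headers when the compiler can find them. */
#if defined(__has_include)
#  if __has_include("FreeRTOS.h")
#    include "FreeRTOS.h"
#    include "task.h"
#    define SKETCH_HAVE_FREERTOS 1
#  endif
#endif

#ifndef SKETCH_HAVE_FREERTOS
/* Assumed host-side stand-ins, NOT the real macros: they do no locking. */
typedef unsigned long UBaseType_t;
#define taskENTER_CRITICAL()               ((void) 0)
#define taskEXIT_CRITICAL()                ((void) 0)
#define taskENTER_CRITICAL_FROM_ISR()      ((UBaseType_t) 0)
#define taskEXIT_CRITICAL_FROM_ISR(status) ((void) (status))
#endif

/* Hypothetical data shared between an ISR and a task. */
static volatile unsigned int pending_events;

/* Hypothetical ISR body: records one event. */
void example_isr(void)
{
    /* The ISR variant returns a status value that must be handed back on exit. */
    UBaseType_t saved_status = taskENTER_CRITICAL_FROM_ISR();
    pending_events++;
    taskEXIT_CRITICAL_FROM_ISR(saved_status);
}

/* Called from a task: reads and clears the counter as one protected step. */
unsigned int take_pending_events(void)
{
    unsigned int taken;

    taskENTER_CRITICAL();
    taken = pending_events;
    pending_events = 0;
    taskEXIT_CRITICAL();

    return taken;
}
```

Keeping the protected region down to the read and the reset means that other cores and ISRs wait on the lock for as short a time as possible.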

Adding features to support SMP 

Two new features have been added to FreeRTOS to support SMP and xcore. Similar features are also found in other real time operating systems that support SMP. 

The first allows a FreeRTOS task to only be run on certain cores. This is done with a core affinity mask, and supports various scenarios: 

  • Where a task fully utilises one or more logical cores in the xcore architecture and requires deterministic execution.  
  • For periodic timer interrupts – most FreeRTOS applications require a timer interrupt that runs periodically, usually once every 1 or 10 milliseconds. The xcore SMP FreeRTOS port always places this timer interrupt on logical core 0. When execution of this interrupt’s service routine breaks timing assumptions made by tasks that require deterministic execution, and it is not feasible to disable interrupts around their critical sections, then it can make sense to exclude these tasks from core 0. 
  • Another scenario is when there are two or more legacy tasks written with assumptions that only apply in a single core environment and that are invalid when run under SMP FreeRTOS. When it is not possible to modify the code to be compatible with SMP, for example when the functions are part of a third-party library, then it can make sense to lock these tasks down to a single core, ensuring that they do not run simultaneously. 

The two new functions to support this are: 

  • void vTaskCoreAffinitySet( const TaskHandle_t xTask, UBaseType_t uxCoreAffinityMask ) 
    • This function sets the specified task’s core affinity mask. Each bit position represents the corresponding core number, supporting up to 32 cores. After the call, the task will only execute on cores whose corresponding bit in the mask is set to 1. 
  • UBaseType_t vTaskCoreAffinityGet( const TaskHandle_t xTask ) 
    • This function returns the specified task’s current core affinity mask. 
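
To make the mask layout concrete, here is a minimal sketch of two helpers that build affinity masks. Both helpers, affinity_mask_from_cores() and affinity_mask_excluding(), are hypothetical and not part of FreeRTOS; they only do the bit arithmetic described above, where bit n set to 1 means the task may run on core n.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: build a mask that allows exactly the listed cores. */
static inline uint32_t affinity_mask_from_cores(const unsigned int *cores,
                                                unsigned int count)
{
    uint32_t mask = 0;

    for (unsigned int i = 0; i < count; i++) {
        assert(cores[i] < 32u);            /* one bit per core, up to 32 cores */
        mask |= (uint32_t) 1 << cores[i];
    }
    return mask;
}

/* Hypothetical helper: allow every core in [0, n_cores) except one. */
static inline uint32_t affinity_mask_excluding(unsigned int n_cores,
                                               unsigned int excluded_core)
{
    assert(n_cores >= 1u && n_cores <= 32u && excluded_core < n_cores);

    /* Shifting a 32-bit value by 32 is undefined, so handle that case apart. */
    uint32_t all = (n_cores == 32u) ? UINT32_MAX
                                    : (((uint32_t) 1 << n_cores) - 1u);
    return all & ~((uint32_t) 1 << excluded_core);
}
```

Assuming, purely for illustration, that 8 cores are available to FreeRTOS, a task could be kept away from the timer interrupt on core 0 with vTaskCoreAffinitySet( xTask, affinity_mask_excluding( 8, 0 ) ), which passes the mask 0xFE.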

The second new feature allows preemption to be disabled at runtime on a per task basis. This per task feature is enabled with the configuration option configUSE_TASK_PREEMPTION_DISABLE. Preemption may still be disabled globally at compile time with the configuration option configUSE_PREEMPTION.

This allows tasks to ensure that they are not preempted by another lower or same priority task. This can be useful for tasks that require deterministic execution but that do not necessarily need to be run at the highest priority level. For example, a task might spend much of the time blocked in a waiting state, but once it is woken up and running must not be interrupted. Disabling interrupts within these tasks may also be required, but by additionally disabling preemption the scheduler will not even attempt to preempt it, ensuring that other tasks continue running as they should. 

The two new functions to support this are: 

  • void vTaskPreemptionDisable( const TaskHandle_t xTask ) 
    • This function disables preemption for the specified task. 
  • void vTaskPreemptionEnable( const TaskHandle_t xTask )
    • This function enables preemption for the specified task. 
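
One way these two calls might be used is to bracket a time-sensitive section, as in the sketch below. The helper run_without_preemption() and the sample_work() function are hypothetical. The stand-in functions in the #ifndef branch are assumptions that only let the sketch compile without the FreeRTOS headers; they merely record the last call and do not affect any scheduler.

```c
/* Use the real FreeRTOS headers when the compiler can find them. */
#if defined(__has_include)
#  if __has_include("FreeRTOS.h")
#    include "FreeRTOS.h"
#    include "task.h"
#    define SKETCH_HAVE_FREERTOS 1
#  endif
#endif

#ifndef SKETCH_HAVE_FREERTOS
/* Assumed host-side stand-ins, NOT the real kernel functions. */
typedef void *TaskHandle_t;
static int sketch_preemption_disabled;   /* 1 while "disabled", else 0 */

static void vTaskPreemptionDisable(const TaskHandle_t xTask)
{
    (void) xTask;
    sketch_preemption_disabled = 1;
}

static void vTaskPreemptionEnable(const TaskHandle_t xTask)
{
    (void) xTask;
    sketch_preemption_disabled = 0;
}
#endif

/* Hypothetical helper: run work(arg) with preemption disabled for task self. */
void run_without_preemption(TaskHandle_t self, void (*work)(void *), void *arg)
{
    vTaskPreemptionDisable(self);
    work(arg);                    /* the time-sensitive section */
    vTaskPreemptionEnable(self);  /* let the scheduler preempt this task again */
}

/* Hypothetical time-sensitive work: here it just increments a counter. */
void sample_work(void *arg)
{
    int *counter = arg;
    (*counter)++;
}
```

In an application, a task would typically pass its own handle, for example the one returned through the last argument of xTaskCreate(), and call the helper each time it wakes from its blocked state.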

Aside from the above additions, the API (Application Programming Interface) of the new SMP FreeRTOS kernel is the same as that of the single core FreeRTOS kernel. Almost all code that has been written for single core FreeRTOS should compile and work under SMP FreeRTOS. Just be aware of the single core assumptions that are occasionally made and account for them as necessary.

If you would like to find out more about the SMP FreeRTOS kernel, why not download our latest whitepaper? To access SMP FreeRTOS, please visit https://github.com/FreeRTOS/FreeRTOS-Kernel/tree/smp
