Using software-defined memory#

This section shows how to use software-defined memory in xcore.ai target systems.

Software-defined memory is a region of memory with a base address of 0x40000000 and a size of 0x40000000 bytes. The content of this address range is provided by software, hence the name "software-defined".
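
As a small illustration of the address range described above, the following host-side sketch tests whether a 32-bit address falls inside the region. The names SWMEM_BASE, SWMEM_SIZE and in_swmem are hypothetical and chosen for this sketch only; they are not part of the XTC Tools headers.

```c
#include <stdbool.h>
#include <stdint.h>

/* Base and size of the software-defined memory region, as stated above.
   These macro names are assumptions made for this sketch. */
#define SWMEM_BASE 0x40000000u
#define SWMEM_SIZE 0x40000000u

/* Returns true when addr lies in [SWMEM_BASE, SWMEM_BASE + SWMEM_SIZE). */
static bool in_swmem(uint32_t addr)
{
  /* Unsigned subtraction wraps for addr < SWMEM_BASE, giving a large value,
     so a single comparison covers both ends of the range. */
  return (uint32_t)(addr - SWMEM_BASE) < SWMEM_SIZE;
}
```

With these values the region covers addresses 0x40000000 to 0x7FFFFFFF inclusive.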

When a program performs a read access in this range, the required content is looked up in a level 1 cache. If it is not present, a software fill handler (running on another, independent logical core) is triggered. This handler must provide an entire cache line of data to satisfy the read and place it in the cache.

When a program performs a write access in this range, the data is written to the cache memory.

When a line is written to, it is flagged as “dirty”. This indicates that the copy of the data in the cache is more recent than that in the software-defined backing store. When this dirty line has to be evicted from the cache to make way for another line, an “eviction” software handler is triggered.
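
The read, write and eviction behaviour described above can be sketched as a toy model that runs on a host PC. This is not the hardware implementation and uses none of the xcore APIs: every name here (backing_store, sim_read, sim_write and so on) is hypothetical. Two simplifications are assumed: victims are chosen round-robin, and a write miss fills the line first, which the real hardware may not do.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define LINE_BYTES  32u
#define NUM_LINES   8u
#define STORE_BYTES 1024u

/* Stands in for flash or any other application-specific backing store. */
static uint8_t backing_store[STORE_BYTES];

typedef struct {
  bool valid;
  bool dirty;
  uint32_t base;               /* offset of this line in the backing store */
  uint8_t data[LINE_BYTES];
} line_t;

static line_t cache[NUM_LINES];
static unsigned next_victim;   /* round-robin replacement (a simplification) */
static unsigned fills, evictions;

/* Plays the role of the fill handler: supplies one whole line. */
static void fill(line_t *l, uint32_t base)
{
  memcpy(l->data, &backing_store[base], LINE_BYTES);
  l->valid = true;
  l->dirty = false;
  l->base = base;
  fills++;
}

/* Plays the role of the evict handler: writes a dirty line back. */
static void evict(line_t *l)
{
  memcpy(&backing_store[l->base], l->data, LINE_BYTES);
  evictions++;
}

/* Returns the cached line holding offset, filling (and evicting) on a miss. */
static line_t *lookup(uint32_t offset)
{
  uint32_t base = offset & ~(LINE_BYTES - 1u);
  for (unsigned i = 0; i < NUM_LINES; i++) {
    if (cache[i].valid && cache[i].base == base) {
      return &cache[i];
    }
  }
  line_t *l = &cache[next_victim];
  next_victim = (next_victim + 1u) % NUM_LINES;
  if (l->valid && l->dirty) {
    evict(l);                  /* only dirty lines need writing back */
  }
  fill(l, base);
  return l;
}

static uint8_t sim_read(uint32_t offset)
{
  return lookup(offset)->data[offset % LINE_BYTES];
}

static void sim_write(uint32_t offset, uint8_t value)
{
  line_t *l = lookup(offset);
  l->data[offset % LINE_BYTES] = value;
  l->dirty = true;             /* cache copy is now newer than the store */
}
```

In this model a write only reaches backing_store when the dirty line is later displaced, which mirrors the point at which the eviction handler is triggered.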

A typical use of this feature is to provide a cache of data stored in flash where application accesses exhibit spatial and temporal locality.

Level 1 cache#

A level 1 cache is present on each tile. It is the same cache that is used for LPDDR accesses. Using both LPDDR and software-defined memory together on a tile is not recommended.

The level 1 cache is situated between the xCORE tile and the LPDDR memory. It is a unified instruction and data cache, fully associative, with write-back. It has 8 lines of 32 bytes each (256 bytes in total), and the replacement policy is pseudo-LRU (Least Recently Used).
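
To make the geometry concrete, the following host-side sketch derives the line base and byte offset for an address, and counts how many cache lines an object spans. The macro and function names are hypothetical and exist only in this sketch; the line count and line size are the figures given above.

```c
#include <stdint.h>

#define CACHE_LINES      8u
#define CACHE_LINE_BYTES 32u
#define CACHE_BYTES      (CACHE_LINES * CACHE_LINE_BYTES)   /* 256 bytes */

/* Address of the first byte of the cache line containing addr. */
static uint32_t line_base(uint32_t addr)
{
  return addr & ~(CACHE_LINE_BYTES - 1u);
}

/* Byte position of addr within its cache line (0..31). */
static uint32_t line_offset(uint32_t addr)
{
  return addr & (CACHE_LINE_BYTES - 1u);
}

/* Number of distinct cache lines touched by size bytes starting at addr. */
static uint32_t lines_spanned(uint32_t addr, uint32_t size)
{
  if (size == 0u) {
    return 0u;
  }
  return (line_base(addr + size - 1u) - line_base(addr)) / CACHE_LINE_BYTES + 1u;
}
```

For instance, an 80-byte array that starts on a line boundary occupies three lines, while the same array starting 24 bytes into a line occupies four, so its alignment affects how many fills a full traversal needs.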

xCORE instructions are provided to prefetch, invalidate and flush this cache.

Application implementation#

An application may manage the software-defined memory directly, by providing fill and evict software handlers that access the data in an application-specific backing store. Alternatively, the XTC Tools can be used to provide assistance in placing application objects in flash.

The software-defined memory region is not enabled on reset, and an access to it before it is enabled will cause a trap.

APIs to manage the cache content are provided by xcore/swmem_fill.h and xcore/swmem_evict.h.

XTC Tools built-in support for flash storage#

Executable code or data may be stored in flash for subsequent access by the application via the software-defined memory region. The code or data is annotated to place it in a section and the section name must start with the string .SwMem, for example:

__attribute__((section(".SwMem_data")))
unsigned int mydata = 12345678;

The above will store mydata in the flash image built with xflash such that it may be accessed via the software-defined memory region.

Both executable code (functions) and data may be annotated to be stored in flash.

The code below will trigger the software fill handler to fetch the content of mydata from this region, because it is not yet resident in the level 1 cache:

unsigned int newdata = mydata;

Once the software fill handler has obtained the data and placed it in the cache, a subsequent read of mydata will not trigger the software fill handler, unless the line containing mydata has been evicted because eight other cache lines have been filled.

Compiling for software-defined memory#

See compiling for external memory (LPDDR) and software-defined memory.

Examples#

Two separate examples are provided: one for the software fill handler and another for the software evict handler. These use the XTC Tools built-in support to place annotated objects in flash.

Fill handler example#

The following example illustrates the use of this feature. Two logical cores are used: the first is the “application”, which requires data stored in flash, and the second is the software fill handler, which is triggered to fetch data from flash and place it in the cache for the application.

The fill handler uses APIs provided by xmos_flash.h to read data from flash and APIs provided by xcore/swmem_fill.h to write data into the level 1 cache. The symbol __swmem_address must be defined and initialised to 0xFFFFFFFF. The system bootstrap will overwrite this with a value which provides an offset into flash from which the annotated application data will be fetched.
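
The address arithmetic that the fill handler performs can be tried on a host with the sketch below. It mirrors the expression passed to flash_read_quad in main.c, which shifts the byte offset right by 2 to obtain a word index. The names SWMEM_BASE, effective_offset and flash_word_index are hypothetical, and SWMEM_BASE is assumed to equal XS1_SWMEM_BASE, taken here as the 0x40000000 base address given at the start of this section.

```c
#include <stdint.h>

#define SWMEM_BASE                  0x40000000u   /* assumed XS1_SWMEM_BASE */
#define SWMEM_ADDRESS_UNINITIALISED 0xFFFFFFFFu

/* Offset into flash used by the handler: the value written by the system
   bootstrap, or 0 when the initialiser value is still present. */
static uint32_t effective_offset(uint32_t swmem_address)
{
  return swmem_address == SWMEM_ADDRESS_UNINITIALISED ? 0u : swmem_address;
}

/* Word index in flash for the cache line that starts at fill_address. */
static uint32_t flash_word_index(uint32_t fill_address, uint32_t swmem_address)
{
  uint32_t byte_offset = fill_address - SWMEM_BASE + effective_offset(swmem_address);
  return byte_offset >> 2;    /* 4 bytes per word */
}
```

For example, a fill request for address 0x40000020 with a flash offset of 0x1000 maps to word index 0x408, and to word index 8 when __swmem_address still holds its initialiser value.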

In this example, a read from address 0x50000000 causes the fill handler loop running on a logical core to terminate.

Build the example with:

$ xcc main.c main.xc -o main.xe -lquadspi -target=XCORE-AI-EXPLORER -mcmodel=large

main.c#
#include <stdio.h>
#include <xcore/parallel.h>
#include <xcore/swmem_fill.h>
#include <xmos_flash.h>

__attribute__((section(".SwMem_data")))
const unsigned int my_array[20] = {
  1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20
};

flash_ports_t flash_ports_0 =
{
  PORT_SQI_CS,
  PORT_SQI_SCLK,
  PORT_SQI_SIO,
  XS1_CLKBLK_5
};

flash_clock_config_t flash_clock_config =
{
  1,
  8,
  8,
  1,
  0,
};

flash_qe_config_t flash_qe_config_0 =
{
  flash_qe_location_status_reg_0,
  flash_qe_bit_6
};

flash_handle_t flash_handle;

// We must initialise this to a value such that it is not memset to zero during C runtime startup
#define SWMEM_ADDRESS_UNINITIALISED 0xffffffff
volatile unsigned int __swmem_address = SWMEM_ADDRESS_UNINITIALISED;

static unsigned int nibble_swap_word(unsigned int x)
{
  return ((x & 0xf0f0f0f0) >> 4) | ((x & 0x0f0f0f0f) << 4);
}

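// Fetch one cache line from flash and hand it to the cache
void swmem_fill(swmem_fill_t handle, fill_slot_t address) {
  swmem_fill_buffer_t buf;
  unsigned int * buf_ptr = (unsigned int *) buf;

  // Convert the software-defined memory address into a word offset in flash
  flash_read_quad(&flash_handle, (address - (void *)XS1_SWMEM_BASE + __swmem_address) >> 2, buf_ptr, SWMEM_FILL_SIZE_WORDS);
  // Undo the nibble swap applied to the data stored in flash
  for (unsigned int i=0; i < SWMEM_FILL_SIZE_WORDS; i++)
  {
    buf_ptr[i] = nibble_swap_word(buf_ptr[i]);
  }

  // Place the complete line in the cache
  swmem_fill_populate_from_buffer(handle, address, buf);
}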

swmem_fill_t swmem_setup() {
  flash_connect(&flash_handle, &flash_ports_0, flash_clock_config, flash_qe_config_0);

  if (__swmem_address == SWMEM_ADDRESS_UNINITIALISED)
  {
    __swmem_address = 0;
  }

  return swmem_fill_get();
}

void swmem_teardown(swmem_fill_t fill_handle) {
  swmem_fill_free(fill_handle);
  flash_disconnect(&flash_handle);
}

static const fill_slot_t swmem_terminate_address = (void *)0x50000000;

DECLARE_JOB(swmem_handler, (swmem_fill_t))
void swmem_handler(swmem_fill_t fill_handle)
{
  fill_slot_t address = 0;
  while (address != swmem_terminate_address)
  {
    address = swmem_fill_in_address(fill_handle);
    swmem_fill(fill_handle, address);
    swmem_fill_populate_word_done(fill_handle, address);
  }
}

DECLARE_JOB(use_swmem, (void))
void use_swmem(void)
{
  volatile unsigned long a = 0;

  for (int i = 0; i < 20; i++) {
    printf("Result: 0x%08x\n", my_array[i]);
    a = my_array[i];
  }

  a = *(const volatile unsigned long *)swmem_terminate_address;
}

void tile_main(void) {
  swmem_fill_t fill_handle = swmem_setup();

  PAR_JOBS(
    PJOB(swmem_handler, (fill_handle)),
    PJOB(use_swmem, ())
  );

  swmem_teardown(fill_handle);
}

main.xc#
#include <platform.h>
#include <stdio.h>

void tile_main(void);

int main(void) {
  par {
    on tile[0]: par {
      tile_main();
    }
    on tile[1]: par {
    }
  }
  return 0;
}

Evict handler example#

The following example illustrates the main.c file for a software evict handler.

main.c#
#include <stdio.h>
#include <xcore/parallel.h>
#include <xcore/swmem_evict.h>
#include <xcore/minicache.h>
#include <xmos_flash.h>

__attribute__((section(".SwMem_data")))
unsigned char my_array[512] = {};


DECLARE_JOB(swmem_handler, (swmem_evict_t))
void swmem_handler(swmem_evict_t evict_handle)
{
  for (unsigned evictions = 0; evictions < 16; evictions += 1)
  {
    evict_slot_t address = swmem_evict_in_address(evict_handle);
    unsigned long mask = swmem_evict_get_dirty_mask(evict_handle, address);
    unsigned long buf[SWMEM_EVICT_SIZE_WORDS];
    swmem_evict_to_buffer(evict_handle, address, buf);
    printf("Eviction of address %p with mask %lx; data:\n", address, mask);
    for (unsigned i = 0; i < SWMEM_EVICT_SIZE_WORDS; i += 1)
    {
      printf(i == SWMEM_EVICT_SIZE_WORDS - 1 ? "%lx\n" : "%lx ", buf[i]);
    }
  }
}

DECLARE_JOB(use_swmem, (void))
void use_swmem(void)
{
  volatile unsigned char *a = my_array;

  for (unsigned i = 0; i < sizeof(my_array); i += 8)
  {
    *((volatile unsigned long *)(a + i)) = (unsigned long)&a[i];
    a[i + 4] = i/4;
    a[i + 5] = 0;
    a[i + 7] = 255;
    if (i % 16) { a[i + 6] = 10; }
  }
  // Performs asm volatile ("flush");
  minicache_flush();
}

void tile_main(void) {
  swmem_evict_t evict_handle = swmem_evict_get();

  PAR_JOBS(
    PJOB(swmem_handler, (evict_handle)),
    PJOB(use_swmem, ())
  );

  swmem_evict_free(evict_handle);
}
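
The evict handler above only prints the dirty mask and the line data. A handler that maintains a real backing store would typically write back only the bytes that changed. The host-side sketch below shows one way to do that, assuming the mask holds one bit per byte of the 32-byte line with bit 0 corresponding to the lowest address. That encoding is an assumption and should be checked against xcore/swmem_evict.h. The function name write_back_dirty is hypothetical.

```c
#include <stdint.h>

#define LINE_BYTES 32u

/* Copies to dst only those bytes of line whose bit is set in dirty_mask.
   ASSUMPTION: one mask bit per byte, bit 0 = lowest address in the line.
   Returns the number of bytes copied. */
static unsigned write_back_dirty(uint8_t *dst, const uint8_t *line,
                                 uint32_t dirty_mask)
{
  unsigned copied = 0;
  for (unsigned i = 0; i < LINE_BYTES; i++) {
    if ((dirty_mask >> i) & 1u) {
      dst[i] = line[i];
      copied++;
    }
  }
  return copied;
}
```

Under that assumption a mask of 0x80000005 would write back bytes 0, 2 and 31 of the line and leave the rest of the backing store untouched.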

Using xrun and xgdb#

When an application is written to flash using xflash, the value of __swmem_address will be an offset into the flash storage from which the annotated application objects may be obtained.

However, when xrun or xgdb is used to run an application, the flash bootloader does not execute, so __swmem_address retains the initialiser value of 0xFFFFFFFF.

The data that would be placed at an offset by xflash needs to be extracted from the image and written to the bottom of flash. The example above sets __swmem_address to 0 when it contains the value 0xFFFFFFFF.

Extracting and writing the data to the bottom of flash is done as follows:

$ xobjdump --strip main.xe

$ xobjdump --split main.xb

$ xflash --reverse --write-all image_n0c0.swmem --target XCORE-AI-EXPLORER

The data nibbles must be swapped to match the format in which xflash stores a complete application in flash. In this example the swap is done by the --reverse option to xflash.
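
The effect of the nibble swap can be checked on a host with the sketch below, which reuses the expression from nibble_swap_word in the fill handler example with fixed-width types. Swapping twice restores the original word, which is why the fill handler can recover the data by applying the same operation to what it reads from flash.

```c
#include <stdint.h>

/* Swaps the high and low nibble of every byte in x, using the same
   expression as nibble_swap_word in the fill handler example. */
static uint32_t nibble_swap_word(uint32_t x)
{
  return ((x & 0xF0F0F0F0u) >> 4) | ((x & 0x0F0F0F0Fu) << 4);
}
```

For example, 0x12345678 becomes 0x21436587, and swapping that result again gives back 0x12345678.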