Wake Word Configuration

The wake word engine (WWE) implemented in the XVF3615 comprises two components:

  1. Core wake word engine (executable): This component processes the audio stream and contains a neural network (NN) that identifies the wake word in that stream. It is a fixed part of the standard XVF3615 firmware image.

  2. Alexa model (binary data): This data contains the network parameters for the NN and can be changed to support different keywords, e.g. for different languages.

The XVF3615 release package contains an example wake word model. Additional wake word models can be obtained from Amazon directly – please contact your AVS Solution Architect at Amazon.

A standard XVF3615 firmware image comprises an integrated binary executable containing both these components and the audio processing executable. The release package also contains a utility that allows a developer to change the binary model in the firmware image to incorporate a different language model. The WWE can only execute one model in the device at any time.

The WWE integrated in the XVF3615 continuously monitors the audio stream at the output of the audio processing pipeline. When the WWE detects the wake word in the audio stream, the XVF3615 signals the host processor.

The XVF3615 provides three mechanisms to inform the host of the detection. These mechanisms can be individually enabled via the control interface.

Digital Output

When a wake word is detected, the XVF3615 can generate a pulse on one of the four General Purpose Output (GPO) pins on the device. The specific pin used and the duration of the pulse can be configured via the control interface.

This digital output can be used to trigger a host interrupt, or it can be polled by the host.
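For the polled case, a minimal host-side sketch is shown here. It assumes the pulse is active-high and that the host has some way to sample the pin; `read_pin` is a hypothetical callable standing in for the platform's GPIO read and is not part of the XVF3615 tools. The polling interval should be shorter than the configured pulse duration, otherwise a pulse can fall between two samples and be missed.

```python
import time
from typing import Callable


def wait_for_pulse(read_pin: Callable[[], int],
                   interval_s: float,
                   timeout_s: float) -> bool:
    """Poll a pin until a low-to-high transition is seen or the timeout expires.

    read_pin is a hypothetical function returning the current pin level
    (0 or 1). An active-high pulse is assumed.
    """
    deadline = time.monotonic() + timeout_s
    previous = read_pin()
    while time.monotonic() < deadline:
        time.sleep(interval_s)
        level = read_pin()
        if level and not previous:
            return True  # rising edge: wake word pulse started
        previous = level
    return False  # no pulse seen within the timeout
```

Where the host supports it, an edge-triggered interrupt avoids the missed-pulse risk and the polling overhead.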

The GPO pin used to signal the detection must be configured via the vfctrl interface or in the XVF3615 data partition.

For example, the following commands set GPO_0 to emit a 50 ms pulse following detection of a wake word.

vfctrl_usb SET_WWE_DETECTED_PIN 0
vfctrl_usb SET_WWE_DETECTED_PERIOD 5
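The same two commands can be issued from a host script. The sketch below wraps them in Python, assuming `vfctrl_usb` is on the PATH. The helper names and the pin range check (0 to 3, inferred from the four GPO pins) are assumptions for illustration. The period value is passed through unchanged, exactly as it would be typed on the command line.

```python
import subprocess
from typing import Callable, Sequence

Runner = Callable[[Sequence[str]], str]


def run_vfctrl(args: Sequence[str]) -> str:
    """Run one vfctrl_usb command and return its standard output."""
    result = subprocess.run(["vfctrl_usb", *args],
                            capture_output=True, text=True, check=True)
    return result.stdout


def configure_wake_gpo(pin: int, period: int, run: Runner = run_vfctrl) -> None:
    """Select the GPO pin and pulse period used to signal a detection."""
    if not 0 <= pin <= 3:  # assumption: the four GPO pins are indexed 0..3
        raise ValueError(f"GPO pin out of range: {pin}")
    run(["SET_WWE_DETECTED_PIN", str(pin)])
    run(["SET_WWE_DETECTED_PERIOD", str(period)])
```

Calling `configure_wake_gpo(0, 5)` issues the two commands shown above. The `run` parameter lets the command runner be replaced, for example to log commands without hardware attached.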

Wake Word Counter

The XVF3615 maintains a count of wake word detections. The host can poll this counter and reset it as required. This feature is useful for system testing and tuning, as it allows the WWE to be characterised without running a full AVS client.

The sequence below resets the counter and then, after some time, reports the number of wake words detected since the reset.

vfctrl_usb SET_WWE_COUNT 0
:
:
.. Some time later
:
:
vfctrl_usb GET_WWE_COUNT
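For automated testing, the reset, wait and read sequence can be scripted. The sketch below is one way to do this, assuming `vfctrl_usb` is on the PATH. The output format of GET_WWE_COUNT is not specified here, so the parser simply takes the last integer in the output; that is an assumption and may need adjusting. All function names are illustrative.

```python
import re
import subprocess
import time
from typing import Callable, Sequence


def run_vfctrl(args: Sequence[str]) -> str:
    """Run one vfctrl_usb command and return its standard output."""
    result = subprocess.run(["vfctrl_usb", *args],
                            capture_output=True, text=True, check=True)
    return result.stdout


def parse_count(output: str) -> int:
    """Extract the counter value, assumed to be the last integer printed."""
    numbers = re.findall(r"\d+", output)
    if not numbers:
        raise ValueError(f"no count found in output: {output!r}")
    return int(numbers[-1])


def count_detections(duration_s: float,
                     run: Callable[[Sequence[str]], str] = run_vfctrl,
                     sleep: Callable[[float], None] = time.sleep) -> int:
    """Reset the counter, wait duration_s seconds, then read the counter."""
    run(["SET_WWE_COUNT", "0"])
    sleep(duration_s)
    return parse_count(run(["GET_WWE_COUNT"]))
```

Playing a known number of wake word utterances during the wait and comparing that number with the returned count gives a simple detection rate for a given acoustic setup.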

USB HID

When the WWE detects a wake word, the XVF3615 can send a HID report. The HID configuration is identical to that of the XVF3610, which supports three HID reports, one each for keyboard, consumer and telephony events.

With the default data partition, the XVF3615 reports a wake word event via a HID report with the format shown in the table below.

The HID report can be modified to report detection as a keyboard event. The following commands set up the HID as a keyboard that reports KEY_T as the wake word notification.

SET_HID_MAP_HEADER 1 0
SET_HID_MAP 1 0 0

Table 57 USB HID Report - With KEY_T

USB HID Usage Page    Byte    Bits 7..0 (most significant first)
Keyboard              0       Reserved, F24, F23, Reserved, ‘t’

Please see the XVF3610 user guide for further details on configuration of the USB HID.

The report is triggered when the WWE detects a wake word. After receiving the HID report, the host can use a vfctrl command to read the start and end indices that identify the position of the wake word.