Mixed-Precision Vector API#

group vect_mixed_api

Functions

void mat_mul_s8_x_s16_yield_s32(int32_t output[], const int8_t matrix[], const int16_t input_vect[], const unsigned M_rows, const unsigned N_cols, int8_t scratch[])#

Multiply an 8-bit matrix by a 16-bit vetor for a 32-bit result vector.

This function multiplies an 8-bit \(M \times N\) matrix \(\bar W\) by a 16-bit \(N\)-element column vector \(\bar v\) and returns the result as a 32-bit \(M\)-element vector \(\bar a\).

output is the output vector \(\bar a\).

matrix is the matrix \(\bar W\).

input_vect is the vector \(\bar v\).

matrix and input_vect must both begin at a word-aligned offsets.

M_rows and N_rows are the dimensions \(M\) and \(N\) of matrix \(\bar W\). \(M\) must be a multiple of 16, and \(N\) must be a multiple of 32.

scratch is a pointer to a word-aligned buffer that this function may use to store intermediate results. This buffer must be at least \(N\) bytes long.

The result of this multiplication is exact, so long as saturation does not occur.

Parameters:

output – [inout] The output vector \(\bar a\)
matrix – [in] The weight matrix \(\bar W\)
input_vect – [in] The input vector \(\bar v\)
M_rows – [in] The number of rows \(M\) in matrix \(\bar W\)
N_cols – [in] The number of columns \(N\) in matrix \(\bar W\)
scratch – [in] Scratch buffer required by this function.

Throws ET_LOAD_STORE:

Raised if matrix or input_vect is not word-aligned (See Note: Vector Alignment)

unsigned vect_sXX_add_scalar(int32_t a[], const int32_t b[], const unsigned length_bytes, const int32_t c, const int32_t d, const right_shift_t b_shr, const unsigned mode_bits)#

Add a scalar to a vector.

Add a scalar to a vector. This works for 8, 16 or 32 bits, real or complex.

length_bytes is the total number of bytes to be output. So, for 16-bit vectors, length_bytes is twice the number of elements, whereas for complex 32-bit vectors, length_bytes is 8 times the number of elements.

c and d are the values that populate the internal buffer to be added to the input vector as follows: Internally an 8 word (32 byte) buffer is allocated (on the stack). Even-indexed words are populated with c and odd-indexed words are populated with d. For real vectors, c and d should be the same value — the reason for d is to allow this same function to work for complex 32-bit vectors. This also means that for 16-bit vectors, the value to be added needs to be duplicated in both the higher 2 bytes and lower 2 bytes of the word.

mode_bits should be 0x0000 for 32-bit mode, 0x0100 for 16-bit mode or 0x0200 for 8-bit mode.