XS3 Float Vector Functions

exponent_t xs3_vect_f32_max_exponent(const float b[], const unsigned length)

Get the maximum (32-bit BFP) exponent from a vector of IEEE754 floats.

This function is used to determine the BFP exponent to use when converting a vector of IEEE754 single-precision floats into a 32-bit BFP vector.

The exponent returned, if used with xs3_vect_f32_to_s32(), is the one which will result in no headroom in the BFP vector. In other words, it is the minimum permissible exponent for the BFP vector. This exponent is derived from the maximum exponent found among the float elements themselves.

More specifically, the FSEXP instruction is used on each element to determine its exponent. The value returned is the maximum exponent given by the FSEXP instruction plus 30.

b[] must begin at a double-word-aligned address.

Note

If required, when converting to a 32-bit BFP vector, additional headroom can be included by adding the amount of required headroom to the exponent returned by this function.

Parameters
  • b[in] Input vector of IEEE754 single-precision floats \(\bar b\)

  • length[in] Number of elements in \(\bar b\)

Throws
  • ET_LOAD_STORE – Raised if b is not double-word-aligned (See Note: Vector Alignment)

  • ET_ARITHMETIC – Raised if any element of b is infinite or not-a-number.

Returns

Exponent used for converting to 32-bit BFP vector.
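
The idea of a minimum permissible exponent can be illustrated with a portable C model. This is only a sketch under stated assumptions, not the library routine: it uses the standard frexpf() function rather than the FSEXP instruction, so the offset it applies (31, relative to the frexpf() exponent) is specific to this model and says nothing about the FSEXP convention. The function name model_f32_max_exponent and its headroom parameter are hypothetical, introduced to show the Note above.

```c
#include <assert.h>
#include <limits.h>
#include <math.h>
#include <stddef.h>

// Portable model (NOT the library routine): returns the smallest exponent
// `exp` such that every b[k] / 2^exp fits in a signed 32-bit mantissa,
// plus an optional amount of extra headroom (hypothetical parameter).
static int model_f32_max_exponent(const float b[], unsigned length,
                                  unsigned headroom)
{
    assert(b != NULL && length > 0);
    int max_e = INT_MIN;
    for (unsigned k = 0; k < length; k++) {
        assert(isfinite(b[k]));      // the real function traps on Inf/NaN
        if (b[k] == 0.0f) continue;  // zero imposes no constraint
        int e;
        (void) frexpf(b[k], &e);     // |b[k]| = f * 2^e with 0.5 <= f < 1
        if (e > max_e) max_e = e;
    }
    if (max_e == INT_MIN) return 0;  // all-zero input: arbitrary choice of this model
    // |b[k]| < 2^max_e, so |b[k]| / 2^(max_e - 31) < 2^31
    return max_e - 31 + (int) headroom;
}
```

For the vector {1.0, -0.5, 0.25} this model returns -30, so 1.0 maps to the mantissa 2^30. Passing a headroom of 2 raises the exponent to -28.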

void xs3_vect_f32_to_s32(int32_t a[], const float b[], const unsigned length, const exponent_t a_exp)

Convert a vector of IEEE754 single-precision floats into a 32-bit BFP vector.

This function converts a vector of IEEE754 single-precision floats \(\bar b\) into the mantissa vector \(\bar a\) of a 32-bit BFP vector, given BFP vector exponent \(a\_exp\). Conceptually, the elements of output vector \(\bar{a} \cdot 2^{a\_exp}\) represent the same values as those of the input vector.

Because the output exponent \(a\_exp\) is shared by all elements of the output vector, even though the output vector has 32-bit mantissas, precision may be lost on some elements if the exponents of the input elements \(b_k\) span a wide range.

The function xs3_vect_f32_max_exponent() can be used to determine the value for \(a\_exp\) which minimizes headroom of the output vector.

Operation Performed:

\[\begin{split}\begin{align*} & a_k \leftarrow round(\frac{b_k}{2^{a\_exp}}) \\ & \qquad\text{ for }k\in 0\ ...\ (length-1) \end{align*}\end{split}\]

Parameter Details

a[] represents the 32-bit output mantissa vector \(\bar a\).

b[] represents the IEEE754 float input vector \(\bar b\).

a[] and b[] must each begin at a double-word-aligned address.

b[] can be safely updated in-place (that is, a[] and b[] may refer to the same memory).

length is the number of elements in each of the vectors.

a_exp is the exponent associated with the output vector \(\bar a\).

Parameters
  • a[out] Output vector \(\bar a\)

  • b[in] Input vector \(\bar b\)

  • length[in] Number of elements in vectors \(\bar a\) and \(\bar b\)

  • a_exp[in] Exponent \(a\_exp\) of output vector \(\bar a\)

Throws
  • ET_LOAD_STORE – Raised if a or b is not double-word-aligned (See Note: Vector Alignment)

  • ET_ARITHMETIC – Raised if any element of b is infinite or not-a-number.

void xs3_vect_s32_to_f32(float a[], const int32_t b[], const unsigned length, const exponent_t b_exp)

Convert a 32-bit BFP vector into a vector of IEEE754 single-precision floats.

This function converts a 32-bit mantissa vector and exponent \(\bar b \cdot 2^{b\_exp}\) into a vector of 32-bit IEEE754 single-precision floating-point elements \(\bar a\). Conceptually, the elements of output vector \(\bar a\) represent the same values as those of the input vector.

Because IEEE754 single-precision floats hold fewer mantissa bits (24, including the implicit leading bit) than the 32-bit BFP mantissas, this operation may result in a loss of precision for some elements.

Operation Performed:

\[\begin{split}\begin{align*} & a_k \leftarrow b_k \cdot 2^{b\_exp} \\ & \qquad\text{ for }k\in 0\ ...\ (length-1) \end{align*}\end{split}\]

Parameter Details

a[] represents the output IEEE754 float vector \(\bar a\).

b[] represents the 32-bit input mantissa vector \(\bar b\).

a[] and b[] must each begin at a double-word-aligned address.

b[] can be safely updated in-place (that is, a[] and b[] may refer to the same memory).

length is the number of elements in each of the vectors.

b_exp is the exponent associated with the input vector \(\bar b\).

Parameters
  • a[out] Output vector \(\bar a\)

  • b[in] Input vector \(\bar b\)

  • length[in] Number of elements in vectors \(\bar a\) and \(\bar b\)

  • b_exp[in] Exponent \(b\_exp\) of input vector \(\bar b\)

Throws
  • ET_LOAD_STORE – Raised if a or b is not double-word-aligned (See Note: Vector Alignment)
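
The operation \(a_k \leftarrow b_k \cdot 2^{b\_exp}\) and the associated precision loss can be illustrated with a portable C model. This is a sketch, not the library implementation: the name model_s32_to_f32 is hypothetical, and rounding follows the host's default int-to-float conversion (round-to-nearest on typical platforms), which is an assumption about the real routine.

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>
#include <stdint.h>

// Portable model (NOT the library routine) of a[k] = b[k] * 2^b_exp.
static void model_s32_to_f32(float a[], const int32_t b[], unsigned length,
                             int b_exp)
{
    assert(a != NULL && b != NULL);
    for (unsigned k = 0; k < length; k++) {
        // The int32 -> float conversion keeps only 24 significant bits;
        // scaling by a power of two with ldexpf() is then exact
        // (barring overflow or underflow).
        a[k] = ldexpf((float) b[k], b_exp);
    }
}
```

With b_exp = -30, the mantissas 1073741824 (2^30) and 1073741825 (2^30 + 1) both convert to 1.0f, because the low-order bit does not fit in a 24-bit float mantissa.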

float xs3_vect_f32_dot(const float b[], const float c[], const unsigned length)

Compute the inner product of two IEEE754 float vectors.

This function takes two vectors of IEEE754 single-precision floats and computes their inner product — the sum of the elementwise products. The FMACC instruction is used, granting full precision in the addition.

The inner product \(a\) is returned.

Operation Performed:

\[\begin{align*} & a \leftarrow \sum_{k=0}^{length-1} ( b_k \cdot c_k ) \end{align*}\]

Parameters
  • b[in] Input vector \(\bar b\)

  • c[in] Input vector \(\bar c\)

  • length[in] Number of elements in vectors \(\bar b\) and \(\bar c\)

Returns

The inner product

complex_float_t *xs3_vect_f32_fft_forward(float x[], const unsigned fft_length)

Perform forward FFT on a vector of IEEE754 floats.

This function takes real input vector \(\bar x\) and performs a forward FFT on the signal in-place to get output vector \(\bar{X} = FFT\{\bar{x}\}\). This implementation is accelerated by converting the IEEE754 float vector into a block floating-point representation to compute the FFT. The resulting BFP spectrum is then converted back to IEEE754 single-precision floats.

See bfp_fft_forward_mono() for the details of the FFT.

Whereas the input x[] is an array of fft_length float elements, the output (placed in x[]) is an array of fft_length/2 complex_float_t elements, so x[] should be cast to complex_float_t* after the call. The returned pointer is x[] already cast in this way.

const unsigned FFT_N = 512;
float time_series[FFT_N] = { ... };
xs3_vect_f32_fft_forward(time_series, FFT_N);
complex_float_t* freq_spectrum = (complex_float_t*) &time_series[0];
const unsigned FREQ_BINS = FFT_N/2;
// e.g.   freq_spectrum[FREQ_BINS-1].re

x[] must begin at a double-word-aligned address.

Operation Performed:

\[\begin{align*} & \bar{X} \leftarrow FFT\{\bar{x}\} \end{align*}\]

Parameters
  • x[inout] Input vector \(\bar x\)

  • fft_length[in] The length of \(\bar x\)

Throws
  • ET_LOAD_STORE – Raised if x is not double-word-aligned (See Note: Vector Alignment)

Returns

Pointer to frequency-domain spectrum (i.e. ((complex_float_t*) &x[0]))
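
The relationship between the float buffer and the complex view of it can be sketched as below. This only illustrates the storage layout (fft_length floats reinterpreted as fft_length/2 complex elements). It does not compute an FFT and says nothing about which frequency each bin holds. The type model_complex_float_t is a hypothetical stand-in for complex_float_t, the im field name is an assumption (only .re appears above), and model_read_bin is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

// Hypothetical stand-in for the library's complex_float_t.
typedef struct {
    float re;
    float im;  // field name assumed
} model_complex_float_t;

// Read complex element k from a buffer of fft_length floats that is being
// viewed as fft_length/2 complex elements.
static model_complex_float_t model_read_bin(const float x[],
                                            unsigned fft_length, unsigned k)
{
    assert(x != NULL && k < fft_length / 2);
    model_complex_float_t bin;
    // Element k occupies floats 2k (re) and 2k+1 (im); memcpy sidesteps
    // pointer-aliasing concerns in portable code.
    memcpy(&bin, &x[2 * k], sizeof bin);
    return bin;
}
```

For an 8-element float buffer holding {0, 1, 2, 3, 4, 5, 6, 7}, element 3 of the complex view has re = 6 and im = 7, and the four complex elements occupy exactly the same number of bytes as the eight floats.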

float *xs3_vect_f32_fft_inverse(complex_float_t X[], const unsigned fft_length)

Perform inverse FFT on a vector of complex_float_t.

This function takes complex input vector \(\bar X\) and performs an inverse real FFT on the spectrum in-place to get output vector \(\bar{x} = IFFT\{\bar{X}\}\). This implementation is accelerated by converting the IEEE754 float spectrum into a block floating-point representation to compute the IFFT. The resulting BFP signal is then converted back to IEEE754 single-precision floats.

See bfp_fft_inverse_mono() for the details of the IFFT.

Input X[] is an array of fft_length/2 complex_float_t elements. The output (placed in X[]) is an array of fft_length float elements.

const unsigned FFT_N = 512;
complex_float_t freq_spectrum[FFT_N/2] = { ... };
xs3_vect_f32_fft_inverse(freq_spectrum, FFT_N);
float* time_series = (float*) &freq_spectrum[0];

X[] must begin at a double-word-aligned address.

Parameters
  • X[inout] Input vector \(\bar X\)

  • fft_length[in] The FFT length. Twice the element count of \(\bar X\).

Throws
  • ET_LOAD_STORE – Raised if X is not double-word-aligned (See Note: Vector Alignment)

Returns

Pointer to time-domain signal (i.e. ((float*) &X[0]))