Scalar Float API

@todo astew: Include a note here about the conventions used in the documentation below (particularly with respect to the MathJax describing the inputs/outputs; e.g. is \(X\) just a mantissa, or is it the logical value \(x \cdot 2^{x\_exp}\), where \(x\) is the mantissa and \(x\_exp\) the exponent?).

float_s32_t float_s64_to_float_s32(const float_s64_t x)

Convert a float_s64_t to a float_s32_t.

Note

This operation may result in precision loss.

Parameters: x – [in] Input value
Returns: float_s32_t representation of x
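
The precision loss noted above can be illustrated with a minimal sketch. It assumes the logical value of each type is mantissa times two to the power of the exponent; the `demo_` types and the function `demo_s64_to_s32` are hypothetical stand-ins written for this example, not the library's implementation.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for the library types: value = mant * 2^exp. */
typedef struct { int32_t mant; int exp; } demo_float_s32_t;
typedef struct { int64_t mant; int exp; } demo_float_s64_t;

/* Narrow a 64-bit mantissa to 32 bits, compensating in the exponent. */
static demo_float_s32_t demo_s64_to_s32(demo_float_s64_t x)
{
    int64_t m = x.mant;
    int e = x.exp;
    /* Halve the mantissa until it fits; each step drops one low bit. */
    while (m > INT32_MAX || m < INT32_MIN) {
        m /= 2;   /* integer division truncates toward zero */
        e += 1;
    }
    assert(m >= INT32_MIN && m <= INT32_MAX);
    demo_float_s32_t r = { (int32_t)m, e };
    return r;
}
```

For instance, a 64-bit mantissa of 2^40 + 1 loses its lowest set bit when narrowed, because only about 31 magnitude bits survive.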

float_s32_t float_to_float_s32(const float x)

Convert an IEEE754 float to a float_s32_t.

Parameters: x – [in] Input value
Throws: ET_ARITHMETIC – Raised if x is infinite or NaN
Returns: float_s32_t representation of x
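
As an illustration of what such a conversion involves, the sketch below splits an IEEE754 float into a 32-bit mantissa and an exponent using the standard `frexpf` and `ldexpf` functions. The type `demo_float_s32_t`, the function `demo_from_float`, and the choice of a 30-bit scaling are assumptions made for this example; the `assert` merely stands in for the ET_ARITHMETIC exception.

```c
#include <assert.h>
#include <math.h>
#include <stdint.h>

/* Hypothetical stand-in for float_s32_t: value = mant * 2^exp. */
typedef struct { int32_t mant; int exp; } demo_float_s32_t;

/* Split an IEEE754 float into mantissa and exponent. */
static demo_float_s32_t demo_from_float(float x)
{
    assert(isfinite(x));   /* infinite or NaN input is rejected */
    demo_float_s32_t r = { 0, 0 };
    if (x == 0.0f)
        return r;
    int e;
    float f = frexpf(x, &e);          /* x = f * 2^e with 0.5 <= |f| < 1 */
    /* f has at most 24 significant bits, so f * 2^30 is an exact integer. */
    r.mant = (int32_t)ldexpf(f, 30);
    r.exp = e - 30;
    return r;
}
```

With this scaling, 1.5f becomes the mantissa 3 * 2^28 with exponent -29.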

float_s32_t double_to_float_s32(const double x)

Convert an IEEE754 double to a float_s32_t.

Note

This operation may result in precision loss.

Parameters: x – [in] Input value
Throws: ET_ARITHMETIC – Raised if x is infinite or NaN
Returns: float_s32_t representation of x

float_s64_t float_s32_to_float_s64(const float_s32_t x)

Convert a float_s32_t to a float_s64_t.

Parameters: x – [in] Input value
Returns: float_s64_t representation of x

float float_s32_to_float(const float_s32_t x)

Convert a float_s32_t to an IEEE754 float.

Parameters: x – [in] Input value
Returns: float representation of x

double float_s32_to_double(const float_s32_t x)

Convert a float_s32_t to an IEEE754 double.

Parameters: x – [in] Input value
Returns: double representation of x

float_s32_t float_s32_mul(const float_s32_t x, const float_s32_t y)

Multiply two float_s32_t together.

The inputs \(x\) and \(y\) are multiplied together for a result \(a\), which is returned.

Operation Performed:

\[\begin{align*} & a \leftarrow x \cdot y \end{align*}\]

Parameters

x – [in] Input operand \(x\)
y – [in] Input operand \(y\)

Returns

The product of \(x\) and \(y\)
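
The operation can be sketched for a mantissa/exponent pair as follows, assuming the logical value is mantissa times two to the power of the exponent. `demo_float_s32_t` and `demo_mul` are hypothetical names for this example, and the truncating renormalisation is one possible choice, not necessarily what the library does.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for float_s32_t: value = mant * 2^exp. */
typedef struct { int32_t mant; int exp; } demo_float_s32_t;

static demo_float_s32_t demo_mul(demo_float_s32_t x, demo_float_s32_t y)
{
    /* Mantissas multiply and exponents add:
       (mx * 2^ex) * (my * 2^ey) = (mx * my) * 2^(ex + ey). */
    int64_t p = (int64_t)x.mant * (int64_t)y.mant;
    int e = x.exp + y.exp;
    /* Halve the 64-bit product until it fits in 32 bits again. */
    while (p > INT32_MAX || p < INT32_MIN) {
        p /= 2;   /* truncates toward zero; low bits are discarded */
        e += 1;
    }
    assert(p >= INT32_MIN && p <= INT32_MAX);
    demo_float_s32_t r = { (int32_t)p, e };
    return r;
}
```

Multiplying 3.0 by 5.0, each stored with exponent -28, yields the mantissa 15 * 2^27 with exponent -27, i.e. 15.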

float_s32_t float_s32_add(const float_s32_t x, const float_s32_t y)

Add two float_s32_t together.

The inputs \(x\) and \(y\) are added together for a result \(a\), which is returned.

Operation Performed:

\[\begin{align*} & a \leftarrow x + y \end{align*}\]

Parameters

x – [in] Input operand \(x\)
y – [in] Input operand \(y\)

Returns

The sum of \(x\) and \(y\)
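
Addition requires both operands to share an exponent before their mantissas can be summed. The sketch below shows one way to do this, assuming the logical value is mantissa times two to the power of the exponent; `demo_float_s32_t`, `demo_scale_down` and `demo_add` are hypothetical names, and aligning to the larger exponent is an assumption of this example rather than a statement about the library.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for float_s32_t: value = mant * 2^exp. */
typedef struct { int32_t mant; int exp; } demo_float_s32_t;

/* Divide m by 2^n (n >= 0), truncating toward zero. */
static int64_t demo_scale_down(int32_t m, int n)
{
    return (n >= 32) ? 0 : (int64_t)m / ((int64_t)1 << n);
}

static demo_float_s32_t demo_add(demo_float_s32_t x, demo_float_s32_t y)
{
    /* A zero operand must not dictate the shared exponent. */
    if (x.mant == 0) return y;
    if (y.mant == 0) return x;
    /* Align both mantissas to the larger exponent, then add in 64 bits. */
    int e = (x.exp > y.exp) ? x.exp : y.exp;
    int64_t s = demo_scale_down(x.mant, e - x.exp)
              + demo_scale_down(y.mant, e - y.exp);
    /* The sum needs at most 33 bits, so one halving restores the range. */
    if (s > INT32_MAX || s < INT32_MIN) {
        s /= 2;
        e += 1;
    }
    assert(s >= INT32_MIN && s <= INT32_MAX);
    demo_float_s32_t r = { (int32_t)s, e };
    return r;
}
```

Adding 1.0 (mantissa 2^30, exponent -30) and 0.25 (mantissa 2^30, exponent -32) gives mantissa 2^30 + 2^28 with exponent -30, i.e. 1.25.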

float_s32_t float_s32_sub(const float_s32_t x, const float_s32_t y)

Subtract one float_s32_t from another.

The input \(y\) is subtracted from the input \(x\) for a result \(a\), which is returned.

Operation Performed:

\[\begin{align*} & a \leftarrow x - y \end{align*}\]

Parameters

x – [in] Input operand \(x\)
y – [in] Input operand \(y\)

Returns

The difference of \(x\) and \(y\)

float_s32_t float_s32_div(const float_s32_t x, const float_s32_t y)

Divide one float_s32_t by another.

The input \(x\) is divided by the input \(y\) for a result \(a\), which is returned.

Operation Performed:

\[\begin{align*} & a \leftarrow \frac{x}{y} \end{align*}\]

Parameters

x – [in] Input operand \(x\)
y – [in] Input operand \(y\)

Throws

ET_ARITHMETIC – if \(y\) is \(0\)

Returns

The result of \(x / y\)
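
A division on mantissa/exponent pairs can be sketched as below, assuming the logical value is mantissa times two to the power of the exponent. `demo_float_s32_t` and `demo_div` are hypothetical names; the numerator is pre-scaled by 2^31 so that the integer quotient keeps fractional bits, and the `assert` stands in for the ET_ARITHMETIC exception raised on a zero divisor.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for float_s32_t: value = mant * 2^exp. */
typedef struct { int32_t mant; int exp; } demo_float_s32_t;

static demo_float_s32_t demo_div(demo_float_s32_t x, demo_float_s32_t y)
{
    assert(y.mant != 0);   /* division by zero is rejected */
    /* (mx * 2^31 / my) * 2^(ex - ey - 31) approximates x / y. */
    int64_t q = ((int64_t)x.mant * ((int64_t)1 << 31)) / y.mant;
    int e = x.exp - y.exp - 31;
    /* Bring the quotient back into the 32-bit mantissa range. */
    while (q > INT32_MAX || q < INT32_MIN) {
        q /= 2;
        e += 1;
    }
    demo_float_s32_t r = { (int32_t)q, e };
    return r;
}
```

Dividing 3 by 4 (both with exponent 0) gives mantissa 3 * 2^29 with exponent -31, i.e. 0.75.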

float_s32_t float_s32_abs(const float_s32_t x)

Get the absolute value of a float_s32_t.

The absolute value of \(x\), denoted \(a\), is returned.

Operation Performed:

\[\begin{align*} & a \leftarrow \left| x \right| \end{align*}\]

Parameters: x – [in] Input operand \(x\)
Returns: The absolute value of \(x\)

unsigned float_s32_gt(const float_s32_t x, const float_s32_t y)

Determine whether one float_s32_t is greater than another.

The inputs \(x\) and \(y\) are compared. The result \(a\) is true iff \(x\) is greater than \(y\) and false otherwise. \(a\) is returned.

Operation Performed:

\[\begin{split}\begin{align*} & a \leftarrow \begin{cases} 1 & x \gt y \\ 0 & \text{otherwise} \end{cases} \end{align*}\end{split}\]

Parameters

x – [in] Input operand \(x\)
y – [in] Input operand \(y\)

Returns

1 iff \(x \gt y\); 0 otherwise
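
Because two operands may carry different exponents, their mantissas cannot be compared directly in general. The following sketch shows one exact way to compare them, assuming the logical value is mantissa times two to the power of the exponent. `demo_float_s32_t` and `demo_gt` are hypothetical names, and exponent differences are assumed not to overflow an `int`.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for float_s32_t: value = mant * 2^exp. */
typedef struct { int32_t mant; int exp; } demo_float_s32_t;

static unsigned demo_gt(demo_float_s32_t x, demo_float_s32_t y)
{
    /* If either value is zero or the signs differ, the mantissas decide. */
    if (x.mant == 0 || y.mant == 0 || (x.mant < 0) != (y.mant < 0))
        return x.mant > y.mant;

    assert(x.mant != 0 && y.mant != 0);   /* same sign, both nonzero here */
    int d = x.exp - y.exp;
    /* A gap of 32 or more means the larger exponent wins on magnitude. */
    if (d >= 32)  return x.mant > 0;
    if (d <= -32) return x.mant < 0;

    /* Otherwise align to the smaller exponent in 64 bits; no bits are lost. */
    int64_t a = x.mant, b = y.mant;
    if (d >= 0) a *= ((int64_t)1 << d);
    else        b *= ((int64_t)1 << -d);
    return a > b;
}
```

A greater-than-or-equal test follows the same pattern with `>=` in place of `>` in the mantissa comparisons.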

unsigned float_s32_gte(const float_s32_t x, const float_s32_t y)

Determine whether one float_s32_t is greater than or equal to another.

The inputs \(x\) and \(y\) are compared. The result \(a\) is true iff \(x\) is greater than or equal to \(y\) and false otherwise. \(a\) is returned.

Operation Performed:

\[\begin{split}\begin{align*} & a \leftarrow \begin{cases} 1 & x \geq y \\ 0 & \text{otherwise} \end{cases} \end{align*}\end{split}\]

Parameters

x – [in] Input operand \(x\)
y – [in] Input operand \(y\)

Returns

1 iff \(x \geq y\); 0 otherwise

float_s32_t float_s32_ema(const float_s32_t x, const float_s32_t y, const fixed_s32_t coef_q30)

Update an exponential moving average.

This function updates an exponential moving average by applying a single new sample. \(x\) is taken as the previous EMA state, with \(y\) as the new sample. The EMA coefficient \(\alpha\) is applied to the term including \(x\).

coef_q30 encodes \(\alpha\) as a fixed-point value in Q30 format (i.e. with an implied exponent of \(-30\)), so that \(\alpha = \mathtt{coef\_q30} \cdot 2^{-30}\). \(\alpha\) should be in the range \(0 \leq \alpha \leq 1\).

Operation Performed:

\[\begin{align*} & a \leftarrow \alpha \cdot x + (1 - \alpha) \cdot y \end{align*}\]

Parameters

x – [in] Input operand \(x\)
y – [in] Input operand \(y\)
coef_q30 – [in] EMA coefficient \(\alpha\) encoded in Q30 format

Returns

The new EMA state
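
To make the Q30 encoding and the update formula concrete, the sketch below applies the same formula to plain doubles. The function `demo_ema` is hypothetical and exists only for this example; it also assumes the coefficient is carried in a 32-bit signed integer.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: one EMA update on doubles with alpha given in Q30. */
static double demo_ema(double x, double y, int32_t coef_q30)
{
    /* 0 <= alpha <= 1 corresponds to 0 <= coef_q30 <= 2^30. */
    assert(coef_q30 >= 0 && coef_q30 <= (1 << 30));
    double alpha = (double)coef_q30 / (double)(1 << 30);
    /* alpha weights the previous state x; (1 - alpha) weights the sample y. */
    return alpha * x + (1.0 - alpha) * y;
}
```

For example, \(\alpha = 0.75\) is encoded as `3 << 28`; with a previous state of 8.0 and a new sample of 4.0 the updated state is 7.0.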

float_s32_t float_s32_sqrt(const float_s32_t x)

Get the square root of a float_s32_t.

This function computes the square root of \(x\). The result, \(a\), is returned.

The precision with which \(a\) is computed is configurable via the XS3_BFP_SQRT_DEPTH_S32 configuration parameter, which sets the number of most significant bits of the result to be calculated.
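
The idea of computing only a given number of most significant bits can be sketched with a bit-by-bit integer square root. This is an illustrative sketch under the assumption that the logical value is mantissa times two to the power of the exponent; `demo_float_s32_t`, `demo_sqrt` and its `depth` argument are hypothetical and are not claimed to match how the library uses XS3_BFP_SQRT_DEPTH_S32.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for float_s32_t: value = mant * 2^exp. */
typedef struct { int32_t mant; int exp; } demo_float_s32_t;

/* Square root computing only the top `depth` bits of the result mantissa. */
static demo_float_s32_t demo_sqrt(demo_float_s32_t x, int depth)
{
    assert(x.mant >= 0);                 /* real square root only */
    assert(depth >= 1 && depth <= 31);
    demo_float_s32_t r = { 0, 0 };
    if (x.mant == 0)
        return r;
    uint64_t m = (uint64_t)x.mant;
    int e = x.exp;
    /* Make the exponent even so that it can be halved exactly. */
    if (e % 2 != 0) { m <<= 1; e -= 1; }
    /* Scale m into [2^60, 2^62) so the root's top set bit is bit 30. */
    while (m < ((uint64_t)1 << 60)) { m <<= 2; e -= 2; }
    /* Decide result bits from bit 30 downward, `depth` bits in total. */
    uint64_t root = 0;
    for (int bit = 30; bit > 30 - depth; bit--) {
        uint64_t trial = root | ((uint64_t)1 << bit);
        if (trial * trial <= m)
            root = trial;
    }
    r.mant = (int32_t)root;
    r.exp = e / 2;
    return r;
}
```

With the full depth of 31, the square root of 2 comes out as mantissa 0x5A827999 with exponent -30; with a depth of 8, only the top eight bits are resolved, giving 0x5A800000.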