CMSIS-NN  Version 3.1.0
CMSIS NN Software Library
Softmax Functions

Functions

void arm_nn_softmax_common_s8 (const int8_t *input, const int32_t num_rows, const int32_t row_size, const int32_t mult, const int32_t shift, const int32_t diff_min, const bool int16_output, void *output)
 Common softmax function for s8 input and s8 or s16 output. More...
 
void arm_softmax_q15 (const q15_t *vec_in, const uint16_t dim_vec, q15_t *p_out)
 Q15 softmax function. More...
 
void arm_softmax_q7 (const q7_t *vec_in, const uint16_t dim_vec, q7_t *p_out)
 Q7 softmax function. More...
 
arm_status arm_softmax_s16 (const int16_t *input, const int32_t num_rows, const int32_t row_size, const int32_t mult, const int32_t shift, const cmsis_nn_softmax_lut_s16 *softmax_params, int16_t *output)
 S16 softmax function. More...
 
void arm_softmax_s8 (const int8_t *input, const int32_t num_rows, const int32_t row_size, const int32_t mult, const int32_t shift, const int32_t diff_min, int8_t *output)
 S8 softmax function. More...
 
void arm_softmax_s8_s16 (const int8_t *input, const int32_t num_rows, const int32_t row_size, const int32_t mult, const int32_t shift, const int32_t diff_min, int16_t *output)
 S8 to s16 softmax function. More...
 
void arm_softmax_u8 (const uint8_t *input, const int32_t num_rows, const int32_t row_size, const int32_t mult, const int32_t shift, const int32_t diff_min, uint8_t *output)
 U8 softmax function. More...
 
void arm_softmax_with_batch_q7 (const q7_t *vec_in, const uint16_t nb_batches, const uint16_t dim_vec, q7_t *p_out)
 Q7 softmax function with batch parameter. More...
 

Description

Softmax functions based on 2^x (exp2) rather than the natural exponential.

Function Documentation

void arm_nn_softmax_common_s8 ( const int8_t *  input,
const int32_t  num_rows,
const int32_t  row_size,
const int32_t  mult,
const int32_t  shift,
const int32_t  diff_min,
const bool  int16_output,
void *  output 
)
Parameters
[in]  input         Pointer to the input tensor
[in]  num_rows      Number of rows in the input tensor
[in]  row_size      Number of elements in each input row
[in]  mult          Input quantization multiplier
[in]  shift         Input quantization shift within the range [0, 31]
[in]  diff_min      Minimum difference from the row maximum. Used to check whether the quantized exponential operation can be performed
[in]  int16_output  If false (0), the output is s8; if true, the output is s16
[out] output        Pointer to the output tensor
Note
Supported framework: TensorFlow Lite micro (bit-accurate)

References ACCUM_BITS, CLAMP, DIV_POW2, EXP_ON_NEG, MAX, MUL_SAT, NN_Q15_MAX, NN_Q15_MIN, NN_Q7_MAX, NN_Q7_MIN, and ONE_OVER1.

Referenced by arm_softmax_s8(), and arm_softmax_s8_s16().
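The role of `diff_min` can be sketched as follows. This is an illustrative reconstruction of the gating logic, not the library source; the helper name is an assumption:

```c
#include <stdint.h>
#include <stddef.h>

/* Illustrative sketch of the diff_min gate: the quantized kernel works
 * on the difference (input[i] - row_max), which is always <= 0.  Only
 * differences >= diff_min are fed to the quantized exponential
 * approximation; anything smaller would underflow to a zero output
 * anyway, so the kernel can skip evaluating it. */
static int32_t count_active_exponents(const int8_t *row, size_t row_size,
                                      int32_t diff_min)
{
    int8_t max = row[0];
    for (size_t i = 1; i < row_size; i++) {
        if (row[i] > max) max = row[i];
    }
    int32_t active = 0;
    for (size_t i = 0; i < row_size; i++) {
        if ((int32_t)row[i] - max >= diff_min) {
            active++;
        }
    }
    return active;
}
```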

void arm_softmax_q15 ( const q15_t *  vec_in,
const uint16_t  dim_vec,
q15_t *  p_out 
)
Parameters
[in]  vec_in   Pointer to the input vector
[in]  dim_vec  Input vector dimension
[out] p_out    Pointer to the output vector

Here, instead of the typical e-based softmax, a 2-based softmax is used, i.e.:

y_i = 2^(x_i) / sum(2^(x_j))

The relative outputs differ, but mathematically the gradient is the same up to a log(2) scaling factor.

void arm_softmax_q7 ( const q7_t *  vec_in,
const uint16_t  dim_vec,
q7_t *  p_out 
)
Parameters
[in]  vec_in   Pointer to the input vector
[in]  dim_vec  Input vector dimension
[out] p_out    Pointer to the output vector

Here, instead of the typical e-based (natural exponential) softmax, a 2-based softmax is used, i.e.:

y_i = 2^(x_i) / sum(2^(x_j))

The relative outputs differ, but mathematically the gradient is the same up to a log(2) scaling factor.

Referenced by arm_softmax_with_batch_q7().

arm_status arm_softmax_s16 ( const int16_t *  input,
const int32_t  num_rows,
const int32_t  row_size,
const int32_t  mult,
const int32_t  shift,
const cmsis_nn_softmax_lut_s16 *  softmax_params,
int16_t *  output 
)
Parameters
[in]  input           Pointer to the input tensor
[in]  num_rows        Number of rows in the input tensor
[in]  row_size        Number of elements in each input row
[in]  mult            Input quantization multiplier
[in]  shift           Input quantization shift within the range [0, 31]
[in]  softmax_params  Softmax s16 layer parameters, containing pointers to the two LUTs specified below. The high 9 bits of the LUT input are used for indexing and the remaining 7 bits for interpolation, giving 512 entries for the 9-bit index plus 1 extra entry for interpolation, i.e. 513 values per LUT.
  • Lookup table for exp(x), where x is uniformly distributed in [-10.0, 0.0]
  • Lookup table for 1 / (1 + x), where x is uniformly distributed in [0.0, 1.0]
[out] output          Pointer to the output tensor
Returns
ARM_MATH_ARGUMENT_ERROR if one of the LUT pointers is NULL
ARM_MATH_SUCCESS on successful operation
Note
Supported framework: TensorFlow Lite micro (bit-accurate)

References arm_nn_requantize(), cmsis_nn_softmax_lut_s16::exp_lut, MAX, MIN, NN_Q15_MAX, NN_Q15_MIN, and cmsis_nn_softmax_lut_s16::one_by_one_lut.

void arm_softmax_s8 ( const int8_t *  input,
const int32_t  num_rows,
const int32_t  row_size,
const int32_t  mult,
const int32_t  shift,
const int32_t  diff_min,
int8_t *  output 
)
Parameters
[in]  input     Pointer to the input tensor
[in]  num_rows  Number of rows in the input tensor
[in]  row_size  Number of elements in each input row
[in]  mult      Input quantization multiplier
[in]  shift     Input quantization shift within the range [0, 31]
[in]  diff_min  Minimum difference from the row maximum. Used to check whether the quantized exponential operation can be performed
[out] output    Pointer to the output tensor
Note
Supported framework: TensorFlow Lite micro (bit-accurate)

References ACCUM_BITS, arm_nn_softmax_common_s8(), CLAMP, DIV_POW2, DIV_POW2_MVE, EXP_ON_NEG, MUL_SAT, MUL_SAT_MVE, NN_Q7_MIN, and ONE_OVER1.

void arm_softmax_s8_s16 ( const int8_t *  input,
const int32_t  num_rows,
const int32_t  row_size,
const int32_t  mult,
const int32_t  shift,
const int32_t  diff_min,
int16_t *  output 
)
Parameters
[in]  input     Pointer to the input tensor
[in]  num_rows  Number of rows in the input tensor
[in]  row_size  Number of elements in each input row
[in]  mult      Input quantization multiplier
[in]  shift     Input quantization shift within the range [0, 31]
[in]  diff_min  Minimum difference from the row maximum. Used to check whether the quantized exponential operation can be performed
[out] output    Pointer to the output tensor
Note
Supported framework: TensorFlow Lite micro (bit-accurate)

References arm_nn_softmax_common_s8().

void arm_softmax_u8 ( const uint8_t *  input,
const int32_t  num_rows,
const int32_t  row_size,
const int32_t  mult,
const int32_t  shift,
const int32_t  diff_min,
uint8_t *  output 
)
Parameters
[in]  input     Pointer to the input tensor
[in]  num_rows  Number of rows in the input tensor
[in]  row_size  Number of elements in each input row
[in]  mult      Input quantization multiplier
[in]  shift     Input quantization shift within the range [0, 31]
[in]  diff_min  Minimum difference from the row maximum. Used to check whether the quantized exponential operation can be performed
[out] output    Pointer to the output tensor
Note
Supported framework: TensorFlow Lite micro (bit-accurate)

References ACCUM_BITS, CLAMP, DIV_POW2, EXP_ON_NEG, MAX, MUL_SAT, and ONE_OVER1.

void arm_softmax_with_batch_q7 ( const q7_t *  vec_in,
const uint16_t  nb_batches,
const uint16_t  dim_vec,
q7_t *  p_out 
)
Parameters
[in]  vec_in      Pointer to the input vector
[in]  nb_batches  Number of batches
[in]  dim_vec     Input vector dimension
[out] p_out       Pointer to the output vector

Here, instead of the typical e-based (natural exponential) softmax, a 2-based softmax is used, i.e.:

y_i = 2^(x_i) / sum(2^(x_j))

The relative outputs differ, but mathematically the gradient is the same up to a log(2) scaling factor.

References arm_softmax_q7().