An example method is performed using at least a processor and an N-bit fixed-point matrix operations accelerator. For a first convolution layer of multiple convolution layers, the method includes generating, based on initial weights and biases and an input feature map that includes initial feature values, an output feature map that includes output feature values; removing outliers of the output feature values to generate a range of feature values; generating, using the range of feature values, a feature scale value; determining a weight scale value based on a range of weights for the first convolution layer; determining a first range of biases for the first convolution layer; and determining a maximum bias scale for the first convolution layer. For each additional convolution layer, the method includes generating, based on the output feature map, the feature scale value, the weight scale value, and the maximum bias scale of the previously processed convolution layer, an output feature map for the current convolution layer.
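The per-layer calibration steps described above can be illustrated with a minimal sketch. It is not the claimed implementation: the abstract does not say how outliers are removed or how a scale is derived from a range, so the percentile cutoff (`pct`), the symmetric scale formula, and the names `clip_outliers`, `scale_for_range`, and `calibrate_layer` are all assumptions made for illustration.

```python
import numpy as np

def clip_outliers(values, pct=99.9):
    """Return a (lo, hi) range with the extreme tails discarded.
    ASSUMPTION: a percentile cutoff; the source only says outliers are removed."""
    lo = np.percentile(values, 100.0 - pct)
    hi = np.percentile(values, pct)
    return float(lo), float(hi)

def scale_for_range(lo, hi, n_bits):
    """ASSUMPTION: symmetric scale that maps the largest magnitude in
    [lo, hi] onto the largest signed N-bit integer."""
    max_abs = max(abs(lo), abs(hi))
    q_max = 2 ** (n_bits - 1) - 1
    return q_max / max_abs if max_abs > 0 else 1.0

def calibrate_layer(output_features, weights, biases, n_bits=8):
    """Collect the per-layer quantities named in the abstract."""
    # Range of output feature values after outlier removal.
    f_lo, f_hi = clip_outliers(output_features)
    feature_scale = scale_for_range(f_lo, f_hi, n_bits)

    # Weight scale from the full range of the layer's weights.
    weight_scale = scale_for_range(float(weights.min()), float(weights.max()), n_bits)

    # Range of biases and the largest scale at which every bias still
    # fits in N bits (hypothetical reading of "maximum bias scale").
    b_lo, b_hi = float(biases.min()), float(biases.max())
    max_bias_scale = scale_for_range(b_lo, b_hi, n_bits)

    return {
        "feature_range": (f_lo, f_hi),
        "feature_scale": feature_scale,
        "weight_scale": weight_scale,
        "bias_range": (b_lo, b_hi),
        "max_bias_scale": max_bias_scale,
    }
```

In this sketch, the returned values for one layer would be carried forward when producing the output feature map of the next layer, mirroring the layer-by-layer flow the abstract describes.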
Full Text
What is claimed is: