In various embodiments, a system can: access a failure image on which a first model has inaccurately performed an inferencing task; train, on a set of dummy images, a second model to learn a visual variety of the failure image, based on a loss function having a first term and a second term, the first term quantifying visual content dissimilarities between the set of dummy images and outputs predicted during training by the second model, and the second term quantifying, at a plurality of different image scales, visual variety dissimilarities between the failure image and the outputs predicted during training by the second model; and execute the second model on each of a set of training images on which the first model was trained, thereby yielding a set of first converted training images that exhibit the visual variety of the failure image.
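The two-term loss function described above can be illustrated with a minimal sketch. The abstract does not specify how either dissimilarity is computed, so every concrete choice here is an assumption made for illustration only: the first term is taken as a mean squared pixel difference, the second term as a squared difference between channel Gram matrices (a statistic borrowed from common style-transfer practice, not stated in the source), and the image scales are produced by repeated 2x2 average pooling. The names `content_term`, `variety_term`, `downsample`, `gram`, `total_loss`, `weight`, and `num_scales` are hypothetical.

```python
from typing import List

# Hypothetical layout: channels x rows x cols, pixel values as floats.
Image = List[List[List[float]]]


def content_term(dummy: Image, output: Image) -> float:
    """First term (assumed form): mean squared pixel difference."""
    total, count = 0.0, 0
    for ch_d, ch_o in zip(dummy, output):
        for row_d, row_o in zip(ch_d, ch_o):
            for d, o in zip(row_d, row_o):
                total += (d - o) ** 2
                count += 1
    return total / count


def downsample(image: Image) -> Image:
    """Halve height and width by 2x2 average pooling (odd trailing row/col dropped)."""
    out = []
    for ch in image:
        h, w = len(ch) // 2, len(ch[0]) // 2
        out.append([
            [(ch[2 * r][2 * c] + ch[2 * r][2 * c + 1]
              + ch[2 * r + 1][2 * c] + ch[2 * r + 1][2 * c + 1]) / 4.0
             for c in range(w)]
            for r in range(h)
        ])
    return out


def gram(image: Image) -> List[List[float]]:
    """Channel-by-channel Gram matrix, normalised by the number of pixels."""
    flat = [[p for row in ch for p in row] for ch in image]
    n = len(flat[0])
    return [[sum(a * b for a, b in zip(fi, fj)) / n for fj in flat] for fi in flat]


def variety_term(failure: Image, output: Image, num_scales: int = 3) -> float:
    """Second term (assumed form): Gram-matrix difference summed over image scales.

    Assumes both images are at least 2 ** (num_scales - 1) pixels on each side.
    """
    total = 0.0
    for scale in range(num_scales):
        g_f, g_o = gram(failure), gram(output)
        total += sum((a - b) ** 2
                     for row_f, row_o in zip(g_f, g_o)
                     for a, b in zip(row_f, row_o))
        if scale < num_scales - 1:  # move to the next, coarser scale
            failure, output = downsample(failure), downsample(output)
    return total


def total_loss(dummy: Image, output: Image, failure: Image,
               weight: float = 1.0, num_scales: int = 3) -> float:
    """Sum of the first term and the weighted second term."""
    return content_term(dummy, output) + weight * variety_term(failure, output, num_scales)
```

In this sketch the first term compares the output against the dummy image it was generated from, while the second term compares the output against the single failure image. Because the Gram matrices are normalised by pixel count, the failure image and the output need not share the same spatial size.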
What is claimed is: