A second input image is generated by applying a target augmented reality (AR) effect to a first input image. The first input image and the second input image are provided to a first visual-semantic machine learning model to obtain output describing at least one feature of the target AR effect. The first visual-semantic machine learning model is fine-tuned from a second visual-semantic machine learning model by using training samples. Each training sample comprises a first training image, a second training image, and a training description of a given AR effect. The second training image is generated by applying the given AR effect to the first training image. A description of the target AR effect is selected based on the output of the visual-semantic machine learning model. The description of the target AR effect is stored in association with an identifier of the target AR effect.
Full Text
What is claimed is: