Model Fine-tuning For Automated Augmented Reality Descriptions
Abstract

A second input image is generated by applying a target augmented reality (AR) effect to a first input image. The first input image and the second input image are provided to a first visual-semantic machine learning model to obtain output describing at least one feature of the target AR effect. The first visual-semantic machine learning model is fine-tuned from a second visual-semantic machine learning model by using training samples. Each training sample comprises a first training image, a second training image, and a training description of a given AR effect. The second training image is generated by applying the given AR effect to the first training image. A description of the target AR effect is selected based on the output of the first visual-semantic machine learning model. The description of the target AR effect is stored in association with an identifier of the target AR effect.
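The data flow in the abstract can be illustrated with a minimal Python sketch. It is not the implementation from the application. Every name in it (`TrainingSample`, `make_training_sample`, `describe_effect`, `brighten`, `stub_model`, `select_first`) is hypothetical, an image is reduced to a flat list of pixel intensities, and the fine-tuned visual-semantic model is replaced by a trivial stand-in function so that the example runs without any machine learning library.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Illustration only: an "image" is a flat list of 0-255 pixel intensities.
Image = List[int]
Effect = Callable[[Image], Image]
Model = Callable[[Image, Image], List[str]]


@dataclass(frozen=True)
class TrainingSample:
    """One fine-tuning sample: before image, after image, description."""
    first_image: Image
    second_image: Image
    description: str


def make_training_sample(image: Image, effect: Effect, description: str) -> TrainingSample:
    # The second training image is the first one with the AR effect applied.
    return TrainingSample(image, effect(image), description)


def describe_effect(
    effect_id: str,
    effect: Effect,
    first_image: Image,
    model: Model,
    select: Callable[[List[str]], str],
    store: Dict[str, str],
) -> str:
    # 1. Generate the second input image by applying the target AR effect.
    second_image = effect(first_image)
    # 2. Give both images to the (fine-tuned) model to obtain its output.
    output = model(first_image, second_image)
    # 3. Select a description based on that output.
    description = select(output)
    # 4. Store the description in association with the effect identifier.
    store[effect_id] = description
    return description


# Hypothetical stand-ins, assumed for this sketch only.
def brighten(image: Image) -> Image:
    return [min(255, p + 40) for p in image]


def stub_model(first: Image, second: Image) -> List[str]:
    # Placeholder for a real visual-semantic model: compares total intensity.
    delta = sum(second) - sum(first)
    if delta > 0:
        return ["brightens the image"]
    if delta < 0:
        return ["darkens the image"]
    return ["no visible change"]


def select_first(candidates: List[str]) -> str:
    return candidates[0]


descriptions: Dict[str, str] = {}
describe_effect("effect-001", brighten, [10, 20, 30], stub_model, select_first, descriptions)
```

In this sketch the same before/after pairing is used both to build training samples and to query the model, which mirrors the structure the abstract describes. How the description is actually selected from the model output is not specified in the abstract, so `select_first` is only a placeholder.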

Timeline
Filed: 02/23/2026
Published: 06/25/2026
Granted: Not Available
IPC Codes (7)
G06V 20/70: Labelling scene content, e.g. deriving syntactic or semantic representations
G06F 40/40: Processing or translation of natural language (natural language analysis G06F 40/20; semantic analysis G06F 40/30)
G06N 3/0455: Auto-encoder networks; Encoder-decoder networks