Systems And Methods For Distributing Layers Of Special Mixture-of-Experts Machine Learning Models
Abstract

Some disclosed embodiments are directed to computing systems having different accelerators, in which a first set of accelerators has a greater memory capability than a second set of accelerators, while the second set of accelerators has a greater processing capability than the first set. A machine learning model having dense layers and sparse layers is distributed across the accelerators such that the dense layers are placed on one or more accelerators selected from the first set and the sparse layers are placed on one or more accelerators in the second set.
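
The placement rule described in the abstract can be illustrated with a minimal sketch. It is not the claimed implementation: the `Accelerator` and `Layer` types, the `memory_gb` and `tflops` fields, the round-robin policy within each set, and the `assign_layers` function are all hypothetical names and choices introduced here for illustration only.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Accelerator:
    # Hypothetical descriptor; field names are assumptions, not from the source.
    name: str
    memory_gb: float
    tflops: float


@dataclass(frozen=True)
class Layer:
    name: str
    kind: str  # "dense" or "sparse"


def assign_layers(
    layers: List[Layer],
    first_set: List[Accelerator],
    second_set: List[Accelerator],
) -> Dict[str, str]:
    """Map each layer name to an accelerator name.

    Dense layers go to the first (memory-heavy) set and sparse layers go to
    the second (compute-heavy) set, as the abstract states. Round-robin
    within each set is an assumption made for this sketch.
    """
    if not first_set or not second_set:
        raise ValueError("both accelerator sets must be non-empty")
    # Check the relationship the abstract requires between the two sets.
    if min(a.memory_gb for a in first_set) <= max(a.memory_gb for a in second_set):
        raise ValueError("first set must have greater memory capability")
    if min(a.tflops for a in second_set) <= max(a.tflops for a in first_set):
        raise ValueError("second set must have greater processing capability")

    placement: Dict[str, str] = {}
    dense_count = 0
    sparse_count = 0
    for layer in layers:
        if layer.kind == "dense":
            target = first_set[dense_count % len(first_set)]
            dense_count += 1
        elif layer.kind == "sparse":
            target = second_set[sparse_count % len(second_set)]
            sparse_count += 1
        else:
            raise ValueError(f"unknown layer kind: {layer.kind!r}")
        placement[layer.name] = target.name
    return placement


if __name__ == "__main__":
    memory_heavy = [Accelerator("mem0", 192.0, 100.0)]
    compute_heavy = [Accelerator("cmp0", 48.0, 400.0), Accelerator("cmp1", 48.0, 400.0)]
    model = [Layer("attn0", "dense"), Layer("moe0", "sparse"), Layer("moe1", "sparse")]
    print(assign_layers(model, memory_heavy, compute_heavy))
```

The validation step only enforces the ordering of capabilities between the two sets; how the capabilities are measured, and how layers are balanced within a set, are left open by the abstract.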

Timeline
Filed: 02/19/2026
Published: 06/25/2026
Granted: Not Available
IPC Codes (3)
G06N 20/00: Machine learning
G06F 18/214: Generating training patterns; Bootstrap methods, e.g. bagging or boosting
G06T 1/60: Memory management