Our smallest model is also a Mixture of Experts. With 0.4B active parameters out of 1.3B total, it outperforms much larger models in both efficiency and quality.
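The active/total parameter split comes from expert routing: each token is sent to only a few experts, so only that fraction of the expert weights runs per forward pass. The sketch below is a minimal, self-contained illustration of top-k routing in plain Python; all sizes (`d_model`, `n_experts`, `top_k`) and the single-linear-layer experts are hypothetical simplifications, not this model's actual configuration.

```python
import math
import random

random.seed(0)

# Hypothetical sizes for illustration only (not the real model's config).
d_model, n_experts, top_k = 8, 4, 1

def rand_matrix(rows, cols):
    return [[random.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

# Each "expert" is a single linear map here; real MoE experts are MLP blocks.
experts = [rand_matrix(d_model, d_model) for _ in range(n_experts)]
router = rand_matrix(d_model, n_experts)

def matvec(m, v):
    # m has len(v) rows; returns a vector of length len(m[0]).
    return [sum(m[j][i] * v[j] for j in range(len(v))) for i in range(len(m[0]))]

def softmax(xs):
    mx = max(xs)
    es = [math.exp(x - mx) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def moe_forward(x):
    """Route x to the top_k experts; only those experts' weights are 'active'."""
    logits = matvec(router, x)
    top = sorted(range(n_experts), key=lambda i: logits[i])[-top_k:]
    weights = softmax([logits[i] for i in top])
    out = [0.0] * d_model
    for w, i in zip(weights, top):
        y = matvec(experts[i], x)
        out = [o + w * yi for o, yi in zip(out, y)]
    return out

x = [random.uniform(-1, 1) for _ in range(d_model)]
y = moe_forward(x)
print(len(y))             # output keeps the model dimension
print(top_k / n_experts)  # fraction of expert parameters active per token
```

With `top_k = 1` of 4 experts, only a quarter of the expert parameters are used per token, which is the same mechanism behind the 0.4B-active / 1.3B-total split described above (the routing and gating details of the real model may differ).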