19.6K 6 months ago

A GPT-4o-Level MLLM for Vision, Speech, and Multimodal Live Streaming on Your Phone

vision 8b

13 models