15.3K 2 months ago

A GPT-4o Level MLLM for Vision, Speech and Multimodal Live Streaming on Your Phone

vision 8b
e6daeaa4f14b · 112B
{
  "num_ctx": 4096,
  "stop": [
    "[\"<|im_start|>\",\"<|im_end|>\"]"
  ],
  "temperature": 0.7,
  "top_p": 0.9
}
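
Note that `stop` in the parameters above holds a single string whose content is itself a JSON-encoded array, rather than two separate stop sequences. Whether that is intentional is not stated on this page. The sketch below assumes the intent was two distinct stop tokens, `<|im_start|>` and `<|im_end|>`, and shows one way a client could flatten such entries before using them. The helper name `normalize_stop` is hypothetical and is not part of any tool's API.

```python
import json


def normalize_stop(stop):
    """Flatten stop entries that are themselves JSON-encoded arrays of strings.

    Hypothetical helper: entries that do not decode to a list of strings
    are kept unchanged.
    """
    flat = []
    for entry in stop:
        try:
            decoded = json.loads(entry)
        except (TypeError, ValueError):
            decoded = None  # not JSON at all, treat as a literal stop sequence
        if isinstance(decoded, list) and all(isinstance(s, str) for s in decoded):
            flat.extend(decoded)
        else:
            flat.append(entry)
    return flat


# The parameters exactly as listed above (raw string keeps the backslashes).
params = json.loads(r'''
{
  "num_ctx": 4096,
  "stop": ["[\"<|im_start|>\",\"<|im_end|>\"]"],
  "temperature": 0.7,
  "top_p": 0.9
}
''')

# Assumption: the nested array was meant to be two separate stop tokens.
params["stop"] = normalize_stop(params["stop"])
print(params["stop"])
```

Entries that are already plain stop sequences pass through untouched, so the helper is safe to apply to a parameter set that does not have the nested encoding.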