Peak intelligence, is realizing an LLM doesn’t care whether its tokens represent chunks of text, sound, images, videos, 3D models, paths, hand movements, floor planning, emojis, etc.
The keyword is: “multimodal”.
As for being able to correlate some “chunks of MRI scan” with the word “cancer”… that’s all about the training.
Peak intelligence, is realizing an LLM doesn’t care whether its tokens represent chunks of text, sound, images, videos, 3D models, paths, hand movements, floor planning, emojis, etc.
The keyword is: “multimodal”.
As for being able to correlate some “chunks of MRI scan” with the word “cancer”… that’s all about the training.