Interestingly, LLMs can't see images, they are large language models.
They use image recognition software to create plain text descriptions of an image in order to observe the world.
Imagine if our eyes spoke to us?
I only work with mildly intellectually disabled people, intelligence is not common in my field… artificial intelligence is a couple of realms away from me.