xAI Grok Vision shown explaining program code from a screenshot
xAI has yet to release vision capabilities for its Grok large language model (LLM), but a screenshot shows the model in action, explaining a piece of program code. Grok-1.5 Vision is the company's first-generation multimodal model. It can process a range of visual information, including documents, diagrams, charts, screenshots, and photographs.
It can write code from a diagram and estimate the calories in food shown in an image. The model can write a story from a toddler's sketch and explain a meme. It can also convert a table to CSV format and offer suggestions about a scenario shown in an image. Given a screenshot of a coding problem, the model can solve it in Python and explain the code in plain text.
An image shared by X user @Lohansimpson shows a user asking Grok to explain the code in a screenshot. The model identifies the code and replies with a summary, along with the commands, structure, and values used in it.
The chatbot has a standard user interface (UI) with a prompt bar for asking questions and receiving answers. Grok is currently integrated into the social media site X for premium users, and the company only allows text conversations.
However, xAI has developed a feature for uploading files to a Grok conversation. You can upload a screenshot or image to Grok and add a caption asking for an explanation.
Although Grok-1.5 has now been released to most users, access to Vision is still limited to a few testers. Overall, processing screenshots and images via Grok Vision is a major feature and should be released soon for all X premium users.