Check out xAI Grok Vision explaining a program code screenshot

Published

July 2, 2024

xAI has yet to release vision capabilities for its Grok large language model (LLM), but a screenshot shows the model in action, explaining a piece of program code. Grok-1.5 Vision is xAI's first-generation multimodal model. It can process a wide range of visual information, including documents, diagrams, charts, screenshots, and photographs.

It can write code from a diagram and calculate calories from an image. The model can also write a story based on a toddler's sketch and explain a meme. This version can convert a table to CSV format and offer suggestions about a scenario shown in an image. It can also work through a problem shown in a screenshot by writing Python code, and describe code in plain text.

An image shared by X user @Lohansimpson shows a user asking Grok to explain the code in an uploaded image. The model identifies the code and replies with a summary, along with the commands, structure, and values used in the code.

xAI Grok-1.5 Vision explaining program code (Image source: @Lohansimpson/X)

The chatbot uses a standard user interface (UI) with a prompt bar where you type questions and receive answers. Grok is currently integrated into the social media site X for Premium users, and for now it only supports text conversations.

However, xAI has developed a feature for uploading files into a Grok conversation. You can attach a screenshot or image and add a caption asking Grok to explain it.

Although Grok-1.5 has now been released to most users, access to Vision is still limited to a small group of testers. Processing screenshots and images is a major feature for Grok, and Grok Vision should roll out to all X Premium users soon.
