xAI
xAI launches Grok 1.5 Vision, new multimodal model with visual data processing capability
Artificial intelligence company xAI today announced Grok 1.5 Vision, its first-generation multimodal model. It can process visual data alongside text, a new capability for the company's large language model.
Grok 1.5 Vision can process:
- Documents
- Diagrams
- Charts
- Screenshots
- Photographs
In November, xAI announced Grok, its first generative AI chatbot which competes with OpenAI’s ChatGPT and Google’s Bard. The company later integrated this chatbot into social media site X, formerly known as Twitter.
Grok works like its competitors, with a prompt system and a simple user interface. However, as of April 2024, it is only available to paid X subscribers. Grok is built on a text-to-text framework: the bot comprehends the user's prompt and responds with text drawing on real-time information from the X platform.
A few weeks ago, xAI introduced Grok 1.5 with long-context comprehension and advanced reasoning. Benchmarks show that the 1.5 version improved Grok's performance, and adding visual data processing could bring further advantages to users.
Grok 1.5 Vision Samples:
xAI has shared seven samples for Grok-1.5V.
- Photograph processing: Grok 1.5 Vision translated a photograph of a Python program flowchart on a whiteboard into Python code.
- Calculation: The model calculated a calorie count from an image.
- Generating a story from an image: It could generate a story from a simple sketch drawn by a child.
- Meme: The model can now explain a meme.
- Conversion: The sample showed the conversion of a table's contents into CSV format.
- Suggestion: It could give advice based on the scenario shown in an image.
- Solving coding problems: It could understand and fix a runtime error or bug in program code.
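To make the conversion sample concrete, the sketch below shows what turning a table's rows into CSV output looks like. The table contents here are hypothetical placeholder data, not the table from xAI's actual demo; the point is only to illustrate the target CSV format.

```python
import csv
import io

# Hypothetical table data for illustration; xAI's sample used a different table.
table = [
    ["Item", "Quantity", "Price"],
    ["Apples", "3", "1.50"],
    ["Bread", "1", "2.25"],
]

# Serialize the rows into CSV text, the format the conversion sample produces.
buffer = io.StringIO()
csv.writer(buffer).writerows(table)
print(buffer.getvalue())
```

Running this prints each row as a comma-separated line, starting with the header `Item,Quantity,Price`.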
Preview:
Grok 1.5 Vision will soon be available to early beta testers and existing Grok users. However, the company has shared no timeline for releasing the new multimodal model.
(source)