AI Breakthrough: ChatGPT Can Now See, Hear, and Speak
ChatGPT is now equipped with the capability to see, hear, and speak, ushering in a new era of more intuitive human-machine interaction.
This latest innovation brings voice and image capabilities to ChatGPT, giving users new ways to interact with it in daily life. Whether you're on the move or at home, these enhancements enable a more natural conversational experience and let you add visual context to your interactions.
Voice Interaction with ChatGPT
With the introduction of voice capabilities, users can now engage in seamless back-and-forth conversations with ChatGPT. This feature allows for a multitude of applications, from requesting bedtime stories for your family to settling dinner table debates. Whether you need information, entertainment, or assistance, ChatGPT's voice interaction feature brings a new level of convenience.
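The ChatGPT app handles this speech loop natively, but for readers curious how such a pipeline fits together, here is a minimal sketch using OpenAI's public Python SDK: speech-to-text with Whisper, a chat completion, then text-to-speech. The file names, model choices, and prompt are illustrative assumptions, not details of the ChatGPT app itself.

```python
# A rough sketch of a voice round trip using OpenAI's public Python SDK
# (openai >= 1.0). File names, model choices, and the prompt are
# illustrative assumptions, not how the ChatGPT app is implemented.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# 1. Speech to text: transcribe the user's recorded question.
with open("question.m4a", "rb") as audio_file:
    transcript = client.audio.transcriptions.create(
        model="whisper-1",
        file=audio_file,
    )

# 2. Text reasoning: ask a chat model for an answer.
reply = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": transcript.text}],
)
answer = reply.choices[0].message.content

# 3. Text to speech: render the answer as audio for playback.
speech = client.audio.speech.create(model="tts-1", voice="alloy", input=answer)
speech.stream_to_file("answer.mp3")
```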
Visual Interaction with ChatGPT
The addition of image capabilities enables users to share one or more images with ChatGPT to facilitate a deeper understanding of their queries. Troubleshoot problems like a malfunctioning grill, plan meals by exploring the contents of your fridge and pantry, or analyze complex data graphs for work-related insights. The mobile app even provides a drawing tool for highlighting specific areas of an image, making interactions more precise.
These image-understanding capabilities are powered by the advanced multimodal models GPT-3.5 and GPT-4. These models leverage their language reasoning skills to comprehend various types of images, including photographs, screenshots, and documents featuring both text and visuals.
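For developers wondering what image understanding looks like programmatically, the sketch below sends an image alongside a question through OpenAI's Chat Completions API. The model name, image URL, and prompt are assumptions chosen for illustration; the consumer app exposes the same capability through its interface rather than through code.

```python
# A minimal sketch of asking a question about an image via OpenAI's
# Chat Completions API (openai >= 1.0). The model name, image URL, and
# prompt are illustrative assumptions.
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4-vision-preview",  # a vision-capable chat model
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Why won't this grill ignite?"},
                {
                    "type": "image_url",
                    "image_url": {"url": "https://example.com/grill.jpg"},
                },
            ],
        }
    ],
    max_tokens=300,
)

print(response.choices[0].message.content)
```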
Gradual Deployment for Safety
OpenAI remains committed to the responsible deployment of AI technology. To ensure a safe, measured rollout, voice and image capabilities will be introduced to Plus and Enterprise users over the next two weeks. Voice capabilities will be available on iOS and Android, while image capabilities will be accessible on all platforms.
Voice Technology and Its Potential
Voice technology in ChatGPT can create realistic synthetic voices using just a few seconds of real speech data. This breakthrough opens up numerous creative and accessibility-focused applications. However, it also brings new challenges, such as the potential for malicious actors to impersonate public figures or commit fraud. To mitigate these risks, OpenAI is initially focusing on specific use cases, such as voice chat, and is working with trusted collaborators, like Spotify, to ensure responsible and secure deployment.
Vision-Based Features and Responsible Usage
The vision-based features in ChatGPT pose unique challenges, including the potential for misinterpretation of images in critical domains. Prior to broader deployment, rigorous testing with red teamers and alpha testers helped identify risks and shape safeguards for responsible usage. OpenAI's aim is to make vision features both useful and safe, guided by real-world feedback and partnerships with organizations like Be My Eyes, a mobile app for blind and low-vision users.
Transparency and Model Limitations
OpenAI is committed to transparency regarding the model's limitations. Users are advised against relying on ChatGPT for high-risk use cases without independent verification, especially in languages other than English, where the model's performance may be weaker. OpenAI encourages users to refer to the system card for image input for more information on safety measures.
Expanding Access
In the coming weeks, Plus and Enterprise users will have the opportunity to experience voice and image capabilities with ChatGPT. OpenAI plans to extend access to these features to a wider audience, including developers, in the near future as it continues to refine and enhance the technology.
Editor’s note: This article was written by Jonathan Gasca in collaboration with OpenAI’s GPT-3.5