OpenAI has released a major update for the ChatGPT chatbot, which has learned to “see, hear and speak.” The update marks a significant step in the development of artificial intelligence that can perceive and process information in multiple formats, not just text.
“We are starting to roll out voice and image capabilities in ChatGPT. They offer a new, more intuitive type of interface, allowing you to have a conversation with the neural network or show it the subject of the conversation,” OpenAI explained.
Conversations with AI
The updated chatbot can hear and recognize user speech: any request can now be made by voice, making the experience closer to virtual assistants such as Apple’s Siri.
To start using the speech features, you need to enable them in the application settings. ChatGPT offers a choice of five voices — “Juniper”, “Cove”, “Sky”, “Breeze” and “Ember” — recorded with professional voice actors.
For speech recognition, the chatbot uses Whisper, OpenAI’s open-source speech recognition system.
“The new technology, capable of creating realistic synthetic voices in just a few seconds of listening to real speech, opens the door to many creative applications focused on accessibility,” OpenAI noted.
The new feature is already being used by the streaming service Spotify to translate podcasts into other languages, preserving the original voice of the presenter.
Show and tell
Users can also send ChatGPT images alongside regular requests. The vision feature, known as GPT-4V, helps the neural network provide more accurate answers.
As an example, the developers cite troubleshooting: the user can circle the broken area with the built-in drawing tools to make the task easier for the chatbot.
Image analysis is powered by multimodal GPT-3.5 and GPT-4. These models apply their language reasoning skills to a wide range of images, from screenshots and diagrams to ordinary photos.
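As a rough illustration of how such an image request reaches the model, the Chat Completions API at launch accepted user messages whose content mixes text parts with image parts, where an image can be passed inline as a base64 data URL. The sketch below only builds that payload locally (the model name and prompt are illustrative; no network call is made):

```python
import base64
import json


def build_vision_request(prompt: str, image_bytes: bytes,
                         model: str = "gpt-4-vision-preview") -> dict:
    """Build a Chat Completions payload pairing a text prompt with an image.

    The image is embedded as a base64 data URL, one of the formats the API
    accepts alongside plain HTTPS image URLs.
    """
    b64 = base64.b64encode(image_bytes).decode("ascii")
    return {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {
                        "type": "image_url",
                        "image_url": {"url": f"data:image/jpeg;base64,{b64}"},
                    },
                ],
            }
        ],
        "max_tokens": 300,
    }


# Hypothetical usage: the bytes here stand in for a real JPEG of a broken part.
payload = build_vision_request("What part of this bike is broken?", b"\xff\xd8\xff")
print(json.dumps(payload)[:40])
```

Sending this payload to the API (with an API key) would return the model’s answer about the pictured object; the structure is the point here, not the specific model string, which OpenAI has since revised.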
“Vision is designed to help you in your everyday life. The neural network does this best when it sees what you see. The approach is based directly on our work with Be My Eyes, a free mobile app for blind and visually impaired people, to understand its usage and limitations,” a company spokesperson explained.
New opportunities – new risks
OpenAI’s main goal is to create safe and useful artificial general intelligence (AGI). However, the issue of user protection has become more relevant with the advent of new functions.
According to the developers, the ability to synthesize voices opens up new opportunities for fraudsters: criminals could create deepfakes imitating well-known personalities.
Vision models also pose risks, from misinterpreting images to making offensive judgments about people in photos. Before launching the feature, OpenAI had it red-teamed for risks such as extremism and scientific misinformation.
“We have also taken technical measures to significantly limit the neural network’s ability to analyze and make direct statements about people, since ChatGPT is not always accurate and these systems must respect privacy,” OpenAI said.
Recall that in July the developers released a plugin for the chatbot that can analyze data, write Python code, build graphs and solve mathematical problems. The neural network even managed to scientifically refute the “flat Earth” theory.
In August, OpenAI launched ChatGPT Enterprise, a faster, safer and more powerful version of the chatbot for enterprise clients.