OpenAI, in which Microsoft has invested heavily, allegedly trained its artificial intelligence (AI) models by scraping data from YouTube, the Google-owned video-sharing platform.
A report published by The Information claims that OpenAI “has secretly used data from the site (YouTube) to train some of its artificial intelligence models”.
YouTube is one of the biggest and richest sources of imagery, audio and text transcripts on the Internet.
Naturally, “the value of YouTube hasn’t been lost on OpenAI”.
Google researchers have also been using YouTube to develop the company’s next large language model, dubbed Gemini.
YouTube’s terms of service ban using the content it hosts for anything other than “personal, non-commercial use.”
However, it’s an open secret in the AI industry that all competitors are scraping the web to train their models.
OpenAI reportedly “scraped” YouTube data to train its AI models, as per “one person with direct knowledge of the effort”.
OpenAI did not immediately comment on the report.
OpenAI has just released new versions of its text-generating AI models, GPT-3.5-turbo and GPT-4, with a capability called function calling.
With the function calling capability, developers can create chatbots that answer questions by calling external tools (like ChatGPT Plugins).
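As a rough illustration of how function calling works on the developer's side, the Python sketch below describes one tool to the model as a JSON schema, then runs whatever function the model's reply names. It is a minimal sketch under stated assumptions, not OpenAI's implementation: `get_weather`, `TOOL_SPECS`, `REGISTRY` and `dispatch` are hypothetical names, the weather data is canned, and the model reply is written by hand so that no API call is made.

```python
import json


def get_weather(city: str) -> dict:
    """Hypothetical tool; a real chatbot would query a weather service here."""
    return {"city": city, "forecast": "sunny", "temp_c": 21}


# JSON-Schema-style description of the tool, of the kind a developer
# would send to the model alongside the user's question (illustrative).
TOOL_SPECS = [
    {
        "name": "get_weather",
        "description": "Get the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    }
]

# Maps the names the model may request to real Python callables.
REGISTRY = {"get_weather": get_weather}


def dispatch(function_call: dict) -> str:
    """Run the function the model asked for and return its result as JSON text."""
    name = function_call["name"]
    if name not in REGISTRY:
        raise ValueError(f"model requested unknown function: {name}")
    # The arguments arrive as a JSON-encoded string, not as a dict.
    args = json.loads(function_call["arguments"])
    return json.dumps(REGISTRY[name](**args))


# Hand-written stand-in for a model reply; assumed shape, no network access.
simulated_reply = {"name": "get_weather", "arguments": '{"city": "Paris"}'}
result = dispatch(simulated_reply)
print(result)
```

In a real chatbot the JSON result would be sent back to the model, which would then phrase the final answer for the user. Checking the requested name against a registry keeps the model from triggering code the developer never exposed.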
Last month, Google upgraded its Bard chatbot with a new machine learning model that can better understand conversational language and compete with OpenAI’s ChatGPT.
The tech giant has introduced new improvements to Bard, including better logic and reasoning skills.
Bard now uses a technique called “implicit code execution” to recognize computational prompts and run code in the background, the tech giant said in a blog post this week.
As a result, it can respond to string manipulation, coding questions and mathematical operations more correctly.
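The general idea of recognizing a computational prompt and answering it by running code can be sketched in a few lines of Python. This is only a toy illustration under loud assumptions: the regular expressions, the function names `answer` and `_evaluate`, and the two prompt shapes are invented here, and nothing below reflects how Google actually built the feature.

```python
import ast
import operator
import re

# Arithmetic operators the toy evaluator is willing to run.
_BINARY_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def _evaluate(node):
    """Evaluate a parsed arithmetic expression without calling eval()."""
    if isinstance(node, ast.Constant) and type(node.value) in (int, float):
        return node.value
    if isinstance(node, ast.BinOp) and type(node.op) in _BINARY_OPS:
        return _BINARY_OPS[type(node.op)](_evaluate(node.left), _evaluate(node.right))
    if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
        return -_evaluate(node.operand)
    raise ValueError("unsupported expression")


def answer(prompt: str):
    """Return a computed answer for computational prompts, else None."""
    # String manipulation: reverse a quoted word (hypothetical prompt shape).
    match = re.search(r'reverse the word "(\w+)"', prompt, re.IGNORECASE)
    if match:
        return match.group(1)[::-1]
    # Arithmetic: "What is <expression>?" (hypothetical prompt shape).
    match = re.search(r"what is ([\d\s.+\-*/()]+)\?", prompt, re.IGNORECASE)
    if match:
        tree = ast.parse(match.group(1).strip(), mode="eval")
        return _evaluate(tree.body)
    # Not computational: a chatbot would fall back to ordinary text generation.
    return None


print(answer("What is 15 * (7 + 3)?"))
print(answer('Reverse the word "Lollipop"'))
print(answer("Who owns YouTube?"))
```

Running code for such prompts gives an exact result where a language model predicting text alone might produce a plausible but wrong one, which is consistent with the accuracy gain described above.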