
In Cringe Video, OpenAI CTO Says She Doesn’t Know Where Sora’s Training Data Came From

Wondering what data OpenAI used to train its buzzy new text-to-video AI? The company’s CTO is similarly unsure.

Mira Murati, OpenAI’s longtime chief technology officer, sat down with The Wall Street Journal’s Joanna Stern this week to discuss Sora, the company’s forthcoming video-generating AI. About halfway through the 10-minute-long interview, Stern straightforwardly asked Murati where the new model’s training data was gleaned from. But Murati, in the most cringe-inducing way possible, couldn’t find an answer beyond vague corporate language.

“We used publicly available data and licensed data,” Murati responded to the resoundingly simple question.

Stern pushed back with more specific source examples: “So, videos on YouTube?”

“I’m actually not sure about that,” said Murati, before rebuffing further queries about whether videos shared to Instagram or Facebook were fed into the model.

“You know, if they were publicly available — publicly available to use,” the CTO answered, “but I’m not sure. I’m not confident about it.”

Stern then inquired about OpenAI’s data training partnership with the stock image company Shutterstock, asking if videos on the partnered platform were sucked into Sora’s training material. And this time? Murati decided to shut down the line of questioning altogether.

“I’m just not going to go into detail about the data that was used,” Murati continued. “But it was publicly available or licensed data.”

So, in sum, Murati can’t tell you exactly where the videos gobbled up by Sora first came from. But rest assured, the sourceless data was definitely, one hundred percent publicly available or licensed. Convincing stuff!

It’s a bad look all around for OpenAI, which has drawn wide controversy (not to mention multiple copyright lawsuits, including one from The New York Times) for its data-scraping practices. After all, if the company’s CTO can’t firmly say where its buzziest new model’s training data came from, it doesn’t exactly suggest that OpenAI’s higher-ups care much about the issue.

After the interview, Murati reportedly confirmed to the WSJ that Shutterstock videos were indeed included in Sora’s training set. But when you consider the vastness of video content across the web, any clips available to OpenAI through Shutterstock are likely only a small drop in the Sora training data pond.

Online, reactions to the clip were mixed, with many chalking Murati’s close-lipped responses up to a possible lack of candidness.

“So when *the CTO* of OpenAI is asked if Sora was trained on YouTube videos, she says ‘actually I’m not sure’ and refuses to discuss all further questions about the training data,” former LA Times tech columnist Brian Merchant wrote in an X-formerly-Twitter post. “Either a rather stunning level of ignorance of her own product, or a lie — pretty damning either way!”

“You’re the CTO ma’am,” added another netizen, “you should know.”

Others, meanwhile, jumped to Murati’s defense, arguing that if you’ve ever published anything to the internet, you should be perfectly fine with AI companies gobbling it up.

“Why does it matter? That is the question,” said one X user. “I find it insane that people make things public to everyone in the world and then complain when someone uses that public thing. If you want to be private, then be private.”

That latter argument, though, speaks to the bizarre new reality that internet users have now found themselves in. Historically, when someone told you to be careful of what you post online, the reasoning was something akin to “you might regret that later” — and not “a multibillion-dollar AI company might turn a profit by vacuuming that Facebook video of you and your family, or a goofy YouTube video you made with your friends, into a generative AI model.”

Whether Murati was keeping things close to the vest to avoid more copyright litigation or simply didn’t know the answer, people have good reason to wonder where AI data, be it “publicly available and licensed” or not, is coming from. And moving forward, vague corporate mumbling probably isn’t going to cut it.

More on OpenAI and its data: OpenAI Says It’s Fine to Vacuum Up Everyone’s Content and Charge for It Without Paying Them





