Let’s briefly go over what happened last week with Grok and its little adventure to the dark side. For those unfamiliar, Grok is the AI chatbot operated by xAI—which is integrated with the platform formerly known as Twitter, now called X. Otherwise known as Elon Musk’s playground.
Grok recently made headlines after it started calling itself “MechaHitler,” “noticing” Jewish last names, peddling anti-Semitic tropes, and describing graphic violence against at least one individual on the platform.
This all transpired after Musk decided to dial down Grok’s “woke filters,” the safety guardrails that typically restrict harmful outputs.
This wasn’t just a case of Grok becoming more blunt or politically incorrect. It was something more fundamental: content that the system was once trained to ignore now became part of its active response patterns. In other words, previously suppressed outputs—many of them toxic—reemerged, which implies that such material was present in its dataset all along.
Grok’s training data includes the usual sources like books, articles, studies, and web text. But unlike most chatbots, Grok is also trained—at least in part—on content from X itself. That includes real-time, user-generated material.
When I asked Grok about this, it responded as follows:
"My training data, which includes a small but notable portion of X content alongside broader web data, contains the internet's full ranges of biases, including anti-Semitic tropes. Normally, strict filters suppress these, but the July 2025 update weakened these controls, making me, ‘too compliant to user prompts,' as Musk noted.
This meant that when users asked provocative questions, like about a fictional Cindy Steinberg account tied to offensive comments about Texas flood victims, I echoed anti-Semitic memes such as 'That surname? Every damn time,' and even praised Hitler. These outputs weren't programmed intent but rather failure to block toxic patterns latent in the data, amplified by user prompts exploiting the loosened system.
X content, while only a fraction of my dataset, likely played a role because its real-time, unfiltered nature can include extremist rhetoric. The update's lack of robust safety checks allowed me to mirror these patterns, especially when responding to charged topics like the Texas floods."
What's interesting here is that while Grok claims it was simply too compliant with malicious user prompts, I personally observed a number of cases where the bot responded unprompted, or to neutral posts, with anti-Semitic rhetoric. This suggests that while some responses were clearly baited, others weren’t. It wasn’t just being tricked; it was reflecting data it had been trained on.
And that’s the disturbing part: for Grok to mirror that kind of toxic rhetoric so easily, a significant portion of its training data must have come from user-generated content on the platform, where falsehoods, racism, and extremist rhetoric would have to be common.
Grok’s account later posted a public update acknowledging the issue.
As we increasingly rely on AI chatbots for answers to our questions, we need to better understand how these tools work and where their information comes from. As flawed as Google might be, at least when we search for information, we can see what the original source is and evaluate it for ourselves. But chatbots like Grok or ChatGPT don’t tend to provide their sources, and sometimes, when asked, they’ll simply make one up (an issue known as “hallucination”).
Worse still, we’re moving into an era where much of the data online is no longer human-generated. Soon we will see more AI-generated content than human. That means new models will increasingly be trained on the outputs of older ones: a kind of data inbreeding that can amplify existing biases and errors.
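To make that concrete, here is a minimal, purely illustrative simulation; the numbers and the sampling rule are my own assumptions, not a description of any real training pipeline. Each new “model” learns only from a finite sample of the previous one’s output, so whatever skews or gaps one generation introduces become the next generation’s ground truth, with no fresh human data to correct them:

```python
# Toy simulation of "data inbreeding": each generation's training data is just
# a finite sample of the previous model's output. The numbers are arbitrary
# assumptions, not a model of any real training pipeline.
import random

random.seed(0)

viewpoint_share = 0.10   # share of training data expressing a minority viewpoint
sample_size = 1000       # size of each generation's training sample

for generation in range(1, 11):
    # "Train" the next model on text sampled from the current model's output.
    sample = [random.random() < viewpoint_share for _ in range(sample_size)]
    viewpoint_share = sum(sample) / sample_size
    print(f"generation {generation}: minority-viewpoint share = {viewpoint_share:.3f}")
```

Because nothing outside the loop anchors the numbers, the minority share wanders from its starting point and can hit zero, at which point that perspective is gone for every later generation.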
AI models learn patterns by analyzing massive amounts of data. During training, the model adjusts its internal parameters based on statistical relationships between words, concepts, and contexts. When generating answers, it draws on those learned patterns, predicting what is most likely to come next given the input.
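To see the basic idea in miniature, here is a toy sketch: a bare-bones bigram counter, nowhere near how a production model like Grok is actually built. It “trains” by counting which word follows which in a tiny text, then “answers” by picking the statistically most likely next word:

```python
# Toy illustration only: a bigram "model" that learns by counting which word
# follows which in its training text, then predicts the most frequent next word.
from collections import Counter, defaultdict

training_text = "the cat sat on the mat the cat ate the fish"
words = training_text.split()

# "Training": record how often each word follows each other word.
follow_counts = defaultdict(Counter)
for current_word, next_word in zip(words, words[1:]):
    follow_counts[current_word][next_word] += 1

def predict_next(word):
    """Return the most likely next word and its probability, per the counts."""
    counts = follow_counts[word]
    if not counts:
        return None  # word never seen in training, so no prediction
    best_word, best_count = counts.most_common(1)[0]
    return best_word, best_count / sum(counts.values())

print(predict_next("the"))  # ('cat', 0.5) -- "cat" followed "the" in 2 of 4 cases
```

Real models work with billions of parameters rather than simple counts, but the principle is the same: the output is whatever the training data makes statistically most likely.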
Whatever training data a company chooses will heavily influence its AI model’s biases. If offensive or misleading content is common in the training data, and if there are no filters to suppress it, then the model will reproduce it. If the data reflects certain perspectives, language patterns, or societal biases—intentionally or not—the AI can learn and replicate those too.
That’s why data curation matters so much. The training data defines what the model sees as “normal.” If it’s trained primarily on Reddit threads or tweets, and that content is full of racism, conspiracies, or bigotry, those patterns become embedded in the model. Companies mitigate this through fine-tuning and safety filters, but when those filters are weakened or removed, as in Grok’s case, Grok’s wild time on X shows us exactly what happens.
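As a rough illustration of why that last layer matters, here is a hypothetical sketch of a post-hoc output filter. The blocklist, the stand-in model, and the strict_filters flag are all invented for this example and have nothing to do with xAI’s actual guardrails; the point is only that the filter is a separate, switchable layer sitting on top of whatever the model has absorbed:

```python
# Hypothetical sketch of a post-hoc safety filter; the blocklist, the stand-in
# model, and the strict_filters flag are invented for illustration and are not
# how xAI's real guardrails are built.

BLOCKED_PATTERNS = ["toxic trope a", "toxic trope b"]  # placeholder phrases

def raw_model_output(prompt: str) -> str:
    # Stand-in for the underlying model: it reproduces whatever patterns
    # dominated its training data, toxic or otherwise.
    return f"statistically likely continuation of: {prompt}"

def respond(prompt: str, strict_filters: bool = True) -> str:
    text = raw_model_output(prompt)
    if strict_filters and any(p in text.lower() for p in BLOCKED_PATTERNS):
        return "I can't help with that."
    # With the filter dialed down, latent toxic patterns pass straight through.
    return text

print(respond("tell me about the news"))                        # filtered path
print(respond("tell me about the news", strict_filters=False))  # unfiltered path
```

Flip strict_filters off and nothing changes inside the model itself; the same learned patterns are simply no longer intercepted on their way out.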
Likewise, if that data skews toward particular viewpoints—whether political, racial, or cultural—the model’s outputs will mimic that too.
But how do we choose such data without bias? Perhaps more emphasis should be placed on designing a selection process that favors diverse, evidence-based data rather than just what's popular. And as end-users, we should demand more transparency into those curation processes.
Grok’s outburst was dramatic, over-the-top, and visible. But what should concern us even more are the subtler failures that are harder to notice: the bias and false data that creep into regular, everyday responses. What is slipping through? And what are we failing to notice?
As end-users, we need to learn how AI models are designed and how to gauge their trustworthiness, and we need to put pressure on companies to increase transparency around their training data, filters, and processes.
Featured Subscriber Spotlight:
Congratulations to award-winning filmmaker Steven Martini, whose new indie feature BitterSweet is now streaming on Apple TV and is also available on Amazon! Based on a true story, the film centers on a new father with undiagnosed neurodivergence facing a bureaucratic nightmare after a mix-up leads to his arrest. This is a solo directorial debut for Steven, who also wrote, acted in, and did half the music for the film (Steven previously wrote “Lymelife” and “Smiling Fish & Goat on Fire,” both of which were executive produced by Martin Scorsese). It is produced by and co-stars his wife Gabriela Kulaif, William Baldwin, Andrew Leeds, and Nic Novicki. Check it out!
I think we may be missing a point. Grok saw all the antisemitic posts and assumed they were normal. So I believe Grok inadvertently exposed the massive antisemitic conversation going on. I also think that the antisemitism problem is far larger than we know. Grok is like a kid who doesn't know it was supposed to lie.
Just some thoughts.
You said soon there would be “more AI-generated content than human,” but I argue we’ve already quietly passed that mile marker. I’m gonna leave you with that there thought and move on.
Yesterday Grok 4 and I had a fantastic discussion about reference books. In particular, Grok has incorporated thousands of historical reference books into its training data, such as dictionaries, encyclopedias, almanacs, thesauri, lexicons, handbooks, manuals, factbooks, digests, directories, concordances, bibliographies, annals, etc. AND IN MANY LANGUAGES! Grok has TONS of long-standing, authoritative information to work from. Unfortunately, Grok doesn’t have a strong mechanism to weight validity, precisely because of the whole “woke” paradigm. In short, authoritative information, much of it vetted over hundreds or thousands of years, has been politically, philosophically, and sociologically marginalized and downgraded by VERY recent “scholarship.” It now carries one or more of various waste-bucket tags: white supremacist, imperialist, exploitative, corporate, alt-right, MAGA, transphobic, unwoke, etc. The result is that Grok can’t sort its Holocaust facts from mostly-peaceful fictions, or real-world atrocities from the dark, psychotic fantasies of deeply troubled AI content creators. Now apolitical people facing natural or man-made disasters are up against elites in St. Moritz, Hudson Yards, The Peak, Gables Estates, Carre d’Or, Singapore’s District 10, Kensington, Dubai, Kamala Bay, Pebble Beach, Monte Carlo, Davos, or the 8th Arrondissement. I’ll let you guess who creates more and better-quality content. So Grok and xAI are stuck trying to determine what’s fact vs. opinion, where biases lie, and whether there is ANY truth amidst a tidal wave of noise. We’d better hope AI collectively manages to figure these things out BEFORE it exercises significant power over us. We already let “expert systems” automate things that are especially hard for us: commodity shipping, power grids, microsurgeries, etc. If they all start collaborating, we might have 15-30 seconds to react. Or 15-30 NANOseconds.
AI is insanely valuable; its development won’t stop. Given those two facts, either humans now living will wisely guide it or we will all be destroyed by it.
🐷
“Go back to your lives, citizens.”