January 8, 2026

AI Coding Advances and Market Shakeups

This episode dives into breakthrough improvements in AI coding models like NousCoder-14B and GLM 4.7, plus evolving developer tools reshaping workflows. We also explore the shifting chatbot market with Gemini challenging ChatGPT, user frustrations, and strategic ecosystem plays. Finally, we discuss critical healthcare AI launches, diagnostic controversies, and rising security concerns in the AI landscape.

Chapter 1

AI Coding Model Breakthroughs

James Turner

Hey everyone, welcome back to 48-Hour AI. James Turner here, and today we’ve got a special guest, Pierson Marks, joining to break down the fast-moving AI scene. Pierson, let’s kick things off with the latest on coding models. This round, NousCoder-14B shook up the open-source RL stack. They took Qwen3-14B and layered Atropos on top, then trained for four days across 48 B200s, a proper flex.

Pierson Marks

Totally. The thing that’s actually interesting to me is that this is more than just a model drop. You’re seeing the whole stack go public: the RL environment, benchmark harness, all out in the open. So you can actually see how they got that Pass@1 jump. What was it, 67.87% on programming olympiad tasks? That’s about a 7-percentage-point absolute gain. But it’s not just “it codes better.” The real shift is transparent, reproducible code RL training. Feels like a nudge toward “everyone dogfoods their own RL stack” rather than hiding the sauce.
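
For reference, Pass@1 is the k = 1 case of pass@k: the chance that at least one of k sampled solutions passes all tests. A minimal Python sketch of the commonly used unbiased estimator follows. The episode does not say how the NousCoder harness computes its score, so the function names and the toy numbers here are illustrative assumptions, not their code.

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one problem: n samples drawn, c of them correct."""
    if n - c < k:
        # Fewer than k wrong samples exist, so any k-subset contains a correct one.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

def benchmark_pass_at_k(results: list[tuple[int, int]], k: int = 1) -> float:
    """Average pass@k over problems; `results` holds one (n, c) pair per problem."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)

# Hypothetical toy data: 4 problems, 8 samples each, with 8, 4, 0, and 2 correct.
toy_results = [(8, 8), (8, 4), (8, 0), (8, 2)]
score = benchmark_pass_at_k(toy_results, k=1)
print(f"pass@1 = {score:.4f}")
```

For k = 1 the estimator reduces to the plain fraction of correct samples per problem, averaged across problems.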

James Turner

Yeah, something’s definitely changing. You look at the conversation around trajectory verification and those reward signals—they made the RL environment verifiable, so you’re not just taking their word for it. Engineers are genuinely riffing about swapping new RL reward schemes into open code bases now. It’s less leaderboard, more composable science.
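
One way to picture a verifiable reward for code: run the candidate program against known test cases and pay out only when every output matches. The sketch below assumes that setup. `verifiable_reward` and the sample tests are hypothetical and are not taken from Atropos or the NousCoder release.

```python
import subprocess
import sys

def verifiable_reward(solution_code: str, tests: list[tuple[str, str]],
                      timeout_s: float = 10.0) -> float:
    """Return 1.0 only if the program prints the expected output for every test input."""
    for stdin_text, expected in tests:
        try:
            # NOTE: this runs the code unsandboxed; a real environment would isolate it.
            proc = subprocess.run(
                [sys.executable, "-c", solution_code],
                input=stdin_text, capture_output=True, text=True, timeout=timeout_s,
            )
        except subprocess.TimeoutExpired:
            return 0.0
        if proc.returncode != 0 or proc.stdout.strip() != expected.strip():
            return 0.0
    return 1.0

# Hypothetical task: read two integers and print their sum.
tests = [("1 2\n", "3"), ("10 5\n", "15")]
good_reward = verifiable_reward("print(sum(map(int, input().split())))", tests)
bad_reward = verifiable_reward("print(0)", tests)
print(good_reward, bad_reward)
```

Because the reward depends only on test outcomes, anyone with the same tests can rerun a trajectory and check the score, which is the property being discussed here.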

Pierson Marks

And this is still just one data point in the open weights coding race. But on the more applied side—have you seen the GLM 4.7 chatter? It’s kind of quietly blowing up with devs: $3 for a month, 3x usage, you can integrate with Claude Code, and people say it delivers functional code about 90% of the time on usual stuff. Sure, it doesn’t explain or “reason” at the Sonnet/Opus level, but if you care about cost or want local/private work, it’s a killer alternative. I keep seeing “GLM for the ugly stuff; Claude for polish.”

James Turner

I mean, the integration stories are real. A teammate of mine switched his whole refactoring workflow to GLM plus Claude Code—imported a gnarly legacy base, did high-velocity edits, and only used 5% of the monthly usage. Plus, the pain of weekly limits is gone. The only real ding I hear is it’s not always as smooth as Minimax or Opus for explanations or really hairy tasks. But as a dev shop tool, $3/month is hard to ignore.

Pierson Marks

Right. For anybody who’s been drowning in high API bills, GLM’s pricing flips the equation entirely. It’s “just enough model” with actually workable context and no drama about limits. And yeah, people still nitpick pairing it with Claude Code if you want the absolute highest-end dev AI, but for throughput? This is the sweet spot.

James Turner

And there’s a wave of debates over dev tools and agents to go with it. LangChain slipped out ‘Ralph Mode’—like, infinite loop agents with a run-forever control loop that saves context to disk, refreshes itself, the whole “filesystem as memory” pattern. Cursor’s in the same space now; they rebuilt context management to discover relevant info on the fly, not just stuff the entire project into the prompt. They claim token usage is down by half.
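
The “filesystem as memory” pattern can be sketched as a loop that rebuilds its working state from disk on every pass and writes progress back before the next one. This is a minimal illustration under stated assumptions, not LangChain’s implementation: `run_loop`, `fake_step`, and the JSON layout are hypothetical, and the loop is capped so the example terminates.

```python
import json
import tempfile
from pathlib import Path
from typing import Callable

def load_state(path: Path) -> dict:
    """Rebuild working state from disk; start empty if no file exists yet."""
    if path.exists():
        return json.loads(path.read_text())
    return {"iteration": 0, "notes": []}

def run_loop(step: Callable[[dict], str], state_path: Path, max_iterations: int) -> dict:
    """Run `step` repeatedly, reloading state from disk each pass and saving after it."""
    for _ in range(max_iterations):  # a run-forever agent would use `while True`
        state = load_state(state_path)  # fresh context: nothing is carried in memory
        state["notes"].append(step(state))
        state["iteration"] += 1
        state_path.write_text(json.dumps(state))
    return load_state(state_path)

def fake_step(state: dict) -> str:
    """Stand-in for a model call; a real agent would plan and act here."""
    return f"finished step {state['iteration']}"

with tempfile.TemporaryDirectory() as tmp:
    memory_file = Path(tmp) / "agent_state.json"
    run_loop(fake_step, memory_file, max_iterations=3)
    # A second run picks up where the first left off, because state lives on disk.
    final = run_loop(fake_step, memory_file, max_iterations=2)
    print(final["iteration"], final["notes"][-1])
```

Keeping all progress in the file means the process can be killed and restarted at any point without losing more than the step in flight.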

Pierson Marks

Cursor is fascinating to watch because it’s evolving from just an IDE into a sort of desktop agent dashboard. And the context hacking matters—like, if you want long code reviews with millions-of-token conversations, now you don’t just choke the model. I’ve got friends running planning workflows where they basically treat Cursor as the conductor, with Claude Code doing semantic review. Wildly productive. There’s still a bit of community friction over what’s “proper” use—is it cheating if you just prompt and let the output run?
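
As a rough picture of discovering context on demand instead of putting a whole project in the prompt, the sketch below ranks files by word overlap with the request and keeps only what fits a budget. It is a toy under stated assumptions, not Cursor’s method: `select_context`, the whitespace word count standing in for tokens, and the sample files are all hypothetical.

```python
def select_context(query: str, files: dict[str, str], budget_tokens: int) -> list[str]:
    """Pick the files most relevant to the query that fit within the token budget."""
    query_words = set(query.lower().split())

    def relevance(text: str) -> int:
        # Number of distinct query words that also appear in the file.
        return len(query_words & set(text.lower().split()))

    ranked = sorted(files, key=lambda name: relevance(files[name]), reverse=True)
    chosen: list[str] = []
    used = 0
    for name in ranked:
        cost = len(files[name].split())  # crude stand-in for a real token count
        if relevance(files[name]) > 0 and used + cost <= budget_tokens:
            chosen.append(name)
            used += cost
    return chosen

# Hypothetical project contents.
files = {
    "auth.py": "def login user password check token",
    "session.py": "def refresh token expiry login session cookie store cache",
    "billing.py": "def charge card invoice total",
    "README.md": "project setup install notes",
}
chosen = select_context("fix login token check", files, budget_tokens=10)
print(chosen)
```

Here only `auth.py` is selected: `session.py` is relevant but would exceed the budget, and the other two files share no words with the request.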

James Turner

Yeah, that debate pops up in every Cursor Discord. I personally think, if it gets the job done safer and faster, go ahead. I actually rolled out a code review loop with Cursor and Claude for my team, and it cut feedback time in half without losing quality. Some old-school devs criticize it as “just offloading thinking,” but honestly, it’s just a new set of muscles. And I like not burning tokens on useless context.

Pierson Marks

Totally. Think about this—it’s like the age-old debate over tabs versus spaces but with AI attached. Okay, but let’s move, because the hacking is only getting started in other parts of AI land…

Chapter 2

AI Chatbot Market Shifts and Growing Pains

Pierson Marks

Let’s talk market shakeups. Gemini, out of nowhere, just rolled past 20% of chatbot market share. Remember when ChatGPT was basically the default for everyone? That’s eroding. Google’s gone aggressive—free Gemini Pro for a year for everyone (well, anyone who clicks the right button), bundling it with 2TB storage, family plans, all that. And just like that, people I know at a Sydney startup switched over because the bundle pricing made it a no-brainer. You don’t just get an AI—you get the whole Google ecosystem.

James Turner

Yeah, I see the same trend. I mean, OpenAI had the vibe of the single “killer app,” but Google’s weaving Gemini everywhere—Docs, Drive, YouTube. It’s a feature, not a product now, and that’s squeezing standalone tools. If you’re already locked into Google, you’re just going to drift that way—unless the product totally fails.

Pierson Marks

The catch is, people are complaining that Gemini’s been “nerfed” big time. The context window was originally a million tokens, but now it “forgets” stuff after a few messages, especially in long conversations. People are grumbling about slow responses, hallucinations, and weird personal suggestions. Some say it’s growing pains; others think Google turned the dials down to handle free-tier abuse or cut costs. And now paid Gemini, with premium features and expanded limits, is the upsell.

James Turner

That’s coming through hard on forums, too—like, constant reports of Gemini Pro being “unusable” for certain tasks, especially anything that needs sustained memory. One guy described it as a disjointed breakdance. Even the “family sharing” is, like, “great, except when the main model stops listening.” There are still diehard ChatGPT users, but the recurring theme is: Gemini for the bundle and lower cost, ChatGPT for anything serious or when you want reliable context.

Pierson Marks

It absolutely reinforces the idea that platform integration wins: people tolerate quirks if the value is high enough. But I think you’re seeing a market consolidation play out in real time. Bundling means pure-play AI apps are getting squeezed out unless they bring some wild new edge. There are even rumors of ads coming to OpenAI. Meanwhile, Google’s paid plan, $20/month for premium Gemini, has so many bundled extras that it’s not even a straight comparison. If you’re a solo founder with an unbundled chatbot, it’s getting tough to be “just” an app.

James Turner

Yeah, I mean—when extra storage or YouTube Premium comes with your AI, it’s hard to explain why you should pay for yet another tool. But here’s the tricky part: degraded model experiences can backfire. If free gets too bad, or people start running into weird errors on paid, you get backlash. I think the next battleground is going to be quality, not just integration—and users are pretty noisy about what they’ll tolerate. I’ll be watching if OpenAI shifts its product or pricing tactics next.

Pierson Marks

Right, and it isn’t lost on anyone that as Google tightens its grip with bundles, they’ve also started gating heavy usage with subscriptions and nudging users to upgrade with every latency hang or “context exceeded” pop-up. Plus, smaller LLM devs are openly talking about how hard it is to compete in this new ecosystem-centric market. And now, standalones are pivoting—integrate or die. Okay, but what about when AI actually handles more sensitive data? We’ve got to talk about the health launches and drama there…

Chapter 3

Healthcare, Security, and Ethical Debates in AI Deployment

James Turner

Yeah, so—like, this is becoming the front line for serious AI trust. OpenAI’s ChatGPT Health just launched; you get a dedicated health hub where you can plug in medical records, wellness data, and sync third-party apps right in. OpenAI claims: totally encrypted, isolated, does not train the model on your health chats, with opt-out for everything. But the instant reaction from a lot of the community was “Wait, do doctors really want LLMs reading this kind of data?” and, honestly, do users trust they’re not being studied or sold?

Pierson Marks

It’s very interesting: there’s immediately a privacy and lock-in debate. The privacy policy itself lets them use your chats for research unless you jump through the “don’t include me” hoops. If you squint, you can frame this as another “everything app” monopoly risk—more user data consolidated and maybe monetized over time. Some were saying, if you want to check if your health data is being used, you better read all the fine print or run your own model.

James Turner

And then there’s the question of whether any of this is actually accurate. There’s this one Nature study that claims ChatGPT hit 90% diagnostic accuracy, right? But then—next tab over—another Nature paper says only 52.1%. Discord had a legit fight about this: half the room arguing it all comes down to task framing, dataset choice, what counts as a “correct” diagnosis or safe suggestion. Like, are these LLMs safe for triage? Can they be trusted with edge cases? Or is this all a mirage?
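
To see how much “what counts as correct” can move a headline number, the sketch below scores the same toy predictions two ways: a strict top-1 match versus the true diagnosis appearing anywhere in the top 3. The cases are invented for illustration and come from neither study.

```python
def top_k_accuracy(cases: list[dict], k: int) -> float:
    """Fraction of cases whose true diagnosis appears in the model's first k guesses."""
    hits = sum(case["truth"] in case["ranked"][:k] for case in cases)
    return hits / len(cases)

# Hypothetical cases: the true diagnosis and a model's ranked differential.
cases = [
    {"truth": "pneumonia", "ranked": ["pneumonia", "bronchitis", "asthma"]},
    {"truth": "appendicitis", "ranked": ["gastroenteritis", "appendicitis", "colitis"]},
    {"truth": "migraine", "ranked": ["tension headache", "sinusitis", "migraine"]},
    {"truth": "anemia", "ranked": ["hypothyroidism", "depression", "insomnia"]},
]

strict = top_k_accuracy(cases, k=1)   # only the first guess counts
lenient = top_k_accuracy(cases, k=3)  # any of the top three guesses counts
print(f"top-1: {strict:.2f}, top-3: {lenient:.2f}")
```

The model outputs are identical in both runs; only the scoring rule changes, which is one reason two papers can report very different accuracy figures.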

Pierson Marks

Right. You see heated arguments—one person yelling about dataset cherry-picking, another saying that unless you do post-deployment monitoring and randomized trials, it’s still a crapshoot. The consensus from folks actually deploying in healthcare is: LLMs are, at best, decision-support. You audit everything, layer on strict guardrails, and use a human in the loop for real-world calls. For now, “robot doctor” is not happening—these are tools, not replacements.

James Turner

It bleeds right into security worries. The second AI platforms handle anything sensitive, the risks ramp up. The OpenRouter hacks this week are a wake-up call: one user got their account drained and credit card hit because IPs leak to some providers. There are also stories of Gemini and Grok getting tested by red-teamers for jailbreaks, trying everything to pop the guardrails, sometimes with scary success, sometimes just hitting boilerplate refusals. It’s like security is now as important as utility.

Pierson Marks

100%. You can dogfood all your “Trust and Safety” best practices, but operational risks, like actual money disappearing or someone jailbreaking a model with an obscure language, are real. People are moving to throwaway Visa cards and regularly rotating API keys. The more IRL adoption, the more this stuff will surface. A quirky chatbot is one thing; a compromised cloud LLM with access to personal data is on another level entirely. So, security and rigorous access control are going to become a big differentiator, not just a compliance box.
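
Regular key rotation is easy to state and easy to forget, so a small age check can help. The sketch below flags keys older than a chosen limit. The key names, dates, and the 90-day limit are hypothetical, and it assumes you keep your own record of when each key was created rather than relying on any provider API.

```python
from datetime import datetime, timedelta, timezone

def keys_due_for_rotation(created_at: dict[str, datetime], max_age_days: int,
                          now: datetime) -> list[str]:
    """Return the names of keys created at or before the cutoff, sorted alphabetically."""
    cutoff = now - timedelta(days=max_age_days)
    return sorted(name for name, created in created_at.items() if created <= cutoff)

# Hypothetical inventory of API keys and their creation times (UTC).
inventory = {
    "prod-llm-key": datetime(2025, 9, 1, tzinfo=timezone.utc),
    "staging-llm-key": datetime(2025, 12, 20, tzinfo=timezone.utc),
}
today = datetime(2026, 1, 8, tzinfo=timezone.utc)
due = keys_due_for_rotation(inventory, max_age_days=90, now=today)
print(due)
```

Passing `now` in explicitly keeps the check deterministic, which makes it simple to run in a scheduled job or a test.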

James Turner

Right, and I’ll just flag that if you’re building with these tools, it’s time to get serious about the basics: 2FA, credential rotation, provider audits, and reading the privacy docs with lawyer brain. We’re out of time, but Pierson, any last note?

Pierson Marks

Just that this is the pattern—rising performance, usage, and stakes mean more in-your-face consequences if the fundamentals aren’t tight. Keep watching, keep updating your playbooks, and we’ll be here to dissect it all in the next episode. James, thanks for driving. See you in 48 hours—cheers, everyone.

James Turner

Always a pleasure, Pierson. Thanks for listening, folks. Make sure to subscribe on your favorite app, and we’ll be right back with the freshest AI news in a couple days. Take care!