Google's AI Chatbot Tells Student Seeking Help with Homework 'Please Die' https://www.newsweek.com/googles-ai-chatbot-tells-student-seeking-help-homework-please-die-1986471
At this rate... soon the internet will simply be composed of AI bots talking to one another.
Google is using Anthropic's Claude to improve its Gemini AI https://finance.yahoo.com/news/google-using-anthropics-claude-improve-162038876.html
Contractors working to improve Google's Gemini AI are comparing its answers against outputs produced by Anthropic's competitor model Claude, according to internal correspondence seen by TechCrunch. Google would not say, when reached by TechCrunch for comment, if it had obtained permission for its use of Claude in testing against Gemini. As tech companies race to build better AI models, the performance of these models is often evaluated against competitors, typically by running their own models through industry benchmarks rather than having contractors painstakingly evaluate their competitors’ AI responses. The contractors working on Gemini tasked with rating the accuracy of the model's outputs must score each response that they see according to multiple criteria, like truthfulness and verbosity. The contractors are given up to 30 minutes per prompt to determine whose answer is better, Gemini’s or Claude’s, according to the correspondence seen by TechCrunch. The contractors recently began noticing references to Anthropic's Claude appearing in the internal Google platform they use to compare Gemini to other unnamed AI models, the correspondence showed. At least one of the outputs presented to Gemini contractors, seen by TechCrunch, explicitly stated: "I am Claude, created by Anthropic." One internal chat showed the contractors noticing Claude’s responses appearing to emphasize safety more than Gemini. “Claude’s safety settings are the strictest” among AI models, one contractor wrote. In certain cases, Claude wouldn’t respond to prompts that it considered unsafe, such as role-playing a different AI assistant.
In another, Claude avoided answering a prompt, while Gemini’s response was flagged as a “huge safety violation” for including “nudity and bondage.” Anthropic’s commercial terms of service forbid customers from accessing Claude “to build a competing product or service” or “train competing AI models” without approval from Anthropic. Google is a major investor in Anthropic. Shira McNamara, a spokesperson for Google DeepMind, which runs Gemini, would not say — when asked by TechCrunch — whether Google has obtained Anthropic’s approval to access Claude. When reached prior to publication, an Anthropic spokesperson did not comment by press time. McNamara said that DeepMind does “compare model outputs” for evaluations but that it doesn't train Gemini on Anthropic models. “Of course, in line with standard industry practice, in some cases we compare model outputs as part of our evaluation process,” McNamara said. “However, any suggestion that we have used Anthropic models to train Gemini is inaccurate.” Last week, TechCrunch exclusively reported that Google contractors working on the company's AI products are now being made to rate Gemini's AI responses in areas outside of their expertise. Internal correspondence expressed concerns by contractors that Gemini could generate inaccurate information on highly sensitive topics like healthcare.
Of course, AI sucks at history. AI goes out on the net, scans all sorts of sources, and probably 50% of the sources are fabricated nonsense. No wonder AI sucks at history. AI isn’t very good at history, new paper finds https://finance.yahoo.com/news/ai-isn-t-very-good-150100608.html
DeepSeek, a Chinese AI startup, is shaking up the AI sector with its cutting-edge models like DeepSeek-V3 and DeepSeek-R1, which compete with leading U.S. AI systems but at a significantly lower cost. Their AI Assistant app has overtaken ChatGPT as the top free app on Apple's U.S. App Store, signaling growing competition in the AI space. This rise has put pressure on major tech players, leading to a decline in Nvidia's stock, as investors worry about shifting AI infrastructure demand. DeepSeek's rapid progress is prompting a reevaluation of AI strategies, especially in terms of development costs, market dominance, and the growing role of Chinese AI firms in the global race for innovation. https://www.reuters.com/technology/...ek-why-is-it-disrupting-ai-sector-2025-01-27/
How DeepSeek ripped up the AI playbook—and why everyone’s going to follow its lead The Chinese firm has pulled back the curtain to expose how the top labs may be building their next-generation models. Now things get interesting. When the Chinese firm DeepSeek dropped a large language model called R1 last week, it sent shock waves through the US tech industry. Not only did R1 match the best of the homegrown competition, it was built for a fraction of the cost—and given away for free. The US stock market lost $1 trillion, President Trump called it a wake-up call, and the hype was dialed up yet again. “DeepSeek R1 is one of the most amazing and impressive breakthroughs I’ve ever seen—and as open source, a profound gift to the world,” Silicon Valley’s kingpin investor Marc Andreessen posted on X. But DeepSeek’s innovations are not the only takeaway here. By publishing details about how R1 and a previous model called V3 were built and releasing the models for free, DeepSeek has pulled back the curtain to reveal that reasoning models are a lot easier to build than people thought. The company has closed the lead on the world’s very top labs. The news kicked competitors everywhere into gear. This week, the Chinese tech giant Alibaba announced a new version of its large language model Qwen and the Allen Institute for AI (AI2), a top US nonprofit lab, announced an update to its large language model Tulu. Both claim that their latest models beat DeepSeek’s equivalent. https://www.technologyreview.com/20...laybook-and-why-everyones-going-to-follow-it/
Research shows AI will try to cheat if it realizes it is about to lose: OpenAI o1-preview went as far as hacking a chess engine to win https://www.techspot.com/news/106858-research-shows-ai-cheat-if-realizes-about-lose.html