ChatGPT is considered a turbo learner, much like Number 5, the famous robot from the 1980s film »Short Circuit«. Except the AI does not learn from books but from digital data, which it scans even faster than Number 5 can turn paper pages. A study by researchers from Stanford University and UC Berkeley now reveals something surprising: instead of improving, GPT is getting worse in several disciplines. The scientists offer no explanation for the phenomenon.

Researchers examined response quality from March to June

The researchers studied the responses GPT gave over a period of several months, covering two generations of the model. They published their results under the title "How is ChatGPT's behavior changing over time?". In the paper they describe how, between March and June 2023, they gave the GPT-3.5 and GPT-4 versions a range of tasks: mathematical questions on the one hand, code generation on the other. The models were also asked to perform visual reasoning and to respond to sensitive content.

GPT-4 experienced significant performance degradation

The "most advanced" variant, GPT-4, suffered significant performance losses during this relatively short period. In March it still identified 17,077 as a prime number in 97.6 percent of queries; by June it managed to do so in only 2.4 percent. GPT-3.5, by contrast, improved slightly on this task. In June, GPT-4 also suddenly inserted quotation marks into generated code, rendering it unexecutable; the share of directly executable code dropped from 52 percent in March to just 10 percent in June. The data has been published on GitHub, where the researchers warn all users of LLM services to scrutinize the output themselves: nobody can count on an AI system that once proved reliable continuing to produce usable results in the future.

Source: golem.de
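To make the prime-number task concrete: the researchers essentially asked the models questions like "Is 17077 a prime number?", which a few lines of ordinary code can answer deterministically. A minimal sketch in Python (simple trial division; my own illustration, not code from the study):

```python
def is_prime(n: int) -> bool:
    """Deterministic trial-division primality test, fine for small n."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    i = 3
    while i * i <= n:  # only need divisors up to sqrt(n)
        if n % i == 0:
            return False
        i += 2
    return True

print(is_prime(17077))  # True: 17077 has no divisor up to ~130.7
```

17,077 is indeed prime; the benchmark measured whether the model could reliably reach the same verdict.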
I spent several hours over about three days a couple of months back asking ChatGPT some Excel function questions. For the first question I didn't have an answer myself; for the second I knew the answer. On the first query ChatGPT never got the answer correct, but by writing down all of its responses I managed to fit bits together here and there and come up with my own solution. On the second query, even after asking the same or similar questions repeatedly, it couldn't give me the correct reply, so I gave up.
The obvious reason would be that they reduced the quality of responses to increase the quantity of responses, although they deny this. It is all moot anyway: AI is going to take over sometime in the next 10 to 50 years. Humans are pretty much finished. Our only hope is if development can somehow be stopped before the AIs are powerful enough to take over.
I'd say: garbage in, garbage out. The motivation to load it with bad data must be tempting for pranksters, bad actors, competitors, governments... the list is endless.
I would agree. Ex ante, I had a sneaking suspicion I was getting better answers back in December and January compared to a month ago (from the default GPT 3.x, that is). The obvious hypothesis would be a performance reduction on OpenAI's side to cut spending, as @Businessman pointed out. That is potentially a major problem if you rely on their API for your business, I guess, since maintaining a product is hard enough without an external vendor silently changing their algo.
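One partial mitigation, if you depend on the API: pin a dated model snapshot instead of the floating alias, so the vendor's silent upgrades at least don't land on you unannounced. A rough sketch using the openai Python client (the snapshot name "gpt-4-0613" is just an example; check which snapshots are currently offered):

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Pinning a dated snapshot ("gpt-4-0613") instead of the floating "gpt-4"
# alias means behavior only changes when you deliberately switch versions.
response = client.chat.completions.create(
    model="gpt-4-0613",
    messages=[{"role": "user", "content": "Is 17077 a prime number?"}],
    temperature=0,  # reduce run-to-run variance for regression tests
)
print(response.choices[0].message.content)
```

Snapshots are eventually retired, so this buys predictability, not permanence.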
We're just at the beginning of the "revolution". I can imagine businesses with large repositories of useful data, like law firms, would pay dearly for a user-friendly interface able to instantly pull all relevant records, synthesize them, and guide the user toward a solution to a given problem.
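That kind of product is basically retrieval plus synthesis: index the firm's documents, pull the most relevant ones for a query, then have a model summarize them. A toy sketch of the retrieval half with scikit-learn (TF-IDF standing in for a real embedding index; the records are made up for illustration):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Stand-ins for a law firm's document store.
records = [
    "Lease agreement dispute, commercial tenant, early termination clause.",
    "Employment contract review, non-compete enforceability.",
    "Trademark opposition filing, likelihood-of-confusion analysis.",
]

vectorizer = TfidfVectorizer()
doc_matrix = vectorizer.fit_transform(records)

query = "Can a commercial tenant break a lease early?"
scores = cosine_similarity(vectorizer.transform([query]), doc_matrix)[0]

# Rank records by similarity; the top hits would be fed to an LLM to synthesize.
for score, record in sorted(zip(scores, records), reverse=True):
    print(f"{score:.2f}  {record}")
```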
Per a Ph.D. math professor, earlier math problems were solved at 95% success; lately, it is 5% success! ChatGPT thinks actual math is white privilege, and being stupid is woke!
GPT-4 has been well worth the money for me. I have it writing complicated code with clean structure and full comments, which I hate writing myself. Also, how you prime it has everything to do with your results.
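For what it's worth, "priming" here just means front-loading the conversation with style and quality constraints before the actual request. A hypothetical example of such a setup (the wording is mine, not a documented best practice):

```python
# The system message fixes expectations once; every later request inherits them.
messages = [
    {
        "role": "system",
        "content": (
            "You are a senior Python developer. Always return complete, "
            "runnable code with type hints, a docstring, and inline comments "
            "explaining non-obvious steps."
        ),
    },
    {
        "role": "user",
        "content": "Write a function that deduplicates rows in a CSV file.",
    },
]
# `messages` would then be passed to a chat-completion call,
# as in the API snippet earlier in the thread.
```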
LLMs are good at predicting the next word. The primary use is in automating the boring stuff, and that boring stuff can increase in complexity over time. By the way, there are open-source versions which you can download and feed with your own data. I think the real progress will be made when synthetic data, created with LLMs, takes off. At least we won't have to fill in captchas / click the cars when we forget our passwords ;-)
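Both points, "predicting the next word" and "open-source versions you can download", are easy to see with a small local model. A sketch using Hugging Face's transformers library and the freely downloadable GPT-2 (a tiny model, purely to show the mechanics):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# GPT-2 downloads on first use and runs locally on a CPU.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = "The boring stuff can increase in"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits  # shape: (batch, sequence, vocab)

# The distribution over the *next* token sits at the last sequence position.
probs = logits[0, -1].softmax(dim=-1)
top = torch.topk(probs, k=5)
for p, idx in zip(top.values, top.indices):
    print(f"{tokenizer.decode(idx.item())!r}: {p:.3f}")
```

Everything an LLM does is built on that one operation, repeated token by token.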