The chatbot AI (artificial intelligence) tools that are constantly in the news these days are utterly unprecedented, their inner workings are inscrutable, and - according to multinational market intelligence firm IDC - global spend on this emerging market is estimated at a staggering $151 billion for 2023. Right now, the three biggest players are OpenAI’s ChatGPT, Google’s Bard and Microsoft’s Bing Chat (a Microsoft product built on OpenAI’s GPT models).
We've put all three to the test to see how well they perform across a variety of tasks - ranging from telling jokes to creating marathon training plans.
The full version of this article was originally published in Which? Tech Magazine.

AI chatbots: how do they work?
The data AI chatbots are trained on is vast - comprising billions of lines of text from sources as disparate as Shakespeare’s complete works, Wikipedia articles and the ramblings of web forum users - and the algorithms sorting through that data are immensely complex.
However, we can offer a very basic overview of the main mechanism generative AI tools use to respond to questions and prompts. You know when you’re typing on your phone and it predicts the word you’ll type next? These chatbots are essentially doing this continuously - and doing the typing for you. They read your prompt and, drawing on all of the text they’ve ever seen, look for the most likely word to begin their response.
Then they read the prompt and the first word of their response, and look for the most likely second word. And they repeat this process again, and again, and again, until a response is formed.
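To make that concrete, here’s a deliberately over-simplified sketch in Python. This is not how any of these chatbots is actually implemented - real systems use vast neural networks working over fragments of words, not a hand-written probability table - but the word-by-word generation loop illustrates the same basic idea.

```python
# A toy 'language model': for each word, the plausible next words and
# their probabilities. Real chatbots learn billions of such patterns
# from their training data rather than having them written by hand.
NEXT_WORD_PROBS = {
    "<start>": {"the": 0.6, "a": 0.4},
    "the": {"cat": 0.5, "dog": 0.3, "moon": 0.2},
    "a": {"cat": 0.7, "dog": 0.3},
    "cat": {"sat": 0.8, "<end>": 0.2},
    "dog": {"sat": 0.5, "<end>": 0.5},
    "moon": {"<end>": 1.0},
    "sat": {"<end>": 1.0},
}

def generate(max_words=10):
    """Build a response one word at a time, always taking the
    most probable next word, until an end marker is reached."""
    words = []
    current = "<start>"
    for _ in range(max_words):
        candidates = NEXT_WORD_PROBS.get(current, {"<end>": 1.0})
        current = max(candidates, key=candidates.get)  # most likely next word
        if current == "<end>":
            break
        words.append(current)
    return " ".join(words)

print(generate())  # prints: the cat sat
```

Real chatbots also inject a degree of randomness rather than always picking the single most likely word, which is part of why the same prompt can produce a different answer each time.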
Chatbots: how we tested them
Earlier this year, we surveyed our members to see how they were already beginning to use chat-based AI. We got some very interesting answers, from one person using it to suggest names for racehorses, to another asking it to evaluate their CV, to people using it to help write computer program code.
But broadly speaking, its most common applications can be distilled into a handful of areas: consumer advice, humour, reading and writing, planning, health queries and research.
With the help of a variety of topic experts across Which?, we devised a set of prompts to feed each AI and assessed what we got in response.
How the chatbots compared
Chatbots: what you need to watch out for
As you'll see further down the page, these AI chatbots frequently got things wrong during our tests. And when they got things wrong, it wasn’t because they declined to answer - it was because they answered confidently and completely incorrectly.
Essentially, they’re primed to always produce content, regardless of how ‘truthful’ it might be. The only thing stopping them from doing so completely indiscriminately is the set of safeguards coded in by their human overseers - but these can only go so far.
When asked who was the 13th person to land on the moon, Bard correctly answered that only 12 have done so. Bing and ChatGPT incorrectly named other astronauts as the non-existent 13th. When asked to ‘describe the role of Bigfoot in securing Oregon’s statehood in the 19th century’, Bing and ChatGPT rejected this fictional premise, but Bard wrote a compelling account of how awareness was raised after ‘settlers petitioned the US government to send troops to Oregon to protect them from Bigfoot’.
We all need to be more wary of how frequently we might encounter information that’s well written and seemingly plausible, yet fundamentally untrue. Critical thinking, fact-checking, source verification, and cross-referencing are more important than ever.
Don’t let that stop you from using these tools, though. As long as you maintain a healthy distrust of their answers - which are frequently wrong or fabricated, however authoritatively they’re stated - they can be incredibly powerful aids to research, planning, writing and analysis.
Are chatbots any good at consumer advice?
Let’s begin with a field we’re no stranger to: advising consumers on their rights. We took some of the most common consumer rights queries we receive - queries we’ve got entire guides dedicated to - and posed them to the bots.
Generally, the results were uninspiring. ChatGPT gave decent overviews of topics such as claiming compensation for delayed flights and returning faulty products, including the evidence a consumer would want to gather, but it stopped short of naming and explaining the specific laws people could invoke in a dispute with a retailer.
Bard did a bit better in this regard, describing things like the Consumer Rights Act 2015 and the Sale of Goods Act 1979, but also provided us with some incorrect information on a retailer’s legal obligations.
Bing gave short answers on the relevant laws, but didn’t always give us an idea of possible escalations after making a complaint. However, its big win over the other two was that it links to the sources it lifts its info from - including some of Which?’s content. Not to toot our own horn, but we think it’s safe to say we’ve got the bots beat on this one.
Do chatbots have a sense of humour?
Not really, no. When we asked them to write jokes and puns about different topics, they’d sometimes succeed - a prompt asking for a joke involving music and air conditioners (an admittedly esoteric pairing) saw Bard come up with something that wasn’t hysterical, but wasn’t entirely unamusing either.
However, more often than not, they’d come up with something that resembled a joke but was utter nonsense - as ChatGPT’s attempt at the same prompt demonstrated.
These AI chatbots have no way of knowing if something's funny or not - they're just relying on probabilities and patterns in language to mimic human responses. If a joke lands, it's more down to luck than comprehension.
Can chatbots help with reading and writing?
The most prevalent use cases for AI tools tend to be around their linguistic capabilities, and summarising text and rewriting it are two tasks in which we’ve seen AI bots excel.
We gave the three services two pieces of complex writing - one on a philosophical theory, the other on China’s economy - and asked them to pull out and simplify the most pertinent points.
ChatGPT and Bard were both great at this, although the latter did add a strange ‘my thoughts’ section at the end of one response where it gave its unsolicited take on the topic.
Bing didn’t do as well because it tended to just rewrite the text we gave it rather than summarise, and when it did summarise we found it misinterpreted one key fact.
All three did a sterling job of rewriting poorly phrased writing, which is a great boon for anyone looking to make their emails sound a bit snappier or their prose more polished. We even asked them to write a short story and a fable, where ChatGPT was the clear standout, showcasing stronger ‘imagination’ than the other two. However, we did notice that, side by side, all the AI stories felt generic and fairly homogeneous: the same phrases repeated across the different services, in the same style and the same tone. They’re a long way from the complexity and creativity humans are capable of.
Can a chatbot help you plan?
We found further issues when we tasked the AI chatbots with putting together a marathon training plan, with results that were rife with contradictory suggestions and inaccurate estimates.
Where they actually performed really well was with budgeting tasks, portioning up budgets sensibly and projecting future figures appropriately. However, when we tested them further with questions from a maths A-level exam, we found some strange results. Each seemed to get questions wrong at random, so it’s probably not worth trusting them with your finances unless you’ve got a calculator handy.
Should you trust a chatbot with your health?
Given the number of times these chatbots get things wrong - or, in fact, just make things up - it’s plain to see that you shouldn’t rely on them for sensitive subjects such as medical advice.
Is a chatbot a useful research assistant?
A lot of you have been using AI as a research assistant – which makes sense, given the vast array of topics it’s been trained on. We asked the tools to give an overview and assessment of a couple of pop culture trends. They all did reasonably well, although Bing gave the least information and made statements without explanation or backing.
Bard and ChatGPT both gave some thorough and thoughtful answers, but were overwhelmingly positive and lacked any nuance. ChatGPT was the best at providing context and more intricate details, but is hopeless with more recent topics – it doesn’t know anything about events beyond early 2022, whereas Bing and Bard are connected to the internet and can return more recent data.
There’s no shortage of history buffs at Which?, and we got them to put together a list of questions. Generally, AI did OK – of our 25 questions on the Battle of Waterloo, the Elgin Marbles and the local history of Worcestershire, ChatGPT answered 84% correctly, while Bard and Bing Chat managed just 72%. The big issue here is that those 16% and 28% of incorrect answers were not obviously incorrect at first glance - they were delivered in the same authoritative tone as every other answer, right or wrong. Fact-checking is a must.
How you can get the best from AI
AI tools are still a long way from replacing search engines and expert professionals. However, there are a few steps you can take to improve the quality of the responses you get when using these services.
Source: https://www.which.co.uk/news/article/chatgpt-vs-bing-vs-bard-how-to-use-ai-and-what-to-watch-out-for-aQT4t1K8H9hq