Two New AI Studies Highlight Why Using Language Models Effectively Is Key to Success
The results of two AI studies were recently made public. The two sit at opposite ends of the AI-hype scale, yet ultimately deliver the same lesson.
Crikey recently reported the results of tests that Amazon conducted on behalf of the Australian Government. According to the report, the tests showed that AI - in this case, Meta’s Llama2-70B language model - is not better than humans at summarising complex data.
From the article:
Reviewers overwhelmingly found that the human summaries beat out their AI competitors on every criteria and on every submission, scoring an 81% on an internal rubric compared with the machine’s 47%.
However - and it’s a big however - it should be noted that the tasks the language model was given are “notoriously hard” for AI. The model has also since been superseded by improved ones, and it was not trained to perform the task it was tested on; in fact, when Amazon did optimise it for the task, results improved immediately.
The results of the second study were better for AI - it found that AI agents are better than humans at coming up with novel ideas. More than 100 Natural Language Processing (NLP) researchers took part in a study comparing research ideas generated by human experts with those generated by an AI agent built specifically for the purpose.
The study found: “LLM-generated ideas are judged as more novel than human expert ideas while being judged slightly weaker on feasibility.”
A computer system’s indefatigability goes some way towards explaining this result - at least in contrast to the average person’s time, patience and energy for idea generation; anyone who has sat through a brainstorming session knows just how limited in scope a typical group’s ideation can be.
By the same token, it makes sense that a human has a better sense of what is actually feasible - based on budgets, resourcing, timelines, business objectives and so on.
Why “Is AI Better Than Humans?” Is the Wrong Question to Ask
The results of both studies are useful: it’s important to be clear about what AI excels at and where it falls short, and to have evidence for it.
But I posit that both studies focus on the wrong question. Instead of testing whether AI is better than humans at a task - and then giddily reporting the results - the more useful question is how the technology’s particular strengths can be incorporated into our processes and thinking. Offloading tasks to AI wholesale and by default should not be the end goal of any implementation of artificial intelligence; as with any technology, the goal should be improved efficiency and a positive rate of return. For example, in most implementations in the health sector, AI assists with diagnosis - it is not left to do the work entirely.
Knowing that AI is bad at citing page numbers is useful because you can make sure your people pay closer attention to this in their workflows. Knowing that AI is great at generating ideas is useful because brainstorming with a language model will boost your ability to ideate.
But expecting AI to do either job for you, on its own, is not realistic. For both of these jobs, you will benefit from collaborating with AI - in fact, when a language model replies to your prompt, it will often end by asking: “Would you like me to elaborate on any of these points or discuss how they might be implemented in a specific context?” The AI is prompting you to work with it.
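To make that collaboration concrete, here is a minimal sketch of the division of labour the second study points towards: the language model does the divergent idea generation, and a person applies the feasibility filter. It assumes the OpenAI Python SDK; the model name, prompts and customer-churn example are illustrative assumptions, not recommendations.

```python
# Minimal human-in-the-loop brainstorming sketch (illustrative only).
# Assumes the OpenAI Python SDK (openai>=1.0) and an API key in OPENAI_API_KEY;
# the model name, prompts and example topic are placeholders.
from openai import OpenAI

client = OpenAI()


def generate_ideas(topic: str, n: int = 5) -> list[str]:
    """The language model does the divergent thinking: propose n candidate ideas."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[{
            "role": "user",
            "content": f"Suggest {n} distinct, novel ideas for: {topic}. One idea per line.",
        }],
    )
    lines = response.choices[0].message.content.splitlines()
    return [line.strip("- ").strip() for line in lines if line.strip()]


def human_review(ideas: list[str]) -> list[str]:
    """A person applies the feasibility filter: budget, resourcing, timelines, objectives."""
    kept = []
    for idea in ideas:
        if input(f"Keep this idea? [y/N] {idea}\n> ").lower().startswith("y"):
            kept.append(idea)
    return kept


if __name__ == "__main__":
    candidates = generate_ideas("reducing customer churn for a subscription business")
    shortlist = human_review(candidates)
    print("Shortlisted ideas:", shortlist)
```

The point of the sketch is the shape of the workflow, not the tooling: the machine supplies breadth and stamina, and the human supplies judgement about what is actually worth pursuing.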
Knowing when and how to use AI is what will get you the best results for your needs.
Talk to us today for guidance on how you can benefit from AI and use it to help solve your business problems.
Get in touch
contact@grandwestconsulting.com