ChatGPT-4 is More Creative than You
We think that generative AI creativity is limited by its training data and algorithms, and that it can therefore produce technically proficient content, but not truly novel and innovative content. We like to believe that our own creativity and innovation, rooted in personal experience, knowledge, and human emotions, result in intentionality and emotional depth that cannot be replicated by AI. Current research contradicts that view.
For example, a recent Wharton study compared the ideation capabilities of ChatGPT-4, a state-of-the-art LLM in 2023, with those of students at an elite university. The study found that:
- ChatGPT-4 is significantly better (faster and cheaper) at generating new product ideas than motivated, trained engineering and business students at an elite university.
- The LLM ideas are of higher quality on average (as measured by purchase-intent surveys).
- The majority of the best ideas were generated by ChatGPT-4, not by the students: 35 of the top 40 ideas (87.5%) were generated by ChatGPT-4.
- ChatGPT-4 generated the highest-rated idea, with an 11% higher purchase probability than the best human idea.
- Providing ChatGPT-4 with a few examples of highly rated ideas further increased its performance.
ChatGPT-4's order-of-magnitude advantage in productivity is by itself nearly insurmountable, and the higher quality of its best ideas adds further to that advantage.
To understand this better, let's dive deeper into what it means to be creative and innovative.
What do we mean by creativity?
Consider a creative task such as finding a new opportunity to improve the air travel experience, or launching a new aviation venture. An airline would prefer an ideator that generates one brilliant idea and nine nonsense ideas over one that generates ten decent ideas.
In creative tasks, consistently good ideas are not as valuable as a single great idea, given that only a few ideas can be pursued. Note that, at the same odds per idea, an ideator that generates 30 ideas is statistically more likely to have at least one brilliant idea than an ideator that generates just 10. In creative problem-solving, variability in quality and high productivity (as reflected in the number of ideas generated) are more valuable than consistency.
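The point about 30 ideas versus 10 can be made concrete with a little arithmetic. The sketch below assumes that each idea is independently "brilliant" with the same small probability; the 5% figure is a made-up illustration, not a number from the study, and the function name is hypothetical.

```python
def chance_of_at_least_one_hit(p_brilliant: float, n_ideas: int) -> float:
    """Probability that at least one of n independent ideas is brilliant."""
    # The chance that every idea misses is (1 - p) ** n; a hit is the complement.
    return 1.0 - (1.0 - p_brilliant) ** n_ideas


# Hypothetical per-idea odds of brilliance (an assumption, not study data).
P_BRILLIANT = 0.05

for n in (10, 30):
    print(n, round(chance_of_at_least_one_hit(P_BRILLIANT, n), 3))
```

Under these assumed odds, the ideator with 10 ideas has roughly a 40% chance of at least one brilliant idea, while the ideator with 30 ideas has roughly a 79% chance. The exact numbers depend entirely on the assumed per-idea probability; the direction of the effect does not.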
To achieve high variability and high productivity, most research on human ideation and brainstorming recommends generating many ideas while postponing evaluation or judgment. This is hard for human ideators to do (more on this below), but LLMs are designed to do exactly this: quickly generate many concepts without exercising much judgment. Further, the hallucinations and inconsistent behavior of LLMs increase the variability in quality, which improves the quality of the best ideas.
For ideation, an LLM’s lack of judgment and inconsistency are features, not bugs.
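Why would inconsistency help? A small simulation can illustrate the idea. The sketch below assumes that idea quality follows a bell curve (a normal distribution) and compares two hypothetical ideators with the same average quality but different spread. The distribution, the scores, and the function name are illustrative assumptions, not anything measured in the study.

```python
import random
import statistics


def average_best_idea(spread: float, n_ideas: int,
                      trials: int = 5000, seed: int = 0) -> float:
    """Average quality of the single best idea across many simulated sessions.

    Each idea's quality is drawn from a normal distribution with mean 0
    and the given spread (standard deviation). Purely illustrative.
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    best_scores = [
        max(rng.gauss(0.0, spread) for _ in range(n_ideas))
        for _ in range(trials)
    ]
    return statistics.mean(best_scores)


# Same average quality, same number of ideas, different consistency.
consistent = average_best_idea(spread=1.0, n_ideas=10)
erratic = average_best_idea(spread=2.0, n_ideas=10)

print(f"consistent ideator, average best idea: {consistent:.2f}")
print(f"erratic ideator, average best idea:    {erratic:.2f}")
```

In this toy model the erratic ideator produces more bad ideas, but its best idea is better on average, because only the top of the distribution matters when just a few ideas get pursued.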
Why human brainstorming doesn't work very well
Human brainstorming sessions don't work as well as many people believe. Groups generate fewer ideas, with less variability, than the same people working alone. Psychologists have several explanations for the challenges of group brainstorming:
- Production blocking: only one person can talk or produce an idea at once, while the other group members sit passively.
- Evaluation apprehension: the fear of looking dumb in front of one’s peers.
- Feasibility bias: in a group setting, people tend to filter their own ideas down to the more "feasible" ones to make them more attractive to the group.
- Social loafing: in a group, some individuals tend to sit back and let others do the work. This could be due to introversion, not laziness.
Individual human ideation works better, where each member generates as many ideas as possible, alone. This results in both more ideas and more variability. Those ideas can then be collected, collated, and reviewed in a group setting where the ideas are independent of the individual who submitted them. This creates more open dialog and deeper, richer group interactions. In other words, human groups are better at reviewing ideas than producing them.
Conventional wisdom prior to 2022 was that AI tools would likely be most useful in rote tasks and that creative work would likely remain the domain of humans. In many ways, the opposite is true of LLMs. Their lack of judgment, inconsistency, and occasional "hallucinations" lead to extreme productivity and high variance in idea quality, resulting in higher overall creativity than the average human.
- Productivity Scale: ChatGPT-4 > Many individual humans > Group brainstorming
- Variability Scale: ChatGPT-4 > Many individual humans > Group brainstorming
This research suggests that the critical human task in innovation practice may shift from idea generation to idea evaluation and selection, a task for which LLMs do not yet appear to be particularly well suited.
The key to unlocking the full potential of both human ingenuity and generative AI creativity lies in collaboration and integration. By understanding and appreciating the unique strengths of each, we can explore innovative ways to merge the two. One can envision a future where human experience and generative AI creativity complement and inspire one another, ultimately enriching our collective creative endeavors.
Image Credit: Midjourney AI
(Prompt) "realistic images, immersive vivid image depicting teams creating with unlimited ideas and inspiration and data, creative, hex codes 292929 A642E1 5A72D8 6DE2CF EFC365 D04B5E, AI and tech, immersed in creation, creative teams —ar 2:1"