
One Year of ChatGPT - Here’s What’s Coming Next

November 29, 2023
5 mins

Before OpenAI was a producer of the most scintillating boardroom corporate drama outside of an episode of Succession, it was the creator of the universally known AI application ChatGPT. On the eve of the one-year anniversary of its launch (a whirlwind year of progress, innovations, and twists), it is worth revisiting the state of AI post-ChatGPT with an eye toward what comes next.

A year ago, ChatGPT took the world by storm, smashing even OpenAI’s own expectations by becoming the fastest-growing consumer app in history. While the last year has been filled with a panoply of new models, hundreds of freshly minted startups, and gripping drama, it still very much feels like the early days of the technology.

As the cofounder of an AI company, one steeped in the ecosystem for years, I can say the difference this last year has made has been nothing short of remarkable: not just in technological progress or academic research (although the strides here have been dizzying), but in unlocking the public imagination and discourse around AI.

Y Combinator, a leading barometer of technological trends, has recently churned out batches where, for the first time, most companies are focused on AI. ChatGPT is now being used as a zinger in political debates. Once-exotic terms like Retrieval-Augmented Generation are making their way into the vernacular of upper management at Fortune 500 companies. We are witnessing not just a technological change but a societal one, in which AI has become palatable for mainstream digestion. So what’s next?

Seeing the future is easy if you know where to look in the present: technological progress does not move as a uniform front where innovation propagates equally across all facets of society. Instead, it moves like waves crashing against the jagged rocks of a coastline, splashing chaotically forward, soaking some while leaving others dry. Observing where the water hits first lets you guess what happens when it splashes on others later. It takes only one visit to San Francisco, noticing the eerily empty vehicles traversing the city in silent yet conspicuous fashion, to preview what the future looks like for municipalities around the world: a world without Uber-driver small talk.

While making firm predictions in a space arguably moving faster than any other technological movement in history is a fool’s game, clear themes are emerging, visible on those standing closest to the waves. We are only one year into this “new normal,” and the future will bring much more along the following lines:

Dive Into Complexity

One of the most exciting aspects of artificial intelligence as a technology is that it falls into a category few technologies do: “unbounded potential.” Moore’s Law in the ‘60s gave Silicon Valley a self-fulfilling prophecy of computational progress to follow. The steady march of development cycles has paved the way from room-sized machines with the power of a home calculator to all the marvellous wonders we take for granted in society today.

Similar to computation, there are no in-principle limits on the cognitive power of computers across the full range of human capabilities. This can stoke the terrors of a world-conquering AGI, but it also brings up a key principle worth considering: ever-increasing intellectual power.

The AIs of today that draw boxes over cars and run segmentations over people will be considered crude antiquities in a few years. They are sub-component solutions, used only as intermediate steps to tackle more advanced problems (such as diagnosing cancer, counting cars for parking tickets, etc.). We must walk before we can run, but it is not difficult to imagine an ability to tackle harder and harder questions over time. In the future, AI will be able to handle problems of increasing complexity and nuance, ones that currently stump existing systems.

While ChatGPT and other equivalent LLMs of today are conversant (and hallucinatory) across wide-ranging topics, they still cannot handle niche topics reliably. Companies, however, have already begun tailoring these models with specialized datasets and techniques to handle more domain-specific use cases. With improved training and prompting, the emergence of AI professionals, such as doctors, paralegals, and claims adjusters, is on the horizon. We’re also approaching an era where these specialized applications, like a FashionGPT trained on the latest trends, can provide personalized advice and recommendations according to individual preferences.
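To make that tailoring concrete, here is a minimal sketch of one common technique, parameter-efficient fine-tuning with LoRA adapters via the open-source peft library. The base model, target modules, and hyperparameters are illustrative assumptions, not any particular company’s recipe.

```python
# A minimal sketch of domain-specific tailoring via LoRA adapters (peft).
# Base model, target modules, and hyperparameters are illustrative assumptions.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

base_model = AutoModelForCausalLM.from_pretrained("gpt2")  # stand-in base LLM
tokenizer = AutoTokenizer.from_pretrained("gpt2")

# LoRA trains small low-rank adapter matrices instead of all model weights,
# which is why fine-tuning on a niche, proprietary dataset becomes affordable.
config = LoraConfig(
    r=8,                        # rank of the adapter matrices
    lora_alpha=16,              # scaling factor for adapter updates
    target_modules=["c_attn"],  # attention projections in GPT-2
    task_type="CAUSAL_LM",
)
model = get_peft_model(base_model, config)
model.print_trainable_parameters()  # only a tiny fraction of weights train
# From here, a standard training loop over the domain dataset would follow.
```

The design point is that the adapter weights, not the base model, carry the domain specialization, so they can stay in-house alongside the proprietary data.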

We should expect a world where the complexity and nuance of problems that only particular domain experts can handle today will be well within the scope of AI capabilities. Topics like advanced pathology, negotiating geopolitical situations, and company building will be problems within AI’s capacity. If the history of computers is any guide, complexity is the direction forward.

Multi-modality 

Right now, there are categorical boxes classifying different types of problems that AI systems can solve. We have “computer vision”, “NLP”, “reinforcement learning”, etc. We also have separations between “Predictive” and “Generative AI” (with a corresponding hype cycle accompanying the rise of the term). These categories are useful, but they are mostly in place because models can, by and large, solve one type of problem at a time. Whenever the categorizations are functions of technological limitations, you should not expect permanence; you should expect redefinitions.

Humans are predictive and generative. You can ask me if a picture is of a cat or a dog, and I can give a pretty confident answer. But I can also draw a cat (albeit badly). Humans are also multi-modal. I can watch a movie and take in its soundtrack along with the sensory details of facial expressions and body language, processing voices for both semantic content and variations in tone and volume. We perform complex feats of sensor fusion across a spectrum of inputs, and we can draw rather complex inferences from these considerations. Given that we do this adeptly, we shouldn’t expect any of these abilities to be outside the purview of sufficiently advanced models.

The first inklings of this multi-modal direction are already upon us. ChatGPT has opened up to vision and can impressively discuss input images. Open-source models like LLaVA now reason over both text and vision. CLIP combines text and vision into a unified embedding space and can be integrated with various types of applications. Other multimodal embedding models are also becoming commonplace.
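As a small illustration of what a unified embedding space buys you, here is a minimal sketch using the Hugging Face transformers implementation of CLIP to score how well candidate captions match an image; the checkpoint name and image path are assumptions for illustration.

```python
# A minimal sketch of CLIP's shared text-image embedding space.
# The checkpoint and the local image path are illustrative assumptions.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("cat.jpg")  # hypothetical local image
captions = ["a photo of a cat", "a photo of a dog", "a city skyline"]

# Both modalities are projected into the same space, so one similarity score
# per caption falls out of a single forward pass.
inputs = processor(text=captions, images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    outputs = model(**inputs)
probs = outputs.logits_per_image.softmax(dim=-1)  # caption match probabilities

for caption, p in zip(captions, probs[0]):
    print(f"{p.item():.2f}  {caption}")
```

Because captions and images live in the same space, the same model can power search, tagging, and recommendation without task-specific heads.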

Check out my webinar with Frederik Hvilshøj, Lead ML Engineer at Encord, on “How to build Semantic Visual Search with ChatGPT & CLIP”.

While these multimodal models haven’t yet found use in many practical applications, it is only a matter of time before they are integrated into commonplace workflows and products. Tied to the point above on complexity, multimodal models will start to replace their narrower counterparts to solve more sophisticated problems. Today’s models can, by and large, see, hear, read, plan, or move, one capability at a time. The models of the future will do all of these simultaneously.

The Many Faces of Alignment

The future themes poised to gain prominence in AI encompass not only technological advancements but also their societal impacts. Among the onslaught of buzzy terms born out of conversations in San Francisco coffee shops, alignment has stood out as the catch-all for the non-technical considerations surrounding AI’s broader implications. According to ChatGPT:

AI alignment refers to the process and goal of ensuring that artificial intelligence (AI) systems' goals, decisions, and behaviors are in harmony with human values and intentions. 

There are cascading conceptual circles of alignment, depending on the broadness of its application. As of now, the primary focus of laboratories and companies has been to align models to what is called a “loss function.” A loss function is a mathematical expression of how far away a model is from getting an answer “right.” At the end of the day, AI models are just very complicated functions, and all the surrounding infrastructure is a very powerful function-optimization tool. A model behaving as it should, as of now, just means a function has been properly optimized to have a low loss.
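For intuition, here is a minimal toy sketch of this first circle of alignment: a one-parameter “model,” a mean-squared-error loss, and a hand-derived gradient step that nudges the parameter until the loss is low. Everything here is illustrative.

```python
# A toy model y = w * x, "aligned" to its loss by gradient descent.
def predict(w, x):
    return w * x

def mse_loss(w, xs, ys):
    # Mean squared error: how far predictions are from the "right" answers.
    return sum((predict(w, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

xs, ys = [1.0, 2.0, 3.0], [2.0, 4.0, 6.0]  # ground truth: y = 2x
w, lr = 0.0, 0.05
for _ in range(100):
    # Gradient of the MSE with respect to w, derived by hand for this model.
    grad = sum(2 * (predict(w, x) - y) * x for x, y in zip(xs, ys)) / len(xs)
    w -= lr * grad  # one optimization step: move w to reduce the loss

print(f"w = {w:.3f}, loss = {mse_loss(w, xs, ys):.6f}")  # w ≈ 2.0, loss ≈ 0
```

Choosing the loss in the first place (here, squared error against a chosen ground truth) is exactly the decision the next question interrogates.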

This raises the question of how you choose the right loss function in the first place. Is the loss function itself aligned with the broader goal of the researcher building it? Then there is the question: if the researcher is getting what they want, does the institution the researcher sits within get what it wants? The incentives of a research team might not necessarily be aligned with those of the company. Then there is the question of how all of this is aligned with the interests of the broader public, and so on.

Dall-E’s interpretation of the main concentric circles of alignment

The clear direction here is that infrastructure for disentangling multilevel alignment seems inevitable (and necessary). Research into “superalignment” at institutions such as OpenAI, already underway before the board debacle, is getting heavy focus in the community. It will likely lead to tools and best practices that help calibrate AI to human intention even as AI becomes increasingly powerful.

At the coarse-grained societal level, this means broad regulation imposed by politicians who need help finding the Google toolbar. Broad-brushed regulations similar to the EU AI Act are very likely to follow worldwide. Tech companies will get better at aligning models to their loss functions, researchers and alignment advocates at aligning loss functions to human goals, and regulators at aligning the technology to the law. Regulation, self-regulation, and corrective mechanisms are bound to come; their effectiveness is still uncertain.

The AI Internet

A question in VC meetings all around the world is whether a small number of powerful foundation models will end up controlling all intelligence operations in the future, or whether there will be a proliferation of smaller fine-tuned models floating around unmoored from centralized control. My guess is that the answer is both.

Clearly, centralized foundation models perform quite well on generalized questions and use cases, but it will be difficult for foundation model providers to get access to the proprietary datasets housed in companies and institutions that are needed to solve finer-grained, domain-specific problems. Larger models are also constrained by their size and much more difficult to embed in edge devices for common workflows. To address these issues, corporations will likely opt to control their own fine-tuned models. Rather than one model controlling everything, the future is likely to hold many more AI models than today.

The proliferation of AI models to come harkens back to the early proliferation of personal computing devices. The rise of the internet over the last 30 years has taught us a key lesson: things like to be connected. Intelligent models/agents will be no exception to this. 

AI agents, another buzzy term on the rise, are, according to ChatGPT:

Systems or entities that act autonomously in an environment to achieve specific goals or perform certain tasks. 

We are seeing an uptick now in AI agents powered by various models and tasked with specific responsibilities. Perhaps this will come down even to the individual level, where each person has a personal AI completing routine, monotonous tasks for them on a daily basis. Whether this occurs or not, it is only a matter of time before these agents start to connect and communicate with each other. My scheduling assistant AI will need to talk to your scheduling assistant. AI will be social!

My guess is that some type of AI communication protocol will emerge, in which daisy-chaining models of different skills and occupations exponentiates their individual usefulness. These communication protocols are still some way from being established or formalized, but if the history of regular old computation means much, they will not be far away. We are seeing the first GitHub repos showcasing orchestration systems of various models. While still crude, if you squint, you can see a world where this type of “AI internet” integrates into systems and workflows worldwide for everyday users.
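For a flavor of what daisy-chaining might look like, here is a minimal sketch of a toy orchestration pipeline. No such protocol is standardized yet; the agent names and behaviors are purely hypothetical stand-ins for model endpoints.

```python
# A toy sketch of daisy-chained "agents" passing messages down a pipeline.
# Names and behaviors are hypothetical; each stands in for a model endpoint.
from dataclasses import dataclass
from typing import Callable

@dataclass
class Agent:
    name: str
    handle: Callable[[str], str]  # each agent transforms a message

def summarizer(msg: str) -> str:
    return f"summary({msg})"  # stand-in for a call to a summarization model

def scheduler(msg: str) -> str:
    return f"meeting booked from: {msg}"  # stand-in for a scheduling model

def run_pipeline(agents: list[Agent], message: str) -> str:
    # The "protocol" here is simply feeding each agent's output to the next.
    for agent in agents:
        message = agent.handle(message)
    return message

print(run_pipeline(
    [Agent("summarizer", summarizer), Agent("scheduler", scheduler)],
    "long email thread about meeting times",
))
```

A real protocol would add negotiation, authentication, and shared schemas, but this daisy-chain shape is already visible in today’s orchestration repos.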

Paywalling

The early internet provided a cornucopia of free content and usage powered by VC largesse and a mandate of growth at all costs. It took a few years before the paywalls started: on news sites around the world, in walled-off premium features, and in jacked-up Uber rates. After a technology proves its viability, the next logical step tends to be monetization.

For AI, the days of open papers, datasets, and sharing in communities are numbered as the profit engine picks up. We have already seen this in the increasingly, almost comically, vague descriptions OpenAI releases about its models. By the time GPT-5 rolls around, the expected release notes may amount to little more than OpenAI admitting, “we used GPUs for this.” Even non-tech companies are realising that the data they possess has tremendous value and will be much more savvy before letting it loose.

AI is still only a small portion of the economy at the moment, but the generality and unbounded potential described above suggest it can have an absolutely enormous economic impact. Ironically, the value created by the early openness of the technology will result in the end of technological sharing and a more closed mentality.

The last generation of tech growth has been fueled by social media and “attention.” Any barriers to engagement, such as putting a credit card upfront, were discouraged, and the expectation that “everything is free” became commonplace in using many internet services. OpenAI, in contrast, rather than starting with a traditional ad-based approach for monetization, opened up a premium subscription service and is now charging hefty sums for tailored models for corporations. The value of AI technology in its own right obviates the middle step of funding through advertising. Data and intelligence will likely not come for free.

As we shift from an attention economy to an intelligence economy, where automation becomes a core driver of growth, expect the credit cards to start coming out.

Dall-E’s interpretation of the coming AI paywall paving the transition from an attention economy to an intelligence economy

Expect the Unexpected

As with any predictive article, the requisite mealy-mouthed hedge about the unimaginable must be established. In this case, it is also a genuine belief. Even natural extrapolations of AI technology moving forward can leave us in heady disbelief at possible future states. Even much smaller questions, like whether OpenAI itself will survive the year, are extremely difficult to predict.

If you had asked someone 50 years ago about capturing some of the most magnificent imagery in the world, of items big and small, within a device in the palm of your hand, served up in an endless scroll among other wonders, it would have seemed possible and yet inconceivable. Now, we are bored by some of the world’s most magnificent, spectacular images and events. Our demand for stimulating content is being overtaken by supply. Analogously, with AI, we might find ourselves in a world where scientific progress is accelerated beyond our wildest dreams, where we have more answers than questions, and where we cannot even process the set of answers available to us.

Using AI, deep mathematical puzzles like the Riemann Hypothesis may be laid bare as a trivial exercise. Yet, the formulation of interesting questions might be bottlenecked by our own ability and appetite to answer them. A machine to push forward mathematical progress beyond our dreams might seem too much to imagine, but it’s only one of many surreal potential futures. 

If you let yourself daydream of infinite personal assistants, where you have movies of arbitrary storylines created on the fly for individual consumption, where you can have long and insightful conversations with a cast of AI friends, where most manual and cognitive work of the day has completely transformed, you start to realize that it will be difficult to precisely chart out where AI is going. 

There are, of course, both utopian and dystopian branches of these possibilities. The technology is agnostic to moral consequence; only the people using it, and the responsibility they incur, can be considered in these calculations. The only thing to expect is that we won’t expect what’s coming.

Conclusion

Is ChatGPT to AI what the iPhone moment was to the app wave of the early 2010s? Possibly, and that is probably why OpenAI ran a very Apple-like keynote before Sam Altman’s shocking dismissal and return. What is clear is that once ideas have permeated the public consciousness, they cannot be revoked. People understand the potential now. Just three years ago, an AI company struggling to raise a seed round had to compete for attention against crypto companies, payment processors, and fitness software. AI companies today are a hot ticket, with huge expectations baked into this potential.

It was only nine months ago that I wrote about “bridging the gap” to production AI. Amidst all the frenzy around AI, it is easy to forget that most models today are still only at the “POC” (Proof of Concept) stage, not yet having proved sufficient value to be integrated into real-world applications.

ChatGPT showed us a world beyond production, a “post-production” AI, where the broader societal interactions and implications of AI become more of the story than the technological components it is made of. We are now at the dawn of the post-production era.

Where this will go exactly is of course impossible to say. But if you look at the past, and at the present, the themes to watch for are: complexity, multi-modality, connectivity, alignment, commercialization, and surprise. I am certainly ready to be surprised. 

Written by

Eric Landau
