Google Gemini is definitely having a moment, starting with its debut in December and culminating in two blockbuster announcements in February: first, the surprise rebranding of Bard alongside the release of Gemini Ultra 1.0, followed just one week later by the dramatic reveal of Gemini 1.5 Pro, with its staggering 1M-token context window. Google is clearly doing everything it can to clear the AI field – but is it enough?

We’ve been checking out these features and reassessing the landscape based on what we’ve learned. In this post, I’ll share some early insights on whether this string of announcements changes the leaderboard, how we plan to use Gemini right now, and where we think Google will go from here.

A Crowded Field In Search Of Breakthroughs

As our co-founder Rakesh Malhotra pointed out in his predictions for 2024, AI is entering its “marathon” era. Back in late 2022, the release of ChatGPT was the catalyst for the AI race as we know it today. Since then, there’s been a surge of AI and large language model (LLM) innovation, with a proliferation of open- and closed-source models like OpenAI’s GPT-4, Amazon Q, Meta’s Llama, Anthropic’s Claude, and (of course) Google Gemini. Benchmark scores reveal steady improvements, many of them remarkable, but none has matched the advent of ChatGPT in sheer influence. Breaking out of a crowded field of comparably powerful tools requires not just better tech, but better integration and better access. And that seems to be Google’s strategy.

Google Gemini: Bigger, Better, and Faster

Multi-modality (the ability for AI to interact with and reason over data beyond plain text, such as images and documents) has become the latest, buzziest trend in AI. When Google announced Gemini back in December 2023, one of its biggest selling points was “native multi-modality.” Despite some debate around the maturity of the tech at the time, the latest releases of Gemini – especially Gemini 1.5 Pro – claim to have cracked the code on capabilities like video-to-text and audio understanding.

This functionality is supported by a massive increase in the context window. Gemini 1.5 Pro now boasts a 1 million token context window – meaning you could ingest all three books in the Lord of the Rings trilogy plus The Hobbit and still have room to spare – making Gemini the clear leader in the space for the moment. (Though it’s worth noting that context window advantages are usually short-lived, and we expect other providers to catch up to Gemini’s window sizes soon.)
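As a quick back-of-the-envelope check – using approximate published word counts and a rough ~1.3 tokens-per-word heuristic for English prose, not an exact tokenizer – the math works out something like this:

```python
# Rough estimate: does the Lord of the Rings trilogy plus The Hobbit
# fit in a 1M-token context window? Word counts are approximate
# published figures; tokens-per-word is a common rule-of-thumb ratio
# for English text, not the output of Gemini's actual tokenizer.
WORD_COUNTS = {
    "The Fellowship of the Ring": 187_000,
    "The Two Towers": 156_000,
    "The Return of the King": 137_000,
    "The Hobbit": 95_000,
}
TOKENS_PER_WORD = 1.3  # rough heuristic for English prose

def estimated_tokens(word_counts: dict[str, int]) -> int:
    """Estimate total tokens from per-book word counts."""
    return round(sum(word_counts.values()) * TOKENS_PER_WORD)

total = estimated_tokens(WORD_COUNTS)
print(f"Estimated tokens: {total:,}")            # Estimated tokens: 747,500
print(f"Fits in 1M window: {total < 1_000_000}")  # Fits in 1M window: True
```

Roughly 750k tokens for all four books – comfortably under the 1M ceiling, with a quarter of the window left over.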

Larger context windows also mean fewer round trips to the model, since developers no longer need to split and manage large input texts themselves. This reduces response times and makes for a faster experience for end users overall.
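To illustrate the round-trip savings, here’s a minimal sketch. The window sizes, the characters-to-tokens heuristic, and the chunk_text helper are all illustrative assumptions, not a real SDK API:

```python
# Sketch: why a larger context window means fewer model calls.
def chunk_text(text: str, window_tokens: int,
               tokens_per_char: float = 0.25) -> list[str]:
    """Split text into pieces that each fit the model's context window,
    using a crude characters-to-tokens ratio."""
    chars_per_chunk = int(window_tokens / tokens_per_char)
    return [text[i:i + chars_per_chunk]
            for i in range(0, len(text), chars_per_chunk)]

# A very large input: 2M characters, ~500k tokens at 0.25 tokens/char.
document = "x" * 2_000_000

# With a 32k-token window, the document must be split into many pieces,
# each needing its own round trip (plus logic to merge the results).
small_window_calls = len(chunk_text(document, 32_000))    # 16 calls

# With a 1M-token window, the whole document fits in a single call.
large_window_calls = len(chunk_text(document, 1_000_000))  # 1 call
```

Every eliminated call removes network latency, retry handling, and result-merging logic from the pipeline – which is where the end-user speedup comes from.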

Better Integration and Access: The Android Factor

Making your tech ubiquitous is just as important as (if not more important than) having the “best” tech, and clearly Google knows it. Their strategy to replace Google Assistant with Gemini on Android puts the platform front and center on over 2 billion active devices worldwide. By embedding AI into commonly used applications like email, calendars, and text messages, Gemini has the potential to become an indispensable personal assistant that understands and responds to requests in a more dynamic and context-aware manner than ever before.

When talking about integration and access, it’s important to distinguish between customer-facing LLMs like ChatGPT or Gemini and backend solutions like Google’s Vertex AI. Just as Gemini has a potentially massive impact on the customer-facing domain, the success of tools like GitHub Copilot illustrates the importance of seamless integration for engineers building the applications that run on consumer devices. Copilot’s integration with IDEs (Integrated Development Environments) through plugins has made it indispensable for many developers. Such ease of use and integration are likely to play an important role in Gemini’s adoption.

This is underscored by Gemini Nano, a model that runs locally on Android Pixel 8 phones, pushing AI inference to the edge with much lower latency. A truly offline, GPT-3.5-like model not only drastically reduces response latencies, it also unlocks a world of new use cases on small devices. Without a round trip to a machine in the cloud for inference, a short text can be generated, summarized, refined, or translated locally while the device stays offline or in airplane mode. Despite the weaker computing power of handheld devices, the user experience will dramatically improve thanks to lower response times.

But Android’s only one piece of the ubiquity puzzle. Consider the reach of Google Workspace as well: millions of users rely on the platform in their workplaces every day, and the integrations on the horizon will only further entrench Gemini in the fabric of everyday life.

Our Take on Google Gemini, So Far

At Nuvalence, we’ve had the opportunity to test the Gemini assistant on Android phones and its LLM capabilities on Google Cloud Platform (GCP). Our findings indicate that Gemini’s native multi-modality and integration capabilities are on par with currently available state-of-the-art models such as GPT-4 with Vision (GPT-4V). Of course, this is very use-case-dependent; prompt engineering remains an important part of the process.

Since many of our clients use Google Cloud Platform to power their business, we’re watching Gemini (and other Google AI technologies) very closely and are constantly thinking about impactful use cases, like:

  • Local document parsing on a mobile device with Gemini Nano.
  • Local conversational chatbots that do not require an internet connection, and can operate in air-gapped systems.
  • Visual analysis of receipts, schedules, food ingredients, and other tabular data.

Advantage: Google Gemini

So, given all this news, has Google claimed the lead? And more importantly, will they sustain it? My bet’s yes on both counts.

In the near-term, it’s hard to beat a massive context window and Android ubiquity. But that could change at any moment. Case in point: on the same day that Gemini 1.5 dropped, OpenAI announced Sora, their own powerful multimodal text-to-video model. And we’re still waiting to see what AWS, Meta, and Apple have in the works next.

The more interesting play is the longer-term one. Gemini is a big strategic win for Google. Think of it this way: they have long held a monopoly on search, with “I’m going to Google that” part of nearly everyone’s vocabulary. Over the last year, as OpenAI dominated the AI conversation, “Let me ChatGPT that” started to gain some traction. But if the Gemini integration succeeds (and I think it will), we will see Google reassert their leading position – not necessarily by virtue of technical superiority alone, but through sheer ubiquity.

The competition is fierce and the technology is moving fast, but at the end of the day, what matters most is access to the most powerful, capable AI available. Whether or not Google clears the field entirely, we’ll be using Gemini for sure.

Could Google Gemini give you a competitive edge? Our workshop offers the knowledge, tools, and strategic insight that business leaders need to make the right AI choices.