Daniel Song September 5, 2023
Generative AI can do some pretty astounding things – so astounding that it can sometimes feel magical. While building an AI-assisted cloud migration platform for enterprise applications, we’ve discovered more and more of these “magical” properties (and can understand why “magic” analogies have become so popular in the larger AI conversation).
But just as importantly, we have realized that even the best, most sophisticated magic tricks still need a magician: a person in control of the process, who can apply the expertise and creativity needed to delight the audience and refine the performance over time.
In the context of AI, that magician is the “human in the loop” (or HITL). And in our cloud migration experience, the trick was generating working Terraform code that reflected the architecture of our original cloud application with up to 90% accuracy.
In this blog post, we’ll share what we discovered about HITL while building our cloud migration platform, and explain why humans are indispensable when using GenAI to power a cloud migration.
HITL: Beyond Prompt Engineering
We focused on a cooperative GenAI model, which involves a “back and forth” protocol where generative AI and the user work collaboratively toward a goal. In our previous post, we explained how crafting high-quality prompts dramatically improved our results when automating the “translation” stage of our tool’s user flow.
While prompt engineering is guided by humans (think of the prompt as the “spell” in the magic analogy), technically it isn’t HITL. In our recent experience, HITL was more about humans monitoring, validating, and improving the overall model. Lessons learned during HITL can certainly influence prompt engineering, but HITL is much more than that. Check out this high-level diagram of the tool’s user flow:
HITL comes into play during what we called the “feedback loop” stage. Since our LLM-based cloud migration platform is in the early stages, we knew that we wanted to add a means for a human engineer to provide feedback along the way, identifying flaws in the Terraform code it created and encouraging the LLM to reexamine and improve it. This critical design decision paved the way to LLM-generated Terraform code that reflected the architecture we expected to see post-deployment with up to 90% accuracy.
How did we get to this point? We found that our “Terraform apply” step was prone to failure due to hallucinations in the LLM-generated code. At first we asked the LLM to address the issues itself, but even after repeated iterations, it sometimes still fell short of the results we wanted. This led us to add a “copilot mode” that incorporated HITL principles to boost our results.
We found that our best results came when we leaned into the relative strengths of both the technology and the engineers cooperating with it. In essence, we followed the principles of Moravec’s Paradox: computational challenges that are difficult or tedious for humans are a snap for an LLM, while things that come easily to an experienced engineer can easily confound or break the AI model.
We observed three main ways in which humans make the magic happen: contextual understanding, problem-solving, and teaching.
Humans Possess Contextual Understanding
Human experts possess a deep understanding of their organization’s unique requirements, business processes, and goals – context that is difficult to express fully in a prompt, leaving the LLM without the information it needs.
Here’s a great real-world example. Our AI-assisted cloud migration platform had a critical security requirement: we needed to use a Cloud SQL Proxy to configure secure database access for our WordPress application. Google highly recommends this approach when executing application workloads that utilize a database on Google Cloud.
Experienced human engineers would intuitively account for this requirement in a cloud migration. Unfortunately, the GPT-4-32k model we were working with at the time lacked this contextual understanding. Despite multiple prompts, it generated Terraform code that assumed the Compute Engine instance hosting our WordPress application could access the database without a Cloud SQL Proxy. Left unchecked, that assumption produces an application that cannot start because it cannot reach its database, leaving it in a broken state.
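To illustrate the kind of correction a human engineer makes here, below is a minimal Terraform sketch of the fix: attaching a service account with the Cloud SQL Client role to the VM and starting the Cloud SQL Auth Proxy at boot, so WordPress connects through the proxy rather than assuming direct database access. Resource names, variables, and the proxy version are hypothetical illustrations, not taken from our actual codebase.

```hcl
# Hypothetical sketch – names and versions are illustrative, not from our platform.

# Service account the WordPress VM runs as.
resource "google_service_account" "wordpress" {
  account_id   = "wordpress-vm"
  display_name = "WordPress VM service account"
}

# Grant the Cloud SQL Client role so the proxy can authenticate to the instance.
resource "google_project_iam_member" "sql_client" {
  project = var.project_id
  role    = "roles/cloudsql.client"
  member  = "serviceAccount:${google_service_account.wordpress.email}"
}

resource "google_compute_instance" "wordpress" {
  name         = "wordpress-vm"
  machine_type = "e2-medium"
  zone         = "us-central1-a"

  boot_disk {
    initialize_params {
      image = "debian-cloud/debian-12"
    }
  }

  network_interface {
    network = "default"
  }

  service_account {
    email  = google_service_account.wordpress.email
    scopes = ["cloud-platform"]
  }

  # Start the Cloud SQL Auth Proxy so WordPress reaches the database via
  # 127.0.0.1:3306 instead of assuming direct connectivity.
  metadata_startup_script = <<-EOT
    curl -o /usr/local/bin/cloud-sql-proxy \
      https://storage.googleapis.com/cloud-sql-connectors/cloud-sql-proxy/v2.8.0/cloud-sql-proxy.linux.amd64
    chmod +x /usr/local/bin/cloud-sql-proxy
    /usr/local/bin/cloud-sql-proxy --port 3306 ${var.cloudsql_connection_name} &
  EOT
}
```

The key human contribution is knowing that the proxy, IAM role, and connection name all have to exist and line up – requirements the model never surfaced on its own.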
In the future, LLMs might be sophisticated enough to avoid this issue, but in our experience so far, the technology isn’t there yet. It still takes a human’s expertise, working in tandem with the LLM-based tool, to recognize a problem like this, identify the solution, and feed that correction back to further train the model.
WHY HITL MATTERS: Not only was a human able to use their wisdom to recognize a bad assumption before it became a problem, but they were able to adjust the cloud migration tooling to learn from the mistake and prevent it from happening again in the future.
Humans Are Problem Solvers
LLMs excel at processing vast amounts of data and providing insights, but they may struggle with ambiguous or incomplete information. Humans, on the other hand, are uniquely skilled at connecting the dots and extrapolating viable conclusions in the absence of concrete instructions.
Cloud migrations are inherently complex, and must take into account a number of moving parts such as network connectivity, performance bottlenecks, enterprise policy requirements, or security vulnerabilities that are not necessarily well-defined or documented in a text format like code. This leads to a significant level of ambiguity that is difficult for the current generation of LLMs to handle on their own – especially since these details are not publicly accessible for the LLM training process.
For example, when we applied the Terraform code produced by the LLM in our WordPress application use case, several challenges emerged. In the first iteration, no firewall rules were configured at all, which made our application inaccessible and caused it to fail the load balancer health check. Even when we prompted the LLM to add firewall rules allowing HTTP access, it would often default to a source range of “0.0.0.0/0” – fully open to the public internet. That does make the application reachable, but if our WordPress application must be accessible only internally, security requirements and policies forbid a rule like this.
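A human engineer’s fix in this situation can be sketched as follows: scope the firewall rule to the internal network plus Google’s published load balancer health-check ranges, instead of 0.0.0.0/0. The rule name, network, and internal CIDR here are hypothetical examples, though the two health-check ranges are the ones Google documents.

```hcl
# Hypothetical sketch of the human-corrected rule – scoped source ranges
# replace the LLM's default of 0.0.0.0/0.
resource "google_compute_firewall" "allow_http_internal" {
  name    = "allow-http-internal"
  network = "default"

  allow {
    protocol = "tcp"
    ports    = ["80"]
  }

  source_ranges = [
    "10.0.0.0/8",     # internal clients only (example CIDR)
    "130.211.0.0/22", # Google Cloud load balancer health checks
    "35.191.0.0/16",  # Google Cloud load balancer health checks
  ]

  target_tags = ["wordpress"]
}
```

The health check still passes because Google’s probe ranges are allowed, while the application stays closed to the public internet.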
With this and other, more nebulous areas in mind, human engineers can feed security and access-policy corrections back to the LLM, reducing the likelihood of similar errors down the line. While this could potentially be avoided by adding security policies to the LLM’s context from the start, at this point it is still critical to have a human in the loop to ensure the generated code complies with policies the LLM doesn’t yet fully understand.
WHY HITL MATTERS: Human experts can handle uncertainties, make subjective judgments, and adapt the migration approach as needed.
Humans Are Natural Teachers
Cloud migrations are iterative processes that require continuous learning and improvement; that’s not something that AI can do on its own (yet). Human experts can analyze outcomes, evaluate the effectiveness of the migration, and identify areas for optimization. By incorporating their experience and feedback into the migration platform, it can be refined over time to become more accurate, efficient, and aligned with real-world scenarios.
Since our tool is still in the early stages, we haven’t yet reached the point where we have outcomes to analyze in depth and optimize performance over time. However, given what we’ve learned – and our team’s extensive domain knowledge of cloud migrations – we predict that myriad teaching opportunities will emerge in real-world migrations. For example, imagine what a human engineer could teach the model based on performance metrics, or stakeholder feedback. Or emerging technologies: as human engineers discover new solutions, that knowledge can be used to train the model to be more efficient or leverage previously-unavailable capabilities.
WHY HITL MATTERS: LLMs need humans to continuously assess their performance and adjust training in response to outcomes. Without this feedback loop, AI won’t learn effectively, and the ongoing benefits of the AI-assisted migration will diminish rather than increase in value over time.
Magic Still Needs a Magician (For Now)
You might be reading this and thinking: “But isn’t the value of AI to reduce human intervention?” It is.
Through automation and other capabilities, a cooperative GenAI model can dramatically reduce the number of engineers and/or the work they need to complete during a cloud migration. It enables you to automate the costly, repetitive, tedious tasks that complicate your cloud migration, freeing your engineers to focus on the things that humans do so well: solving problems and innovating. Not to mention, humans can transfer their knowledge to the AI over time, continuously training it to understand new “tricks” and creating a virtuous cycle of improvement.
If you want to leverage LLMs for complicated tasks such as cloud migrations, we highly recommend incorporating HITL as part of your overall strategy and generative AI tooling design. In our experience, it’s how you achieve the best outcome in these early days of the AI revolution: combining the power of LLMs and human engineer expertise.
It’s not hard to foresee a future where cloud migrations can be completed in weeks, days, even minutes, by a specialized “enabling team” that can handle the migrations of tens, if not hundreds, of applications. And if that’s not magical, what is?
This is the fourth post in our Supercharging Cloud Migrations with GenAI series. Interested in learning more? Check out our previous insights: Tame Your Cloud Migration Anxiety with Generative AI, Exclusive Preview: An AI-Powered Cloud Migration Platform, and Prompt Your Way to a Successful Cloud Migration.