Agile AI at Georgian — Part 5: A Roundup of our Favorite MLOps Tools
Welcome back to Agile AI at Georgian, where I share lessons learned about how to adapt agile methodologies for AI products.
In previous installments, we’ve talked about finding your project’s North Star, motivating your team, mastering experimental design, and managing the data lifecycle. Today, we’re going to dive into the world of MLOps – an area where Georgian’s R&D team has recently been spending a lot of time. To write this post, I spoke with our engineers, data scientists, and product managers about their insights from working on Georgian’s internal AI products and collaborating with our portfolio companies.
So what is MLOps, anyway? As AI becomes more mature, teams are applying DevOps procedures and ideas to make the development and deployment of machine learning products more predictable and efficient. However, there are some key differences between traditional DevOps and MLOps.
As Diego Huang, Software Engineer on Georgian’s engineering team, puts it, “MLOps is still a fairly new field, and people are still figuring out the ‘best practices.’ There is not yet a cookie-cutter template that one can just follow like in the more mature field of DevOps. Therefore, we (engineers) need to understand how ML works and really think about what might break when the rubber hits the road.”
Choosing Your MLOps Tools
Today, there’s a large and growing ecosystem of MLOps tools. Choosing the right ones can be a challenge. Here are some factors that we’ve found are important to consider:
- Who is the user? Faisal Anees, Machine Learning Engineer, notes that it’s important to choose a product with a UI/UX that matches the primary user’s level of technical expertise.
- Cost of support. Kyryl Truskovskyi, Machine Learning Engineer on the Applied Research team, says that cost is one of the biggest concerns around tools that he encounters when working with portfolio companies. While open source tools may appear cheaper, you’ll want to factor in the cost of engineering time and weigh that against paying up front for a managed service. Aakash Goenka, Data Engineer, points out that open source tools can also require a lot of maintenance and carry the risk that community support may not persist a few years down the road.
- Ease of onboarding. As my colleague Will Callaghan, Software Engineer, says, “The quicker that I can onboard a tool and either review the documentation or reach out to someone in the community that has worked on this tool, then the more likely I’m going to get to a point where I can start using it effectively.”
- Support for iterative experimentation. Qaid Damji, Product Manager, notes that tools should ideally support model debugging during both training and production, and should make it easy to monitor and retrain models.
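To make the cost-of-support factor above a bit more concrete, here is a minimal sketch of how you might compare a self-hosted open source tool with a managed service once engineering time is counted as money. Everything in it is an assumption for illustration: the `ToolOption` fields, the `total_cost` helper, the hourly rate, and all of the dollar and hour figures are made up, so plug in your own estimates.

```python
from dataclasses import dataclass


@dataclass
class ToolOption:
    """Hypothetical cost inputs for one tooling option (all figures are made up)."""
    name: str
    license_per_month: float            # subscription or hosting fee, in dollars
    setup_hours: float                  # one-time engineering effort to onboard
    maintenance_hours_per_month: float  # recurring engineering upkeep


def total_cost(option: ToolOption, months: int, hourly_rate: float) -> float:
    """Total cost over `months`, with engineering hours priced at `hourly_rate`."""
    engineering_hours = option.setup_hours + option.maintenance_hours_per_month * months
    return option.license_per_month * months + engineering_hours * hourly_rate


# Illustrative numbers only: cheap to license but heavy on upkeep,
# versus expensive to license but light on upkeep.
open_source = ToolOption("self-hosted open source", 200.0, 80.0, 20.0)
managed = ToolOption("managed service", 1500.0, 10.0, 2.0)

for option in (open_source, managed):
    cost = total_cost(option, months=12, hourly_rate=100.0)
    print(f"{option.name}: ${cost:,.0f}")
```

With these made-up inputs, the managed service comes out cheaper over 12 months because maintenance hours dominate the bill. Different estimates can easily flip the result, which is the point of writing the comparison down.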
While tools can be really helpful in optimizing your experimentation and deployment, don’t overthink it. Azin Asgharin, Applied Research Scientist at Georgian, offers this pragmatic point of view for teams starting out:
If you have already set up tools to automate your process, that’s great! Just use them! If not, do not bother setting up all of these tools at the beginning, as it may take up a lot of your time. Instead go with the quickest and fastest possible solution and start adding these tools to your workflow gradually!
When working with larger or more complex teams that want to scale quickly, on the other hand, it may be worth investing more time in tools up front.
Georgian’s Picks for MLOps Tools
MLFlow | “MLFlow provides an end-to-end ML framework including a central model registry, deployment, experimentation, and reproducibility. I used MLFlow in my projects, and it decreased our time to market quite significantly and handled a bunch of things for us.” – Aakash Goenka, Data Engineer |
Comet | “Comet makes it easier to track performances on different experiments.” – Angeline Yasodhara, Applied Research Scientist |
TensorBoard | “TensorBoard allowed me to systematically track and compare different experiments and continue working on the most promising paths instead of brute-forcing my way through the models and hyper parameters. I could easily track and compare all the custom metrics of my model in one simplified view and find trade-off points.” – MJ Mashhadi, Machine Learning Engineer |
Kubeflow | “A neat tool, since it helps run our models on Kubernetes – and also has good UI features including showcasing our runs, management of experiments.” – Aakash Goenka |
Argo | “The Argo project is a collection of open-source projects that make it easier to do GitOps right on Kubernetes, from CI/CD, to workflows, to deployments and event-driven messaging and is one of the frameworks that powers Kubeflow.” – Will Callaghan, Software Engineer |
Knowledge Repo | “For sharing our findings internally; it’s especially important as our team size grows.” – Aakash Goenka |
SageMaker | “It brings everything about ML experimentation into one place and can easily switch hardware underneath to use GPU instances only when needed to save costs” – Diego Huang, Software Engineer |
Finding Success with MLOps
While choosing the right tools is important, there are other key things to keep in mind for a smooth MLOps process. Here are a few tips from our team:
Process and purpose first. Make sure tools support your process and North Star – not the other way around. As Aakash Goenka explains:
MLOps is a practice, and if just starting out or trying to incorporate an Agile framework, focus on the process side and how it can be leveraged to make the workflow better. Try to bake in user personas, teams and ownership from the get-go and think about upkeep and maintenance of the actual tools. Instead of individuals, focus on the teams when possible – as well as the function from these teams. Using metrics to gauge these and track over time benefits the team as well.
Integrated teams. MLOps engineers emphasize that their work cannot easily be compartmentalized away from data science and experimentation – instead, they need to have a solid understanding of the entire process. Engineers need to be involved in experimental design to check for assumptions that may not be compatible with production. Data scientists and engineers need to work together to account for the fact that while experimentation is often done with a static snapshot, production is dynamic and can change rapidly.
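As a rough illustration of the snapshot-versus-production gap described above, here is a minimal sketch of a check that data scientists and engineers could agree on together: it measures how far a feature’s mean in production has moved from the static training snapshot. The function names, the sample values, and the 0.5 threshold are all assumptions made for this example, and a real monitoring setup would likely use a more robust statistic than a mean shift.

```python
import statistics
from typing import Sequence


def mean_shift_in_std(snapshot: Sequence[float], production: Sequence[float]) -> float:
    """Distance between the production mean and the snapshot mean,
    expressed in units of the snapshot's sample standard deviation."""
    baseline_mean = statistics.mean(snapshot)
    baseline_std = statistics.stdev(snapshot)
    if baseline_std == 0:
        raise ValueError("snapshot has no variance, so the shift is undefined")
    return abs(statistics.mean(production) - baseline_mean) / baseline_std


def has_drifted(snapshot: Sequence[float], production: Sequence[float],
                threshold: float = 0.5) -> bool:
    """Flag drift when the mean moves more than `threshold` standard deviations.
    The default threshold is an arbitrary choice for this sketch."""
    return mean_shift_in_std(snapshot, production) > threshold


# Made-up feature values: the static snapshot used during experimentation,
# and two later batches observed in production.
snapshot = [10.0, 12.0, 11.0, 9.0, 13.0, 10.0, 11.0, 12.0]
production_similar = [11.0, 10.0, 12.0, 11.0]
production_shifted = [15.0, 16.0, 14.0, 17.0]

print(has_drifted(snapshot, production_similar))
print(has_drifted(snapshot, production_shifted))
```

A check like this is deliberately simple. Its value is less in the statistic itself than in forcing the team to state, up front, which experimental assumptions production data is expected to keep satisfying.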
Keeping up with new developments. Because MLOps is a rapidly developing field, staying in touch with the community and monitoring improvements to tools is important. Diego points out that “even if a certain tool is not mature today, it may improve quite a bit in just a short amount of time.”
Agile for AI: Final thoughts
This wraps up our Agile for AI series. While it can take some creativity to adapt agile processes for the unique needs of AI projects, doing so has many benefits: it makes the complexity and uncertainty of AI more manageable, improves team efficiency and satisfaction, and increases predictability for external stakeholders.
Keep sight of your North Star, nurture your team, and use agile processes to break experimentation into manageable parts. Understanding the data lifecycle and MLOps provides the groundwork for realistic, efficient forward progress using an Agile framework.
This blog was originally published on Georgian’s Impact Blog.