
Driving efficiencies in your AI process

This article first appeared on the Georgian Impact Blog on Medium.

by Akshay Budhkar and Parinaz Sobhani

Photo by Chris Ried on Unsplash

As a team, you and your machine learning scientists and engineers are focused on driving efficiencies for your customers: automating processes, improving the product's usability, and providing insights to your users.

However, now more than ever, you will have to do so as efficiently as possible yourselves. Here we make some suggestions on how to use the latest technologies to run your team as lean as possible while creating a differentiated machine learning product at your startup.

Minimize spend on cloud infrastructure

The costs of running AI can rack up. High costs, in turn, eat into gross margins. Cloud providers can be pricey, to say the least, but there are ways to manage those costs.

Start with pre-trained models

Most startups lack the vast resources of the tech giants, so pre-trained models are a great option. The open nature of AI research also helps lower training costs as we all build on existing resources instead of reinventing the wheel.

If you correctly use transfer learning, there’s no need to create a model to learn everything from scratch. BERT language models, for example, already know enough about the English language to provide a good foundation for language-based tasks.

As an aside, transfer learning can also help you to deliver value from machine learning products to your new customers before they have amassed enough data. This means you can roll out to new customers or new geographies without waiting to collect enough labeled data to create a model that performs well. The result is that you can improve on-boarding times and reduce time to value for new customers. Win for you, win for your customers!
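To make the idea concrete, here's a minimal sketch of transfer learning in plain Python: a "pre-trained" encoder stands in for something like BERT and stays frozen, while we train only a small task-specific head on top. The encoder, toy data, and learning rate below are all illustrative, not from any real model.

```python
import math

# Illustrative stand-in for a pre-trained encoder (think BERT embeddings):
# a fixed feature map whose parameters we never update.
def pretrained_encoder(x):
    return [math.tanh(2.0 * x), math.tanh(-1.5 * x + 0.5)]

# Task-specific head: the only part we train (this is the transfer).
w = [0.0, 0.0]
b = 0.0

def predict(x):
    f = pretrained_encoder(x)
    z = w[0] * f[0] + w[1] * f[1] + b
    return 1.0 / (1.0 + math.exp(-z))  # sigmoid probability

# Toy labeled data for the downstream task: positive when x > 0.
data = [(x / 10.0, 1 if x > 0 else 0) for x in range(-50, 51) if x != 0]

lr = 0.5
for _ in range(200):  # SGD on the head only; the encoder stays frozen
    for x, y in data:
        p = predict(x)
        f = pretrained_encoder(x)
        g = p - y  # gradient of the log-loss with respect to the logit
        w[0] -= lr * g * f[0]
        w[1] -= lr * g * f[1]
        b -= lr * g

accuracy = sum((predict(x) > 0.5) == (y == 1) for x, y in data) / len(data)
print(f"head-only accuracy: {accuracy:.2f}")
```

Because the frozen features already separate the classes, a tiny head gets there quickly. That, in miniature, is the economic argument for starting from pre-trained models.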

Other techniques also help manage training costs. With correct use, advances in distributed training and mixed-precision training, for example, reduce time to value. Advances in machine learning (ML) model architecture have reduced training costs, too.
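The arithmetic behind mixed precision is straightforward. As a back-of-the-envelope sketch (the parameter count below is a rough BERT-base figure, and real mixed-precision training also keeps an FP32 master copy of the weights), halving the bits roughly halves the memory for weights:

```python
# Rough memory footprint of model weights at different precisions.
def weight_bytes(n_params, bits_per_param):
    return n_params * bits_per_param // 8

n = 110_000_000  # roughly BERT-base's parameter count
fp32 = weight_bytes(n, 32)
fp16 = weight_bytes(n, 16)
print(f"FP32: {fp32 / 1e9:.2f} GB, FP16: {fp16 / 1e9:.2f} GB")
```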

Costs quickly escalate when businesses aren’t sure which problems to solve with ML, then build custom models for each problem they’re tackling. To help with this, certain AutoML tools can help determine the technical feasibility of ML solutions and enable faster iterations. This was one of the main reasons we built Foreshadow, our open-source toolkit for automatically generating machine learning pipelines.

Another alternative to eye-watering cloud services bills? Invest in hardware to reduce costs. It’s a capital investment, but it might be worth it in the long run, depending on your needs. Hat tip to Scott Locklin and many in Y Combinator’s community and on Reddit for pointing this out.

Remember, too, that people will always cost more than hardware; with the right strategy, the investment can make your AI cost-effective overall.

Reduce inference costs with optimizations where possible

Though model inference generally costs only a fraction of what research and training do, it’s worth finding ways to minimize these costs too. You can, for example, use lower-precision optimizations such as FP16, which essentially halves the memory and compute per prediction. Pruning a neural network to focus on the most essential nodes, for example, also helps optimize models, reducing compute time and associated costs.
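Magnitude pruning, for instance, can be sketched in a few lines: keep the largest-magnitude weights and zero out the rest. The toy layer below is invented; real pruning operates on full tensors and usually retrains the network afterwards.

```python
def magnitude_prune(weights, keep_fraction):
    """Zero out all but the largest-magnitude weights (toy magnitude pruning)."""
    k = max(1, int(len(weights) * keep_fraction))
    threshold = sorted(abs(w) for w in weights)[-k]
    return [w if abs(w) >= threshold else 0.0 for w in weights]

layer = [0.01, -0.8, 0.3, -0.05, 0.6, 0.02, -0.4, 0.002]
pruned = magnitude_prune(layer, 0.5)
print(pruned)
```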

As well, many models have distilled variants, such as BERT’s DistilBERT, TinyBERT, and BERT-of-Theseus. These variants essentially learn to predict the output of the original model using a fraction of the compute. In other words, a fraction of the cost.
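The core of that distillation idea fits in a few lines: the student is trained to match the teacher's temperature-softened output distribution rather than hard labels. The logits and temperature below are illustrative only.

```python
import math

def softmax(logits, temperature=1.0):
    exps = [math.exp(l / temperature) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """Cross-entropy of the student's soft predictions against the teacher's
    temperature-softened targets: the heart of knowledge distillation."""
    t = softmax(teacher_logits, temperature)
    s = softmax(student_logits, temperature)
    return -sum(ti * math.log(si) for ti, si in zip(t, s))

teacher = [3.0, 1.0, 0.2]
good_student = [2.9, 1.1, 0.3]   # closely mimics the teacher
bad_student = [0.2, 1.0, 3.0]    # disagrees with the teacher
print(distillation_loss(teacher, good_student) <
      distillation_loss(teacher, bad_student))  # True
```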

Plus, even cloud providers like Amazon are helping reduce cloud computing costs. AWS recently launched Inf1 instances to accelerate inference, dropping costs drastically, and they work with models trained in standard frameworks such as TensorFlow (TF) and PyTorch.

Using these current inference tools, inference doesn’t need to be a major drain on resources.

Look for data processing optimization opportunities

Data inputs are getting bigger and more complex, with large image, audio, and video files. Datasets may also contain a heap of extra noise, burying the few small, relevant snippets.

There are new techniques that can process these large data files more efficiently. The Reformer architecture, for example, processed the entire text of Crime and Punishment on one graphics processing unit (GPU) during a single round of training. In fact, it can handle up to a million words using only 16GB of memory. That’s not bad at all.

Researchers are constantly tweaking their data processing techniques. Text tokenization, for example, is now much faster thanks to HuggingFace’s Rust-based tokenizers library, which maximizes core usage. Once tokenized, models use just numbers to represent the original data. From there, researchers can harness FP16 and other tools to build the most efficient models.
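A toy tokenizer makes the "models only see numbers" point concrete. Real subword tokenizers like HuggingFace's are far more sophisticated, but the interface is the same: text in, integer IDs out. Everything below is invented for illustration.

```python
# Toy whole-word tokenizer: build a vocabulary, then map text to IDs.
def build_vocab(corpus):
    vocab = {"[UNK]": 0}  # reserve ID 0 for out-of-vocabulary words
    for word in corpus.lower().split():
        vocab.setdefault(word, len(vocab))
    return vocab

def encode(text, vocab):
    return [vocab.get(w, vocab["[UNK]"]) for w in text.lower().split()]

vocab = build_vocab("the model sees only numbers")
ids = encode("the numbers the model sees", vocab)
print(ids)
```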

Plan for scale

Companies may need to scale their models globally to improve reliability, latency, and compliance, ballooning cloud costs as they transfer models across regions.

In reality, there are good tools and techniques to manage scaling costs.

We recommend our companies use a dedicated AI production team to handle global scaling. Tools such as Algorithmia manage some scaling, reducing the burden of scaling instances up and down on demand. Cloud services such as AWS, GCP, and Azure are actually fairly intuitive when distributing models across different regions for reliability, latency and compliance, so it’s not as bad as it seems.

Although it’s true that scaling costs are not easily avoided, these costs are definitely manageable with the right approach.

Look for the fastest path to maximum performance

Cleaning and labeling large datasets is laborious. Not only that, the process never ends as the model demands a constant stream of new training data.

Although having humans in the loop can help you reach maximum performance faster, most models perform reasonably well out of the box. You can start with an existing model and then use active learning to fine-tune it for your downstream task. Existing techniques can handle incorrectly labeled data, filtering out low-quality labels systematically, without the need for major human intervention.
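The simplest flavor of active learning, uncertainty sampling, is easy to sketch: send the examples the model is least sure about to your human labelers. The probabilities below come from a made-up model, and `budget` is just the number of labels you can afford.

```python
def uncertainty_sample(unlabeled, predict_proba, budget):
    """Pick the `budget` examples the model is least sure about
    (predicted probability closest to 0.5) for human labeling."""
    scored = sorted(unlabeled, key=lambda x: abs(predict_proba(x) - 0.5))
    return scored[:budget]

# Hypothetical model outputs: confident far from zero, uncertain near it.
probs = {-2.0: 0.02, -0.5: 0.35, 0.1: 0.52, 0.4: 0.62, 2.5: 0.99}
picked = uncertainty_sample(list(probs), probs.get, budget=2)
print(picked)  # the two examples nearest the decision boundary
```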

In our experience, many problems are not actually as hard as they seem at first. AutoML tools or pre-trained models often handle most of the initial work without any tuning at all. Your humans can then make sure models are picking up the correct signal.

Synthetic (artificial) data generation and weak supervision can also help your team to tackle data scarcity without the costly, time-consuming process of labeling large datasets.

Synthetic data generation creates artificial input and outcome data; weak supervision generates noisy, limited, or imprecise labels, producing weak signals for labeling large amounts of training data in a supervised learning setting. Both innovations make much more training data available sooner.
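Weak supervision can be sketched with a few hand-written labeling functions voting on each example; systems like Snorkel go further and learn to weight the functions, but a majority vote shows the idea. The labeling functions and example texts below are invented for illustration (imagine tagging support tickets as complaints).

```python
# Three noisy, hand-written labeling functions (1 = complaint, 0 = not).
def lf_mentions_refund(text):   return 1 if "refund" in text.lower() else 0
def lf_has_exclamation(text):   return 1 if "!" in text else 0
def lf_short_message(text):     return 1 if len(text) < 20 else 0

LABELING_FUNCTIONS = [lf_mentions_refund, lf_has_exclamation, lf_short_message]

def weak_label(text):
    """Majority vote over the labeling functions produces a weak label."""
    votes = sum(lf(text) for lf in LABELING_FUNCTIONS)
    return 1 if votes > len(LABELING_FUNCTIONS) / 2 else 0

print(weak_label("I want a refund now!"))  # 1 (two of three functions vote yes)
print(weak_label("Thanks for the quick reply, everything works fine."))  # 0
```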

Solve for scale through expectation-setting

End-users often overestimate what an AI model can accomplish then cram in the wrong type of data. Edge cases can also make scaling AI systems a bumpy ride. But trying to scale to meet these edge cases gets costly quickly.

The simple solution to users entering the wrong data? Manage their expectations. Right from the first sales conversation, let them know what they can expect. Don’t oversell the product’s capabilities and your customers won’t be disappointed with poor results.

ML model outcomes are probabilities. Keep this in mind when making deterministic decisions, and don’t expect 100% accuracy from any ML product. Instead, ask your end-users what performance level is good enough, then let the system continuously improve through an ongoing feedback loop.
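One simple way to turn those probabilities into deterministic decisions is a threshold with an "abstain" band that routes uncertain cases to a human. The thresholds below are placeholders your end-users would help you set.

```python
def decide(prob, accept_at=0.9, reject_at=0.1):
    """Map a model probability to a deterministic action, abstaining
    (routing to human review) when the model is uncertain."""
    if prob >= accept_at:
        return "auto-approve"
    if prob <= reject_at:
        return "auto-reject"
    return "human review"

print(decide(0.97))  # auto-approve
print(decide(0.55))  # human review
print(decide(0.03))  # auto-reject
```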

Understand edge cases will happen and accept “good enough”. Communicate with your customers to both manage their expectations and discover their needs.

Layer your IP on top of open-source to create a unique offering

Some of the techniques we’re suggesting here, such as open-source models and customer-owned or public data, make technical differentiation difficult. To differentiate, you need either proprietary data or proprietary labels.

Proprietary data is not just private data. It can be a smart combination of public and private data applied to your task. Although everyone has access to internet data from sources like Common Crawl, you can select the core data most helpful for downstream tasks. Differentiate yourself by choosing the best way to augment data, for example.
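Data augmentation is one such lever. As a toy sketch (the synonym table is invented; real pipelines might use back-translation or embedding-based replacement), you can multiply labeled examples by swapping in synonyms:

```python
import random

random.seed(1)  # reproducible for the example

# Illustrative synonym table; a real pipeline would use a richer source.
SYNONYMS = {"quick": ["fast", "rapid"], "happy": ["glad", "pleased"]}

def augment(sentence):
    """Produce a label-preserving variant by random synonym substitution."""
    out = []
    for word in sentence.split():
        choices = SYNONYMS.get(word)
        out.append(random.choice(choices) if choices else word)
    return " ".join(out)

print(augment("the quick delivery made me happy"))
```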

Proprietary data labels can propel companies ahead of their competitors, especially in a flywheel-like setting where growth and improvements accelerate. Replicating this is difficult and can give the original company a competitive advantage.


While the techniques we’ve mentioned are not widely adopted, they can help your team to become more efficient as you build ML products to help your customers do the same. We’d love to hear about other technologies that you’re using at AI startups to address these issues and what other issues you’re encountering.
