Jason Brenier, our VP of Product and Innovation, recently moderated a panel discussion on the latest trends in NLP at BotTO with Justina Petraityte, Data Scientist and Developer Advocate for Conversational AI at Rasa; Cameron Schuler, Chief Commercialization Officer and VP, Industry Innovation, at the Vector Institute; Gordon Gibson, Machine Learning Lead at Ada; and Mangirdas Adomaitis, Data Scientist at Eddy Travels. The discussion covered design and automation in chatbots, trust, and hiring for conversational teams. Here are four key takeaways from the discussion.
Takeaway #1: Balancing effective design with automation is still a challenge.
As NLP technology gets better, with advances in representation learning, dialogue management, and NLG, are we moving away from conversational design as a skillset? Or will we still need to balance the art and science of conversation to ensure the best possible user experience?
The panelists agreed that technology and design have to be balanced to give the best possible user experience. “Both should go hand-in-hand when you develop,” said Justina. You can achieve this by, for example, making use of UI and design elements like buttons to help automate while guiding the flow of the conversation.
Gordon said that you have to find the right balance between efficiency and control when you’re deciding how much automation you want. You could work towards a fully automated system, but then you lose control over the content of the bot. It’s important for conversational designers/developers to have a certain level of control over the persona in order to build effective conversational experiences that can build trust with users.
Takeaway #2: Trust challenges are emerging as conversational interfaces become more prevalent.
At Georgian, we believe Trust is fundamental to the success of any technology company today. Jason asked the panelists about their biggest fears around the use of conversational technology.
Some broad themes emerged around reliability, security, privacy, and fairness in an interesting discussion on trust and conversational interfaces.
With chatbots, there is a memory paradox when it comes to reliability. Despite being able to maintain a perfect record of every user interaction, chatbots are still incapable of relating information from a few turns earlier to the current conversation. This is frustrating for users who expect bots to be able to connect related pieces of information within a single conversation, or even to pick up where they left off in previous conversations. “From a reliability perspective, often a bot will forget the intent from two turns back in the conversation. It uses none of the information it collects, but takes on all the security risk,” Jason said. When users are frustrated by reliability issues, it can quickly break trust with a product, especially when it is compared to the service received from a human rep that the bot replaces.
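To make that "memory paradox" concrete, here is a minimal, hypothetical sketch of the kind of dialogue state a bot would need in order to recall an intent from two turns back or reuse a slot the user already provided. The intent names, slots, and `DialogueState` class are invented for illustration and don't reflect any panelist's product.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class DialogueState:
    """Minimal per-conversation memory: keeps intents and slots across turns."""
    intent_history: list = field(default_factory=list)
    slots: dict = field(default_factory=dict)

    def update(self, intent: str, slots: dict) -> None:
        self.intent_history.append(intent)
        self.slots.update(slots)  # new values overwrite; older ones persist

    def last_intent(self, turns_back: int = 1) -> Optional[str]:
        # Recall an intent from earlier in the conversation instead of
        # discarding it, e.g. the intent from two turns back.
        if 1 <= turns_back <= len(self.intent_history):
            return self.intent_history[-turns_back]
        return None

# A user books a flight, then asks a follow-up without repeating the city.
state = DialogueState()
state.update("book_flight", {"destination": "Toronto"})
state.update("ask_weather", {})

# The bot can still resolve the follow-up using remembered context:
print(state.last_intent(2), state.slots["destination"])  # book_flight Toronto
```

Even this toy version shows the trade-off Jason described: the moment a bot retains state like this, it also takes on the responsibility of storing user-provided information securely.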
Discussing privacy, the panelists agreed that developers need to think carefully about the relationship between personalization based on private or sensitive data and the value delivered to the customer. If the memory problem could be overcome, there are opportunities to use information from previous conversations to personalize the experience.
However, there are inherent risks with conversational interfaces, as people tend to overshare when they are in conversational mode, often saying things without realizing it, or later forgetting what they have shared.
This needs to be handled with care by bot developers. They need to take responsibility and proactively act with the user’s best interest in mind so that privacy isn’t compromised. Gordon raised the interesting case of Amazon data being used by law enforcement. Many of us now have smart speakers in our homes that are recording our conversations and would not expect to have the recordings used in legal matters.
Another concern is that these ubiquitous voice interfaces may be encoding societal biases. Often they perform less reliably when faced with linguistic variations such as accents and dialects, because these are underrepresented in training datasets. Language inherently encodes something about the user, the way you speak, what you say. “If models take action or speak to you differently based on how you speak, there’s definitely room for some unfair technology to be developed,” said Gordon.
To go deeper on trust and understand how it can impact all areas of your business, read our Principles of Trust.
Takeaway #3: Advances in NLP technology only go so far – beware of technical pitfalls.
Are there pitfalls that come along with using pre-trained models? And if so, how might we be able to address them?
More and more tools are leveraging pre-trained models in their NLP stack. If you go to the leaderboard for the SQuAD question-answering dataset, the top 20 entries by score are all some variation on BERT. “People are taking it, ensembling it with other models, fine-tuning it, tweaking it, and all these different variations are being successful,” said Mangirdas.
As powerful as the developments in BERT and some of the other NLP platforms have been, NLP is still a highly data dependent problem. If you are looking to build an NLP tool, it makes a lot of sense to use a tool that has a lot of language understanding encoded in it. “But you also want to fine-tune that model on your data set or in some way customize it to your domain,” said Gordon.
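As a toy illustration of the pattern Gordon describes (reuse a pre-trained encoder, then train only a small task-specific head on your own labelled data), here is a minimal NumPy sketch. The random projection standing in for the frozen encoder and the synthetic labelled data are assumptions made purely for the example; in practice the encoder would be something like BERT.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for a frozen pre-trained encoder: a fixed projection mapping raw
# inputs to features. In a real stack this would be a model like BERT.
W_pretrained = rng.normal(size=(16, 8))

def encode(x):
    return np.tanh(x @ W_pretrained)  # frozen: never updated below

# Toy labelled in-domain data (e.g. utterances with binary labels).
X = rng.normal(size=(200, 16))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(float)

# Task head: the only part we train, i.e. the "customize to your domain" step.
w = np.zeros(8)
b = 0.0
lr = 0.2

def loss():
    p = 1 / (1 + np.exp(-(encode(X) @ w + b)))
    return -np.mean(y * np.log(p + 1e-9) + (1 - y) * np.log(1 - p + 1e-9))

initial = loss()
for _ in range(300):
    h = encode(X)
    p = 1 / (1 + np.exp(-(h @ w + b)))
    grad = p - y                      # gradient of logistic loss
    w -= lr * h.T @ grad / len(y)
    b -= lr * grad.mean()
final = loss()
print(f"loss: {initial:.3f} -> {final:.3f}")
```

The point of the sketch is the division of labour: the general language understanding lives in the frozen encoder, while the cheap-to-train head adapts it to the domain.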
On the other hand, taking a pre-trained model and applying it to your own use case may sound quite trivial, but it is actually a difficult problem in and of itself. “It’s hard to take these highly trained models and then retrain them. You are bringing a big dependency into your system,” said Mangirdas.
Another pitfall identified by both Rasa and Eddy Travels was training on more data than you actually need. “It’s better to develop better models instead of just getting more data. What really matters is to have enough data, but good data,” Justina said. “People think they can just get some conversations and all of a sudden they will have a bot. But you have to make sure that your data is good quality.” So if you’re working with a generic dataset, chances are you’ll need to purge some of the examples, because they add more noise to your model than value.
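A minimal sketch of that kind of data pruning might look like the following. The utterances, intent labels, toy in-domain vocabulary, and filtering thresholds are all invented for illustration; they don't reflect Rasa's or Eddy Travels' actual pipelines.

```python
# Hypothetical raw intent-labelled utterances from a generic dataset.
raw = [
    ("book a flight to paris", "book_flight"),
    ("book a flight to paris", "book_flight"),  # exact duplicate
    ("ok", "book_flight"),                      # too short to be informative
    ("flight to tokyo please", "book_flight"),
    ("what's the weather", "get_weather"),
    ("asdjkl qwerty", "get_weather"),           # gibberish: no vocab overlap
]

# Toy in-domain vocabulary used to spot off-domain or garbled text.
VOCAB = {"book", "a", "flight", "to", "please",
         "what's", "the", "weather", "paris", "tokyo"}

def keep(text: str, seen: set) -> bool:
    tokens = text.split()
    if text in seen:            # drop exact duplicates
        return False
    if len(tokens) < 2:         # drop degenerate one-word examples
        return False
    overlap = sum(t in VOCAB for t in tokens) / len(tokens)
    return overlap >= 0.5       # drop likely gibberish / off-domain text

seen = set()
cleaned = []
for text, label in raw:
    if keep(text, seen):
        cleaned.append((text, label))
    seen.add(text)

print(len(raw), "->", len(cleaned))  # 6 -> 3
```

Real pipelines use richer signals (label agreement, embedding outliers, language detection), but the principle is the same: fewer, cleaner examples often beat a larger, noisier dataset.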
Takeaway #4: Your first hire depends on your objectives.
With these challenges, it’s important to clearly understand your requirements (e.g. technical, functional, and trust-related) from the beginning so that as you make decisions about design and architecture, they are the right ones. With that in mind, who is the most important team member to hire first?
How conversational teams prioritize their hires depends on their objectives. If you’re starting very small, a generalist may be a good hire, so that they can do a little bit of everything from design to implementation. If the goal is to build a product using off-the-shelf tools like Rasa and Dialogflow, then any kind of developer would be able to do much of the work.
If, however, you are looking to build something from scratch and push the field forward, trying innovative things, then perhaps a machine learning engineer would make a better first hire because they will be able to build your system and algorithms while also identifying new approaches.
At Ada, the team has seen great value in hiring cross-functional teams with a designer, product manager, front-end developer, and a machine learning engineer. In this scenario, each contributor is able to bring their own perspective and skills to bear on the development of a conversational AI product. Having a product manager and designer has helped to research and prioritize the user experience in their products.
For more information on considerations when building out your conversational AI team, see our ebook: Building Conversational AI Teams.
Thanks to Eddy Travels for organizing a great event – we’re looking forward to the next one.