We are back with another episode of True ML Talks. In this episode, we again dive deep into MLOps and LLM applications, this time at Twilio, and we are speaking with Pruthvi Shetty.
Pruthvi is a Staff Data Scientist at Twilio. Before that, he led ML at SAP as well as at ZapLabs, a startup that was acquired by Anywhere RE. At Twilio, Pruthvi leads the GenAI efforts, and we'll dive deep into that today.
- ML and GenAI applications and use cases around GTM
- XGPT: Twilio's Powerhouse for Go-to-Market Teams
- Battling OpenAI Rate Limits
- Experimenting with Open-Source LLM
- RFP Genie: Automating RFP Responses
- Workflow for Traditional ML Models
Leveraging AI for Go-To-Market Teams
Twilio has a long history of leveraging machine learning (ML) and data science to optimize its products and services. However, the recent advancements in Generative AI (GenAI) have opened up new opportunities to further enhance the way GTM teams operate.
Traditional ML for GTM
While GenAI is a powerful tool, Twilio has not abandoned its traditional ML roots. The company continues to use ML for various GTM tasks, such as:
- Propensity models: Predict the likelihood of a customer converting into a paying user.
- Cross-sell models: Recommend additional products to existing customers based on their usage data.
- Upsell models: Recommend upgrades to higher tiers of service to existing customers based on their usage data.
- Lead generation models: Identify potential new customers who are likely to be interested in Twilio's products.
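To make the idea of a propensity model concrete, here is a minimal sketch in plain Python. The feature names, the toy data, and the hand-rolled logistic regression are all assumptions made for illustration; the episode does not describe which features or algorithms Twilio actually uses.

```python
import math

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(rows, labels, lr=0.1, epochs=2000):
    """Fit weights and a bias with plain batch gradient descent."""
    n_features = len(rows[0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(rows, labels):
            # Prediction error for this row under the current parameters
            error = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias) - y
            for j, xi in enumerate(x):
                grad_w[j] += error * xi
            grad_b += error
        weights = [w - lr * g / len(rows) for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / len(rows)
    return weights, bias

def propensity(weights, bias, x) -> float:
    """Estimated probability that an account converts."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)

# Hypothetical features: [weekly API usage scaled to 0-1, visited pricing page (0/1)]
ROWS = [[0.1, 0], [0.2, 0], [0.3, 0], [0.7, 1], [0.8, 1], [0.9, 1]]
LABELS = [0, 0, 0, 1, 1, 1]  # 1 = converted into a paying user

WEIGHTS, BIAS = train_logistic(ROWS, LABELS)
```

Calling propensity(WEIGHTS, BIAS, [0.9, 1]) scores a heavy-usage account higher than propensity(WEIGHTS, BIAS, [0.1, 0]). Cross-sell and upsell models follow the same pattern with a different label.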
GenAI for GTM
Twilio recognized the potential of GenAI early on and established a dedicated team to explore its applications. This team has built a suite of GenAI-powered tools specifically for GTM teams, including:
- XGPT: This tool helps GTM teams generate personalized outreach content such as emails, saving significant time and effort. It also answers about 15,000 questions per month, which shows that it can handle a large volume of interactions.
- FlexGPT and SegGPT: Tailored for specific products, these AI models generate comprehensive and accurate documentation for both Flex and Segment, ensuring users have readily available information.
- RFP Genie: This tool takes on the tedious task of answering RFP questions. It answers them with 90% accuracy and reduces completion time from weeks to minutes, freeing up GTM teams for other work.
XGPT: Twilio's Powerhouse for Go-to-Market Teams
XGPT is one of the key tools built by Twilio's dedicated GenAI team, which is led by Pruthvi.
XGPT was developed as a response to two issues with using publicly available GenAI models like ChatGPT:
- Security and Privacy: Public models may be trained on the data that users share with them, which raises security and privacy concerns for Twilio's internal information.
- Limited Customization: Public models cannot incorporate Twilio's specific internal knowledge, such as product release information, sales plays, and competitor positioning.
XGPT tackled these issues by:
- Leveraging Twilio's data: Grounded in internal information like product releases, sales plays, and competitor analysis, XGPT provides insights relevant to specific roles and situations.
- Ensuring data privacy: XGPT utilizes Twilio's private API, ensuring data remains secure and unavailable for external training.
In Pruthvi's words: "We've had it for about four to five months now. Currently, we are answering about 15,000 questions a month, and we've seen a super good lift in the power users of our applications. That's been XGPT so far."
XGPT's Functionality and Impact
XGPT is a secure and customizable platform that:
- Answers questions: It provides answers to user queries based on a vast knowledge base of Twilio's internal and external documents.
- Generates content: It helps users create personalized outreach content and emails based on customer conversations.
- Improves GTM efficiency: It empowers GTM teams with readily available information about Twilio's products, competitors, and sales strategies, leading to increased productivity and improved customer experience.
Technical Architecture of XGPT
XGPT is not just one model, but a suite of products, each tailored for specific GTM roles and needs. These products include FlexGPT, which covers the Flex product, and SegGPT, which covers Segment.
A custom retrieval-augmented generation (RAG) pipeline gathers all relevant information for XGPT, including public and private data. This information comes from various sources, such as content management systems, internal documents, call transcripts, Salesforce notes, and product documentation.
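The retrieval step of such a pipeline can be sketched roughly as follows. This is a toy illustration, not Twilio's pipeline: the bag-of-words "embedding" stands in for a real embedding model and vector store, and the source labels and document texts are hypothetical.

```python
import math
import re
from collections import Counter

def embed(text: str) -> Counter:
    """Toy 'embedding': a bag-of-words term-frequency vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# Hypothetical corpus entries: (source label, text)
DOCS = [
    ("product-docs/flex", "Flex is a programmable contact center platform"),
    ("sales-plays/messaging", "Positioning guidance for messaging deals against competitors"),
    ("release-notes/segment", "Segment release notes covering new data connectors"),
]

# Embeddings are computed once, offline, and stored next to their source label.
INDEX = [(source, text, embed(text)) for source, text in DOCS]

def retrieve(question: str, k: int = 2):
    """Return the k passages most similar to the question, with their sources."""
    q = embed(question)
    ranked = sorted(INDEX, key=lambda item: cosine(q, item[2]), reverse=True)
    return [(source, text) for source, text, _ in ranked[:k]]
```

Keeping the source label next to each embedding is what lets an answer point back to the document it came from.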
Offline embeddings are used for FlexGPT and other applications, created using tools like Space and Chroma. Custom tweaks ensure scalability and control. In addition to text, XGPT also understands audio and visual data through multimodal embeddings. Whisper transcribes product demos, while a vision model extracts information from charts and diagrams. These embeddings are then converted to Face embeddings, allowing XGPT to link them to relevant sources in its answers.
The main LLM processing is handled by the OpenAI API. In specific cases, like RFPs, Llama is used for interpretation. Parallelization and batching strategies optimize processing and help avoid rate limits. An interpretation layer filters and contextualizes questions before feeding them to the LLM. XGPT provides links to the relevant documentation for each answer, allowing users to explore further.
Heroku hosts the applications, ensuring stability and performance. Docker containers enable easy deployment and scalability. Data is securely stored in Postgres. Airtable tracks questions and feedback, which the team uses to keep improving XGPT. CloudWatch monitors metrics for optimal performance.
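As a rough sketch of how an interpretation layer and source links might fit around the LLM call, consider the following. All function names, the prompt wording, and the rejection rule are assumptions for illustration, and `llm` is any callable that maps a prompt string to a response string rather than a specific vendor SDK.

```python
from typing import Callable, Dict, List, Optional, Tuple

def interpret(question: str) -> Optional[str]:
    """Hypothetical interpretation layer: normalize whitespace, reject very short input."""
    cleaned = " ".join(question.split())
    if len(cleaned.split()) < 3:
        return None
    return cleaned

def build_prompt(question: str, passages: List[Tuple[str, str]]) -> str:
    """Number each retrieved passage so the model can cite it as [1], [2], ..."""
    context = "\n".join(f"[{i}] {text}" for i, (_, text) in enumerate(passages, start=1))
    return (
        "Answer using only the numbered context below and cite passages like [1].\n"
        f"Context:\n{context}\n"
        f"Question: {question}"
    )

def answer_with_sources(
    question: str,
    passages: List[Tuple[str, str]],  # (source URL, passage text)
    llm: Callable[[str], str],
) -> Dict[str, object]:
    interpreted = interpret(question)
    if interpreted is None:
        return {"answer": "Could you add more detail to your question?", "sources": []}
    return {
        "answer": llm(build_prompt(interpreted, passages)),
        "sources": [url for url, _ in passages],
    }
```

Returning the source URLs alongside the answer is one simple way to give users a path back to the underlying documentation.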
Future of XGPT and RAG flow
The team is constantly working on improving XGPT and RAG flow. Their vision for the future includes:
- Enhanced RAG flow: This includes simplifying the process of creating and maintaining embeddings for all Twilio documentation.
- Automated Documentation Gap Detection: XGPT can help identify areas where documentation is lacking and suggest additional content to fill the gaps.
- Hallucination Mitigation: The team is exploring new techniques to further reduce the occurrence of hallucinations in XGPT's responses.
Battling OpenAI Rate Limits: Engineering Tricks for a Parallel XGPT
Twilio's XGPT, a powerhouse for go-to-market teams, faced a significant obstacle: OpenAI's rate limits. Answering questions iteratively, the initial version quickly hit these limits. Rotating API keys offered a temporary solution, but OpenAI's organizational rate limit proved more challenging.
To solve this challenge, the team's first step was to follow OpenAI's recommended practices for handling rate limits and parallelizing calls. This provided a solid foundation, but further optimization was needed. Twilio's engineers then batched API calls strategically to stay within OpenAI's limits, carefully grouping questions while maintaining the application's user experience. To further improve efficiency, engineers assigned weights to different tasks. This ensured that critical questions received priority while still allowing less urgent requests to be processed.
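The combination of batching, weighting, bounded parallelism, and retries can be sketched as below. This is a minimal illustration under stated assumptions, not Twilio's implementation: `RateLimitError` is a stand-in for the provider's HTTP 429 error, `call` is a hypothetical async function that answers a whole batch of questions, and the batch size, concurrency, and delays are arbitrary.

```python
import asyncio
import random

class RateLimitError(Exception):
    """Stand-in for the provider's 'too many requests' error."""

async def call_with_backoff(call, payload, max_retries=5, base_delay=0.01):
    """Retry a rate-limited call with exponential backoff plus jitter."""
    for attempt in range(max_retries):
        try:
            return await call(payload)
        except RateLimitError:
            await asyncio.sleep(base_delay * (2 ** attempt) * (1 + random.random()))
    raise RateLimitError(f"gave up after {max_retries} attempts")

def make_batches(requests, batch_size):
    """requests: list of (weight, question). Higher weights land in earlier batches."""
    ordered = sorted(requests, key=lambda r: r[0], reverse=True)
    return [
        [question for _, question in ordered[i:i + batch_size]]
        for i in range(0, len(ordered), batch_size)
    ]

async def process(requests, call, batch_size=2, max_concurrency=2):
    """Send batches in parallel, but never more than max_concurrency at once."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def run(batch):
        async with semaphore:
            return await call_with_backoff(call, batch)

    batches = make_batches(requests, batch_size)
    results = await asyncio.gather(*(run(batch) for batch in batches))
    # gather preserves batch order, so answers come back in priority order
    return [answer for batch_answers in results for answer in batch_answers]
```

Batching reduces the number of requests, the semaphore caps how many are in flight, and the backoff absorbs the occasional rejection instead of failing the user's question.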
Experimenting with Open-Source LLM
While both ChatGPT and Llama are powerful language models, Twilio opted for Llama for the interpretation layer of its XGPT application for a few key reasons:
- Cost-Effectiveness: Llama operates at a significantly lower cost than ChatGPT, making it a more economical choice for a task like interpretation, which requires less complex reasoning and nuance.
- Task Suitability: The first stage of XGPT involves interpreting user questions. This is a task that Llama is well-suited for, as it excels at understanding and translating the meaning of text.
- Avoiding Vendor Lock-in: Twilio wants to avoid relying solely on one vendor for their LLM needs. By using Llama alongside ChatGPT, they have a backup option in case of outages or changes in OpenAI's policies.
By choosing Llama for the first layer of interpretation, Twilio achieved a cost-effective solution that met the task requirements while diversifying their LLM usage and demonstrating their commitment to the open source community.
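A two-stage setup with a fallback provider might look roughly like this. The provider names and the rewrite prompt are hypothetical, and each provider is modeled as a plain callable from prompt to text rather than a real SDK client.

```python
from typing import Callable, List, Tuple

LLM = Callable[[str], str]

def call_with_fallback(prompt: str, providers: List[Tuple[str, LLM]]) -> Tuple[str, str]:
    """Try providers in order; return (provider name, output) from the first that succeeds."""
    errors = []
    for name, llm in providers:
        try:
            return name, llm(prompt)
        except Exception as exc:  # in practice, catch provider-specific errors only
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all providers failed: " + "; ".join(errors))

def answer(
    question: str,
    interpreter_chain: List[Tuple[str, LLM]],
    answerer_chain: List[Tuple[str, LLM]],
) -> Tuple[str, str]:
    """Stage 1: a cheaper model rewrites the question. Stage 2: the main model answers it."""
    _, interpreted = call_with_fallback(
        f"Rewrite as a clear, self-contained question: {question}", interpreter_chain
    )
    return call_with_fallback(interpreted, answerer_chain)
```

Because each stage takes an ordered list of providers, swapping the interpretation model or adding a backup for outages is a configuration change rather than a code change.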
RFP Genie: Automating RFP Responses
RFP Genie is another generative AI tool developed by Twilio's internal team. It automates the process of responding to RFPs, which can be a time-consuming and tedious task for GTM teams. RFP Genie can:
- Extract key information: Automatically extract key information and requirements from RFP documents.
- Generate responses: Generate comprehensive and accurate responses to each RFP question, saving GTM teams countless hours of work.
- Maintain consistency: Ensure all responses are consistent with Twilio's branding and messaging.
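The extract-then-respond flow described above can be sketched as follows. The numbered-line format of the RFP, the regular expression, and the `answer_fn` callable are all assumptions for illustration; real RFP documents vary widely, and the episode does not describe how RFP Genie parses them.

```python
import re
from typing import Callable, Dict, List

# Matches lines that start with an item number such as "1." or "2)"
NUMBERED_ITEM = re.compile(r"^[ \t]*\d+[.)][ \t]+(.*\S)[ \t]*$", re.MULTILINE)

def extract_questions(rfp_text: str) -> List[str]:
    """Pull out lines that look like numbered questions or requirements."""
    return NUMBERED_ITEM.findall(rfp_text)

def draft_responses(rfp_text: str, answer_fn: Callable[[str], str]) -> List[Dict[str, str]]:
    """Draft one response per extracted question, keeping question and draft together for review."""
    return [
        {"question": question, "draft": answer_fn(question)}
        for question in extract_questions(rfp_text)
    ]
```

Here `answer_fn` would be whatever question-answering pipeline is available, and keeping each draft next to its question makes human review straightforward.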
Workflow for Traditional ML Models
Earlier, we briefly touched on the traditional ML models still used for GTM at Twilio, such as propensity and lead generation models.
The Traditional ML Models workflow leverages a powerful combination of tools and technologies:
- Data Storage: Customer data is stored in various databases, including Postgres and Airtable, depending on the specific model.
- Model Training: SageMaker pipelines are used to train the ML models, ensuring scalability and efficiency.
- Data Pipelines and Notebook Management: Abacus provides a user-friendly platform for managing data pipelines and notebooks, simplifying the model development process.
- Deployment: Buildkite ensures that all regulatory compliance requirements are met before models are deployed to production.
TrueFoundry is an ML deployment PaaS over Kubernetes that speeds up developer workflows, giving developers full flexibility in testing and deploying models while ensuring full security and control for the infra team. Through our platform, we enable machine learning teams to deploy and monitor models in 15 minutes with 100% reliability, scalability, and the ability to roll back in seconds. This allows them to save cost and release models to production faster, enabling real business value realization.