Arpeet was part of the engineering team at Salesforce that built the entire ML platform. He is one of the founders of Builders Fund, where he and his colleagues invest in and advise ML/AI companies around the world. He is also the head of infrastructure at Skiff.
- ML Use Cases at Salesforce
- Salesforce ML Team Structure
- Overview of Salesforce ML Infrastructure
- Prototyping ML models at Salesforce
- Managing Costs for Large-Scale ML Projects in the Cloud
- Automated Flow for Moving Models
- Building a Multi-Tenant Real-Time Prediction Service
- Optimization of models for enterprise AI
- Security and Reliability Measures in Salesforce AI Platform
- ML Infrastructure Platform vs Software Deployment Platform
Why Is ML Important to Salesforce?
- Personalized customer experiences → ML enables Salesforce to provide personalized customer experiences, as it allows them to analyze customer data and generate insights to improve customer interactions.
- Automation of marketing campaigns → ML helps Salesforce customers automate their marketing campaigns by analyzing images, text, and social media data, allowing them to focus on their customer personas and optimize their marketing strategy.
- Chatbots for efficient customer support → ML-powered chatbots help businesses automate customer support, which reduces wait times and lowers costs for the business.
- Identifying and mitigating security risks → ML assists Salesforce in identifying and mitigating potential security risks by analyzing data and detecting anomalies.
- Continuously improving products and services → By leveraging ML, Salesforce can continuously improve its products and services by analyzing customer feedback and using that information to develop new features and improvements.
Salesforce ML Team Structure
At Salesforce, the ML team was divided into three teams:
- Research Team: The research team consisted of hundreds of researchers who worked on novel research problems and published research papers.
- Applied science team: The applied science team was responsible for product-facing data science use cases.
- Engineering team: The engineering team was responsible for building the ML platform infrastructure that could support the research and applied science teams.
Overview of Salesforce ML Infrastructure
Salesforce's ML infrastructure was built on a tech stack chosen to provide a scalable and reliable platform. Its most relevant and distinctive characteristics were:
- The infrastructure ran on AWS, with Kubernetes managing all of the compute. Kubernetes made it easy to deploy workloads built with any machine learning framework, such as TensorFlow or PyTorch.
- There was a separation of clusters between the research team, applied science team, and engineering team. This allowed for better management of compute capacity and resources.
- The platform was built around an orchestrator for training, a real-time prediction service, a batch prediction service, and a front-end API for user operations such as authentication and authorization.
- Data was stored in a structured SQL database and an unstructured file store such as S3, and the platform managed the movement of data between the two.
- There was a mix of GPU clusters, depending on the use case. This allowed for efficient use of resources and better performance.
Separating the clusters by team brought three main benefits:
1. Security: Separating clusters reduces the risk of data breaches and unauthorized access to sensitive data. Each team can work in its own environment with the necessary security measures.
2. Data Compliance: Different teams may have different data compliance requirements, which can be met by separating clusters. This ensures that each team is working with data that meets the necessary regulatory requirements.
3. Resource Management: Separating clusters allows teams to have the resources they need to complete their tasks without interfering with the resources of other teams. This ensures efficient use of resources and prevents resource contention.
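Salesforce used fully separate clusters per team. As a lighter-weight illustration of the same resource-management idea, the sketch below generates a Kubernetes Namespace and ResourceQuota manifest for each team, capping how much CPU, memory, and GPU each team can request. The team names and quota values are hypothetical, and note that namespace-level quotas on a shared cluster do not provide the same security isolation as separate clusters.

```python
import json
from typing import Dict, List

# Hypothetical per-team capacity budgets (illustrative, not Salesforce's numbers).
TEAM_BUDGETS: Dict[str, Dict[str, str]] = {
    "research": {"requests.cpu": "512", "requests.memory": "2Ti", "requests.nvidia.com/gpu": "64"},
    "applied-science": {"requests.cpu": "256", "requests.memory": "1Ti", "requests.nvidia.com/gpu": "16"},
    "engineering": {"requests.cpu": "128", "requests.memory": "512Gi", "requests.nvidia.com/gpu": "4"},
}


def team_manifests(team: str, hard: Dict[str, str]) -> List[dict]:
    """Return a Namespace plus a ResourceQuota limiting what the team can request."""
    namespace = {
        "apiVersion": "v1",
        "kind": "Namespace",
        "metadata": {"name": team, "labels": {"team": team}},
    }
    quota = {
        "apiVersion": "v1",
        "kind": "ResourceQuota",
        "metadata": {"name": f"{team}-quota", "namespace": team},
        "spec": {"hard": dict(hard)},  # copy so callers can't mutate the budget table
    }
    return [namespace, quota]


if __name__ == "__main__":
    # Print each manifest as JSON; each one is a standard Kubernetes object.
    for team_name, budget in TEAM_BUDGETS.items():
        for manifest in team_manifests(team_name, budget):
            print(json.dumps(manifest))
```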
Prototyping at Salesforce: An Opinionated Approach
At Salesforce, the prototyping framework was built around Jupyter Notebooks, allowing data scientists to run short-term experiments interactively. Those experiments could then be transitioned into long-running jobs on a large-scale cluster that reported metrics in real time as they ran.
The training and experimentation SDK was built to abstract the complexity of scheduling jobs, pulling and pushing data, and system dependencies. Data scientists could call a Python API or function to take care of these tasks, and track experiment progress, metrics, logs, and more in the workbench dashboard.
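To make that abstraction concrete, here is a minimal sketch of what such an SDK facade might look like. `Workbench`, `ExperimentConfig`, the job-spec fields, and the `s3://bucket/...` path are all hypothetical names, not Salesforce's actual API, and the remote scheduler is replaced by a local function call so the example runs on its own.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict

Metrics = Dict[str, float]


@dataclass
class ExperimentConfig:
    name: str
    dataset_uri: str  # hypothetical: a real platform would resolve and stage this data
    train_fn: Callable[[Dict[str, float]], Metrics]
    params: Dict[str, float] = field(default_factory=dict)
    gpus: int = 0


class Workbench:
    """Hypothetical facade hiding scheduling, data movement, and experiment tracking."""

    def __init__(self) -> None:
        self._runs: Dict[str, dict] = {}

    def _job_spec(self, cfg: ExperimentConfig) -> dict:
        # What a real platform might hand to its scheduler (e.g. a Kubernetes Job).
        return {"name": cfg.name, "input": cfg.dataset_uri, "gpus": cfg.gpus, "params": cfg.params}

    def submit(self, cfg: ExperimentConfig) -> str:
        run_id = f"{cfg.name}-{len(self._runs) + 1}"
        spec = self._job_spec(cfg)
        # Stand-in for remote execution: run the training function in-process.
        metrics = cfg.train_fn(cfg.params)
        self._runs[run_id] = {"spec": spec, "metrics": metrics, "status": "succeeded"}
        return run_id

    def status(self, run_id: str) -> str:
        return self._runs[run_id]["status"]

    def metrics(self, run_id: str) -> Metrics:
        return self._runs[run_id]["metrics"]


def toy_train(params: Dict[str, float]) -> Metrics:
    # Placeholder "training": the loss simply shrinks as the learning rate grows.
    return {"loss": 1.0 / (1.0 + params.get("lr", 0.1) * 10)}


wb = Workbench()
run = wb.submit(ExperimentConfig("churn-baseline", "s3://bucket/churn/train.parquet", toy_train, {"lr": 0.1}))
print(run, wb.status(run), wb.metrics(run))
```

The data scientist only writes `toy_train` and a config; everything about where and how the job runs stays behind `submit`, which is the trade-off an opinionated SDK makes.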
The framework was opinionated, providing an abstracted solution, but still allowing for some flexibility in how data scientists chose to use the platform. However, it was not a completely freeform-style experiment, and there were internal guidelines and standards to follow.
When hosting Jupyter Notebooks at large scale on sensitive data, the major challenge is access control through approval workflows: data scientists must obtain approval from a designated person, such as a manager, before they can access the data. The notebook environment is ephemeral and is destroyed once the experiments are complete, but all generated artifacts are persisted. Authentication is API-driven and integrated with internal systems.
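The lifecycle described above (check approval, provision a throwaway environment, persist artifacts, tear down) can be sketched as a Python context manager. The in-memory `APPROVALS` table, the directory-based artifact store, the dataset name, and `ephemeral_workspace` are hypothetical stand-ins for the internal, API-driven systems the text mentions.

```python
import shutil
import tempfile
from contextlib import contextmanager
from pathlib import Path
from typing import Iterator, Set, Tuple

# Hypothetical approval records: (user, dataset) pairs signed off by a data owner.
APPROVALS: Set[Tuple[str, str]] = {("alice", "support-cases")}


@contextmanager
def ephemeral_workspace(user: str, dataset: str, artifact_store: Path) -> Iterator[Path]:
    """Yield a throwaway workspace; persist its artifacts and delete it on exit."""
    if (user, dataset) not in APPROVALS:
        raise PermissionError(f"{user} has no approval for dataset {dataset!r}")
    workdir = Path(tempfile.mkdtemp(prefix=f"nb-{user}-"))
    (workdir / "artifacts").mkdir()
    try:
        yield workdir
    finally:
        # Copy files written to artifacts/ to durable storage, then destroy the workspace.
        dest = artifact_store / user / dataset
        dest.mkdir(parents=True, exist_ok=True)
        for item in (workdir / "artifacts").iterdir():
            if item.is_file():
                shutil.copy2(item, dest / item.name)
        shutil.rmtree(workdir)


store = Path(tempfile.mkdtemp(prefix="artifact-store-"))
with ephemeral_workspace("alice", "support-cases", store) as ws:
    (ws / "artifacts" / "model.txt").write_text("weights")
    scratch = ws
print((store / "alice" / "support-cases" / "model.txt").read_text())
print(scratch.exists())
```

Putting the persistence step in `finally` means artifacts are saved and the environment is torn down even if the experiment raises an error.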
How to Manage Costs for Large-Scale Machine Learning Projects in the Cloud
Large-scale machine learning projects can quickly become costly, especially when utilizing GPU resources in the cloud. In order to manage costs during the prototyping phase, there are a few strategies that can be employed.
- Reserved Capacity: If you know how much capacity you need, you can reserve it in advance and get a discount on pricing. This works well if you have a good idea of what your resource requirements will be in the long run.
- Auto-Scaling: If you're not sure how much capacity you'll need or if your resource requirements fluctuate, auto-scaling can help. By automatically scaling resources up or down based on demand, you can avoid paying for unused capacity.
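The choice between the two strategies comes down to simple arithmetic. A reservation billed at an effective hourly rate r costs the same whether or not it is used, while on-demand capacity at rate o is paid only for the fraction u of hours actually needed, so reserving pays off when u > r / o. For auto-scaling, the sketch uses the proportional target-tracking rule documented for the Kubernetes Horizontal Pod Autoscaler, desired = ceil(current × currentMetric / targetMetric), clamped to a min/max, while omitting the HPA's tolerance band and stabilization windows. All prices and limits here are hypothetical.

```python
import math
from typing import Dict


def reserved_break_even(reserved_rate: float, on_demand_rate: float) -> float:
    """Utilization (0..1) above which a reservation is cheaper than on-demand."""
    return reserved_rate / on_demand_rate


def monthly_cost(hours_needed: float, reserved_rate: float, on_demand_rate: float,
                 hours_in_month: float = 730.0) -> Dict[str, float]:
    return {
        "reserved": reserved_rate * hours_in_month,  # paid for every hour, used or not
        "on_demand": on_demand_rate * hours_needed,  # paid only for hours actually used
    }


def desired_replicas(current: int, current_metric: float, target_metric: float,
                     min_replicas: int = 1, max_replicas: int = 20) -> int:
    """HPA-style target tracking: scale in proportion to load, then clamp."""
    desired = math.ceil(current * current_metric / target_metric)
    return max(min_replicas, min(max_replicas, desired))


# Hypothetical GPU pricing: $3.00/h on demand vs. $1.80/h effective when reserved.
print(reserved_break_even(1.80, 3.00))
print(monthly_cost(300, 1.80, 3.00))
print(desired_replicas(4, current_metric=90, target_metric=60))
```

With these made-up prices, a GPU that is busy for less than about 60% of the month is cheaper on demand, which is why reservations suit steady, predictable workloads and auto-scaling suits bursty ones.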
Other strategies, such as using spot instances, can reduce costs further, but they often require significant engineering effort and may not be practical for long-running jobs, which can be interrupted when the cloud provider reclaims spot capacity. In addition, spot GPU capacity is not always available in every region.
By utilizing reserved capacity and auto-scaling, you can effectively manage costs while still having the resources you need for your machine learning projects. These strategies continue to be relevant today and can be applied to any public cloud provider.
Automated Flow for Moving Models
Salesforce's promotion flow for moving models from one environment to another relied on a golden dataset for every domain. Data scientists evaluated a model on these golden datasets and on randomized datasets to assess how well it performed on different types of data, which helped them decide whether to promote it to a higher environment.
The promotion process ran through the workbench, but it was intentionally kept partly manual to ensure that a model performed above a certain threshold on n+1 types of datasets. This was challenging because Salesforce is a multi-tenant system in which every customer has its own datasets: Salesforce built hundreds of thousands of models, each specific to a customer and dataset, so the team automated the process as much as possible.
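A promotion gate of this kind can be sketched as: score the candidate on the golden dataset plus several random samples from a wider pool, and mark it eligible only if every score clears the threshold. The toy binary task, the accuracy metric, the 0.9 threshold, and all function names are hypothetical. Because the text says the process was intentionally kept partly manual, the sketch only reports eligibility and leaves the final promotion decision to a human.

```python
import random
from typing import Callable, Dict, List, Sequence, Tuple

Example = Tuple[float, int]  # (feature, label) for a toy binary task
Model = Callable[[float], int]


def accuracy(model: Model, data: Sequence[Example]) -> float:
    return sum(model(x) == y for x, y in data) / len(data)


def promotion_report(model: Model, golden: Sequence[Example], pool: List[Example],
                     threshold: float = 0.9, n_random: int = 3,
                     sample_size: int = 50, seed: int = 0) -> Dict[str, object]:
    """Score a candidate on the golden set plus random samples from a wider pool."""
    rng = random.Random(seed)
    scores = {"golden": accuracy(model, golden)}
    for i in range(n_random):
        scores[f"random_{i}"] = accuracy(model, rng.sample(pool, sample_size))
    # Passing every check only makes the model eligible; a human still approves promotion.
    return {"scores": scores, "eligible": all(s >= threshold for s in scores.values())}


data_rng = random.Random(42)
data = [(x, int(x > 0.5)) for x in (data_rng.random() for _ in range(500))]
golden, pool = data[:100], data[100:]


def good_model(x: float) -> int:
    return int(x > 0.5)


def always_positive(x: float) -> int:
    return 1


print(promotion_report(good_model, golden, pool)["eligible"])
print(promotion_report(always_positive, golden, pool)["eligible"])
```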
Overall, the promotion flow at Salesforce was designed to ensure that models were thoroughly evaluated and performed well on diverse datasets before being promoted to higher environments.
Building a Multi-Tenant Real-Time Prediction Service for Complex Models
Building a multi-tenant real-time prediction service is a complex task that involves serving a large number of models with different sizes and architectures in real-time while meeting specific SLA requirements. To address this challenge, the engineering team at Salesforce developed a serving layer that underwent several iterations.
Initially, the team relied on a structured database for metadata and a file store for model artifacts. However, this approach was not scalable for larger and more complex models. To solve this, they sharded their clusters based on the complexity of the model and the type of compute required. For instance, smaller models ran on CPUs, while larger models needed GPUs. Clusters were dedicated to specific types of models, such as NLP models, LSTM models, transformer models, image classification models, object detection models, and OCR models.
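Sharding by model family and compute type amounts to a routing table that maps each model to a cluster or node group. In the sketch below, the cluster names, the family labels, and the 100M-parameter GPU cutoff are hypothetical; a real system would likely also consider latency SLAs and memory footprint.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelMeta:
    model_id: str
    family: str  # e.g. "transformer", "ocr", "object_detection"
    num_params: int


# Hypothetical family -> cluster-prefix mapping mirroring the sharding in the text.
FAMILY_CLUSTERS = {
    "nlp": "nlp-general",
    "lstm": "nlp-seq",
    "transformer": "nlp-transformer",
    "image_classification": "vision-cls",
    "object_detection": "vision-det",
    "ocr": "vision-ocr",
}
GPU_PARAM_THRESHOLD = 100_000_000  # hypothetical size above which a model needs a GPU


def route(meta: ModelMeta) -> str:
    """Pick the cluster/node group that should host this model."""
    prefix = FAMILY_CLUSTERS.get(meta.family, "general")
    compute = "gpu" if meta.num_params >= GPU_PARAM_THRESHOLD else "cpu"
    return f"{prefix}-{compute}"


print(route(ModelMeta("intent-clf", "transformer", 340_000_000)))
print(route(ModelMeta("next-step", "lstm", 5_000_000)))
```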
The team also developed a layer that orchestrated service deployment across different clusters and node groups, and implemented caching so that frequently requested models had lower latencies. Initially, data and research scientists were allowed to use their preferred framework, which made it hard to serve models uniformly, so the team narrowed the supported frameworks down to one or two and optimized the models for them.
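The text does not say which caching policy was used; the sketch below assumes a least-recently-used (LRU) policy, a common choice for keeping hot models in memory. `ModelCache` and the string "models" returned by the loader are hypothetical; in practice the loader would fetch an artifact from the file store and deserialize it.

```python
from collections import OrderedDict
from typing import Callable, Generic, TypeVar

M = TypeVar("M")


class ModelCache(Generic[M]):
    """Keep the most recently used models in memory; evict the least recently used."""

    def __init__(self, capacity: int, loader: Callable[[str], M]) -> None:
        self.capacity = capacity
        self.loader = loader  # slow path, e.g. download + deserialize an artifact
        self._models: "OrderedDict[str, M]" = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, model_id: str) -> M:
        if model_id in self._models:
            self.hits += 1
            self._models.move_to_end(model_id)  # mark as most recently used
            return self._models[model_id]
        self.misses += 1
        model = self.loader(model_id)
        self._models[model_id] = model
        if len(self._models) > self.capacity:
            self._models.popitem(last=False)  # evict the least recently used model
        return model


cache: ModelCache[str] = ModelCache(capacity=2, loader=lambda mid: f"<model {mid}>")
for mid in ["a", "b", "a", "c", "b"]:
    cache.get(mid)
print(cache.hits, cache.misses)
```

Frequently requested models stay resident and skip the load entirely, which is where the latency win comes from; rarely used models pay the load cost on each cold request.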
Finally, the team converted the models into a uniform format regardless of the original training framework, allowing them to optimize the serving code for each type of model. Overall, the team's efforts resulted in a scalable, efficient, and reliable real-time prediction service.
The real-time inference was my favorite thing to work on. And I think, by the end of it, we were also able to file a patent on it. So it was a great engineering feature that we added to the platform. It was actually the most used feature. We were doing double-digit millions of predictions per day, so it was very, very satisfying to see it getting used by so many customers.
Optimization of models for enterprise AI
The team benchmarked models heavily and aimed to stay within the widely supported operators and operations of a framework so that models could be converted easily. Models with custom operators were high-friction to convert and required a hands-on approach, but the team found that 95% of use cases could be solved with off-the-shelf models that did not require novel techniques. This let them optimize for the majority of use cases and spend time on the remaining 5% of models, which were less widely used.
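One way to apply the "stay within widely supported operators" rule is to triage each model by comparing the operators it uses against the set the serving runtime converts cleanly. The sketch below uses ONNX-style operator names, but `SUPPORTED_OPS`, the model names, and `MyCustomAttention` are hypothetical examples, not an actual support matrix.

```python
from typing import Dict, Iterable, List, Set

# Hypothetical set of operators the serving runtime is known to convert cleanly.
SUPPORTED_OPS: Set[str] = {"Conv", "Relu", "MatMul", "Add", "Softmax", "LayerNormalization", "Gather"}


def conversion_path(model_ops: Iterable[str]) -> Dict[str, object]:
    """Classify a model as 'standard' (all ops supported) or 'custom' (needs hands-on work)."""
    unsupported = sorted(set(model_ops) - SUPPORTED_OPS)
    return {"path": "standard" if not unsupported else "custom", "unsupported_ops": unsupported}


def triage(models: Dict[str, List[str]]) -> Dict[str, str]:
    return {name: str(conversion_path(ops)["path"]) for name, ops in models.items()}


models = {
    "image-clf": ["Conv", "Relu", "Add", "Softmax"],
    "text-clf": ["Gather", "MatMul", "LayerNormalization", "Softmax"],
    "research-net": ["MatMul", "MyCustomAttention"],
}
print(triage(models))
print(conversion_path(models["research-net"])["unsupported_ops"])
```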
Arpeet also noted that tools such as ONNX and NVIDIA's Triton Inference Server have made significant strides in standardizing model formats and benchmarking, which makes them valuable for large-scale real-time inference use cases.
Security and Reliability Measures in Salesforce AI Platform
- Approval workflows: Before a model was deployed to production, approval workflows governed access to its datasets to protect data privacy.
- Security Isolation: The production environment was completely isolated and met compliance standards such as HIPAA and ISO certification.
- Multi-tenancy: Authentication and authorization covered every tenant, ensuring that each customer could access only its own data.
- Redundancy and High Availability: Sufficient redundancy and high availability measures were built into the platform to ensure reliability.
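The multi-tenancy rule above boils down to one check on every request: the caller's tenant must own the model it is calling. A minimal sketch, assuming the `Principal` has already been derived from an authenticated token; `MODEL_OWNERS`, the tenant names, and the placeholder prediction are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Principal:
    tenant_id: str  # assumed to come from an already-verified auth token
    user_id: str


# Hypothetical registry: every deployed model is owned by exactly one tenant.
MODEL_OWNERS: Dict[str, str] = {"lead-score-v3": "acme", "case-routing-v1": "globex"}


class TenantAccessError(PermissionError):
    pass


def authorize_prediction(principal: Principal, model_id: str) -> None:
    owner = MODEL_OWNERS.get(model_id)
    # Report unknown and foreign models the same way so tenants cannot probe for model IDs.
    if owner is None or owner != principal.tenant_id:
        raise TenantAccessError(f"model {model_id!r} not found")


def predict(principal: Principal, model_id: str, features: Dict[str, float]) -> float:
    authorize_prediction(principal, model_id)
    return sum(features.values())  # placeholder for the real model call


print(predict(Principal("acme", "u1"), "lead-score-v3", {"a": 1.0, "b": 2.0}))
```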
ML Infrastructure Platform vs Software Deployment Platform
Machine learning infrastructure platforms and software deployment platforms have a lot in common, according to the discussion between Anuraag and Arpeet. Here are the key takeaways:
- Similarities: Both require a data layer, a compute layer, an orchestration layer, an auth service and API gateway, and a backend service. On standard infrastructure, the backend service may run analytics or data engineering workloads, while on ML infrastructure it runs ML workloads.
- Differences: The ML infrastructure may have a data lake or an ML lake for an ML workload, while standard infrastructure may not require that. The ML infrastructure may also use specialized orchestration systems.
- Tooling: The tooling in both infrastructure platforms is different. ML infrastructure may require specialized tools for orchestration, while standard infrastructure may use simpler ones. However, the deployment tool for both infrastructure platforms is the same in most cases, especially when deploying on a Kubernetes cluster.
Overall, there is no significant difference between machine learning infrastructure platforms and software deployment platforms, except for the nature of the workload and the tooling required for orchestration.
Additional Thoughts from Arpeet
MLOps: Build vs Buy
- For midsize companies and startups building ML infrastructure, use available alternatives that provide many features out of the box.
- For early-stage startups, use simple Python scripts, train on one machine with multiple GPUs, and implement some form of orchestration to save costs.
- For established ML workflows in small to midsize companies, use open-source tools directly, but consider using an out-of-the-box solution like TrueFoundry.
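For the early-stage setup described above, "some form of orchestration" can be as small as a runner that executes steps in dependency order and retries transient failures. The sketch below uses the standard library's `graphlib.TopologicalSorter` (Python 3.9+); the step names, the simulated failure, and the retry policy are hypothetical, and a team could later swap in an off-the-shelf orchestrator without changing the steps themselves.

```python
from graphlib import TopologicalSorter
from typing import Callable, Dict, List, Set


def run_pipeline(steps: Dict[str, Callable[[], None]],
                 deps: Dict[str, Set[str]], max_retries: int = 2) -> List[str]:
    """Run steps in dependency order, retrying each failed step up to max_retries times."""
    order = list(TopologicalSorter(deps).static_order())
    for name in order:
        for attempt in range(max_retries + 1):
            try:
                steps[name]()
                break
            except Exception:
                if attempt == max_retries:
                    raise  # give up after the last retry
    return order


log: List[str] = []
calls = {"train": 0}


def train() -> None:
    calls["train"] += 1
    if calls["train"] == 1:
        raise RuntimeError("transient GPU error")  # simulated: fails once, then succeeds
    log.append("train")


steps = {"prepare": lambda: log.append("prepare"), "train": train,
         "evaluate": lambda: log.append("evaluate")}
deps = {"train": {"prepare"}, "evaluate": {"train"}}  # node -> set of prerequisites
order = run_pipeline(steps, deps)
print(order, log)
```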
Advice for ML Engineers
I think I would say focus on a niche at this point. Operationalizing large-scale AI workflows, for example, is probably going to be one of the next difficult challenges.
TrueFoundry is an ML deployment PaaS over Kubernetes that speeds up developer workflows, giving developers full flexibility in testing and deploying models while ensuring full security and control for the infra team. Through our platform, we enable machine learning teams to deploy and monitor models in 15 minutes with 100% reliability, scalability, and the ability to roll back in seconds, allowing them to save costs, release models to production faster, and realise real business value.