MLOPS

MLops负责机器学习模型的自动化: CI/CD/CT，pipeline的orchestration和automation

feature pipelines, training pipelines, inference pipelines
data/model version: DVC
feature store: feast
model version: MLFlow
feature caching, data sharding, real-time feature aggregation and serving

1. 模型部署

low latency

high qps

throughput

1.1 strategies

Shadow deployment strategy
A/B testing
Multi Armed Bandit
Blue-green deployment strategy
Canary deployment strategy

1.2 pipelines

feature processing
- batch serving: Apache Hive or Spark
- real-time serving: Kafka, Flink, Spark Streaming
Training pipeline
- Scheduled Triggering: Apache Airflow, Kubeflow Pipelines
- Event-Driven Triggering: AWS Lambda or Azure Functions can be set up to monitor certain metrics and trigger the training pipeline
Inference pipeline
- Batch Inference: Airflow or Kubernetes CronJobs
  - Batch Inference at Scale with Amazon SageMaker
- Real-Time Inference: Kafka, Flink, or an HTTP-based API (TensorFlow Serving, TorchServe)

1.2 工具

tf-serving
- 支持热部署，不会使服务失效
- TF-Serving 默认使用系统的内存分配器（如 glibc 的 malloc）。通过结合 TCMalloc，可以提升高并发场景下部署性能
onnxruntime
- I/O Binding: copy the data onto the GPU
kubeflow
Flyte
MLFlow
flask / fastapi / sanic
- 压力测试 jmeter
模型
- an end-to-end set
- a confidence test set
- a performance metric
- its range of acceptable values
Recovery
量化
高性能
- C++重写inference，配上模型加速措施(剪枝，蒸馏，量化)，高并发请求
LLM推理
- GEMV 是大模型中的核心操作，其耗时主要源于巨大的计算量、频繁调用和硬件瓶颈
- attention: flash attention, paged attention
- MOE
- vllm
  - paged attention/ continue batching
- fast-transformer
gpu多实例部署

2. 模型压缩

蒸馏
- 如何设计合适的学生模型和损失函数
量化
- 减少每个参数和激活的位数（如32位浮点数转换为8位整数)，来压缩模型的大小和加速模型的运算
低秩分解近似
剪枝 pruning

3. retrain

develop a strategy to trigger model invalidations and retrain models when performance degrades.

because of data drift, model bias, and explainability divergence

什么时候触发新的训练？

amount of additional data becomes available
model’s performance is degrading

4. Monitoring

模型性能: 准确性指标，延迟和吞吐性能
数据：drift
系统：资源使用情况
日志

5. 问答

模型部署后，怎么检测模型流量: 日志记录
如何将决策树模型部署在1000台机器上
- 模型序列化: JSON、Pickle 或 Protobuf
- 微服务架构
  - Flask / FastAPI: 轻量级服务
  - gRPC：高效的远程过程调用框架，适合需要高性能和低延迟的场景
  - Kubernetes: 大规模管理微服务实例
- 容器化服务
- 使用 Kubernetes 进行管理
- 负载均衡和流量管理
- 监控和日志管理
  - Prometheus 和 Grafana 监控微服务的性能指标
  - Elasticsearch、Fluentd 和 Kibana (EFK, 分别对应日志的索引、日志的采集、日志的展示与分析三个维度)
- 客户端请求
工具
- AWS Terraform: 用户可以用代码定义 AWS 资源（如 EC2 实例、S3 存储桶、RDS 数据库等），并自动化其创建、更新和删除
- AWS sagemaker
- AWS lambda
- MLflow, DVC, Neptune, or Weights & Biases

参考

https://mlip-cmu.github.io/s2025/
Open-source Workflow Management Tools: A Survey
Global MLOps and ML tools landscape
mlops-zoomcamp
Made With ML
ml-engineering
youtube-MLOps - Machine Learning Operations
Machine Learning Engineering for Production (MLOps) Specialization
Version and track Azure Machine Learning datasets
Model Deployment Strategies
ML Model Deployment Strategies
python实时语音识别服务部署 - 叫我小康的文章 - 知乎
通用目标检测开源框架YOLOv6在美团的量化部署实战
炼丹师的工程修养之五：KubeFlow介绍和源码分析
模型推理服务化框架Triton
https://github.com/rapidsai/cloud-ml-examples
模型部署优化学习路线是什么？ - Leslie的回答 - 知乎
推荐系统线上Serving简介与C++代码实现 - Shard Zhang的文章 - 知乎
使用TensorFlow C++ API构建线上预测服务 - 篇1
https://github.com/cortexlabs/cortex
https://github.com/ivanpanshin/flask_gunicorn_nginx_docker
ml-pipeline-with-airflow-docker-operator
Accessible Machine Learning through Data Workflow Management
了解/从事机器学习/深度学习系统相关的研究需要什么样的知识结构？ - 张睿的回答 - 知乎
https://github.com/logicalclocks/hopsworks-tutorials
https://github.com/iusztinpaul/energy-forecasting
https://github.com/cmunch1/nba-prediction
https://github.com/MatejFrnka/ScalableML-project
Scaling Apache Airflow for Machine Learning Workflows
外卖广告大规模深度学习模型工程实践
微信基于 PyTorch 的大规模推荐系统训练实践 - DataFunTalk的文章 - 知乎
AIOps在美团的探索与实践——事件管理篇
TFX: Real World Machine Learning in Production
https://www.youtube.com/playlist?list=PL3N9eeOlCrP5a6OA473MA4KnOXWnUyV_J
https://fullstackdeeplearning.com/course/2022/
https://github.com/visenger/awesome-mlops
From MLOps to ML Systems with Feature/Training/Inference Pipelines
https://github.com/GokuMohandas/Made-With-ML
reward-serving碎碎念 - haotian的文章 - 知乎

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

20_mlops.md

20_mlops.md

MLOPS

1. 模型部署

1.1 strategies

1.2 pipelines

1.2 工具

2. 模型压缩

3. retrain

4. Monitoring

5. 问答

参考

Files

20_mlops.md

Latest commit

History

20_mlops.md

File metadata and controls

MLOPS

1. 模型部署

1.1 strategies

1.2 pipelines

1.2 工具

2. 模型压缩

3. retrain

4. Monitoring

5. 问答

参考