# vLLM
High-throughput LLM inference engine
## High-Performance Inference with vLLM
vLLM is a fast, memory-efficient inference and serving engine for large language models, built around the PagedAttention algorithm for managing the attention key-value cache.
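PagedAttention manages the KV cache the way an operating system manages virtual memory: each sequence's cache lives in fixed-size blocks allocated on demand from a shared pool, so memory is not reserved for the maximum sequence length up front. Below is a simplified sketch of that idea (illustrative only, not vLLM's actual internals; `BLOCK_SIZE`, `BlockAllocator`, and `SequenceKV` are hypothetical names for this sketch):

```python
BLOCK_SIZE = 16  # tokens per KV-cache block (vLLM's default block size is also 16)

class BlockAllocator:
    """Hands out fixed-size KV-cache blocks from a shared free pool."""
    def __init__(self, num_blocks: int):
        self.free_blocks = list(range(num_blocks))

    def allocate(self) -> int:
        return self.free_blocks.pop()

    def free(self, block: int) -> None:
        self.free_blocks.append(block)

class SequenceKV:
    """Maps a sequence's logical token positions to physical cache blocks."""
    def __init__(self, allocator: BlockAllocator):
        self.allocator = allocator
        self.block_table: list[int] = []  # logical block index -> physical block id
        self.num_tokens = 0

    def append_token(self) -> None:
        # A new physical block is needed only when the current one fills up,
        # so at most BLOCK_SIZE - 1 token slots are ever wasted per sequence.
        if self.num_tokens % BLOCK_SIZE == 0:
            self.block_table.append(self.allocator.allocate())
        self.num_tokens += 1
```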
## Installation
```bash
pip install vllm
```
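As a quick sanity check that the package installed correctly (assuming `vllm` exposes `__version__`, as current releases do):

```bash
python -c "import vllm; print(vllm.__version__)"
```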
## Basic Usage
```python
from vllm import LLM, SamplingParams

# Load the model; weights download from the Hugging Face Hub on first use.
llm = LLM(model="meta-llama/Llama-2-7b-chat-hf")
sampling_params = SamplingParams(temperature=0.7, top_p=0.95, max_tokens=512)

outputs = llm.generate(["Tell me about AI"], sampling_params)
print(outputs[0].outputs[0].text)
```
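Much of vLLM's throughput comes from batching concurrent requests, so the idiomatic offline pattern is to pass many prompts to a single `generate` call. A short sketch reusing the `llm` and `sampling_params` objects from above (the prompts are placeholders):

```python
prompts = [
    "Tell me about AI",
    "What is PagedAttention?",
    "Summarize the history of GPUs in two sentences.",
]

# One call schedules all prompts together; outputs come back in prompt order.
outputs = llm.generate(prompts, sampling_params)
for output in outputs:
    print(output.prompt, "->", output.outputs[0].text)
```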