Integrate LangChain4j with a Local Ollama Model in a Spring Boot Project in 5 Minutes (Full Code Included)

张开发
2026/4/15 16:12:28 · 15 min read


With AI technology advancing rapidly, integrating large-language-model capabilities into existing Java applications has become an effective way to boost product competitiveness. This article walks Spring Boot developers through wiring LangChain4j up to a local Ollama model in about five minutes, with no complex configuration required.

## 1. Environment Setup and Basic Configuration

### 1.1 Create a Spring Boot Project

Use Spring Initializr to generate the project skeleton with the following core dependencies:

```bash
curl https://start.spring.io/starter.zip \
  -d dependencies=web,lombok \
  -d language=java \
  -d type=gradle-project \
  -d javaVersion=17 \
  -d artifactId=ollama-demo \
  -o ollama-demo.zip
```

Key dependencies:

- **Spring Web**: RESTful endpoint support
- **Lombok**: reduces Java bean boilerplate

### 1.2 Add the LangChain4j Dependency

Add the Ollama integration package to `build.gradle`:

```groovy
dependencies {
    implementation 'dev.langchain4j:langchain4j-ollama-spring-boot-starter:0.25.0'
    // other existing dependencies...
}
```

> Note: prefer the latest stable version; check Maven Central for current releases.

## 2. Configuring the Local Ollama Service

### 2.1 Install and Start Ollama

Download the build for your operating system. On macOS:

```bash
brew install ollama
ollama pull llama2   # download a base model
ollama serve         # start the service
```

The service listens on port 11434 by default. Verify it is up with:

```bash
curl http://localhost:11434/api/tags
```

### 2.2 Spring Boot Application Configuration

Add the following to `application.yml`:

```yaml
langchain4j:
  ollama:
    chat-model:
      base-url: http://localhost:11434
      model-name: llama2
      temperature: 0.7
      timeout: 60s
```

Key parameters:

- `model-name`: which local model to use
- `temperature`: randomness of the generated text (0–1)
- `timeout`: request timeout

## 3. Core Code Implementation

### 3.1 The Chat Service

```java
@Service
@RequiredArgsConstructor
public class AIChatService {

    private final OllamaChatModel chatModel;

    public String chat(String message) {
        return chatModel.generate(message);
    }

    public String multiTurnChat(List<ChatMessageDto> history) {
        // ChatMessage is an interface; build concrete UserMessage/AiMessage
        // instances depending on the role of each turn.
        List<ChatMessage> messages = history.stream()
                .map(m -> "user".equals(m.role())
                        ? (ChatMessage) UserMessage.from(m.content())
                        : AiMessage.from(m.content()))
                .collect(Collectors.toList());
        return chatModel.generate(messages).content().text();
    }
}
```

### 3.2 The REST Endpoints

```java
@RestController
@RequestMapping("/api/ai")
@RequiredArgsConstructor
public class AIController {

    private final AIChatService chatService;

    @PostMapping("/chat")
    public ResponseEntity<String> simpleChat(@RequestBody String prompt) {
        return ResponseEntity.ok(chatService.chat(prompt));
    }

    @PostMapping("/chat/history")
    public ResponseEntity<String> contextualChat(
            @RequestBody List<ChatMessageDto> messages) {
        return ResponseEntity.ok(chatService.multiTurnChat(messages));
    }
}
```
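The multi-turn service and controller above bind the request body to a `ChatMessageDto` type that the article never defines. A minimal sketch (the name and shape are assumptions, not part of LangChain4j) could be:

```java
// Hypothetical request DTO for the /chat/history endpoint: each turn carries
// a role ("user" or "assistant") and the message text. Jackson can bind a
// record like this directly from the JSON request body.
record ChatMessageDto(String role, String content) {}
```

A request body would then look like `[{"role":"user","content":"Hi"},{"role":"assistant","content":"Hello!"}]`.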
## 4. Advanced Feature Extensions

### 4.1 Streaming Responses

Modify the controller to support Server-Sent Events:

```java
@GetMapping(path = "/stream", produces = MediaType.TEXT_EVENT_STREAM_VALUE)
public Flux<String> streamChat(@RequestParam String prompt) {
    // Streaming requires the streaming variant of the model
    // (OllamaStreamingChatModel), not the blocking OllamaChatModel.
    return Flux.create(sink -> {
        streamingChatModel.generate(prompt, new StreamingResponseHandler<AiMessage>() {
            @Override
            public void onNext(String token) {
                sink.next(token);
            }

            @Override
            public void onComplete(Response<AiMessage> response) {
                sink.complete();
            }

            @Override
            public void onError(Throwable error) {
                sink.error(error);
            }
        });
    });
}
```

### 4.2 Custom Model Parameters

Adjust generation parameters dynamically:

```java
public String chatWithParams(String prompt, double temperature, int maxTokens) {
    // A built model's parameters are fixed; construct a fresh model
    // with the requested settings.
    return OllamaChatModel.builder()
            .baseUrl("http://localhost:11434")
            .modelName("llama2")
            .temperature(temperature)
            .numPredict(maxTokens)   // Ollama's max-tokens setting
            .build()
            .generate(prompt);
}
```

### 4.3 Exception Handling

A global exception handler example:

```java
@RestControllerAdvice
public class AIExceptionHandler {

    @ExceptionHandler(OllamaException.class)
    public ResponseEntity<ErrorResponse> handleOllamaError(OllamaException ex) {
        return ResponseEntity.status(502)
                .body(new ErrorResponse("AI_SERVICE_ERROR", ex.getMessage()));
    }
}
```

## 5. Performance Optimization and Monitoring

### 5.1 Connection Pool Configuration

Add to `application.yml`:

```yaml
langchain4j:
  ollama:
    client:
      connect-timeout: 5s
      read-timeout: 30s
      max-retries: 3
      connection-pool:
        max-idle: 10
        max-total: 20
```

### 5.2 Metrics Integration

Register a Micrometer metrics bean:

```java
@Bean
public OllamaMetrics ollamaMetrics(MeterRegistry registry) {
    return new OllamaMetrics(registry);
}
```

Key metrics include:

- `langchain4j.ollama.requests.duration`
- `langchain4j.ollama.requests.count`
- `langchain4j.ollama.errors.count`

### 5.3 Caching Strategy

```java
@Cacheable(value = "aiResponses", key = "#prompt.hashCode()")
public String getCachedResponse(String prompt) {
    return chatModel.generate(prompt);
}
```

## 6. Security Measures

### 6.1 Request Rate Limiting

Using Resilience4j:

```java
@Bean
public OllamaChatModel rateLimitedChatModel(
        OllamaChatModel delegate, RateLimiterRegistry registry) {
    RateLimiter limiter = registry.rateLimiter("ollama");
    return new RateLimitedOllamaChatModel(delegate, limiter);
}
```

### 6.2 Content Filtering

```java
public String safeChat(String prompt) {
    if (containsSensitiveWords(prompt)) {
        throw new ContentPolicyViolationException();
    }
    return chatModel.generate(prompt);
}

private boolean containsSensitiveWords(String text) {
    // implement custom filtering logic here
    return false;
}
```
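The `RateLimitedOllamaChatModel` wrapper registered in 6.1 is not a LangChain4j class and is left undefined by the article. A hand-rolled sketch of the decorator idea, using a plain `Semaphore` instead of Resilience4j and a minimal stand-in interface for the model (both assumptions for illustration), might look like:

```java
import java.util.concurrent.Semaphore;

// Minimal stand-in for the chat model; the real LangChain4j type would be
// dev.langchain4j.model.chat.ChatLanguageModel.
interface ChatModel {
    String generate(String prompt);
}

// Decorator that caps the number of concurrent in-flight requests to the
// underlying model and fails fast when the limit is reached.
class RateLimitedChatModel implements ChatModel {

    private final ChatModel delegate;
    private final Semaphore permits;

    RateLimitedChatModel(ChatModel delegate, int maxConcurrent) {
        this.delegate = delegate;
        this.permits = new Semaphore(maxConcurrent);
    }

    @Override
    public String generate(String prompt) {
        if (!permits.tryAcquire()) {
            throw new IllegalStateException("AI service is busy, try again later");
        }
        try {
            return delegate.generate(prompt);
        } finally {
            permits.release();   // always free the permit, even on failure
        }
    }
}
```

A Resilience4j `RateLimiter` would replace the semaphore with time-windowed permits; the decorator shape stays the same.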
## 7. Testing and Verification

### 7.1 Unit Test Example

```java
@SpringBootTest
class AIChatServiceTest {

    @Autowired
    private AIChatService chatService;

    @Test
    void shouldReturnResponse() {
        String response = chatService.chat("Hello");
        assertThat(response).isNotBlank();
    }
}
```

### 7.2 Integration Test Configuration

A test-only configuration class:

```java
@TestConfiguration
public class TestOllamaConfig {

    // OllamaChatModel has no no-arg constructor, so mock the
    // ChatLanguageModel interface it implements instead.
    @Bean
    @Primary
    public ChatLanguageModel mockOllama() {
        return new ChatLanguageModel() {
            @Override
            public Response<AiMessage> generate(List<ChatMessage> messages) {
                return Response.from(AiMessage.from("Mock response"));
            }
        };
    }
}
```

## 8. Deployment Recommendations

### 8.1 Docker Compose Configuration

A `docker-compose.yml` example:

```yaml
version: "3.8"
services:
  ollama:
    image: ollama/ollama
    ports:
      - "11434:11434"
    volumes:
      - ollama_data:/root/.ollama
  app:
    build: .
    ports:
      - "8080:8080"
    depends_on:
      - ollama
volumes:
  ollama_data:
```

### 8.2 Health Check Endpoint

A custom health indicator:

```java
@Component
@RequiredArgsConstructor
public class OllamaHealthIndicator implements HealthIndicator {

    private final OllamaClient client;

    @Override
    public Health health() {
        try {
            client.listModels();
            return Health.up().build();
        } catch (Exception e) {
            return Health.down(e).build();
        }
    }
}
```

## 9. Troubleshooting Common Issues

### 9.1 Connection Timeouts

A typical diagnosis workflow:

1. Verify the Ollama service status: `ollama list`
2. Check port connectivity: `telnet localhost 11434`
3. Inspect connection error details in the Spring Boot logs

### 9.2 Memory Tuning

Suggested JVM flags:

```bash
java -Xms512m -Xmx2g -XX:MaxRAMPercentage=75.0 -jar your-app.jar
```

Ollama startup tuning:

```bash
OLLAMA_NUM_PARALLEL=2 ollama serve
```

## 10. Architecture Recommendations

### 10.1 Service Layering

Recommended layering: Controller Layer → Service Layer → AI Adapter Layer → LangChain4j Client

### 10.2 Asynchronous Processing

Non-blocking calls with Spring WebFlux:

```java
@GetMapping("/async")
public Mono<String> asyncChat(@RequestParam String prompt) {
    return Mono.fromCallable(() -> chatService.chat(prompt))
            .subscribeOn(Schedulers.boundedElastic());
}
```

## 11. Advanced Model Management

### 11.1 Multi-Model Switching

Dynamic model selection:

```java
public String chatWithModel(String prompt, String modelName) {
    return OllamaChatModel.builder()
            .baseUrl(baseUrl)
            .modelName(modelName)
            .build()
            .generate(prompt);
}
```

### 11.2 Local Model Preloading

```java
@PostConstruct
public void preloadModels() {
    Arrays.asList("llama2", "mistral")
            .forEach(model -> ollamaClient.pullModel(model));
}
```
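The switching approach in 11.1 builds a fresh `OllamaChatModel` on every request, paying the client-construction cost each time. One option (a sketch, not a LangChain4j facility) is to cache one instance per model name; the generic registry below stands in for a map of `modelName → OllamaChatModel`, with the factory function wrapping the `OllamaChatModel.builder()...build()` call:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Caches one model instance per model name; computeIfAbsent guarantees the
// factory runs at most once per name even under concurrent access.
class ModelRegistry<M> {

    private final Map<String, M> cache = new ConcurrentHashMap<>();
    private final Function<String, M> factory;

    ModelRegistry(Function<String, M> factory) {
        this.factory = factory;
    }

    M forModel(String name) {
        return cache.computeIfAbsent(name, factory);
    }
}
```

In the application this would be instantiated once, e.g. `new ModelRegistry<>(name -> OllamaChatModel.builder().baseUrl(baseUrl).modelName(name).build())`.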
## 12. Client Integration Examples

### 12.1 Web Frontend

A JavaScript call example:

```javascript
fetch("/api/ai/chat", {
    method: "POST",
    body: JSON.stringify("How do I learn Spring Boot?")
}).then(response => response.text())
  .then(data => console.log(data));
```

### 12.2 Mobile Clients

An Android Retrofit interface definition:

```kotlin
interface AIService {
    @POST("/api/ai/chat")
    suspend fun chat(@Body prompt: String): Response<String>
}
```

## 13. Logging and Diagnostics

### 13.1 Request Logging

Enable verbose logging in `application.yml`:

```yaml
logging:
  level:
    dev.langchain4j: DEBUG
```

### 13.2 Diagnostics Endpoint

```java
@GetMapping("/diag")
public Map<String, Object> diagnostics() {
    return Map.of(
        "modelStatus", ollamaClient.listModels(),
        "systemLoad", ManagementFactory.getOperatingSystemMXBean()
                .getSystemLoadAverage()
    );
}
```

## 14. Cost Control Strategies

### 14.1 Usage Accounting

A simple usage-tracking aspect:

```java
@Aspect
@Component
public class UsageTrackingAspect {

    @Autowired
    private UsageMetrics metrics;

    @Around("execution(* com..AIChatService.*(..))")
    public Object trackUsage(ProceedingJoinPoint pjp) throws Throwable {
        long start = System.currentTimeMillis();
        Object result = pjp.proceed();
        metrics.record(pjp.getSignature().getName(),
                System.currentTimeMillis() - start);
        return result;
    }
}
```

### 14.2 Resource Limiting

A Spring-managed rate limiter (Guava's `RateLimiter`):

```java
@Bean
public RateLimiter aiRateLimiter() {
    return RateLimiter.create(10); // 10 requests/second
}
```

## 15. Further Reading and Resources

### 15.1 Official Documentation

- LangChain4j Ollama integration documentation
- Ollama model library

### 15.2 Performance Benchmarks

| Model   | Response time (ms) | Memory (MB) |
|---------|--------------------|-------------|
| llama2  | 1200               | 3800        |
| mistral | 950                | 4200        |
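Finally, the `UsageMetrics` collaborator injected into the aspect in 14.1 was never defined in the article. A minimal thread-safe sketch (the name and method signatures are assumptions inferred from the aspect) that counts calls and accumulates latency per service method:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical UsageMetrics referenced by the tracking aspect: per-method
// call counts and total elapsed milliseconds, safe under concurrent updates.
class UsageMetrics {

    private final Map<String, LongAdder> calls = new ConcurrentHashMap<>();
    private final Map<String, LongAdder> millis = new ConcurrentHashMap<>();

    void record(String method, long elapsedMillis) {
        calls.computeIfAbsent(method, k -> new LongAdder()).increment();
        millis.computeIfAbsent(method, k -> new LongAdder()).add(elapsedMillis);
    }

    long callCount(String method) {
        LongAdder a = calls.get(method);
        return a == null ? 0 : a.sum();
    }

    long totalMillis(String method) {
        LongAdder a = millis.get(method);
        return a == null ? 0 : a.sum();
    }
}
```

In production these counters would more likely be Micrometer `Counter`/`Timer` instances, matching the monitoring setup in section 5.2.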
