Milvus vector database notes: connecting to Milvus from Python with pymilvus

张开发
2026/4/12 17:03:15 · 15 min read


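Contents

- Connecting to Milvus from Python
  - 1. Use a configuration file: the .env file (supports multiple data sources)
  - 2. Wrap MilvusClient in a utility module
  - 3. Call it
- Count rows: fast but not real-time (a delay of a few seconds)
- Count rows: real-time
- Insert
- Update: query first, then modify
- Update: all fields
- Update: only selected fields (partial update)
- Delete: a single row
- Delete: batch delete
- Errors
  - vector dimension mismatch, expected vector size(byte) 512, actual 16
- Why does Milvus collection data need to be loaded first?
- Documentation

If chromadb is the lightweight vector store, Milvus is the enterprise-grade one, and it is worth knowing how to use.

How is Milvus managed: with SQL commands or through an API? The API is the recommended route; its specification and ecosystem are already mature. The current specification is v2:

- SDK v2: the SDK form of v2, largely implemented by the pymilvus package
- RESTful API v2: the REST form of v2

The pymilvus version used in this article is 2.6.11.

For convenience, the examples make heavy use of `from milvus_client import client`. Note that milvus_client here is not a standard package; it is a custom module defined in this article (see the code below).

Connecting to Milvus from Python

The most standard way to set this up has three steps:

1. Use a configuration file: the .env file
2. Wrap MilvusClient in a utility module
3. Call it

1. Use a configuration file: the .env file (supports multiple data sources)

Replace the password with the actual database password.

(1) Create a .env file in the project with this content:

    MILVUS_URI=http://c-a9d8c17bbd5c5d2a.milvus.aliyuncs.com:19530
    MILVUS_USERNAME=root
    MILVUS_PASSWORD=your_actual_password
    MILVUS_DB_NAME=default
    MILVUS_DB_NAME_SECOND=second_milvus

(2) Create config.py in the project with this content:

    # config.py
    from pydantic_settings import BaseSettings, SettingsConfigDict


    class Settings(BaseSettings):
        # Milvus configuration
        milvus_uri: str
        milvus_username: str
        milvus_password: str
        milvus_db_name: str = "default"
        milvus_db_name_second: str = "default"

        # Other settings...
        debug: bool = False

        model_config = SettingsConfigDict(env_file=".env", extra="ignore")


    # Create one global instance (singleton pattern),
    # so main.py can simply do `from config import settings`.
    settings = Settings()

2. Wrap MilvusClient in a utility module

Create milvus_client.py with this code:

    # milvus_client.py
    from pymilvus import MilvusClient

    from config import settings  # the configuration defined above

    # 1. Initialize the client with the parameters from settings.
    # Note: MilvusClient's token parameter usually has the form "user:password".
    client = MilvusClient(
        uri=settings.milvus_uri,
        token=f"{settings.milvus_username}:{settings.milvus_password}",
        db_name=settings.milvus_db_name,  # taken from configuration
    )
    print(f"✅ Milvus client initialized, address: {settings.milvus_uri}")

    # 2. Instantiate a second client dedicated to the sales database.
    second_client = MilvusClient(
        uri=settings.milvus_uri,
        token=f"{settings.milvus_username}:{settings.milvus_password}",
        db_name=settings.milvus_db_name_second,  # fixed at initialization
    )
    print(f"✅ Milvus second client initialized, address: {settings.milvus_uri}")

3. Call it

Replace the collection name with your actual collection name, and replace the query vector with a vector of the collection's actual dimension (for example, a 128-dimensional array).

Create milvus_demo.py with this code:

    # milvus_demo.py
    # Option 1: import the ready-made client instances directly (simplest)
    from milvus_client import client, second_client


    def search_example():
        collection_name = "fashion_item"

        # A simple query example: one 128-dimensional example vector
        query_vector = [[
            0.8205300569534302, 0.4099404215812683, 0.39627617597579956, -0.7741699814796448,
            -0.4928055703639984, 0.3399009704589844, 0.3189021348953247, 0.5187330842018127,
            0.2883453965187073, -0.7719671130180359, -0.08381801098585129, -0.8046928644180298,
            -0.1952885240316391, 0.12886777520179749, -0.2597646117210388, -0.17221087217330933,
            -0.9784966111183167, -0.5716061592102051, 0.37882983684539795, -0.34292125701904297,
            -0.6309902667999268, 0.27827078104019165, -0.47276338934898376, 0.1738283783197403,
            0.9165683388710022, -0.5660313367843628, -0.3803914785385132, -0.30034804344177246,
            0.6518695950508118, 0.6775507926940918, 0.17429892718791962, 0.7504568696022034,
            0.6146184206008911, -0.17610716819763184, -0.7063156366348267, -0.8314393758773804,
            -0.3886834979057312, 0.842677652835846, -0.8377030491828918, 0.1993316113948822,
            -0.9610193967819214, 0.7499078512191772, 0.4662572145462036, -0.811089038848877,
            0.3537338376045227, -0.7749398350715637, 0.559476375579834, 0.5925202369689941,
            -0.2932983636856079, 0.079729363322258, -0.89405357837677, -0.48456451296806335,
            0.7873122692108154, -0.41849103569984436, 0.8095035552978516, 0.8411709070205688,
            0.24906544387340546, -0.2396702617406845, -0.8429192900657654, -0.6976824998855591,
            0.24479956924915314, -0.266379177570343, -0.25824326276779175, -0.3464970886707306,
            0.5343883037567139, 0.7019475698471069, 0.6393605470657349, 0.32661187648773193,
            -0.1667490005493164, -0.3554096519947052, -0.8155046701431274, 0.6531398892402649,
            -0.5637432932853699, 0.06950384378433228, 0.47567635774612427, -0.30726751685142517,
            -0.7929423451423645, -0.7991929650306702, 0.048295482993125916, -0.30530187487602234,
            0.3513074517250061, -0.30239230394363403, 0.8689589500427246, 0.03430260717868805,
            0.3641439378261566, 0.2765483856201172, -0.6383322477340698, 0.8168131113052368,
            0.9978047609329224, 0.8768131136894226, -0.09796279668807983, 0.24167074263095856,
            0.02407880872488022, -0.7623790502548218, -0.20949691534042358, -0.23544828593730927,
            -0.6273159980773926, -0.8625620603561401, 0.8881083726882935, 0.8987827301025391,
            -0.780926525592804, 0.6973945498466492, 0.6470697522163391, -0.12261192500591278,
            0.3772689700126648, -0.044691625982522964, 0.5931768417358398, 0.6109945774078369,
            -0.9108550548553467, 0.09018359333276749, -0.4908239245414734, -0.7361722588539124,
            0.48472604155540466, 0.7651014924049377, -0.38261428475379944, 0.8098018765449524,
            0.5474557876586914, 0.09784401953220367, 0.7661294937133789, 0.26230376958847046,
            0.15331418812274933, 0.8528333306312561, 0.3935127556324005, -0.6476650834083557,
            0.3295937478542328, -0.22980983555316925, -0.7503023743629456, -0.2237066775560379,
        ]]

        print("Querying Milvus...")

        # Use MilvusClient's search method on each database
        res = client.search(
            collection_name=collection_name,
            data=query_vector,
            limit=3,
            output_fields=["*"],
        )
        res2 = second_client.search(
            collection_name=collection_name,
            data=query_vector,
            limit=3,
            output_fields=["*"],
        )
        print("Query result (default database):", res)
        print("Query result (second database):", res2)


    if __name__ == "__main__":
        search_example()

Count rows: fast but not real-time (a delay of a few seconds)

    from milvus_client import client

    # Fetch the collection statistics
    stats = client.get_collection_stats("fashion_item")

    # Extract the row count
    count = stats["row_count"]
    print(f"Total rows in fashion_item: {count}")

Why is there a delay? The number returned by get_collection_stats() is the result of the system's most recent bookkeeping pass, not the state "right now". It suits scenarios that do not need exact figures, such as monitoring dashboards.

Count rows: real-time

This one queries the data directly.

    from milvus_client import client

    # Use a count(*) aggregate query
    res = client.query(
        collection_name="fashion_item",
        filter="",  # an empty filter matches every row
        output_fields=["count(*)"],
    )

    # Extract the result
    count = res[0].get("count(*)")
    print(f"Exact real-time row count of fashion_item: {count}")

Insert

This article does not treat insert and update separately: both go through a single method, upsert(), which inserts the row when the primary key does not exist yet and overwrites it when it does. (MilvusClient also offers a plain insert() for insert-only writes.)

Update: query first, then modify

Replace collection_name with your actual collection name and target_id with an actual id.

    from milvus_client import client

    target_id = -184695224

    # 1. Query the old row first to retrieve its existing embedding
    res = client.query(
        collection_name="fashion_item",
        filter=f"id == {target_id}",
        output_fields=["embedding"],  # fetch only the vector field
    )

    if not res:
        print("Item not found")
    else:
        # 2. Take the old embedding
        old_embedding = res[0].get("embedding")

        # 3. Assemble the complete row (keep the old vector, update the other fields)
        data_to_update = [{
            "id": target_id,
            "price": 299.0,
            "product_name": "New spring trench coat",
            "embedding": old_embedding,  # key step: put the old vector back
        }]

        # 4. Run upsert (partial_update is not needed because the row is complete)
        res = client.upsert(
            collection_name="fashion_item",
            data=data_to_update,
        )
        print("Update succeeded")

Update: all fields

Replace collection_name with your actual collection name and target_id with an actual id. If the id does not exist, the call still works; it simply acts as an insert.

    import random

    from milvus_client import client

    # 1. Build the data.
    # Since we do not query first, we must supply a 128-dimensional vector ourselves,
    # otherwise the call fails. Here a random 128-dimensional vector is used as a placeholder.
    dummy_embedding = [random.uniform(-1, 1) for _ in range(128)]

    data = [{
        "id": 101,                      # primary key of the row to modify
        "price": 299.0,                 # new price
        "product_name": "Spring trench coat",  # new name
        "embedding": dummy_embedding,   # required: without the vector field a fieldSchema error is raised
    }]

    # 2. Run upsert.
    # Do not pass partial_update=True here: every field is provided (even though the vector is fake).
    res = client.upsert(
        collection_name="fashion_item",
        data=data,
    )
    print("Update finished. Note: embedding has been reset to a random vector")

Update: only selected fields (partial update)

    from milvus_client import client

    # 1. Prepare the data: only the primary key and the fields to change
    data_to_update = [{
        "id": 101,                                 # the primary key is required to locate the row
        "price": 299.0,                            # new price
        "product_name": "New spring trench coat 333",  # new name
        # embedding is deliberately not provided
    }]

    # 2. Run upsert
    try:
        res = client.upsert(
            collection_name="fashion_item",
            data=data_to_update,
            # The key setting: enable partial update. Milvus updates only the fields
            # provided; fields left out (such as embedding) keep their current values.
            partial_update=True,
        )
        print(f"Update succeeded, rows affected: {res.get('upsert_count', 0)}")
    except Exception as e:
        print(f"Update failed: {e}")

Delete: a single row

    from milvus_client import client

    target_id = 101  # suppose the item with id 101 should be deleted

    # Run the delete
    res = client.delete(
        collection_name="fashion_item",
        ids=[target_id],  # pass a list of ids, even for a single id
    )
    print(f"Delete succeeded, {res.get('delete_count', 0)} row(s) deleted")

Delete: batch delete

    from milvus_client import client

    ids_to_delete = [101, 102, 105]

    res = client.delete(
        collection_name="fashion_item",
        ids=ids_to_delete,
    )
    print(f"Batch delete finished, {res.get('delete_count', 0)} row(s) deleted")

Note: the count reported on delete is a logical count, not the number of rows physically removed. Even if none of the ids actually exist, this still prints 3, so the number should not be taken at face value.

Errors

Error:

    pymilvus.exceptions.MilvusException: MilvusException: (code=65535, message=fail to search on QueryNode 1: worker(1) query failed: parser searchRequest failed: vector dimension mismatch, expected vector size(byte) 512, actual 16. at /go/src/github.com/milvus-io/milvus/internal/core/src/query/Plan.cpp:76

The important part is this sentence:

    vector dimension mismatch, expected vector size(byte) 512, actual 16

In other words: the vector dimension does not match; the expected vector size is 512 bytes, but the vector received is 16 bytes. Checking confirmed that the collection is 128-dimensional while the query vector passed in was 4-dimensional. Changing the query vector to 128 dimensions fixes it.

Note: the byte size depends on the floating-point precision. The collection here uses float32 (32 bits, 4 bytes per component), so a 128-dimensional float32 vector is 128 * 4 = 512 bytes, which is the 512 in the error message. Likewise, the 4-dimensional vector is 4 * 4 = 16 bytes.

Why does Milvus collection data need to be loaded first?

Milvus separates storage from compute: data is stored on disk and loaded into memory when it is needed for computation. That is why a collection has to be loaded. Keeping all data in memory all the time would be too expensive.

Documentation

Python SDK v2 (pymilvus)
  Documentation: Milvus Python SDK v2.5.x API Reference
  Core class: pymilvus.MilvusClient (the unified client recommended in v2)

RESTful API v2
  Documentation: Milvus RESTful API v2.5.x Reference
  Use: refer to this if you call Milvus over HTTP/JSON instead of using the SDK.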
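
The byte arithmetic behind the dimension-mismatch error in the Errors section can be checked locally before any search is sent. The sketch below is only an illustration: `vector_byte_size` and `check_query_vectors` are hypothetical helper names, not part of pymilvus, and it assumes the vector field is float32 (4 bytes per component), as in the collection described in this article.

```python
FLOAT32_BYTES = 4  # assumption: the collection's vector field is float32


def vector_byte_size(vector):
    """Return the size in bytes of one float32 vector."""
    return len(vector) * FLOAT32_BYTES


def check_query_vectors(query_vectors, expected_dim):
    """Raise ValueError if any query vector does not have expected_dim components."""
    for i, vec in enumerate(query_vectors):
        if len(vec) != expected_dim:
            raise ValueError(
                f"query vector {i}: expected dim {expected_dim} "
                f"({expected_dim * FLOAT32_BYTES} bytes), "
                f"got dim {len(vec)} ({vector_byte_size(vec)} bytes)"
            )


if __name__ == "__main__":
    print(vector_byte_size([0.0] * 128))  # 128 * 4 = 512
    print(vector_byte_size([0.0] * 4))    # 4 * 4 = 16
    check_query_vectors([[0.0] * 128], expected_dim=128)  # passes silently
```

Calling `check_query_vectors(query_vector, expected_dim=128)` just before `client.search(...)` would turn the server-side error into a local one that names the dimension directly instead of the byte size.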

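
The load step discussed under "Why does Milvus collection data need to be loaded first?" can be made explicit in code. The following is a minimal sketch, not the article's own method: `ensure_loaded` is a hypothetical helper, and it assumes the client's `get_load_state()` returns a dict whose "state" value is an enum with the member name "Loaded" once the collection is in memory. It takes the client as an argument, so it would work with the `client` instance from the custom milvus_client module.

```python
def ensure_loaded(client, collection_name):
    """Load the collection into memory unless it is already loaded.

    Returns True if a load was triggered, False if it was already loaded.
    """
    state = client.get_load_state(collection_name=collection_name).get("state")
    # Assumption: the state is an enum; compare by member name so that
    # this sketch does not need to import the enum type itself.
    if getattr(state, "name", str(state)) == "Loaded":
        return False
    client.load_collection(collection_name=collection_name)
    return True


# Hypothetical usage with the client from milvus_client.py:
# from milvus_client import client
# ensure_loaded(client, "fashion_item")
```

Checking the state first keeps the helper cheap to call before every batch of searches; loading is only requested when the collection is not yet in memory.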