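[39/100 AI App Trials] A New AI Option for Apple Users: Build a Private, Personal AI Workflow for Free with Local Models (Based on MLX-LM) - 萍聚社区-德国热线-德国实用信息网 - Artificial Intelligence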

Posted by News on 2025-6-15 22:50


Author: WeChat article
┃Project rating: ⭐️⭐️⭐️⭐️

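MLX is an open-source machine learning framework that Apple released in late 2023. It is designed and optimized for Apple hardware, in particular devices with Apple silicon (M-series) chips. Its main strengths are listed below.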

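MLX-LM is a Python package built on MLX for running large language models on Macs with M-series chips. Compared with Ollama, I found its token output faster and its overall performance stronger. The MLX community has also grown well: through it we can deploy Qwen, DeepSeek, Llama, and many other open-source models.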

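✅ Optimized specifically for Apple silicon: works across the M1 to M4 series with strong performance

✅ No privacy worries: all data stays on your computer and is never uploaded to the cloud
Enough talk, let's get started.
┃Deployment and verification (the MLX build of Qwen3-4B-4bit):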
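1. Prerequisites:

✅ macOS on a Mac with an M-series chip

✅ A VPN or proxy 🪜 (for reaching Hugging Face where it is not directly accessible)

✅ A Python environment installed in advance

✅ A Hugging Face access token

The steps below are all run in the macOS Terminal.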
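Before installing anything, the first and third prerequisites can be checked from Python. The following is a minimal sketch, not part of the original walkthrough: the helper name is_supported_host is hypothetical, and the Python 3.9 floor is an assumption rather than a documented requirement. Note that platform.machine() reports "arm64" only when Python itself runs natively on Apple silicon.

```python
import platform
import sys


def is_supported_host(system: str, machine: str, python_version: tuple) -> bool:
    """Return True for macOS on Apple silicon with a reasonably recent Python.

    The (3, 9) minimum is an assumption made for this sketch.
    """
    return system == "Darwin" and machine == "arm64" and python_version >= (3, 9)


if __name__ == "__main__":
    ok = is_supported_host(
        platform.system(),      # "Darwin" on macOS
        platform.machine(),     # "arm64" on a native Apple silicon Python
        sys.version_info[:2],   # e.g. (3, 11)
    )
    print("Ready for MLX" if ok else "This machine does not meet the prerequisites")
```

If the check fails on an M-series Mac, one likely cause is an Intel (x86_64) Python build running under Rosetta.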

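2. Install or upgrade the MLX packages
pip install -U mlx mlx-lm
3. Install or upgrade the Hugging Face package
pip install -U huggingface_hub
4. Create a Python program that loads the MLX build of Qwen (I created mlx_qwen3.py in VS Code; ByteDance's Trae is also gaining users)
from mlx_lm import load, generate

# Pick a Qwen3 variant that suits your Mac
model, tokenizer = load("mlx-community/Qwen3-4B-4bit")
prompt = "hello"
# Wrap the prompt in the model's chat template when one is defined
if tokenizer.chat_template is not None:
    messages = [{"role": "user", "content": prompt}]
    prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True)
response = generate(model, tokenizer, prompt=prompt, verbose=True)
In a terminal, change to the directory that contains the program (opening the terminal inside VS Code works the same way) and run it:
python mlx_qwen3.py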
If output like the following appears, the mlx-community Qwen3-4B-4bit model was downloaded and tested successfully:
==========
<think>Okay, the user said "hello". I need to respond appropriately. Since they greeted me, I should reply in a friendly and welcoming manner. Maybe start with a greeting back, like "Hello!" to be polite. Then, I can ask how I can assist them today. That way, it's open-ended and invites them to ask for help. I should keep it simple and friendly. Let me make sure there are no errors in the response. Yep, that should work. Let me put that together.</think>
Hello! How can I assist you today? 😊
==========
Prompt: 9 tokens, 9.850 tokens-per-sec
Generation: 119 tokens, 41.360 tokens-per-sec
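The output above mixes the model's reasoning (inside the <think> tags) with the final answer. When only the answer is wanted, the two can be separated with a regular expression. This is a small sketch under the assumption that the reasoning is always wrapped in a single <think>...</think> pair, as in the sample above; split_thinking is a hypothetical helper name, not an MLX-LM function.

```python
import re

# Non-greedy match so only the tagged block is captured; DOTALL lets "." span newlines.
THINK_BLOCK = re.compile(r"<think>(.*?)</think>\s*", flags=re.DOTALL)


def split_thinking(text: str) -> tuple[str, str]:
    """Return (reasoning, answer) extracted from raw model output."""
    match = THINK_BLOCK.search(text)
    reasoning = match.group(1).strip() if match else ""
    answer = THINK_BLOCK.sub("", text).strip()
    return reasoning, answer


if __name__ == "__main__":
    raw = "<think>\nThe user greeted me.\n</think>\n\nHello! How can I assist you today?"
    reasoning, answer = split_thinking(raw)
    print(answer)
```

Text without a <think> block passes through unchanged, with an empty reasoning string.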
Tip: run 'ls ~/.cache/huggingface/hub/' in the terminal to see which Hugging Face models have been downloaded and where they are stored locally.
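The same listing can be done from Python. Folders in the Hugging Face hub cache are named like models--mlx-community--Qwen3-4B-4bit, so the repository id can be recovered from the folder name. The sketch below assumes the default cache location from the tip above (it will differ if a custom cache directory is configured); cached_model_ids is a hypothetical helper name.

```python
from pathlib import Path


def cached_model_ids(hub_dir: Path) -> list[str]:
    """List repo ids such as 'mlx-community/Qwen3-4B-4bit' found in a hub cache folder."""
    ids = []
    for entry in sorted(hub_dir.glob("models--*")):
        if entry.is_dir():
            rest = entry.name[len("models--"):]
            # "org--name" -> "org/name"; names without an org are kept as-is
            ids.append("/".join(rest.split("--", 1)))
    return ids


if __name__ == "__main__":
    # Default cache path taken from the tip above
    default_hub = Path.home() / ".cache" / "huggingface" / "hub"
    for model_id in cached_model_ids(default_hub):
        print(model_id)
```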

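5. Load Qwen3 from the terminal and chat with it
mlx_lm.generate --model mlx-community/Qwen3-4B-4bit --prompt "Hefei's development potential over the last 5 years" --max-tokens 500
You can also verify it with a Python program: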
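from mlx_lm import load, generate
# Load the model
model, tokenizer = load("mlx-community/Qwen3-8B-3bit")
# Single generation
prompt = "Explain the basic principles of quantum computing"
# verbose=True already streams the text; print() then shows the returned string again
response = generate(model, tokenizer, prompt=prompt, verbose=True)
print(response)
┃Advanced but practical usage (recommended)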
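6. Wrap the model in an API so other programs or AI workflows can call it

Use an online model such as DeepSeek to generate an API service that we can call from a workflow.

The complete generated API service code: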
# mlx_api.py
import logging
import re
from typing import Optional

import uvicorn
from fastapi import FastAPI, HTTPException
from fastapi.middleware.cors import CORSMiddleware
from mlx_lm import load, generate
from pydantic import BaseModel

# Configure logging
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

app = FastAPI(title="MLX Model API for n8n")

# Allow cross-origin requests so n8n can call the API
app.add_middleware(
    CORSMiddleware,
    allow_origins=["*"],  # In production, restrict this to the n8n server address
    allow_credentials=True,
    allow_methods=["*"],
    allow_headers=["*"],
)

# Globals that hold the model
model = None
tokenizer = None


class GenerationRequest(BaseModel):
    prompt: str
    max_tokens: int = 4000
    temperature: float = 0.7  # accepted, but not forwarded to generate() below
    top_p: float = 0.9  # accepted, but not forwarded to generate() below
    remove_thinking: bool = True
    system_prompt: Optional[str] = None


# Load the model at startup
@app.on_event("startup")
async def startup_event():
    global model, tokenizer
    logger.info("Loading Qwen3-4B-4bit model...")
    model, tokenizer = load("mlx-community/Qwen3-4B-4bit")
    logger.info("Model loaded successfully")


@app.post("/generate")
async def api_generate(request: GenerationRequest):
    if model is None or tokenizer is None:
        raise HTTPException(status_code=500, detail="Model not loaded yet")
    logger.info(f"Generation request received: {request.prompt[:50]}...")
    try:
        # Build the full prompt, including the system prompt if one was given
        full_prompt = request.prompt
        if request.system_prompt:
            full_prompt = f"{request.system_prompt}\n\n{request.prompt}"
        response = generate(
            model,
            tokenizer,
            prompt=full_prompt,
            max_tokens=request.max_tokens,
        )
        # Strip the <think>...</think> block if requested
        if request.remove_thinking:
            response = re.sub(r"<think>.*?</think>\s*", "", response, flags=re.DOTALL)
        return {"text": response, "status": "success"}
    except Exception as e:
        logger.error(f"Generation error: {str(e)}")
        raise HTTPException(status_code=500, detail=str(e))


@app.get("/health")
async def health_check():
    return {"status": "healthy", "model_loaded": model is not None}


if __name__ == "__main__":
    uvicorn.run(app, host="0.0.0.0", port=8000)
Start the API service by running: python mlx_api.py
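Once the service is running, it can be exercised from any HTTP client before wiring it into a workflow. Here is a minimal sketch using only the Python standard library. The endpoint path, port, and JSON field names follow the service code in this article; the helper names build_generate_request and ask, the default max_tokens of 500, and the 300-second timeout are assumptions made for illustration.

```python
import json
import urllib.request


def build_generate_request(
    prompt: str,
    base_url: str = "http://localhost:8000",
    max_tokens: int = 500,
    remove_thinking: bool = True,
) -> urllib.request.Request:
    """Build a POST request for the /generate endpoint without sending it."""
    body = {
        "prompt": prompt,
        "max_tokens": max_tokens,
        "remove_thinking": remove_thinking,
    }
    return urllib.request.Request(
        url=f"{base_url}/generate",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(prompt: str) -> str:
    """Send the prompt to the local service and return the generated text."""
    # Local generation can be slow, so the timeout is generous (an assumed value).
    with urllib.request.urlopen(build_generate_request(prompt), timeout=300) as resp:
        return json.loads(resp.read().decode("utf-8"))["text"]
```

Calling ask("hello") requires the service from step 6 to be running; build_generate_request on its own only constructs the request, which makes the payload easy to inspect.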

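7. In an n8n workflow, use an HTTP Request node to call the mlx-qwen3 API service started above (n8n workflows were covered in an earlier article: [37/100 AI App Trials] n8n: Quickly Putting Together an AI Agent)

If you are not sure how to set the HTTP request parameters, ask a large language model (I used Claude 3.7 Sonnet here).

Open the HTTP Request node in the n8n workflow (I renamed mine mlx-qwen-local) and enter the suggested parameters into the corresponding fields:

Build the simplest possible n8n workflow to check that the API endpoint works:

Test it:

The mlx-qwen deployment is complete, and the model is now wrapped as a local API service that local workflows can call.

For work that involves private personal or corporate data, this has real practical value.

The 39th project is deployed and tried out!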

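About the [100 AI App Trials] column: a quick first look at currently popular AI projects (mostly open-source ones, since they are more fun to tinker with and more transparent about how they work, which helps when learning AI). I try each project hands-on without aiming for deep understanding. By trying things quickly, taking notes along the way, and publishing the result as a WeChat official account article, I complete an "input, process, output" loop and gain a first understanding of, and some practice with, today's fast-moving AI projects. The point of the column is to set a measurable goal that forces me to learn, try things, and write. Artificial intelligence has gone through decades of ups and downs; I believe this wave, led by large language models, will profoundly change human life, and I will try to look ten years ahead as I think and act on it.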
