MLX on Apple Silicon as local AI: a comparison with Ollama & Co.

At a time when centralized AI services such as ChatGPT, Claude or Gemini are dominating the headlines, many professional users are increasingly looking for an alternative - a local, self-controllable AI infrastructure. Especially for creative processes, sensitive data or recurring workflows, a local solution is often the more sustainable and secure option.

Anyone working with a Mac - especially with Apple Silicon (M1, M2, M3 or M4) - can now find amazingly powerful tools to run their own language models directly on the device. At the center of this is a new, largely unknown component: MLX, a machine learning framework developed by Apple that is likely to play an increasingly central role in the company's AI ecosystem in the coming years.


What is MLX - and what does the new format stand for?

MLX is an open source framework from Apple for machine learning that is specially tailored to the hardware architecture of Apple Silicon. In contrast to other AI backends such as PyTorch or TensorFlow, MLX directly utilizes the advantages of Apple's so-called "unified memory" - i.e. the shared access of CPU and GPU to the same RAM area. This ensures significantly more efficient processing of data and models - especially for large language models, which can comprise several gigabytes.
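
To give an impression of the framework itself, here is a tiny sketch of the MLX Python API (package mlx) - just an elementwise calculation as an illustration, with no explicit copying of data between CPU and GPU:

import mlx.core as mx

# Arrays live in unified memory, so CPU and GPU work on the same data
a = mx.array([1.0, 2.0, 3.0])
b = mx.array([4.0, 5.0, 6.0])

# MLX evaluates lazily; mx.eval triggers the actual computation
c = a * b + 1.0
mx.eval(c)
print(c)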

The associated MLX format typically describes models whose weights are stored in a compressed .npz file format (NumPy Zip). Models such as Mistral, Phi-2 or LLaMA 3 can be converted into this format using appropriate tools and run directly on a Mac - without a cloud, without an API, without restrictions.

The current situation: What Apple already offers

With the announcement of Apple Intelligence in 2024, Apple has started to integrate system-wide AI functions directly into the operating system. Writing assistants, image processing, semantic search, intelligent mail functions - much of this runs completely locally, especially on devices with an M1 or newer chip. However, none of the new functions are available on older Intel Macs.

At the same time, Apple has further developed the MLX framework and published it under an open license. In combination with tools such as mlx-lm or the new MLX Swift API, it is already possible to run text models locally, set up your own workflows or train models - directly on your own Mac, without data leaving the device.

Professional users in particular - for example from the fields of software development, publishing, marketing or research - can benefit greatly from this, as MLX gives them completely new opportunities to integrate AI models into their workflows without having to rely on external providers.

How MLX works in practice

If you want to use MLX today, all you need is a terminal, Python (ideally in a separate virtual environment) and the mlx-lm package, which bundles all the necessary functions: Model download, quantization, inference and chat. After installation, ready-made models from the Hugging Face community can be started - for example:

mlx_lm.chat --model mlx-community/Mistral-7B-Instruct-v0.3-4bit
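
If mlx-lm is not yet installed, the setup takes only a few commands - a minimal sketch, assuming a separate virtual environment (the name mlx-env is just an example):

python3 -m venv mlx-env
source mlx-env/bin/activate
pip install mlx-lm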

Alternatively, you can also access the API using a Python script. The models are automatically loaded, cached and executed locally - without an internet connection after the initial download.
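
A minimal Python sketch of this route, using the load and generate functions from mlx-lm (the prompt and parameters are only examples):

from mlx_lm import load, generate

# Load the quantized model (downloaded once, then cached locally)
model, tokenizer = load("mlx-community/Mistral-7B-Instruct-v0.3-4bit")

# Build a chat-style prompt via the model's chat template
messages = [{"role": "user", "content": "Explain unified memory in two sentences."}]
prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True)

# Generate the answer entirely on-device
print(generate(model, tokenizer, prompt=prompt, max_tokens=200))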

It is also easy to convert your own models. With a single command, you can download models from the Hugging Face Hub, quantize them and make them available for MLX:

python -m mlx_lm.convert --hf-path meta-llama/Llama-3-8B-Instruct -q

The resulting .npz files can then be permanently saved locally and reused.
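
Afterwards, the converted model can be used just like the ready-made ones - a sketch, assuming the conversion wrote its output to a local folder named mlx_model:

mlx_lm.chat --model ./mlx_model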

The comparison: MLX vs. Ollama, Llama.cpp and LM Studio

In addition to MLX, there are several established alternatives for local AI use on the Mac - above all Ollama, Llama.cpp and LM Studio. Each of these tools has its strengths, but also specific limitations.

Ollama

Ollama is particularly popular with developers because it offers a simple command line and REST API. The models are available here in the so-called GGUF format, an optimized file format for fast execution on local machines. Ollama is quick to set up, flexible and supports a wide range of models. However, Ollama does not currently use the MLX engine on the Mac, but primarily relies on a Metal-based backend via llama.cpp.

For workflows that require automation or headless operation (e.g. processes running in the background), Ollama is currently the first choice. However, if you want to use Apple's own optimizations, you will have to wait for future MLX integrations.
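
Such a headless workflow can be sketched in a few lines of Python against Ollama's local REST API (port 11434 is Ollama's default; the model name llama3 is only an example and has to be pulled beforehand):

import json
import urllib.request

# Send a prompt to the locally running Ollama server and print the full answer
payload = {"model": "llama3", "prompt": "Summarize unified memory in one sentence.", "stream": False}
req = urllib.request.Request(
    "http://localhost:11434/api/generate",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as response:
    print(json.loads(response.read())["response"])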

Llama.cpp

This project forms the basis for many other tools (including Ollama) and offers a very high-performance inference engine for GGUF models. It is extremely flexible, but not always easy to use or operate - especially for beginners. The big advantage: there is a huge community, many extensions and stable development.

LM Studio

Anyone looking for a graphical user interface usually ends up with LM Studio. The tool combines the download, administration and execution of language models in a lean, Mac-native app - including a chat interface, configuration and model management. The highlight: LM Studio has also supported the MLX engine for a few months now, allowing you to take full advantage of Apple's optimizations on an M1 or M2 Mac - and with significantly lower RAM consumption than comparable tools.

LM Studio is the ideal entry point into the world of local AI, especially for users who don't want to bother with terminal commands - and in combination with MLX, it is a real high performer.

When MLX on Silicon is the better choice

While GGUF-based solutions (Ollama, Llama.cpp) are very flexible and run on many platforms, MLX scores with its deep integration into the Apple world. Particularly noteworthy are:

  • Efficient memory utilization through unified memory
  • Optimization for Metal/GPU without complex configuration
  • Seamless integration into Swift projects and Apple frameworks
  • Future-proof, as Apple is actively developing the framework further
  • Expandability, e.g. through your own models, fine-tuning and system integration

For Mac users who want to plan for the long term and retain full control over their data, MLX is already a promising entry into the world of local AI - with the potential to become the standard in the future.

An outlook: Where Apple wants to go with MLX and Apple Intelligence

With WWDC 2025, Apple has clearly signaled that MLX is not a gimmick, but a strategic component in the growing Apple ecosystem around AI. The integration of the new "Foundation Models" directly into macOS and iOS, native Swift support and the further development of MLX in the direction of training, quantization and inference clearly show that Apple wants to get involved - but in its own way.

In doing so, Apple remains true to its line: no spectacular promises, but solid, locally functioning technology that proves itself in the long term. For professional users, this is not only appealing, but also strategically highly interesting.

MLX is well on the way to becoming the standard solution for local AI on the Mac. Those who are already working with it today are gaining a valuable head start - whether for creative, technical or analytical applications. In combination with tools such as mlx-lm, LM Studio or the new Swift API, a robust, reliable and future-proof AI environment can be created - in line with the controlled, data-sovereign way of working that will become increasingly important in the future.


Using MLX on the Mac - simple instructions for beginners

With MLX, Apple has created a new system that allows you to use artificial intelligence (AI) directly on your own Mac - without an internet connection, without the cloud, without dependence on Google or OpenAI. The great thing about it: if you have a Mac with an M1, M2, M3 or M4 processor (i.e. Apple Silicon), you can try MLX in just a few steps. Everything runs locally - your texts, questions and data never leave your computer.

Here I explain step by step how to download and use a so-called language model with MLX. It sounds technical - but you'll see that it's easy to do.

Step 1: Check requirements

First you need:

  • An Apple Silicon Mac (M1 or newer). You can check this in the system settings under "About this Mac".
  • macOS 13 (Ventura) or newer.
  • A working internet connection - only for downloading the model, after that everything runs offline.
  • Some storage space, at least approx. 8-10 GB for a small model.

You also need the program called "Terminal", which comes preinstalled on every Mac. We use it to enter a few commands. You can find it under "Applications/Utilities", or simply open the Spotlight search, type "Terminal" and confirm with "Enter". Don't worry - you just need to copy and paste.

Step 2: Install Python (only if necessary)

MLX works with the Python programming language. Many Macs already have Python installed. You can check whether it is available by entering the following in the terminal:

python3 --version

If you get a version number (e.g. Python 3.10.6), you can continue directly.

If not, I recommend installing it via Homebrew (a popular tool for installing programs on the Mac). To do this, enter the following in the terminal:

/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"

Then install Python with:

brew install python

Step 3: Install the MLX tool

Now we download the MLX tool with which you will later use the language model. To do this, install a small program called mlx-lm by entering the following in the terminal:

pip3 install mlx-lm

This will take a few seconds. When it is finished, you are ready to load a model.

Step 4: Download and start a model

Now comes the exciting part: you get a real language model onto your Mac - for example a version of Mistral, a very powerful, freely available AI model. Simply enter the following into the terminal:

mlx_lm.chat --model mlx-community/Mistral-7B-Instruct-v0.3-4bit

This command does three things:

  • The model is downloaded automatically (once).
  • It is prepared and started.
  • You end up in a chat window in the terminal where you can ask questions - similar to ChatGPT.

When the download is complete (may take a few minutes depending on your internet speed), you will see a flashing cursor. You can now write, for example:

Tell me something about the history of Venice.

...and the model responds directly - completely offline.

Step 5: Continue working with the model

When you are finished, you can end the chat by typing exit or closing the window. Later you can reuse the same model without downloading it again by simply entering the same command again. The model is now stored locally on your Mac and remains there.

If you would like to try out different models, you can do so via Hugging Face or change the model name directly in the terminal command - e.g.:

mlx_lm.chat --model mlx-community/Phi-2-4bit

Each model has a different style - some are factual, others are more creative or dialog-oriented.

Even easier? Use LM Studio as an interface

If you prefer to work with a mouse and window, you can also try the LM Studio program. It has a nice interface, supports MLX (on Apple Silicon) and allows you to download and use models with a click.

You can get LM Studio here:

👉 https://lmstudio.ai/

After installation, you can select "MLX" as the engine in the settings - the program then uses the same technology as above, but in a pretty window with a chat field.

You've done it - you can now use a modern AI completely locally on your Mac, without any cloud or subscription. Apple MLX makes it possible to run language models efficiently, securely and in a privacy-friendly way.

If you want to go even deeper later - for example, train your own models, improve them with your texts or integrate them into your own software (such as FileMaker) - then MLX is the right choice for you. But the first step is done: you have control back - and a powerful AI directly on your computer.


Frequently asked questions

  1. What exactly is MLX - and how does it differ from PyTorch or TensorFlow?
    MLX is a machine learning framework developed by Apple and specifically optimized for Apple Silicon (M1-M4). Unlike PyTorch or TensorFlow, which target many platforms, MLX specifically exploits the architecture of Apple chips - e.g. the shared memory structure (unified memory) and Metal GPU acceleration. This makes it more memory-efficient and faster on Macs - but only on Apple hardware.
  2. Why should you choose MLX over a tool like Ollama or Llama.cpp?
    MLX has an advantage if you are working specifically on Apple Silicon and want to get the maximum performance out of the device. Ollama and Llama.cpp are very flexible, but often run less efficiently on the Mac. MLX can also be integrated directly into Swift projects - ideal for developers building applications close to Apple. It is not a competitor to Ollama - but a specialized tool for professionals.
  3. Which models are compatible with MLX?
    Many open language models are compatible - such as Mistral, LLaMA 2 and 3, Phi-2 or TinyLLaMA - which are either already converted or can be converted using the mlx_lm.convert tool. It is important that they are available in NumPy Zip format (.npz) and are prepared for MLX. There is now a separate section on Hugging Face for MLX-compatible models.
  4. How easy is it to get started? Do I have to be a developer?
    A little technical understanding is helpful - e.g. for the terminal, Python environments or model names. But getting started is relatively easy thanks to mlx-lm: one installation command, one command to start, done. If you prefer to work with a user interface, you can use LM Studio - it now also supports MLX on the Mac.
  5. Can I also train MLX for my own projects - e.g. with my own texts?
    Yes, you can - but the training is currently intended more for advanced users. Most users use MLX models for inference (i.e. for answering, text generation, etc.). For training or fine-tuning, you need to be familiar with LoRA, data formats (JSONL) and memory requirements - or use tools such as FileMaker 2025, which simplify this process.
  6. What about security and data protection at MLX?
    Very good - because MLX runs completely locally. All data, inputs and model responses remain on your own computer. There is no cloud transfer, no external API - ideal for data-sensitive projects, internal documents, protected customer data or confidential notes.
  7. What role does Apple itself play in this? Will MLX be developed further?
    Apple has published MLX under an open license and is actively developing it further - especially in connection with Apple Intelligence, the AI system for macOS, iOS and iPadOS. At WWDC 2025, MLX was presented as the official framework for integrating custom language models into Apple software. It can be assumed that MLX will continue to gain importance in the Apple world.
  8. Can I also combine MLX with other tools, e.g. Neo4j, n8n or FileMaker?
    Yes - MLX is a pure ML framework, but it can be connected to other tools via REST APIs, custom Python services or local wrappers. For example, you can integrate it into your own automation (n8n), a semantic database (Neo4j) or FileMaker solutions - the latter is now even available natively with FileMaker 2025, as sketched below.
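
One possible route, as a minimal sketch: mlx-lm ships a small local server (mlx_lm.server) that exposes an OpenAI-compatible endpoint, which other tools can call like any HTTP API (port 8080 is assumed here; the model name is only an example):

mlx_lm.server --model mlx-community/Mistral-7B-Instruct-v0.3-4bit

Any other tool or script can then talk to it, for example from Python:

import json
import urllib.request

# Ask the locally running mlx_lm.server a question, e.g. from an n8n node or a FileMaker script step
payload = {
    "model": "mlx-community/Mistral-7B-Instruct-v0.3-4bit",
    "messages": [{"role": "user", "content": "Suggest three tags for a note about Venice."}],
}
req = urllib.request.Request(
    "http://localhost:8080/v1/chat/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as response:
    print(json.loads(response.read())["choices"][0]["message"]["content"])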
