Ollama meets Qdrant: A local memory for your AI on the Mac

Local AI with memory - no cloud, no subscription, no detours

In a previous article I explained how to install and configure Ollama on the Mac. If you have already completed this step, you now have a powerful local language model - such as Mistral, LLaMA 3 or another compatible model - that can be addressed via REST API.

However, on its own the model only "knows" what is in the current prompt. It does not remember previous conversations. What is missing is a memory.

This is exactly why we use Qdrant, a modern semantic vector database.
In this article I will show you step by step:

  • how to install Qdrant on the Mac (via Docker)
  • how to create embeddings with Python
  • how to save, search and integrate content into the Ollama workflow
  • and what a complete prompt → memory → response sequence looks like

Why Qdrant?

Qdrant does not store plain text in the traditional sense, but vectors - numerical representations of a text's meaning. This means content can be found not only by exact match, but also by semantic similarity - even if the wording varies.

Combining Ollama and Qdrant therefore gives you:

A local language model with long-term memory - secure, controllable and expandable.
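To see what "semantically similar" means in practice, here is a minimal sketch (it uses the sentence-transformers package installed in the Prerequisites below) that compares two sentences with hardly any shared keywords:

from sentence_transformers import SentenceTransformer, util

embedder = SentenceTransformer("all-MiniLM-L6-v2")

# Hardly any shared keywords, but very similar meaning
a = embedder.encode("How do I restart the database server?")
b = embedder.encode("What is the procedure for rebooting the DB machine?")

# Cosine similarity: values close to 1.0 mean "semantically similar"
print(util.cos_sim(a, b))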

Prerequisites

Install the required Python packages:

pip install qdrant-client sentence-transformers

Start Qdrant (Docker)

docker run -p 6333:6333 -p 6334:6334 qdrant/qdrant

Qdrant is then reachable at:

http://localhost:6333 (REST API)

http://localhost:6334 (gRPC, not required for this article)

Qdrant running in Docker on macOS
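Optional: with the plain docker run command above, the stored vectors do not survive removal of the container. If the memory should persist, you can mount a local directory into Qdrant's storage path:

docker run -p 6333:6333 -p 6334:6334 \
  -v $(pwd)/qdrant_storage:/qdrant/storage \
  qdrant/qdrant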

Python example for Ollama + Qdrant

We now write a simple basic script that:

  • accepts the user prompt
  • generates an embedding vector from it
  • searches Qdrant for semantically similar memories
  • generates the response with context via Ollama
  • saves the new conversation as a memory

Python script: ollama_memory.py

import time
import requests
from sentence_transformers import SentenceTransformer
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, VectorParams, PointStruct

# Settings
OLLAMA_URL = "http://localhost:11434/api/generate"
COLLECTION_NAME = "memory"
VECTOR_SIZE = 384  # for 'all-MiniLM-L6-v2'

# Load the embedding model
embedder = SentenceTransformer("all-MiniLM-L6-v2")

# Connect to Qdrant
qdrant = QdrantClient(host="localhost", port=6333)

# Create the collection (once)
def create_collection():
    existing = [c.name for c in qdrant.get_collections().collections]
    if COLLECTION_NAME not in existing:
        qdrant.recreate_collection(
            collection_name=COLLECTION_NAME,
            vectors_config=VectorParams(size=VECTOR_SIZE, distance=Distance.COSINE)
        )

# Add an entry to the memory
def add_to_memory(text: str):
    vector = embedder.encode(text).tolist()
    # hash() can be negative, but Qdrant expects an unsigned integer (or UUID) as ID;
    # the Unix timestamp in the payload is used later for time-based filters
    point = PointStruct(
        id=abs(hash(text)),
        vector=vector,
        payload={"text": text, "timestamp": time.time()}
    )
    qdrant.upsert(collection_name=COLLECTION_NAME, points=[point])

# Search the memory
def search_memory(query: str, top_k=3):
    vector = embedder.encode(query).tolist()
    hits = qdrant.search(
        collection_name=COLLECTION_NAME,
        query_vector=vector,
        limit=top_k
    )
    return [hit.payload["text"] for hit in hits]

# Send the request to Ollama
def query_ollama(context: list[str], user_prompt: str):
    prompt = "\n\n".join(context + [user_prompt])
    response = requests.post(OLLAMA_URL, json={
        "model": "mistral",
        "prompt": prompt,
        "stream": False
    })
    return response.json()["response"]

# Main flow
def main():
    create_collection()
    print("Ask the AI:")
    user_prompt = input("> ")

    context = search_memory(user_prompt)
    answer = query_ollama(context, user_prompt)

    print("\nAnswer from Ollama:")
    print(answer.strip())

    # Save the conversation
    full_entry = f"Question: {user_prompt}\nAnswer: {answer.strip()}"
    add_to_memory(full_entry)

if __name__ == "__main__":
    main()
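
To try it out, Qdrant (see above) and Ollama must both be running, and the mistral model must be available locally:

ollama pull mistral
python3 ollama_memory.py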

Notes on practice

You can also use other embedding models, e.g. via Ollama (such as nomic-embed-text) or Hugging Face models.
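
A drop-in replacement for the embedder based on Ollama's embeddings endpoint could look roughly like this (a sketch; it assumes nomic-embed-text has been pulled via ollama pull nomic-embed-text, and note that its vectors have 768 dimensions, so VECTOR_SIZE and the collection must be adjusted accordingly):

import requests

def embed_with_ollama(text: str) -> list[float]:
    # Ollama's embeddings endpoint returns a JSON object with an "embedding" list
    response = requests.post(
        "http://localhost:11434/api/embeddings",
        json={"model": "nomic-embed-text", "prompt": text},
    )
    return response.json()["embedding"]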

Qdrant supports payload filters, time periods and fields (very useful for later expansion!)

The hash(text) ID is sufficient for simple tests; for professional applications you should use UUIDs.
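
A possible variant (a sketch, not part of the script above): derive a deterministic UUID from the text, so identical entries always map to the same point ID and are simply overwritten on re-insert:

import uuid

def add_to_memory(text: str):
    vector = embedder.encode(text).tolist()
    # Deterministic UUID v5: the same text always yields the same point ID
    point_id = str(uuid.uuid5(uuid.NAMESPACE_URL, text))
    point = PointStruct(
        id=point_id,
        vector=vector,
        payload={"text": text, "timestamp": time.time()}
    )
    qdrant.upsert(collection_name=COLLECTION_NAME, points=[point])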

Local AI with memory - and what you can do with it

In the previous chapters, I showed you how to build a real, local AI memory on a Mac with Ollama and Qdrant. A setup that works without the cloud, without a subscription and without external servers - fast, secure, private.

But what now?

What can this technology actually be used for? What is possible with it - today, tomorrow, the day after tomorrow?

The answer: quite a lot.

Because what you have here is more than just a chatbot. It's a platform-independent thinking machine with a long-term memory. And that opens doors.

🔍 1. Personal knowledge database

You can use Ollama + Qdrant as your personal long-term memory.
Documents, notes from conversations, ideas - everything you tell it can be semantically stored and retrieved.

Example:

"What was my business idea from last Thursday again?"

"Which customers wanted an upgrade in March?"

Instead of searching through folders, you simply ask your system. What's particularly exciting is that it also works with imprecise questions because Qdrant searches semantically, not just for keywords.

📄 2. Automatic logging and summary

In combination with audio or text input, the system can keep a running log:

  • Notes in meetings
  • Calls with customers
  • Daily logs or project histories

This data is automatically fed into the Qdrant memory and can therefore be queried later like an assistant:

"What did Mr. Meier say again about the delivery?"

"What was the process like in project XY?"

🧠 3. Personal coach or diary assistant

By regularly jotting down thoughts, moods or decisions, you can create a reflective companion:

"What was my biggest progress this month?"

"How did I react to setbacks back then?"

The system gets to know you over time - and becomes a real mirror, not just a chatbot.

💼 4. Business applications with FileMaker

If you - like me - use FileMaker, you can connect this setup directly:

  • Send prompts from FileMaker
  • Automatically retrieve and save answers
  • Control memory access directly via REST API or shell script

This creates an extremely powerful combination:

  • FileMaker = Front end, user interface, control center
  • Ollama = Language intelligence
  • Qdrant = semantic long-term memory

The result: a genuine AI component for FileMaker solutions, local, secure, individual.

🛠️ 5. Support in everyday life: reminders, ideas, recommendations

"Remind me of this idea next week"

"What books have I already recommended to you?"

"What could I offer Mr. Müller next?"

With dedicated memory logic (timestamps, categories, users), you can structure your memory in a targeted way and use it for many areas of life and business.

🤖 6. Basis for an agent system

If you think ahead, you can also build agent-like systems with this setup:

  • AI takes over simple tasks
  • AI recognizes patterns over time
  • AI gives proactive hints

Example:

"You've asked the same question four times this week - do you want to save a note?"

"A striking number of customers have mentioned this product - shall I summarize that for you?"

🌐 7. Integration with other tools

The system can be easily linked with other tools:

  • Neo4j, to graphically depict semantic relationships
  • Files & PDFs, to index content automatically
  • Mail parsers, to analyze and memorize emails
  • Voice assistants, to interact via voice

🔐 8. Everything remains local - and under control

The biggest advantage: you decide what is saved. You decide how long it stays saved. And: it never leaves your computer if you don't want it to. In a world where many people blindly rely on cloud AI, this is a powerful counterbalance - especially for freelancers, developers, authors and entrepreneurs.

Tame Ollama + Qdrant: How to give your local AI structure, rules and fine-tuning

Anyone who has taken the trouble to install Ollama and Qdrant locally on the Mac has already achieved great things. You now have:

  • A local language AI
  • A semantic memory
  • And a working pipeline that maps Prompt → Memory → Ollama → Response

But anyone who works with it quickly realizes: It needs rules. Structure. Order.
Because without control, your assistant quickly becomes a chatterbox that remembers too much, constantly repeats itself or pulls up irrelevant memories.

🧭 What's still missing?

An orchestra also has a conductor. And that's exactly your job now: to control instead of just use.

Module 1: A "router" for memory logic

Instead of blindly saving everything or searching through everything, you should decide in advance whether something should be saved or loaded at all. You can do this, for example, with a simple relevance router that you place between the prompt and the memory:

Example: Check relevance via a prompt to Ollama itself

def is_relevant_for_memory(prompt, response):
    prüf_prompt = f"""
The user asked: "{prompt}"
The AI answered: "{response}"
Should this dialog be remembered long-term? Answer only with 'Yes' or 'No'.
"""
    result = query_ollama([], prüf_prompt).strip().lower()
    return result.startswith("yes")

So you give Ollama the task of evaluating its answer - and only if it is classified as relevant do you save it in Qdrant.
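
Wired into main() from the basic script, only the saving step changes - a sketch:

# ... inside main(), after the answer has been printed:
full_entry = f"Question: {user_prompt}\nAnswer: {answer.strip()}"

# Only persist the dialog if Ollama itself rates it as worth remembering
if is_relevant_for_memory(user_prompt, answer):
    add_to_memory(full_entry)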

Module 2: Exclude older messages (context limitation)

With longer sessions in particular, it becomes problematic if old messages keep reappearing in context. The model does not forget - it gets bogged down.

Solution: Limit the context window.

You can do this in two ways:

Method 1: Limit the number of hits

context = search_memory(user_prompt, top_k=3)

Only what is semantically relevant is loaded here - not everything.

Method 2: Limit by time

# Only messages from the last 7 days
# (uses the numeric "timestamp" payload field stored by add_to_memory above)
from datetime import datetime, timedelta
from qdrant_client.models import Filter, FieldCondition, Range

cutoff = (datetime.utcnow() - timedelta(days=7)).timestamp()
time_filter = Filter(
    must=[
        FieldCondition(key="timestamp", range=Range(gte=cutoff))
    ]
)

You can therefore "cut off" the time if the system reaches too far into the past.
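
The filter is then passed to the search call via the query_filter parameter - for example in a time-aware variant of search_memory (a sketch using the imports above):

def search_recent_memory(query: str, days: int = 7, top_k: int = 3):
    vector = embedder.encode(query).tolist()
    cutoff = (datetime.utcnow() - timedelta(days=days)).timestamp()
    time_filter = Filter(
        must=[FieldCondition(key="timestamp", range=Range(gte=cutoff))]
    )
    hits = qdrant.search(
        collection_name=COLLECTION_NAME,
        query_vector=vector,
        query_filter=time_filter,  # only consider entries newer than the cutoff
        limit=top_k
    )
    return [hit.payload["text"] for hit in hits]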

Module 3: Introducing context weights and labels

Not every entry in your memory is of equal value. You can give them weight or categories:

  • Fixed (e.g. "User is called Markus")
  • Temporary (e.g. "Today is Tuesday")
  • Situational (e.g. "Chat from today 10:30 am")

Qdrant supports so-called payloads - i.e. additional information per entry. This allows you to filter or prioritize later.
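
A sketch of how that could look - the "category" field and its values are illustrative, not part of the basic script:

import time
import uuid
from qdrant_client.models import Filter, FieldCondition, MatchValue

# Store an entry with an extra category field in the payload
text = "The user is called Markus"
point = PointStruct(
    id=str(uuid.uuid4()),
    vector=embedder.encode(text).tolist(),
    payload={"text": text, "timestamp": time.time(), "category": "fixed"}
)
qdrant.upsert(collection_name=COLLECTION_NAME, points=[point])

# Later: restrict a search to "fixed" facts only
fixed_only = Filter(
    must=[FieldCondition(key="category", match=MatchValue(value="fixed"))]
)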

Module 4: Fine-tuning via the prompt

The prompt itself is a powerful control unit.
Here are a few tricks you can use to make Ollama smarter:

Example prompt with instructions:

You are a local assistant with a semantic memory. If you find several memories, only use the three most relevant ones. Do not refer to information older than 10 days unless it is explicitly marked. Ignore trivial reminders such as "Good morning" or "Thank you". Answer precisely and in the style of an experienced consultant.

This allows you to carry out fine-tuning directly in the prompt itself - without new models, without training.

And: You can generate the prompt dynamically - depending on the situation.
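
A small sketch of how such a dynamic prompt could be assembled in the script, replacing the plain join in query_ollama (the instruction text is only an example):

def build_prompt(context: list[str], user_prompt: str) -> str:
    # The instruction block can be generated dynamically, e.g. per task or time of day
    instructions = (
        "You are a local assistant with a semantic memory. "
        "Use at most the three most relevant memories and ignore trivial ones. "
        "Answer precisely and in the style of an experienced consultant."
    )
    memories = "\n".join(f"- {m}" for m in context) if context else "(no relevant memories)"
    return f"{instructions}\n\nRelevant memories:\n{memories}\n\nQuestion: {user_prompt}"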

Module 5: Storage hygiene

As the memory grows, it becomes confusing.
A simple maintenance script that deletes irrelevant or duplicate content is worth its weight in gold.

Example:

"Forget everything to do with 'weather'."

"Delete entries that are older than 3 months and have never been retrieved."

Qdrant supports this via API - and you can automate it once a week, for example.
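
A sketch of such a maintenance step with the Python client - here deleting everything older than roughly three months, again assuming the numeric "timestamp" payload field from the basic script:

from datetime import datetime, timedelta
from qdrant_client.models import Filter, FieldCondition, Range, FilterSelector

# Remove all points whose timestamp is older than ~3 months
cutoff = (datetime.utcnow() - timedelta(days=90)).timestamp()
qdrant.delete(
    collection_name=COLLECTION_NAME,
    points_selector=FilterSelector(
        filter=Filter(
            must=[FieldCondition(key="timestamp", range=Range(lt=cutoff))]
        )
    )
)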

Module 6: FileMaker as control panel

If you - like me - work with FileMaker, you can control all of this remotely via REST API:

  • Send prompts
  • Retrieve context
  • Receive answers
  • Have an evaluation carried out
  • Save or forget

All you need is a small REST module in FileMaker (Insert from URL with JSON) and a few scripts.

The result: an interface that lets you control your AI like a living notebook - but with intelligence.

🔚 Conclusion: AI is only as good as its leadership

Ollama is powerful. Qdrant is flexible. But without clear rules, both become an unstructured pile of data. The trick is not to store everything - but to keep only what is relevant available and to think in a targeted way instead of just remembering.
