Ollama meets Qdrant: A local memory for your AI on the Mac

Local AI with memory - no cloud, no subscription, no detours

In a previous article I explained how to install and configure Ollama on the Mac. If you have already completed this step, you now have a powerful local language model - such as Mistral, LLaMA 3 or another compatible model - that can be addressed via REST API.

However, on its own the model only "knows" what is in the current prompt. It does not remember previous conversations. What is missing is a memory.

This is exactly why we use Qdrant, a modern semantic vector database.
In this article I will show you step by step:

  • how to install Qdrant on the Mac (via Docker)
  • how to create embeddings with Python
  • how to save, search and integrate content into the Ollama workflow
  • and what a complete prompt → memory → response sequence looks like

Why Qdrant?

Qdrant does not store plain text in the traditional sense, but vectors - numerical representations of a text's meaning. This means content can be found not only by exact match, but also by semantic similarity - even if the wording varies.

Combining Ollama and Qdrant therefore gives you:

A local language model with long-term memory - secure, controllable and expandable.
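To see what "semantically similar" means in practice, here is a minimal sketch (it uses the sentence-transformers package installed in the Prerequisites below) that compares two sentences with hardly any shared keywords:

from sentence_transformers import SentenceTransformer, util

embedder = SentenceTransformer("all-MiniLM-L6-v2")

# Hardly any shared keywords, but very similar meaning
a = embedder.encode("How do I restart the database server?")
b = embedder.encode("What is the procedure for rebooting the DB machine?")

# Cosine similarity: values close to 1.0 mean "semantically similar"
print(util.cos_sim(a, b))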

Prerequisites

Install the required Python packages:

pip install qdrant-client sentence-transformers

Start Qdrant (Docker)

docker run -p 6333:6333 -p 6334:6334 qdrant/qdrant

Qdrant is then reachable at:

http://localhost:6333 (REST API)

http://localhost:6334 (gRPC, not required for this article)

Qdrant running in Docker on macOS
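Optional: with the plain docker run command above, the stored vectors do not survive removal of the container. If the memory should persist, you can mount a local directory into Qdrant's storage path:

docker run -p 6333:6333 -p 6334:6334 \
  -v $(pwd)/qdrant_storage:/qdrant/storage \
  qdrant/qdrant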

Python example for Ollama + Qdrant

We now write a simple basic script that:

  • accepts the user prompt
  • generates an embedding vector from it
  • searches Qdrant for semantically similar memories
  • generates the response with context via Ollama
  • saves the new conversation as a memory

Python script: ollama_memory.py

import time
import requests
from sentence_transformers import SentenceTransformer
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, VectorParams, PointStruct

# Settings
OLLAMA_URL = "http://localhost:11434/api/generate"
COLLECTION_NAME = "memory"
VECTOR_SIZE = 384  # for 'all-MiniLM-L6-v2'

# Load the embedding model
embedder = SentenceTransformer("all-MiniLM-L6-v2")

# Connect to Qdrant
qdrant = QdrantClient(host="localhost", port=6333)

# Create the collection (once)
def create_collection():
    existing = [c.name for c in qdrant.get_collections().collections]
    if COLLECTION_NAME not in existing:
        qdrant.recreate_collection(
            collection_name=COLLECTION_NAME,
            vectors_config=VectorParams(size=VECTOR_SIZE, distance=Distance.COSINE)
        )

# Add an entry to the memory
def add_to_memory(text: str):
    vector = embedder.encode(text).tolist()
    # hash() can be negative, but Qdrant expects an unsigned integer (or UUID) as ID;
    # the Unix timestamp in the payload is used later for time-based filters
    point = PointStruct(
        id=abs(hash(text)),
        vector=vector,
        payload={"text": text, "timestamp": time.time()}
    )
    qdrant.upsert(collection_name=COLLECTION_NAME, points=[point])

# Search the memory
def search_memory(query: str, top_k=3):
    vector = embedder.encode(query).tolist()
    hits = qdrant.search(
        collection_name=COLLECTION_NAME,
        query_vector=vector,
        limit=top_k
    )
    return [hit.payload["text"] for hit in hits]

# Send the request to Ollama
def query_ollama(context: list[str], user_prompt: str):
    prompt = "\n\n".join(context + [user_prompt])
    response = requests.post(OLLAMA_URL, json={
        "model": "mistral",
        "prompt": prompt,
        "stream": False
    })
    return response.json()["response"]

# Main flow
def main():
    create_collection()
    print("Ask the AI:")
    user_prompt = input("> ")

    context = search_memory(user_prompt)
    answer = query_ollama(context, user_prompt)

    print("\nAnswer from Ollama:")
    print(answer.strip())

    # Save the conversation
    full_entry = f"Question: {user_prompt}\nAnswer: {answer.strip()}"
    add_to_memory(full_entry)

if __name__ == "__main__":
    main()
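
To try it out, Qdrant (see above) and Ollama must both be running, and the mistral model must be available locally:

ollama pull mistral
python3 ollama_memory.py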

Notes on practice

You can also use other embedding models, e.g. via Ollama (such as nomic-embed-text) or Hugging Face models.
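
A drop-in replacement for the embedder based on Ollama's embeddings endpoint could look roughly like this (a sketch; it assumes nomic-embed-text has been pulled via ollama pull nomic-embed-text, and note that its vectors have 768 dimensions, so VECTOR_SIZE and the collection must be adjusted accordingly):

import requests

def embed_with_ollama(text: str) -> list[float]:
    # Ollama's embeddings endpoint returns a JSON object with an "embedding" list
    response = requests.post(
        "http://localhost:11434/api/embeddings",
        json={"model": "nomic-embed-text", "prompt": text},
    )
    return response.json()["embedding"]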

Qdrant supports payload filters, time periods and fields (very useful for later expansion!)

The hash(text) ID is sufficient for simple tests; for professional applications you should use UUIDs.
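
A possible variant (a sketch, not part of the script above): derive a deterministic UUID from the text, so identical entries always map to the same point ID and are simply overwritten on re-insert:

import uuid

def add_to_memory(text: str):
    vector = embedder.encode(text).tolist()
    # Deterministic UUID v5: the same text always yields the same point ID
    point_id = str(uuid.uuid5(uuid.NAMESPACE_URL, text))
    point = PointStruct(
        id=point_id,
        vector=vector,
        payload={"text": text, "timestamp": time.time()}
    )
    qdrant.upsert(collection_name=COLLECTION_NAME, points=[point])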

Local AI with memory - and what you can do with it

In the previous chapters, I showed you how to build a real, local AI memory on a Mac with Ollama and Qdrant. A setup that works without the cloud, without a subscription and without external servers - fast, secure, private.

But what now?

What can this technology actually be used for? What is possible with it - today, tomorrow, the day after tomorrow?

The answer: quite a lot.

Because what you have here is more than just a chatbot. It's a platform-independent thinking machine with a long-term memory. And that opens doors.

🔍 1. Personal knowledge database

You can use Ollama + Qdrant as your personal long-term memory.
Documents, notes from conversations, ideas - everything you tell it can be semantically stored and retrieved.

Example:

"What was my business idea from last Thursday again?"

"Which customers wanted an upgrade in March?"

Instead of searching through folders, you simply ask your system. What's particularly exciting is that it also works with imprecise questions because Qdrant searches semantically, not just for keywords.

📄 2. Automatic logging and summary

In combination with audio or text input, the system can keep a running log:

  • Notes in meetings
  • Calls with customers
  • Daily logs or project histories

This data is automatically fed into the Qdrant memory and can therefore be queried later like an assistant:

"What did Mr. Meier say again about the delivery?"

"What was the process like in project XY?"

🧠 3. Personal coach or diary assistant

By regularly jotting down thoughts, moods or decisions, you can create a reflective companion:

"What was my biggest progress this month?"

"How did I react to setbacks back then?"

The system gets to know you over time - and becomes a real mirror, not just a chatbot.

💼 4. Business applications with FileMaker

If you - like me - use FileMaker, you can connect this setup directly:

  • Send prompts from FileMaker
  • Automatically retrieve and save answers
  • Control memory access directly via REST API or shell script

This creates an extremely powerful combination:

  • FileMaker = Front end, user interface, control center
  • Ollama = Language intelligence
  • Qdrant = semantic long-term memory

The result: a genuine AI component for FileMaker solutions, local, secure, individual.

🛠️ 5. Support in everyday life: reminders, ideas, recommendations

"Remind me of this idea next week"

"What books have I already recommended to you?"

"What could I offer Mr. Müller next?"

With dedicated memory logic (timestamps, categories, users), you can structure your memory in a targeted way and use it for many areas of life and business.

🤖 6. Basis for an agent system

If you think ahead, you can also build agent-like systems with this setup:

  • AI takes over simple tasks
  • AI recognizes patterns over time
  • AI gives proactive hints

Example:

"You've asked the same question four times this week - do you want to save a note?"

"A striking number of customers have mentioned this product - shall I summarize that for you?"

🌐 7. Integration with other tools

The system can be easily linked with other tools:

  • Neo4j, to graphically depict semantic relationships
  • Files & PDFs, to index content automatically
  • Mail parsers, to analyze and memorize emails
  • Voice assistants, to interact via voice

🔐 8. Everything remains local - and under control

The biggest advantage: you decide what is saved. You decide how long it stays saved. And: it never leaves your computer if you don't want it to. In a world where many people blindly rely on cloud AI, this is a powerful counterbalance - especially for freelancers, developers, authors and entrepreneurs.

Tame Ollama + Qdrant: How to give your local AI structure, rules and fine-tuning

Anyone who has taken the trouble to install Ollama and Qdrant locally on the Mac has already achieved great things. You now have:

  • A local language AI
  • A semantic memory
  • And a working pipeline that maps Prompt → Memory → Ollama → Response

But anyone who works with it quickly realizes: It needs rules. Structure. Order.
Because without control, your assistant quickly becomes a chatterbox that remembers too much, constantly repeats itself or pulls up irrelevant memories.

🧭 What's still missing?

An orchestra also has a conductor. And that's exactly your job now: to control instead of just use.

Module 1: A "router" for memory logic

Instead of blindly saving everything or searching through everything, you should decide in advance whether something should be saved or loaded at all. You can do this, for example, with a simple relevance router that you place between the prompt and the memory:

Example: Check relevance via a prompt to Ollama itself

def is_relevant_for_memory(prompt, response):
    prüf_prompt = f"""
The user asked: "{prompt}"
The AI answered: "{response}"
Should this dialog be remembered long-term? Answer only with 'Yes' or 'No'.
"""
    result = query_ollama([], prüf_prompt).strip().lower()
    return result.startswith("yes")

So you give Ollama the task of evaluating its answer - and only if it is classified as relevant do you save it in Qdrant.
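
Wired into main() from the basic script, only the saving step changes - a sketch:

# ... inside main(), after the answer has been printed:
full_entry = f"Question: {user_prompt}\nAnswer: {answer.strip()}"

# Only persist the dialog if Ollama itself rates it as worth remembering
if is_relevant_for_memory(user_prompt, answer):
    add_to_memory(full_entry)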

Module 2: Exclude older messages (context limitation)

With longer sessions in particular, it becomes problematic if old messages keep reappearing in context. The model does not forget - it gets bogged down.

Solution: Limit the context window.

You can do this in two ways:

Method 1: Limit the number of hits

context = search_memory(user_prompt, top_k=3)

Only what is semantically relevant is loaded here - not everything.

Method 2: Limit by time

# Only messages from the last 7 days
# (uses the numeric "timestamp" payload field stored by add_to_memory above)
from datetime import datetime, timedelta
from qdrant_client.models import Filter, FieldCondition, Range

cutoff = (datetime.utcnow() - timedelta(days=7)).timestamp()
time_filter = Filter(
    must=[
        FieldCondition(key="timestamp", range=Range(gte=cutoff))
    ]
)

You can therefore "cut off" the time if the system reaches too far into the past.
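
The filter is then passed to the search call via the query_filter parameter - for example in a time-aware variant of search_memory (a sketch using the imports above):

def search_recent_memory(query: str, days: int = 7, top_k: int = 3):
    vector = embedder.encode(query).tolist()
    cutoff = (datetime.utcnow() - timedelta(days=days)).timestamp()
    time_filter = Filter(
        must=[FieldCondition(key="timestamp", range=Range(gte=cutoff))]
    )
    hits = qdrant.search(
        collection_name=COLLECTION_NAME,
        query_vector=vector,
        query_filter=time_filter,  # only consider entries newer than the cutoff
        limit=top_k
    )
    return [hit.payload["text"] for hit in hits]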

Module 3: Introducing context weights and labels

Not every entry in your memory is of equal value. You can give them weight or categories:

  • Fixed (e.g. "User is called Markus")
  • Temporary (e.g. "Today is Tuesday")
  • Situational (e.g. "Chat from today 10:30 am")

Qdrant supports so-called payloads - i.e. additional information per entry. This allows you to filter or prioritize later.
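
A sketch of how that could look - the "category" field and its values are illustrative, not part of the basic script:

import time
import uuid
from qdrant_client.models import Filter, FieldCondition, MatchValue

# Store an entry with an extra category field in the payload
text = "The user is called Markus"
point = PointStruct(
    id=str(uuid.uuid4()),
    vector=embedder.encode(text).tolist(),
    payload={"text": text, "timestamp": time.time(), "category": "fixed"}
)
qdrant.upsert(collection_name=COLLECTION_NAME, points=[point])

# Later: restrict a search to "fixed" facts only
fixed_only = Filter(
    must=[FieldCondition(key="category", match=MatchValue(value="fixed"))]
)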

Module 4: Fine-tuning via the prompt

The prompt itself is a powerful control unit.
Here are a few tricks you can use to make Ollama smarter:

Example prompt with instructions:

You are a local assistant with a semantic memory. If you find several memories, only use the three most relevant ones. Do not refer to information older than 10 days unless it is explicitly marked. Ignore trivial reminders such as "Good morning" or "Thank you". Answer precisely and in the style of an experienced consultant.

This allows you to carry out fine-tuning directly in the prompt itself - without new models, without training.

And: You can generate the prompt dynamically - depending on the situation.
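
A small sketch of how such a dynamic prompt could be assembled in the script, replacing the plain join in query_ollama (the instruction text is only an example):

def build_prompt(context: list[str], user_prompt: str) -> str:
    # The instruction block can be generated dynamically, e.g. per task or time of day
    instructions = (
        "You are a local assistant with a semantic memory. "
        "Use at most the three most relevant memories and ignore trivial ones. "
        "Answer precisely and in the style of an experienced consultant."
    )
    memories = "\n".join(f"- {m}" for m in context) if context else "(no relevant memories)"
    return f"{instructions}\n\nRelevant memories:\n{memories}\n\nQuestion: {user_prompt}"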

Module 5: Storage hygiene

As the memory grows, it becomes confusing.
A simple maintenance script that deletes irrelevant or duplicate content is worth its weight in gold.

Example:

"Forget everything to do with 'weather'."

"Delete entries that are older than 3 months and have never been retrieved."

Qdrant supports this via API - and you can automate it once a week, for example.
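
A sketch of such a maintenance step with the Python client - here deleting everything older than roughly three months, again assuming the numeric "timestamp" payload field from the basic script:

from datetime import datetime, timedelta
from qdrant_client.models import Filter, FieldCondition, Range, FilterSelector

# Remove all points whose timestamp is older than ~3 months
cutoff = (datetime.utcnow() - timedelta(days=90)).timestamp()
qdrant.delete(
    collection_name=COLLECTION_NAME,
    points_selector=FilterSelector(
        filter=Filter(
            must=[FieldCondition(key="timestamp", range=Range(lt=cutoff))]
        )
    )
)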

Module 6: FileMaker as control panel

If you - like me - work with FileMaker, you can control all of this remotely via REST API:

  • Send prompts
  • Retrieve context
  • Receive answers
  • Have an evaluation carried out
  • Save or forget

All you need is a small REST module in FileMaker (Insert from URL with JSON) and a few scripts.

The result: an interface that lets you control your AI like a living notebook - but with intelligence.

🔚 Conclusion: AI is only as good as its leadership

Ollama is powerful. Qdrant is flexible. But without clear rules, both become an unstructured pile of data. The trick is not to store everything - but to keep only what is relevant available and to think in a targeted way instead of just remembering.
