LLM Studio

This single-page workshop helps teammates see how language models break text into tokens and how a generation loop stitches them back together. Swap strategies, follow the pipeline, and narrate what each stage does.


Foundations

Tokens are the morsels a model can actually understand. They are not always whole words; think characters, subword chunks, and punctuation cooked into a consistent vocabulary. The mock pipeline below mirrors the rhythm production teams rely on: tokenize → embed → attend → sample → decode.

Why tokenize?

Neural nets cannot ingest raw text. Tokenization converts text into numeric IDs aligned with a training vocabulary so the network can look up embeddings.
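To make that concrete, here is a minimal sketch of the lookup idea, assuming a made-up four-entry vocabulary and two-dimensional embeddings; both are invented for illustration and far smaller than anything a real model uses.

```ts
// Toy lookup-style tokenizer: text -> vocabulary IDs -> embedding rows.
// The vocabulary and the 2-d embeddings below are invented for illustration.
const vocab: Record<string, number> = { tokens: 0, are: 1, fun: 2, "<unk>": 3 };

const embeddings: number[][] = [
  [0.12, -0.4], // "tokens"
  [0.88, 0.05], // "are"
  [-0.3, 0.71], // "fun"
  [0.0, 0.0],   // "<unk>"
];

// Split on whitespace and fall back to <unk> for anything out of vocabulary.
function encode(text: string): number[] {
  return text.toLowerCase().split(/\s+/).map((w) => vocab[w] ?? vocab["<unk>"]);
}

const ids = encode("Tokens are fun");            // [0, 1, 2]
const vectors = ids.map((id) => embeddings[id]); // the rows the network actually sees
console.log(ids, vectors);
```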

Vocabulary trade-offs

A larger vocabulary means shorter sequences but a heavier embedding table and output head. A smaller vocabulary means longer sequences but cheaper lookups. Modern LLMs strike a balance with Byte Pair Encoding or SentencePiece.
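A rough back-of-the-envelope sketch of the "heavier head" side of the trade-off, assuming a hypothetical hidden size of 4096 and separate (untied) embedding and output-head weights; the numbers are illustrative, not measurements of any real model.

```ts
// Back-of-the-envelope cost of the vocabulary itself, assuming a hypothetical
// hidden size of 4096 and untied embedding / output-head weights.
const hiddenDim = 4096;

function vocabWeights(vocabSize: number): number {
  // Embedding table + output head: each stores vocabSize x hiddenDim numbers.
  return 2 * vocabSize * hiddenDim;
}

console.log(vocabWeights(32_000));  // ≈ 262 million weights spent on the vocabulary
console.log(vocabWeights(128_000)); // ≈ 1.05 billion weights: shorter sequences, heavier head
```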

Inference loop

Each generated token feeds back into the context. Temperature nudges randomness; top-k restricts sampling to the k most likely candidates to keep responses on task.
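The sketch below shows one way such a loop could look. The function names, the fabricated logits, and the default settings are assumptions for this example, not the console's actual implementation; `mockLogits` only exists so the snippet runs on its own.

```ts
// One possible shape for the generation loop: sample a token, append it to the
// context, run the "model" again. mockLogits fabricates scores so the example
// is self-contained; it is not a real forward pass.
function mockLogits(context: number[], vocabSize = 8): number[] {
  return Array.from({ length: vocabSize }, (_, id) => Math.sin(id + context.length));
}

function sampleNext(logits: number[], temperature: number, k: number): number {
  // Temperature 0 degenerates to greedy decoding: always take the argmax,
  // which is why replies become deterministic there.
  if (temperature === 0) return logits.indexOf(Math.max(...logits));

  // Top-k: keep only the k highest-scoring candidates.
  const topK = logits
    .map((logit, id) => ({ id, logit }))
    .sort((a, b) => b.logit - a.logit)
    .slice(0, k);

  // Softmax over the survivors, with logits divided by temperature.
  const weights = topK.map((c) => Math.exp(c.logit / temperature));
  const total = weights.reduce((sum, w) => sum + w, 0);

  // Draw one survivor in proportion to its weight.
  let r = Math.random() * total;
  for (let i = 0; i < topK.length; i++) {
    r -= weights[i];
    if (r <= 0) return topK[i].id;
  }
  return topK[topK.length - 1].id;
}

function generate(prompt: number[], maxNewTokens: number, temperature = 0.3, k = 4): number[] {
  const context = [...prompt];
  for (let step = 0; step < maxNewTokens; step++) {
    context.push(sampleNext(mockLogits(context), temperature, k));
  }
  return context;
}

console.log(generate([1, 2, 3], 5));
```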

Tokenization Playground

Paste any text, then flip between strategies to see how the same thought becomes model-ready tokens. The mock Byte Pair encoder uses a tiny merge table so the rules stay transparent.
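In the same spirit, here is a tiny byte-pair-style encoder with an invented four-rule merge table; the playground's actual merge table and merge order may differ, so treat this as a sketch of the idea rather than its implementation.

```ts
// Tiny byte-pair-style encoder: start from characters and apply merges from a
// small, fixed merge table. The table below is invented for illustration.
const merges: [string, string][] = [
  ["t", "h"],  // t  h -> th
  ["th", "e"], // th e -> the
  ["i", "n"],  // i  n -> in
  ["in", "g"], // in g -> ing
];

function bpe(word: string): string[] {
  let tokens = word.split("");
  for (const [left, right] of merges) {
    const merged: string[] = [];
    let i = 0;
    while (i < tokens.length) {
      if (tokens[i] === left && tokens[i + 1] === right) {
        merged.push(left + right); // apply the merge rule
        i += 2;
      } else {
        merged.push(tokens[i]);
        i += 1;
      }
    }
    tokens = merged;
  }
  return tokens;
}

console.log(bpe("the"));      // ["the"]
console.log(bpe("thinking")); // ["th", "in", "k", "ing"]
```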

Mock LLM Console

Send a prompt and watch the scripted model walk through each stage. The response is deterministic when temperature is 0 and becomes more exploratory as you dial it up.

Defaults: temperature 0.30, top-k 4.

system: Welcome! Ask about tokens, embeddings, or LLM behaviour to see how the mock model responds.
1. Tokenize: Split the prompt into vocabulary units. We reuse the playground strategy so you can reference the same tokens.
2. Embed: Look up vector representations. Embeddings hold semantic meaning and let similar ideas cluster together.
3. Attend: Self-attention mixes context. The model weighs each token against the others to see which details matter most.
4. Sample: Pick the next token using temperature and top-k. Higher temperature injects randomness; smaller k keeps answers tight.
5. Decode: Convert token IDs back to user-facing text. Repeat until we hit an end token or the length limit. (A compact sketch of the whole loop follows this list.)
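Putting the five stages together, here is a compact, self-contained sketch of the whole loop. Every piece (the six-word vocabulary, the trigonometric "embeddings", the averaging stand-in for attention, greedy sampling) is a toy chosen so the plumbing is visible; none of it describes the console's internals, and the generated text is deliberately nonsense.

```ts
// End-to-end toy version of the five console stages.
const vocab = ["<end>", "tokens", "are", "fun", "and", "small"];

// 1. Tokenize: map words onto vocabulary IDs (unknowns fall back to ID 0 here
//    purely to keep the toy short).
const tokenize = (text: string): number[] =>
  text.toLowerCase().split(/\s+/).map((w) => Math.max(vocab.indexOf(w), 0));

// 2. Embed: one made-up 2-d vector per vocabulary entry.
const embed = (ids: number[]): number[][] =>
  ids.map((id) => [Math.cos(id), Math.sin(id)]);

// 3. Attend: average every vector with the others, a crude stand-in for
//    self-attention mixing context.
const attend = (vectors: number[][]): number[] => {
  const sum = vectors.reduce((acc, v) => [acc[0] + v[0], acc[1] + v[1]], [0, 0]);
  return [sum[0] / vectors.length, sum[1] / vectors.length];
};

// 4. Sample: score each vocabulary entry against the mixed context and take
//    the best one (greedy, i.e. temperature 0).
const sample = (context: number[]): number => {
  const scores = vocab.map((_, id) => context[0] * Math.cos(id) + context[1] * Math.sin(id));
  return scores.indexOf(Math.max(...scores));
};

// 5. Decode: turn IDs back into text; stop at <end> or a length limit.
const decode = (ids: number[]): string => ids.map((id) => vocab[id]).join(" ");

let ids = tokenize("tokens are fun");
for (let step = 0; step < 4; step++) {
  const next = sample(attend(embed(ids)));
  if (vocab[next] === "<end>") break;
  ids = [...ids, next];
}
console.log(decode(ids));
```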