Generative AI — An in-depth guide for hackers

1. TL;DR / Executive summary

Generative AI = a set of machine-learning techniques that *produce* new content: text, images, audio, code, or other modalities. Modern generative systems are dominated by a few families of architectures (autoregressive transformers for text; diffusion and score-based models for images/audio; autoregressive/diffusion hybrids for multimodal systems). They are trained on massive datasets and can be tuned or controlled via conditioning, fine-tuning, and reinforcement learning from human feedback (RLHF). These systems are powerful but brittle: hallucinations, prompt injection, data-privacy risks, and supply-chain/operational hazards are real and must be treated as security problems.

2. What “generative” means (precise)

A generative model defines a probability distribution p_theta(x) or a conditional p_theta(x | c) over data x (text tokens, pixels, spectrogram frames…) and provides a method to *sample* realistic examples. Practically this breaks into two classes:

* Autoregressive / sequential sampling — factorize p(x) into p(x_1) prod_{t>1} p(x_t | x_{<t}) and sample one token at a time. Standard for text (GPT family); see the sketch below.
* Score / diffusion / energy-based — learn a process that maps noise to data (or the reverse), sampling by iteratively denoising or solving an SDE. Dominates state-of-the-art image synthesis.
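
A minimal sketch of chain-rule (autoregressive) sampling, with a toy bigram lookup table standing in for a neural network. All tokens and probabilities below are made up for illustration:

# Autoregressive sampling via the chain rule: repeatedly sample x_t from p(x_t | x_{<t}).
# A toy bigram table stands in for a real model; values are illustrative only.
import random

BIGRAM = {
    "<s>":   {"hello": 0.9, "world": 0.1},
    "hello": {"world": 0.8, "hello": 0.2},
    "world": {"hello": 0.5, "world": 0.5},
}

def sample_sequence(length=5):
    tokens = ["<s>"]
    for _ in range(length):
        dist = BIGRAM[tokens[-1]]   # here p(x_t | x_{<t}) depends only on the previous token
        tokens.append(random.choices(list(dist), weights=list(dist.values()))[0])
    return tokens

print(sample_sequence())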

3. The main model families (what they are, how they sample)

3.1 Transformers (autoregressive & encoder-decoder variants)

* Core idea: self-attention lets the model compute context-dependent representations across all positions. Attention operation:

Attention(Q,K,V) = softmax( Q K^T / sqrt(d_k) ) V 

* GPT-style models: stack masked self-attention layers for autoregressive generation (predict next token). Pre-trained on massive corpora, often fine-tuned.

3.2 Diffusion / score-based models (images, audio, sometimes text)

* Define a forward process that gradually adds noise to data; train a neural network to reverse that noising. Sampling reverses the process via iterative denoising. Connects to denoising score matching and stochastic differential equations.
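
A minimal numpy sketch of the forward (noising) process in the common DDPM parameterization; the schedule values and array shapes below are illustrative assumptions:

# Forward process: x_t ~ q(x_t | x_0) = N(sqrt(abar_t) * x_0, (1 - abar_t) * I).
import numpy as np

T = 1000
betas = np.linspace(1e-4, 0.02, T)           # linear noise schedule
alpha_bar = np.cumprod(1.0 - betas)          # cumulative product of (1 - beta_t)

def q_sample(x0, t, rng=np.random.default_rng()):
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps

x0 = np.zeros((8, 8))                        # toy "image"
x_t = q_sample(x0, t=500)                    # heavily noised version of x0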

3.3 GANs, flows, VAEs (historical / specialized use)

* GANs (generator + discriminator) were state-of-the-art for realistic images; flows and VAEs provide tractable likelihoods. They are less central today, but remain useful for speed and latent-control tradeoffs.

4. Training paradigms

* Pretraining / self-supervised learning — train on large unlabeled corpora with a proxy objective (next-token, masked token, denoising). Builds general capabilities; see the sketch below.
* Fine-tuning — supervised or task-specific training on labeled data.
* Reinforcement Learning from Human Feedback (RLHF) — humans rank outputs; a reward model is trained on those rankings and used with policy optimization to align generation with preferences.
* Conditional / guided sampling — conditioning on prompts, class labels, or auxiliary inputs.
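
The next-token pretraining objective is simply the average cross-entropy of the model's predicted distribution at each true next token. A minimal numpy sketch, with random logits standing in for a real model's output:

# Next-token pretraining loss: mean cross-entropy over sequence positions.
import numpy as np

def next_token_loss(logits, targets):
    """logits: (seq_len, vocab_size); targets: (seq_len,) ids of the true next tokens."""
    logits = logits - logits.max(axis=-1, keepdims=True)                  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=-1, keepdims=True))
    return -log_probs[np.arange(len(targets)), targets].mean()

rng = np.random.default_rng(0)
loss = next_token_loss(rng.normal(size=(10, 32)), rng.integers(0, 32, size=10))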

5. Capabilities and emergent behaviors

Generative models can compose coherent paragraphs, summarize, translate, write code, produce photorealistic/stylized images, synthesize audio/music, and perform zero-/few-shot tasks. Failure modes include hallucinations, prompt sensitivity, and biased content.

6. Practical internals — tokens, context, temperature, sampling

* Tokens & vocabulary: models operate on tokens (subword units).
* Context window: finite length; older tokens may be “forgotten.”
* Sampling controls: temperature, top-k, nucleus (top-p) sampling; see the sketch below.
* Prompting vs system messages vs embeddings: different APIs for giving instructions or context.
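
How the sampling controls interact, as a small numpy sketch over raw logits (parameter values below are illustrative defaults, not recommendations):

# Apply temperature, top-k and nucleus (top-p) filtering to logits, then sample one token id.
import numpy as np

def sample_token(logits, temperature=0.8, top_k=50, top_p=0.95, rng=np.random.default_rng()):
    logits = np.asarray(logits, dtype=float) / max(temperature, 1e-6)    # temperature scaling
    if top_k and top_k < len(logits):
        kth = np.sort(logits)[-top_k]
        logits = np.where(logits < kth, -np.inf, logits)                 # top-k filter
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    order = np.argsort(probs)[::-1]                                      # nucleus (top-p) filter:
    keep = np.cumsum(probs[order]) <= top_p                              # keep the smallest set
    keep[0] = True                                                       # covering ~top_p mass
    allowed = order[keep]
    return int(rng.choice(allowed, p=probs[allowed] / probs[allowed].sum()))

token_id = sample_token(np.random.default_rng(0).normal(size=200))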

7. Deployment patterns & systems architecture

* On-prem vs cloud: tradeoffs in latency, data governance, cost.
* Model sharding & quantization: reduce memory footprint and cost; see the sketch below.
* Safety stack: content filters, rate limits, retrieval filters, RLHF policies.
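
To show the core idea behind weight quantization, here is a naive symmetric int8 round-trip in numpy. Production stacks (GPTQ, AWQ, bitsandbytes and similar) are far more sophisticated; this only illustrates the memory/precision tradeoff:

# Naive per-tensor symmetric int8 quantization: int8 weights plus one float scale.
import numpy as np

def quantize_int8(w):
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

w = np.random.default_rng(0).normal(size=(1024, 1024)).astype(np.float32)
q, s = quantize_int8(w)                      # ~4x smaller than float32 in memory
err = np.abs(w - dequantize(q, s)).mean()    # reconstruction error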

8. Security, adversarial vectors, and “hacker” concerns

8.1 Prompt injection and prompt-based attacks

* Malicious inputs that cause models to ignore instructions, reveal hidden prompts, or perform unintended actions.

8.2 Jailbreaking & policy bypass

* Adversarial prompt sequences may circumvent model safety policies.

8.3 Data leakage, training-data extraction, and model inversion

* Models can memorize rare sequences; careful querying can attempt to extract memorized data.

8.4 Supply-chain & model poisoning

* Using third-party models/datasets can introduce backdoors or poisoned behaviors.

8.5 Downstream automation & RCE risk

* LLM outputs wired into automation can result in harmful actions; treat outputs as untrusted inputs.

9. Ethics, copyright and governance

* Bias & fairness: models can perpetuate societal biases.
* Copyright & content provenance: outputs may reflect training data.
* Regulatory landscape & industry self-governance.

10. Practical advice for hackers, researchers and practitioners

* Use controlled red-team exercises.
* Prompt engineering: explicit system messages, few-shot examples.
* Monitoring: log prompts/outputs.
* Safe automation: secondary checks, human approval (see the sketch below).
* Harden interfaces: rate-limit, authenticate.
* Data hygiene: provenance ledger, differential privacy.
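
One way to implement the "safe automation" point: treat model output as untrusted input, accept only structured requests against an explicit allowlist, and gate sensitive actions behind human approval. The function and action names below are hypothetical placeholders:

# Gate LLM-driven automation: parse structured output, allowlist actions, require approval.
# ALLOWED_ACTIONS, require_human_approval() and run_action() are hypothetical placeholders.
import json

ALLOWED_ACTIONS = {"fetch_status", "create_ticket"}      # deny by default
SENSITIVE_ACTIONS = {"create_ticket"}

def require_human_approval(request) -> bool:
    return False                                          # placeholder: page an operator in practice

def run_action(action, args):
    return {"action": action, "args": args}               # placeholder for the real side effect

def dispatch(llm_output: str):
    try:
        request = json.loads(llm_output)                  # expect structured output, not free text
    except json.JSONDecodeError:
        raise ValueError("model output is not valid JSON; refusing to act")
    action = request.get("action")
    if action not in ALLOWED_ACTIONS:
        raise PermissionError(f"action {action!r} is not allowlisted")
    if action in SENSITIVE_ACTIONS and not require_human_approval(request):
        raise PermissionError("sensitive action requires human approval")
    return run_action(action, request.get("args", {}))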

11. Tools, frameworks and libraries (quick survey)

* Transformers & ecosystem: Hugging Face Transformers, Fairseq, DeepSpeed, Megatron-LM (minimal example below).
* Diffusion/image stacks: guided diffusion, Stable Diffusion.
* Security & auditing: OWASP GenAI resources.
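
A minimal Hugging Face Transformers example for local text generation. It assumes the transformers package and a backend such as PyTorch are installed; "gpt2" is simply a small, freely downloadable checkpoint used for illustration:

# Text generation with the Transformers pipeline API.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")
out = generator("Generative models are", max_new_tokens=30, do_sample=True, temperature=0.8)
print(out[0]["generated_text"])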

12. Example: compact technical snippets

Masked self-attention:

q = x W_Q
k = x W_K
v = x W_V
A = softmax( (q k^T) / sqrt(d_k) + M )
y = A v 
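
The same computation as runnable numpy, with the causal mask built explicitly (dimensions below are illustrative):

# Masked (causal) self-attention for a single head in numpy.
import numpy as np

def masked_self_attention(x, W_Q, W_K, W_V):
    q, k, v = x @ W_Q, x @ W_K, x @ W_V
    d_k = q.shape[-1]
    scores = q @ k.T / np.sqrt(d_k)
    mask = np.triu(np.full((len(x), len(x)), -np.inf), k=1)    # block attention to future tokens
    weights = scores + mask
    weights = np.exp(weights - weights.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)              # row-wise softmax
    return weights @ v

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 16))                 # 5 tokens, model dimension 16
W_Q, W_K, W_V = (rng.normal(size=(16, 16)) for _ in range(3))
y = masked_self_attention(x, W_Q, W_K, W_V)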

Diffusion training objective:

L = E_{x_0, t, epsilon} || epsilon - epsilon_theta( x_t, t ) ||^2,   where x_t = sqrt(alpha_bar_t) x_0 + sqrt(1 - alpha_bar_t) epsilon
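
The same objective as a single numpy training step. eps_theta below is a placeholder for the learned denoising network, and the noise schedule mirrors the section 3.2 sketch:

# Epsilon-prediction diffusion loss for one randomly chosen timestep.
import numpy as np

rng = np.random.default_rng(0)
alpha_bar = np.cumprod(1.0 - np.linspace(1e-4, 0.02, 1000))   # cumulative signal fraction

def eps_theta(x_t, t):
    return np.zeros_like(x_t)                 # stand-in for the trained noise predictor

def diffusion_loss(x0):
    t = rng.integers(0, len(alpha_bar))       # random timestep
    eps = rng.standard_normal(x0.shape)       # the noise the network must predict
    x_t = np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps
    return np.mean((eps - eps_theta(x_t, t)) ** 2)

loss = diffusion_loss(np.zeros((8, 8)))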

13. Failure modes & “what to watch for”

* Hallucinations, over-confident wrong answers, sensitivity to small prompt or tokenization changes, model drift, adversarial inputs.

14. Ongoing research directions & the near future

* Larger context windows and retrieval-augmented generation (see the sketch below).
* Multimodal generative systems.
* Efficiency & model compression.
* Robustness & verifiable safety.
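
A minimal retrieval-augmented generation (RAG) sketch: embed the documents and the query, take the nearest neighbours by cosine similarity, and prepend them to the prompt. The embed() function below is a toy placeholder; real systems use a trained embedding model and a vector store:

# Toy RAG retrieval step: cosine-similarity nearest neighbours prepended to the prompt.
import numpy as np

def embed(text: str) -> np.ndarray:
    rng = np.random.default_rng(abs(hash(text)) % (2**32))   # toy embedding derived from the text
    return rng.normal(size=64)

def retrieve(query: str, docs: list, k: int = 2) -> list:
    q = embed(query)
    sims = [np.dot(q, embed(d)) / (np.linalg.norm(q) * np.linalg.norm(embed(d))) for d in docs]
    return [docs[i] for i in np.argsort(sims)[::-1][:k]]

docs = ["Diffusion models denoise.", "Transformers use attention.", "GANs use a discriminator."]
context = "\n".join(retrieve("How do transformers work?", docs))
prompt = f"Answer using only this context:\n{context}\n\nQuestion: How do transformers work?"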

15. Responsible disclosure & whitehat norms for hackers

* Don’t publish exploit code; follow coordinated disclosure.
* Contact vendors via official channels.
* Provide minimal test cases, operational impact, and mitigation suggestions.

16. Further reading & canonical sources

* OpenAI API docs & overviews.
* Diffusion model surveys.
* OWASP GenAI.
* Academic studies on prompt injection and red-teaming.
* Industry reports on failures & jailbreaks.

17. Appendix — Glossary

  • LLM — Large Language Model.
  • RLHF — Reinforcement Learning from Human Feedback.
  • Diffusion — iterative denoising generative family.
  • Hallucination — fluent but false output.
  • Prompt injection — untrusted input that subverts the model’s intended instructions.

18. Closing / ethical call to arms

Generative AI amplifies both productivity and abuse potential. Hackers, researchers, and operators should approach these systems with an adversarial mindset: understand the internals, the threat models, the auditing techniques, and the norms of responsible disclosure. Defensive engineering and continuous red-teaming are mandatory.
