A pragmatic guide to LLM evals for devs