Remove redundant text, move code examples higher up
nking-1 committed May 9, 2024
1 parent 62cf519 commit 1a1e83a
Showing 1 changed file: README.md (93 additions, 108 deletions)
<style>
a button {
background-color: #0164D9;
color: white;
border: none;
border-radius: 5px;
padding-top: 7px;
padding-bottom: 7px;
width: 120px;
text-decoration: none;
cursor: pointer;
transition: background-color 0.2s;
}

a button:hover {
background-color: #014CAB;
}
</style>

<div align="right"><a href="https://guidance.readthedocs.org"><img src="https://readthedocs.org/projects/guidance/badge/?version=latest&style=flat" /></a></div>
<div align="center"><picture>
<source media="(prefers-color-scheme: dark)" srcset="docs/figures/guidance_logo_blue_dark.svg">
</picture></div>
# An efficient programming paradigm for steering language models
**`{guidance}`** is a programming paradigm that offers superior control and efficiency compared to conventional prompting and chaining. With `{guidance}`, you can control how output is structured and get high-quality output for your use case—**while reducing the latency and cost of conventional prompting or fine-tuning.** It lets you constrain generation (e.g., with regexes and context-free grammars) and seamlessly interleave control flow (conditionals, loops) with generation.

With `{guidance}`, developers write prompt-programs in a blend of text and Python that expresses the rules and constraints the model must follow. The constraints are then compiled by `{guidance}` machinery that sits directly in the model's decoding loop, examining generation token by token and eliminating tokens that don't match what you expressed.
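
As a minimal sketch of what this looks like (assuming a local llama.cpp model file at `path`), the regex below is enforced inside the decoding loop, so the model can only emit digit tokens at that point:

```python
from guidance import models, gen

llama2 = models.LlamaCpp(path)

# the regex constraint masks the token distribution at every decoding step,
# so the generated answer is guaranteed to match \d+
llama2 + 'Pick a random number between 0 and 100: ' + gen(regex=r'\d+')
```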

## Demo video
[![Watch the video](https://img.youtube.com/vi/9oXjP5IIMzQ/hqdefault.jpg)](https://www.youtube.com/watch?v=9oXjP5IIMzQ)
<a href="https://www.youtube.com/watch?v=9oXjP5IIMzQ" aria-label="Watch demo"><button>Watch demo</button></a>

<a href="#get-started" aria-label="Get started"><button>Get started</button></a>

# Key Features
The **`{guidance}`** library offers developers an elegant prompt-writing experience, letting you seamlessly interleave control flow (conditionals, loops) with constrained generation:

## Developer workflow
- Write pure Python, with additional LM functionality. For example, see [basic generation](#basic-generation).
- [Constrain generation](#constrained-generation) with [selects](#select-basic) (i.e., sets of options), [regular expressions](#regular-expressions), and [context-free grammars](#context-free-grammars), as well as with pre-built components (e.g., substring).
- [Call and deploy tools easily](#easy-tool-use) with automatic interleaving of control and generation.
- [Get high compatibility](#high-compatibility)—execute a single {guidance} program on many backends (Transformers, llama.cpp, VertexAI, OpenAI).
- Gain [speed](#speed) with [stateful control + generation functions](#stateful-control--generation)—no need for intermediate parsers.
- Ensure valid code easily with [token healing](#token-healing)—a `{guidance}` feature that lets you avoid worrying about token boundaries (e.g., ending a prompt with a white space).

## Guidance API features
- [Rich templates with f-strings.](#rich-templates-with-f-strings)
- [Abstract chat interface that uses correct special tokens for any chat model.](#abstract-chat-interface-that-uses-the-correct-special-tokens-for-any-chat-model)
- [Easy-to-write reusable components.](#easy-to-write-reusable-components)
- [Streaming support, also integrated with Jupyter notebooks.](#streaming-support-also-integrated-with-jupyter-notebooks)
- [Multi-modal support.](#multi-modal-support)

# Features and examples

### Write pure Python, with additional LM functionality.
For example, here is [basic generation](#basic-generation):
```python
from guidance import models, gen

# load a model (here a local llama.cpp model file at `path`)
llama2 = models.LlamaCpp(path)

# append text or generations to the model
llama2 + f'Do you want a joke or a poem? ' + gen(stop='.')
```
<img alt="Do you want a joke or a poem? I'll give you a poem" src="docs/figures/simple_gen_llama2_7b.png" width="354">

### [Constrain generation](#constrained-generation) with [selects](#select-basic) (i.e., sets of options), [regular expressions](#regular-expressions), and [context-free grammars](#context-free-grammars), as well as with pre-built components (e.g., substring).

```python
from guidance import select

llama2 + f'Do you want a joke or a poem? A ' + select(['joke', 'poem'])
```
<img alt="Do you want a joke or a poem? A poem" src="docs/figures/simple_select_llama2_7b.png" width="277">

### Call and deploy tools easily with automatic interleaving of control and generation.
With [easy tool use](#automatic-interleaving-of-control-and-generation-tool-use), the model stops generation when a tool is called, calls the tool, then resumes generation. For example, here is a simple calculator built from four separate 'tools':
```python
import guidance

@guidance
def add(lm, input1, input2):
    lm += f' = {int(input1) + int(input2)}'
    return lm

@guidance
def subtract(lm, input1, input2):
    lm += f' = {int(input1) - int(input2)}'
    return lm

@guidance
def multiply(lm, input1, input2):
    lm += f' = {float(input1) * float(input2)}'
    return lm

@guidance
def divide(lm, input1, input2):
    lm += f' = {float(input1) / float(input2)}'
    return lm
```
Now we call `gen` with these tools as options. Notice how generation is stopped and restarted automatically:
```python
lm = llama2 + '''\
1 + 1 = add(1, 1) = 2
2 - 3 = subtract(2, 3) = -1
'''
lm + gen(max_tokens=15, tools=[add, subtract, multiply, divide])
```
<img width="201" alt="image" src="https://github.com/guidance-ai/guidance/assets/3740613/646e1a7d-0206-419b-8206-1d835c3a0e0a"><br>

### Get high compatibility—execute a single {guidance} program on many backends (Transformers, llama.cpp, VertexAI, OpenAI).
Write one `{guidance}` program and execute it on many backends. (Note that the most powerful control features require endpoint integration, and for now work best with Transformers and llama.cpp.)
```python
from guidance import models, gen, user, assistant

gpt = models.OpenAI("gpt-3.5-turbo")

with user():
    lm = gpt + "What is the capital of France?"

with assistant():
    lm += gen("capital")

with user():
    lm += "What is one short surprising fact about it?"

with assistant():
    lm += gen("fact")
```
<img width="645" alt="image" src="https://github.com/guidance-ai/guidance/assets/3740613/f31ed7b8-1868-44d2-b14c-4842b0a40e5c"><br>

### Gain speed with [stateful control + generation functions](#stateful-control--generation)—no need for intermediate parsers.
In contrast to chaining, `{guidance}` programs are the equivalent of a single LLM call. Moreover, any non-generated text that gets appended is batched, so `{guidance}` programs are **faster** than having the LM generate the intermediate text itself when you have a set structure.
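
As a small sketch of the batching (reusing `llama2` from above; the field names are illustrative), the literal scaffold below is appended in batched forward passes, and only the two `gen()` spans are decoded token by token:

```python
from guidance import gen

# 'Name: ' and 'Age: ' are never generated by the model; they are
# appended as batched text, so only the gen() calls cost decoding steps
lm = llama2 + 'Name: ' + gen('name', stop='\n')
lm += 'Age: ' + gen('age', regex=r'[0-9]+')
```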

### Ensure valid code easily with token healing—a `{guidance}` feature that lets you avoid worrying about token boundaries (e.g., ending a prompt with a white space).
Users deal with text (or bytes) rather than tokens, and thus don't have to worry about [perverse token boundary issues](https://towardsdatascience.com/the-art-of-prompt-design-prompt-boundaries-and-token-healing-3b2448b0be38) such as a prompt ending in whitespace.
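
A sketch of the classic boundary case from the linked article: ending a prompt at `http:` strands the model on an awkward token boundary, and token healing backs up so the natural `://` continuation can be generated:

```python
from guidance import gen

# without token healing, the tokenizer's usual 'http://' merge is unreachable
# after a prompt ending in 'http:'; guidance handles this transparently
llama2 + 'The link is <a href="http:' + gen(max_tokens=10)
```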

## Guidance API features
### Rich templates with f-strings.
```python
llama2 + f'''\
Do you want a joke or a poem? A {select(['joke', 'poem'])}.
Okay, here is a one-liner: "{gen(stop='"')}"'''
```
<img width="358" alt="image" src="https://github.com/guidance-ai/guidance/assets/3740613/486ca968-89b1-4c02-b914-3b9714fe5890"><br>

### Stateful control + generation makes it easy to interleave prompting, logic, and generation, with no need for intermediate parsers.
```python
# capture our selection under the name 'answer'
lm = llama2 + f"Do you want a joke or a poem? A {select(['joke', 'poem'], name='answer')}.\n"

# branch on the model's previous selection
if lm["answer"] == "joke":
    lm += "Here is a one-line joke about cats: " + gen('output', stop='\n')
else:
    lm += "Here is a one-line poem about dogs: " + gen('output', stop='\n')
```
<img width="393" alt="image" src="https://github.com/guidance-ai/guidance/assets/3740613/66d47ce7-1d5a-4dbd-b676-66b9c1094184"><br>


### Abstract chat interface that uses the correct special tokens for any chat model.
```python
from guidance import user, assistant

# load a chat model
chat_lm = models.LlamaCppChat(path)

# wrap with chat block contexts
with user():
    lm = chat_lm + 'Do you want a joke or a poem?'

with assistant():
    lm += f"A {select(['joke', 'poem'])}."
```
<img width="331" alt="image" src="https://github.com/guidance-ai/guidance/assets/3740613/89c3e0e2-ed0a-4715-8366-2efca74b7b71"><br>

### Easy-to-write reusable components.
```python
import guidance

# a reusable component is an @guidance-decorated function that edits the lm
@guidance
def one_line_thing(lm, thing, topic):
    lm += f'Here is a one-line {thing} about {topic}: ' + gen(stop='\n')
    return lm

# capture a choice under the name 'thing', then pass it to our component
lm = llama2 + f"Do you want a joke or a poem? A {select(['joke', 'poem'], name='thing')}.\n"
lm += one_line_thing(lm['thing'], 'cats')
```
<img width="386" alt="image" src="https://github.com/guidance-ai/guidance/assets/3740613/60071680-8bbb-4fa5-a298-613d4fd55fa7"><br>

### A library of pre-built components, e.g. substring:
```python
from guidance import substring

# the statements the model may quote from (`text` reconstructed as an example)
text = 'guidance is awesome. guidance is the best thing since sliced bread.'

# force the model to produce an exact quote from `text`
llama2 + f'Here is a true statement about the guidance library: "{substring(text)}"'
```
<img width="589" alt="image" src="https://github.com/guidance-ai/guidance/assets/3740613/9a7178ad-ed73-4e6b-b418-f9d2a3a76b88"><br>

### Streaming support, also integrated with Jupyter notebooks.
```python
lm = llama2 + 'Here is a cute 5-line poem about cats and dogs:\n'
for i in range(5):
    lm += f'LINE {i+1}: ' + gen(temperature=0.8, suffix='\n')
```

For environments that don't support guidance's rich IPython/Jupyter/HTML-based visualizations, you can turn off visualizations with `echo=False` when loading a model:
```python
llama2 = models.LlamaCpp(path, echo=False)
```

### Multi-modal support.
```python
from guidance import image

# a vision-capable backend is assumed here (VertexAI shown as an example)
gemini = models.VertexAI("gemini-pro-vision")

with user():
    lm = gemini + 'What is this a picture of?' + image(path)

with assistant():
    lm += gen('answer')
```
<img width="673" alt="image" src="https://github.com/guidance-ai/guidance/assets/3740613/6450d05d-52e9-4ef5-b280-8b57e733d46d">


# Get started
You can see a brief walkthrough of `{guidance}` before installing:

- [Basic tutorial](notebooks/tutorials/intro_to_guidance.ipynb)
- [Install](#install)
- [Loading models](#loading-models)


## Table of Contents
* [Install](#install)