Josherich's Blog


How does LMQL work, and why is it dead?

16 Apr 2025

LMQL (arxiv) is a tool for steering language model generation. Its language interface nicely stitches together a set of features:

  1. String manipulation: filling holes and interpolating variables
  2. String constraints on output tokens: switching generation on and off
  3. Tool use

Here’s how it works:

1. String manipulation: [WORDS] denotes a hole for the language model to fill; {WORDS} denotes a variable that exists in the surrounding scope. For instance:

  Write a summary of {name}, the singer:
  {{
      "name": "[STRING_VALUE]",
      "age": [INT_VALUE],
      "top_songs": [[
          "[STRING_VALUE]",
          "[STRING_VALUE]"
      ]]
  }}

Given the context {'name': 'Bruno Mars'}, the variable name is replaced with Bruno Mars to produce the initial prompt Write a summary of Bruno Mars, the singer: {{ "name": ". The model generates a few tokens, then emits a closing quote "; LMQL detects the quote, stops the generation, appends ", "age": , and continues.
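The stop-and-resume loop described above can be sketched in a few lines of Python. This is an illustration of the idea, not LMQL's actual implementation; the template chunks, stop strings, and the toy stand-in for the model are all made up:

```python
def fill_template(chunks, stops, model):
    """chunks: literal template pieces; stops: the string that ends each hole.
    model(prompt, stop) returns generated text up to (excluding) `stop`."""
    prompt = ""
    values = []
    for literal, stop in zip(chunks, stops):
        prompt += literal                 # append the next literal piece
        generated = model(prompt, stop)   # decode until `stop` appears
        values.append(generated)
        prompt += generated + stop        # re-append the stop and continue
    return prompt, values

def toy_model(prompt, stop):
    # Stand-in for a real LM: returns canned completions keyed by the
    # template suffix the prompt currently ends with.
    canned = {'"name": "': 'Bruno Mars', '"age": ': '39'}
    for suffix, completion in canned.items():
        if prompt.endswith(suffix):
            return completion
    return ""

text, vals = fill_template(
    ['Summary: {{ "name": "', ', "age": '],
    ['"', ','],
    toy_model,
)
# text == 'Summary: {{ "name": "Bruno Mars", "age": 39,'
```

The key design point is that the runtime, not the model, owns the template: the model only ever fills one hole at a time, and every literal chunk is guaranteed to appear verbatim.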

2. String constraints using masking: constraints and options both rely on the simple idea of limiting the tokens the model can choose from. In the OpenAI API, this is done by setting the logit_bias parameter, with the format {"50256": -100, ...}.
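Masking is just an additive bias applied before picking the next token. A minimal sketch of the mechanism, with hypothetical token ids and scores (the -100/+100 convention matches the OpenAI logit_bias parameter):

```python
def apply_logit_bias(logits, bias):
    """logits: {token_id: score}; bias: {token_id: additive bias}.
    A bias of -100 effectively bans a token; +100 effectively forces it."""
    return {tid: score + bias.get(tid, 0.0) for tid, score in logits.items()}

def pick(logits):
    # Greedy selection: take the highest-scoring token.
    return max(logits, key=logits.get)

logits = {50256: 3.0, 1234: 2.5, 987: 1.0}        # hypothetical scores
masked = apply_logit_bias(logits, {50256: -100})  # ban token 50256
# pick(logits) == 50256, but pick(masked) == 1234
```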

3. Tool use: This is similar to variable substitution, with the added ability to call a function to compute the variable's value. It differs from the now-common 'tool use', where the model chooses among a set of tools. Although in theory, nothing stops LMQL from switching to structured output to choose a tool, running it, and switching back to the previous generation context.
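Tool use as variable substitution can be sketched as follows: any context value that is callable gets invoked before interpolation. The names and the age_lookup "tool" are hypothetical, not LMQL API:

```python
def resolve(context):
    # Invoke any callable value: this is the "tool call" step.
    return {k: (v() if callable(v) else v) for k, v in context.items()}

def render(template, context):
    return template.format(**resolve(context))

def age_lookup():
    # A "tool": in practice this could query a database or an API.
    return 39

prompt = render("Write a summary of {name}, who is {age} years old.",
                {"name": "Bruno Mars", "age": age_lookup})
```

Note that the runtime decides when the tool runs; the model never picks a tool, which is exactly how this differs from model-driven tool calling.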

Why it didn’t take off?

I suspect there are two reasons: real-world use cases are largely satisfied by structured/JSON output, and instruction following has improved a lot. Combining the two gives a nice alternative to LMQL: simple string interpolation using fields parsed from JSON output.
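That alternative fits in a few lines: ask the model for JSON, parse it, and interpolate the fields into later prompts with plain f-strings. Here the JSON string stands in for a real model response:

```python
import json

# Stand-in for a structured-output model response.
model_output = '{"name": "Bruno Mars", "age": 39, "top_songs": ["Grenade", "Uptown Funk"]}'

info = json.loads(model_output)
followup = f"Write a summary of {info['name']}, age {info['age']}."
```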

Other Options