Tuesday, August 18, 2009

BioLogic: Logic Programming with Biological Parts

BioLogic is shaping up nicely.

Right now, I have a Perl program that takes all FASTA-formatted BioBricks provided by this page of the Parts Registry, scrapes additional information for each entry from the Parts Registry website, and puts it in a giant Prolog program.

When this program is loaded, you can query the knowledge base to get, say, the names of all the parts sharing the same part parameters as the part with ID number 9422:

?- parameters(Partname, Parameters),parameters(ThePart, Parameters),part_id(ThePart,9422).

Partname = 'BBa_I14016';
Partname = 'BBa_K0991107';
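To make the query concrete, here is a minimal sketch of what the generated facts might look like. Only the predicate names parameters/2 and part_id/2 come from the query above; the part names, ID numbers, and parameter lists are invented placeholders, not real Registry data, and same_parameters/2 is a hypothetical helper.

```prolog
% Hypothetical facts in the style the Perl scraper might emit.
% All names, IDs and parameter values are made up for illustration.
part_id('BBa_X0001', 1001).
part_id('BBa_X0002', 1002).
part_id('BBa_X0003', 1003).

parameters('BBa_X0001', [direction-forward, chassis-ecoli]).
parameters('BBa_X0002', [direction-forward, chassis-ecoli]).
parameters('BBa_X0003', [direction-reverse, chassis-yeast]).

% same_parameters(+Id, -Partname): Partname shares its parameter list
% with the part whose ID is Id. The part itself is excluded.
same_parameters(Id, Partname) :-
    part_id(ThePart, Id),
    parameters(ThePart, Parameters),
    parameters(Partname, Parameters),
    Partname \== ThePart.
```

With these toy facts, ?- same_parameters(1001, P). answers P = 'BBa_X0002'. Looking up the ID first also means Prolog only compares parameter lists against one part instead of trying every pair.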

Cool! From there it's only a small step to get additional information about the parts in question, pick the ones marked 'Group Favorite,' print out their DNA sequences, or get a link to the web page with information about obtaining the part (e.g. a page like this).
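That small step could look roughly like the sketch below. The predicates status/2 and sequence/2 are assumptions about how the scraped data might be stored, and the part names and sequences are invented.

```prolog
% Hypothetical facts; status/2 and sequence/2 are assumed predicate names,
% and the part names and sequences are made up.
status('BBa_X0001', 'Group Favorite').
status('BBa_X0002', 'Available').

sequence('BBa_X0001', atgcgtaaa).
sequence('BBa_X0002', ttgacagct).

% favorite_sequence(-Part, -Seq): Part is marked 'Group Favorite'
% and Seq is its DNA sequence.
favorite_sequence(Part, Seq) :-
    status(Part, 'Group Favorite'),
    sequence(Part, Seq).

% print_favorites: write every favorite part and its sequence, one per line.
print_favorites :-
    forall(favorite_sequence(Part, Seq),
           format("~w: ~w~n", [Part, Seq])).
```

Calling ?- print_favorites. on these toy facts prints a single line for 'BBa_X0001'.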

It seems to me this is already a neat tool that could help scientists interact with and explore the parts database, based on their needs. Just a little work on getting it robust, opening web pages in a browser window, and the nifty command-line tool is ready. Slap a UI on there (more work than it sounds, unfortunately), and it might actually be user-friendly.

But the real power of Logic Programming is not exploited with little steps like these. We plan on introducing rules into the program that allow it to reason about combining parts.

With just a few basic rules, it'll be possible to give the program a list of one or more parts we'd like to use, and have it return an entire ordered list of the parts and processes required to incorporate them. Several lists, in fact, leaving the final choice up to the scientist, but at least ensuring that - according to our basic rules - each entry in the list is possible.
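As a rough idea of what such rules might look like, here is a sketch that only handles the ordering of parts, not the processes. Everything in it is an assumption for illustration: the part names, the part_type/2 and may_follow/2 predicates, and the single rule that a promoter is followed by an RBS, then a coding sequence, then a terminator.

```prolog
:- use_module(library(lists)).   % permutation/2

% Hypothetical parts and their types.
part_type(p_lac,  promoter).
part_type(rbs_a,  rbs).
part_type(gfp,    coding).
part_type(term_a, terminator).

% may_follow(TypeA, TypeB): a part of TypeB may come directly after
% a part of TypeA. An assumed, deliberately tiny rule set.
may_follow(promoter, rbs).
may_follow(rbs, coding).
may_follow(coding, terminator).

% valid_order(+Parts, -Ordered): Ordered is a permutation of Parts in
% which every adjacent pair satisfies may_follow/2.
valid_order(Parts, Ordered) :-
    permutation(Parts, Ordered),
    chain_ok(Ordered).

% chain_ok(+List): each neighbouring pair in List is allowed.
chain_ok([]).
chain_ok([_]).
chain_ok([A, B | Rest]) :-
    part_type(A, TA),
    part_type(B, TB),
    may_follow(TA, TB),
    chain_ok([B | Rest]).
```

Asking ?- valid_order([gfp, term_a, p_lac, rbs_a], Order). gives Order = [p_lac, rbs_a, gfp, term_a]. Trying every permutation is fine for a handful of parts, but a real version would need to build the list incrementally.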

Ok, so this step in the development of BioLogic is pretty neat - automating some of the reasoning the synthetic biologists need to do to ensure they end up with a functioning organism. Useful and handy, but not as far as we can push this technology.

By adding additional rules about the behavior of parts, the program will be able to generate a complete part sequence based on a list of desired behaviors supplied by a scientist. These rules would apply to what the program knows about each part - the semantic implications of different categories, the part parameters, etc. Specific rules applying to individual parts or part classes could also be added, providing even more detailed and subtle knowledge to the program.
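A very small sketch of the idea, assuming a hypothetical provides/2 predicate that links a part to a behavior. The part and behavior names are invented, and a real rule base would derive provides/2 from categories and parameters rather than list it by hand.

```prolog
% provides(Part, Behavior): hypothetical knowledge about what a part does.
provides(lux_operon, bioluminescence).
provides(gfp_gen,    fluorescence).
provides(toggle_sw,  toggle).
provides(osc_ring,   oscillation).
provides(alt_osc,    oscillation).

% design(+Behaviors, -Parts): choose one part for each desired behavior,
% in the same order as the behaviors were given.
design([], []).
design([Behavior | Behaviors], [Part | Parts]) :-
    provides(Part, Behavior),
    design(Behaviors, Parts).
```

The query ?- design([bioluminescence, oscillation], Parts). first answers Parts = [lux_operon, osc_ring] and, on backtracking, Parts = [lux_operon, alt_osc].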

Prolog provides two great benefits in addition to the core functionality described above. The first is that Prolog programs don't simply return one "best" answer, unless you really want them to. On backtracking they enumerate every solution the rules allow, which in this case would give researchers ample opportunity to experiment with the answers and introduce rules to improve the program. The second great bonus is that you can ask Prolog to explain itself - that is, show its chain of reasoning, for example through the built-in tracer or a small meta-interpreter. This feature can give greater insight into the program, how the parts function together, and any errors or limitations in the knowledge base. It also helps explain how Prolog comes up with surprising answers - which often happens, for better or for worse.
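One common way to get at the chain of reasoning is a small meta-interpreter that builds a proof tree while it solves a goal. The sketch below is the textbook version, not anything BioLogic-specific; the facts and the glows/1 rule are invented for the example, and it only handles user-defined predicates (no built-ins).

```prolog
% The predicates are declared dynamic so that clause/2 may inspect them.
:- dynamic glows/1, has_part/2, part_function/2.

% Hypothetical knowledge base.
has_part(design1, lux_operon).
part_function(lux_operon, bioluminescence).
glows(Design) :-
    has_part(Design, Part),
    part_function(Part, bioluminescence).

% prove(+Goal, -Proof): solve Goal and return the proof tree.
% A fact is recorded as proof(Fact, true); a rule as proof(Head, BodyProof).
prove(true, true) :- !.
prove((A, B), (ProofA, ProofB)) :- !,
    prove(A, ProofA),
    prove(B, ProofB).
prove(Goal, proof(Goal, SubProof)) :-
    clause(Goal, Body),
    prove(Body, SubProof).
```

Running ?- prove(glows(design1), Proof). binds Proof to a term showing that glows(design1) holds because has_part(design1, lux_operon) and part_function(lux_operon, bioluminescence) are both facts.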

Most of this advanced work will have to be performed by synthetic biologists who understand the interaction between parts, chassis, DNA, and the enzymes involved. Anybody know someone who might be interested?


  1. I have a question! How would a scientist define a desired behavior? As an example, take the first iGEM competition goal of making a cell blink. Suppose that another criterion would be making it easy to vary the blinking frequency, or (conversely) making the frequency as robust as possible. I have little experience with logic programming, but defining the goal seems to me to be a tricky issue with BioLogic. Good luck!

  2. This is precisely where logic programming gets to be fun.

    The scientist first has to define what he means by "blink". For example, blinking might be toggling the production of a luminescent substance once every (predefined) period.

    Then the "blink" proposition would entail having a "toggle_function", a "bioluminescence" bound to that function, and a "loop_function" with a period, bound to the "toggle_function".

    Each entailed proposition is defined similarly, logically breaking it down to the concept propositions that relate directly to the standard parts in the registry. The Parts Registry contains several operons involved in bioluminescence, the control sequences for switching gene expression on and off, etc., and the program would try to use any combination of them to realize this simple version of the "blink" proposition.

    More complicated versions of "blink" would include additional functions - a proposition relating the blink frequency to the concentration of iron, for instance, or some restriction on the variation in wave duration, or peak strength. It would also be simple to add restrictions like "only yeast DNA" or "does not cause odors as a side effect" in exactly the same way.

    Hopefully a nice UI, an in-depth user guide with a tutorial, and some cleverness will cut the learning curve sufficiently that people will want to use the system.

    I'm thinking we could maybe invent some syntactic sugar that could change the Prolog look-and-feel to something more like Fortran or Mathematica, which many scientists are already familiar with.

  3. I guess the Prolog code for "blink" would be something like:

    blink(X) :- toggle_function(Tog, Bio_lum), bioluminescence(Bio_lum), loop_function(X,Tog,1000).

    Then you'd make a blinking creature like:
    creature(X) :- chassis(X,'e_coli'), blink(X), parts_verified(X).

    Whereas blink/1 operates at a logical level of behavior, parts_verified/1 is a proposition that makes sure all the parts fit together properly at a mechanical/chemical level. This knowledge acts as an additional filter against bad advice from the program. Similar propositions can be made to suit different requirements, like "optimize_survivability", "minimize_cost_in_dollars", or "minimize_nr_nucleotides"...