Friday, May 23, 2014

An Intro to Next-Generation Sequencing (NGS) - Part 1

A few months back, I wrote a post about the alleged dangers of eating DNA from a GMO. In summary, our bodies don't know the difference between DNA derived from a transgenic crop and DNA derived from a traditionally bred crop. We've been eating cellular material for quite some time now and we haven't become green from eating veggies. (I wonder if the Creation Museum has a display of a caveman chomping down on some delicious dinosaur ribs.)

However, I still see many arguments about how we've now found DNA from our food circulating in our blood or how we've found RNA from rice inside us. Many of these studies have been enabled by a technology known as Next-Generation Sequencing (NGS). In the spirit of full disclosure, this is my field of work. I've worked at companies that develop NGS technologies for 6 years: the last 4 have been in internal product development, and this last year has been in the R&D lab itself. In this series of posts I'm going to explain the controversies about a few papers that have used NGS technologies and are cited as examples of how eating DNA from a GMO is dangerous.

This post gives an overview of the technology and the considerations in experimental design. Next week (or sometime after), I'll post reviews of the papers.

Let’s begin with a brief history of DNA sequencing. Prior to NGS, you generally had to know what you were going to sequence. You couldn't just take a random DNA sample and tell a lab: “tell me what's in here”. You had to know what you were looking for in order to design your experiment. Even in forensics, which has yet to adopt NGS, they look at very specific and well-characterized regions of the genome. There were ways around this to allow for discovery, but the processes were long and very expensive, which is why the sequencing of the human genome took 13 years (1990-2003) and cost 2.7 billion dollars.

The most popular technologies behind next-generation sequencing follow the same general principle: you take your DNA, you chop it up, you amplify it so that the machines have enough to work with and detect, then you put it on a machine that “reads” each DNA base and tells you what’s there. There are several different chemistries for sequencing, each patented by a different company, and each has its pros and cons.

With the advent of NGS, scientists found that they could sequence virtually anything. There have been a lot of exploratory experiments going on in the past decade based on this technology. Not only that, but you aren’t necessarily restricted to the analysis of DNA. You can indirectly sequence RNA, DNA modifications and structures, as well as DNA bound to proteins.

Here are some examples of the amazing things that have been done with NGS (and why):

Pretty awesome, eh? The possibilities are seemingly endless. I actually want a sequencer in my garage, but I’ve been deterred by the thought that my employers would notice if one went missing… (and a shout-out to the spouse for cleaning out the garage for Mother’s Day!! Don’t worry. I won’t turn it into a lab. For now).

Hopefully, you can imagine the applications and experiments that could be performed in agricultural biotechnology, as well as studies pertaining to transgenic organisms. As I've mentioned before, companies use NGS to determine if there were any unintended consequences of the transgenic event. Studies could also be performed to determine the impact of glyphosate on the gut microbiome or on bacteria in the soil, or to determine what happens to the DNA of the food we eat. In another possible application, the mysterious pathogenic organism that Dr Don Huber claims to be enriched in GMOs could also be sequenced, if he were to release the organism (outlined eloquently by Dr Kevin Folta in this petition).

There are a few more concepts that require explanation. One of the key questions in an NGS experiment is “how much sequencing do I have to do?” Here’s an analogy: imagine you’re baking oatmeal cookies with chocolate chips and raisins. You make a big batch of cookie dough. Your kid walks by and throws in a very small handful of dried cranberries. Then you bake cookies. For the sake of this analogy, we have to imagine that the number of cookies you could bake is infinite (i.e. you have an endless amount of cookie dough).

How many cookies do you have to bake and eat in order to determine the ratio of chocolate chips to raisins? If you bake 10 of them, you probably get a good enough idea, right? What if you want to know if there are any raisins at all? You might be able to get away with baking a single cookie. But what if you want to know how many cranberries your kid threw in? Do you bake 20? 30? 100? The number of cookies that you bake depends on the question that you’re asking.

The same is true in the world of NGS. If you’re looking for a mutation that you inherited from your mom and that is present in all your cells, you can do a “standard” amount of sequencing for the technology you’re using. But what if you suspect that you might be HIV positive, and the event that led to this suspicion occurred very recently? How much DNA do you have to sequence in order to detect the presence of the virus? The answer will be very different. It's basically a question of abundance. Looking for something that is present in every cell will require much less sequencing than looking for something that is much rarer.
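
To put rough numbers on the abundance question, here is a minimal Python sketch. It is a back-of-the-envelope model, not how any real sequencing pipeline decides depth: it assumes every read is an independent random draw from the sample, that a single read from the target counts as "detected", and it ignores sequencing errors and contamination. The function names and the example abundances are made up for illustration.

```python
import math


def detection_probability(abundance, n_reads):
    """Chance of getting at least one read from a target that makes up
    the fraction `abundance` of the sample, given `n_reads` random reads.
    (Simplifying assumption: reads are independent, unbiased draws.)"""
    return 1.0 - (1.0 - abundance) ** n_reads


def reads_needed(abundance, confidence=0.95):
    """Smallest number of reads that gives at least `confidence` chance
    of seeing the target one or more times, under the same assumption."""
    if not 0.0 < abundance < 1.0:
        raise ValueError("abundance must be strictly between 0 and 1")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be strictly between 0 and 1")
    # Solve 1 - (1 - abundance)^n >= confidence for n.
    return math.ceil(math.log(1.0 - confidence) / math.log1p(-abundance))


if __name__ == "__main__":
    # Hypothetical abundances: raisins are common, cranberries are rare.
    for label, abundance in [("raisins", 0.2), ("cranberries", 1e-6)]:
        print(label, reads_needed(abundance))
```

The rare "cranberry" needs millions of reads where the common "raisin" needs only a handful, which is the whole point: the amount of sequencing follows from the abundance of the thing you're looking for.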

Image: Oatmeal chocolate chip cookies. Beware! You may become a chocolate chip oatmeal cookie by absorbing its DNA! From Wikimedia Commons. But I wish it was from cookies in my pantry.

The next concept is that of input material vs contaminant. In our cookie analogy, imagine that you make 2 batches of cookies: a 1 cup batch and a 1 gallon batch. Since the toddler in this analogy is of the “up-to-no-good-variety”, he manages to throw the same amount of cranberries in both batches without you noticing. For the small batch, odds are that you’ll have a cranberry in every cookie you bake. You might even conclude that the cookies weren’t chocolate-raisin, but were chocolate-raisin-cranberry. However, for the second and larger batch, you could probably eat a full dozen without coming across a single cranberry. If you do come across a cranberry, you’d probably say “Huh… What’s that doing in there?”

In the world of NGS, the same is true. If you start with a lot of DNA, you can exclude contaminants more readily than if you start with a small amount, and the inclusion of appropriate controls is a key element. Contamination does happen. However, its impact on your experiment depends on the amount of sequencing you perform and the question you're trying to answer. For example, the world's first next-gen sequencing diagnostic assay actually allows for 10% contamination before the experiment is deemed a failure. However, the assay's accuracy is still incredible because it does a lot of sequencing and is asking a simple question (i.e., it's only looking for chocolate chips and raisins, not cranberries).
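
The two-batch cookie scenario can be sketched the same way. This is only an illustration under a strong assumption: reads come out in proportion to the mass of DNA that went in, which ignores amplification bias and everything else that happens during library prep. The function names and the nanogram amounts are hypothetical, not taken from any real protocol.

```python
def contaminant_fraction(sample_ng, contaminant_ng):
    """Fraction of the total DNA that comes from the contaminant.
    (Assumes reads are later drawn in proportion to input mass.)"""
    return contaminant_ng / (sample_ng + contaminant_ng)


def expected_contaminant_reads(sample_ng, contaminant_ng, n_reads):
    """Average number of contaminant reads expected out of `n_reads`."""
    return n_reads * contaminant_fraction(sample_ng, contaminant_ng)


if __name__ == "__main__":
    cranberries_ng = 0.1  # same stray amount thrown into both "batches"
    # Hypothetical inputs: a 1 ng "cup" batch and a 1000 ng "gallon" batch.
    for batch_ng in (1.0, 1000.0):
        fraction = contaminant_fraction(batch_ng, cranberries_ng)
        reads = expected_contaminant_reads(batch_ng, cranberries_ng, 1_000_000)
        print(f"{batch_ng} ng input: {fraction:.4%} contaminant, "
              f"about {reads:.0f} of 1,000,000 reads")
```

The same handful of cranberries is a large share of the small batch and a tiny share of the large one, so the less input you start with, the more your controls matter.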

So you see that there are many considerations on how to use the technology depending on the experiment, and every experiment needs to use different controls even though the technology used may be the same. However, such considerations are often overlooked.

Make sense? Alrighty! I hope to see you here next week when we start reviewing the papers.

BTW, my husband wanted me to change cranberries to walnuts. But I pointed out that walnuts belong in a cookie and would never be mistaken for a contaminant, whereas cranberries don't belong in there. He has seen the error in his views and now agrees.

1 comment:

  1. They also didn't use very strict p-values or do multiple test corrections - pretty amazing how few significant hits they found without it!