Analyzing Public Datasets 5: Introduction To IGV (cDNA)

Hello, this is David Coil, and welcome to the next video in our series, Analyzing Public Datasets: an introduction to IGV (cDNA). In a previous video, we generated three files, shown here, taken from Galaxy. The first is the cDNA database that we used for our alignment, and the second and third are the BAM and BAI files from the BWA alignment that we performed on Galaxy.

What we're going to do today is use a genomics browser called IGV, which stands for Integrative Genomics Viewer, to actually visualize this alignment. To begin, let's get IGV. Open up a browser, and we'll search for Integrative Genomics Viewer. There it is, and over here on the left is Downloads. You may have to register for the site, but registration is free. IGV is a Java-based application, so you'll need to relaunch it from the website each time, or you can download a shortcut to your toolbar, which will also relaunch the application. For our purposes, the smallest launch, the one with the smallest memory usage, is fine. So we'll click on Launch, and we'll go ahead and get this running.

And this will take a couple of minutes to load up. It will ask for access to your computer, which is fine, and we'll be back when it's ready to go. And here we are in IGV. By default, IGV opens the most recent version of the human genome, which is obviously not what we want. There's a variety of genomes present in IGV, but we don't want to align our BAM and BAI files to genomic DNA, because we performed our alignment with cDNA.

We want to visualize the alignment against the same cDNA database that we used on Galaxy. So to begin, we do File, Import Genome, and we'll give it a name. We'll just call it rice.

And then the file is this one, all cDNA. Go ahead and save this, and it's going to go ahead and import that as a genome. Okay, now you can see this is the rice cDNA.

And here is the list of all the cDNAs, so we can scroll through and look at whatever cDNA we want. Let's get the alignment in here.

Go to File, Load from File, and here we open the BAM file, and it will automatically associate with the appropriate BAI file. I should note here that if you don't use exactly the same cDNA or genomic sequence that you performed your alignment with, IGV is probably not going to work if the file structure is not identical.
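One way to see what "the same sequence" means in practice is to compare the sequence names in the FASTA file with the reference names recorded in the alignment header. The following is only a minimal sketch under stated assumptions: it works on plain text (FASTA text, and SAM-style header text such as `samtools view -H` prints), the function names are hypothetical, and the tiny example sequences are made up for illustration. It does not reproduce how IGV itself performs the check.

```python
def fasta_names(fasta_text):
    """Return sequence names: the first word after '>' on each header line."""
    names = []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:
                names.append(fields[0])
    return names


def sam_header_names(header_text):
    """Return reference names from the SN: field of each @SQ header line."""
    names = []
    for line in header_text.splitlines():
        if line.startswith("@SQ"):
            for field in line.split("\t")[1:]:
                if field.startswith("SN:"):
                    names.append(field[3:])
    return names


def missing_references(fasta_text, header_text):
    """Names the alignment refers to that are absent from the FASTA."""
    known = set(fasta_names(fasta_text))
    return [name for name in sam_header_names(header_text) if name not in known]


# Made-up example data (assumption): two cDNAs in the FASTA,
# but the alignment header mentions one that is not there.
fasta = ">cdna_A some description\nACGT\n>cdna_B\nGGCC\n"
header = "@HD\tVN:1.6\n@SQ\tSN:cdna_A\tLN:4\n@SQ\tSN:cdna_C\tLN:4\n"
print(missing_references(fasta, header))
```

An empty list would suggest the names line up; anything listed is a reference the viewer would have no sequence for.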

The genomics viewer is not going to be able to visualize your alignment. Let's go ahead and open up the BAM. And here it is. What you see up here at the top is the cDNA, so it's a 3,000-base cDNA with this location. And each of these boxes down here is a read from our Illumina run. We can zoom in, and then we can drive around this alignment to wherever we'd like, and you can see that there are some arrows that point to the left, so those are reads found in the reverse orientation, and some arrows that point to the right.

Those are reads found in the forward orientation. At the top here is a histogram of coverage, and this line in the middle shows where we are. So, for example, this base is very well covered. A lot of reads cover that base, and you can see that the histogram is high at that position. If we zoom in a little further, you can see a color representation of the bases at the bottom, and even further, you can actually see the bases themselves.
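The coverage histogram is just a count, at each position of the cDNA, of how many reads overlap that position. A minimal sketch of that idea follows. It assumes each read is given as a hypothetical `(start, length)` pair with a 0-based start and no gaps, which is a simplification of a real alignment, and the example reads are made up.

```python
def coverage(ref_length, reads):
    """Count how many reads overlap each position of the reference.

    reads: list of (start, length) pairs, 0-based start, ungapped (assumption).
    """
    depth = [0] * ref_length
    for start, length in reads:
        # Clip the read to the reference so out-of-range reads are ignored.
        for pos in range(max(start, 0), min(start + length, ref_length)):
            depth[pos] += 1
    return depth


# Three made-up 4-base reads on an 8-base reference.
reads = [(0, 4), (2, 4), (3, 4)]
depth = coverage(8, reads)
print(depth)  # [1, 1, 2, 3, 2, 2, 1, 0]
```

Position 3 is covered by all three reads, so that is where the histogram would be highest, and the last position has no coverage at all.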

So you can see this position is a G in the cDNA and is a G in all the reads, as one would expect in this case, because this RNA-seq data is from the same genotype, as a control condition, that the cDNA sequence was taken from.

So other than looking at the pretty arrows and colors, what is it we might want to do with IGV? One thing is just to examine coverage through a gene of interest. We can see that coverage in this region is reasonably good, and there's no coverage in this region. Another thing we can do with IGV is to visualize alternative splicing. We're not going to do that in this video, because we've aligned here to cDNA, but in a subsequent video we're going to align to genomic DNA, where it's very easy to visualize alternative splicing.

One thing we can do with this dataset, however, is to look for SNPs, base changes. The fact that all of these are gray or white means that they're a perfect match: the entire read matches the underlying cDNA sequence. If we scroll around a little, we see some color, and that means that there's a change. And so right here, for example, this read has a C where, in fact, there should be an A, and so the histogram shows 50% A, 50% C. Now, this isn't actually a SNP, because it occurs in only one of two reads, and it occurs right at the beginning of a read, so this is probably a sequencing error.

So what might a SNP actually look like? Rather than looking through hundreds of cDNAs for a good SNP, I've gone ahead and found one already that's a good example, so we'll go ahead and take a look at that. Here we go. Go ahead and zoom in a bit and scroll over right here.

Zoom in all the way. One thing I should note is that these darker colored reads are higher-quality reads, and these lighter colored reads are lower quality. And so here you have a case where you have many high-quality reads that cover this position, and a little over half of them have a G where, in the reference sequence, it was an A. So this is most likely a SNP. And in fact, this is exactly what heterozygosity would normally look like: about half the reads have a G and about half have an A.
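The reasoning here, many reads with roughly half showing a different base versus one odd base out of two reads, can be written down as a simple tally. The sketch below is only an illustration: the function names and the thresholds (`min_depth=10`, `min_alt_fraction=0.3`) are hypothetical choices rather than values from this video or from IGV, and it ignores read quality and position within the read.

```python
from collections import Counter


def allele_fractions(bases):
    """Fraction of reads showing each base at one position."""
    counts = Counter(b.upper() for b in bases)
    total = sum(counts.values())
    return {base: n / total for base, n in counts.items()} if total else {}


def looks_like_putative_snp(bases, ref_base, min_depth=10, min_alt_fraction=0.3):
    """Rough screen: enough reads, and enough of them differ from the reference.

    Both thresholds are illustrative assumptions, not established cutoffs.
    """
    if len(bases) < min_depth:
        return False
    fractions = allele_fractions(bases)
    alt = sum(f for base, f in fractions.items() if base != ref_base.upper())
    return alt >= min_alt_fraction


# One C out of two reads: 50% on paper, but far too few reads to trust.
print(looks_like_putative_snp(["A", "C"], "A"))  # False

# Made-up pileup: 11 reads with G, 9 with A, reference A.
print(looks_like_putative_snp(["G"] * 11 + ["A"] * 9, "A"))  # True
```

A position that passes a screen like this is still only a candidate, and it would need the closer look described in the video.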

So, as you would expect in a heterozygote, you got a G from one parent and an A from the other parent. In fact, this particular dataset is homozygous and is the same as the reference. So something like this is either residual heterozygosity, which is possible, or a gene duplication event, where this gene has duplicated but the sequence is so close that there are only a couple of small changes between the two versions, so they map to the same position. Regardless, for our purposes, if you were looking at your gene of interest, you would examine this further as a putative SNP.

And that's all we're going to talk about in this video. In the next video, we're going to do a different kind of alignment, where we're going to take RNA-seq data and align it to genomic DNA, and in a subsequent video we're going to come back to IGV and see what RNA-seq reads look like aligned to genomic DNA instead of cDNA, as in this video. Thanks for watching, and we'll see you then.