
BNFO301: Exam 1

1. List all the changes that can be produced by a single base pair mutation in the AGA codon encoding arginine and label the resulting amino acid. In addition, label each mutation as silent, missense, or nonsense. (4 pts)
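The enumeration that question 1 asks for can also be checked programmatically. The sketch below is illustrative only: it assumes the codon is written in the DNA alphabet (T rather than U) and uses the standard genetic code, and the name `classify_point_mutations` is made up for this example.

```python
BASES = "TCAG"
# Standard genetic code with codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
# "*" marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def classify_point_mutations(codon):
    """Return (mutant codon, amino acid, mutation type) for all 9 substitutions."""
    original = CODON_TABLE[codon]
    results = []
    for pos in range(3):
        for base in BASES:
            if base == codon[pos]:
                continue  # not a mutation
            mutant = codon[:pos] + base + codon[pos + 1:]
            amino_acid = CODON_TABLE[mutant]
            if amino_acid == original:
                kind = "silent"
            elif amino_acid == "*":
                kind = "nonsense"
            else:
                kind = "missense"
            results.append((mutant, amino_acid, kind))
    return results


for mutant, amino_acid, kind in classify_point_mutations("AGA"):
    print(mutant, amino_acid, kind)
```

Amino acids are printed as one-letter codes; a codon has three positions with three alternative bases each, so nine rows are listed.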

2. What would be the value of using a dot plot to compare a sequence to its own reverse complement? (2 pts)

3. Sketch the dot plot of a 1 kb sequence in which a motif of approximately 50 consecutive bases appears six times in the N terminal region of the sequence. (4 pts)
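For experimenting with the comparisons in questions 2 and 3, a dot plot can be computed with a simple exact word match. This is a minimal sketch, assuming an uppercase DNA sequence with only A, C, G, and T; the function names and the default word size of 3 are choices made for this example, not part of the exam.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Complement each base, then reverse the sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def dot_plot(seq_x, seq_y, word=3):
    """Return (x, y) start positions where a word of length `word` matches exactly."""
    # Index every word of seq_y by its start positions.
    index = {}
    for y in range(len(seq_y) - word + 1):
        index.setdefault(seq_y[y:y + word], []).append(y)
    # Look up every word of seq_x in that index.
    dots = []
    for x in range(len(seq_x) - word + 1):
        for y in index.get(seq_x[x:x + word], []):
            dots.append((x, y))
    return dots


seq = "GAATTC"
print(dot_plot(seq, reverse_complement(seq), word=6))
```

Runs of dots in a plot of a sequence against its own reverse complement mark stretches whose reverse complement also occurs in the sequence; plotting a sequence against itself instead shows direct repeats as diagonals parallel to the main diagonal.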

4. Use the PAM250 matrix to answer question 4.

a. Give the score for aligning two alanines (A) (1 pt)

b. Give the score for aligning two tryptophans (W) (​1 pt​)

c. Both of these alignments constitute “matches”, so why are the scores so different? (2 pts)

Use the BLOSUM62 matrix for questions 5 and 6.

5. Calculate the dynamic programming matrix and an optimal GLOBAL alignment for the protein sequences FKHMEDPLE and FMDTPLNE, scoring -2 for a gap (i.e. 2 is the gap penalty). Use the BLOSUM62 substitution matrix (given above).

a. Fill out the matrix. (6 pts)

b. Highlight the traceback alignment. (1 pt)

c. Write out the final alignment. (2 pts)

d. Score the final alignment. (1 pt)
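The fill and traceback steps in question 5 follow the Needleman-Wunsch recurrence, which can be sketched in a few lines. This is an illustration rather than a worked answer: to stay self-contained it swaps in a toy scoring function (+1 match, -1 mismatch) where the exam specifies BLOSUM62, so its scores will differ from the exam's. The gap penalty of -2 matches the question; the names `toy_score` and `needleman_wunsch` are chosen for this example.

```python
def toy_score(x, y):
    # Stand-in for a BLOSUM62 lookup: +1 for a match, -1 for a mismatch.
    return 1 if x == y else -1


def needleman_wunsch(a, b, score, gap=-2):
    """Return (matrix, aligned_a, aligned_b) for one optimal global alignment."""
    n, m = len(a), len(b)
    F = [[0] * (m + 1) for _ in range(n + 1)]
    # First row and column: aligning a prefix against nothing costs only gaps.
    for i in range(1, n + 1):
        F[i][0] = i * gap
    for j in range(1, m + 1):
        F[0][j] = j * gap
    # Each cell is the best of: diagonal (align pair), up (gap in b), left (gap in a).
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(
                F[i - 1][j - 1] + score(a[i - 1], b[j - 1]),
                F[i - 1][j] + gap,
                F[i][j - 1] + gap,
            )
    # Traceback from the bottom-right corner to the top-left corner.
    top, bottom = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and F[i][j] == F[i - 1][j - 1] + score(a[i - 1], b[j - 1]):
            top.append(a[i - 1])
            bottom.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and F[i][j] == F[i - 1][j] + gap:
            top.append(a[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(b[j - 1])
            j -= 1
    return F, "".join(reversed(top)), "".join(reversed(bottom))


matrix, top, bottom = needleman_wunsch("FKHMEDPLE", "FMDTPLNE", toy_score)
print(top)
print(bottom)
print("score:", matrix[-1][-1])
```

To score with a real substitution matrix, pass a function that looks up the residue pair in that matrix in place of `toy_score`. When several paths tie, this traceback reports only one of the optimal alignments.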

6. Calculate the dynamic programming matrix and an optimal​ LOCAL​ alignment for the protein sequences ​FKHMEDPLE ​ and ​FMDTPLNE ​. Use the BLOSUM62 matrix (provided above).

a. Fill out the matrix. (6 pts)

b. Highlight the traceback local alignments. (1 pt)

c. Write out the final alignment. (2 pts)

d. Score the final alignment. (1 pt)
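The local variant in question 6 is the Smith-Waterman recurrence. It differs from the global case in two ways: no cell may drop below zero, and the traceback starts at the highest-scoring cell and stops at the first zero. The sketch below is illustrative only. It uses a toy scoring function (+1 match, -1 mismatch) in place of BLOSUM62, and it assumes a gap penalty of -2, which question 6 does not restate. The names `toy_score` and `smith_waterman` are chosen for this example.

```python
def toy_score(x, y):
    # Stand-in for a BLOSUM62 lookup: +1 for a match, -1 for a mismatch.
    return 1 if x == y else -1


def smith_waterman(a, b, score, gap=-2):
    """Return (best_score, aligned_a, aligned_b) for one optimal local alignment."""
    n, m = len(a), len(b)
    H = [[0] * (m + 1) for _ in range(n + 1)]
    best, best_pos = 0, (0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # The extra 0 lets an alignment restart anywhere.
            H[i][j] = max(
                0,
                H[i - 1][j - 1] + score(a[i - 1], b[j - 1]),
                H[i - 1][j] + gap,
                H[i][j - 1] + gap,
            )
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)
    # Traceback from the best cell until a zero is reached.
    top, bottom = [], []
    i, j = best_pos
    while i > 0 and j > 0 and H[i][j] > 0:
        if H[i][j] == H[i - 1][j - 1] + score(a[i - 1], b[j - 1]):
            top.append(a[i - 1])
            bottom.append(b[j - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            top.append(a[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(top)), "".join(reversed(bottom))


best, top, bottom = smith_waterman("FKHMEDPLE", "FMDTPLNE", toy_score)
print(top)
print(bottom)
print("score:", best)
```

If several cells share the maximum score, this sketch keeps only the first one it encounters, so other equally good local alignments are not reported.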

7. What is 16S rRNA and what is its function inside a cell? (​2 pts​)

8. 16S rRNA is widely used in microbiome studies. List two strengths and two limitations of 16S rRNA sequencing. (4 pts)

9. Can 16S rRNA be used to classify viruses? Why or why not? (2 pts)

10. Which of the following amino acids is least mutable according to the PAM scoring matrix? (2 pts)

a. Alanine

b. Glutamine

c. Methionine

d. Cysteine

1. Which of the following sentences BEST describes the difference between a global alignment and a local alignment between two sequences? (2 pts)

a. Global alignment is usually used for DNA sequences, while local alignment is usually used for protein sequences.

b. Global alignment has gaps, while local alignment does not have gaps.

c. Global alignment finds the global maximum, while local alignment finds the local maximum.

d. Global alignment aligns the whole sequence, while local alignment finds the best subsequence that aligns.

2. How does the BLOSUM scoring matrix differ most notably from the PAM scoring matrix? (2 pts)

a. It is best used for aligning very closely related proteins.

b. It is based on global multiple alignment from closely related proteins.

c. It is based on local multiple alignments from distantly related proteins.

d. It combines local and global alignment information.

3. A global alignment algorithm (such as the Needleman-Wunsch algorithm) is guaranteed to find an optimal alignment. Such an algorithm: (2 pts)

a. puts the two proteins being compared into a matrix and finds the optimal score by exhaustively searching every possible combination of alignments.

b. puts the two proteins being compared into a matrix and finds the optimal score by iterative recursions.

c. puts the two proteins being compared into a matrix and finds the optimal alignment by finding optimal subpaths that define the best alignment(s).

d. can be used for proteins but not for DNA sequences.

4. What are the basic concepts of library preparation? (​4 pts​)

5. List 3 applications of next-generation sequencing. (2 pts)

6. How many reads do you need to get 30x coverage of your genome if your read length is 300 bp and your genome size is 10 Mb? (2 pts)
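The coverage question rests on the relationship coverage = (number of reads × read length) / genome size. A minimal sketch of that arithmetic, assuming 1 Mb = 1,000,000 bp and that every base of every read counts toward coverage; `reads_needed` is a name made up for this example.

```python
def reads_needed(coverage, genome_size_bp, read_length_bp):
    """Smallest whole number of reads whose total bases reach the target coverage."""
    total_bases = coverage * genome_size_bp
    # Integer ceiling division: round up so coverage is not undershot.
    return (total_bases + read_length_bp - 1) // read_length_bp


print(reads_needed(30, 10_000_000, 300))
```

In practice the estimate is a lower bound, since reads that fail quality filters or do not map reduce the effective coverage.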

Command line

Log in to Compile. Navigate to the bnfo301 (/home/bnfo301) directory. There is a folder called exam1 where you will find all the files you need to answer the next set of questions.

Instructions for this section​:

• Write your output files to your user specific folder in ​/home/bnfo301 ​ (ex. my user specific folder is ​/home/bnfo301/huangb2 ​). You will be graded on the files found in your specific folder. If the files are not in that folder you will not get credit for your answers. No exceptions.

• Make sure you name your output file as instructed in each question. I will take off 1 point for each output file that is not correctly named.

• Code is typically written using a fixed width font. Use a fixed width font to type your commands in this section (ex. courier, inconsolata, menlo, monaco).

• For each question, provide the command when specified, or the command and answer. All output files from this section should be written to your user specific folder on Compile. I will access your user specific folder to grade this section.

1. List the files in the ​exam1 ​folder. ​command only​ (​2 pts​)

2. Count how many sequences are in the protein-db.faa file. command and answer (2 pts)
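Counting sequences in a FASTA file comes down to counting header lines, since each record begins with a line starting with `>`. On the command line this is commonly done with `grep -c '^>'` on the file; the sketch below shows the same idea in Python. The function name and the sample records are made up for illustration.

```python
def count_fasta_records(lines):
    """Count FASTA records by counting header lines, which start with '>'."""
    return sum(1 for line in lines if line.startswith(">"))


# Hypothetical two-record example; with a real file, pass the open file handle:
#     with open("protein-db.faa") as handle:
#         print(count_fasta_records(handle))
sample = [
    ">seq1 example protein",
    "MKTAYIAKQR",
    ">seq2 another example",
    "MSDNELLK",
]
print(count_fasta_records(sample))
```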

3. You have an ​unknown1.faa ​ sequence that you want to blast against sequences in the protein-db.faa ​ file.

a. Copy the ​protein-db.faa ​ to your user specific folder. ​command only

b. Create a blast database for ​protein-db.faa ​. ​command only​ (​2 pts​)

c. Blast ​unknown1.faa ​ against the database you just created. Name your blast output file ​3b-unknown-output.txt ​. ​command only, leave output file on Compile​ (​2 pts​)

d. Filter your blast results for hits with an e-value greater than 1e-05. Name your blast output file 3c-unknown-output.txt. command only, leave output file on Compile (2 pts)

e. What is the percent identity and alignment length of the best hit in your blast results when you filter based on an e-value greater than 1e-05? Hint: you may need to change your output format. (8 pts)

f. What is the percent identity and alignment length of the worst hit in your blast results when you filter based on an e-value greater than 1e-05? Hint: you may need to change your output format. (4 pts)
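The workflow in question 3 maps onto two BLAST+ programs, `makeblastdb` and `blastp`. The sketch below only builds the argument lists and parses tabular output; it does not run BLAST. The options it uses (`-in`, `-dbtype prot`, `-query`, `-db`, `-out`, `-evalue`, `-outfmt 6`) are standard BLAST+ options, and the twelve column names are the default layout of output format 6. The file names, the two sample hit lines, and the choice of bit score to rank hits are assumptions made for illustration, not the expected answers.

```python
# Default columns of BLAST+ tabular output (-outfmt 6).
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def makeblastdb_cmd(fasta):
    """Arguments to build a protein database from a FASTA file."""
    return ["makeblastdb", "-in", fasta, "-dbtype", "prot"]


def blastp_cmd(query, db, out, evalue=None, tabular=False):
    """Arguments for a protein-protein search, optionally thresholded and tabular."""
    cmd = ["blastp", "-query", query, "-db", db, "-out", out]
    if evalue is not None:
        cmd += ["-evalue", str(evalue)]  # keeps hits at or below this e-value
    if tabular:
        cmd += ["-outfmt", "6"]
    return cmd


def parse_hits(text):
    """Parse -outfmt 6 text into a list of dicts with numeric fields converted."""
    hits = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        hit = dict(zip(FIELDS, line.split("\t")))
        hit["pident"] = float(hit["pident"])
        hit["length"] = int(hit["length"])
        hit["evalue"] = float(hit["evalue"])
        hit["bitscore"] = float(hit["bitscore"])
        hits.append(hit)
    return hits


# Made-up hit lines, only to show the parsing.
sample = (
    "query1\tsubjA\t98.5\t200\t3\t0\t1\t200\t1\t200\t2e-50\t400\n"
    "query1\tsubjB\t35.0\t80\t50\t2\t10\t90\t5\t84\t0.5\t30.1\n"
)
hits = parse_hits(sample)
best = max(hits, key=lambda h: h["bitscore"])
worst = min(hits, key=lambda h: h["bitscore"])
print(" ".join(blastp_cmd("query.faa", "db.faa", "hits.txt", evalue=1e-05, tabular=True)))
print(best["sseqid"], best["pident"], best["length"])
print(worst["sseqid"], worst["pident"], worst["length"])
```

Each argument list can be passed to `subprocess.run` on a machine where BLAST+ is installed. Tabular output puts percent identity and alignment length in the third and fourth columns, which is why changing the output format helps with parts e and f.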

4. BLAST is a tool that can be used to query multiple databases. It is not always necessary to create your own database. One of the most common blast databases is the non-redundant database (nr).

a. Blast the unknown1.faa sequence against the nr database (/home/norrissw/bin/I-TASSER4.2/lib/nr/nr) to find out what it is. Name your blast output file 4a-unknown-nr-output.txt. NOTE: you do not need to run the makeblastdb command. Also, it can take a few minutes for your blast to run because the nr database is very big. command only, leave output file on Compile (2 pts)

b. Filter your blast results for hits with an e-value greater than 1e-10. Name your blast output file 4b-unknown-nr-output.txt. command only, leave output file on Compile (2 pts)

c. Based on the best hit from nr, take the accession number and identify what that protein is. (4 pts)

5. The next set of questions involves the pipeline.py script.

a. Copy the pipeline.py script to your /home/bnfo301/vcuid folder. (2 pts)

b. Rename the ​pipeline.py ​ script to ​5b-pipeline.py ​. (​2 pts​)

c. Describe in detail what the script is doing, including what the output from each step is. (​4 pts​)

d. Modify the script so it filters the blast results using an e-value cut off of 1e-05. Save the modified script as ​5d-pipeline.py ​. You do not need to run the script, just add in your modification. ​leave output file on Compile​ (​2 pts​)
