Dna Assignment

The Big Picture

See The Medium Pictures and The Little Pictures for the specifics of what you're going to do and how to proceed. The Big Picture is to give you a bird's-eye view of what's going to happen. I recommend you read this page through, start to finish, before you snarf any code!

Background
In this assignment you'll experiment with different implementations of a simulated restriction enzyme cutting (or cleaving) a DNA molecule. Three scientists shared the Nobel Prize in 1978 for the discovery of restriction enzymes. They're also an essential part of the process called PCR ("Polymerase Chain Reaction"), which is one of the most significant discoveries/inventions in chemistry and for which Kary Mullis won the Nobel Prize in 1993. As cool as it is, this has nothing to do with the Nobel Prize that a Duke researcher just won.

Manipulations of this sort are exactly what linked lists are best at. Specifically, you'll be working with a chunk list, which fills a niche halfway between array-based lists () and linked lists. There's some extra credit available for doing more things with chunks. As you might have guessed, this assignment is a simplification of the chemical process, but still gets at something biologically sound.

DNA has two goals:

  1. Building familiarity with linked lists and how to use them. You'll do this by actually writing a special-purpose linked list (the chunk list).
  2. Seeing the effects, both in Big-O and real time, of various ways of implementing algorithms on long sequences of data. (In other words: what sort of sequences favor what sort of data structures?)

The Medium Pictures

DNA
In this assignment, you'll be given an interface () that represents a strand of DNA (see below, and the class notes, for more on Java interfaces). You're also given a simple implementation () of that interface that relies on the built-in Java classes and . You'll be writing a new implementation of that interface that uses a linked list, rather than and .

The interface, and implementations, simulate two ideas from genomics: reversing a strand of DNA, and recombination. The details of recombination are explained in the little pictures; the code implements it. You'll experiment with the provided implementation, and then implement a new version that's more efficient. Your experiments will evaluate the tradeoffs in doing things in these two different ways.

Interfaces
In this class, we've seen many built-in Java classes: , (briefly before now), , , , (both briefly), and so on. Let's consider and . Both provide , , , and so on. In short, they can do the same things, although (as we've mentioned, and as you'll see in this assignment) they do them differently. Because their methods are the same, the two should be interchangeable, in some sense.

This leads to an interesting idea: an abstract idea of a "list" that's independent of the details of its implementation. We call this an abstract data type, often abbreviated to ADT. ADTs define a set of methods according to their names, arguments, and return types, and say nothing about how those methods are actually implemented. "List" is an ADT, as is "Set", and "Map" (and there are others).

In Java, ADTs are called "Interfaces." A class is said to "implement" an Interface if it provides all the required methods. Java has a great many built-in Interfaces: , , and are examples you've seen before. Take a look at to see how an interface is defined. (By convention, not-built-in Interfaces start with a capital I). The code in looks very much like a class, but with the body of each method missing. It looks that way because that's exactly what it is. (See the October 17th class notes for more.)
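To make the "class with the bodies missing" idea concrete, here is a minimal sketch. The names `ITextStore`, `BuilderStore` and `ChunkStore` are invented for illustration; they are not part of the code you'll snarf.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical interface: method names, arguments and return types only.
interface ITextStore {
    void append(String text);
    long size();
}

// One implementation: keeps everything in a single growing buffer.
class BuilderStore implements ITextStore {
    private final StringBuilder myText = new StringBuilder();

    @Override
    public void append(String text) {
        myText.append(text);
    }

    @Override
    public long size() {
        return myText.length();
    }
}

// A second implementation: keeps the pieces separate and tracks the total.
class ChunkStore implements ITextStore {
    private final List<String> myChunks = new ArrayList<String>();
    private long mySize;

    @Override
    public void append(String text) {
        myChunks.add(text);
        mySize += text.length();
    }

    @Override
    public long size() {
        return mySize;
    }
}

class TextStoreDemo {
    public static void main(String[] args) {
        ITextStore[] stores = { new BuilderStore(), new ChunkStore() };
        for (ITextStore store : stores) {
            store.append("aat");
            store.append("ccg");
            System.out.println(store.getClass().getSimpleName() + ": " + store.size());
        }
    }
}
```

Both classes answer the same questions in the same way from the outside, even though what they store internally is different.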

The other side of the coin is a "Concrete Data Type" (almost always just called an "implementation"). and are Interfaces; and are implementations. In this assignment, is an interface, is one implementation, and you'll be writing another implementation (called ).

The distinction between interface and implementation, believe it or not, will make your life easier. Fundamentally, it makes your code more general. For example, the testing code we provide (more on that later) doesn't care about whether you give it a or a ; it only cares that you provide something that implements . This generality is powerful.

Vitally, the same way writing a class defines a new datatype, writing an interface defines a new datatype: one that can take on a value from any class that implements the interface. All of the following are completely legal:

    List<Integer> a = new ArrayList<Integer>();
    List<Integer> b = new LinkedList<Integer>();
    Set<String> s = new TreeSet<String>();
    Set<String> t = new HashSet<String>();
    Map<String, Integer> m = new TreeMap<String, Integer>();
    Map<String, Integer> n = new HashMap<String, Integer>();
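The payoff of declaring variables by interface is that code written against the interface works with every implementation. A small sketch (the class name `ListSumDemo` and its `sum` method are made up for this example):

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

class ListSumDemo {
    // Written against the List interface, so it accepts any implementation.
    static int sum(List<Integer> values) {
        int total = 0;
        for (int v : values) {
            total += v;
        }
        return total;
    }

    public static void main(String[] args) {
        List<Integer> a = new ArrayList<Integer>();
        List<Integer> b = new LinkedList<Integer>();
        for (int i = 1; i <= 4; i++) {
            a.add(i);
            b.add(i);
        }
        // The same method handles both lists.
        System.out.println(sum(a)); // prints 10
        System.out.println(sum(b)); // prints 10
    }
}
```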

Restriction Enzyme Cleaving
For the coding in this assignment to make sense, you need to know a bit about restriction enzyme cleaving. Restriction enzymes cut a strand of DNA into two pieces at a specific location (the "binding site"). In chemical processes, a strand can be split in several places by the same enzyme; we'll be doing that too.

Consider the DNA strand "aatccgaattcgtatc", and the restriction enzyme "gaattc". The restriction enzyme specifies where in the DNA sequence the split should occur; you'll also be told where in the restriction enzyme the split should occur. Consider the example below: Here, "gaattc" (in red) matched the DNA sequence, and cut it with one nucleotide on the left, and the rest on the right.

In the code you'll be running, another strand of DNA will be spliced into the gap. Note how the splice also matches the restriction enzyme in the following picture: After the join, we have this sequence: The heavily-shadowed areas show the original restriction enzyme. Your code is a software simulation of this process: the restriction enzyme will cut a strand of DNA and new material will be spliced in to create a recombinant strand. In the provided code, this is implemented in the method of .
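As a rough illustration of the cut-and-splice idea on plain Strings, here is a sketch that follows the description above: find each occurrence of the enzyme, cut a fixed number of nucleotides into it, and insert the new material at the cut. The method name, its parameters and the splicee "tgata" are assumptions made for this example; the provided code may be organized differently.

```java
class SpliceSketch {
    // Insert splicee at every cut point. The cut falls cutOffset characters
    // into each (non-overlapping) occurrence of enzyme.
    static String cutAndSplice(String dna, String enzyme, int cutOffset, String splicee) {
        if (enzyme.isEmpty()) {
            throw new IllegalArgumentException("enzyme must not be empty");
        }
        StringBuilder out = new StringBuilder();
        int start = 0;                      // first character not yet copied
        int pos = dna.indexOf(enzyme);
        while (pos >= 0) {
            int cut = pos + cutOffset;
            out.append(dna, start, cut);    // everything up to the cut
            out.append(splicee);            // the new material
            start = cut;
            pos = dna.indexOf(enzyme, pos + enzyme.length());
        }
        out.append(dna, start, dna.length());
        return out.toString();
    }

    public static void main(String[] args) {
        // "gaattc" starts at index 5; cutting one nucleotide in leaves
        // "aatccg" on the left and "aattcgtatc" on the right.
        System.out.println(cutAndSplice("aatccgaattcgtatc", "gaattc", 1, "tgata"));
        // prints aatccgtgataaattcgtatc
    }
}
```

Note that this version copies every character of the result into a new buffer, which is what makes it linear in the length of the recombinant strand.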

LinkStrand
The fundamental coding exercise in this assignment is the implementation of the class, which implements the interface. implements the using a linked list; as a result, it can splice new DNA into the original DNA in constant time, rather than depending on the length of the splicee. The details of how this will be done are in the little pictures.

The Little Pictures

At this point, you should have a pretty good bird's-eye view of Assignment 3. The Little Pictures explain the steps you're going to take to complete it and how it will be graded.

You'll be submitting three things: your implementation of , a detailed experimental analysis, and a README. All of these files should be submitted via Ambient or websubmit. Your analysis should be in .pdf format, and your README in .txt format.

Step 0:
Pause to collect your thoughts. Snarf the code. Read through and .

Step 1: Analysis of SimpleStrand
Take a close look at in . With that method, generating a new recombinant strand is O(n), where n is the length of the resulting recombinant strand. (This ignores the cost of finding the breaks, which is O(t) for a strand with t breaks). The program does timing benchmarking for you; give it a run. (If it runs out of memory, don't worry; that's expected.) The timing data it provides probably aren't enough to complete this part of the assignment, though: you'll need to add some code to it to take more. (However, you'll need to use the original behavior in the next section; therefore, don't remove anything.)

In your assignment writeup, rigorously justify (using both Big-O analysis and empirical benchmarking) that generating recombinant DNA using is O(n), where n is the length of the resulting strand.

Some things to think about for this analysis:

  1. For purposes of computing timings, it may be helpful to generate smaller DNA sequences. The two provided sequences ( and ) could be truncated, for example. If you generate a new sequence for your benchmarking, be sure to include your new file in your submission.
  2. The discussion of increasing Java's heap space (Step 2) may come in handy for dealing with large sequences, as well. If you do this, be sure to document it in your analysis.
  3. Plots (especially with functions fit to them) are a very compelling way to demonstrate how code's runtime changes as a function of n. Note that plots are not (at all) the way to make a compelling argument about Big-O behavior.
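One way to gather the extra timings suggested above is a doubling experiment: time the same operation on inputs of size n, 2n, 4n and so on, and look at the ratio between successive times. For an O(n) operation the ratio should settle near 2. The sketch below is only a starting point. The class and method names are invented, the timed operation is a stand-in (copying a strand into a StringBuilder) rather than the real cut-and-splice, and JIT warm-up and garbage collection will add noise.

```java
import java.util.Arrays;

class TimingSketch {
    static volatile int sink; // keeps the timed work from being optimized away

    // Build a synthetic strand of exactly n characters from a short motif.
    static String syntheticStrand(int n) {
        String motif = "aatccgaattcgtatc";
        StringBuilder sb = new StringBuilder(n);
        while (sb.length() < n) {
            sb.append(motif, 0, Math.min(motif.length(), n - sb.length()));
        }
        return sb.toString();
    }

    // Run the task several times and report the median time in milliseconds.
    static double medianMillis(Runnable task, int trials) {
        double[] ms = new double[trials];
        for (int i = 0; i < trials; i++) {
            long start = System.nanoTime();
            task.run();
            ms[i] = (System.nanoTime() - start) / 1e6;
        }
        Arrays.sort(ms);
        return ms[trials / 2];
    }

    public static void main(String[] args) {
        double previous = -1;
        for (int n = 1 << 16; n <= (1 << 22); n *= 2) {
            final String strand = syntheticStrand(n);
            // Stand-in O(n) operation: copy the strand and append a few characters.
            double ms = medianMillis(
                () -> sink = new StringBuilder(strand).append("gaattc").length(), 9);
            String ratio = previous > 0 ? String.format("%.2f", ms / previous) : "-";
            System.out.printf("n=%d  median=%.3f ms  ratio=%s%n", n, ms, ratio);
            previous = ms;
        }
    }
}
```

Taking the median of several trials, rather than a single run, makes the numbers less sensitive to one-off pauses.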

Step 2: Memory, and the lack of it.
As you noticed above, the benchmarking program ran out of memory. DNA sequences are (potentially) gigantic, so this isn't too surprising. The next step is to figure out (on your computer) what the longest value of is that doesn't generate a Java Out of Memory exception, and how that length changes as you allow Java more memory. The benchmarking program works with Strings whose lengths are a power of two; start by reporting the largest value that doesn't run out of memory.

Next, increase the amount of heap space that Java is allowed. Roughly, heap space corresponds to the total amount of memory available to your program; more precisely, the heap is the part of memory where anything created using is stored. (Other variables are on the stack.)

To do this, open Eclipse's Run menu, and select "Run Configurations", and then the "Arguments" tab. Then, add the line "-Xmx1024M" to the "VM arguments", as seen below: This changes the maximum heap space to 1024 megabytes (1 gigabyte). Report what this does to the maximum-size benchmark you can run. Then, try multiplying the maximum heap size by 2 (to 2048M), and report that. Keep increasing by powers of two until your computer can't handle it anymore. You should indicate when your computer runs out of memory, and what behavior the program exhibited as you increased the maximum heap size.
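If you want to confirm that the VM argument actually took effect, you can ask the running JVM how much heap it is allowed to use. This is a small sketch (the class name `HeapCheck` is made up); the number reported by `Runtime.maxMemory()` may differ somewhat from the exact -Xmx value.

```java
class HeapCheck {
    // Maximum heap the JVM will attempt to use, in megabytes.
    static long maxHeapMegabytes() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    public static void main(String[] args) {
        System.out.println("Max heap: " + maxHeapMegabytes() + " MB");
    }
}
```

Run it with the same run configuration as the benchmark, and record the value alongside each result you report.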

Step 3:
This step is full of very small pictures indeed. It should give you a fairly precise view of what, exactly, you'll need to code.

This assignment is a microcosm of a very common programming scenario: code that already works, but isn't really efficient enough. This is a convenient place to be, because the (correct, if too slow) implementation gives you something to test against. In DNA, you're given , which implements the interface. However, you've benchmarked , determined its efficiency, and you're confident that you can do it in better than O(n). (You're confident because...we told you so!)

You'll be implementing a class called that also implements the interface. Rather than the complete copy made by , you'll be using the power of linked lists to splice pieces into the strand.

One might ask what is being stored in a single node of the linked list stored by . Many choices are possible: one nucleotide, two, ten, some arbitrary k, or something else. We're going to store variable-length chunks in each node. The size of a particular chunk isn't fixed: it depends on what splices have taken place already. In short, the algorithm is this: find a place a splice needs to take place. Break that node into two pieces: one before, and one after. Splice a new node (with the enzyme) into the middle. This operation takes O(1) time, because the necessary String operations are O(1), and linked-list splicing is O(1).
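The split-and-splice step can be sketched on a bare-bones node type. Everything here (`ChunkNode`, `splitAndInsert`, `join`) is a hypothetical illustration of the pointer manipulation, not the class you're asked to write. The pointer updates are constant-time; the cost of the two substring calls depends on how your Java version implements String.

```java
// Hypothetical node: one variable-length chunk of the strand.
class ChunkNode {
    String data;
    ChunkNode next;

    ChunkNode(String data) {
        this.data = data;
    }
}

class ChunkSplitSketch {
    // Break node at index into a "before" and an "after" piece, and link a
    // new node holding insert between them. Returns the "after" node.
    static ChunkNode splitAndInsert(ChunkNode node, int index, String insert) {
        ChunkNode after = new ChunkNode(node.data.substring(index));
        ChunkNode middle = new ChunkNode(insert);
        after.next = node.next;   // keep the rest of the list attached
        middle.next = after;
        node.data = node.data.substring(0, index);
        node.next = middle;
        return after;
    }

    // Concatenate every chunk, starting at head (for checking results only).
    static String join(ChunkNode head) {
        StringBuilder sb = new StringBuilder();
        for (ChunkNode n = head; n != null; n = n.next) {
            sb.append(n.data);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        ChunkNode head = new ChunkNode("aatccgaattcgtatc");
        splitAndInsert(head, 6, "tgata");
        System.out.println(join(head)); // prints aatccgtgataaattcgtatc
    }
}
```

Returning the "after" node is convenient when a strand has several cut points: the search for the next one can continue from that node.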

Begin by creating a class that implements . You'll need to store a linked list of , so your class is going to start out like this:

class LinkStrand implements IDnaStrand {

    // Inner class representing a single node in our linked list.
    private class Node {
        // Because the entire Node class is private, we can reasonably make its
        // member variables public.
        public Node myNext;
        public String myData;

        // You'll want a constructor for Node, too.
    }

    // Note that we're now in LinkStrand, but no longer in Node. We need to
    // store pointers to the first and last nodes in your list:
    private Node myHead;
    private Node myTail;

    // And this is handy, too.
    private long mySize; // How many nucleotides does this strand represent?

    // You'll want (at least one) constructor, plus all of the methods required
    // by IDnaStrand.
}

LinkStrand Implementation Details
When you build a (either using the constructor, or the method required by ), it should start out as a one-node linked list, where the node's is the entire String. When you call , you'll go from one node to (at least) three: the data before the spliced-in text, the spliced-in text, and the data after the spliced-in text. See the image below for an example of what the list might look like after a .

In the figure above, you ended up with more than three nodes, because the enzyme occurred in multiple places. The image below explains why doing it this way saves memory: because of the way Strings work, their data is shared; therefore, the total memory requirements of should be lower than the memory requirements of .

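Note: this works because .substring is O(1) and shares, rather than copies, the underlying character data. That is how Java's String behaved when this assignment was written; since Java 7 update 6, .substring copies the characters it returns, so on a current JVM each split costs time proportional to the length of the pieces.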

Some things to keep in mind:

  • Your goal is to match, exactly, the behavior of , but with better performance characteristics. When in doubt about how something should work, match how does it.
  • Both your constructor and methods should use the same code. The best way to do this is to write a third, helper, method, and then have both the constructor and call the helper. Since the helper isn't part of the interface, it should be private.
  • Implementing should not concatenate Strings; it should create a new node. Note that (because takes in an ), it might get a or a as a parameter. If you're appending a , you should create new nodes; if you're appending a , convert it to a String, and then use the that takes a String. Look at to see how to figure out the type of the argument using Java's operator.
  • When implementing , you may assume that your contains exactly one node. If it doesn't, you should throw an exception:
        if (...LinkStrand has multiple nodes...) {
            throw new RuntimeException("LinkStrand has multiple nodes!");
        }
    We won't be getting into more detail on exceptions; suffice to say that this will crash your program with the provided message. We permit this only-one-node simplification because finding the breakpoints without it is much trickier.
  • The code for will be almost completely identical to the code in . However, because is implemented differently, you'll have better performance characteristics.
  • We provide JUnit tests for implementations of . passes those tests; your should too. Note that passing all of these tests does not guarantee full credit, as they measure correctness, not efficiency.
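The append advice in the list above can be sketched as follows. This is an illustration under assumptions, not the required design: the class `TailAppendSketch` and the method names `appendChunk` and `appendStrand` are invented, and the real interface's method names and parameter types should be taken from the provided code.

```java
class TailAppendSketch {
    private static class Node {
        final String data;
        Node next;

        Node(String data) {
            this.data = data;
        }
    }

    private Node head;
    private Node tail;
    private long size;      // total number of nucleotides
    private int nodeCount;  // number of chunks in the list

    // O(1): link one new node after the tail; no String concatenation.
    void appendChunk(String chunk) {
        Node n = new Node(chunk);
        if (head == null) {
            head = n;
        } else {
            tail.next = n;
        }
        tail = n;
        size += chunk.length();
        nodeCount++;
    }

    // Use instanceof to decide how to append: node by node for another chunk
    // list, or as a single String for anything else.
    void appendStrand(Object other) {
        if (other instanceof TailAppendSketch) {
            TailAppendSketch that = (TailAppendSketch) other;
            Node last = that.tail; // remembered so appending to ourselves terminates
            for (Node n = that.head; n != null; n = n.next) {
                appendChunk(n.data); // new node, same String object
                if (n == last) {
                    break;
                }
            }
        } else {
            appendChunk(other.toString());
        }
    }

    long size() {
        return size;
    }

    int nodeCount() {
        return nodeCount;
    }

    @Override
    public String toString() {
        StringBuilder sb = new StringBuilder();
        for (Node n = head; n != null; n = n.next) {
            sb.append(n.data);
        }
        return sb.toString();
    }
}
```

Keeping a tail pointer and a running size is what makes both appending and asking for the length cheap, no matter how many chunks the list holds.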

Step 4: LinkStrand Analysis
A correctly-implemented will run in O(b) time, where b is the number of breaks being introduced into the strand. Repeat your analysis from Step 1, but replace with in the method of .

Note that here (and in Step 1) we're assuming that the cost of finding the breaking points is negligible, and can be ignored. Another way to think about it is that it's equal for the two implementations; we're speeding up the other part of the process.

Reversing and Extra Credit
In the lab, DNA doesn't have a fixed direction; reversing it is necessary. relies on a built-in method of to do this. For full credit in , an N-node list, when reversed, should still have N nodes. (That is, you should reverse each node individually). For extra credit, save some reversals: reverse each String only once (rather than each node). (Recall from the picture above that String data is shared.)
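The extra-credit idea (reverse each distinct String only once) can be sketched with a cache. To keep the example short it works on a List of chunk Strings rather than on linked nodes, and the names `ReverseSketch` and `reverseChunks` are made up; treat it as one possible approach, not the expected solution.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class ReverseSketch {
    // Reverse a strand stored as chunks: the chunks come out in the opposite
    // order, and each chunk's characters are reversed. Chunks with the same
    // text are reversed once and the result is reused.
    static List<String> reverseChunks(List<String> chunks) {
        Map<String, String> cache = new HashMap<String, String>();
        List<String> out = new ArrayList<String>();
        for (int i = chunks.size() - 1; i >= 0; i--) {
            String chunk = chunks.get(i);
            String reversed = cache.get(chunk);
            if (reversed == null) {
                reversed = new StringBuilder(chunk).reverse().toString();
                cache.put(chunk, reversed);
            }
            out.add(reversed);
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> chunks = new ArrayList<String>();
        chunks.add("aatccg");
        chunks.add("tgata");
        chunks.add("aattcg");
        chunks.add("tgata");
        System.out.println(reverseChunks(chunks));
        // prints [atagt, gcttaa, atagt, gcctaa]
    }
}
```

The result has the same number of chunks as the input, and repeated chunks (typically the spliced-in text) share a single reversed String.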

Grading
The entire assignment is graded on a ten-point scale. This assignment is weighted by a factor of 2 (Markov was weighted by 1.5, and the previous assignments were weighted by 1.0). Your analysis (both parts), which should be extensive, is worth five points. Your implementation of is worth four points, and you get one point for a complete README and good code style.

Submission
Submit your code, analysis, and README using Ambient or websubmit. Please make sure you submit the entire assignment each time you submit, including any code we provided, even if you haven't changed it. You can submit as many times as you'd like; we'll take the last one.

The README
Like every assignment (but not APTs), you should include a README text file in your submission. It should include:

  • Your name.
  • When you started and finished the assignment, and your best estimate of how long it took.
  • The credits: who helped you. Recalling the collaboration policy from the syllabus, be sure to give credit where credit is due. Also, take credit where it's due: if you helped someone else, say so! If you didn't talk to anybody about this assignment, say that.
  • Anything we've specifically said should be included in the README.
  • Any feedback you have about the assignment itself; we're always trying to make these assignments better. If you'd rather that feedback be anonymous, there's a link for that on the course page.
You can add a text file to your code using in Eclipse.

Code Style
Fact: code is hard to read.
How you lay out your code, and your program, make a big difference in how easy it is to understand. There are (at least!) four people who need to understand your code: you (while writing it), anybody you ask for help, your grader, and you (six months from now, when you look at it to figure out how you did something). While the following rules are not set in stone, they provide a good place to start:

  • Think about how your code is going to be organized before you've written it. Good design matters!
  • Give your classes, variables, and methods descriptive names. Naming your variables and doesn't say much; and are much better.
  • Use Java's conventions on naming. ClassesAreNamedLikeThis: words are run together, and each word is capitalized. Similarly, methodsAreNamedLikeThis: words run together, and all but the first word are capitalized. Variables are named like methods.
  • Indent your code properly. Roughly speaking, this means "everything inside a curly brace gets indented one extra level." See the code we provide for how this should look. To make this easy, Eclipse can do it for you: select your code with the mouse, and then do .
  • Don't let your lines get too long. Although Java doesn't care how long your lines are, long lines are much harder to read. The closest thing there is to a universal standard is 80 characters. Eclipse makes this easy, too: in , turn on "Print Margin" and set it to 80. That will add a vertical line at the 80-character mark.
Remember: easy-to-understand code makes your grader happy. Happy graders are friendly graders!

Abstract

The web service DNATCO (dnatco.org) classifies local conformations of DNA molecules beyond their traditional sorting to A, B and Z DNA forms. DNATCO provides an interface to robust algorithms assigning conformation classes called ntC to dinucleotides extracted from DNA-containing structures uploaded in PDB format version 3.1 or above. The assigned dinucleotide ntC classes are further grouped into DNA structural alphabet ntA, to the best of our knowledge the first DNA structural alphabet. The results are presented at two levels: in the form of user friendly visualization and analysis of the assignment, and in the form of a downloadable, more detailed table for further analysis offline. The website is free and open to all users and there is no login requirement.

INTRODUCTION

The complexity and variability of DNA structures can no longer be understood within the traditional ‘A–B–Z structural code’. DNA molecules are able to form sharp kinks in complexes with some transcription factors, spiral around the histone core proteins, accommodate sharp kinks in Holliday junctions, extend their backbone in intercalation complexes with aromatic drugs, or form stable quadruplex or hairpin structures. Surprisingly though, tools that go beyond a simplified picture of DNA structure and reduce its complexity to a few qualitative descriptors are scarce. NDB (1) provides a comprehensive overview of the available structures and offers a limited structural classification, while the software tool 3DNA (2) concentrates on describing the geometry of base pairing.

Several years ago, some of us attempted to classify the geometry of the DNA backbone at the level of dinucleotides (3) and later on, we developed a robust automated pipeline to perform such an analysis (4). In this contribution, we present an improvement of this methodology implemented into a web-based tool that offers an objective analysis of the DNA local conformation based on a rigorous geometry-based algorithm. Conformations of dinucleotide steps are assigned to one of 57 conformers called ntC. To help interpret the results and the overall structural features of the analyzed structure, the assignment is also interpreted in terms of conceptually simpler structural alphabet ntA consisting of just 12 members, which were created by grouping structurally related ntC. The relatively high number of ntC classes, 57, resulted from the analysis of the available DNA structures and reflects the complexity of the DNA conformational space. In contrast, the particular way of grouping of ntC into letters of the ntA structural alphabet and the number of ntA letters are subjective.

MATERIALS AND METHODS

Conformations of dinucleotide steps (nomenclature defined in Figure 1) are analyzed by comparing their torsion angles to the torsions of 4439 dinucleotides in the so-called ‘golden set’. The golden set is an ensemble of dinucleotides manually curated and classified into one of the 57 conformational classes called ntC. The geometry of the ntC classes is summarized in Supplementary Table S1 and is also available at the dnatco.org website. The assignment begins by uploading a PDB-formatted structure to the website and is performed in the torsional space by comparing values of nine DNA backbone torsion angles of the analyzed step and all 4439 steps in the golden set by the modified k-nearest neighbors algorithm according to the protocol by Čech et al. (4) with modifications. The currently used ntC definitions originate from the set reported by Svozil et al. (3), who described them in detail along with the methods used to identify them.
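Any comparison in torsional space has to respect the fact that angles are circular: 350° and 10° are 20° apart, not 340°. The following sketch illustrates that point only. The class and method names are hypothetical, and the Euclidean combination over the torsions is an assumption made for the example; it is not the modified k-nearest neighbors procedure used by the server.

```java
class TorsionSketch {
    // Smallest absolute difference between two angles in degrees, in [0, 180].
    static double angularDifference(double a, double b) {
        double d = Math.abs(a - b) % 360.0;
        return d > 180.0 ? 360.0 - d : d;
    }

    // Distance between two torsion vectors of equal length, combining the
    // circular differences as a Euclidean norm (an illustrative choice).
    static double torsionDistance(double[] x, double[] y) {
        if (x.length != y.length) {
            throw new IllegalArgumentException("torsion vectors differ in length");
        }
        double sum = 0.0;
        for (int i = 0; i < x.length; i++) {
            double d = angularDifference(x[i], y[i]);
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    public static void main(String[] args) {
        System.out.println(angularDifference(350.0, 10.0)); // prints 20.0
    }
}
```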

Figure 1.

The dinucleotide step as defined by ntC conformers is drawn in red. The step is described by seven backbone torsions from δ of the first nucleotide to δ1 of the second one, plus two torsion values around the glycosidic bonds of the first and second bases (χ and χ1).

THE DNATCO SERVER

Hardware and software

The previous version of the DNATCO server (available at dnatco.org/v1) was migrated from a home based hardware and is now hosted as a Linux based virtual machine in the environment provided by the ELIXIR CZ infrastructure. This ensures 24/7 availability and professional maintenance as well as easy scaling of the resources if necessary. The presented second version of the server ran about a year internally and had been tested over a year as a publicly available service accessible at the dnatco.org address.

The software part employs the Apache web server and PHP5 for server-side scripting. The internal processing of uploaded PDB-formatted structures is performed using the VMD program (5), extracting only nucleic acid atoms for further analysis. The torsion angle measurement and the assignment of ntC conformers itself are performed by in-house programs written in the Python programming language. The interactive display of analyzed 3D structures relies on JSmol (6), a JavaScript-based molecular viewer running in a browser. The JavaScript allows straightforward transfer of the DNATCO web service to various platforms and devices including mobile devices without a need to install additional applets. The JSmol performance is known to depend on the browser version and the computer operating system used. The complete web service was successfully tested in the major web browser programs under Linux, OS X and Windows, with Firefox currently having the best performance in the JSmol part.

The home page

The home page (snapshot in Supplementary Figure S1 and Table S1) briefly introduces the purpose of the web, defines the dinucleotide step, lists the geometries of the ntC conformers with their brief characterization and provides the tool to upload the structure to be analyzed. The top of the home page contains links to the tutorial section that describes the submission process step by step, explains the results, and also contains the link to a test run using the Dickerson-Drew dodecamer of PDB ID 1bna (7). The PDB formatted structure file can be uploaded either from user's disk or by typing a PDB four-letter code and pressing the respective SUBMIT button; the former way is useful for structures generated or modified by the user, the latter for analysis of the released PDB structures.

Names and brief annotations of the 57 ntC DNA conformers are tabulated on the home page together with values of the torsion angles defining their geometry. These are seven torsions defining the backbone conformation of the step from delta of the first nucleotide to delta of the second one, plus two torsion values around the glycosidic bonds of the first and second bases (Figure 1). ntC are identified by four-letter symbols. The first letter aims to characterize the main feature(s) of the first nucleotide, the second letter of the second one. A, B and Z letters imply stacked bases with the first/second nucleotide in the conformation bearing features typical of the A, B or Z DNA forms such as sugar pucker, torsion around the glycosidic bond and combination of the other torsions such as zeta and alpha as they have been described in various treatises, e.g. by Neidle (8). The first two letters ‘NS’ indicate that the bases are Not Stacked in the step. The third and fourth positions of the code are usually formed by numbers, which just guarantee the uniqueness of the ntC symbol; ‘S’ at either of these positions means that the first or second base is in the syn orientation. The nomenclature of ntC classes can best be understood from Supplementary Table S1 or a downloadable table at the dnatco.org home page, where the main structural features of each ntC are briefly annotated. A specific ‘conformational class’, NANT, was reserved for conformationally extreme steps that are not assigned to any of the above ntC classes; NANT formally represents the 58th conformer. Both torsional definitions of ntC and Cartesian coordinates of dinucleotides representing their structures can also be downloaded from dnatco.org.

Input and output

The input is a crystal, NMR or computer model structure containing DNA in the standard PDB format. DNA steps are identified based on atom names as defined by the PDB format, version 3.1 or above (sugar atoms as O4' not O4*, standard nucleotides DA, DG, DC, DT). If the PDB file contains multiple structures (NMR models or MD simulation snapshots) as MODELs, the currently available version of dnatco.org analyzes only the last MODEL. Conformer classes are also assigned for modified residues if they contain standard names for atoms defining the step torsions between δ of the first deoxyribose to δ+1 of the second one, and glycosidic torsions χ and χ+1; on the other hand, steps with non-standard or missing atoms that define these torsions cannot currently be considered in the assignment process.

The output of the assignment process is a comma separated summary of the assigned ntC and ntA classes. The standard CSV file contains the step ID, its assigned ntC and ntA, its nine torsion angles, and angular distances from the ntC averages. During the testing phase, we have analyzed over 1800 DNA structures, mostly experimental crystal and NMR structures from PDB.

When the structure is identified by its four letter PDB code, we compare the version stored on our server with the most recent version at the PDB website. If these two versions are identical, we use the pre-calculated results to speed up the analysis; otherwise, the full assignment process is performed. In either case, the results are displayed within a few minutes after the upload at the latest. An example of the result page can be obtained from the Tutorial section or simply by running the ‘test run’ on the website. The structures uploaded by users and their assignment results are protected by adding a hash value to the file names. The ntC assignment of structures deposited in the PDB database is accessible via PDB four-letter codes.

Results page

The results page (snapshot in Figure 2) is divided into three columns. The central column contains a table summarizing the results of the assignment of ntC classes. Each row of the table represents one complete step. The step name is displayed as PDBid_chain_base1_base2 and is followed by the corresponding ntA and ntC codes. The table is interactive; mouse over a row shows the detailed description of the assigned step with the ntC and ntA classes as well as values of the backbone torsion angles. Further analysis of a step can be obtained by clicking inside the table. The left panel contains an interactive 3D view of the analyzed DNA structure in the JSmol applet. The DNA structure is shown as a cartoon with the selected dinucleotide step highlighted in a ball and stick representation. At the same time, the right part of the page summarizes the results of the dinucleotide step assignment in graphical representation. The black line connects the torsion values of the selected step. For unassigned steps (ntC NANT), the chart contains only the line, while more information is shown for the assigned conformers: a violin plot summarizes the distribution of torsions for the assigned ntC in the golden set. Inside each violin plot, a thick black bar indicates the 1st and 3rd quartile of the golden set data with a white spot showing the mean value. The ‘error bars’ outside each violin plot depict the angular range still fulfilling the assignment algorithm criteria. The circularity of the torsion angles is taken into account, and for conformers with mean values approaching 0 or 360 degrees, the ‘error bar’ appears near 360 or 0 degrees, respectively. This diagram is a simple visual measure of the quality of the assignment, the table below the diagram is its numerical representation with the torsion values of the step, assigned ntC, and their differences. 
Further, a mouse over the thumbnail in the upper left corner of the chart will zoom in the figure with definitions of the backbone torsion angles. The results of the assignment in CSV format, representative Cartesian coordinates of the conformers, and the table of conformers defining the ntC and ntA classes can be downloaded using links at the bottom of the results page.

Figure 2.

Snapshot of the results page using the Dickerson–Drew dodecamer of PDB code 1bna as an example. The central column contains a table summarizing the assignment of ntC and ntA classes and links to downloadable files. Each row of the table represents one complete step listing the step name as PDBid_chain_base1_base2, ntA and ntC codes. The left panel contains an interactive 3D view of the analyzed DNA structure in the JSmol applet; the analyzed step is highlighted. The right part of the page summarizes the results of the dinucleotide step assignment in graphical representation and provides a table of torsion values for the analyzed step, the assigned ntC and their differences.

CONCLUSIONS

The website dnatco.org provides fine-grained classification of the DNA local structure based on the geometry of its backbone. Analysis of the nine dinucleotide torsions results in assignment of one of the 58 conformers called ntC, one of which is set aside for dinucleotides with exotic or as yet uncharacterized conformations. Based on the results of the ntC analysis, the web also provides a coarse-grained characterization of the structure by grouping ntC into a unique DNA structural alphabet ntA. The web service represents a valuable tool for DNA structure analysis and validation during the refinement of crystal and NMR structures, of structures deposited to PDB, as well as for analysis of results of molecular modeling and simulations by molecular dynamics.

The website is one of the services supported by the Czech ELIXIR node (elixir-czech.cz) with full financial coverage until the year 2019 guaranteed by the Czech national funding. Further sustainability and maintenance of the web service and the website for additional three years (until the end of year 2022) is likely to be obtained within the framework of European Infrastructural projects ESFRI ELIXIR from Czech national or pan-European funds. The Institute of Biotechnology CAS, employer of the authors of the website, provides additional long-term support. Further development of the service, namely its extension to RNA and more flexible treatment of modified nucleotides can therefore be realistically envisioned.

FUNDING

Czech National Infrastructure for Biological Data (ELIXIR CZ) [LM2015047]; ERDF [BIOCEV CZ.1.05/1.1.00/02.0109]; Institutional Research Project of the Institute of Biotechnology [RVO 86652036]. Funding for openaccess charge: Czech National Infrastructure for Biological Data (ELIXIR CZ) [LM2015047]; Institutional Research Project of the Institute of Biotechnology [RVO 86652036].

Conflict of interest statement. None declared.

REFERENCES

2. 3DNA: a versatile, integrated software system for the analysis, rebuilding and visualization of three-dimensional nucleic-acid structures. Nat. Protoc. 2008; 3: 1213–1227.

4.

6. JSmol and the next-generation web-based representation of 3D molecular structure as applied to proteopedia. Isr. J. Chem. 2013; 53: 207–216.

7. Structure of a B-DNA dodecamer: conformation and dynamics. Proc. Natl. Acad. Sci. U.S.A. 1981; 78: 2179–2183.

8. Principles of Nucleic Acid Structure. Cambridge: Academic Press, Elsevier; 2008. 1–289.

© The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.

This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
