Fighting Amyotrophic Lateral Sclerosis: In-Silico Molecular Docking Study of SFPQ Protein to Identify a Potential Therapeutic Compound
Omkar Kovvali
Thomas Jefferson High School for Science and Technology
This paper was originally included in the 2021 print publication of the Teknos Science Journal.
Abstract
Amyotrophic Lateral Sclerosis (ALS) is a fatal neurodegenerative disease that strikes fifteen new people each day and progressively paralyzes patients until they cannot move even their fingers. The paralysis is caused by the death of motor neurons, which impairs normal muscle function. After scientists studied motor neuron death in ALS patients’ cells, they found that in all diseased cells, the SFPQ protein, a vital nuclear DNA/RNA binding protein, had been displaced from the nucleus to the cytoplasm. Since this defect is a molecular hallmark of ALS, correcting it could stop motor neuron death. To treat this molecular hallmark, I used the in-silico process to sift through thousands of potential molecules and find one able to resolve this issue. Using the in-silico method, I identified a lead molecule (NLS 551) that passed all the in-silico tests as a candidate for bringing the SFPQ protein back into the nucleus, suggesting a potential treatment for ALS. My study presents a novel method to bring the SFPQ protein back into the nucleus. If successful in later testing, this molecule could serve as the base for an ALS treatment.
Introduction
Amyotrophic Lateral Sclerosis (ALS) is a brutal disease that affects fifteen new people every day, leaving them just three to five years to live. During those years, ALS slowly cripples patients to the point where they struggle to breathe. After scientists from the Crick Institute (Luisier et al., 2018) studied ALS patients’ cells, they found that in all diseased cells, the SFPQ protein, a vital DNA/RNA binding protein, had been displaced from the nucleus to the cytoplasm. This phenomenon was present in cases of both familial and sporadic ALS.
I sought ways to address this newly found molecular hallmark of ALS. After detailed research, I decided to attempt to return the SFPQ protein to its original location, the nucleus. To implement this, I had two options: either implant new SFPQ proteins in the nucleus or find a way to move the existing proteins back in. I chose the latter, because implanting new SFPQ proteins would not truly solve the problem: the outward flow of proteins would continue. To move the existing proteins back in, I researched molecules with the capacity to bring proteins into the nucleus. After researching a class of peptides called signal peptides, I found a ligand that could do the job: a Nuclear Localization Sequence. A Nuclear Localization Sequence (NLS) is a short peptide sequence that acts as an “address label” for a protein by directing it into the nucleus. Using an NLS would provide a method to bring the SFPQ protein back into the nucleus.
First, the NLS is introduced into the cell through an orally active drug or injection. Then, through diffusion and docking, the NLS binds to the SFPQ protein, tagging it for nuclear import. Next, a free-floating helper protein in the cytoplasm transports the NLS-docked SFPQ protein to the edge of the nucleus. Here, importin α acts as an adaptor protein, binding to the NLS on the complex and bringing it into the nucleus. Though an NLS should work in principle, it is still unknown which NLS will perform the task most efficiently. To pinpoint the best one, I explored the NLSdb, a database of more than 3,000 nuclear signals. To filter through the candidates, I followed the in-silico process to identify the NLS that best fits the SFPQ protein. This process is the first of three major processes that make up drug discovery before clinical trials. The in-silico method consists of six steps that conclude in the declaration of a lead molecule: the one molecule judged best out of the given set.
In this study, I used Nuclear Localization Signals to return the SFPQ protein to the nucleus. However, since not every NLS is equally capable of performing this task, I utilized the in-silico process to filter through my choices and prove a single NLS as the best match for the SFPQ protein.
Methods
Software/Databases Used
RCSB PDB Database
I used this database to retrieve the 6ncq representation of the SFPQ protein. I chose this model because it was an isolated protein and was free of any paraspeckle components.
NLSdb
It is unknown which NLS is optimal, so I used the NLSdb database to obtain a workable set of NLSs to experiment with. I used the NLSdb because it was the only comprehensive and publicly available database that compiled all NLSs.
PepSMI
I used this tool to convert raw peptide sequences into the SMILES format. This conversion was necessary because it ensured software compatibility. I did this for each NLS’s sequence, resulting in 3,255 SMILES strings.
CLC Drug Discovery Workbench
I used this workbench for Lipinski's Rule of Five, initial screening, and molecular docking. Since CLC Bio ceased production of this workbench, the software is only available in trial mode.
admetSAR 2.0
I used this online tool to assist in verifying the ADMET properties of passed ligands.
NLS/SFPQ Protein Handling
First, I entered the NLSdb and downloaded a CSV file of all 3,255 signals in the database. Next, I used the PepSMI tool, which assumed linear configuration and converted each NLS sequence into a SMILES string. Then, I copied each string into the CLC Workbench, which generated the 2D molecular structure in the workbench and the relevant statistics, such as size and weight. After these steps, all 3,255 NLSs were ready for testing.
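The sequence-to-SMILES conversion can be illustrated with a minimal sketch. This is not PepSMI's implementation: the `RESIDUE_SMILES` table and the `peptide_to_smiles` function are hypothetical names introduced here, the table covers only a handful of residues, and it assumes a linear peptide of L-amino acids with free termini. The isoleucine fragment is copied from the SMILES string reported for NLS 551 in the Findings section, which reads as the tetrapeptide Gly-Ile-Gly-Ser.

```python
# Illustrative subset of residue fragments (backbone plus side chain).
# Assumption: linear peptide, L-amino acids, free N- and C-termini.
RESIDUE_SMILES = {
    "G": "NCC(=O)",
    "A": "N[C@@H](C)C(=O)",
    "S": "N[C@@H](CO)C(=O)",
    "I": "N[C@@H]([C@H](CC)C)C(=O)",
    "K": "N[C@@H](CCCCN)C(=O)",
    "R": "N[C@@H](CCCNC(=N)N)C(=O)",
}


def peptide_to_smiles(sequence: str) -> str:
    """Concatenate residue fragments into a linear-peptide SMILES string."""
    if not sequence:
        raise ValueError("sequence must not be empty")
    fragments = []
    for residue in sequence.upper():
        if residue not in RESIDUE_SMILES:
            raise KeyError(f"residue {residue!r} is not in this illustrative table")
        fragments.append(RESIDUE_SMILES[residue])
    # The trailing "O" completes the C-terminal carboxylic acid.
    return "".join(fragments) + "O"


if __name__ == "__main__":
    print(peptide_to_smiles("GIGS"))
```

Under these assumptions, `peptide_to_smiles("GIGS")` reproduces the NLS 551 string character for character, which is a quick way to sanity-check one row of a batch conversion.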
After downloading the 6ncq representation of the SFPQ protein from the RCSB PDB Database, I imported it into the CLC Workbench.
Initial Screening
In my initial screening, I removed NLSs that were certain to fail, saving time and energy in later phases of testing. I performed this screening based on multiple factors, such as size and structural deformations. If the raw peptide sequence was longer than ten residues, the NLS was removed from testing, since such a large NLS would automatically fail Lipinski’s Rule of Five.
After this step, 650 NLSs were left, about 20% of the initial 3,255 NLSs from the NLSdb database.
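The length cutoff can be sketched in a few lines. This covers only the size criterion, not the check for structural deformations, and the example sequences and the names `passes_initial_screen` and `MAX_RESIDUES` are illustrative assumptions rather than entries or tools from the study.

```python
MAX_RESIDUES = 10  # length cutoff described in the text


def passes_initial_screen(sequence: str, max_residues: int = MAX_RESIDUES) -> bool:
    """Keep only non-empty peptide sequences at or under the length cutoff."""
    return 0 < len(sequence) <= max_residues


# Hypothetical candidates, chosen only to show both outcomes.
candidates = ["GIGS", "PKKKRKV", "KRPAATKKAGQAKKKK"]
kept = [seq for seq in candidates if passes_initial_screen(seq)]

# Fraction retained in the study, from the counts reported above.
retained_fraction = 650 / 3255

if __name__ == "__main__":
    print(kept)
    print(f"{retained_fraction:.1%}")
```

The retained fraction computed from the reported counts comes out just under 20%, in line with the figure quoted above.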
Lipinski’s Rule of Five
To narrow down my pool of NLSs, I applied Lipinski’s Rule of Five, a rule of thumb for evaluating the potential of a chemical compound to act as a drug. Lipinski’s Rule of Five states that a given molecule must have no more than five hydrogen bond donors, no more than ten hydrogen bond acceptors, a molecular mass less than 500 daltons, and an octanol-water partition coefficient that does not exceed five. Conventionally, a molecule passes if it meets at least three of these four requirements; for this screen, I retained molecules with two violations or fewer. I placed all 650 NLSs that passed initial screening into a molecule project in the CLC Workbench and ran the Rule of Five software. After running the software for three to four hours, I found that 26 molecules passed Lipinski’s Rule of Five with two violations or fewer.
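The four criteria translate directly into a violation count. The sketch below is not the CLC Workbench's implementation; the `Descriptors` class, the function names, and all descriptor values are hypothetical, and the default cutoff of two violations mirrors the threshold reported for this screen.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Descriptors:
    h_bond_donors: int
    h_bond_acceptors: int
    molecular_weight: float  # daltons
    log_p: float  # octanol-water partition coefficient


def lipinski_violations(d: Descriptors) -> int:
    """Count how many of the four Rule of Five criteria a molecule breaks."""
    checks = [
        d.h_bond_donors > 5,
        d.h_bond_acceptors > 10,
        d.molecular_weight >= 500,
        d.log_p > 5,
    ]
    return sum(checks)


def passes_rule_of_five(d: Descriptors, max_violations: int = 2) -> bool:
    """Apply the screening cutoff (two violations or fewer by default)."""
    return lipinski_violations(d) <= max_violations


# Hypothetical descriptor sets, for illustration only.
small = Descriptors(3, 6, 350.0, 1.2)
borderline = Descriptors(6, 11, 480.0, 2.0)
large = Descriptors(12, 15, 1013.6, 6.0)

if __name__ == "__main__":
    for d in (small, borderline, large):
        print(lipinski_violations(d), passes_rule_of_five(d))
```

Passing `max_violations=1` instead gives the stricter conventional reading of the rule.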
Docking Simulation
Afterwards, I put the 26 molecules that passed Rule of Five testing through a docking simulation in the CLC Workbench to see how well the NLSs would bind to the SFPQ protein. As inputs, I gave the workbench the 6ncq representation of the SFPQ protein and the 26 NLSs. Given these, the workbench identified the optimal docking site on the SFPQ protein and ran the docking simulator to determine how well each NLS would bind to the SFPQ protein. This docking simulator returned a PLANTS PLP score for each NLS, calculated by the equation $\text{score} = S_{\text{target-ligand}} + S_{\text{ligand}}$. This scoring system rewarded hydrogen bond, lone-pair, and nonpolar interactions. Conversely, it penalized nonpolar-polar interactions, hydrogen bond donor-donor contacts, and hydrogen bond acceptor-acceptor contacts. Since this final output score mimics the potential energy change between the protein and ligand, a highly negative score corresponds to strong binding, while a positive score corresponds to weaker binding. Looking at [Table 2], we see that all 26 NLSs had excellent binding affinities for the SFPQ protein.
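The sign convention of the docking score is easy to get backwards, so a short sketch may help. It simply adds the two score terms and ranks ligands from most negative (strongest predicted binding) to most positive. The ligand names, the term values, and the function names are hypothetical; they are not scores from [Table 2] and not the workbench's API.

```python
def total_score(s_target_ligand: float, s_ligand: float) -> float:
    """Combine the target-ligand term and the ligand-internal term."""
    return s_target_ligand + s_ligand


def rank_by_docking_score(scores: dict[str, float]) -> list[tuple[str, float]]:
    """Sort ligands so the most negative (best) score comes first."""
    return sorted(scores.items(), key=lambda item: item[1])


# Hypothetical score terms, for illustration only.
scores = {
    "ligand_a": total_score(-50.0, 5.0),
    "ligand_b": total_score(-65.0, 3.5),
    "ligand_c": total_score(1.0, 2.5),
}
ranked = rank_by_docking_score(scores)

if __name__ == "__main__":
    for name, score in ranked:
        print(f"{name}: {score:.1f}")
```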
ADMET Verification
Taking the 26 NLSs, which all passed the docking simulation, I went on to conduct ADMET verification on all the molecules. ADMET, which stands for Absorption, Distribution, Metabolism, Excretion, and Toxicity, is a way to assess the pharmacokinetics and safety of a molecule. I entered the SMILES strings of all 26 molecules into the admetSAR 2.0 web tool. Here, I tested for crucial properties: human intestinal absorption, Caco-2 permeability, blood–brain barrier penetration, P-glycoprotein inhibitor/substrate status, carcinogenicity (binary/ternary), Ames mutagenesis, human ether-a-go-go-related gene (hERG) inhibition, and acute oral toxicity. These properties were hand-picked for their relevance to NLSs; for example, I picked the blood–brain barrier (BBB) because the drug's targets are neurons. After testing each of the 26 NLSs for all eight properties, only two NLSs (NLSs 551 and 544) passed all eight with positive or neutral results (four properties are shown in [Table 2]).
Lead Declaration
Out of the two NLSs that passed ADMET verification (551 & 544), I chose NLS 551 to be my lead molecule, because it had a stronger assurance of Human Intestinal Absorption (.9419) when compared to NLS 544 (.4308).
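The final two steps, filtering on ADMET results and then breaking the tie on Human Intestinal Absorption, can be sketched as follows. The HIA values for NLS 551 and NLS 544 are the ones reported above; the third entry, the dictionary layout, and the `select_lead` function are hypothetical and only illustrate the logic.

```python
# passes_all: True if all eight ADMET properties were positive or neutral.
# hia: predicted Human Intestinal Absorption value.
admet_results = {
    "NLS 551": {"passes_all": True, "hia": 0.9419},
    "NLS 544": {"passes_all": True, "hia": 0.4308},
    # Hypothetical failing candidate, included only to exercise the filter.
    "NLS X": {"passes_all": False, "hia": 0.9900},
}


def select_lead(results: dict[str, dict]) -> str:
    """Return the ADMET-passing candidate with the highest HIA value."""
    passers = {name: r for name, r in results.items() if r["passes_all"]}
    if not passers:
        raise ValueError("no candidate passed ADMET verification")
    return max(passers, key=lambda name: passers[name]["hia"])


if __name__ == "__main__":
    print(select_lead(admet_results))
```

Filtering before ranking matters here: the hypothetical candidate has the highest HIA value but is excluded because it fails ADMET verification.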
Results
Findings
The central idea of my project was to utilize an NLS to bring a protein back into the nucleus. I found that NLS 551, represented by the SMILES string NCC(=O)N[C@@H]([C@H](CC)C)C(=O)NCC(=O)N[C@@H](CO)C(=O)O, was the optimal NLS out of the 3,255 NLSs in the NLSdb database. I came to this conclusion by using the in-silico process and following its steps of screening. My results also suggest that an NLS can be used by researchers as a ligand to bind to the SFPQ protein, with the aim of inducing a favorable response (re-entry into the nucleus).
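As a consistency check, the molecular weight of NLS 551 can be recomputed from its composition. The SMILES string above reads as the tetrapeptide Gly-Ile-Gly-Ser; the sketch below sums the elemental formulas of those amino acids, removes one water per peptide bond, and applies standard average atomic masses. The table covers only the three residue types needed here, and the function names are hypothetical.

```python
from collections import Counter

# Standard average atomic masses (g/mol), rounded to three decimals.
ATOMIC_MASS = {"C": 12.011, "H": 1.008, "N": 14.007, "O": 15.999}

# Elemental formulas of the free amino acids (illustrative subset).
AMINO_ACID_FORMULA = {
    "G": {"C": 2, "H": 5, "N": 1, "O": 2},
    "I": {"C": 6, "H": 13, "N": 1, "O": 2},
    "S": {"C": 3, "H": 7, "N": 1, "O": 3},
}
WATER = {"H": 2, "O": 1}


def peptide_formula(sequence: str) -> dict[str, int]:
    """Sum residue formulas and remove one water per peptide bond."""
    total = Counter()
    for residue in sequence:
        total.update(AMINO_ACID_FORMULA[residue])
    peptide_bonds = len(sequence) - 1
    for element, count in WATER.items():
        total[element] -= count * peptide_bonds
    return dict(total)


def molecular_weight(formula: dict[str, int]) -> float:
    """Average molecular weight in g/mol for an elemental formula."""
    return sum(ATOMIC_MASS[element] * n for element, n in formula.items())


if __name__ == "__main__":
    formula = peptide_formula("GIGS")
    print(formula, round(molecular_weight(formula), 3))
```

The resulting formula is C13H24N4O6, and its weight agrees with the 332.357 g/mol reported for NLS 551 in the Process section.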
Secondary Findings
An interesting finding in this study was that the binding scores of all 26 NLSs were exceptionally strong. This is most likely because the purpose of an NLS is to bind to proteins effectively, so an NLS lacking this basic ability would not function. I also uncovered a correlation between the number of Lipinski violations for an NLS and how well it performed in ADMET testing. While molecules with two or more Lipinski violations performed poorly in ADMET testing, NLSs with just one violation performed much better. NLSs 551 and 544 each had one Lipinski violation and had excellent ADMET results. On the other hand, NLSs with more violations did not do well in ADMET testing.
Process
NLS 551 emerged as the lead molecule in this study due to its excellent molecular properties. A key factor in NLS 551’s success was its molecular weight: 332.357 g/mol. Calculating a z-score, we see that NLS 551’s weight is 1.4207 standard deviations below the mean (1013.597 g/mol). Using a z-score table and assuming normality, I found that the probability of a weight equal to or lower than NLS 551’s is about 7.8%. This tells us that NLS 551’s molecular weight is much lower than normal, making it a prime candidate for use in drugs. The numbers also agree with the graph [Figure 6], where its molecular weight sits on the low side. Furthermore, NLS 551 had the lowest number of Lipinski violations in this study, with only one. Out of all 650 NLSs that passed initial screening, only three (NLSs 62, 544, and 551) had one violation or fewer. NLS 551 performed highly in ADMET testing as well. The Human Intestinal Absorption (HIA) property was the decisive factor in choosing between NLS 551 and NLS 544 as my lead molecule. Observing the HIA graph, I noticed that it followed a roughly normal distribution, with NLS 551’s score (.9419) being an outlier. Calculating a z-score, I found that .9419 is 2.22 standard deviations above the mean. The probability of getting a value greater than .9419 came out to be 1.28%. With these strong statistical results, I am confident that NLS 551 is the best option in the NLSdb database for the SFPQ protein.
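The tail probabilities quoted above can be reproduced without a z-score table. The sketch below assumes, as the text does, that the values are normally distributed, and uses the standard normal CDF written in terms of the complementary error function. The z-scores, mean, and weight are taken from the text; the standard deviation is not reported, so `implied_sd` is only what those numbers imply.

```python
import math


def normal_cdf(z: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * math.erfc(-z / math.sqrt(2))


# Molecular weight: values reported in the text.
mean_mw = 1013.597
nls_551_mw = 332.357
z_mw = -1.4207

# Standard deviation implied by the reported z-score (not stated in the text).
implied_sd = (nls_551_mw - mean_mw) / z_mw

# Probability of a weight at or below NLS 551's (lower tail).
p_weight_or_lower = normal_cdf(z_mw)

# Probability of an HIA value above NLS 551's (upper tail), using the rounded z.
z_hia = 2.22
p_hia_or_higher = 1.0 - normal_cdf(z_hia)

if __name__ == "__main__":
    print(f"implied SD: {implied_sd:.1f}")
    print(f"P(weight <= 332.357): {p_weight_or_lower:.4f}")
    print(f"P(HIA > .9419): {p_hia_or_higher:.4f}")
```

The lower tail comes out at roughly 7.8%, matching the text. With the rounded z-score of 2.22 the upper tail is roughly 1.3%, close to the reported 1.28%; the small gap is consistent with the original calculation having used an unrounded z-score slightly above 2.22.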
Discussion
My results in this study exceeded my expectations. Not only was I able to declare a lead molecule, but I also had a choice between two NLSs. My results address the loss of the SFPQ protein from the nucleus, the problem that researchers posed, by proposing a novel application for a tested method. My research question asked for a specific molecule to bring the SFPQ protein back into the nucleus. After in-silico testing, I am confident that NLS 551 is the optimal NLS for the SFPQ protein.
Though the idea of applying an NLS for nuclear import of the SFPQ protein is new, the idea of using an NLS to transport proteins to the nucleus in general has been tested and shown to work. Research from Bourgeois et al. (2020, p. 8504) showed that this type of nuclear import is possible. They found that both Transportin-1 and Transportin-3 recognize two nonclassical NLSs within the cold-inducible RNA-binding protein CIRBP. Just like the SFPQ protein, CIRBP is an RNA-binding protein that plays an important role in gene expression and post-transcriptional processes such as splicing regulation.
What makes my study unique is the fact that no other research of this type has been done in the ALS treatment field. The current treatments are made up of two primary drugs: riluzole and edaravone. Riluzole works as a glutamate blocker by preventing the release of glutamate from nerve cells. This slows the progression of the disease by a couple of months but does not actually stop it. Edaravone functions by reducing oxidative stress, but just like riluzole, it does not actually treat or cure ALS. By targeting the loss of the SFPQ protein from the nucleus, a molecular hallmark of ALS, my treatment has the potential to stop ALS.
However, my method does have a few weaknesses and drawbacks. Since the in-silico process is computer-based, it is not infallible. Prediction accuracy rates range anywhere from 50% all the way to 100%. The in-silico method is an effective tool to identify a testable molecule, but it is not absolute. To mitigate this, I ran each screening tool multiple times, with the exception of the docking simulation. Another key drawback of the in-silico process is that testing can take long periods of time and be highly CPU intensive. Though I used a powerful Intel Core i7 processor, my laptop ran for 76 consecutive hours to complete the cumulative docking simulation. This proved to be a major hindrance.
Even if everything in the in-silico process is valid, my key theory may be incorrect to start with. My central idea in this study was that an NLS could be used to bind to the SFPQ protein and bring it into the nucleus. The entire premise of my research would be defeated if I found in later research that the SFPQ protein cannot accommodate ligands. This is a key issue, as the SFPQ protein’s N-terminus, the most common place for NLSs to dock, is blocked, and having the NLS dock on the side of the SFPQ protein may produce no effect. The helper protein that is supposed to recognize the NLS-bound SFPQ protein may not even recognize the NLS.
In the best case scenario that the NLS is able to bind to the SFPQ protein, there may be another problem. There is no guarantee that the NLS will dock onto the SFPQ protein first instead of attaching to other cytoplasmic proteins when the NLS solution is inserted into cells. Although binding scores for the NLS-SFPQ are excellent, there could be another protein that binds to an NLS before the SFPQ does. I could remedy this in future testing by using selective docking.
Assuming all goes well and the SFPQ protein returns to the nucleus, there is still no guarantee that my proposal will work. The loss of the SFPQ protein from the nucleus may simply be a symptom that is masking the true cause of the disease. A different problem in these diseased cells may be causing the nuclear export of the SFPQ protein.
In this study, I strove simply to correct the problem posed by researchers, but I did not truly fix the genetic defect that causes the SFPQ protein to leave the nucleus. I am proposing an artificial solution here that could work but is ultimately a surface-level solution. Solving this molecular hallmark at its core would entail finding the genetic defect that caused the SFPQ protein to leave the nucleus and correcting it by means of gene editing.
Now that I have a lead molecule, I plan to perform future research in the form of in-vitro and in-vivo testing, including animal studies, to test the effectiveness of NLS 551 in preventing the departure of the SFPQ protein from the nucleus. If NLS 551 is able to prevent the loss of the SFPQ protein, then I can move on to judge how well this method counteracts motor neuron death.
Conclusion
ALS is an overlooked disease. Thousands of people are newly diagnosed each year, people who have no hope of recovering from the brink. With other diseases, there may still be a chance, but with ALS, it is a guaranteed fate. My proposed treatment attempts to change this by using NLSs to treat the loss of the SFPQ protein from the nucleus. This type of work has never been done before, and if it is successful in future phases of testing, it could prove to be the dawn of a new era, an improved world of bioinformatics where NLSs are commonly used to treat diseases.
References
[1] Bourgeois, B., Hutten, S., Gottschalk, B., Hofweber, M., Richter, G., Sternat, J., Abou-Ajram, C., Göbl, C., Leitinger, G., Graier, W. F., Dormann, D., & Madl, T. (2020). Nonclassical nuclear localization signals mediate nuclear import of CIRBP. Proceedings of the National Academy of Sciences, 117(15), 8503–8514. https://doi.org/10.1073/pnas.1918944117
[2] Luisier, R., Tyzack, G. E., Hall, C. E., Mitchell, J. S., Devine, H., Taha, D. M., Malik, B., Meyer, I., Greensmith, L., Newcombe, J., Ule, J., Luscombe, N. M., & Patani, R. (2018). Intron retention and nuclear loss of SFPQ are molecular hallmarks of ALS. Nature Communications, 9(1), Article 2010. https://doi.org/10.1038/s41467-018-04373-8
[3] QIAGEN. (2017). CLC Drug Discovery Workbench (Version 4.0) [Computer software]. Retrieved December 28, 2020, from http://resources.qiagenbioinformatics.com/manuals/clcdrugdiscoveryworkbench/current/index.php?manual=Introduction_CLC_Drug_Discovery_Workbench.html
[4] O'Boyle, N. M., Banck, M., James, C. A., Morley, C., Vandermeersch, T., & Hutchison, G. R. (2011). Open Babel: An open chemical toolbox. Journal of Cheminformatics, 3, 33. https://doi.org/10.1186/1758-2946-3-33
[5] Rostlab. (n.d.). NLSdb. Retrieved December 28, 2020, from https://rostlab.org/services/nlsdb/browse/signals
[6] Yang, H., Lou, C., Sun, L., Li, J., Cai, Y., Wang, Z., Li, W., Liu, G., & Tang, Y. (2018). admetSAR 2.0: Web-service for prediction and optimization of chemical ADMET properties. Bioinformatics, bty707.
[7] PepSMI: Convert Peptide to SMILES string. (n.d.). Retrieved December 28, 2020, from https://www.novoprolabs.com/tools/convert-peptide-to-smiles-string
[8] Lee, M. (2019, June 26). RCSB PDB - 6NCQ: The dimerization domain of human SFPQ in space group C2221. RCSB PDB Database. https://www.rcsb.org/structure/6NCQ