Bioinformatics for Systems Biology
Summary
With the completion of the human genome project, followed by the rise in high-throughput technologies like the various microarray and now high-throughput genomic sequencing platforms, we experienced the birth of Systems Biology after its long gestation. This revolution is marked by a change in the research paradigm from the single small-scale experiment, i.e., following the change of a component in a multicomponent system, to one that attempts to simultaneously monitor the change of tens of thousands of molecules within this body. This clearly necessitates the unparalleled use of project-specific informatic tools, which, to date, requires an unprecedented level of development to collect, manage and mine the data for interesting associations.

To begin to understand this information we now rely on statistical analysis to aid in our selection of the fruit from the tree. However, this often takes us on a journey into a new field for which we are not yet prepared. Samuel Johnson (1709–1784) foreshadowed the dilemma we would face and characterized it as follows: "Knowledge is of two kinds. We know a subject ourselves, or we know where we can find information on it." It is for the latter that we routinely turn to the literature. The rate of growth of the literature parallels that of sequencing data and the array data, placing an almost impossible task before each investigator. To partially ease this burden we are again turning towards developing informatic aids that mine the literature and data to develop summaries and associations to directly address the questions posed and the new hypotheses that are to be tested.

Although more clearly articulated, we again face similar challenges as those tackled during the course of the human genome project. It is essential that the training of the biologist and computer scientist occur in an interdisciplinary environment of cross-fertilization. With this goal in mind the textbook "Bioinformatics for Systems Biology" was undertaken.

We begin this exploration with Part I, to provide the computer scientist with an introduction to the underlying principles of cell biology. This is followed by a brief introduction in Part II as a means for the biologist to become familiar with concepts and the statistical analysis of large datasets.
Part III then describes, to date, the best characterized use of the microarray platform that is now moving towards whole genome analysis. With all of this data, how do we begin analysis for common elements guiding the underlying principles? This is discussed in Part IV which leads to Part V and Part VI to test, in silico, the relationships on a wide scale in order to assess their applicability. Upon developing the associations, Part VII asks how does this information relate to what was measured? As these basic principles are developed from an "omics" driven biological systems approach, they are applied in Part VIII to translational medicine. An excellent example is the new term "personalized medicine" that is beginning to reverberate in clinical care. It is the culmination of the Systems Biology revolution where technological advances and cross-fertilization have driven the field to mature to the point where it is being incorporated in a true bench-to-bedside manner.

As you read the chapters, you will find that they can stand alone, yet can be combined to emphasize the integral role of informatics in Systems Biology. Most of the figures and tables are in grey scale. I would encourage you to view those that benefit from color on the accompanying CD. The material contained on the CD provides an excellent source of slides for your lectures and presentations.

The chapter-related Glossary and Abbreviations section will assist in familiarizing you with the terms. You will also find the literature and suggested reading sections, including key references, very useful as you delve into the subject matter. Technology, by its very meaning, implies refinement and change. The informatics approaches used in systems biology are continually subject to refinement. With this reality, you are encouraged to utilize the website information provided in various chapters to help access the most current information and resources available. As Systems Biology develops we are able to witness growing pains and milestones. With continued informatic and biological cross-fertilization, advancements in Systems Biology will revolutionize personalized medicine, answering questions by integrating information in unexpected ways.

Contents

Part I Life of a Cell and Its Analysis

1 Structure and Function of the Nucleus and Cell Organelles
  Jon Holy and Ed Perkins
2 Transcription and the Control of Gene Expression
  Nadine Wiper-Bergeron and Ilona S. Skerjanc
3 RNA Processing and Translation
  Christina Karamboulas, Nadine Wiper-Bergeron, and Ilona S. Skerjanc
4 DNA Replication, Recombination, and Repair
  Linda B. Bloom
5 Cell Signaling
  Daniel A. Rappolee and D. Randall Armant
6 Epigenetics of Spermiogenesis – Combining In Silico and Proteomic Approaches in the Mouse Model
  Sophie Rousseaux and Myriam Ferro
7 Genomic Tools for Analyzing Transcriptional Regulatory Networks
  John J. Wyrick

Part II Statistical Tools and Their Application

8 Probability and Hypothesis Testing
  Michael L. Kruger
9 Stochastic Models for Biological Patterns
  Gautam B. Singh
10 Population Genetics
  Jill S. Barnholtz-Sloan and Hemant K. Tiwari
11 Statistical Tools for Gene Expression Analysis and Systems Biology and Related Web Resources
  Chiara Romualdi and Gerolamo Lanfranchi

Part III Transcriptome Analysis

12 What Goes in is What Comes Out: How to Design and Implement a Successful Microarray Experiment
  Jeffrey A. Loeb and Thomas L. Beaumont
13 Tools and Approaches for an End-to-End Expression Array Analysis
  Adrian E. Platts and Stephen A. Krawetz
14 Analysis of Alternative Splicing with Microarrays
  Jingyi Hui, Shivendra Kishore, Amit Khanna, and Stefan Stamm

Part IV Structural and Functional Sequence Analysis

15 An Introduction to Multiple Sequence Alignment — and the T-Coffee Shop. Beyond Just Aligning Sequences: How Good can you Make your Alignment, and so What?
  Steven M. Thompson
16 A Spectrum of Phylogenetic-Based Approaches for Predicting Protein Functional Sites
  Dukka Bahadur K. C. and Dennis R. Livesay
17 The Role of Transcription Factor Binding Sites in Promoters and Their In Silico Detection
  Thomas Werner
18 In Silico Discovery of DNA Regulatory Sites and Modules
  Panayiotis V. Benos

Part V Literature Mining for Association and Meaning

19 Mining the Research Literature in Systems Biology
  Keir T. Reavie
20 GoPubMed: Exploring PubMed with Ontological Background Knowledge
  Heiko Dietze, Dimitra Alexopoulou, Michael R.