HOMEWORK: Working with Files

Bioinformatics pipelines often involve manipulating text files. Here, as an example, we are going to parse a very simple FASTA file.

There is a FASTA file in the data folder of this repo:

using JuliaForBioinformatics
repo_path = pathof(JuliaForBioinformatics)
"/home/travis/build/diegozea/JuliaForBioinformatics/src/JuliaForBioinformatics.jl"

You can use joinpath and abspath to construct a path that works on all operating systems:

data_path = abspath(repo_path, "..", "..", "data")
"/home/travis/build/diegozea/JuliaForBioinformatics/data"
fasta_file = joinpath(data_path, "O43521.fasta")
"/home/travis/build/diegozea/JuliaForBioinformatics/data/O43521.fasta"
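The same pattern can be tried on any path. The following is a minimal sketch, assuming a made-up layout: the names project, src, Module.jl and example.fasta are placeholders for illustration and do not exist in this repo.

```julia
# Hypothetical starting point: a file inside a `src` folder.
base_file = joinpath("project", "src", "Module.jl")

# Each ".." drops one trailing component: first the file name, then `src`.
# abspath joins the pieces, makes the result absolute and normalizes it.
data_dir = abspath(base_file, "..", "..", "data")

# joinpath inserts the separator used by the current operating system.
example_file = joinpath(data_dir, "example.fasta")

println(isabspath(example_file))          # the result is an absolute path
println(basename(example_file))           # last component: the file name
println(basename(dirname(example_file)))  # the folder that contains it
```

Because the separators are never written by hand, the same code should produce a valid path on Linux, macOS and Windows.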

You can use open with the do syntax to read or write a file in Julia:

open(fasta_file, "r") do file
    for line in eachline(file)
        println(line)
    end
end
>sp|O43521|B2L11_HUMAN Bcl-2-like protein 11 OS=Homo sapiens OX=9606 GN=BCL2L11 PE=1 SV=1
MAKQPSDVSSECDREGRQLQPAERPPQLRPGAPTSLQTEPQGNPEGNHGGEGDSCPHGSP
QGPLAPPASPGPFATRSPLFIFMRRSSLLSRSSSGYFSFDTDRSPAPMSCDKSTQTPSPP
CQAFNHYLSAMASMRQAEPADMRPEIWIAQELRRIGDEFNAYYARRVFLNNYQAAEDHPR
MVILRLLRYIVRLVWRMH
>sp|O43521-2|B2L11_HUMAN Isoform BimL of Bcl-2-like protein 11 OS=Homo sapiens OX=9606 GN=BCL2L11
MAKQPSDVSSECDREGRQLQPAERPPQLRPGAPTSLQTEPQDRSPAPMSCDKSTQTPSPP
CQAFNHYLSAMASMRQAEPADMRPEIWIAQELRRIGDEFNAYYARRVFLNNYQAAEDHPR
MVILRLLRYIVRLVWRMH
>sp|O43521-3|B2L11_HUMAN Isoform BimS of Bcl-2-like protein 11 OS=Homo sapiens OX=9606 GN=BCL2L11
MAKQPSDVSSECDREGRQLQPAERPPQLRPGAPTSLQTEPQASMRQAEPADMRPEIWIAQ
ELRRIGDEFNAYYARRVFLNNYQAAEDHPRMVILRLLRYIVRLVWRMH
>sp|O43521-4|B2L11_HUMAN Isoform Bim-alpha1 of Bcl-2-like protein 11 OS=Homo sapiens OX=9606 GN=BCL2L11
MAKQPSDVSSECDREGRQLQPAERPPQLRPGAPTSLQTEPQGNPEGNHGGEGDSCPHGSP
QGPLAPPASPGPFATRSPLFIFMRRSSLLSRSSSGYFSFDTDRSPAPMSCDKSTQTPSPP
CQAFNHYLSAMASMRQAEPADMRPEIWIAQELRRIGDEFNAYYARRLEK
>sp|O43521-5|B2L11_HUMAN Isoform Bim-alpha2 of Bcl-2-like protein 11 OS=Homo sapiens OX=9606 GN=BCL2L11
MAKQPSDVSSECDREGRQLQPAERPPQLRPGAPTSLQTEPQDRSPAPMSCDKSTQTPSPP
CQAFNHYLSAMASMRQAEPADMRPEIWIAQELRRIGDEFNAYYARRLEK
>sp|O43521-6|B2L11_HUMAN Isoform Bim-alpha3 of Bcl-2-like protein 11 OS=Homo sapiens OX=9606 GN=BCL2L11
MAKQPSDVSSECDREGRQLQPAERPPQLRPGAPTSLQTEPQASMRQAEPADMRPEIWIAQ
ELRRIGDEFNAYYARRLEK
>sp|O43521-7|B2L11_HUMAN Isoform Bim-alpha4 of Bcl-2-like protein 11 OS=Homo sapiens OX=9606 GN=BCL2L11
MAKQPSDVSSECDREGRQLQPAERPPQLRPGAPTSLQTEPQASMRQAEPADMRPEIWIAQ
ELRRIGDEFNAYYARRLAKLLASST
>sp|O43521-8|B2L11_HUMAN Isoform Bim-alpha5 of Bcl-2-like protein 11 OS=Homo sapiens OX=9606 GN=BCL2L11
MAKQPSDVSSECDREGRQLQPAERPPQLRPGAPTSLQTEPQGNPEGNHGGEGDSCPHGSP
QGPLAPPASPGPFATRSPLFIFMRRSSLLSRSSSGYFSFDTDRSPAPMSCDKSTQTPSPP
CQAFNHYLSAMASMRQAEPADMRPEIWIAQELRRIGDEFNAYYARRMPLPPD
>sp|O43521-9|B2L11_HUMAN Isoform Bim-alpha6 of Bcl-2-like protein 11 OS=Homo sapiens OX=9606 GN=BCL2L11
MAKQPSDVSSECDREGRQLQPAERPPQLRPGAPTSLQTEPQASMRQAEPADMRPEIWIAQ
ELRRIGDEFNAYYARRMPLPPD
>sp|O43521-10|B2L11_HUMAN Isoform Bim-beta1 of Bcl-2-like protein 11 OS=Homo sapiens OX=9606 GN=BCL2L11
MAKQPSDVSSECDREGRQLQPAERPPQLRPGAPTSLQTEPQGNPEGNHGGEGDSCPHGSP
QGPLAPPASPGPFATRSPLFIFMRRSSLLSRSSSGYFSFDTDRSPAPMSCDKSTQTPSPP
CQAFNHYLSAMANWD
>sp|O43521-11|B2L11_HUMAN Isoform Bim-beta2 of Bcl-2-like protein 11 OS=Homo sapiens OX=9606 GN=BCL2L11
MAKQPSDVSSECDREGRQLQPAERPPQLRPGAPTSLQTEPQGNPEGNHGGEGDSCPHGSP
QGPLAPPASPGPFATRSPLFIFMRRSSLLSRSSSGYFSFDTDRSPAPMSCDKSTQTPSPP
CQAFNHYLSAMGIFE
>sp|O43521-12|B2L11_HUMAN Isoform Bim-beta3 of Bcl-2-like protein 11 OS=Homo sapiens OX=9606 GN=BCL2L11
MAKQPSDVSSECDREGRQLQPAERPPQLRPGAPTSLQTEPQVSLCHPGWSALVRSWLTAT
SNSQVQAVLLPQPPKRSPLFIFMRRSSLLSRSSSGYFSFDTDRSPAPMSCDKSTQTPSPP
CQAFNHYLSAMASMRQAEPADMRPEIWIAQELRRIGDEFNAYYARRVFLNNYQAAEDHPR
MVILRLLRYIVRLVWRMH
>sp|O43521-13|B2L11_HUMAN Isoform Bim-beta4 of Bcl-2-like protein 11 OS=Homo sapiens OX=9606 GN=BCL2L11
MAKQPSDVSSECDREGRQLQPAERPPQLRPGAPTSLQTEPQGIF
>sp|O43521-14|B2L11_HUMAN Isoform Bim-beta5 of Bcl-2-like protein 11 OS=Homo sapiens OX=9606 GN=BCL2L11
MAKQPSDVSSECDREGRQLQPAERPPQLRPGAPTSLQTEPQGNPEGNHGGEGDSCPHGSP
QGPLAPPASPGPFATRSPLFIFMRRSSLLSRSSSGYFSFDTDRSPAPMSCDKSTQTPSPP
CQAFNHYLSAMVREIEEVVV
>sp|O43521-15|B2L11_HUMAN Isoform Bim-beta6 of Bcl-2-like protein 11 OS=Homo sapiens OX=9606 GN=BCL2L11
MAKQPSDVSSECDREGRQLQPAERPPQLRPGAPTSLQTEPQDRSPAPMSCDKSTQTPSPP
CQAFNHYLSAMGIFE
>sp|O43521-16|B2L11_HUMAN Isoform Bim-beta7 of Bcl-2-like protein 11 OS=Homo sapiens OX=9606 GN=BCL2L11
MAKQPSDVSSECDREGRQLQPAERPPQLRPGAPTSLQTEPQDRSPAPMSCDKSTQTPSPP
CQAFNHYLSAMVREIEEVVV
>sp|O43521-17|B2L11_HUMAN Isoform Bim-gamma of Bcl-2-like protein 11 OS=Homo sapiens OX=9606 GN=BCL2L11
MAKQPSDVSSECDREGRQLQPAERPPQLRPGAPTSLQTEPQDRSPAPMSCDKSTQTPSPP
CQAFNHYLSAMVVILEDIGDLSLCFGFIFTGLDLYGHHHSQDTEQLNHKDFS
>sp|O43521-18|B2L11_HUMAN Isoform BimABC of Bcl-2-like protein 11 OS=Homo sapiens OX=9606 GN=BCL2L11
MAKQPSDVSSECDREGRQLQPAERPPQLRPGAPTSLQTEPQDRSPAPMSCDKSTQTPSPP
CQAFNHYLSAMVFLNNYQAAEDHPRMVILRLLRYIVRLVWRMH
>sp|O43521-19|B2L11_HUMAN Isoform BimAC of Bcl-2-like protein 11 OS=Homo sapiens OX=9606 GN=BCL2L11
MAKQPSDVSSECDREGRQLQPAERPPQLRPGAPTSLQTEPQGNPEGNHGGEGDSCPHGSP
QGPLAPPASPGPFATRSPLFIFMRRSSLLSRSSSGYFSFDTDRSPAPMSCDKSTQTPSPP
CQAFNHYLSAMVFLNNYQAAEDHPRMVILRLLRYIVRLVWRMH
>sp|O43521-20|B2L11_HUMAN Isoform BimA of Bcl-2-like protein 11 OS=Homo sapiens OX=9606 GN=BCL2L11
MAKQPSDVSSECDREGRQLQPAERPPQLRPGAPTSLQTEPQVFLNNYQAAEDHPRMVILR
LLRYIVRLVWRMH
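The example above only reads. As a hedged sketch of the writing side, the code below writes two lines to a temporary file in "w" mode and reads them back; the header text and the short sequence are invented for illustration.

```julia
# Write two lines to a temporary file. The `do` block closes the file
# automatically, even if an error is thrown inside it.
tmp_file = tempname()
open(tmp_file, "w") do file
    println(file, ">seq1 hypothetical header")
    println(file, "MAKQ")
end

# Read the lines back with the same pattern, this time in "r" mode.
# `open` returns the value of the `do` block, so it can be assigned.
lines = open(tmp_file, "r") do file
    collect(eachline(file))
end

rm(tmp_file)  # clean up the temporary file
println(lines)
```

Returning a value from the do block is convenient when the file content should end up in a variable rather than being printed.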

Homework

Write a function that reads the FASTA file into a dictionary mapping each sequence/isoform UniProt name, i.e. the identifier between the | characters of the header line, to its sequence.

Hint! You can use the following functions:

split("1 2 3", ' ')
3-element Array{SubString{String},1}:
 "1"
 "2"
 "3"
startswith("Hello world!", 'H')
true
strip("  Hello world!  ")
"Hello world!"

and string concatenation:

"Hello " * "world!"
"Hello world!"
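To see how these hints relate to FASTA lines, here is a small sketch that applies them to one header copied from the file above and to two short sequence fragments. It only shows the individual steps and is not a full solution; the variable names and the trailing whitespace in the first fragment are made up for illustration.

```julia
# A header line copied from the file above.
header = ">sp|O43521-2|B2L11_HUMAN Isoform BimL of Bcl-2-like protein 11 OS=Homo sapiens OX=9606 GN=BCL2L11"

# Header lines start with '>', sequence lines do not.
is_header = startswith(header, '>')

# Splitting on '|' gives three pieces; the name is the second one.
fields = split(header, '|')
name = fields[2]

# A sequence is wrapped over several lines, so the pieces can be joined
# with *. strip removes surrounding whitespace such as a carriage return.
sequence = strip("MAKQPSDVSS \r") * strip("ECDREGRQLQ")

println(name)
println(sequence)
```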
# function read_fasta(...)
#     ...
# end

This page was generated using Literate.jl.