Genomic sequences matches

Psychophanta
Addict
Posts: 4996
Joined: Wed Jun 11, 2003 9:33 pm
Location: Lípetsk, Russian Federation

Post by Psychophanta »

There are claims going around that the current SARS-CoV-2 genome sequence contains several stretches that coincide with stretches of the HIV genome.
To help locate the supposedly coincident sequences, here is the supposed complete SARS-CoV-2 genome sequence (this one from Japan), along with the complete sequences of HIV-1 and HIV-2.

I have some handy tools for working with plain text that cover a lot of tasks; however, I don't have one for this very simple task (and I have not found one on the forum).
The task is simple: locate every identical sequence of at least 'n' bytes (a configurable parameter) shared by two given ASCII (or Unicode, or any other) texts.
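To make the task concrete, here is a minimal sketch of it for two short strings held entirely in memory. It is only an illustration under stated assumptions: the procedure name CommonRuns and the two sample strings are made up for the example, and the loops are quadratic, so this is meant for small inputs rather than whole genome files.

```purebasic
; Hypothetical helper (not from the forum): reports every maximal run of at
; least minLen identical characters shared by a$ and b$. Returns the run count.
Procedure.i CommonRuns(a$, b$, minLen.i)
  Protected i.i, j.i, k.i, found.i = 0
  Protected lenA.i = Len(a$), lenB.i = Len(b$)
  For i = 1 To lenA
    For j = 1 To lenB
      ; Skip pairs that only continue a run already examined from (i-1, j-1)
      If i > 1 And j > 1 And Mid(a$, i - 1, 1) = Mid(b$, j - 1, 1)
        Continue
      EndIf
      ; Extend to the right for as long as both texts agree
      k = 0
      While i + k <= lenA And j + k <= lenB And Mid(a$, i + k, 1) = Mid(b$, j + k, 1)
        k + 1
      Wend
      If k >= minLen
        found + 1
        PrintN("a@" + Str(i) + "  b@" + Str(j) + "  len=" + Str(k) + "  " + Mid(a$, i, k))
      EndIf
    Next
  Next
  ProcedureReturn found
EndProcedure

OpenConsole()
; Made-up sample strings, minimum length 4
PrintN("Runs found: " + Str(CommonRuns("GATTACAGATTACA", "TTACAGG", 4)))
Input()
CloseConsole()
```

Positions are 1-based, as with Mid(). A run is reported once, from its leftmost start, so overlapping sub-runs are not listed separately.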

SARS-CoV-2:
https://www.ncbi.nlm.nih.gov/nuccore/LC528232

HIV-1:
https://www.ncbi.nlm.nih.gov/nuccore/MN692147.1

HIV-2:
https://www.ncbi.nlm.nih.gov/nuccore/AF082339.1

I will write it when I have enough time, and perhaps some of you will too. My version will also be able to handle files bigger than the amount of RAM in the system.
I am just posting this here in case it interests you or makes you curious.
http://www.zeitgeistmovie.com

While world=business:world+mafia:Wend
Will never leave this forum until the absolute bugfree PB :mrgreen:
Olliv
Enthusiast
Posts: 542
Joined: Tue Sep 22, 2009 10:41 pm

Re: Genomic sequences matches

Post by Olliv »

There are two roots: S and L.
Why is only one root displayed?

This just shows that it is caused by natural selection.
Psychophanta
Addict
Posts: 4996
Joined: Wed Jun 11, 2003 9:33 pm
Location: Lípetsk, Russian Federation

Re: Genomic sequences matches

Post by Psychophanta »

@Olliv, are there two roots in RNA as well?
I know nothing about genetics, but it is no doubt a fascinating field.

Well, in fact, the goal is not to deal only with RNA sequences, but to locate coincident sequences in any large sequence of data of any type (a file, for example).

I have thought of other algorithms for this, but writing correct code for them takes some time, and it is not my priority for now.
So here is the silliest algorithm, the one I wrote, which works but is silly and slooooow:


; This program uses the silliest possible algorithm to do the following:
; find every matching sequence between one file 'fil0$' and another file 'fil1$';
; the MINIMUM LENGTH of the sequence is given as input.
; Let's see if someone comes up with an algorithm, a halfway decent one at least, to do this.

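; Reads the next 'tamano' letters (A-Z, a-z) from file 'fil' into *store, skipping every other character.
; Returns the file position just after the first letter read, or 0 if the end of the file is reached first.
Procedure.q leedato(fil,*store.ascii,tamano.u)
  Protected dato.a,n.u,punto.q=0
  For n=0 To tamano-1
    If Eof(fil):ProcedureReturn 0:EndIf
    dato.a=ReadAsciiCharacter(fil)
    ; skip digits, spaces, line breaks and anything else that is not a letter
    While dato<'A' Or dato>'z' Or (dato>'Z' And dato<'a')
      If Eof(fil):ProcedureReturn 0:EndIf
      dato.a=ReadAsciiCharacter(fil)
    Wend
    PokeA(*store.ascii+n,dato.a):If n=0:punto.q=Loc(fil):EndIf
  Next
  ProcedureReturn punto.q
EndProcedure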
fil0$="D:\Genoma SARS-CoV-2.txt"
; fil1$="D:\Genoma VIH-1.txt"
fil1$="D:\Genoma VIH-2.txt"
Readfile(0,fil0$,#PB_Ascii)
Readfile(1,fil1$,#PB_Ascii)
tamanobloque.u=10; <- minimum size of string to find
*almacen.ascii=AllocateMemory(1024,#PB_Memory_NoClear)
*almacen2.ascii=AllocateMemory(1024,#PB_Memory_NoClear)
posinit0.q=18937:posinit1.q=9862; <- starting positions in each file
pos0.q=posinit0.q:pos1.q=posinit1.q
OpenConsole("encuentros",#PB_Ascii)
FileSeek(0,posinit0.q,#PB_Absolute)
While Eof(0)=0
  pos0=leedato(0,*almacen.ascii,tamanobloque.u)
  If pos0=0:Break:EndIf
  FileSeek(1,posinit1.q,#PB_Absolute)
  matchesinfil1.q=0
  While Eof(1)=0
    pos1=leedato(1,*almacen2.ascii,tamanobloque.u)
    If pos1=0:Break:EndIf
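    If CompareMemory(*almacen.ascii,*almacen2.ascii,tamanobloque.u):matchesinfil1.q+1
      ; the two blocks match: remember where we are in file 0, then try to extend the match
      pushpos0.q=Loc(0)
      n.u=0
      dato.a=ReadAsciiCharacter(0)
      dato1.a=ReadAsciiCharacter(1)
      If Eof(0) Or Eof(1):Break 2:EndIf
      ; extend while both files agree on a letter and the 1024-byte buffers still have room
      While dato=dato1 And (dato>='A' And dato<='z') And (dato<='Z' Or dato>='a') And tamanobloque+n<1024
        PokeA(*almacen.ascii+tamanobloque.u+n,dato)
        PokeA(*almacen2.ascii+tamanobloque.u+n,dato1)
        n.u+1
        dato.a=ReadAsciiCharacter(0)
        dato1.a=ReadAsciiCharacter(1)
        If Eof(0) Or Eof(1):Break 3:EndIf
      Wend
      If n:Beep_(444+n*100,100):EndIf; Beep_() is the Windows API call
      FileSeek(0,pushpos0,#PB_Absolute)
      pos1=Loc(1)-1; continue scanning file 1 from the end of the match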
      PrintN("Match#: "+Str(matchesinfil1.q)+", Pos in SARS-CoV-2: "+Hex(pos0-posinit0)+", Pos in VIH-2: "+Hex(pos1-posinit1)+", String: "+PeekS(*almacen.ascii,tamanobloque.u+n.u,#PB_Ascii))
    EndIf
    FileSeek(1,pos1,#PB_Absolute)
  Wend
  If matchesinfil1.q:pos0+tamanobloque.u+matchesinfil1.q:EndIf
  FileSeek(0,pos0,#PB_Absolute)
Wend
Beep_(811,1200)
PrintN("End of data reached. Press return to exit"):Input()
CloseConsole()
FreeMemory(*almacen2.ascii)
FreeMemory(*almacen.ascii)
CloseFile(1)
CloseFile(0)
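
A faster alternative could be sketched as follows. This is not the program above and not a finished design: it assumes both sequences already sit in memory as strings containing only the letters of interest (no file handling, no header or line-break filtering), and the names KmerHits and IndexedMatches are hypothetical. The idea is to index every k-letter word of the second text in a Map once, then look up each k-letter word of the first text instead of rescanning the second text every time.

```purebasic
; Hypothetical structure: all start positions of one k-letter word in b$
Structure KmerHits
  List pos.i()
EndStructure

; Reports every maximal common run of at least k letters; returns the run count
Procedure.i IndexedMatches(a$, b$, k.i)
  Protected NewMap index.KmerHits()
  Protected i.i, j.i, n.i, found.i = 0
  Protected lenA.i = Len(a$), lenB.i = Len(b$)
  ; Pass 1: remember where every k-letter word of b$ starts
  For j = 1 To lenB - k + 1
    AddElement(index(Mid(b$, j, k))\pos())
    index()\pos() = j
  Next
  ; Pass 2: look up each k-letter word of a$ and extend the hits
  For i = 1 To lenA - k + 1
    If FindMapElement(index(), Mid(a$, i, k))
      ForEach index()\pos()
        j = index()\pos()
        ; Skip hits that only continue a run already reported from (i-1, j-1)
        If i > 1 And j > 1 And Mid(a$, i - 1, 1) = Mid(b$, j - 1, 1)
          Continue
        EndIf
        n = k
        While i + n <= lenA And j + n <= lenB And Mid(a$, i + n, 1) = Mid(b$, j + n, 1)
          n + 1
        Wend
        found + 1
        PrintN("a@" + Str(i) + "  b@" + Str(j) + "  len=" + Str(n) + "  " + Mid(a$, i, n))
      Next
    EndIf
  Next
  ProcedureReturn found
EndProcedure

OpenConsole()
; Made-up sample strings, minimum length 4
PrintN("Runs found: " + Str(IndexedMatches("GATTACAGATTACA", "TTACAGG", 4)))
Input()
CloseConsole()
```

The trade-off is memory: the Map holds one list entry per position of the second text, so this sketch does not address files bigger than RAM.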

Olliv
Enthusiast
Posts: 542
Joined: Tue Sep 22, 2009 10:41 pm

Re: Genomic sequences matches

Post by Olliv »

Psychophanta wrote: @Olliv, are there two roots in RNA as well?
There is no DNA, just RNA.

You produce RNA from your DNA in order to rebuild the DNA of your next generation of cells.

Retroviruses replace part of your RNA to modify the DNA of your cells so that they produce the virus's own RNA and capsids, which lets it multiply. A capsid is the protective shell of the virus. The changes to human DNA also allow the virus to prevent antibody production.

It is a delicate and fragile period, with no data at all about cancerous effects in asymptomatic people... But humankind can certainly win. Remember London under the V-2 bombing: they won.
Psychophanta
Addict
Posts: 4996
Joined: Wed Jun 11, 2003 9:33 pm
Location: Lípetsk, Russian Federation

Re: Genomic sequences matches

Post by Psychophanta »

About this so-called SARS-CoV-2, there are unanswered questions:
Every country has its geneticists and virologists who work for the main administrations.
It is known that the complete genome of any virus can be sequenced in a few hours or a few days.
The global media confuse people every day, because some outlets say the virus has a 100% natural origin, while others say it is synthetic or semi-synthetic.

So it looks very clear that all the governments know the truth.
Then why do they not tell people the truth about the virus's origin and other technical details?

Lord
Addict
Posts: 849
Joined: Tue May 26, 2009 2:11 pm

Re: Genomic sequences matches

Post by Lord »

Don't start a conspiracy theory.
It is known that the virus originated in bats.
Psychophanta
Addict
Posts: 4996
Joined: Wed Jun 11, 2003 9:33 pm
Location: Lípetsk, Russian Federation

Re: Genomic sequences matches

Post by Psychophanta »

Lord wrote: Don't start a conspiracy theory.
It is known that the virus originated in bats.
Sorry, but firstly:
My comment had NOTHING to do with your answer, nor with conspiracy, because the statement "the virus originated in bats" IS officially a hypothesis. Just for your info.

Secondly:
Conspiracy is a fact, not a theory. Man, how about reading a little instead of repeating things without thinking?
There are plenty of works within your reach about conspiracies as fact, going back to Sumer at least.
"The Prince", by Niccolò Machiavelli, is one that comes to my mind now, but there are many.

Kiffi
Addict
Posts: 1357
Joined: Tue Mar 02, 2004 1:20 pm
Location: Amphibios 9

Re: Genomic sequences matches

Post by Kiffi »

Image
Hygge
Olliv
Enthusiast
Posts: 542
Joined: Tue Sep 22, 2009 10:41 pm

Re: Genomic sequences matches

Post by Olliv »

I do not know what to say, Psychophanta. I think I have said as much as I can in a simple explanation of the retroviral mechanism.

The conspiracy question is always worth keeping in mind, but a conspiracy answer does not exist without a well-detailed analysis.

We often find excellent points of view in the analysis, excellent enough to put a stop to the conspiracy conclusions...
mk-soft
Always Here
Posts: 5386
Joined: Fri May 12, 2006 6:51 pm
Location: Germany

Re: Genomic sequences matches

Post by mk-soft »

In conspiracy theories, facts don't help either. They are always ignored.

My thinking:
This is the first wave of an alien invasion. 8)
My Projects ThreadToGUI / OOP-BaseClass / EventDesigner V3
PB v3.30 / v5.75 - OS Mac Mini OSX 10.xx - VM Window Pro / Linux Ubuntu
Downloads on my Webspace / OneDrive
idle
Always Here
Posts: 5089
Joined: Fri Sep 21, 2007 5:52 am
Location: New Zealand

Re: Genomic sequences matches

Post by idle »

mk-soft wrote: In conspiracy theories, facts don't help either. They are always ignored.

My thinking:
This is the first wave of an alien invasion. 8)
and we're all packaged up in our homes ready for harvesting! :shock:

There's nothing wrong with looking at it, but searching only for exact matches is not enough.
It might be better to try BLAST https://blast.ncbi.nlm.nih.gov/Blast.cgi
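
To show why exact matching alone falls short, here is a minimal sketch that accepts windows differing in a few positions. It is not BLAST and does not reproduce its method; it only counts mismatches between fixed-length windows. The names Mismatches and NearMatches, the sample strings and the thresholds are all made up for the example, and both texts are assumed to be short strings in memory.

```purebasic
; Hypothetical helper: number of differing positions between the window of
; a$ starting at i and the window of b$ starting at j (both 'length' long)
Procedure.i Mismatches(a$, i.i, b$, j.i, length.i)
  Protected k.i, diff.i = 0
  For k = 0 To length - 1
    If Mid(a$, i + k, 1) <> Mid(b$, j + k, 1)
      diff + 1
    EndIf
  Next
  ProcedureReturn diff
EndProcedure

; Reports every pair of windows that differ in at most maxDiff positions
Procedure.i NearMatches(a$, b$, length.i, maxDiff.i)
  Protected i.i, j.i, d.i, found.i = 0
  Protected lenA.i = Len(a$), lenB.i = Len(b$)
  For i = 1 To lenA - length + 1
    For j = 1 To lenB - length + 1
      d = Mismatches(a$, i, b$, j, length)
      If d <= maxDiff
        found + 1
        PrintN("a@" + Str(i) + " " + Mid(a$, i, length) + "  b@" + Str(j) + " " + Mid(b$, j, length) + "  mismatches=" + Str(d))
      EndIf
    Next
  Next
  ProcedureReturn found
EndProcedure

OpenConsole()
; Made-up sample strings: windows of 6 letters, at most 1 mismatch allowed
PrintN("Windows found: " + Str(NearMatches("GATTACAGATTACA", "GATCACA", 6, 1)))
Input()
CloseConsole()
```

This only handles substitutions; insertions and deletions, which real alignment tools also score, are outside the sketch.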
Windows 11, Manjaro, Raspberry Pi OS
Josh
Addict
Posts: 1183
Joined: Sat Feb 13, 2010 3:45 pm

Re: Genomic sequences matches

Post by Josh »

Oh folks, don't act like that. Psychophanta just forgot to set the irony tags :mrgreen:
sorry for my bad english
Olliv
Enthusiast
Posts: 542
Joined: Tue Sep 22, 2009 10:41 pm

Re: Genomic sequences matches

Post by Olliv »

Here in France, a quick blood test to check whether a person has antibodies is the subject of an authorization request submitted to the government.

What about your own countries?

@Idle

I can access the BLAST site. I'll add that I saw it offers APIs. Tempted by a PureBlast? :D
idle
Always Here
Posts: 5089
Joined: Fri Sep 21, 2007 5:52 am
Location: New Zealand

Re: Genomic sequences matches

Post by idle »

This site is perhaps interesting; it shows all the strains:
https://nextstrain.org/ncov

@Olliv

I think you can also download the BLAST executables and database and run it locally.
Olliv
Enthusiast
Posts: 542
Joined: Tue Sep 22, 2009 10:41 pm

Re: Genomic sequences matches

Post by Olliv »

I have had a TV for 13 years. I will try to repair the computer first!