Regular Expression dude

zikitrake
Addict
Posts: 833
Joined: Thu Mar 25, 2004 2:15 pm
Location: Spain

Post by zikitrake »

Hi. In Notepad++ I can do this search/replace:

Find: <a [[:<:]]href(.*?)<\/a>
Replace with: <zz href$1</zz>

It changes every <a href...>I'm a phrase</a> tag in a document to <zz href...>I'm a phrase</zz>. For example,

<a href=\"https://www.sample.com\">I'm a phrase</a> is replaced with
<zz href=\"https://www.sample.com\">I'm a phrase</zz>

(I don't want to change pairs without an href attribute, such as <a noPop" onClick="var e=document.createElement('script');>Another phrase</a>.)

How can I do this in PB? Currently I use this code:

Code:

Procedure.s Ereg_Replace(Text$, Pattern$, Replace$ = "", Options.l = #PB_RegularExpression_DotAll |  #PB_RegularExpression_Extended |  #PB_RegularExpression_AnyNewLine)
  Protected hRegex = CreateRegularExpression(#PB_Any, Pattern$, Options)
  Protected Dim result.s(0)
  If hRegex
    Repeat
      ReDim result(0)
      ExtractRegularExpression(hRegex, Text$, result())
      Text$ = ReplaceRegularExpression(hRegex, Text$, Replace$)
      Delay(0)
    Until ArraySize(result())=0

    FreeRegularExpression(hRegex)
  Else
    Debug "Can't create a Regex with this pattern : " + Pattern$
  EndIf
  ProcedureReturn Text$
EndProcedure

Text$ = ~"<a href=\"https://www.sample.com\">I'm a phrase</a>"

Debug Ereg_Replace(Text$, "<a href(.*?)<\/a>", "<zz href$1</zz>", #PB_RegularExpression_DotAll|#PB_RegularExpression_MultiLine|#PB_RegularExpression_NoCase)

Thank you, and sorry for my English!
normeus
Enthusiast
Posts: 414
Joined: Fri Apr 20, 2012 8:09 pm

Re: Regular Expression dude

Post by normeus »

You are using a back reference ($1 or \1), which PB does not support.
ReplaceRegularExpression() simply replaces the whole match with the literal replacement string.

You might want to do a simple find and replace if the patterns are as simple as a start of "<a href=" and an end of "</a>".
If it is more like

Code:

"<a style="margin: 0;" href="

then run a regex for the beginning and get the length of the match,
take the rest of the string with Mid(string, sizeoffoundregex),
delete the last "</a>" from that result and store it in newlycreatedstring,
and finally build "<zz href" + newlycreatedstring + "</zz>".
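
The steps above can also be sketched with the match-examination functions of the RegularExpression library, which avoids doing the Mid() arithmetic by hand. This is only a sketch under assumptions: ReplaceWithGroup(), Prefix$ and Suffix$ are hypothetical names made up for the illustration, and it assumes a PureBasic version that provides ExamineRegularExpression() and RegularExpressionGroup().

```purebasic
; Sketch: emulate the Notepad++ replacement "<zz href$1</zz>" by rebuilding
; the text match by match. ReplaceWithGroup() is a hypothetical helper name.
Procedure.s ReplaceWithGroup(Text$, Pattern$, Prefix$, Suffix$)
  Protected regex, result$, lastPos = 1, pos
  regex = CreateRegularExpression(#PB_Any, Pattern$, #PB_RegularExpression_DotAll | #PB_RegularExpression_NoCase)
  If regex = 0
    ProcedureReturn Text$            ; invalid pattern: return the input unchanged
  EndIf
  If ExamineRegularExpression(regex, Text$)
    While NextRegularExpressionMatch(regex)
      pos = RegularExpressionMatchPosition(regex)
      ; Copy the untouched text that lies before this match.
      If pos > lastPos
        result$ + Mid(Text$, lastPos, pos - lastPos)
      EndIf
      ; Rebuild the match around capture group 1 (the role of $1).
      result$ + Prefix$ + RegularExpressionGroup(regex, 1) + Suffix$
      lastPos = pos + RegularExpressionMatchLength(regex)
    Wend
  EndIf
  result$ + Mid(Text$, lastPos)      ; tail after the last match
  FreeRegularExpression(regex)
  ProcedureReturn result$
EndProcedure

Text$ = ~"<a href=\"https://www.sample.com\">I'm a phrase</a> <a NoReplaceMe>Another phrase</a>"
Debug ReplaceWithGroup(Text$, "<a href(.*?)</a>", "<zz href", "</zz>")
```

Because the pattern requires "<a href", the second tag in the sample text is not matched and is copied through unchanged.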

Norm.
google Translate;Makes my jokes fall flat- Fait mes blagues tombent à plat- Machte meine Witze verpuffen- Eh cumpari ci vo sunari
zikitrake
Addict
Posts: 833
Joined: Thu Mar 25, 2004 2:15 pm
Location: Spain

Re: Regular Expression dude

Post by zikitrake »

normeus wrote:You are using a back reference which does not work with PB ($1 or \1).
"ReplaceRegularExpression" is just replacing the whole expression.

...
Thank you! For now, I wrote this ugly but functional code:

Code:

Procedure.s Ereg_ReplaceTags(Text$, tagIn$, tagOut$)
  ; Match <tagIn href ...> ... </tagIn>; tags without href are left alone.
  Protected Pattern$ = "<" + tagIn$ + " href(.*?)<\/" + tagIn$ + ">"
  Protected hRegex = CreateRegularExpression(#PB_Any, Pattern$, #PB_RegularExpression_DotAll|#PB_RegularExpression_MultiLine|#PB_RegularExpression_NoCase)
  Protected Dim result.s(0)
  Protected aux$, cont.l, found.l
  If hRegex
    ; ExtractRegularExpression() returns the number of matches it stored.
    found = ExtractRegularExpression(hRegex, Text$, result())
    For cont = 0 To found - 1
      aux$ = result(cont)
      ; Rename the opening and closing tag inside this match only.
      aux$ = ReplaceString(aux$, "<" + tagIn$ + " ", "<" + tagOut$ + " ", #PB_String_NoCase)
      aux$ = ReplaceString(aux$, "</" + tagIn$ + ">", "</" + tagOut$ + ">", #PB_String_NoCase)
      ; Put the rewritten match back into the full text.
      Text$ = ReplaceString(Text$, result(cont), aux$)
    Next
    FreeRegularExpression(hRegex)
  Else
    Debug "Can't create a Regex with this pattern : " + Pattern$
  EndIf
  ProcedureReturn Text$
EndProcedure

Text$ = ~"<a href=\"https://www.sample.com\">Share</a>" + #CRLF$
Text$ + ~"<p> Share</p>" + #CRLF$
Text$ + ~"<a NoReplaceMe>Share</a>" + #CRLF$
Text$ + ~"<a href=\"https://www.sample.com\">Share</a>" + #CRLF$
Text$ + ~"<p> Share</p>" + #CRLF$
Text$ + ~"<p> Share</p>" + #CRLF$
Text$ + ~"<a href=\"https://www.sample.com\">Share</a>" + #CRLF$

Debug Ereg_ReplaceTags(Text$, "a", "zz")
RASHAD
PureBasic Expert
Posts: 4635
Joined: Sun Apr 12, 2009 6:27 am

Re: Regular Expression dude

Post by RASHAD »

Be careful with escaped strings.
Yours misses "href=\"

Code:

Text$ = ~"<a href=\"https://www.sample.com\">Share</a>" + #CRLF$
Text$ + ~"<p> Share</p>" + #CRLF$
Text$ + ~"<a NoReplaceMe>Share</a>" + #CRLF$
Text$ + ~"<a href=\"https://www.sample.com\">Share</a>" + #CRLF$
Text$ + ~"<p> Share</p>" + #CRLF$
Text$ + ~"<p> Share</p>" + #CRLF$
Text$ + ~"<a href=\"https://www.sample.com\">Share</a>" + #CRLF$


Dim String$(0)
CreateRegularExpression(0, "(?<=<)a(?=\s+href)|(?<=</)a(?=>)", #PB_RegularExpression_NoCase)
For k = 1 To 100
  ReDim String$(k)
  String$(k) = StringField(Text$, k,#CRLF$)
  If String$(k) = ""
    Break
  ElseIf Left(String$(k),7)= "<a href"
    new$ = ReplaceRegularExpression(0, UnescapeString(string$(k)), "zz")
    String$(k) = EscapeString(new$)
  EndIf
  final$ = final$ + string$(k)+#CRLF$
Next
FreeRegularExpression(0)
Debug final$
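
A small check of why the Left(String$(k),7) = "<a href" guard above matters: the second alternative of the pattern matches the name in any closing </a>, whether or not the opening tag had an href. The snippet below is only an illustration; Line$ is a hypothetical variable, and the pattern is the one from the code above.

```purebasic
; A line without href: only the second alternative (the closing tag) can match.
Line$ = "<a NoReplaceMe>Share</a>"

If CreateRegularExpression(0, "(?<=<)a(?=\s+href)|(?<=</)a(?=>)", #PB_RegularExpression_NoCase)
  ; No Left() guard here, so the replacement is applied to the line directly.
  Debug ReplaceRegularExpression(0, Line$, "zz")
  FreeRegularExpression(0)
EndIf
```

Without the guard the closing tag is renamed while the opening tag stays as it is, leaving a mismatched pair, so the pattern should only be applied to lines (or matches) already known to start with "<a href".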
Egypt my love
zikitrake
Addict
Posts: 833
Joined: Thu Mar 25, 2004 2:15 pm
Location: Spain

Re: Regular Expression dude

Post by zikitrake »

RASHAD wrote:Be careful with Escape String
Yours misses "href=\"
...
Thank you, RASHAD, but the original source doesn't contain '\'; it is only there so I can write the input text with double quotes in the code :)

Code:

Text$ = ~"<a href=\"https://www.sample.com\">Share</a>" + #CRLF$
or
Text$ = "<a href=" + #DQUOTE$ + "https://www.sample.com" + #DQUOTE$ +">Share</a>" + #CRLF$

I only need to get <a href="https://www.sample.com">Share</a>
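
To make the point concrete, here is a minimal sketch comparing the two ways of writing the literal. The variable names a$ and b$ are hypothetical, and the ~"..." syntax assumes a PureBasic version with escaped string literals (already used in the code above).

```purebasic
; Both literals build the same text; the backslashes exist only in the source code.
a$ = ~"<a href=\"https://www.sample.com\">Share</a>"
b$ = "<a href=" + #DQUOTE$ + "https://www.sample.com" + #DQUOTE$ + ">Share</a>"

If a$ = b$
  Debug "Both strings are identical"
EndIf

Debug a$                    ; the tag as stored, with plain double quotes
Debug FindString(a$, "\")   ; position of a backslash inside a$, 0 if there is none
```

Since the stored text has no backslashes, there is nothing to unescape before running the regular expression on it.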

PS: If your comment meant something else, excuse me; my English is limited and I often misunderstand the real meaning of comments.