
Regular Expression dude

Posted: Thu Dec 13, 2018 6:44 pm
by zikitrake
Hi, In Notepad++ I can do this search/replace:

Find: <a [[:<:]]href(.*?)<\/a>
Replace with: <zz href$1</zz>

And it will change all <a href...>I'm a phrase</a> tags in a document to <zz href...>I'm a phrase</zz>. For example,

<a href="https://www.sample.com">I'm a phrase</a> will be replaced with
<zz href="https://www.sample.com">I'm a phrase</zz>

(I don't want to change tag pairs that have no href attribute, such as <a noPop" onClick="var e=document.createElement('script');>Another phrase</a>.)

How can I do this in PB? Currently I use this code:

Code: Select all

Procedure.s Ereg_Replace(Text$, Pattern$, Replace$ = "", Options.l = #PB_RegularExpression_DotAll |  #PB_RegularExpression_Extended |  #PB_RegularExpression_AnyNewLine)
  Protected hRegex = CreateRegularExpression(#PB_Any, Pattern$, Options)
  Protected Dim result.s(0)
  If hRegex
    Repeat
      ReDim result(0)
      ExtractRegularExpression(hRegex, Text$, result())
      Text$ = ReplaceRegularExpression(hRegex, Text$, Replace$)
      Delay(0)
    Until ArraySize(result())=0

    FreeRegularExpression(hRegex)
  Else
    Debug "Can't create a Regex with this pattern : " + Pattern$
  EndIf
  ProcedureReturn Text$
EndProcedure

Text$ = ~"<a href=\"https://www.sample.com\">I'm a phrase</a>"

Debug Ereg_Replace(Text$, "<a href(.*?)<\/a>", "<zz href$1</zz>", #PB_RegularExpression_DotAll|#PB_RegularExpression_MultiLine|#PB_RegularExpression_NoCase)
Thank you, and sorry for my English!

Re: Regular Expression dude

Posted: Thu Dec 13, 2018 9:18 pm
by normeus
You are using a back reference ($1 or \1), which does not work in PB.
"ReplaceRegularExpression" just replaces the whole match with the replacement text exactly as written.

If the patterns are simple, starting with "<a href=" and ending with "</a>", you might want to do a plain find and replace.
If it is something more like

Code: Select all

"<a style="margin: 0;" href="
then use a regex to match the beginning of the tag and get the size of that match,
take the rest of the string with Mid(string, sizeoffoundregex),
delete the last "</a>" from that result and store it in newlycreatedstring,
and finally build "<zz href" + newlycreatedstring + "</zz>"

Norm.
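
The steps described above can also be done per match with capture groups. The following is only a sketch, assuming a PureBasic version that provides the match-examination functions ExamineRegularExpression(), NextRegularExpressionMatch(), RegularExpressionMatchPosition(), RegularExpressionMatchLength() and RegularExpressionGroup(). ReplaceWithGroup, Prefix$ and Suffix$ are hypothetical names chosen for this illustration; they are not part of PB.

```purebasic
; Sketch: emulate the Notepad++ replacement "<zz href$1</zz>" by rebuilding
; the text match by match. ReplaceWithGroup is a hypothetical helper name.
Procedure.s ReplaceWithGroup(Text$, Pattern$, Prefix$, Suffix$)
  Protected Result$, LastPos = 1, Pos
  Protected hRegex = CreateRegularExpression(#PB_Any, Pattern$, #PB_RegularExpression_DotAll | #PB_RegularExpression_NoCase)
  If hRegex = 0
    Debug "Can't create a Regex: " + RegularExpressionError()
    ProcedureReturn Text$
  EndIf
  If ExamineRegularExpression(hRegex, Text$)
    While NextRegularExpressionMatch(hRegex)
      Pos = RegularExpressionMatchPosition(hRegex)
      ; Copy the untouched text that precedes this match
      If Pos > LastPos
        Result$ + Mid(Text$, LastPos, Pos - LastPos)
      EndIf
      ; Capture group 1 plays the role of $1
      Result$ + Prefix$ + RegularExpressionGroup(hRegex, 1) + Suffix$
      LastPos = Pos + RegularExpressionMatchLength(hRegex)
    Wend
  EndIf
  ; Copy whatever follows the last match
  Result$ + Mid(Text$, LastPos)
  FreeRegularExpression(hRegex)
  ProcedureReturn Result$
EndProcedure

Text$ = ~"<a href=\"https://www.sample.com\">I'm a phrase</a> <a noPop>Another phrase</a>"
Debug ReplaceWithGroup(Text$, "<a href(.*?)</a>", "<zz href", "</zz>")
```

Because the text is scanned once and each match is rewritten where it was found, tags without href are copied through unchanged and identical matches need no special handling.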

Re: Regular Expression dude

Posted: Thu Dec 13, 2018 9:44 pm
by zikitrake
normeus wrote:You are using a back reference which does not work with PB ($1 or \1).
...
Thank you! For now, I wrote this ugly but functional code:

Code: Select all

Procedure.s Ereg_ReplaceTags(Text$, tagIn$, tagOut$)
  ; Matches every <tagIn href...>...</tagIn> pair, including pairs that span lines
  Protected Pattern$ = "<" + tagIn$ + " href(.*?)<\/" + tagIn$ + ">"
  Protected hRegex = CreateRegularExpression(#PB_Any, Pattern$, #PB_RegularExpression_DotAll|#PB_RegularExpression_MultiLine|#PB_RegularExpression_NoCase)
  Protected Dim result.s(0)
  Protected aux$, cont.l, count.l
  If hRegex
    ; ExtractRegularExpression() returns the number of matches and fills result().
    ; ArraySize() returns the highest index, so looping to ArraySize()-1 skipped the last match.
    count = ExtractRegularExpression(hRegex, Text$, result())
    For cont = 0 To count - 1
      aux$ = result(cont)
      ; Rename the opening and closing tag inside the matched text only
      aux$ = ReplaceString(aux$, "<" + tagIn$ + " ", "<" + tagOut$ + " ", #PB_String_NoCase)
      aux$ = ReplaceString(aux$, "</" + tagIn$ + ">", "</"+tagOut$ + ">", #PB_String_NoCase)
      ; Put the renamed text back in place of the original match
      Text$ = ReplaceString(Text$, result(cont), aux$)
    Next
    FreeRegularExpression(hRegex)
  Else
    Debug "Can't create a Regex with this pattern : " + Pattern$
  EndIf
  ProcedureReturn Text$
EndProcedure

Text$ = ~"<a href=\"https://www.sample.com\">Share</a>" + #CRLF$
Text$ + ~"<p> Share</p>" + #CRLF$
Text$ + ~"<a NoReplaceMe>Share</a>" + #CRLF$
Text$ + ~"<a href=\"https://www.sample.com\">Share</a>" + #CRLF$
Text$ + ~"<p> Share</p>" + #CRLF$
Text$ + ~"<p> Share</p>" + #CRLF$
Text$ + ~"<a href=\"https://www.sample.com\">Share</a>" + #CRLF$

Debug Ereg_ReplaceTags(Text$, "a", "zz")

Re: Regular Expression dude

Posted: Thu Dec 13, 2018 10:31 pm
by RASHAD
Be careful with the escaped string.
Yours is missing "href=\"

Code: Select all

Text$ = ~"<a href=\"https://www.sample.com\">Share</a>" + #CRLF$
Text$ + ~"<p> Share</p>" + #CRLF$
Text$ + ~"<a NoReplaceMe>Share</a>" + #CRLF$
Text$ + ~"<a href=\"https://www.sample.com\">Share</a>" + #CRLF$
Text$ + ~"<p> Share</p>" + #CRLF$
Text$ + ~"<p> Share</p>" + #CRLF$
Text$ + ~"<a href=\"https://www.sample.com\">Share</a>" + #CRLF$


Dim String$(0)
CreateRegularExpression(0, "(?<=<)a(?=\s+href)|(?<=</)a(?=>)", #PB_RegularExpression_NoCase)
For k = 1 To 100
  ReDim String$(k)
  String$(k) = StringField(Text$, k,#CRLF$)
  If String$(k) = ""
    Break
  ElseIf Left(String$(k),7)= "<a href"
    new$ = ReplaceRegularExpression(0, UnescapeString(string$(k)), "zz")
    String$(k) = EscapeString(new$)
  EndIf
  final$ = final$ + string$(k)+#CRLF$
Next
FreeRegularExpression(0)
Debug final$

Re: Regular Expression dude

Posted: Fri Dec 14, 2018 9:23 am
by zikitrake
RASHAD wrote:Be careful with Escape String
Yours misses "href=\"
...
Thank you, RASHAD, but the original source doesn't contain '\'; it's only there so I can write the input text with double quotes in PB :)

Code: Select all

Text$ = ~"<a href=\"https://www.sample.com\">Share</a>" + #CRLF$ 
or 
Text$ = "<a href=" + #DQUOTE$ + "https://www.sample.com" + #DQUOTE$ +">Share</a>" + #CRLF$
I only need to get <a href="https://www.sample.com">Share</a>

PS: If your comment meant something else, excuse me; my English is limited and I often misunderstand the real meaning of comments.
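
As a small check of the point about the two notations, the sketch below compares the escaped-string form with the #DQUOTE$ form. It assumes a PureBasic version that supports the ~"..." escaped string syntax; the variable names a$ and b$ are only illustrative.

```purebasic
; Both lines build the same text; the backslashes exist only in the source code.
a$ = ~"<a href=\"https://www.sample.com\">Share</a>"
b$ = "<a href=" + #DQUOTE$ + "https://www.sample.com" + #DQUOTE$ + ">Share</a>"

If a$ = b$
  Debug "Both notations give the same string:"
EndIf
Debug a$

; FindString() returns 0 when the searched text is absent
If FindString(a$, "\") = 0
  Debug "No backslash is stored in the string"
EndIf
```

Since the stored string has no backslashes, it does not need UnescapeString() before matching, and calling EscapeString() on the result would add escape sequences that the original text never had.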