Regex and $1 parameter

BarryG
Addict
Posts: 3292
Joined: Thu Apr 18, 2019 8:17 am

Post by BarryG »

Hi, back again with another regular expression question. I was interested in converting camel case to title case, and found the following example at StackOverflow. It has 442 upvotes, so it must be correct, hehe. But I can't make it work with PureBasic (I've only tried putting spaces before each capital in my code below). Please help. Thanks.

https://stackoverflow.com/a/4149393/7908170

Code:

text$="thisStringIsGood"

r=CreateRegularExpression(#PB_Any,"/([A-Z])/g")
If r
  If MatchRegularExpression(r,text$)
    text$=ReplaceRegularExpression(r,text$," $1")
  EndIf
  FreeRegularExpression(r)
  Debug text$ ; Want "This String Is Good"
EndIf
Last edited by BarryG on Sun Nov 28, 2021 10:55 am, edited 2 times in total.
Marc56us
Addict
Posts: 1477
Joined: Sat Feb 08, 2014 3:26 pm

Re: Regex and $1 parameter

Post by Marc56us »

ReplaceRegularExpression()
...
Remarks

Back references (usually described as \1, \2, etc.) are not supported. ExtractRegularExpression() combined with ReplaceString() should achieve the requested behaviour.
:wink:
#NULL
Addict
Posts: 1440
Joined: Thu Aug 30, 2007 11:54 pm
Location: right here

Re: Regex and $1 parameter

Post by #NULL »

I didn't see the part of the docs about ExtractRegularExpression() that Marc56us posted, but this seems to work:

Code:

s.s = "thisStringIsGood"

If CreateRegularExpression(0, "([A-Z])")
  If ExamineRegularExpression(0, s)
    While NextRegularExpressionMatch(0)
      s = ReplaceString(s,
                    RegularExpressionGroup(0, 1),
                    " " + RegularExpressionGroup(0, 1),
                    #PB_String_CaseSensitive,
                    RegularExpressionGroupPosition(0, 1),
                    1)
    Wend
  EndIf
Else
  Debug RegularExpressionError()
EndIf
Debug s

If CreateRegularExpression(0, "(^.)")
  If ExamineRegularExpression(0, s)
    While NextRegularExpressionMatch(0)
      s = ReplaceString(s,
                    RegularExpressionGroup(0, 1),
                    UCase(RegularExpressionGroup(0, 1)),
                    #PB_String_CaseSensitive,
                    RegularExpressionGroupPosition(0, 1),
                    1)
    Wend
  EndIf
Else
  Debug RegularExpressionError()
EndIf
Debug s

Post by BarryG »

Oh crap, so "$1" is what PureBasic doesn't support? Darn. My app was to offer regex for its users to specify the regex text that they need, but obviously they can't now. So I'll have to not offer that feature, which is a real shame. I can't use entire replacements like #NULL's example for the reasons I just explained. This is very disappointing. Unless there's some other unofficial way to support regex fully and ignore PureBasic's version?
Last edited by BarryG on Sun Nov 28, 2021 9:47 am, edited 1 time in total.

Post by Marc56us »

A quick and dirty solution without regex

Code:

text$="thisStringIsGood"

For i = 1 To Len(text$)
  Char$ = Mid(text$, i, 1)
  If Char$ = UCase(Char$)
    Char$ = " " + Char$
  EndIf
  Full$ + Char$
Next

Debug Full$
(need to add a line for first char)
:wink:
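
A hedged sketch of the missing first-character line, added to the loop above. It assumes plain ASCII input. Unlike the original test `Char$ = UCase(Char$)`, which is also true for digits, spaces and punctuation, it only treats A-Z as capitals. The variable names follow the snippet above.

```purebasic
EnableExplicit

Define text$ = "thisStringIsGood"
Define Full$, Char$, i

For i = 1 To Len(text$)
  Char$ = Mid(text$, i, 1)
  ; Assumption: ASCII only. Digits and punctuation equal their own UCase(),
  ; so test the character code instead of comparing with UCase(Char$).
  If Asc(Char$) >= 'A' And Asc(Char$) <= 'Z'
    Char$ = " " + Char$
  EndIf
  Full$ + Char$
Next

; The line for the first character: upper-case it, keep the rest unchanged.
Full$ = UCase(Left(Full$, 1)) + Mid(Full$, 2)

Debug Full$ ; This String Is Good
```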

Post by BarryG »

No good Marc56us - see my post above yours for why. PureBasic doesn't support drop-in regex statements obtained from the web, so it can't be used.

Post by Marc56us »

BarryG wrote: "Unless there's some other unofficial way to support regex fully and ignore PureBasic's version?"
Use RunProgram() and call an external tool like SED or AWK (yes, these Unix tools exist for Windows too).
:wink:

Post by BarryG »

Hi #NULL, your code works great for your example text ("thisStringIsGood") but if I change it to something just slightly different ("thisIsCamelCase") then it fails (has extra spaces, and "CamelCase" doesn't separate like "IsGood" does). I can't work out why that would be. Any ideas?

Here's what I'm testing with:

Code:

new$=" "
text$="thisStringIsGood" ; Works.
text$="thisIsCamelCase" ; Fails.
r=CreateRegularExpression(#PB_Any,"([A-Z])")
If r
  If ExamineRegularExpression(r,text$)
    While NextRegularExpressionMatch(r)
      text$=ReplaceString(text$,RegularExpressionGroup(r,1),new$+RegularExpressionGroup(r,1),#PB_String_CaseSensitive,RegularExpressionGroupPosition(r,1),1)
    Wend
  EndIf
  FreeRegularExpression(r)
EndIf
Debug text$

Post by #NULL »

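The StartPosition parameter for ReplaceString() needs to be changed from RegularExpressionGroupPosition(r,1) to RegularExpressionMatchPosition(r). The group position is relative to the start of the match, not to the whole string, so it isn't correct there: the search always starts at the beginning of the text and the first C gets replaced twice. :)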
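
For reference, a minimal sketch of another way to insert the spaces, one that does not depend on ReplaceString() finding the right occurrence after earlier insertions have shifted the text (which may still go wrong when the same capital appears twice in a row). It copies the text between matches into a new string, using only RegularExpressionMatchPosition() and RegularExpressionMatchLength(). The variable names out$, pos and last are made up for illustration; this is a sketch, not the code from the posts above.

```purebasic
EnableExplicit

Define text$ = "thisIsCamelCase"
Define out$, pos, last = 1

If CreateRegularExpression(0, "([A-Z])")
  If ExamineRegularExpression(0, text$)
    While NextRegularExpressionMatch(0)
      pos = RegularExpressionMatchPosition(0)   ; 1-based position in text$
      If pos > last
        out$ + Mid(text$, last, pos - last)     ; unchanged text before the match
      EndIf
      out$ + " " + RegularExpressionGroup(0, 1) ; the replacement, i.e. " $1"
      last = pos + RegularExpressionMatchLength(0)
    Wend
  EndIf
  out$ + Mid(text$, last)                       ; remaining tail after the last match
  FreeRegularExpression(0)
  Debug out$ ; this Is Camel Case
Else
  Debug RegularExpressionError()
EndIf
```

Because text$ is never modified while the matches are being walked, the positions reported by the regex functions stay valid for the whole loop.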

Post by BarryG »

You da man! Haha. That works, but it's bedtime now so I'll test more extensively tomorrow. Thanks!

Post by BarryG »

AZJIO, I took a look but can't see how that helps with my question here? It outputs the original string. Granted, I'm not great with regex's so I'm probably doing something wrong. The aim is to make the regex work exactly the same way as the StackOverflow version at the start of this thread, since that's what my users will be providing.

Here's your code and what I tried:

Code:

#RegExp = 0

Procedure.s RegexReplace2(RgEx, *Result.string, Replace0$)
  Protected i, CountGr, Pos, Offset = 1
  Protected Result$, Replace$
  Protected NewList item.s()
  Protected LenT, *Point
  CountGr = CountRegularExpressionGroups(RgEx)
  If CountGr > 9
    CountGr = 9
  EndIf
  If ExamineRegularExpression(RgEx, *Result\s)
    While NextRegularExpressionMatch(RgEx)
      Pos = RegularExpressionMatchPosition(RgEx)
      Replace$ = ReplaceString(Replace0$,"\0", RegularExpressionMatchString(RgEx))
      For i = 1 To CountGr
        Replace$ = ReplaceString(Replace$, "\"+Str(i), RegularExpressionGroup(RgEx, i))
      Next
      If AddElement(item())
        item() = Mid(*Result\s, Offset, Pos - Offset) + Replace$
      EndIf
      Offset = Pos + RegularExpressionMatchLength(RgEx)
    Wend
    If AddElement(item())
      item() = Mid(*Result\s, Offset)
    EndIf
    LenT = 0
    ForEach item()
      LenT + Len(item())
    Next
    *Result\s = Space(LenT)
    *Point = @*Result\s
    ForEach item()
      CopyMemoryString(item(), @*Point)
    Next
    FreeList(item())
  EndIf
EndProcedure


#RegExp = 0
Define Text.string

Text\s = "thisStringIsGood"
CreateRegularExpression(#RegExp , "/([A-Z])/g" )
RegexReplace2(#RegExp, @Text, " \1" )
FreeRegularExpression(#RegExp)
Debug Text\s ; thisStringIsGood
AZJIO
Addict
Posts: 1312
Joined: Sun May 14, 2017 1:48 am

Re: Regex and $1 parameter

Post by AZJIO »

I'm busy right now, but two hints:
1. The problem with groups is being solved here.
2. You need to remove the "/" at the beginning and the "/[gim]+" at the end of the pattern. Don't just discard them, though: use those flags to enable the corresponding modes.

Post by Marc56us »

(just for the fun following my suggestion https://www.purebasic.fr/english/viewto ... 74#p577574)

Quick and dirty code that takes the user input and passes it as-is to SED (so it uses SED's regex syntax).
(SED uses \1 instead of $1.)

Code:

;  Regex And $1 parameter
;  Post by BarryG » Sun Nov 28, 2021 8:47 am 
;  https://www.purebasic.fr/english/viewtopic.php?p=577567#p577567
;  Marc56 - 2021-11-23

EnableExplicit

Enumeration 
    #RegExp
EndEnumeration

Procedure RegexReplaceNew(RegEx$, Text$, Replace$)
    Debug "Regex source   : " + Regex$
    RegEx$ = ReplaceString(RegEx$, "(", "\(")
    RegEx$ = ReplaceString(RegEx$, ")", "\)")
    RegEx$ = RTrim(RegEx$, "g")
    Debug "Regex with esc : " + Regex$ 
    
    Protected Arg$  = "sed 's" + RegEx$ + Replace$ + "/g' Tmp_File.in > Tmp_File.out"
    Debug "SED command line: " + Arg$
    
    Protected Run = RunProgram("wsl", Arg$, GetTemporaryDirectory(), #PB_Program_Wait)
    
    Protected Tmp_File$ = GetTemporaryDirectory() + "Tmp_File.out"
    If FileSize(Tmp_File$) > 0
        ReadFile(1, Tmp_File$)
        Protected New_Line$ = ReadString(1)
        CloseFile(1)
        Debug "---"
        Debug Text$
        Debug New_Line$
        Debug UCase(Left(New_Line$, 1)) + Right(New_Line$, Len(New_Line$) -1)
    Else
        Debug "No file"
    EndIf
EndProcedure

Global Text$ = "thisStringIsGood"
If OpenFile(0, GetTemporaryDirectory() + "Tmp_File.in")
    WriteString(0, "thisStringIsGood")
    CloseFile(0)
    Global RegEx$ = "/([A-Z])/g"
    RegexReplaceNew(RegEx$ ,Text$, " \1")
Else
    Debug "Can't create Temp file"
    End
EndIf

DeleteFile(GetTemporaryDirectory() + "Tmp_File.in")
DeleteFile(GetTemporaryDirectory() + "Tmp_File.out")

End
(Using SED of WSL 1. If you don't have it installed, download SED from Unix Tools for Windows instead)

Code:

Regex source   : /([A-Z])/g
Regex with esc : /\([A-Z]\)/
SED command line: sed 's/\([A-Z]\)/ \1/g' Tmp_File.in > Tmp_File.out
---
thisStringIsGood
this String Is Good
This String Is Good
:mrgreen:

But the simplest solution would obviously be to parse the user input (remove the / delimiters and the trailing flags) and use PB's regular expression functions. If you want to build your own regular expression filter that covers every case, though, I hope you have lots of coffee and time :wink:

Post by AZJIO »

Code:

EnableExplicit

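; Replaces every match of RgEx in *Result\s with Replace0$, expanding "\0" (whole match)
; and "\1".."\9" (capture groups). With Once <> 0, only the first match is replaced.
; The result is written back into *Result\s; nothing is returned.
Procedure.s RegexReplace2(RgEx, *Result.string, Replace0$, Once = 0)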
	Protected i, CountGr, Pos, Offset = 1
	Protected Result$, Replace$
	Protected NewList item.s()
	Protected LenT, *Point
	CountGr = CountRegularExpressionGroups(RgEx)
	If CountGr > 9
		CountGr = 9
	EndIf
	If ExamineRegularExpression(RgEx, *Result\s)
		While NextRegularExpressionMatch(RgEx)
			Pos = RegularExpressionMatchPosition(RgEx)
			Replace$ = ReplaceString(Replace0$,"\0", RegularExpressionMatchString(RgEx))
			For i = 1 To CountGr
				Replace$ = ReplaceString(Replace$, "\"+Str(i), RegularExpressionGroup(RgEx, i))
			Next
			If AddElement(item())
				item() = Mid(*Result\s, Offset, Pos - Offset) + Replace$
			EndIf
			Offset = Pos + RegularExpressionMatchLength(RgEx)
			If Once
				Break
			EndIf
		Wend
		If AddElement(item())
			item() = Mid(*Result\s, Offset)
		EndIf
		LenT = 0
		ForEach item()
			LenT + Len(item())
		Next
		*Result\s = Space(LenT)
		*Point = @*Result\s
		ForEach item()
			CopyMemoryString(item(), @*Point)
		Next
		FreeList(item())
	EndIf
EndProcedure


Define reSource$, reFlag$, User_entered$, re, re2, CreFlags = 0, Once = 0

User_entered$ = "/([A-Z])/g"

re=CreateRegularExpression(#PB_Any,"/(.+?)/([gim]*)")
If re
	If ExamineRegularExpression(re, User_entered$)
		If NextRegularExpressionMatch(re)
			reSource$ = RegularExpressionGroup(re, 1)
			reFlag$ = RegularExpressionGroup(re, 2)
		EndIf
	EndIf
	FreeRegularExpression(re)
EndIf

If Not Asc(reSource$)
	Debug "User, you're wrong. Empty regular expression"
	Debug "The regular expression should be in the following format: /anything/gim"
	End
EndIf

If FindString(reFlag$, "i")
	CreFlags | #PB_RegularExpression_NoCase
EndIf
If FindString(reFlag$, "m")
	CreFlags | #PB_RegularExpression_MultiLine
EndIf
If Not FindString(reFlag$, "g")
	Once = 1
EndIf

Define Text.string

Text\s = "thisStringIsGood"
re2 = CreateRegularExpression(#PB_Any, reSource$, CreFlags)
If re2
	RegexReplace2(re2, @Text, " \1", Once)
	FreeRegularExpression(re2)
	Debug Text\s ; this String Is Good
Else
	Debug "User, you're wrong:"
	Debug RegularExpressionError()
	End
EndIf
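
Replacement strings copied from the web will typically use $1 rather than \1, so a small pre-processing step could translate them before calling RegexReplace2(). The sketch below is only an illustration under stated assumptions: DollarToBackslash() is a hypothetical helper name, it handles $1 to $9 only, and it does not implement other JavaScript replacement patterns such as $& or $$.

```purebasic
EnableExplicit

; Hypothetical helper (not part of PureBasic or of the code above):
; rewrites "$1".."$9" as "\1".."\9" so the template suits RegexReplace2().
Procedure.s DollarToBackslash(Template$)
  Protected i
  For i = 1 To 9
    Template$ = ReplaceString(Template$, "$" + Str(i), "\" + Str(i))
  Next
  ProcedureReturn Template$
EndProcedure

Debug DollarToBackslash(" $1") ; prints a space followed by \1
```

The converted template would then be passed as the third argument, for example RegexReplace2(re2, @Text, DollarToBackslash(" $1"), Once).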