Quantcast
Channel: Xojo Programming Forum - Latest topics
Viewing all articles
Browse latest Browse all 3780

Hang in regex

$
0
0

I have a really weird hang doing a regex. The regex itself is relatively simple:

\b(https?://|www\.)([^<>\s]+)

It is supposed to make links out of urls. I have a really old email (ironically from @Kem_Tekinay ) where the start of the second match goes before the search start position of the first match. The result is an infinite loop because matching never finishes.

The code isn’t really new. Does anyone see what I’m doing wrong?

'make urls to links

Dim theRegex As New RegExMBS
theRegex.CompileOptionCaseLess = True
theRegex.CompileOptionDotAll = True
theRegex.CompileOptionUngreedy = False
theRegex.CompileOptionNewLineAny = True
Dim searchString As String = "\b(https?://|www\.)([^<>\s]+)"

If theRegex.Compile(searchString) Then
  Dim searchStart As Integer
  Dim punctuation As String
  Dim protocol As String
  Dim url As String
  Dim replacement As String
  
  While theRegex.Execute(theText, searchStart) > 0
    ' Get match offsets
    Dim matchStart As Integer = theRegex.OffsetCharacters(0)
    Dim matchEnd As Integer = theRegex.OffsetCharacters(1)
    
    ' Extract submatches using offsets
    Dim protocolStart As Integer = theRegex.OffsetCharacters(2)
    Dim protocolEnd As Integer = theRegex.OffsetCharacters(3)
    protocol = theText.Middle(protocolStart, protocolEnd - protocolStart)
    
    Dim urlStart As Integer = theRegex.OffsetCharacters(4)
    Dim urlEnd As Integer = theRegex.OffsetCharacters(5)
    url = theText.Middle(urlStart, urlEnd - urlStart)
    
    ' Remove punctuation at the end of the URL
    Select Case url.Right(1)
    Case ")"
      url = url.TrimRight(")")
      punctuation = ")"
    Case "."
      url = url.TrimRight(".")
      punctuation = "."
    Case ","
      url = url.TrimRight(",")
      punctuation = ","
    Case "?"
      url = url.TrimRight("?")
      punctuation = "?"
    Else
      punctuation = ""
    End Select
    
    ' Add https and change http to https
    If protocol = "www." Then
      protocol = "https://www."
    ElseIf protocol = "http://" Then
      protocol = "https://"
    End If
    
    ' Create the replacement string
    replacement = "<a href=""" + protocol + url + """>" + protocol + url + "</a>" + punctuation
    
    ' Replace the match in the text
    theText = theText.Left(matchStart) + replacement + theText.Middle(matchEnd)
    
    ' Adjust search start position for the next match
    searchStart = matchStart + replacement.Length
  Wend
End If
beep

Example:
regex hang.xojo_binary_project.zip (7.0 KB)

9 posts - 4 participants

Read full topic


Viewing all articles
Browse latest Browse all 3780

Trending Articles