HTMLDecoder() (again)

infratec
Always Here
Posts: 6871
Joined: Sun Sep 07, 2008 12:45 pm
Location: Germany

Post by infratec

Hi,

HTML decoding was needed here:
viewtopic.php?p=541446#p541446

After I had written it, I found this older version:
viewtopic.php?p=276574#p276574

But my version can also decode hex-coded entities such as &#x41; :wink:

Code:

EnableExplicit


; Decodes named (&auml;), decimal (&#65;) and hexadecimal (&#x41;) entities in HTMLText$
Procedure.s HTMLDecoder(HTMLText$)
 
  Protected Length.i, Result$, Help$, Value$, NumFlag.i, i.i, Char$
 
  Length = Len(HTMLText$)
 
  For i = 1 To Length
    Char$ = Mid(HTMLText$, i, 1)
    If Char$ = "&"
      Value$ = ""
      
      i + 1
      Char$ = Mid(HTMLText$, i, 1)
      If Char$ = "#"
        NumFlag = #True
        If LCase(Mid(HTMLText$, i + 1, 1)) = "x" ; accept both &#x41; and &#X41;
          Value$ + "$"
          i + 1
        EndIf
      Else
        Value$ + Char$
        NumFlag = #False
      EndIf
      
      Help$ = ""
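      ; Stop at ";" or at the end of the text, so an unterminated entity cannot loop forever
      While Help$ <> ";" And i < Length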
        i + 1
        Help$ = Mid(HTMLText$, i, 1)
        If Help$ <> ";"
          Value$ + Help$
        EndIf
      Wend
      
      If Not NumFlag
        Select Value$
          Case "nbsp" : Value$ = "160"
          Case "copy" : Value$ = "169"
          ; German umlauts
          Case "auml" : Value$ = "228"
          Case "Auml" : Value$ = "196"
          Case "ouml" : Value$ = "246"
          Case "Ouml" : Value$ = "214"
          Case "uuml" : Value$ = "252"
          Case "Uuml" : Value$ = "220"
          Case "szlig" : Value$ = "223"
          ; Mathematics
          Case "infin" : Value$ = "$221E"
          Case "radic" : Value$ = "$221A"
          Case "sum" : Value$ = "$2211"
          Case "part" : Value$ = "$2202"
          Case "forall" : Value$ = "$2200"
          Case "exist" : Value$ = "$2203"
          Case "int" : Value$ = "$222B"
          Case "permil" : Value$ = "$2030"
          Case "frac12" : Value$ = "$BD"
          Case "frac14" : Value$ = "$BC"
          Case "frac34" : Value$ = "$BE"
          ; Operators
          Case "lt" : Value$ = "$3C"
          Case "gt" : Value$ = "$3E"
          Case "le" : Value$ = "$2264"
          Case "ge" : Value$ = "$2265"
          Case "ne" : Value$ = "$2260"
          Case "asymp" : Value$ = "$2248"
          Case "equiv" : Value$ = "$2261"
          Case "amp" : Value$ = "$26"
          Case "plusmn" : Value$ = "$B1"
          ; Physics
          Case "lambda" : Value$ = "$3BB"
          Case "omega" : Value$ = "$3C9"
          Case "Omega" : Value$ = "$3A9"
          ; Currencies
          Case "euro" : Value$ = "$20AC"
          Case "dollar" : Value$ = "$24"
          Case "pound" : Value$ = "$A3"
          Case "cent" : Value$ = "$A2"
          Case "yen" : Value$ = "$A5"
          ; Punctuation marks
          Case "iquest" : Value$ = "$BF"
          Case "lsaquo" : Value$ = "$2039"
          Case "rsaquo" : Value$ = "$203A"
          Case "laquo" : Value$ = "$AB"
          Case "raquo" : Value$ = "$BB"
          ; Set Theory
          Case "cup" : Value$ = "$222A"
          Case "cap" : Value$ = "$2229"
          Case "sub" : Value$ = "$2282"
          Case "sup" : Value$ = "$2283"
          Case "isin" : Value$ = "$2208"
          Case "notin" : Value$ = "$2209"
          Case "ni" : Value$ = "$220B"
        EndSelect
      EndIf  
      
      ; Val() reads a leading "$" as a hexadecimal prefix, so decimal and hex values share this line
      Result$ + Chr(Val(Value$))
      
    Else
      Result$ + Char$
    EndIf
  Next i
 
  ProcedureReturn Result$
 
EndProcedure

Debug HTMLDecoder("&Auml;tsch! &#65; &#x41; &permil;")
If you cannot see all characters (a square is shown instead), the font in use does not contain them.
Choose another font.
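
The hex support works because Val() treats a leading "$" as a hexadecimal prefix, which is why the procedure turns "&#x41;" into "$41" before converting it. A minimal sketch of just that step, with literal strings picked purely for illustration:

Code:

Debug Val("65")        ; decimal string, gives 65
Debug Val("$41")       ; "$" marks hexadecimal, also gives 65
Debug Chr(Val("$41"))  ; so both forms end up as "A"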