How to preg_match a pattern having linebreaks?

Discussion in 'App Development' started by deanstreet, Aug 31, 2021.

  1. Can you please help me match a pattern that may contain line breaks?

    On PHPRegexLive, I use the regex pattern {{\s*IF(.+)}}(.+){{\s*ENDIF}} on the search string:

    before if....{{IF !empty('')}} <div class='h6 mt-4 mb-2 edit-btn-container'>About</div> {{ENDIF}} after if....

    The result is fine: array[0] = the entire {{IF <condition>}}...{{ENDIF}} string, array[1] = <condition>, and array[2] = whatever is between {{IF <con>}} and {{ENDIF}}.

    The problem is when the entire {{IF <con>}}...{{ENDIF}} spans more than one line, such as

    before if....{{IF !empty('')}}
    <div class='h6 mt-4 mb-2 edit-btn-container'>About</div>
    {{ENDIF}} after if....

    I tried different combinations of \n*, \n*\r*, etc., and the s and m modifiers, but cannot get it to work.
  2. DaveV


    Look into the preg_match pattern modifiers 's' (single line) and 'm' (multiline). With 's' the dot also matches newline characters; 'm' only changes where ^ and $ match.

    Also, instead of: IF(.+)}}
    I would try: IF(.*?)}}
    The reason is that, with the first form, the greedy .+ can run past the first }} so that the closing }} ends up matching the one in ENDIF}}.
    .*? is a non-greedy match, i.e. it stops at the first }} it finds.
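
    A minimal sketch of the two suggestions above (the 's' modifier plus non-greedy quantifiers), run against the multi-line sample from the question. The variable names and the added brace escaping are illustrative assumptions, not part of the original posts:

    ```php
    <?php
    // Sample input from the question, with the block spread over three lines.
    $text = "before if....{{IF !empty('')}}\n"
          . "<div class='h6 mt-4 mb-2 edit-btn-container'>About</div>\n"
          . "{{ENDIF}} after if....";

    // 's' lets '.' match newlines; '*?' is the non-greedy quantifier.
    // Braces are escaped so they cannot be read as quantifier syntax.
    $pattern = '/\{\{\s*IF(.*?)\}\}(.*?)\{\{\s*ENDIF\}\}/s';

    $condition = null;
    $body = null;
    if (preg_match($pattern, $text, $m) === 1) {
        $condition = trim($m[1]); // text after IF, up to the first }}
        $body      = trim($m[2]); // everything between the IF tag and the ENDIF tag
        echo $condition, "\n";    // !empty('')
        echo $body, "\n";         // <div class='h6 mt-4 mb-2 edit-btn-container'>About</div>
    }
    ```

    Without the trailing 's', the dot stops at the first newline and the same pattern finds no match in this input.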
  3. trudnai


    Btw, try to avoid unbounded quantifiers like * and +. Use {1,n} instead of + and {0,n} instead of *, where n is the maximum number of characters you expect in a positive match. The reason is that an unbounded search can make regex matching very slow on large inputs where a match is not guaranteed.

    Also, instead of the dot you can use a definite set of characters or a stopper. It seems you never want to go past a { and/or } character, so you can write [^}] and [^{] instead of the dot.

    And finally, it is better to escape { and }, as they are normally used for ranges (see above).

    So the final regex might look a bit more obfuscated than yours, but it is much safer and faster to use:
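
    As an illustration only, the three suggestions (bounded quantifiers, negated character classes, escaped braces) could be combined as in the sketch below. This is not trudnai's own pattern, and the limits 200 and 5000 are arbitrary assumptions that would need tuning to the real templates:

    ```php
    <?php
    $text = "before if....{{IF !empty('')}}\n"
          . "<div class='h6 mt-4 mb-2 edit-btn-container'>About</div>\n"
          . "{{ENDIF}} after if....";

    // [^}] and [^{] replace the dot and act as stoppers; both also match
    // newlines, so no 's' modifier is needed here.
    // {1,200} and {0,5000} are assumed upper bounds, not values from the thread.
    $pattern = '/\{\{\s*IF([^}]{1,200})\}\}([^{]{0,5000})\{\{\s*ENDIF\}\}/';

    $matched   = preg_match($pattern, $text, $m);
    $condition = $matched === 1 ? trim($m[1]) : null;
    $body      = $matched === 1 ? trim($m[2]) : null;
    ```

    The trade-off is that a condition containing } or a body containing { will no longer match.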

  4. ph1l


    Since PHP has nested IF statements, a single regular expression that searches for an IF followed by an ENDIF would not be able to match arbitrary nested IFs properly.
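
    A small sketch of the problem, assuming a made-up nested template and the non-greedy pattern discussed earlier in the thread. The outer IF gets paired with the inner ENDIF:

    ```php
    <?php
    // Hypothetical nested input, not taken from the thread.
    $nested  = "{{IF a}}A{{IF b}}B{{ENDIF}}C{{ENDIF}}";
    $pattern = '/\{\{\s*IF(.*?)\}\}(.*?)\{\{\s*ENDIF\}\}/s';

    preg_match($pattern, $nested, $m);

    $whole = $m[0]; // "{{IF a}}A{{IF b}}B{{ENDIF}}" (stops at the inner ENDIF)
    $body  = $m[2]; // "A{{IF b}}B" (the inner IF is left unclosed)
    ```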
  5. You might wanna look into compiler theory with lex and yacc. You have a lexer that identifies individual grammar tokens (in your case IF, ENDIF, "{" ...), and then a parser that uses a syntax specification to turn them into something meaningful (an abstract syntax tree in the compiler/interpreter case). Not sure what you're trying to achieve, but parsing grammars is not trivial, prepare for pain :p
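
    A rough sketch of that lexer-plus-parser idea in plain PHP, without lex or yacc. The function names tokenize and parseNodes and the array-based tree layout are hypothetical, and the sketch assumes a condition never contains a } character:

    ```php
    <?php
    // Lexer step: split the template on {{IF ...}} and {{ENDIF}} tags,
    // keeping the tags themselves as tokens.
    function tokenize(string $src): array
    {
        return preg_split(
            '/(\{\{\s*(?:IF\b[^}]*|ENDIF)\s*\}\})/',
            $src,
            -1,
            PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY
        );
    }

    // Parser step: recursive descent. Each IF opens a nested list of child
    // nodes that is closed by the matching ENDIF.
    function parseNodes(array $tokens, int &$pos, bool $insideIf = false): array
    {
        $nodes = [];
        while ($pos < count($tokens)) {
            $tok = $tokens[$pos++];
            if (preg_match('/^\{\{\s*IF\b([^}]*)\}\}$/', $tok, $m) === 1) {
                $nodes[] = [
                    'type'      => 'if',
                    'condition' => trim($m[1]),
                    'children'  => parseNodes($tokens, $pos, true),
                ];
            } elseif (preg_match('/^\{\{\s*ENDIF\s*\}\}$/', $tok) === 1) {
                if (!$insideIf) {
                    throw new RuntimeException('ENDIF without matching IF');
                }
                return $nodes; // closes the current IF
            } else {
                $nodes[] = ['type' => 'text', 'value' => $tok];
            }
        }
        if ($insideIf) {
            throw new RuntimeException('IF without matching ENDIF');
        }
        return $nodes;
    }

    // Usage with a made-up nested template.
    $pos  = 0;
    $tree = parseNodes(tokenize("x{{IF a}}A{{IF b}}B{{ENDIF}}C{{ENDIF}}y"), $pos);
    ```

    Here $tree holds three top-level nodes (text, if, text), and the inner IF appears as a child of the outer one, which is the nesting a single IF...ENDIF regex cannot track.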