Explaining the epilog of fnmatch.translate, \Z(?ms)
I was debugging a filtering directory walker (on which, more to follow) and I was trying to figure out the mysterious suffix that fnmatch.translate appends to its result, \Z(?ms).
fnmatch.translate takes a Unix-style glob, like *.py or test_*.py[cod], and translates it character-by-character into a regular expression. It then appends \Z(?ms). Hence the latter glob becomes r'test\_.*\.py[cod]\Z(?ms)', using Python’s raw string notation to avoid the backslash plague. Also, the ? wildcard character becomes the . regex special character, while the * wildcard becomes the .* greedy regex.
A StackOverflow answer partially explains, which set me on the right track. (?ms) is equivalent to compiling the regex with re.MULTILINE | re.DOTALL. The re.DOTALL modifier makes the . special character match any character, including newline; normally, . excludes newlines. The re.MULTILINE modifier makes ^ and $ operate on newline boundaries within the search string; otherwise, they anchor to the beginning and end of the string. \A always matches the beginning of the string; \Z always matches the end of the string.
Another way of saying this:
# No multiline, so ^ and $ anchor beginning and end of string >>> re.search(r'^\.git$(?s)', '.git') <_sre.SRE_Match object at 0x10e73a850> >>> re.search(r'^\.git$(?s)', 'bar\n.git\nfoo') # Nope # Multiline => ^ matches after \n and $ before \n >>> re.search(r'^\.git$(?ms)', 'bar\n.git\nfoo') <_sre.SRE_Match object at 0x10e73a988> # \A and \Z always anchor beginning and end of string >>> re.search(r'\A\.git\Z(?ms)', '.git') <_sre.SRE_Match object at 0x10e73a850> >>> re.search(r'\A\.git\Z(?ms)', 'foo\n.git') # Nope
So, \Z(?ms) at the end of the pattern means:
- \Z: the pattern must match all the way to the end of the search string;
- (?m): the search string may contain newlines;
- (?s): the ? and * wildcards may match newlines in the search string.