

I've tried a few approaches:

re.compile(r"^>(\w+)$$(+)^$", re.MULTILINE) # try to capture both parts
re.compile(r"(^+)$", re.MULTILINE|re.DOTALL) # just textlines

and a lot of variations of these, with no luck. The last one seems to match the lines of text one by one, which is not what I really want. I can catch the first part, no problem, but I can't seem to catch the 4-5 lines of uppercase text. I'd like group(1) to be some Varying Text and group(2) to be line1+line2+line3+etc. until the empty line is encountered.

If anyone's curious, it's supposed to be a sequence of amino acids that make up a protein.

>>> rx_blanks=re.compile(r"\W+") # to remove blanks and newlines
>>> for match in rx_sequence.finditer(text):

Sequence: AAABBBBBBCCCCCCDDDDDDDEEEEEEEFFFFFFFFGGGGGGGHHHHHHIIIIIJJJJJJJKKKK
Sequence: LLLLLMMMMMMNNNNNNNOOOOPPPPPPPQQQQQQRRRRRRSSSTTTTTUUUUUVVVVVVWWWWWW

Some explanation about this regular expression might be useful: ^(.+?)\n\n((?:[A-Z]+\n)+)

The first character (^) means "starting at the beginning of a line". Be aware that it does not match the newline itself (same for $: it means "just before a newline", but it does not match the newline itself).

Then (.+?)\n\n means "match as few characters as possible (all characters are allowed) until you reach two newlines". The result (without the newlines) is put in the first group.

[A-Z]+\n means "match as many upper case letters as possible until you reach a newline". This defines what I will call a textline.

((?:textline)+) means "match one or more textlines, but do not put each line in a group". Instead, put all the textlines in one group.

You could add a final \n in the regular expression if you want to enforce a double newline at the end.

You have two separate issues you need to address: (1) the sentence ending directly after string, and (2) the sentence ending sometime after string but with no end-of-sentence punctuation. To do this, you need to make the match after string optional, but anchor that match to the end of the string.
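To try out the regular expression discussed in this thread, here is a minimal sketch. It assumes the textline class is `[A-Z]`, as the "upper case letters" wording suggests. The sample titles, the line breaks inside each sequence, and the `parse_records` helper are hypothetical and chosen only for illustration. The second pattern, `rx_strict`, is the variant with a final `\n` that enforces a blank line after each block.

```python
import re

# Title line, blank line, then one or more upper-case lines in a single group.
rx_sequence = re.compile(r"^(.+?)\n\n((?:[A-Z]+\n)+)", re.MULTILINE)
# Same pattern with a final \n: the block must be followed by a blank line.
rx_strict = re.compile(r"^(.+?)\n\n((?:[A-Z]+\n)+)\n", re.MULTILINE)
rx_blanks = re.compile(r"\W+")  # to remove blanks and newlines

# Hypothetical sample input: two records; the last one has no trailing blank line.
text = (
    "Some varying text1\n"
    "\n"
    "AAABBBBBBCCCCCCDDDDDDD\n"
    "EEEEEEEFFFFFFFFGGGGGGG\n"
    "HHHHHHIIIIIJJJJJJJKKKK\n"
    "\n"
    "Some varying text 2\n"
    "\n"
    "LLLLLMMMMMMNNNNNNNOOOO\n"
    "PPPPPPPQQQQQQRRRRRRSSS\n"
    "TTTTTUUUUUVVVVVVWWWWWW\n"
)


def parse_records(source, pattern=rx_sequence):
    """Return (title, sequence) pairs; newlines are stripped from each sequence."""
    records = []
    for match in pattern.finditer(source):
        title, sequence = match.groups()
        records.append((title.strip(), rx_blanks.sub("", sequence)))
    return records


records = parse_records(text)
strict_records = parse_records(text, rx_strict)

for title, sequence in records:
    print("Title:", title)
    print("Sequence:", sequence)
```

With this input, `records` holds both blocks, while `strict_records` holds only the first one, because the second block is not followed by an empty line.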

