I have a set of LaTeX files. I would like to extract the "abstract" section for each one:
\begin{abstract}
.....
\end{abstract}
I have tried the suggestion from "How to Parse LaTex file" and tried:
A = re.findall(r'\\begin{abstract}(.*?)\\end{abstract}', data)
where data contains the text from the LaTeX file. But A is just an empty list. Any help would be greatly appreciated!
.* does not match newlines unless the re.S flag is given:
re.findall(r'\\begin{abstract}(.*?)\\end{abstract}', data, re.S)
Consider this test file:
\documentclass{report}
\usepackage[margin=1in]{geometry}
\usepackage{longtable}
\begin{document}
Title maybe
\begin{abstract}
Good stuff
\end{abstract}
Other stuff
\end{document}
This gets the abstract:
>>> import re
>>> data = open('a.tex').read()
>>> re.findall(r'\\begin{abstract}(.*?)\\end{abstract}', data, re.S)
['\nGood stuff\n']
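Since the question is about a set of files, the same pattern can be wrapped in a small helper and applied to each file in turn. This is only a minimal sketch: the names extract_abstract and abstracts_in_dir are made up for illustration, and it assumes one abstract per file, UTF-8 encoded .tex files in a single directory, and no commented-out abstract environments.

```python
import re
from pathlib import Path

# Compile once; re.S (re.DOTALL) lets '.' match newlines too.
ABSTRACT_RE = re.compile(r'\\begin\{abstract\}(.*?)\\end\{abstract\}', re.S)


def extract_abstract(text):
    """Return the first abstract in text with surrounding whitespace
    stripped, or None if there is no abstract environment."""
    match = ABSTRACT_RE.search(text)
    return match.group(1).strip() if match else None


def abstracts_in_dir(directory):
    """Map each .tex file name in directory to its abstract (or None).

    Assumption: the files are UTF-8 encoded; adjust encoding if not.
    """
    return {
        path.name: extract_abstract(path.read_text(encoding='utf-8'))
        for path in sorted(Path(directory).glob('*.tex'))
    }
```

Calling extract_abstract on the test file's contents would give 'Good stuff' rather than '\nGood stuff\n', because of the strip() call.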
From the re module's webpage:
re.S
re.DOTALL
Make the '.' special character match any character at all, including a newline; without this flag, '.' will match anything except a newline.
The . does not match the newline character. However, you can pass a flag to make it match newlines as well.
Example:
import re
s = r"""\begin{abstract}
this is a test of the
linebreak capture.
\end{abstract}"""
pattern = r'\\begin\{abstract\}(.*?)\\end\{abstract\}'
re.findall(pattern, s, re.DOTALL)
#output:
['\nthis is a test of the\nlinebreak capture.\n']
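To see the effect of the flag side by side, the following sketch runs the same pattern with and without it. It also shows the inline form (?s), which is the in-pattern equivalent of re.DOTALL and can be handy when the flags argument is not available. The variable names are illustrative only.

```python
import re

s = r"""\begin{abstract}
this is a test of the
linebreak capture.
\end{abstract}"""

pattern = r'\\begin\{abstract\}(.*?)\\end\{abstract\}'

# Without the flag, '.' cannot cross the newline after \begin{abstract}.
without_flag = re.findall(pattern, s)

# With re.DOTALL (alias re.S), '.' also matches '\n'.
with_flag = re.findall(pattern, s, re.DOTALL)

# The inline flag (?s) at the start of the pattern has the same effect.
inline_flag = re.findall('(?s)' + pattern, s)

print(without_flag)              # []
print(with_flag == inline_flag)  # True
```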