Deleting a specific entry from a BibTeX file based on its cite key using Python

How can I delete a specific entry from a BibTeX file based on its cite key using Python? I basically want a function that takes two arguments (the path to the BibTeX file and the cite key) and deletes the entry that corresponds to the key from the file. I played around with regular expressions but wasn't successful. I also looked briefly at BibTeX parsers, but that seems like overkill. In the skeleton function below, the decisive part is the assignment to content_modified.

def deleteEntry(path, key):
    # get the content of the BibTeX file
    with open(path, 'r') as f:
        content = f.read()
    # delete the entry for `key` from the content string
    content_modified = content  # placeholder: this is the decisive part

    # rewrite the file
    with open(path, 'w') as f:
        f.write(content_modified)

Here is an example BibTeX file (with spaces in the abstract):

@article{dai2008thebigfishlittlepond,
    title = {The {Big-Fish-Little-Pond} Effect: What Do We Know and Where Do We Go from Here?},
    volume = {20},
    shorttitle = {The {Big-Fish-Little-Pond} Effect},
    url = {http://dx.doi.org/10.1007/s10648-008-9071-x},
    doi = {10.1007/s10648-008-9071-x},
    abstract = {The big-fish-little-pond effect {(BFLPE)} refers to the theoretical prediction that equally able students will have lower academic
self-concepts in higher-achieving or selective schools or programs than in lower-achieving or less selective schools or programs,
largely due to social comparison based on local norms. While negative consequences of being in a more competitive educational
setting are highlighted by the {BFLPE}, the exact nature of the {BFLPE} has not been closely scrutinized. This article provides
a critique of the {BFLPE} in terms of its conceptualization, methodology, and practical implications. Our main argument is that
of the {BFLPE.}},
    number = {3},
    journal = {Educational Psychology Review},
    author = {Dai, David Yun and Rinn, Anne N.},
    year = {2008},
    keywords = {education, composition by performance, education, peer effect, education, school context, education, social comparison/big-fish{\textendash}little-pond effect},
    pages = {283--317},
    file = {Dai_Rinn_2008_The Big-Fish-Little-Pond Effect.pdf:/Users/jpl2136/Documents/Literatur/Dai_Rinn_2008_The Big-Fish-Little-Pond Effect.pdf:application/pdf}
}

@book{coleman1966equality,
    title = {Equality of Educational Opportunity},
    shorttitle = {Equality of educational opportunity},
    publisher = {{U.S.} Dept. of Health, Education, and Welfare, Office of Education},
    author = {Coleman, James},
    year = {1966},
    keywords = {\_task\_obtain, education, school context, soz. Ungleichheit, education}
}

EDIT: Here is a solution I came up with. Instead of matching the whole BibTeX entry, it looks for the start of every entry (e.g. @article{dai2008thebigfishlittlepond,) and then removes the matching entry by slicing the content string from its start to the start of the next entry.

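import re
content_keys = [(m.group(1), m.start(0)) for m in re.finditer(r"@\w{1,20}\{([^,\s]+),", content)]
idx = [k for k, _ in content_keys].index(key)
# the entry ends where the next one starts, or at the end of the file if it is the last entry
end = content_keys[idx + 1][1] if idx + 1 < len(content_keys) else len(content)
content_modified = content[:content_keys[idx][1]] + content[end:]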
user2503795 asked Nov 12 '22 at 17:11



1 Answer

As Beni Cherniavsky-Paskin mentioned in the comments, you will have to rely on your BibTeX entries starting and ending at the very beginning of a line (without any leading tabs or spaces). Then you can do this:

import re

pattern = re.compile(r"^@\w+\{" + re.escape(key) + r",.*?^\}", re.S | re.M)
content_modified = pattern.sub("", content)
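Plugged into the skeleton from the question, the regex gives a complete function. This is a sketch under the same layout assumption (each entry starts with "@" and ends with "}" in column 0). The snake_case name delete_entry, the explicit UTF-8 encoding, and the use of re.escape (so that characters such as . or + in a cite key are matched literally) are additions for illustration, not part of the original answer.

```python
import re


def delete_entry(path: str, key: str) -> None:
    """Delete the BibTeX entry with cite key `key` from the file at `path`.

    Assumes every entry starts with "@" and ends with "}" at the start of a line.
    """
    with open(path, encoding="utf-8") as f:
        content = f.read()

    # "^@type{key," followed by anything (re.S lets "." cross newlines),
    # up to the first "}" that begins a line (re.M makes "^" match per line).
    pattern = re.compile(r"^@\w+\{" + re.escape(key) + r",.*?^\}", re.S | re.M)
    content_modified = pattern.sub("", content)

    with open(path, "w", encoding="utf-8") as f:
        f.write(content_modified)
```

The non-greedy .*? matters: with a greedy .* the match would run to the last "}" in column 0 and swallow every entry after the one being deleted.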

Note the two flags: re.S (DOTALL) makes . match line breaks as well, and re.M (MULTILINE) makes ^ match at the start of every line, not just at the start of the string.
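To see what each flag contributes on its own, here is a small check on a made-up two-line string (the variable names are only for illustration):

```python
import re

text = "@book{a,\n}"  # the "}" sits at the start of the second line (index 9)

# Without re.M, ^ only matches at the very start of the string,
# so the "}" at the beginning of the second line is not found.
brace_default = re.search(r"^\}", text)
# With re.M, ^ also matches right after every newline.
brace_multiline = re.search(r"^\}", text, re.M)

# Without re.S, "." does not match "\n", so the pattern cannot span lines.
span_default = re.search(r"a,.*\}", text)
# With re.S, "." matches "\n" too, so the match runs across the line break.
span_dotall = re.search(r"a,.*\}", text, re.S)
```

Only the combination of both flags lets the answer's pattern start at a line-initial "@" and stop at a line-initial "}" several lines later.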

If you cannot rely on this layout, then BibTeX is simply not a regular language, since it allows nested {} pairs, which have to be counted to get correct results. Some regex flavors can still handle this (using recursion or balancing groups), but Python's built-in re module supports neither. Hence, you would actually have to use a BibTeX parser (which would probably also make your code a lot more understandable).
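Short of a full parser, the brace counting mentioned above can be done by hand. The following is a minimal sketch with a hypothetical helper, delete_entry_balanced: it locates the @type{key, header with a regex (indentation allowed) and then scans forward, tracking brace depth, until the entry's opening brace is closed. It assumes the @type{...} form (not the rarer @type(...) form) and that every brace inside the entry is balanced; it does not understand any BibTeX escaping rules.

```python
import re


def delete_entry_balanced(content: str, key: str) -> str:
    """Remove the entry for `key` by counting braces (hypothetical helper).

    Assumes the @type{key, ...} form with balanced braces inside the entry.
    Returns `content` unchanged if no header for `key` is found.
    """
    # Header may be indented; allow optional whitespace around the key.
    header = re.compile(r"@\w+\s*\{\s*" + re.escape(key) + r"\s*,")
    m = header.search(content)
    if m is None:
        return content

    # Position of the "{" that opens the entry.
    open_brace = content.index("{", m.start())
    depth = 0
    for i in range(open_brace, len(content)):
        if content[i] == "{":
            depth += 1
        elif content[i] == "}":
            depth -= 1
            if depth == 0:
                # Cut from the "@" through the matching closing brace.
                return content[: m.start()] + content[i + 1 :]
    raise ValueError(f"unbalanced braces in entry {key!r}")
```

Unlike the column-0 regex, this also finds the end of an indented entry such as "  @book{a, ... }", because it ignores layout and relies only on brace nesting (e.g. the inner braces in {The {Big} Fish}).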

Martin Ender answered Nov 15 '22 at 06:11
