I'm trying to use Python to parse lines of C++ source code. The only thing I am interested in is include directives.
#include "header.hpp"
I want it to be flexible and still work with poor coding styles like:
# include"header.hpp"
I have gotten to the point where I can read lines and trim whitespace before and after the #. However, I still need to find out which directive it is by reading the string until a non-alpha character is encountered, regardless of whether it is a space, quote, tab or angle bracket.
So basically my question is: How can I split a string that starts with alphabetic characters at the first non-alpha character?
I think I might be able to do this with regex, but I have not found anything in the documentation that looks like what I want.
Also, if anyone has advice on how to get the file name inside the quotes or angle brackets, that would be a plus.
Use the re.split() method to split a string on all special characters. The re.split() method takes a pattern and a string and splits the string on each occurrence of the pattern.
Use the isalnum() method to remove all non-alphanumeric characters from a Python string. isalnum() checks whether a given character or string is alphanumeric. You can test each character of the string individually and combine the alphanumeric ones with join().
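As a rough sketch of the isalnum() approach (keep_alnum is just a name made up for this example), note that on an include line it throws away the quotes and the dot, so it is better suited to cleaning text than to parsing a directive:

```python
def keep_alnum(text):
    # Keep only the characters for which str.isalnum() is true.
    return "".join(ch for ch in text if ch.isalnum())


print(keep_alnum('# include"header.hpp"'))  # includeheaderhpp
```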
A common way to remove all non-alphanumeric characters from a string is with regular expressions. The idea is to replace every match of [^A-Za-z0-9] with an empty string, which leaves only the alphanumeric characters. You can also use [^\w], which is equivalent to [^a-zA-Z0-9_] in ASCII mode (re.ASCII); for Python 3 str patterns, \w otherwise also matches non-ASCII letters and digits.
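A small sketch of the substitution approach, using re.sub() from the standard library. The sample line is made up for illustration; the second call passes re.ASCII so that \w is limited to ASCII letters, digits and the underscore:

```python
import re

line = '# include"my_header.hpp"'

# Delete everything that is not an ASCII letter or digit.
stripped = re.sub(r"[^A-Za-z0-9]", "", line)

# [^\w] in ASCII mode also keeps the underscore.
stripped_w = re.sub(r"[^\w]", "", line, flags=re.ASCII)

print(stripped)    # includemyheaderhpp
print(stripped_w)  # includemy_headerhpp
```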
Python's split() method splits a string into chunks. It accepts an optional separator argument (plus an optional maxsplit), and the separator can be any string. If no separator is given, the string is split on runs of whitespace.
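To see why plain split() is not enough for the sloppy style in the question, here is a quick comparison. Both sample lines are made up for illustration:

```python
tidy = '#  include   "header.hpp"'
messy = '# include"header.hpp"'

# With no separator, split() breaks on runs of whitespace.
tidy_parts = tidy.split()
messy_parts = messy.split()

print(tidy_parts)   # ['#', 'include', '"header.hpp"']
print(messy_parts)  # ['#', 'include"header.hpp"']
```

In the second case the directive and the file name stay glued together, because there is no whitespace between them.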
Your instinct on using regex is correct.
import re
re.split('[^a-zA-Z]', string_to_split)
The [^a-zA-Z] part means "not alphabetic characters".
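Here is roughly what that looks like on a line from the question, assuming the leading # and surrounding whitespace have already been stripped as the question describes. Keep in mind that re.split() consumes the delimiter, so you lose the information about whether it was a quote or an angle bracket. If you need that, one option is to match the leading letters with re.match() and slice the rest yourself:

```python
import re

line = 'include"header.hpp"'  # assumed: '#' and whitespace already removed

# Splitting on every non-letter also breaks up the file name.
parts = re.split(r"[^a-zA-Z]", line)
print(parts)  # ['include', 'header', 'hpp', '']

# maxsplit=1 stops after the first non-letter, which is dropped.
directive, rest = re.split(r"[^a-zA-Z]", line, maxsplit=1)
print(directive, rest)  # include header.hpp"

# Alternative: match the leading letters and keep the delimiter in the tail.
m = re.match(r"[a-zA-Z]+", line)
name, tail = m.group(), line[m.end():]
print(name, tail)  # include "header.hpp"
```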
You can do that with a regex. However, you can also use a simple while loop.
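def splitnonalpha(s):
    # Start at index 1 so the leading '#' is kept with the directive.
    pos = 1
    # Advance while the characters are alphabetic.
    while pos < len(s) and s[pos].isalpha():
        pos += 1
    # Return the directive and everything after it.
    return (s[:pos], s[pos:])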
Test:
>>> splitnonalpha('#include"blah.hpp"')
('#include', '"blah.hpp"')
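For the bonus part of the question (getting the file name), one possible sketch is a single regex that accepts optional whitespace around the # and captures whatever sits between the quotes or the angle brackets. INCLUDE_RE and parse_include are names invented for this example, and the pattern is an assumption about what counts as an include line: it does not handle includes written through macros, commented-out lines, or line continuations.

```python
import re

# Optional whitespace, '#', optional whitespace, 'include', optional whitespace,
# then either "name" (group 1) or <name> (group 2).
INCLUDE_RE = re.compile(r'\s*#\s*include\s*(?:"([^"]+)"|<([^>]+)>)')


def parse_include(line):
    """Return (kind, filename) for an include line, or None otherwise."""
    m = INCLUDE_RE.match(line)
    if m is None:
        return None
    quoted, angled = m.groups()
    if quoted is not None:
        return ("quote", quoted)
    return ("angle", angled)


print(parse_include('# include"header.hpp"'))  # ('quote', 'header.hpp')
print(parse_include('#include <vector>'))      # ('angle', 'vector')
print(parse_include('int x = 0;'))             # None
```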