 

Python - How to split a string by non alpha characters

I'm trying to use Python to parse lines of C++ source code. The only things I'm interested in are the include directives.

    #include "header.hpp"

I want it to be flexible and still work with poor coding styles like:

          #   include"header.hpp"  

I have gotten to the point where I can read lines and trim whitespace before and after the #. However, I still need to find out which directive it is by reading the string until a non-alpha character is encountered, whether that is a space, quote, tab, or angle bracket.

So basically my question is: how can I split a string so that the first part is the leading run of letters, ending at the first non-letter character?

I think I might be able to do this with regex, but I have not found anything in the documentation that looks like what I want.

Also, if anyone has advice on how to get the file name inside the quotes or angle brackets, that would be a plus.

asked Feb 05 '16 18:02 by nickeb96




2 Answers

Your instinct on using regex is correct.

    import re
    string_to_split = 'include"header.hpp"'
    re.split(r'[^a-zA-Z]', string_to_split)  # ['include', 'header', 'hpp', '']

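The [^a-zA-Z] pattern matches any single character that is not an ASCII letter, so the string is split at every such character. Consecutive separators, or a separator at either end of the string, produce empty strings in the result.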
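If only the first run of letters matters (the directive name), re.match can anchor a pattern at the start of the line instead of splitting at every non-letter. A minimal sketch, assuming each source line is passed in as one string; split_directive and _DIRECTIVE are hypothetical names, not part of either answer:

```python
import re

# Hypothetical pattern: optional whitespace, '#', optional whitespace,
# then the directive name (leading run of letters) and everything after it.
_DIRECTIVE = re.compile(r'\s*#\s*([A-Za-z]*)(.*)')


def split_directive(line):
    """Return (directive_name, rest) for a preprocessor line, else None."""
    m = _DIRECTIVE.match(line)  # match() only tries the start of the string
    if m is None:
        return None  # the line does not start with '#' (after whitespace)
    return m.group(1), m.group(2)


print(split_directive('#include "header.hpp"'))
print(split_directive('          #   include"header.hpp"  '))
```

Because the pattern itself allows whitespace around the #, the separate trimming step described in the question is not needed with this approach.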

answered Oct 03 '22 19:10 by nlloyd


You can do that with a regex. However, you can also use a simple while loop.

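    def splitnonalpha(s):
        # Start at index 1: the first character (e.g. a leading '#') is
        # always kept in the first part, then advance while we see letters.
        pos = 1
        while pos < len(s) and s[pos].isalpha():
            pos += 1
        # Leading run, and everything from the first non-letter onward.
        return (s[:pos], s[pos:])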

Test:

    >>> splitnonalpha('#include"blah.hpp"')
    ('#include', '"blah.hpp"')
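For the follow-up question about the file name: the second part of the split (for example '"blah.hpp"') can be matched against a pattern that accepts either delimiter. A sketch, assuming the name contains no embedded '"' or '>' character; include_target and _TARGET are hypothetical names:

```python
import re

# Hypothetical pattern: optional whitespace, then either "name" or <name>.
_TARGET = re.compile(r'\s*(?:"([^"]*)"|<([^>]*)>)')


def include_target(rest):
    """Return (file_name, uses_angle_brackets), or None if no name found."""
    m = _TARGET.match(rest)
    if m is None:
        return None  # e.g. a macro name instead of a quoted/bracketed file
    if m.group(1) is not None:
        return m.group(1), False  # "header.hpp" form
    return m.group(2), True  # <header> form


print(include_target('"blah.hpp"'))
print(include_target('  <vector>'))
```

Returning which delimiter was used keeps the information needed if the two forms should be resolved against different include paths.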
answered Oct 03 '22 20:10 by kfx