I'm writing an application in Python that is going to have a lot of different functions, so logically I thought it would be best to split up my script into different modules. Currently my script reads in a text file that contains code which has been converted into tokens and spellings. The script then reconstructs the code into a string, with blank lines where comments would have been in the original code.
I'm having a problem making the script object-oriented though. Whatever I try, I can't seem to get the program to run the same way it does as a single script file. Ideally I'd like to have two script files: one that contains a class and a function that cleans and reconstructs the file, and a second that simply calls that function on a file given as a command-line argument. This is my current script:
    import sys

    tokenList = open(sys.argv[1], 'r')
    cleanedInput = ''
    prevLine = 0

    for line in tokenList:

        if line.startswith('LINE:'):
            lineNo = int(line.split(':', 1)[1].strip())
            diff = lineNo - prevLine - 1

            if diff == 0:
                cleanedInput += '\n'
            if diff == 1:
                cleanedInput += '\n\n'
            else:
                cleanedInput += '\n' * diff

            prevLine = lineNo
            continue

        cleanedLine = line.split(':', 1)[1].strip()
        cleanedInput += cleanedLine + ' '

    print cleanedInput
After following Alex Martelli's advice below, I now have the following code, which gives me the same output as my original code.
    def main():
        tokenList = open(sys.argv[1], 'r')
        cleanedInput = []
        prevLine = 0

        for line in tokenList:

            if line.startswith('LINE:'):
                lineNo = int(line.split(':', 1)[1].strip())
                diff = lineNo - prevLine - 1

                if diff == 0:
                    cleanedInput.append('\n')
                if diff == 1:
                    cleanedInput.append('\n\n')
                else:
                    cleanedInput.append('\n' * diff)

                prevLine = lineNo
                continue

            cleanedLine = line.split(':', 1)[1].strip()
            cleanedInput.append(cleanedLine + ' ')

        print cleanedInput

    if __name__ == '__main__':
        main()
I would still like to split my code into multiple modules though. A 'cleaned file' in my program will have other functions performed on it so naturally a cleaned file should be a class in its own right?
Python allows you to use both functional and object-oriented programming paradigms.
Object-oriented programming in Python is a way of programming that focuses on using objects and classes to design and build applications. The major pillars of object-oriented programming (OOP) are inheritance, polymorphism, data abstraction, and encapsulation.
Python supports all the concepts of object-oriented programming, but it does not require them: code can also be written without creating any classes.
To speed up your existing code measurably, add def main(): before the assignment to tokenList, indent everything after that 4 spaces, and at the end put the usual idiom

    if __name__ == '__main__':
        main()
(The guard is not actually necessary, but it's a good habit to have nevertheless since, for scripts with reusable functions, it makes them importable from other modules).
This has little to do with "object oriented" anything: it's simply faster, in Python, to keep all your substantial code in functions, not as top-level module code.
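One reason for this, at least in CPython, is that names assigned inside a function are compiled to indexed local-variable slots, while names at module level go through dictionary-based name lookups. The sketch below is only an illustration: it is written for Python 3 (dis.get_instructions does not exist in the Python 2 used in this thread), and the names LOOP_SOURCE, loop_in_function and opnames are made up for the example.

```python
import dis

# The same loop, once as module-level source and once inside a function.
LOOP_SOURCE = "total = 0\nfor i in range(1000):\n    total += i\n"


def loop_in_function():
    total = 0
    for i in range(1000):
        total += i
    return total


def opnames(code):
    """Return the set of bytecode operation names used by a code object."""
    return {ins.opname for ins in dis.get_instructions(code)}


# Compile the loop as top-level module code and compare it with the function.
module_ops = opnames(compile(LOOP_SOURCE, "<module-level>", "exec"))
function_ops = opnames(loop_in_function.__code__)

# Module-level variables use *_NAME operations (dictionary lookups);
# function locals use *_FAST operations (indexed slots).
print(sorted(op for op in module_ops if "NAME" in op))
print(sorted(op for op in function_ops if "FAST" in op))
```

The exact operation names vary between CPython versions, so the sketch only filters on the NAME and FAST substrings rather than relying on a specific opcode.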
Second speedup: change cleanedInput into a list, i.e., its first assignment should be = [], and wherever you now have +=, use .append instead. At the end, use ''.join(cleanedInput) to get the final resulting string. This makes your code take linear time as a function of input size (O(N) is the normal way of expressing this), while it currently takes quadratic time (O(N squared)).
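As a minimal sketch of that change, the two hypothetical helpers below (build_with_concat and build_with_join are illustrative names, not part of the original script) build the same string both ways. The second one collects the pieces in a list and joins them once at the end.

```python
def build_with_concat(pieces):
    """Repeated += on a string: each step may copy everything built so far."""
    result = ''
    for piece in pieces:
        result += piece
    return result


def build_with_join(pieces):
    """Append to a list, then join once: a single pass over all the pieces."""
    parts = []
    for piece in pieces:
        parts.append(piece)
    return ''.join(parts)


sample = ['\n', 'int ', 'x ', '\n\n', 'y ']
print(build_with_concat(sample) == build_with_join(sample))
```

Both functions return the same string; only the way the intermediate result is accumulated differs.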
Then, correctness: the two statements right after continue never execute. Do you need them or not? Remove them (and the continue) if not needed, remove the continue if those two statements are actually needed. And the tests starting with if diff will fail dramatically unless the previous if was executed, because diff would be undefined then. Does your code as posted perhaps have indentation errors, i.e., is the indentation of what you posted different from that of your actual code?
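To illustrate the undefined-name problem, here is a small sketch that assumes the if diff test sits one indentation level too shallow. The function blank_lines and its prev_line parameter are hypothetical and exist only for this example; the sketch runs on Python 3.

```python
def blank_lines(line, prev_line=0):
    """Deliberately mis-indented: diff is assigned only for 'LINE:' input."""
    if line.startswith('LINE:'):
        diff = int(line.split(':', 1)[1]) - prev_line - 1
    # This test runs even when the branch above was skipped.
    if diff == 0:
        return '\n'
    return '\n' * diff


# A 'LINE:' entry works, because diff was assigned first.
print(repr(blank_lines('LINE:3')))

# Any other entry fails: diff was never assigned on this path.
try:
    blank_lines('ID:foo')
except NameError as exc:  # UnboundLocalError is a subclass of NameError
    print('failed:', type(exc).__name__)
```

Indenting the if diff tests so that they sit inside the LINE: branch removes the failure, which is the structure the refactored version in this answer uses.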
Considering these important needed enhancements, and the fact that it's hard to see what advantage you are pursuing in making this tiny code OO (and/or modular), I suggest clarifying the indenting / correctness situation, applying the enhancements I've proposed, and leaving it at that;-).
Edit: as the OP has now applied most of my suggestions, let me follow up with one reasonable way to hive off most functionality to a class in a separate module. In a new file, for example foobar.py, in the same directory as the original script (or in site-packages, or elsewhere on sys.path), place this code:
    def token_of(line):
        return line.partition(':')[-1].strip()

    class FileParser(object):
        def __init__(self, filename):
            self.tokenList = open(filename, 'r')

        def cleaned_input(self):
            cleanedInput = []
            prevLine = 0

            for line in self.tokenList:

                if line.startswith('LINE:'):
                    lineNo = int(token_of(line))
                    diff = lineNo - prevLine - 1
                    cleanedInput.append('\n' * (diff if diff>1 else diff+1))
                    prevLine = lineNo
                else:
                    cleanedLine = token_of(line)
                    cleanedInput.append(cleanedLine + ' ')

            return cleanedInput
Your main script then becomes just:
    import sys
    import foobar

    def main():
        thefile = foobar.FileParser(sys.argv[1])
        print thefile.cleaned_input()

    if __name__ == '__main__':
        main()
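For trying the same idea without a token file on disk, here is a self-contained sketch written for Python 3. It departs from the code above in three ways, all of them assumptions: the class takes any iterable of lines instead of a filename, it joins the pieces into one string before returning, and the name CleanedFile and the sample token lines are invented for the illustration.

```python
def token_of(line):
    """Return the text after the first ':' with surrounding whitespace removed."""
    return line.partition(':')[-1].strip()


class CleanedFile(object):
    """Hypothetical variant of the parser that works on any iterable of lines."""

    def __init__(self, lines):
        self.lines = lines

    def cleaned_input(self):
        pieces = []
        prev_line = 0
        for line in self.lines:
            if line.startswith('LINE:'):
                line_no = int(token_of(line))
                diff = line_no - prev_line - 1
                # One newline per skipped line, with the same special cases
                # for diff == 0 and diff == 1 as the original script.
                pieces.append('\n' * (diff if diff > 1 else diff + 1))
                prev_line = line_no
            else:
                pieces.append(token_of(line) + ' ')
        # Join once at the end to get a single string.
        return ''.join(pieces)


# Made-up input in the 'KIND:spelling' shape the script appears to expect.
sample_lines = ['LINE:1\n', 'KEYWORD:int\n', 'ID:x\n', 'LINE:3\n', 'ID:y\n']
print(repr(CleanedFile(sample_lines).cleaned_input()))
```

An open file object is itself an iterable of lines, so passing open(sys.argv[1]) to this variant would behave like the filename-based class for the main script's purposes.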
When I do this particular refactoring, I usually start with an initial transformation within the first file. Step 1: move the functionality into a method in a new class. Step 2: add the magic invocation below to get the file to run like a script again:
    import sys

    class LineCleaner:
        # Instance methods need self as their first parameter.
        def cleanFile(self, filename):
            cleanInput = ""
            prevLine = 0
            for line in open(filename, 'r'):
                <... as in original script ..>

    if __name__ == '__main__':
        cleaner = LineCleaner()
        cleaner.cleanFile(sys.argv[1])