I'm trying to figured out how to manage process with Python, although maybe C++ could be better for this. I'm using Python 2.7 and Ubuntu 14.04 is my OS.
Resume of what I'm trying to achieve:
My intention is to create a script to manage other softwares, something similar to what Selenium do with the browsers, but with any program. Maybe execute the process with Python using subprocess would give me the option to manage the process UI
Send Actions/Interact with Running Process
Right now I'm making this script in Linux using psutil
. I know there are some Windows Libraries like pywin
or pywindll
.
I want to manage a process, for example any kind of software with UI(Skype, Gedit, Firefox..), I would like to know if it's possible to send an action to make a click in a button.
I don't want to manage the mouse in the computer, because let's say this window is "hidden" under other windows/stuff:
I'm using psutil
to get the process, and I have a lot of options like:
But none of this actions seems to be what I'm looking for, that is to interact with the process UI...
Is even possible what I'm trying to achieve ?
Would be the simplest solution for this to send key strokes and mouse clicks ?
Read Memory Address Value
I've been using scanmem
in Linux to find the memmory address of some variable, and once I've found the memory address I'm looking for, I want to use that address in Python to get the value stored in that address.
The closest thing I've found was using ctypes
, something like:
from ctypes import string_at
from sys import getsizeof
mem_address = 0x7c3f
value = string_at(id(mem_address), getsizeof(mem_address))
I was thinking that a program when is executed has to send the UI of the program to the OS, could be possible to "capture" the interface with python, and redirect to the OS ?
Something like executing the software via Python so may be possible to manage directly the UI
As we said before, the run function is the recommended way to run an external process and should cover the majority of cases.
How to Start a Process in Python? To start a new process, or in other words, a new subprocess in Python, you need to use the Popen function call. It is possible to pass two parameters in the function call. The first parameter is the program you want to start, and the second is the file argument.
I like the way you think :D UI automation is awesome
On the question itself, as far as I can tell, all the software that can interact with the GUI of processes is based on computer vision with OCR or reading the memory to get the object model of the UI. The latter is probably not universal, since different widget toolkits and approaches to building the UI will have different underlying models - it'll probably be more of a pain in the ass than CV+OCR.
If you want to see some stuff that has already been made for this purpose check out the wikpedia list. You already know about Selenium but there's more - AutoIt and Sikuli where ones I checked for a similar project I want to make, in python. (AutoIt is BASIC-like -YUCK- and windows-only but Sikuli seems to be python related and is cross-platform - I checked them out ages ago so I don't remember details).
The really good news is that python has pretty good CV and OCR modules. My personal recommendation is simplecv which can wrap around opencv and other cv software and although I don't have a module of choice to recommend for OCR, I liked python-tesseract the most when I was scouting for modules.
The approach is generally to take a snapshot of the GUI (graphicsmagick can do that plenty well and there's a python wrapper for it), make out where the elements are with CV, read the labels with OCR and that way you get a model for what is on the window. Then, you give your script instructions on what to do and when, based on where it is on the GUI. Since python can send mouse and keyboard events, you're golden. You can even use the minidom
module to make an easier object model for your code.
As an aside, the CV+OCR approach is also used by a Hearthstone-related app that takes snapshots of the game and reads the score, which it then tracks for the player so they can make up metrics. It's a more lightweight and easy approach than it seems - I checked out the code and it was quite easy to understand, despite the heavyweight technologies behind it.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With