
Area selection and capture of the screen with Python on Windows

How do you use Python 3 on Windows to create a basic selection box and get its coordinates? I need it to work anywhere on the screen and in any window. Ideally, you run the program, and then wherever you click, hold, and drag, a semi-transparent, light blue box shows up, and Python registers the coordinates (which it needs to save for later).

I'm creating a desktop tool that allows you to select portions of the screen, similar to how Capture2Text's area selection works. It's supposed to let you select a region of a video game screen (i.e. anything being displayed, no matter what program, whether a browser, Steam, or an emulator). Once it has the desired coordinates, it will take a screenshot, maybe using PIL or PyAutoGUI.

So, I'm stuck on the area selection step. I've run across possible solutions using OpenCV, Matplotlib, pygame, tkinter, and Qt, but the first two only work in specified windows, and I don't know if the latter two work on the screen in general (and I'm not about to try to learn all about all of these different libraries without knowing if I'm on the right track or if this is even possible). I don't even know which is the simplest for my use case, or which libraries allow for this kind of general functionality.

This is a random attempt based on another SO answer I found, but it only works with a pre-saved image.

#ref(best?):https://stackoverflow.com/questions/6916054/how-to-crop-a-region-selected-with-mouse-click-using-python
import numpy as np
from PIL import Image
import matplotlib.pyplot as plt  # needed for plt.figure(), plt.imshow(), plt.show()
import matplotlib.widgets as widgets


def onselect(eclick, erelease):
    if eclick.ydata>erelease.ydata:
        eclick.ydata,erelease.ydata=erelease.ydata,eclick.ydata
    if eclick.xdata>erelease.xdata:
        eclick.xdata,erelease.xdata=erelease.xdata,eclick.xdata
    ax.set_ylim(erelease.ydata,eclick.ydata)
    ax.set_xlim(eclick.xdata,erelease.xdata)
    fig.canvas.draw()

fig = plt.figure()
ax = fig.add_subplot(111)
filename="test.jpg"
im = Image.open(filename)
arr = np.asarray(im)
plt_image=plt.imshow(arr)
# Note: Matplotlib 3.5+ deprecates drawtype and renames rectprops to props
rs=widgets.RectangleSelector(
    ax, onselect, drawtype='box',
    rectprops = dict(facecolor='blue', edgecolor = 'black', alpha=0.5, fill=True))
plt.show()

I'm looking for a solution that works directly on the screen without requiring a screenshot be taken in advance, since my application is supposed to be used alongside the game you're playing without interruption.

This is just the first step (from the user's perspective) of what my application does, and I've already implemented most of what happens after that (about 3000 LoC right now), so I'm looking for the most straightforward way to implement this so I can wrap up the project and make it usable.

weirdalsuperfan asked Sep 02 '19 18:09





1 Answer

(DISCLAIMER: I cannot test the samples right now. If you find any bugs, let me know so I can fix them.)

What you want to achieve is OS specific.

  • To access screen resources, use pywin32 (reference), a Python extension for the Win32 API.
  • To handle the mouse, use the pyHook library, a wrapper around the hooks in the Windows Hooking API.

To capture an area of the screen (source):

import win32gui
import win32ui
import win32con
import win32api

def saveScreenShot(x,y,width,height,path):
    # grab a handle to the main desktop window
    hdesktop = win32gui.GetDesktopWindow()

    # create a device context
    desktop_dc = win32gui.GetWindowDC(hdesktop)
    img_dc = win32ui.CreateDCFromHandle(desktop_dc)

    # create a memory based device context
    mem_dc = img_dc.CreateCompatibleDC()

    # create a bitmap object
    screenshot = win32ui.CreateBitmap()
    screenshot.CreateCompatibleBitmap(img_dc, width, height)
    mem_dc.SelectObject(screenshot)


    # copy the screen into our memory device context
    mem_dc.BitBlt((0, 0), (width, height), img_dc, (x, y),win32con.SRCCOPY)

    # save the bitmap to a file
    screenshot.SaveBitmapFile(mem_dc, path)
    # free our objects, including the desktop DC obtained from GetWindowDC
    mem_dc.DeleteDC()
    win32gui.DeleteObject(screenshot.GetHandle())
    win32gui.ReleaseDC(hdesktop, desktop_dc)
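
Since a mouse drag gives you two corner points rather than an origin plus a size, here is a minimal sketch of the conversion. The names corners_to_xywh and grab_region are hypothetical helpers, not part of pywin32. grab_region is an alternative that assumes Pillow is installed (the question mentions PIL) and uses its ImageGrab.grab(bbox=...) call instead of the GDI code above.

```python
def corners_to_xywh(p1, p2):
    """Convert two opposite corners into (x, y, width, height).

    Works whichever direction the user dragged in.
    """
    (x1, y1), (x2, y2) = p1, p2
    return (min(x1, x2), min(y1, y2), abs(x2 - x1), abs(y2 - y1))


def grab_region(p1, p2, path):
    """Alternative capture using Pillow (assumed installed) instead of GDI."""
    from PIL import ImageGrab  # imported lazily so the helper above has no dependencies

    x, y, width, height = corners_to_xywh(p1, p2)
    # ImageGrab expects a (left, top, right, bottom) bounding box
    image = ImageGrab.grab(bbox=(x, y, x + width, y + height))
    image.save(path)
    return image
```

With the function above, a drag from (300, 200) to (100, 50) could be captured with saveScreenShot(*corners_to_xywh((300, 200), (100, 50)), "capture.bmp").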

To handle mouse events:

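import pyHook
import pythoncom

# Callback function when the event is fired
def onMouseDown(event):
    # Here, the beginning of your rectangle drawing
    # [...]
    # event.Position holds the (x, y) screen coordinates of the click
    return True  # return True so the event is passed on to other handlers

# Subscribe the event to the callback:
hm = pyHook.HookManager()
hm.SubscribeMouseAllButtonsDown(onMouseDown)
hm.HookMouse()            # install the mouse hook
pythoncom.PumpMessages()  # keep pumping Windows messages so the hook fires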

Finally, to draw a selection rectangle, you could proceed as follows:

  1. On mouse button down, store the coords as the first corner
  2. On mouse move, update the selection rectangle with the current coords as the second corner
  3. On mouse button up, store the second corner
  4. Process the capture
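
The steps above can be sketched as a small state tracker, independent of any hooking library. This is only an illustration: SelectionTracker and its method names are made up here, and you would call them from whatever mouse callbacks you end up using.

```python
class SelectionTracker:
    """Tracks a click-and-drag selection as two corner points."""

    def __init__(self):
        self.start = None      # first corner, set on button down
        self.end = None        # second corner, updated on move and on button up
        self.dragging = False

    def on_down(self, x, y):
        # Step 1: store the first corner
        self.start = self.end = (x, y)
        self.dragging = True

    def on_move(self, x, y):
        # Step 2: update the second corner while the button is held
        if self.dragging:
            self.end = (x, y)

    def on_up(self, x, y):
        # Step 3: store the final second corner and stop tracking
        if self.dragging:
            self.end = (x, y)
            self.dragging = False
        return self.box()

    def box(self):
        """Return (left, top, right, bottom), or None if nothing is selected."""
        if self.start is None:
            return None
        (x1, y1), (x2, y2) = self.start, self.end
        return (min(x1, x2), min(y1, y2), max(x1, x2), max(y1, y2))
```

The box returned by on_up() is normalized, so step 4 (the capture) does not need to care which direction the user dragged in.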

The tricky part of drawing a rectangle is restoring the pixels where the previous rectangle was drawn. I see two ways:

  • Before you draw your rectangle, you store the pixels to be overwritten in memory; before drawing the next rectangle, you restore the previous pixels and so on.
  • You draw your rectangle by performing an XOR operation between its pixels and the pixels to be overwritten; before drawing the next rectangle, you redraw the previous rectangle with the same XOR operation, which erases it. XOR is a logical operation that restores the original value when applied twice with the same operand.
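
To illustrate the XOR approach without touching the real screen, here is a toy sketch that treats the screen as a 2D list of integer pixel values. The function name xor_rect_outline and the mask value are arbitrary choices for the example.

```python
def xor_rect_outline(pixels, left, top, right, bottom, mask=0xFFFFFF):
    """XOR the outline of a rectangle into a 2D list of pixels, in place.

    Each outline pixel is XORed exactly once, so calling this a second
    time with the same arguments restores the original values.
    """
    for x in range(left, right + 1):
        pixels[top][x] ^= mask
        if bottom != top:
            pixels[bottom][x] ^= mask
    for y in range(top + 1, bottom):  # corners were already handled above
        pixels[y][left] ^= mask
        if right != left:
            pixels[y][right] ^= mask


# A tiny 6x4 "screen" filled with one colour
screen = [[0x102030] * 6 for _ in range(4)]
original = [row[:] for row in screen]

xor_rect_outline(screen, 1, 0, 4, 3)   # draw the rectangle
changed = screen != original

xor_rect_outline(screen, 1, 0, 4, 3)   # draw it again to erase it
restored = screen == original
```

After the first call the outline pixels differ from the original; after the second call the buffer is identical to the original again, so nothing has to be saved in memory.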

The easiest way to draw a rectangle with an XOR operation is DrawFocusRect().
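
An untested sketch of how that could look with pywin32, assuming win32gui.GetDC(0) for the screen device context and win32gui.DrawFocusRect(hdc, rect). The helper names ltrb and toggle_focus_rect are made up for the example. Drawing straight onto the screen DC may be painted over when windows redraw, and may not show up over some full-screen games.

```python
def ltrb(p1, p2):
    """Order two corner points into a (left, top, right, bottom) tuple."""
    (x1, y1), (x2, y2) = p1, p2
    return (min(x1, x2), min(y1, y2), max(x1, x2), max(y1, y2))


def toggle_focus_rect(p1, p2):
    """Draw (or, on a second call with the same points, erase) a focus rectangle.

    Windows only: requires pywin32, imported lazily here.
    """
    import win32gui

    hdc = win32gui.GetDC(0)  # 0 = device context for the whole screen
    try:
        # DrawFocusRect draws with XOR, so drawing the same rectangle twice removes it
        win32gui.DrawFocusRect(hdc, ltrb(p1, p2))
    finally:
        win32gui.ReleaseDC(0, hdc)
```

On each mouse move you would call toggle_focus_rect() once with the previous corner to erase the old rectangle, then once with the new corner to draw the updated one.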

To solve further problems, remember that pywin32 wraps the Win32 API, so you can search the Win32 API documentation for whatever else you need to do.

Amessihel answered Sep 27 '22 22:09
