
Screen-scraping a windows application in c# [closed]

I need to scrape data from a windows application to run a query in another program. Does anyone know of a good starting point for me to do this in .NET?

Tony Boarman asked Dec 17 '08 16:12


1 Answer

You may want to look into the WM_GETTEXT message, which can be used to read text from other windows. It's an archaic part of the Windows API, and from C# you'll need P/Invoke to send it.

Check out this page for an example of doing this in C#.

Basically, you first call FindWindowEx() to get the handle of the window that you want (by caption).

Second, you enumerate the controls on that window with EnumChildWindows(), which visits all of the window's child controls and all of those children's children, until you have a complete map of the target form.

Here is a selected portion of Theta-ga's excellent explanation from Google Answers:

To get the contents of any textbox or listbox control, all we need is its window handle. If you have already obtained the window handle, then move to part 2 of the explanation.

PART 1: Obtaining the control handle

  • To obtain the handle of a control, we first obtain the handle of its parent window. We can do this by using the Win32 FindWindowEx() method. This method takes in the window caption (such as 'Calculator') and/or its class name, and returns its handle.
  • Once we have the parent window handle, we can call the Win32 EnumChildWindows() method. This method takes in a callback method, which it calls with the handle of every child control it finds for the specified parent. For example, if we call this method with the handle of the Calculator window, it will call the callback method with the handle of the textbox control, and then again with the handles of each of the buttons on the Calculator window, and so on.
  • Since we are only interested in the handle of the textbox control, we can check the class of the window in the callback method. The Win32 method GetClassName() can be used for this. This method takes in a window handle and provides us with a string containing the class name. So a textbox belongs to the 'Edit' class, a listbox to the 'ListBox' class and so on. Once you have determined that you have the handle for the right control, you can read its contents.
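
The steps in Part 1 can be sketched in C# roughly as follows. This is a minimal illustration under stated assumptions, not Theta-ga's original code: the `ControlFinder` class and `FindChildrenByClass` helper are hypothetical names, the top-level lookup uses the Win32 `FindWindowEx` function with a null parent, and the 'Calculator' caption and 'Edit' class come from the example above and assume a classic Win32 target application.

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.InteropServices;
using System.Text;

// Hypothetical helper class; the names are illustrative, not from the original answer.
static class ControlFinder
{
    // Callback signature expected by EnumChildWindows.
    private delegate bool EnumWindowsProc(IntPtr hWnd, IntPtr lParam);

    [DllImport("user32.dll", CharSet = CharSet.Unicode, SetLastError = true)]
    private static extern IntPtr FindWindowEx(IntPtr hWndParent, IntPtr hWndChildAfter,
                                              string lpszClass, string lpszWindow);

    [DllImport("user32.dll")]
    [return: MarshalAs(UnmanagedType.Bool)]
    private static extern bool EnumChildWindows(IntPtr hWndParent, EnumWindowsProc lpEnumFunc,
                                                IntPtr lParam);

    [DllImport("user32.dll", CharSet = CharSet.Unicode, SetLastError = true)]
    private static extern int GetClassName(IntPtr hWnd, StringBuilder lpClassName, int nMaxCount);

    // Collects the handles of all descendant windows whose class name matches.
    public static List<IntPtr> FindChildrenByClass(IntPtr parent, string className)
    {
        var matches = new List<IntPtr>();
        EnumWindowsProc callback = (hWnd, lParam) =>
        {
            var buffer = new StringBuilder(256);
            if (GetClassName(hWnd, buffer, buffer.Capacity) > 0 &&
                string.Equals(buffer.ToString(), className, StringComparison.OrdinalIgnoreCase))
            {
                matches.Add(hWnd);
            }
            return true; // returning true continues the enumeration
        };
        EnumChildWindows(parent, callback, IntPtr.Zero);
        return matches;
    }

    public static void Main()
    {
        // A null parent and null class name search top-level windows by caption only.
        IntPtr parent = FindWindowEx(IntPtr.Zero, IntPtr.Zero, null, "Calculator");
        if (parent == IntPtr.Zero)
        {
            Console.WriteLine("Window not found.");
            return;
        }
        foreach (IntPtr handle in FindChildrenByClass(parent, "Edit"))
        {
            Console.WriteLine("Edit control handle: 0x{0:X}", handle.ToInt64());
        }
    }
}
```

EnumChildWindows already descends into the children of child windows, so the sketch needs no manual recursion. Controls created by UI frameworks may register class names other than the plain 'Edit' and 'ListBox', so the class filter may need adjusting for a given target.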

PART 2: Reading the contents of a control

  • You can read the contents of a control by using the Win32 SendMessage() function to pass the WM_GETTEXT message to the target control. This will give you the text content of the control. This method will work for a textbox, button, or static control.
  • However, the above approach will fail if you try to read the contents of a listbox. To get the contents of a listbox, we need to first use SendMessage() with the LB_GETCOUNT message to get the count of list items. Then we need to call SendMessage() with the LB_GETTEXT message for each item in the list.
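
The two cases in Part 2 can be sketched in C# along these lines. Again, this is a hedged illustration rather than the original author's code: `ControlReader`, `GetControlText`, and `GetListBoxItems` are hypothetical names, the sketch additionally uses WM_GETTEXTLENGTH and LB_GETTEXTLEN to size its buffers, and the list box is assumed to be a standard one that stores its items as strings.

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.InteropServices;
using System.Text;

// Hypothetical helper class; the names are illustrative, not from the original answer.
static class ControlReader
{
    // Message constants from the Win32 headers.
    private const uint WM_GETTEXT = 0x000D;
    private const uint WM_GETTEXTLENGTH = 0x000E;
    private const uint LB_GETTEXT = 0x0189;
    private const uint LB_GETTEXTLEN = 0x018A;
    private const uint LB_GETCOUNT = 0x018B;
    private const int LB_ERR = -1;

    [DllImport("user32.dll", CharSet = CharSet.Unicode)]
    private static extern IntPtr SendMessage(IntPtr hWnd, uint msg, IntPtr wParam, IntPtr lParam);

    [DllImport("user32.dll", CharSet = CharSet.Unicode)]
    private static extern IntPtr SendMessage(IntPtr hWnd, uint msg, IntPtr wParam,
                                             StringBuilder lParam);

    // Reads the text of a textbox, button, or static control via WM_GETTEXT.
    public static string GetControlText(IntPtr hWnd)
    {
        int length = SendMessage(hWnd, WM_GETTEXTLENGTH, IntPtr.Zero, IntPtr.Zero).ToInt32();
        var buffer = new StringBuilder(length + 1); // room for the terminating null
        SendMessage(hWnd, WM_GETTEXT, (IntPtr)buffer.Capacity, buffer);
        return buffer.ToString();
    }

    // Reads every item of a list box: LB_GETCOUNT first, then LB_GETTEXT per index.
    public static List<string> GetListBoxItems(IntPtr hWnd)
    {
        var items = new List<string>();
        int count = SendMessage(hWnd, LB_GETCOUNT, IntPtr.Zero, IntPtr.Zero).ToInt32();
        for (int i = 0; i < count; i++) // a failed LB_GETCOUNT (LB_ERR) skips the loop
        {
            int length = SendMessage(hWnd, LB_GETTEXTLEN, (IntPtr)i, IntPtr.Zero).ToInt32();
            if (length == LB_ERR) continue;
            var buffer = new StringBuilder(length + 1);
            SendMessage(hWnd, LB_GETTEXT, (IntPtr)i, buffer);
            items.Add(buffer.ToString());
        }
        return items;
    }

    public static void Main(string[] args)
    {
        // Hypothetical usage: pass a control handle in hex, e.g. one found in Part 1.
        if (args.Length != 1)
        {
            Console.WriteLine("usage: ControlReader <hex window handle>");
            return;
        }
        IntPtr hWnd = new IntPtr(Convert.ToInt64(args[0], 16));
        Console.WriteLine(GetControlText(hWnd));
        foreach (string item in GetListBoxItems(hWnd))
        {
            Console.WriteLine(item);
        }
    }
}
```

Querying the length first avoids guessing a fixed buffer size. Owner-drawn list boxes that do not store strings would return item data rather than text for LB_GETTEXT, which this sketch does not handle.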
HanClinto answered Sep 20 '22 20:09