
Using Mac’s Dictation Inside Python

Does anyone have any ideas on how to use the Mac’s built-in dictation tool to create strings to be used by Python?

To launch dictation, you have to double-press the Fn key inside any text editor. If so, is there a way to combine that keystroke with the input command? Something like:

Step 1: Simulate a double-press of the Fn key to launch the Dictation tool.
Step 2: Create a variable from the speech-to-text content via the input function, i.e. text_string = input("Start dictation: ")

In this thread (Can I use OS X 10.8's speech recognition/dictation without a GUI?) a user suggests he figured it out with CGEventCreateKeyboardEvent(src, 0x3F, true), but there is no code.

Any ideas? Code samples would be appreciated.

UPDATE: Thanks to the suggestions below, I've imported appscript. I'm trying to get code along these lines to work, with no success:

from appscript import app, its
se = app('System Events')
proc = app.processes[its.frontmost == True]
mi = proc.menu_bars[1].menu_bar_items['Edit'].menus[1].menu_items['Start Dictation']
user_voice_text = input(mi.click())
print(user_voice_text)

Any ideas on how I can turn on the dictation tool and use its output as the input for a string?

UPDATE 2:

Here is a simple example of the program I'm trying to create:

Ideally, I want to launch the program and have it ask me: "what is 1 + 1?"
Then I want the program to turn on the dictation tool and record my voice as I answer "two".
The dictation-to-text function will then pass the string value "two" to my program, and an if statement will respond with "correct" or "incorrect".

I'm trying to pass commands to the program without ever typing on the keyboard.

RollingStone1234 asked Oct 20 '22 02:10


1 Answer

First, FnFn dictation is a feature of the NSText (or maybe NSTextView?) Cocoa control. If you've got one of those, the dictated text gets inserted into that control. (It also uses that control's existing text for context.) From the point of view of the app using an NSTextView, if you just create a standard Edit menu, the Start Dictation item gets added to the end, with FnFn as a shortcut, and anything that gets dictated appears as input, just like input typed on a keyboard, or pasted or dragged with the mouse, or via any other input method.

So, if you don't have a GUI app, enabling dictation is going to be pointless, because you have no way to get the input.

If you do have a GUI app, the simplest thing to do is just get the menu item via NSMenu, and click the item.

You're almost certainly using some kind of GUI library, like PyQt or Tkinter, which has its own way of accessing your app's menu. But if not, you can do it directly through Cocoa using PyObjC (which came with Apple's pre-installed Python on older OS X releases, but which you'll have to pip install if you're using a third-party Python):

import AppKit

# Look up the running app's Edit menu and trigger its Start Dictation item.
mb = AppKit.NSApplication.sharedApplication().mainMenu()
edit = mb.itemWithTitle_('Edit').submenu()
sd = edit.indexOfItemWithTitle_('Start Dictation')  # -1 if the item is missing
if sd != -1:
    edit.performActionForItemAtIndex_(sd)

But if you're writing a console program that runs in the terminal (whether Terminal.app or an alternative like iTerm), the app you're running under has its own text widget and Edit menu, and you can parasitically use its menu instead.

The problem is that you don't have permission to control other apps unless the user allows it. In older versions of OS X, this was done by turning on "assistive scripting for accessibility" globally. As of 10.10, there's an Accessibility anchor in the Privacy tab of the Security & Privacy pane of System Preferences that lists the apps that have permission. Fortunately, if you're not on the list, the first time you try to use accessibility features, a dialog pops up, and if the user clicks through, it launches System Preferences, reveals that anchor, adds your app to the list with the checkbox unchecked, and scrolls it into view, so all the user has to do is check the checkbox.
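If you'd rather find out up front whether that permission has been granted, PyObjC wraps the AXIsProcessTrusted function from the ApplicationServices framework. Here is a minimal sketch, assuming PyObjC is installed; the helper name accessibility_trusted is made up for illustration, and for a script run in a terminal the answer most likely reflects the terminal app's permission rather than Python's:

```python
def accessibility_trusted():
    """Return True/False from AXIsProcessTrusted, or None if PyObjC is unavailable."""
    try:
        # Assumption: PyObjC's ApplicationServices wrapper is installed (macOS only).
        from ApplicationServices import AXIsProcessTrusted
    except ImportError:
        return None
    return bool(AXIsProcessTrusted())
```

A None result just means the check could not be made, so you'd fall back to trying the click and letting the system dialog appear.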

The AppleScript to do this is:

tell application "System Events"
    click (menu item "Start Dictation" of menu 1 of menu bar item "Edit" ¬
        of menu bar 1 of (first process whose frontmost is true))
end tell
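One low-tech way to run a script like this from Python, with no bridge library at all, is to hand it to the osascript command-line tool that ships with macOS. This is only a sketch: it assumes the menu item is titled exactly "Start Dictation" on your system and that the terminal has Accessibility permission, and the names build_osascript_command and start_dictation are hypothetical helpers, not part of any library:

```python
import subprocess

# AppleScript that clicks Edit > Start Dictation in whichever app is frontmost.
# Assumption: the item is titled exactly "Start Dictation" on your macOS version.
DICTATION_SCRIPT = '''
tell application "System Events"
    tell (first process whose frontmost is true)
        click menu item "Start Dictation" of menu 1 of menu bar item "Edit" of menu bar 1
    end tell
end tell
'''


def build_osascript_command(script=DICTATION_SCRIPT):
    """Return the argv list that runs `script` through osascript."""
    return ["osascript", "-e", script]


def start_dictation():
    """Run the script; raises CalledProcessError if osascript reports an error."""
    subprocess.run(build_osascript_command(), check=True)
```

Calling start_dictation() from a script running in Terminal.app should click the terminal's own menu item, since the terminal is the frontmost process.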

The "right" way to do the equivalent in Python is via ScriptingBridge, which you can access via PyObjC… but it's a lot easier to use the third-party library appscript:

from appscript import app, its
se = app('System Events')
# Query System Events (se) for the frontmost process, not the bare app class.
proc = se.processes[its.frontmost == True]
mi = proc.menu_bars[1].menu_bar_items['Edit'].menus[1].menu_items['Start Dictation']
mi.click()
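Note that click() only starts dictation; it doesn't return the spoken text. In a console program the dictated words are typed into the terminal like any other input, so a plain input() call can pick them up once the line is terminated (by Return, or possibly by speaking a "new line" command, which I haven't verified). For the quiz in the question, a rough sketch could look like the following, where ask, normalize, click_start_dictation, and the injectable start_dictation and read_line parameters are all hypothetical names chosen for illustration:

```python
import string


def normalize(text):
    """Lower-case and strip whitespace/punctuation that dictation may add."""
    return text.strip().strip(string.punctuation).strip().lower()


def click_start_dictation():
    """Click Edit > Start Dictation in the frontmost app (needs appscript)."""
    from appscript import app, its  # third-party, macOS only; imported lazily
    se = app('System Events')
    proc = se.processes[its.frontmost == True]
    proc.menu_bars[1].menu_bar_items['Edit'].menus[1].menu_items['Start Dictation'].click()


def ask(question, expected, start_dictation=click_start_dictation, read_line=input):
    """Print a question, start dictation, and compare the dictated answer."""
    print(question)
    start_dictation()
    answer = read_line('> ')
    return normalize(answer) == normalize(expected)
```

With that in place, the program body reduces to print("correct" if ask("what is 1 + 1?", "two") else "incorrect"). Passing the dictation trigger and the reader in as parameters keeps the comparison logic usable without a Mac.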

If you really want to send the Fn key twice, the APIs for generating and sending keyboard events are part of Quartz Event Services, which (even though it's a Core Graphics C API, not a Cocoa ObjC API) is also wrapped by PyObjC. The documentation can be a bit tricky to understand, but the basic idea is that you create an event of the appropriate type, then post it to a specific application, an event tap, or a tap location. So, you can create and send a system-wide key-down Fn-key event like this:

import Quartz
evt = Quartz.CGEventCreateKeyboardEvent(None, 63, True)  # 63 = 0x3F (Fn), True = key-down
Quartz.CGEventPost(Quartz.kCGSessionEventTap, evt)

To send a key-up event, just change that True to False.
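Putting the two together, a double-press is just down, up, down, up. The sketch below separates building that sequence from posting it; fn_double_press_events and post_events are hypothetical helper names, and the 0.05-second pause is a guess rather than a documented requirement. Since Fn is a modifier key, I can't promise that synthetic events like these will actually trigger dictation on every macOS version:

```python
import time

FN_KEYCODE = 63  # 0x3F, the virtual key code for Fn used above


def fn_double_press_events(keycode=FN_KEYCODE):
    """Return (keycode, is_down) pairs for two full presses of a key."""
    return [(keycode, True), (keycode, False)] * 2


def post_events(events, delay=0.05):
    """Post each event system-wide, pausing briefly between them."""
    import Quartz  # from PyObjC; macOS only, imported lazily
    for keycode, is_down in events:
        evt = Quartz.CGEventCreateKeyboardEvent(None, keycode, is_down)
        Quartz.CGEventPost(Quartz.kCGSessionEventTap, evt)
        time.sleep(delay)  # assumed gap so the presses register separately
```

You would then call post_events(fn_double_press_events()) right before reading input.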

abarnert answered Oct 22 '22 16:10