I have a Python script that uses subprocess.Popen to execute Windows *.exe files. All EXEs except one produce the expected output. When printed with print(), the output of that one EXE has whitespace between every character.
This is how the output looks when executing the EXE in Windows command line:
C:\Python27>autorunsc.exe /accepteula
Sysinternals Autoruns v13.51 - Autostart program viewer
Copyright (C) 2002-2015 Mark Russinovich
Sysinternals - www.sysinternals.com
HKLM\System\CurrentControlSet\Control\Terminal Server\Wds\rdpwd\StartupPrograms
rdpclip
rdpclip
RDP Clip Monitor
Microsoft Corporation
6.1.7601.17514
c:\windows\system32\rdpclip.exe
20/11/2010 11:22
HKLM\SOFTWARE\Microsoft\Windows NT\CurrentVersion\Winlogon\Userinit
C:\Windows\system32\userinit.exe
This is how it looks when printed in Python:
Sysinternals Autoruns v13.51 - Autostart program viewer
Copyright (C) 2002-2015 Mark Russinovich
Sysinternals - www.sysinternals.com
H K L M \ S y s t e m \ C u r r e n t C o n t r o l S e t \ C o n t r o l \
r m i n a l S e r v e r \ W d s \ r d p w d \ S t a r t u p P r o g r a m
r d p c l i p
r d p c l i p
R D P C l i p M o n i t o r
M i c r o s o f t C o r p o r a t i o n
6 . 1 . 7 6 0 1 . 1 7 5 1 4
c : \ w i n d o w s \ s y s t e m 3 2 \ r d p c l i p . e x e
2 0 / 1 1 / 2 0 1 0 1 1 : 2 2
H K L M \ S O F T W A R E \ M i c r o s o f t \ W i n d o w s N T \ C u r
n t V e r s i o n \ W i n l o g o n \ U s e r i n i t
We can clearly see the whitespace and what's interesting is that the first few lines don't include the spaces.
This is the code:
import subprocess

# Merge stderr into stdout and capture everything through a pipe.
p = subprocess.Popen('autorunsc.exe /accepteula', stderr=subprocess.STDOUT,
                     stdout=subprocess.PIPE, shell=True)
a = p.stdout.read()
print(a)
Where do the spaces come from, and how do I remove them?
The output of this tool is encoded in UTF-16, which is common for Windows tools.
You have to decode the output with the correct encoding using the str.decode method. Quoting the docs:
str.decode([encoding[, errors]])
Decodes the string using the codec registered for encoding. encoding defaults to the default string encoding. errors may be given to set a different error handling scheme. The default is 'strict', meaning that encoding errors raise UnicodeError. Other possible values are 'ignore', 'replace' and any other name registered via codecs.register_error(), see section Codec Base Classes.
a = p.stdout.read().decode('utf-16')
For a table of standard encodings, refer to section 7.8.3, Standard Encodings, of the Python documentation.
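To see where the "spaces" come from, here is a small sketch that encodes a short string as UTF-16 and decodes it again. The sample string is taken from the output above; little-endian byte order ('utf-16-le') is my assumption, since that is what Windows normally uses and the output shows no byte order mark.

```python
# Encode a sample string the way the tool appears to: UTF-16, little-endian.
text = u"rdpclip"
raw = text.encode("utf-16-le")

# Each ASCII character becomes two bytes, the second one being 0x00.
# A console prints those NUL bytes as blanks, hence "r d p c l i p".
print(repr(raw))

# Decoding with the matching codec restores the original text.
decoded = raw.decode("utf-16-le")
print(decoded)
```

Note that plain 'utf-16' expects a byte order mark and falls back to the platform's native byte order when there is none, so naming the byte order explicitly is the safer choice if you know it.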
Since your output seems to have mixed encoding (the "spaces", which are really 0x00 characters, not 0x20, exist only in part of the output), you may want to preprocess or partition the string before decoding it.
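One way to partition the output could look like the sketch below. The function name decode_mixed is hypothetical, and the splitting rule is a heuristic I am assuming, not something the tool documents: everything before the first 0x00 byte (minus the one character that byte belongs to) is treated as a plain ASCII banner, and the rest as UTF-16 little-endian.

```python
def decode_mixed(raw, banner_encoding="ascii"):
    """Decode bytes that start with an ASCII banner followed by UTF-16-LE text.

    Heuristic (an assumption): the UTF-16 part starts one byte before the
    first NUL byte, because that NUL is the high byte of its first character.
    """
    nul = raw.find(b"\x00")
    if nul == -1:
        # No NUL bytes at all, so treat everything as single-byte text.
        return raw.decode(banner_encoding, "replace")

    start = max(nul - 1, 0)
    head, tail = raw[:start], raw[start:]

    # Drop a little-endian byte order mark if one sits at the boundary.
    if head.endswith(b"\xff\xfe"):
        head = head[:-2]

    return (head.decode(banner_encoding, "replace")
            + tail.decode("utf-16-le", "replace"))


# Sample data modelled on the output in the question.
sample = (b"Sysinternals - www.sysinternals.com\r\n"
          + u"HKLM\\System\r\nrdpclip\r\n".encode("utf-16-le"))
print(decode_mixed(sample))
```

With the code from the question this would be called as decode_mixed(p.stdout.read()). The heuristic breaks if the banner itself contains a 0x00 byte or if the UTF-16 part starts with a non-ASCII character, so check it against the real output first.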