I am trying to format lsof output in a more parsable way.
Background: since not all processes with open handles have thread IDs, the number of whitespace-separated fields (blanks, as far as I can see) is not fixed.
As output fields, I need the PID, UID/username and path (if it is a file - I am grepping for the path since +D is quite slow).
As the field separator I switched from NL to NUL (and replaced NUL with "|" for readability).
So I tried:
> /usr/sbin/lsof -F pnuf0 | sed 's/\x0/|/g' | grep "cvmfs" | tail -n 2
ftxt|n/usr/bin/cvmfs2|
fmem|n/usr/lib64/libcvmfs_fuse.so.2.3.5|
which produces only the file descriptor and name (not in the given order?) but not the PID or UID.
As a side note, the PID and UID fields are apparently already 'empty' when selecting them individually:
> /usr/sbin/lsof -F u0 | sed 's/\x0/|/g' | grep "cvmfs" | tail -n 2
> /usr/sbin/lsof -F p0 | sed 's/\x0/|/g' | grep "cvmfs" | tail -n 2
> /usr/sbin/lsof -F n0 | sed 's/\x0/|/g' | grep "cvmfs" | tail -n 2
n/usr/bin/cvmfs2|
n/usr/lib64/libcvmfs_fuse.so.2.3.5|
What would be the correct way to parse lsof's output as "PID,NAME,UID,FILEDESC"?
Since I never found a good answer to this on the web, I spent many hours working on this problem. I hope I can spare someone this pain. By default, lsof prints horizontal (column) output in which some values may be missing, which makes it very hard to parse reliably.
To format lsof you need to use the command:
lsof -F pcuftDsin
Adding -F prints the results vertically, one field per line. Let me explain each part:
lsof: lists all open files by process
-F: formats the output vertically instead of horizontally
p: prefixes the PID (process ID) column
c: prefixes the COMMAND (process name) column
u: prefixes the User column (the user the process is running under)
f: prefixes the File Descriptor column
t: prefixes the Type column
D: prefixes the Device column
s: prefixes the SizeOff column
i: prefixes the Node column
n: prefixes the Name (file path) column
Output:
p3026
ccom.apple.appkit.xpc.openAndSavePanelService
u501
fcwd
tDIR
D0x1000004
s704
i2
n/
ftxt
tREG
D0x1000004
s94592
i1152921500312434319
n/System/Library/Frameworks/AppKit.framework/Versions/C/XPCServices/com.apple.appkit.xpc.openAndSavePanelService.xpc/Contents/MacOS/com.apple.appkit.xpc.openAndSavePanelService
ftxt
tREG
D0x1000004
s27876
i45156619
n/Library/Preferences/Logging/.plist-cache.usI0gbvW
ftxt
tREG
D0x1000004
s28515184
i1152921500312399135
n/usr/share/icu/icudt64l.dat
ftxt
tREG
D0x1000004
s239648
i31225967
n/private/var/db/timezone/tz/2019c.1.0/icutz/icutz44l.dat
ftxt
tREG
D0x1000004
s3695464
i1152921500312406201
n/System/Library/CoreServices/SystemAppearance.bundle/Contents/Resources/SystemAppearance.car
ftxt
tREG
D0x1000004
s136100
i38828241
n/System/Library/Caches/com.apple.IntlDataCache.le.kbdx
As you can see, each line is prefixed with the letter assigned above. Another important thing to note is that the process ID, process name and user are printed only once per set of open files. For database storage, I needed these fields on every row that was printed. I was working on a Java project, so the code I used to parse it is shown below:
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;

public class LsofParser {
    public static void main(String[] args) throws IOException, InterruptedException {
        // Run lsof in field-output mode; forward its warnings to our stderr
        // so an unread error stream cannot block the child process.
        Process proc = new ProcessBuilder("lsof", "-F", "pcuftDsin")
                .redirectError(ProcessBuilder.Redirect.INHERIT)
                .start();

        // Process-level fields: printed once per process, so they are carried forward.
        String processId = "";
        String processName = "";
        String user = "";
        // File-level fields: reset to "null" after each printed row.
        String fd = "null";
        String type = "null";
        String device = "null";
        String sizeOff = "null";
        String node = "null";
        String file = "null";

        try (BufferedReader reader =
                new BufferedReader(new InputStreamReader(proc.getInputStream()))) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (line.startsWith("p")) {
                    processId = line;
                } else if (line.startsWith("c")) {
                    processName = line;
                } else if (line.startsWith("u")) {
                    user = line;
                } else if (line.startsWith("f")) {
                    fd = line;
                } else if (line.startsWith("t")) {
                    type = line;
                } else if (line.startsWith("D")) {
                    device = line;
                } else if (line.startsWith("s")) {
                    sizeOff = line;
                } else if (line.startsWith("i")) {
                    node = line;
                } else if (line.startsWith("n")) {
                    // The name is the last field requested per file: print the row.
                    file = line;
                    System.out.println(String.join(",", processId, processName, user,
                            fd, type, device, sizeOff, node, file));
                    fd = "null";
                    type = "null";
                    device = "null";
                    sizeOff = "null";
                    node = "null";
                    file = "null";
                }
            }
        }
        proc.waitFor();
    }
}
Output:
p94484,ccom.apple.CoreSimulator.CoreSim,u501,ftxt,tREG,D0x1000004,s239648,i31225967,n/private/var/db/timezone/tz/2019c.1.0/icutz/icutz44l.dat
Because I was storing the output, I needed the empty fields to show something, so I used null. You can use any default text, or just an empty string, for the missing fields, since not all fields will be populated. If anyone has suggestions on how I could improve the code's performance, I am all ears.
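For the exact layout asked for in the question ("PID,NAME,UID,FILEDESC", filtered by path), the same carry-forward idea can be sketched as a small pure function. This is only an illustration under some assumptions: the class name LsofRows, the method toRows and the pathFilter parameter are made up for this example, the input is the line-per-field output of something like lsof -F pnuf (without the 0 terminator), and the PIDs and UIDs in the sample are invented. It also shows why grepping the raw -F output loses the PID and UID: they sit on their own lines in the process set, not on the line that contains the path.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class LsofRows {

    /**
     * Converts lsof field-output lines into "PID,NAME,UID,FD" rows, keeping only
     * files whose path contains pathFilter. Hypothetical helper, not part of lsof.
     */
    public static List<String> toRows(List<String> lines, String pathFilter) {
        List<String> rows = new ArrayList<>();
        String pid = "";   // process-set fields, carried forward to every file
        String uid = "";
        String fd = null;  // file-set fields, reset for every file
        String name = "";
        for (String line : lines) {
            if (line.isEmpty()) {
                continue;
            }
            char tag = line.charAt(0);
            String value = line.substring(1); // strip the one-letter prefix
            if (tag == 'p' || tag == 'f') {
                // A new process set or file set begins: flush the pending file.
                addRow(rows, pid, name, uid, fd, pathFilter);
                fd = null;
                name = "";
            }
            switch (tag) {
                case 'p': pid = value; uid = ""; break;
                case 'u': uid = value; break;
                case 'f': fd = value; break;
                case 'n': name = value; break;
                default: break; // fields this sketch does not need are ignored
            }
        }
        addRow(rows, pid, name, uid, fd, pathFilter); // flush the last file
        return rows;
    }

    private static void addRow(List<String> rows, String pid, String name,
                               String uid, String fd, String pathFilter) {
        if (fd != null && name.contains(pathFilter)) {
            rows.add(pid + "," + name + "," + uid + "," + fd);
        }
    }

    public static void main(String[] args) {
        // Sample input: paths taken from the question, PIDs and UIDs invented.
        List<String> sample = Arrays.asList(
                "p3026", "u501", "fcwd", "n/", "ftxt", "n/usr/bin/cvmfs2",
                "p4000", "u0", "fmem", "n/usr/lib64/libcvmfs_fuse.so.2.3.5");
        for (String row : toRows(sample, "cvmfs")) {
            System.out.println(row);
        }
        // prints:
        // 3026,/usr/bin/cvmfs2,501,txt
        // 4000,/usr/lib64/libcvmfs_fuse.so.2.3.5,0,mem
    }
}
```

Unlike my code above, this sketch flushes a row when the next f or p line starts (or at the end of the input) instead of when the n line arrives, so it does not depend on the name being the last field of a file set. Paths that contain a comma would still need quoting before the rows are stored as CSV.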