
How do I read OS-compatible strings from stdin?

Tags: rust

I'm trying to write a Rust program that reads a delimiter-separated list of filenames from stdin.

On Windows, I might invoke it from a cmd window with something like:

dir /b /s | findstr .*,v$ | rust-prog -n

On Unix I'd use something like:

find . -name '*,v' -print0 | rust-prog -0

I'm having trouble converting what I receive on stdin into something that can be used by std::path::Path. As I understand it, to get something that will compile on Windows or Unix, I'm going to need to use conditional compilation, and std::os::windows::ffi or std::os::unix::ffi as appropriate.
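
For illustration, here is a minimal sketch of the conditional-compilation approach described above. The function name os_string_from_bytes is hypothetical, and the non-Unix branch assumes UTF-8 input purely to keep the sketch portable; it does not handle Windows code pages.

```rust
use std::ffi::OsString;

/// Unix: a file name is an arbitrary byte sequence, so no decoding is needed.
#[cfg(unix)]
fn os_string_from_bytes(bytes: Vec<u8>) -> OsString {
    use std::os::unix::ffi::OsStringExt;
    OsString::from_vec(bytes)
}

/// Other platforms: ASSUMPTION for this sketch only, the input is UTF-8.
/// Invalid sequences are replaced with U+FFFD. Input in a legacy Windows
/// code page would instead need converting to UTF-16 and wrapping with
/// std::os::windows::ffi::OsStringExt::from_wide.
#[cfg(not(unix))]
fn os_string_from_bytes(bytes: Vec<u8>) -> OsString {
    OsString::from(String::from_utf8_lossy(&bytes).into_owned())
}
```

The resulting OsString can be passed directly to std::path::Path::new, since OsString implements AsRef<OsStr>.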

Furthermore, it seems that on Windows I'll need to call kernel32::MultiByteToWideChar with the current code page to create something usable by std::os::windows::ffi::OsStrExt.

Is there an easier way to do this? Does what I'm suggesting even seem workable?

As an example, it's easy to convert a string to a path, so I tried to use the string handling functions of stdin:

use std::io;

fn main() {
    let mut buffer = String::new();
    match io::stdin().read_line(&mut buffer) {
        Ok(_) => println!("{}", buffer),
        Err(error) => println!("error: {}", error),
    }
}

On Windows, if I have a directory containing a single file called ¿.txt (that's byte 0xbf) and pipe its name into stdin, I get: error: stream did not contain valid UTF-8.
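
The error comes from read_line, which validates its input as UTF-8. A sketch that avoids the validation by reading raw bytes with BufRead::read_until might look like the following. The function name read_records is hypothetical, and the mapping of the delimiter to the -n and -0 flags is my reading of the command lines above.

```rust
use std::io::{self, BufRead};

/// Read delimiter-terminated records as raw bytes, with no UTF-8 validation.
/// `delim` would be b'\n' for the -n mode or 0 for the -0 mode (assumed).
fn read_records<R: BufRead>(mut input: R, delim: u8) -> io::Result<Vec<Vec<u8>>> {
    let mut records = Vec::new();
    loop {
        let mut record = Vec::new();
        // read_until appends bytes up to and including the delimiter
        if input.read_until(delim, &mut record)? == 0 {
            break; // end of input
        }
        if record.last() == Some(&delim) {
            record.pop();
        }
        // In newline mode, also drop the CR of a Windows CR LF line ending
        if delim == b'\n' && record.last() == Some(&b'\r') {
            record.pop();
        }
        // Skip empty records, e.g. blank lines
        if !record.is_empty() {
            records.push(record);
        }
    }
    Ok(records)
}
```

It could be called as read_records(io::stdin().lock(), b'\n'). The bytes still have to be converted to an OsString afterwards, which is where the platform differences come in.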

Laurence asked Nov 06 '16 22:11



1 Answer

Here's a reasonable-looking version for Windows. It converts the console-supplied string to a wide string using Win32 API functions, then wraps it in an OsString using OsString::from_wide.

I'm not convinced it uses the correct code page yet. dir seems to use the OEM code page, so maybe that should be the default. A console also distinguishes between its input code page and its output code page.
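
To show why the choice matters, here is a tiny sketch that decodes the single byte 0xBF under two code pages. The CodePage enum and decode_byte function are hypothetical, and the two table entries are copied by hand from the Windows-1252 and CP437 tables; which code pages are actually in effect depends on the system.

```rust
/// Hypothetical enum for this sketch; Windows identifies code pages by number.
#[derive(Clone, Copy)]
enum CodePage {
    Windows1252, // a common ANSI code page
    Cp437,       // a common OEM code page
}

/// Decode one byte. Only ASCII and the byte 0xBF are covered here;
/// a real program would delegate to MultiByteToWideChar.
fn decode_byte(cp: CodePage, byte: u8) -> Option<char> {
    match (cp, byte) {
        (_, 0x00..=0x7F) => Some(byte as char),
        (CodePage::Windows1252, 0xBF) => Some('\u{00BF}'), // inverted question mark
        (CodePage::Cp437, 0xBF) => Some('\u{2510}'),       // box-drawing corner
        _ => None, // not covered by this sketch
    }
}
```

The same byte yields two different characters, so decoding with the wrong code page produces a name that doesn't match any file on disk.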

In my Cargo.toml:

[dependencies]
winapi = "0.2"
kernel32-sys = "0.2.2"

Code to read a list of filenames piped through stdin on Windows, as per the question:

extern crate kernel32;
extern crate winapi;

use std::io::{self, Read};
use std::ptr;
use std::fs::metadata;
use std::ffi::OsString;
use std::os::windows::ffi::OsStringExt;

/// Convert windows console input to wide string that can
/// be used by OS functions
fn wide_from_console_string(bytes: &[u8]) -> Vec<u16> {
    assert!(bytes.len() < std::i32::MAX as usize);
    let mut wide;
    let mut len;
    unsafe {
        let cp = kernel32::GetConsoleCP();
        len = kernel32::MultiByteToWideChar(cp, 0, bytes.as_ptr() as *const i8, bytes.len() as i32, ptr::null_mut(), 0);
        wide = Vec::with_capacity(len as usize);
        len = kernel32::MultiByteToWideChar(cp, 0, bytes.as_ptr() as *const i8, bytes.len() as i32, wide.as_mut_ptr(), len);
        wide.set_len(len as usize);
    }
    wide
}

/// Extract paths from a list supplied as Cr LF
/// separated wide string
/// Would use a generic split on substring if it existed
fn paths_from_wide(wide: &[u16]) -> Vec<OsString> {
    let mut r = Vec::new();
    let mut start = 0;
    let mut i = start;
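    // Scan the whole buffer; `wide.len() - 1` would underflow on empty input
    while i < wide.len() {
        // 13 = CR, 10 = LF
        if wide[i] == 13 && wide.get(i + 1) == Some(&10) {
            if i > start {
                r.push(OsString::from_wide(&wide[start..i]));
            }
            start = i + 2;
            i += 2;
        } else {
            i += 1;
        }
    }
    // Keep a final entry that is not terminated by CR LF, including its last character
    if wide.len() > start {
        r.push(OsString::from_wide(&wide[start..]));
    }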
    r
}

fn main() {
    let mut bytes = Vec::new();
    if let Ok(_) = io::stdin().read_to_end(&mut bytes) {
        let pathlist = wide_from_console_string(&bytes[..]);
        let paths = paths_from_wide(&pathlist[..]);
        for path in paths {
            match metadata(&path) {
                Ok(stat) => println!("{:?} is_file: {}", &path, stat.is_file()),
                Err(e) => println!("Error: {:?} for {:?}", e, &path)
            }
        }
    }
}
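
The comment on paths_from_wide wishes for a generic split on a substring. A minimal sketch of one is below; split_on_subslice is a hypothetical helper, not a standard library function, and skipping empty pieces is a choice made here to match how the code above ignores empty entries.

```rust
/// Split `haystack` at every occurrence of `needle`, skipping empty pieces.
fn split_on_subslice<'a, T: PartialEq>(haystack: &'a [T], needle: &[T]) -> Vec<&'a [T]> {
    let mut pieces = Vec::new();
    if needle.is_empty() {
        // An empty separator never matches; return the input whole
        if !haystack.is_empty() {
            pieces.push(haystack);
        }
        return pieces;
    }
    let mut start = 0;
    let mut i = 0;
    while i + needle.len() <= haystack.len() {
        if haystack[i..i + needle.len()] == *needle {
            if i > start {
                pieces.push(&haystack[start..i]);
            }
            i += needle.len();
            start = i;
        } else {
            i += 1;
        }
    }
    // Whatever follows the last separator is the final piece
    if start < haystack.len() {
        pieces.push(&haystack[start..]);
    }
    pieces
}
```

With this helper, the body of paths_from_wide could probably be reduced to mapping OsString::from_wide over split_on_subslice(wide, &[13, 10]).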
Laurence answered Nov 02 '22 14:11
