I have the following working example in Python
which takes a string, uses a dict comprehension and a regular expression on it and finally generates a dataframe from it:
import re, pandas as pd
junk = """total=7871MB;free=5711MB;used=2159MB;shared=0MB;buffers=304MB;cached=1059MB;
free=71MB;total=5751MB;shared=3159MB;used=5MB;buffers=30MB;cached=1059MB;
cached=1059MB;total=5751MB;shared=3159MB;used=5MB;buffers=30MB;free=109MB;"""
rx = re.compile(r'(?P<key>\w+)=(?P<value>[^;]+)')
records = [{m.group('key'): m.group('value')
for m in rx.finditer(line)}
for line in junk.split("\n")]
df = pd.DataFrame(records)
print(df)
This yields
buffers cached free shared total used
0 304MB 1059MB 5711MB 0MB 7871MB 2159MB
1 30MB 1059MB 71MB 3159MB 5751MB 5MB
2 30MB 1059MB 109MB 3159MB 5751MB 5MB
R
?lapply
and regmatches
but to no avail. Additionally, how would I do this with missing values?
A purrr option:
library(purrr)
'total=7871MB;free=5711MB;used=2159MB;shared=0MB;buffers=304MB;cached=1059MB;
free=71MB;total=5751MB;shared=3159MB;used=5MB;buffers=30MB;cached=1059MB;
cached=1059MB;total=5751MB;shared=3159MB;used=5MB;buffers=30MB;free=109MB;' %>%
strsplit('\n') %>% .[[1]] %>% # separate lines into character vector
strsplit(';') %>% # separate each line into a list of key-value pairs
map(strsplit, '=') %>% # split key-value pairs into length-2 sublists
map(transpose) %>% # flip list of key-value pairs to list of keys and values
map_dfr(~set_names(.x[[2]], .x[[1]])) # set names of values to keys and simplify to data frame
#> # A tibble: 3 x 6
#> total free used shared buffers cached
#> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 7871MB 5711MB 2159MB 0MB 304MB 1059MB
#> 2 5751MB 71MB 5MB 3159MB 30MB 1059MB
#> 3 5751MB 109MB 5MB 3159MB 30MB 1059MB
or a more data-frame-centric option:
library(tidyverse)
# put text in data frame
data_frame(text = 'total=7871MB;free=5711MB;used=2159MB;shared=0MB;buffers=304MB;cached=1059MB;
free=71MB;total=5751MB;shared=3159MB;used=5MB;buffers=30MB;cached=1059MB;
cached=1059MB;total=5751MB;shared=3159MB;used=5MB;buffers=30MB;free=109MB;') %>%
separate_rows(text, sep = '\n') %>% # separate lines into separate rows
rowid_to_column('line') %>% # add index for each line to help spreading later
separate_rows(text, sep = ';') %>% # separate each line into key-value pairs
filter(text != '') %>% # drop extra entries from superfluous semicolons
separate(text, c('key', 'value')) %>% # separate keys and values into columns
spread(key, value) %>% # reshape to wide form
select(-line) # drop line index column
#> # A tibble: 3 x 6
#> buffers cached free shared total used
#> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 304MB 1059MB 5711MB 0MB 7871MB 2159MB
#> 2 30MB 1059MB 71MB 3159MB 5751MB 5MB
#> 3 30MB 1059MB 109MB 3159MB 5751MB 5MB
If you want to avoid packages, you can hack it through read.dcf
, which reads Debian Control Format (like R package DESCRIPTION files), which is just key-value pairs. DCF uses :
instead of =
and \n
instead of ;
, though, so you'll need to do a little gsub
ing first:
junk <- 'total=7871MB;free=5711MB;used=2159MB;shared=0MB;buffers=304MB;cached=1059MB;
free=71MB;total=5751MB;shared=3159MB;used=5MB;buffers=30MB;cached=1059MB;
cached=1059MB;total=5751MB;shared=3159MB;used=5MB;buffers=30MB;free=109MB;'
junk <- gsub('=', ':', junk)
junk <- gsub(';', '\n', junk)
mat <- read.dcf(textConnection(junk))
mat
#> total free used shared buffers cached
#> [1,] "7871MB" "5711MB" "2159MB" "0MB" "304MB" "1059MB"
#> [2,] "5751MB" "71MB" "5MB" "3159MB" "30MB" "1059MB"
#> [3,] "5751MB" "109MB" "5MB" "3159MB" "30MB" "1059MB"
It returns a matrix, but it's well-formed and easy to convert to a proper data.frame:
df <- as.data.frame(mat, stringsAsFactors = FALSE)
df
#> total free used shared buffers cached
#> 1 7871MB 5711MB 2159MB 0MB 304MB 1059MB
#> 2 5751MB 71MB 5MB 3159MB 30MB 1059MB
#> 3 5751MB 109MB 5MB 3159MB 30MB 1059MB
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With