extract text between the two blocks using regex

Question

I am trying to extract the text between the two strings using the following regex.

(?s)Non-terminated Pods:.*?in total.\R(.*)(?=Allocated resources)

This regex looks fine in regex101 but somehow does not print the pod details when used with perl or grep -P. The command below produces empty output.

kubectl describe  node |perl -le '/(?s)Non-terminated Pods:.*?in total.\R(.*)(?=Allocated resources)/m; printf "$1"'

Here is the sample input:

PodCIDRs:                     10.233.65.0/24
Non-terminated Pods:          (7 in total)
  Namespace                   Name                                        CPU Requests  CPU Limits  Memory Requests  Memory Limits  Age
  ---------                   ----                                        ------------  ----------  ---------------  -------------  ---
  default                     foo                                         0 (0%)        0 (0%)      0 (0%)           0 (0%)         105s
  kube-system                 nginx-proxy-kube-worker-1                   25m (1%)      0 (0%)      32M (1%)         0 (0%)         9m8s
  kube-system                 nodelocaldns-xbjp8                          100m (5%)     0 (0%)      70Mi (4%)        170Mi (10%)    7m4s
Allocated resources:

Question:

How can I extract the info from the above output so that it looks like the block below? What is wrong with the regex or the command that I am using?

Namespace                   Name                                        CPU Requests  CPU Limits  Memory Requests  Memory Limits  Age
  ---------                   ----                                        ------------  ----------  ---------------  -------------  ---
  default                     foo                                         0 (0%)        0 (0%)      0 (0%)           0 (0%)         105s
  kube-system                 nginx-proxy-kube-worker-1                   25m (1%)      0 (0%)      32M (1%)         0 (0%)         9m8s
  kube-system                 nodelocaldns-xbjp8                          100m (5%)     0 (0%)      70Mi (4%)

Question-2: What if I have two blocks of similar input? How do I extract the pod details from each? E.g.:

If the input is:

PodCIDRs:                     10.233.65.0/24
Non-terminated Pods:          (7 in total)
  Namespace                   Name                                        CPU Requests  CPU Limits  Memory Requests  Memory Limits  Age
  ---------                   ----                                        ------------  ----------  ---------------  -------------  ---
  default                     foo                                         0 (0%)        0 (0%)      0 (0%)           0 (0%)         105s
  kube-system                 nginx-proxy-kube-worker-1                   25m (1%)      0 (0%)      32M (1%)         0 (0%)         9m8s
  kube-system                 nodelocaldns-xbjp8                          100m (5%)     0 (0%)      70Mi (4%)        170Mi (10%)    7m4s
Allocated resources:
....some
.......random data...
PodCIDRs:                     10.233.65.0/24
Non-terminated Pods:          (7 in total)
  Namespace                   Name                                        CPU Requests  CPU Limits  Memory Requests  Memory Limits  Age
  ---------                   ----                                        ------------  ----------  ---------------  -------------  ---
  default                     foo-1                                         0 (0%)        0 (0%)      0 (0%)           0 (0%)         105s
  kube-system                 nginx-proxy-kube-worker-2                   25m (1%)      0 (0%)      32M (1%)         0 (0%)         9m8s
  kube-system                 nodelocaldns-xbjp3-2                          100m (5%)     0 (0%)      70Mi (4%)        170Mi (10%)    7m4s
Allocated resources:

zdim · Accepted Answer

With some obvious assumptions, and keeping it close to the pattern in the question:

perl -0777 -wnE'
    @pods = /Non-terminated\s+Pods:\s+\([0-9]+\s+in\s+total\)\n(.*?)\nAllocated resources:/gs;
    say for @pods
' input-file

(note modifiers on the regex in this line, which is too wide to fit on screen: /gs)

The regex from the question works, as it should, on a single block of text when used instead of the one in this answer (the trailing /s modifier is then not needed, since that pattern already starts with (?s)). To work with multiple blocks, the (.*) in it needs to be changed to (.*?) so that it doesn't match all the way to the last "Allocated resources".

The question doesn't say precisely how that regex is "used with perl", so I can't say what failed.
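
Taking the perl command printed in the question at face value, one likely cause is that it has no -n (or -p) switch: without it perl never reads STDIN, so the match runs against an empty $_ and $1 stays undefined. A minimal sketch of the difference, assuming a made-up trimmed sample redirected from a temp file in place of the kubectl pipe:

```shell
# Hypothetical trimmed sample: one block, columns cut down for brevity.
sample=$(mktemp)
cat > "$sample" <<'EOF'
PodCIDRs:                     10.233.65.0/24
Non-terminated Pods:          (7 in total)
  Namespace                   Name
  default                     foo
Allocated resources:
EOF

# As printed in the question: no -n or -p, so the code runs once and $_ is never filled from STDIN.
without_n=$(perl -le '/(?s)Non-terminated Pods:.*?in total.\R(.*)(?=Allocated resources)/m; printf "$1"' < "$sample")

# Same regex, but -0777 -n slurps all of STDIN into $_ before the match runs.
with_n=$(perl -0777 -ne 'print $1 if /(?s)Non-terminated Pods:.*?in total.\R(.*)(?=Allocated resources)/' < "$sample")

printf 'without -n: [%s]\n' "$without_n"
printf 'with -0777 -n:\n%s\n' "$with_n"
rm -f "$sample"
```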

Comments on the command-line program above:

The -0777 switch makes it read the whole file into a string, available in the program in the variable $_, to which the regex is bound by default.

There is also the switch -g, an alias for -0777, available starting with Perl 5.36.0.
We still need the -n switch so that the program iterates over the "lines" of input (STDIN or a file). In this case the input record separator is undefined, so it's all just one "line".
The regex captures are returned since the match operator is in list context, being assigned to the array @pods.
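
These points can be checked with a short sketch. It assumes a made-up, trimmed two-block sample in a temp file; the "blocks captured" line is added only for illustration and is not part of the command above.

```shell
# Hypothetical trimmed sample: two blocks shaped like the question's input.
sample=$(mktemp)
cat > "$sample" <<'EOF'
PodCIDRs:                     10.233.65.0/24
Non-terminated Pods:          (7 in total)
  Namespace                   Name
  default                     foo
Allocated resources:
....some random data...
Non-terminated Pods:          (7 in total)
  Namespace                   Name
  default                     foo-1
Allocated resources:
EOF

# With -0777 the -n loop runs once: $. (the record counter) ends at 1.
slurped=$(perl -0777 -nE 'END { say $. }' "$sample")
# Without it, -n loops once per line of the file.
by_line=$(perl -nE 'END { say $. }' "$sample")
echo "iterations with -0777: $slurped, without: $by_line"

# In list context the /g match returns one capture per block.
pods=$(perl -0777 -wnE '
    @pods = /Non-terminated\s+Pods:\s+\([0-9]+\s+in\s+total\)\n(.*?)\nAllocated resources:/gs;
    say scalar(@pods), " blocks captured";
    say for @pods
' "$sample")
printf '%s\n' "$pods"
rm -f "$sample"
```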

anubhava · Answer

Using gnu-grep you can use your regex with some tweaks:

kubectl describe  node |
grep -zoP '(?s)Non-terminated Pods:.*?in total.\R\K(.*?)(?=Allocated resources)'

  Namespace                   Name                                        CPU Requests  CPU Limits  Memory Requests  Memory Limits  Age
  ---------                   ----                                        ------------  ----------  ---------------  -------------  ---
  default                     foo                                         0 (0%)        0 (0%)      0 (0%)           0 (0%)         105s
  kube-system                 nginx-proxy-kube-worker-1                   25m (1%)      0 (0%)      32M (1%)         0 (0%)         9m8s
  kube-system                 nodelocaldns-xbjp8                          100m (5%)     0 (0%)      70Mi (4%)        170Mi (10%)    7m4s

Used \K (match reset) after \R to remove that line from the output.
Used the -z option to treat input and output data as sequences of lines, each terminated by a zero byte (the ASCII NUL character) instead of a newline.
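
One side effect of -z worth knowing: each match that -o prints ends in a NUL byte rather than a newline, so with several blocks it can help to translate those bytes before reading or post-processing the output. A sketch, assuming GNU grep built with -P support and a made-up trimmed sample in a temp file:

```shell
# Hypothetical trimmed sample: two blocks shaped like the question's input.
sample=$(mktemp)
cat > "$sample" <<'EOF'
PodCIDRs:                     10.233.65.0/24
Non-terminated Pods:          (7 in total)
  Namespace                   Name
  default                     foo
Allocated resources:
....some random data...
Non-terminated Pods:          (7 in total)
  Namespace                   Name
  default                     foo-1
Allocated resources:
EOF

# -z makes the whole input one record; every -o match is written with a trailing NUL.
# tr turns each NUL into a newline, which leaves a blank line between the blocks.
out=$(grep -zoP '(?s)Non-terminated Pods:.*?in total.\R\K(.*?)(?=Allocated resources)' "$sample" | tr '\0' '\n')
printf '%s\n' "$out"
rm -f "$sample"
```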

PS: The same regex also works with the second input (two blocks); the header line is then shown before each block.

Alternatively, you can use any version of sed for this job as well:

kubectl describe  node |
sed -n '/Non-terminated Pods:.*in total.*/,/Allocated resources:/ {//!p;}'

  Namespace                   Name                                        CPU Requests  CPU Limits  Memory Requests  Memory Limits  Age
  ---------                   ----                                        ------------  ----------  ---------------  -------------  ---
  default                     foo                                         0 (0%)        0 (0%)      0 (0%)           0 (0%)         105s
  kube-system                 nginx-proxy-kube-worker-1                   25m (1%)      0 (0%)      32M (1%)         0 (0%)         9m8s
  kube-system                 nodelocaldns-xbjp8                          100m (5%)     0 (0%)      70Mi (4%)        170Mi (10%)    7m4s
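
In the sed command, the empty pattern // reuses the most recently used regex, so //!p prints every line of the range except the two marker lines. The same idea can be sketched with an explicit flag in awk, which may be easier to adapt. The variable name inside and the trimmed two-block sample are made up for illustration.

```shell
# Hypothetical trimmed sample: two blocks shaped like the question's input.
sample=$(mktemp)
cat > "$sample" <<'EOF'
PodCIDRs:                     10.233.65.0/24
Non-terminated Pods:          (7 in total)
  Namespace                   Name
  default                     foo
Allocated resources:
....some random data...
Non-terminated Pods:          (7 in total)
  Namespace                   Name
  default                     foo-1
Allocated resources:
EOF

# The sed range from above, for comparison.
sed_out=$(sed -n '/Non-terminated Pods:.*in total.*/,/Allocated resources:/ {//!p;}' "$sample")

# awk with an explicit flag; rule order keeps both marker lines out of the output.
awk_out=$(awk '
    /Allocated resources:/           { inside = 0 }  # closing marker: stop before printing it
    inside                                           # print lines between the markers
    /Non-terminated Pods:.*in total/ { inside = 1 }  # opening marker: start with the next line
' "$sample")

printf '%s\n' "$awk_out"
rm -f "$sample"
```

Both commands handle repeated blocks, since the range (or flag) simply opens again at the next "Non-terminated Pods:" line.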

Tags: regex, grep, perl, monk