Using regex to tell csplit where to split the file

I have a large text file with content set up like this:

---
title: Lorim Ipsum Dolar
---
Lorim ipsum content
---
title: Excelvier whatever 
---
Lorim ipsum content goes here.

I'm trying to split up this file into individual files using csplit.

The individual files would have content formatted like this:

---
title: Lorim Ipsum Dolar
---
Lorim ipsum content

I was hoping to match the ---, the newline, and title together with a regex like ---\ntitle

But I'm not able to select it with…

csplit -k products.txt '/---[^\n]title/' {99}

I've tried lots of variations to no avail. I keep getting "no match".

Philip Meissner asked Aug 21 '13 17:08


4 Answers

csplit reads the input file one line at a time and applies the regex to each line. It is therefore not possible to match a regex across multiple lines.
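
The per-line matching can be checked on a small sample. This is a minimal sketch, assuming GNU csplit and a made-up file name (sample.txt) with two records in the question's layout; it is not the asker's actual data.

```shell
# Work in a scratch directory so the xx* output files stay out of the way.
cd "$(mktemp -d)"

# Hypothetical sample input in the question's layout.
printf '%s\n' '---' 'title: First' '---' 'Body one' \
              '---' 'title: Second' '---' 'Body two' > sample.txt

# This pattern needs "---" and "title" on the same line. No single line
# contains both, so csplit reports a failure.
if csplit -s sample.txt '/---.title/' 2>/dev/null; then
    multi_line="matched"
else
    multi_line="no match"
fi
echo "pattern spanning two lines: $multi_line"

# A pattern confined to one line is found.
if csplit -s sample.txt '/^title:/' 2>/dev/null; then
    single_line="matched"
else
    single_line="no match"
fi
echo "single-line pattern: $single_line"
```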

One way around this is to massage the input file first, replacing ---\ntitle: with a single-line pattern that csplit can match. For example, using sed:

sed 'N;s/---\ntitle: /===\n/' products.txt | csplit -k - '/===/' '{*}'
sed -i 'N;s/===\n/---\ntitle: /' xx*

This replaces ---\ntitle: with a single line ===, then has csplit split when it sees that pattern. Passing - as a file name tells csplit to read from stdin. The second sed command reverses the change.
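
One caveat with N is that it consumes lines two at a time, so a --- on an even line and its title on the following odd line are never in the pattern space together. The sketch below is a variant, not the command from the answer: it assumes GNU sed and GNU csplit, uses a hypothetical sample.txt, and slides over the input one line at a time with $!N, P and D. It also assumes no body line consists of === alone.

```shell
cd "$(mktemp -d)"

# Bodies of different lengths, so the second "---" lands on an even line.
printf '%s\n' '---' 'title: First' '---' 'Body one' 'More body' \
              '---' 'title: Second' '---' 'Body two' > sample.txt

# $!N appends the next line, P prints the first line of the pair and D drops
# it, so every adjacent pair of lines is checked for "---" + "title: ".
# csplit -z discards the empty piece before the first marker.
sed '$!N;s/^---\ntitle: /===\n/;P;D' sample.txt | csplit -s -z - '/^===$/' '{*}'

# Undo the marker: in each piece, line 1 is "===" and line 2 is the title text.
sed -i '1{N;s/^===\n/---\ntitle: /;}' xx*

pieces=$(ls xx* | wc -l)
echo "pieces written: $pieces"
```

Each resulting xx file should start with --- followed by its title: line, as in the original input.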

John Kugelman answered Oct 08 '22 01:10


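You could anchor the regular expression to the start of the line (^) and match only the title line. The split then falls on the title line itself, so the --- above it stays at the end of the previous piece.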

What do you think about:

csplit -k products.txt '/^title:/' {99}

inthenite answered Oct 08 '22 01:10


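Try using {*} instead of {99} to fix the "match not found" problem. {99} repeats the pattern exactly 99 more times and fails once the matches run out, while {*} repeats it as many times as the input allows.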
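
The difference between the two repeat counts can be seen from csplit's exit status. A minimal sketch, assuming GNU csplit and a hypothetical sample.txt with only two title lines:

```shell
cd "$(mktemp -d)"
printf '%s\n' '---' 'title: First' '---' 'Body one' \
              '---' 'title: Second' '---' 'Body two' > sample.txt

# {99} asks for 99 further matches. The sample has only two title lines, so
# csplit exits with an error; -k keeps the pieces written before the failure.
if csplit -s -k sample.txt '/^title:/' '{99}' 2>/dev/null; then
    fixed_count="ok"
else
    fixed_count="error"
fi
echo "{99}: $fixed_count"

rm -f xx*

# {*} repeats the pattern for as long as it keeps matching, then stops cleanly.
if csplit -s sample.txt '/^title:/' '{*}'; then
    star="ok"
else
    star="error"
fi
pieces=$(ls xx* | wc -l)
echo "{*}: $star, pieces written: $pieces"
```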

Aleks-Daniel Jakimenko-A. answered Oct 08 '22 02:10


This might work for you:

csplit -z products.txt '/^title/-1' '{*}'
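
In this command the -1 offset moves each split up one line from the matching title line, and -z removes the empty piece that would otherwise come before the first record. A small check, assuming GNU csplit and a hypothetical sample.txt rather than the real products.txt:

```shell
cd "$(mktemp -d)"
printf '%s\n' '---' 'title: First' '---' 'Body one' \
              '---' 'title: Second' '---' 'Body two' > sample.txt

# Split one line above every line that starts with "title".
# -s only silences the byte counts csplit normally prints.
csplit -s -z sample.txt '/^title/-1' '{*}'

# Show how each piece begins; the opening "---" should come first.
for f in xx*; do
    printf '%s starts with: %s\n' "$f" "$(head -n 1 "$f")"
done
```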

potong answered Oct 08 '22 00:10