Splitting bulk text file every n line

I have a folder that contains multiple text files. I'm trying to split every text file into pieces of 10000 lines each while keeping the base file name, i.e. if filename1.txt contains 20000 lines, the output will be filename1-1.txt (10000 lines) and filename1-2.txt (10000 lines).

I tried split -10000 filename1.txt, but this does not keep the base filename, and I have to repeat the command for each text file in the folder. I also tried for f in *.txt; do split -10000 $f.txt; done. This didn't work either.

Any idea how I can do this? Thanks.

asked Oct 30 '15 at 20:10 by user2334436

2 Answers

for f in filename*.txt; do split -d -a1 -l10000 --additional-suffix=.txt "$f" "${f%.txt}-"; done

Or, written over multiple lines:

for f in filename*.txt
do
    split -d -a1 -l10000 --additional-suffix=.txt "$f" "${f%.txt}-"
done

How it works:

  • -d tells split to use numeric suffixes (0, 1, 2, ...) instead of the default alphabetic ones (aa, ab, ...).

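  • -a1 tells split to use a one-character suffix (a single digit). This means split can create at most 10 pieces per input file (suffixes 0 to 9) and stops with an error beyond that, so for files longer than 100,000 lines use a larger value such as -a2.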

  • -l10000 tells split to split every 10,000 lines.

  • --additional-suffix=.txt tells split to add .txt to the end of the names of the new files.

  • "$f" tells split the name of the file to split.

  • "${f%.txt}-" tells split the prefix to use for the new files: ${f%.txt} removes the trailing .txt from the name, so filename1.txt gives the prefix filename1-.
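The question asks for pieces numbered from 1 (filename1-1.txt, filename1-2.txt), while -d starts at 0. A minimal sketch, assuming GNU coreutils split, which accepts --numeric-suffixes=FROM to choose the starting number. It runs in a throwaway directory with hypothetical sample files generated by seq:

```shell
#!/usr/bin/env bash
# Sketch only: assumes GNU coreutils split (--numeric-suffixes=FROM, --additional-suffix).
set -euo pipefail

# Throwaway directory with two hypothetical 20000-line sample files.
cd "$(mktemp -d)"
seq 1 20000 > filename1.txt
seq 1 20000 > filename2.txt

for f in filename*.txt; do
    # --numeric-suffixes=1 starts numbering at 1 instead of 0.
    # -a1 keeps the suffix to one digit, so at most 9 pieces (1..9) per input file.
    split -a1 --numeric-suffixes=1 -l10000 --additional-suffix=.txt "$f" "${f%.txt}-"
done

ls
```

Keeping -a1 gives exactly the names from the question; if more than 9 pieces are possible, raise -a, at the cost of zero-padded names such as filename1-01.txt.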

Example

Suppose that we start with these files:

$ ls
filename1.txt  filename2.txt

Then we run our command:

$ for f in filename*.txt; do split -d -a1 -l10000 --additional-suffix=.txt "$f" "${f%.txt}-"; done

When this is done, we now have the original files and the new split files:

$ ls
filename1-0.txt  filename1-1.txt  filename1.txt  filename2-0.txt  filename2-1.txt  filename2.txt
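To check that nothing was lost, the single-digit pieces can be concatenated and compared with the original. This sketch assumes GNU split (for --additional-suffix) and uses hypothetical sample files; filename2.txt deliberately has 25000 lines, so its last piece is shorter:

```shell
#!/usr/bin/env bash
# Sketch only: rebuilds the example above with hypothetical data, then verifies the pieces.
set -euo pipefail

cd "$(mktemp -d)"
seq 1 20000 > filename1.txt
seq 1 25000 > filename2.txt    # not a multiple of 10000: the last piece holds 5000 lines

for f in filename*.txt; do
    split -d -a1 -l10000 --additional-suffix=.txt "$f" "${f%.txt}-"
done

# filename?.txt matches only the originals (one character after "filename").
for f in filename?.txt; do
    wc -l "${f%.txt}"-?.txt
    # Single-digit suffixes glob-sort in order, so the concatenation must equal the original.
    if cat "${f%.txt}"-?.txt | cmp -s - "$f"; then
        echo "$f: pieces match the original"
    else
        echo "$f: MISMATCH" >&2
    fi
done
```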

Using older, less featureful forms of split

If your split does not offer --additional-suffix, then consider:

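for f in filename*.txt
do
    split -d -a1 -l10000 "$f" "${f%.txt}-"
    # Rename only the single-digit pieces just written (filename1-0 -> filename1-0.txt),
    # so other files that happen to start with "filename1-" are left alone.
    for g in "${f%.txt}-"[0-9]
    do
        mv "$g" "$g.txt"
    done
done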
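The fallback above still needs -d, which is not among the options POSIX specifies for split. A hedged sketch that uses only -l and -a, letting split write alphabetic suffixes (aa, ab, ...) and then renaming the pieces with a counter so the names match the question (filename1-1.txt, filename1-2.txt). The counter-based renaming is an illustration, not part of the original answer:

```shell
#!/usr/bin/env bash
# Sketch only: relies on split -a/-l (POSIX) plus mv; sample data is hypothetical.
set -eu

cd "$(mktemp -d)"
seq 1 20000 > filename1.txt

for f in filename*.txt; do
    base=${f%.txt}
    # Two-letter alphabetic suffixes: filename1-aa, filename1-ab, ... (at most 676 pieces).
    split -a 2 -l 10000 "$f" "$base-"
    n=1
    # The ?? glob expands in sorted order (aa, ab, ...), the same order split wrote them.
    for g in "$base-"??; do
        [ -e "$g" ] || continue    # no pieces (e.g. empty input): the glob stays unexpanded
        mv "$g" "$base-$n.txt"
        n=$((n + 1))
    done
done

ls
```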
answered Oct 22 '22 at 00:10 by John1024


No need for shell loops; one simple awk command does it for all files:

awk 'FNR%10000==1{if(FNR==1)c=0; close(out); out=FILENAME; sub(/\.txt$/,"-"(++c)".txt",out)} {print > out}' *.txt
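The same awk program spread over several lines with comments, run here on hypothetical sample files. It assumes every input name ends in .txt; note that sub() is given out as its third argument, otherwise it would edit the current line ($0) instead of the output name:

```shell
#!/usr/bin/env bash
# Sketch only: commented version of the awk one-liner, with hypothetical sample data.
set -eu

cd "$(mktemp -d)"
seq 1 20000 > filename1.txt
seq 1 15000 > filename2.txt

awk '
    FNR % 10000 == 1 {            # first line of each 10000-line chunk (FNR restarts at 1 per file)
        if (FNR == 1) c = 0       # new input file: restart the piece counter
        close(out)                # close the previous piece so open file handles do not pile up
        out = FILENAME
        sub(/\.txt$/, "-" (++c) ".txt", out)   # filename1.txt -> filename1-1.txt, filename1-2.txt, ...
    }
    { print > out }               # write the current line to the current piece
' *.txt

ls
```

Unlike the -a1 split commands, this numbers pieces from 1 and has no fixed upper limit on the number of pieces.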
answered Oct 22 '22 at 01:10 by Ed Morton