Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Merging CSV files : Appending instead of merging

So basically i want to merge a couple of CSV files. Im using the following script to do that :

paste -d , *.csv > final.txt 

However this has worked for me in the past but this time it doesn't work. It appends the data next to each other as opposed to below each other. For instance two files that contain records in the following format

CreatedAt   ID Mon Jul 07 20:43:47 +0000 2014  4.86249E+17 Mon Jul 07 19:58:29 +0000 2014  4.86238E+17 Mon Jul 07 19:42:33 +0000 2014  4.86234E+17 

When merged give

CreatedAt   ID CreatedAt    ID Mon Jul 07 20:43:47 +0000 2014  4.86249E+17 Mon Jul 07 18:25:53 +0000 2014  4.86215E+17 Mon Jul 07 19:58:29 +0000 2014  4.86238E+17 Mon Jul 07 17:19:18 +0000 2014  4.86198E+17 Mon Jul 07 19:42:33 +0000 2014  4.86234E+17 Mon Jul 07 15:45:13 +0000 2014  4.86174E+17                                             Mon Jul 07 15:34:13 +0000 2014  4.86176E+17 

Would anyone know what the reason behind this is? Or what i can do to force merge below records?

like image 905
user2233834 Avatar asked Jul 08 '14 21:07

user2233834


People also ask

How do I merge CSV files in Excel?

Type “copy *. csv merged-csv-files. csv” in the command line, right after the folder path. Once you finish, press Enter.

How do I merge CSV files in pandas?

To merge all CSV files, use the GLOB module. The os. path. join() method is used inside the concat() to merge the CSV files together.


1 Answers

Assuming that all the csv files have the same format and all start with the same header, you can write a little script as the following to append all files in only one and to take only one time the header.

#!/bin/bash OutFileName="X.csv"                       # Fix the output name i=0                                       # Reset a counter for filename in ./*.csv; do   if [ "$filename"  != "$OutFileName" ] ;      # Avoid recursion   then     if [[ $i -eq 0 ]] ; then        head -1  "$filename" >   "$OutFileName" # Copy header if it is the first file    fi    tail -n +2  "$filename" >>  "$OutFileName" # Append from the 2nd line each file    i=$(( $i + 1 ))                            # Increase the counter  fi done 

Notes:

  • The head -1 or head -n 1 command print the first line of a file (the head).
  • The tail -n +2 prints the tail of a file starting from the lines number 2 (+2)
  • Test [ ... ] is used to exclude the output file from the input list.
  • The output file is rewritten each time.
  • The command cat a.csv b.csv > X.csv can be simply used to append a.csv and b csv in a single file (but you copy 2 times the header).

The paste command pastes the files one on a side of the other. If a file has white spaces as lines you can obtain the output that you reported above.
The use of -d , asks to paste command to define fields separated by a comma ,, but this is not the case for the format of the files you reported above.

The cat command instead concatenates files and prints on the standard output, that means it writes one file after the other.

Refer to man head or man tail for the syntax of the single options (some version allows head -1 other instead head -n 1)...

like image 182
Hastur Avatar answered Oct 14 '22 17:10

Hastur