Retaining n most recent directories in a backup script

I have a directory, /home/backup/, that stores yearly backups. Inside the backup folder, we have these directories:

/home/backup/2012
/home/backup/2013
/home/backup/2014
/home/backup/2015
/home/backup/2016
/home/backup/2017

and every year I have to clean up the data, keeping only the last three years of backup.

In the above case, I have to delete:

/home/backup/2012
/home/backup/2013
/home/backup/2014

What is the best way to find the directories to be deleted? I have this but it doesn't work:

find /home/ecentrix/recording/ -maxdepth 1 -mindepth 1 -type d -ctime +1095 -exec rm -rf {} \;

Do you guys have another idea to do that?

asked Jun 07 '17 by abuybuy


4 Answers

A more generic solution

I think it is best to traverse the directories in descending order and then delete the ones after the third. This way, there is no danger of losing a directory when the script is run again and again:

#!/bin/bash
backups_to_keep=3
count=0
cd /home/backup
while read -d '' -r dir; do
  [[ -d "$dir" ]]                || continue  # skip if not directory
  ((++count <= backups_to_keep)) && continue  # skip if we are within retaining territory
  echo "Removing old backup directory '$dir'" # it is good to log what was cleaned up
  echo rm -rf -- "$dir"
done < <(find . -maxdepth 1 -name '[2-9][0-9][0-9][0-9]' -type d -print0 | sort -nrz)

Remove the echo before rm -rf after testing. For your example, it gives this output:

rm -rf -- ./2014
rm -rf -- ./2013
rm -rf -- ./2012
  • cd /home/backup restricts rm -rf to just that directory for extra safety
  • find . -maxdepth 1 -name '[2-9][0-9][0-9][0-9]' -type d gives the top level directories that match the glob
  • sort -nrz makes sure newer directories come first, -z processes the null terminated output of find ... -print0
  • This solution doesn't hardcode the years - it just assumes that the directory names are numerically sortable
  • It is resilient to any other files or directories being present in the backup directory
  • There are no side effects if the script is run again and again
  • This can easily be extended to support different naming conventions for the backup directory - just change the glob expression
answered Nov 17 '22 by codeforester

Consider this:

find /home/backup/2* -maxdepth 1 | sort -r | awk "NR>3" | xargs rm -rf

How this works

  1. Produce a list of filenames starting with "2", only under /home/backup/

  2. Alphabetically sort the list, in reverse order.

  3. Use AWK to filter rows out of the list. NR is the current row number, so NR>3 passes through every row after the third, i.e. everything older than the three newest. Change the 3 to however many directories you want to keep: if you only wanted the latest two years, change the 3 to a 2; if you wanted the latest 10 to be kept, make it "NR>10".

  4. Pass the resulting list to rm -rf via xargs.

Run as dedicated user, for safety

The danger here is that I'm suggesting rm -rf. This is risky. If something goes wrong, you could delete things you want to keep. I mitigate this risk by only invoking these commands as a dedicated user that ONLY has permission to delete backup files (and nothing beyond).

Merit

The merit of this approach is that when you throw it in a cron job and time advances, it'll continue to retain only the latest few directories. So this, I consider to be a general solution to your problem.
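To run it from cron, an entry along these lines would do; the script path, schedule, and log location below are hypothetical, not from the original answer:

```shell
# Hypothetical crontab entry: run the cleanup at 02:00 on January 2nd each year.
# Adjust /usr/local/bin/backup-cleanup.sh to wherever the script actually lives.
0 2 2 1 * /usr/local/bin/backup-cleanup.sh >> /var/log/backup-cleanup.log 2>&1
```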

Demonstration

To test this, I created a test directory with all the same directories you have. I altered it just to see what would be executed at the end, so I've tried:

find test01/2* -maxdepth 1 | sort -r | awk "NR>4" | xargs echo rm -rf

I used NR>4 rather than NR>3 (as you'd want) because NR>4 shows that we're selecting how many rows to remove from the list, and thus not delete.

Here's what I get: [screenshot of the demonstration session not reproduced]

In that session, the second-to-last command changed the final stage so that it no longer merely echoed the rm command but actually executed it.

I have a crude copy of a dump of this in a script as I use it on some servers of mine, you can view it here: https://github.com/docdawning/teenybackup

Required for success

This approach DEPENDS on the alphabetical sorting of whatever the find command produces. In my case, I use ISO-8601 type dates, which sort chronologically when sorted alphabetically. Your YYYY type dates totally qualify.
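A quick sanity check of that property, using made-up ISO-8601-style names: a plain reverse string sort puts the newest first.

```shell
# ISO-8601-style names sort chronologically under a plain reverse string sort.
printf '%s\n' 2016-02-01 2014-12-31 2015-06-15 | sort -r
# prints: 2016-02-01, then 2015-06-15, then 2014-12-31
```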

Additional Safety

I recommend that you change your backups to be stored as tar archives. Then you can change the rm -rf to a simple rm. This is a lot safer, though not fool-proof. Regardless, you really should run this as a dedicated otherwise unprivileged user (as you should do for any script calling a delete, in my opinion).
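As a sketch of that tar-based layout, with illustrative paths and file names (a temporary directory stands in for /home/backup so the example is safe to run):

```shell
# Sketch: archive a year's backup directory into a tarball, so the retention
# step only needs a plain rm on files rather than rm -rf on directory trees.
backup_dir=$(mktemp -d)                 # stand-in for /home/backup
mkdir -p "$backup_dir/2017"
echo "sample data" > "$backup_dir/2017/data.txt"

year=2017
tar -czf "$backup_dir/$year.tar.gz" -C "$backup_dir" "$year"
ls "$backup_dir"                        # the tarball now sits next to the directory
```

After verifying the archive (e.g. with tar -tzf), the original directory can be removed and the retention script switched from rm -rf to rm on *.tar.gz files.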

Be aware that if you start it with

find /home/backup

Then the call to xargs will include /home/backup itself, which would be a disaster, because it'd get deleted too. So you must search within that path. Instead, calling it as below would work:

find /home/backup/* 

The 2* I gave above is just a way of somewhat limiting the search operation.

Warranty

None; this is the Internet. Be careful. Test things heavily to convince yourself. Also, maybe get some offline backups too.


Finally - I previously posted this as an answer, but made the fatal mistake of representing the find command based out of /home/backup and not /home/backup/* or /home/backup/2*. This caused /home/backup to also be sent for deletion, which would be a disaster. It's a very small distinction that I've tried to be clear about above. I've deleted that previous answer and replaced it with this one.

answered Nov 17 '22 by James T Snell


Since your directories have well-defined integer names, I'd just use bash to calculate the appropriate targets:

mkdir -p backup/201{2..7} # just for testing

cd backup
rm -fr $(seq 2012 $(( $(date +"%Y") - 3)))

seq generates a list of numbers from 2012 through the current year minus 3, which are then passed to rm to blast them.
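To see what the command substitution expands to before trusting it to rm, echo it first; the current year is pinned to 2017 here purely for illustration:

```shell
# Dry run: print the rm command instead of executing it.
current_year=2017                      # the real command uses $(date +"%Y")
echo rm -fr $(seq 2012 $(( current_year - 3 )))
# prints: rm -fr 2012 2013 2014
```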

answered Nov 17 '22 by bishop


Solution

# Check if extended globbing is on
shopt extglob

# If extended globbing is off, run this line
shopt -s extglob

# Remove all files except 2015, 2016, and 2017
rm -r -i /home/backup/!(2015|2016|2017)

# Turn off extended globbing (optional)
shopt -u extglob

Explanation

shopt -s extglob allows you to match any files except the ones inside !(...). So that line means remove any file in /home/backup except 2015, 2016, or 2017.

The -i flag in rm -r -i ... allows you to interactively confirm the removal of each file. Remove -i if you want the files to be removed automatically.
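Before running the destructive version, the pattern can be previewed with echo against sample names (a temporary directory stands in for /home/backup so this is safe to run):

```shell
# Preview which directories the extglob pattern selects, without deleting anything.
shopt -s extglob
tmp=$(mktemp -d)                       # stand-in for /home/backup
mkdir -p "$tmp"/201{2..7}
echo "$tmp"/!(2015|2016|2017)          # prints the 2012, 2013, and 2014 paths
```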

Dynamic Dates

This solution is valid for automation (e.g. cron jobs)

# Number of latest years to keep
LATEST_YEARS=3

# Get the current year
current_year=$(date '+%Y')

# Get the first/earliest year to keep
first_year=$(( current_year - LATEST_YEARS + 1 ))

# Turn on extended globbing
shopt -s extglob

# Store years to keep in an array
keep_years=( $(seq $first_year $current_year) )

# Specify files to keep
rm -r /home/backup/!(${keep_years[0]}|${keep_years[1]}|${keep_years[2]})

NOTE: ALL FILES IN BACKUP DIRECTORY WILL BE REMOVED EXCEPT LAST 3 YEARS
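Note that the rm line above hardcodes three array indices, so it would silently drift if LATEST_YEARS were changed. A sketch that builds the alternation for any count, relying on seq's -s separator option (the year is pinned here for illustration):

```shell
shopt -s extglob
LATEST_YEARS=3
current_year=2017                                  # in a cron job: $(date '+%Y')
first_year=$(( current_year - LATEST_YEARS + 1 ))

# Build an extglob alternation such as 2015|2016|2017 for any LATEST_YEARS
keep_pattern=$(seq -s '|' "$first_year" "$current_year")
echo rm -r /home/backup/!($keep_pattern)           # drop echo once verified
```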

answered Nov 17 '22 by Shammel Lee