
Matching patterns from a file returns multiple same outputs in bash

Tags: bash, sed

I'm trying to extract a list of files defined in my .gitattributes file in bash.

The .gitattributes file looks like this

#
# Exclude these files from release archives.
# This will also make them unavailable when using Composer with `--prefer-dist`.
# https://blog.madewithlove.be/post/gitattributes/
#
/.git export-ignore
/.github export-ignore
/bin export-ignore
/wp-content/themes/**/.storybook export-ignore
/wp-content/themes/**/assets export-ignore
/wp-content/themes/**/storybook export-ignore
/wp-content/themes/**/tests export-ignore
/wp-content/themes/**/.editorconfig export-ignore
/wp-content/themes/**/.env.testing export-ignore
/wp-content/themes/**/.eslintignore export-ignore
/wp-content/themes/**/.eslintrc export-ignore
/wp-content/themes/**/.gitignore export-ignore
/wp-content/themes/**/.stylelintrc export-ignore
/wp-content/themes/**/babel.config.js export-ignore
/wp-content/themes/**/composer.json export-ignore
/wp-content/themes/**/composer.lock export-ignore
/wp-content/themes/**/package.json export-ignore
/wp-content/themes/**/package-lock.json export-ignore
/wp-content/themes/**/phpcs.xml.dist export-ignore
/wp-content/themes/**/phpstan.neon export-ignore
/wp-content/themes/**/phpstan.neon.dist export-ignore
/wp-content/themes/**/postcss.config.js export-ignore
/wp-content/themes/**/webpack.config.js export-ignore
/wp-content/themes/**/CODE_OF_CONDUCT.md export-ignore

composer.lock -diff
yarn.lock -diff
package.lock -diff

#
# Auto detect text files and perform LF normalization
# http://davidlaing.com/2012/09/19/customise-your-gitattributes-to-become-a-git-ninja/
#
* text=auto

#
# The above will handle all files NOT found below
#
*.md text
*.php text
*.inc text

My bash script is inside the bin/ folder, and my .gitattributes is at the root of the project.

sh bin/test.sh path

The script looks like this

#!/bin/bash

#$1 - current_path variable (root)
file_list=()

while read -r line; do
  if [[ "$line" =~ (\/wp-content\/themes\/\*\*/) ]]; then
    newline=$(echo "$line" | sed 's/ export-ignore//p' | sed 's/\/wp-content\/themes\/\*\*\///p')
    file_list+=("$newline")
  fi
done <"$1"/.gitattributes

echo "${file_list[@]}"

But this returns every file four times. When I run it I get

.storybook
.storybook
.storybook
.storybook assets
assets
assets
assets storybook
storybook
storybook
storybook tests
tests
tests
tests .editorconfig
.editorconfig
.editorconfig
.editorconfig .env.testing
.env.testing
.env.testing
.env.testing .eslintignore
.eslintignore
.eslintignore
.eslintignore .eslintrc
.eslintrc
.eslintrc
.eslintrc .gitignore
.gitignore
.gitignore
.gitignore .stylelintrc
.stylelintrc
.stylelintrc
.stylelintrc babel.config.js
babel.config.js
babel.config.js
babel.config.js composer.json
composer.json
composer.json
composer.json composer.lock
composer.lock
composer.lock
composer.lock package.json
package.json
package.json
package.json package-lock.json
package-lock.json
package-lock.json
package-lock.json phpcs.xml.dist
phpcs.xml.dist
phpcs.xml.dist
phpcs.xml.dist phpstan.neon
phpstan.neon
phpstan.neon
phpstan.neon phpstan.neon.dist
phpstan.neon.dist
phpstan.neon.dist
phpstan.neon.dist postcss.config.js
postcss.config.js
postcss.config.js
postcss.config.js webpack.config.js
webpack.config.js
webpack.config.js
webpack.config.js CODE_OF_CONDUCT.md
CODE_OF_CONDUCT.md
CODE_OF_CONDUCT.md
CODE_OF_CONDUCT.md

Expected output:

.storybook
assets
storybook
tests
.editorconfig
.env.testing
.eslintignore
.eslintrc
.gitignore
.stylelintrc
babel.config.js
composer.json
composer.lock
package.json
package-lock.json
phpcs.xml.dist
phpstan.neon
phpstan.neon.dist
postcss.config.js
webpack.config.js
CODE_OF_CONDUCT.md

What am I doing wrong?

asked Mar 01 '26 07:03 by dingo_d


1 Answer

As others will likely point out, there are other (simpler, more efficient) ways to do what the OP is looking to do; the objective of this answer is to address the behavior of the OP's current sed code.
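
For reference only, one such simpler approach might be a single sed call that both selects and trims the matching lines. This is a rough sketch rather than a recommendation: the here-document is made-up sample input standing in for the real .gitattributes, and the variable name result is just for illustration.

```shell
# Sample input (hypothetical, a shortened stand-in for the real .gitattributes).
# -n suppresses sed's automatic printing; the p flag prints only the lines
# where the substitution matched, with the prefix and suffix stripped.
result=$(sed -n 's|^/wp-content/themes/\*\*/\(.*\) export-ignore$|\1|p' <<'EOF'
# a comment line
/bin export-ignore
/wp-content/themes/**/.storybook export-ignore
/wp-content/themes/**/composer.json export-ignore
composer.lock -diff
EOF
)

printf '%s\n' "$result"
```

Against the real file, the here-document would be replaced by a file argument such as "$1"/.gitattributes. The rest of this answer stays with the OP's original code.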

By default sed will pass input through to stdout. Consider:

$ line='/wp-content/themes/**/.storybook export-ignore'
$ echo "${line}" | sed 's/ export-ignore//'
/wp-content/themes/**/.storybook

By adding the p directive to the sed command you are telling sed to explicitly print the line whenever the substitution succeeds, in addition to the default output. Consider:

$ line='/wp-content/themes/**/.storybook export-ignore'
$ echo "${line}" | sed 's/ export-ignore//p'
/wp-content/themes/**/.storybook
/wp-content/themes/**/.storybook

As you can see, we get 2 copies of the output: one from sed's normal behavior and one from the additional p directive.

If you want to use the p directive and eliminate the 'duplicate' output you can add the -n (aka --quiet/--silent) flag which disables sed's default behavior of passing input through to stdout. Consider:

$ line='/wp-content/themes/**/.storybook export-ignore'
$ echo "${line}" | sed -n 's/ export-ignore//p'
/wp-content/themes/**/.storybook

Because you have 2 sed commands using the p directive, while not using the -n flag, you end up with a total of 4 copies of each matching input (the first sed generating 2 lines of output; the second sed then doubling the output again).
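
The doubling can be checked by counting lines at each stage. The following is a small sketch using the same sample line as above; the variable names one_stage and two_stage are only for illustration.

```shell
line='/wp-content/themes/**/.storybook export-ignore'

# One sed with the p directive and no -n: the line comes out twice.
one_stage=$(printf '%s\n' "$line" | sed 's/ export-ignore//p' | wc -l)

# Two such seds chained: each of those two lines is printed twice again.
two_stage=$(printf '%s\n' "$line" \
  | sed 's/ export-ignore//p' \
  | sed 's/\/wp-content\/themes\/\*\*\///p' \
  | wc -l)

# $(( )) strips any padding that some wc implementations add to the count.
echo "one stage: $((one_stage)) lines, two stages: $((two_stage)) lines"
```

This should report 2 lines after the first sed and 4 after the second, which matches the four copies per file seen in the question.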

To remove the 'duplicates' there are a couple of options:

  • remove the p directive from both sed commands or ...
  • add the -n flag to both sed commands
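
Both options can be sketched side by side on the sample line used earlier. The variable names without_p and with_n are only for illustration; either pipeline could stand in for the newline=$(...) assignment in the OP's loop.

```shell
line='/wp-content/themes/**/.storybook export-ignore'

# Option 1: drop the p directive and rely on sed's automatic printing.
without_p=$(printf '%s\n' "$line" \
  | sed 's/ export-ignore//' \
  | sed 's/\/wp-content\/themes\/\*\*\///')

# Option 2: keep the p directive, but add -n to suppress automatic printing.
with_n=$(printf '%s\n' "$line" \
  | sed -n 's/ export-ignore//p' \
  | sed -n 's/\/wp-content\/themes\/\*\*\///p')

# Each variable now holds a single copy of the trimmed name.
printf '%s\n' "$without_p" "$with_n"
```

One difference to keep in mind: with -n and p, a line on which the substitution does not match produces no output at all, whereas without p such a line passes through unchanged. In the OP's script the surrounding if already filters the lines, so both variants should behave the same there.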

answered Mar 04 '26 10:03 by markp-fuso


