
filter unique parameters from file

Tags: grep, awk

I have a file that contains URLs with query parameters, like the following:

https://example.com/endpoint/?param1=123&param2=1212
https://example.com/endpoint/?param3=123&param1=98989
https://example.com/endpoint/endpoint3/?param2=123
https://example.com/endpoint/endpoint2/?param1=123
https://example.com/endpoint/endpoint2/
https://example.com/endpoint/endpoint5/"//i.example.com/00/s/Nzk5WDEwMjQ=/z/47IAAOSwBu5hXIKF

I need to keep only the URLs with unique parameters. The desired output is:

https://example.com/endpoint/?param1=123&param2=1212
https://example.com/endpoint/?param3=123&param1=98989
https://example.com/endpoint/endpoint3/?param2=123

I managed to keep only the URLs that have parameters using grep:

grep -E '(\?[a-zA-Z0-9]{1,9}\=)'

But I also need to deduplicate by parameter at the same time, so I tried awk with the same regex, but it gives an error:

awk '{sub(\?[a-zA-Z0-9]{1,9}\=)} !seen[$0]++'

Update:

I am sorry for editing the desired output, but when I tried the scripts I found that there is a lot of garbage in my file that needs to be filtered out too. I tried @James Brown's answer with some editing, and it looks good except for the last line, which unfortunately does not get filtered out:

awk -F '?|&' '$2&&!a[$2]++'

To be clearer about why that output is what I want: the 1st line is chosen because it has at least param1, the 2nd line because it has at least param3, and the 3rd line because it has at least param2. The comparison should pick lines by unique parameter, whether or not that parameter is joined to others with the & character.

Emad asked Oct 24 '21 14:10



3 Answers

Edited version, after the requirements changed somewhat:

$ awk -F? '{                   # ? as field delimiter
    split($2,b,/&/)            # split at & to get whats between ? and &
    if(b[1]!=""&&!a[b[1]]++)   # no ? means no $2
        print
}' file

Output as expected. Original answer was:

A short one:

$ awk -F? '$2&&!a[$2]++' file

Explained: split records at ? (-F?), and if there is a second field ($2) and (&&) it is unique so far, tracked by counting occurrences of the parameters in the array a (!a[$2]++), output the record.
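To make the difference between the two versions concrete, here is a minimal sketch that runs both on the sample lines from the question. The function names (sample_urls, whole_query, first_param) are hypothetical helpers introduced only for this illustration, and both filters read standard input instead of a file.

```shell
# Sample lines from the question, emitted by a hypothetical helper
sample_urls() {
cat <<'EOF'
https://example.com/endpoint/?param1=123&param2=1212
https://example.com/endpoint/?param3=123&param1=98989
https://example.com/endpoint/endpoint3/?param2=123
https://example.com/endpoint/endpoint2/?param1=123
https://example.com/endpoint/endpoint2/
https://example.com/endpoint/endpoint5/"//i.example.com/00/s/Nzk5WDEwMjQ=/z/47IAAOSwBu5hXIKF
EOF
}

# Original one-liner: the key is the whole query string after "?"
whole_query() { awk -F'?' '$2 && !a[$2]++'; }

# Edited version: the key is only the first name=value pair
first_param() {
    awk -F'?' '{
        split($2, b, /&/)                       # b[1] is the text between ? and the first &
        if (b[1] != "" && !a[b[1]]++) print     # skip lines without a query string
    }'
}

sample_urls | whole_query
echo ---
sample_urls | first_param
```

On this input the first filter still prints the endpoint2/?param1=123 line, because that whole query string has not occurred before, while the second filter drops it, since param1=123 was already the first pair on line 1.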

James Brown answered Nov 09 '22 16:11


EDIT: The following solution may help when the query string contains both ? and &, and both parts should be considered when removing duplicates.

awk '
/\?/{
  match($0,/\?[^&]*/)
  val=substr($0,RSTART,RLENGTH)
  match($0,/&.*/)
  if(!seen[val]++ && !seen[substr($0,RSTART,RLENGTH)]++){
    print
  }
}' Input_file


2nd solution (may help when the query string has no & parameters): with your shown samples, please try the following awk program.

awk 'match($0,/\?.*$/) && !seen[substr($0,RSTART,RLENGTH)]++' Input_file

Or the above can be shortened as follows (as per Ed's suggestion). The parentheses around the assignment are needed because = binds more loosely than &&:

awk '(s=index($0,"?")) && !seen[substr($0,s)]++' Input_file

Explanation: the match function matches everything from ? to the end of the line. The AND condition then makes sure that only the first occurrence of each matched value is printed.
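As a rough illustration of what each variant uses as its key, the sketch below prints the matched part with match/RSTART/RLENGTH and then deduplicates with index. The function names and the four short input lines are made up for this example and are not from the question.

```shell
# Show the key that the match() variant would store in seen[]
key_by_match() {
    awk 'match($0, /\?.*$/) { print substr($0, RSTART, RLENGTH) }'
}

# Deduplicate on everything from the first "?" onward.
# The assignment is parenthesized because = binds more loosely than &&.
dedup_by_query() {
    awk '(s = index($0, "?")) && !seen[substr($0, s)]++'
}

# Hypothetical input: two paths share the same query string, one line has none
printf '%s\n' 'a/?x=1' 'b/?x=1' 'c/?y=2' 'd/' | key_by_match
echo ---
printf '%s\n' 'a/?x=1' 'b/?x=1' 'c/?y=2' 'd/' | dedup_by_query
```

Lines without a ? never reach the seen[] test, because match and index both return 0 for them.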

RavinderSingh13 answered Nov 09 '22 18:11


With GNU awk, you can also match the URL up to the first question mark and then capture what follows using your initial pattern for the first parameter name, followed by = and any characters except &: ([a-zA-Z0-9]{1,9}=[^&]+)

Then you can use the !seen[$0]++ idiom with the value of capture group 1 in place of $0.

awk '
match($0, /https?:\/\/[^?]+\?([a-zA-Z0-9]{1,9}=[^&]+)/, arr) && !seen[arr[1]]++
' file

Output

https://example.com/endpoint/?param1=123&param2=1212
https://example.com/endpoint/?param3=123&param1=98989
https://example.com/endpoint/endpoint3/?param2=123

Using awk, you can check that the string starts with the protocol and contains a question mark.

Then, to use only the first parameter, split on ? and & and use the second element of the result as the key for seen:

awk '
/^https?:\/\/[^?]*\?/ && split($0, arr, /[?&]/) > 1 && !seen[arr[2]]++
' file
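If only the name of the first parameter should count, and not its value, the same split can be extended slightly. This is a sketch under that assumption about the asker's intent; the function name first_name_unique and the example URLs are hypothetical.

```shell
# Keep a URL only if the NAME of its first parameter has not been seen yet
# (assumption: parameter values are ignored when comparing)
first_name_unique() {
    awk '
    /^https?:\/\/[^?]*\?/ {
        n = split($0, arr, /[?&]/)      # arr[2] is the first name=value pair
        if (n < 2) next
        name = arr[2]
        sub(/=.*/, "", name)            # drop "=value", keeping only the name
        if (name != "" && !seen[name]++) print
    }'
}

printf '%s\n' \
    'https://example.com/a/?param1=123&param2=1' \
    'https://example.com/b/?param1=999' \
    'https://example.com/c/?param2=5' \
    'https://example.com/d/' | first_name_unique
```

Here the second URL is dropped because param1 was already the first parameter of an earlier line, even though its value differs; the third URL is kept because only first parameters are recorded.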

The fourth bird answered Nov 09 '22 16:11