I'd like to print the class headers of all *.java files, recursively in all sub-directories, whose class declaration has more than two type parameters (i.e. the parameters within <R ... H> in the samples below). One of the files looks like this (with names reduced for brevity):
multiple-lines.java
class ClazzA<R extends A,
S extends B<T>, T extends C<T>,
U extends D, W extends E,
X extends F, Y extends G, Z extends H>
extends OtherClazz<S> implements I<T> {
public void method(Type<Q, R> x) {
// ... code ...
}
}
with expected output:
ClazzA.java:10: class ClazzA<R extends A,
ClazzA.java:11: S extends B<T>, T extends C<T>,
ClazzA.java:12: U extends D, W extends E,
ClazzA.java:13: X extends F, Y extends G, Z extends H>
ClazzA.java:14: extends OtherClazz<S> implements I<T> {
but another could look like this, as well:
single-line.java
class ClazzB<R extends A, S extends B<T>, T extends C<T>, U extends D, W extends E, X extends F, Y extends G, Z extends H> extends OtherClazz<S> implements I<T> {
public void method(Type<Q, R> x) {
// ... code ...
}
}
with expected output:
ClazzB.java:42: class ClazzB<R extends A, S extends B<T>, T extends C<T>, U extends D, W extends E, X extends F, Y extends G, Z extends H> extends OtherClazz<S> implements I<T> {
Files that should not be considered/printed:
X-no-parameter.java
class ClazzC /* no type parameter */ extends OtherClazz<S> implements I<T> {
public void method(Type<A, B> x) {
// ... code ...
}
}
X-one-parameter.java
class ClazzD<R extends A> // only one type parameter
extends OtherClazz<S> implements I<T> {
public void method(Type<X, Y> x) {
// ... code ...
}
}
X-two-parameters.java
class ClazzE<R extends A, S extends B<T>> // only two type parameters
extends OtherClazz<S> implements I<T> {
public void method(Type<X, Y> x) {
// ... code ...
}
}
X-two-line-parameters.java
class ClazzF<R extends A, // only two type parameters
S extends B<T>> // on two lines
extends OtherClazz<S> implements I<T> {
public void method(Type<X, Y> x) {
// ... code ...
}
}
All the spaces in the files could be \s+. extends [...] and implements [...] immediately prior to { are optional. extends [...] is also optional at each of the type parameters. See The Java® Language Specification, 8.1. Class Declarations for details.
I'm using gawk in Git Bash:
$ gawk --version
GNU Awk 5.0.0, API: 2.0 (GNU MPFR 4.0.2, GNU MP 6.2.0)
with:
find . -type f -name '*.java' | xargs gawk -f ws-class-type-parameter.awk > ws-class-type-parameter.log
and ws-class-type-parameter.awk:
# /start/ , /end/ ... pattern
#/class ClazzA<.*,.*/ , /{/ { # 5 lines, OK for ClazzA, but in real it prints classes with 2 or less type parameters, too
#/class ClazzA<.*,.*,/ , /{/ { # no line with ClazzA, since there's no second ',' on its first line
#/class ClazzA<.*,.*,/s , /{/ { # 500.000+(!) lines
#/class ClazzA<.*,.*,/s , /{/U { # 500.000+(!) lines
#/class ClazzA<.*,.*,/sU , /{/U { # 500.000+(!) lines
/(?s)class ClazzA<.*,.*,/ , /{/ { # no line
match( FILENAME, "/.*/.." )
print substr( FILENAME, RLENGTH ) ":" FNR ": " $0
}
This finds all the *.java files...great, executes gawk with each of them...great, but you can see the results as comments after each of my attempts. Please note: the ClazzA literal is just for testing and for the MCVE here. It would be \w+ in the real script, but with 500.000+ lines in thousands of files while testing...
It works if I try it on regex101.com. Well, sort of. I couldn't find a way to define /start-regex/,/end-regex/ there, so I added another .* in between.
I took the flags from there, but I couldn't find any documentation on whether gawk supports the flag syntax /.../sU , /.../U, so I just gave it a try. A now-deleted comment told me that no flavour of awk supports this.
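A likely explanation for the 500.000+ lines, as far as I understand awk's grammar: a letter after the closing slash is not a flag. /re/s is parsed as the match result (0 or 1) concatenated with the unset variable s, which yields the non-empty string "0" or "1", and a non-empty string is always true when used as a pattern. A minimal sketch of that effect (the input and the flag_demo name are made up for illustration):

```shell
# Hypothetical demo: the trailing "s" is a variable, not a regex flag.
# /abc/s is ($0 ~ /abc/) concatenated with the empty variable s, i.e. the
# string "0" or "1", and any non-empty string counts as true.
flag_demo() {
  printf 'abc\nxyz\n' | awk '/abc/s { print "selected: " $0 }'
}
flag_demo
```

If that reading is right, both input lines are selected, although only the first one contains abc. In a /start/s , /end/ range this would start a new range on every line, so nearly everything gets printed.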
I also tried it with grep:
$ grep --version
grep (GNU grep) 3.1
...
$ grep -nrPf types.grep *.java
with types.grep:
(?s).*class\s+\w+\s*<.*,.*,.*>.*{
which results in output for single-line.java only. (?s) is --perl-regexp (-P) syntax, and grep --help claims to support this.
The solution in Ed Morton's answer works well but it turned out that there are auto-generated files with methods like:
/** more code before here */
public void setId(String value) {
this.id = value;
}
/**
* Gets a map that contains attributes that aren't bound to any typed property on this class.
*
* <p>
* the map is keyed by the name of the attribute and
* the value is the string value of the attribute.
*
* the map returned by this method is live, and you can add new attribute
* by updating the map directly. Because of this design, there's no setter.
*
*
* @return
* always non-null
*/
public Map<QName, String> getOtherAttributes() {
return otherAttributes;
}
which give an output of e.g.:
AbstractAddressType.java:81: * Gets a map that contains attributes that aren't bound to any typed property on this class.
AbstractAddressType.java:82: *
AbstractAddressType.java:83: * <p>
AbstractAddressType.java:84: * the map is keyed by the name of the attribute and
AbstractAddressType.java:85: * the value is the string value of the attribute.
AbstractAddressType.java:86: *
AbstractAddressType.java:87: * the map returned by this method is live, and you can add new attribute
AbstractAddressType.java:88: * by updating the map directly. Because of this design, there's no setter.
AbstractAddressType.java:89: *
AbstractAddressType.java:90: *
AbstractAddressType.java:91: * @return
AbstractAddressType.java:92: * always non-null
AbstractAddressType.java:93: */
AbstractAddressType.java:94: public Map<QName, String> getOtherAttributes() {
and others with class comments and annotations like:
/**
* This class was generated by Apache CXF 3.3.4
* 2020-11-30T12:03:21.251+01:00
* Generated source version: 3.3.4
*
*/
@WebService(targetNamespace = "urn:SZRServices", name = "SZR")
@XmlSeeAlso({at.gv.egov.pvp1.ObjectFactory.class, org.w3._2001._04.xmldsig_more_.ObjectFactory.class, ObjectFactory.class, org.xmlsoap.schemas.ws._2002._04.secext.ObjectFactory.class, org.w3._2000._09.xmldsig_.ObjectFactory.class, at.gv.e_government.reference.namespace.persondata._20020228_.ObjectFactory.class})
public interface SZR {
// more code after here
with an output of e.g.:
SZR.java:13: * This class was generated by Apache CXF 3.3.4
SZR.java:14: * 2020-10-12T11:51:35.175+02:00
SZR.java:15: * Generated source version: 3.3.4
SZR.java:16: *
SZR.java:17: */
SZR.java:18: @WebService(targetNamespace = "urn:SZRServices", name = "SZR")
SZR.java:19: @XmlSeeAlso({at.gv.egov.pvp1.ObjectFactory.class, org.w3._2001._04.xmldsig_more_.ObjectFactory.class, ObjectFactory.class, org.xmlsoap.schemas.ws._2002._04.secext.ObjectFactory.class, org.w3._2000._09.xmldsig_.ObjectFactory.class, at.gv.e_government.reference.namespace.persondata._20020228_.ObjectFactory.class})
Using any POSIX awk in any shell on every UNIX box:
$ cat tst.awk
/[[:space:]]*class[[:space:]]*/ {
    inDef = 1
    fname = FILENAME
    sub(".*/","",fname)
    def = out = ""
}
inDef {
    out = out fname ":" FNR ": " $0 ORS
    # Remove comments (not perfect but should work for 99.9% of cases)
    sub("//.*","")
    gsub("/[*]|[*]/","\n")
    gsub(/\n[^\n]*\n/,"")
    def = def $0 ORS
    if ( /{/ ) {
        # More than two type parameters means at least two commas in the header
        if ( gsub(/,/,"&",def) >= 2 ) {
            printf "%s", out
        }
        inDef = 0
    }
}
$ find tmp -type f -name '*.java' -exec awk -f tst.awk {} +
multiple-lines.java:1: class ClazzA<R extends A,
multiple-lines.java:2: S extends B<T>, T extends C<T>,
multiple-lines.java:3: U extends D, W extends E,
multiple-lines.java:4: X extends F, Y extends G, Z extends H>
multiple-lines.java:5: extends OtherClazz<S> implements I<T> {
single-line.java:1: class ClazzB<R extends A, S extends B<T>, T extends C<T>, U extends D, W extends E, X extends F, Y extends G, Z extends H> extends OtherClazz<S> implements I<T> {
The above was run using this input:
$ head tmp/*
==> tmp/X-no-parameter.java <==
class ClazzC /* no type parameter */ extends OtherClazz<S> implements I<T> {
public void method(Type<A, B> x) {
// ... code ...
}
}
==> tmp/X-one-parameter.java <==
class ClazzD<R extends A> // only one type parameter
extends OtherClazz<S> implements I<T> {
public void method(Type<X, Y> x) {
// ... code ...
}
}
==> tmp/X-two-line-parameters.java <==
class ClazzF<R extends A, // only two type parameters
S extends B<T>> // on two lines
extends OtherClazz<S> implements I<T> {
public void method(Type<X, Y> x) {
// ... code ...
}
}
==> tmp/X-two-parameters.java <==
class ClazzE<R extends A, S extends B<T>> // only two type parameters
extends OtherClazz<S> implements I<T> {
public void method(Type<X, Y> x) {
// ... code ...
}
}
==> tmp/multiple-lines.java <==
class ClazzA<R extends A,
S extends B<T>, T extends C<T>,
U extends D, W extends E,
X extends F, Y extends G, Z extends H>
extends OtherClazz<S> implements I<T> {
public void method(Type<Q, R> x) {
// ... code ...
}
}
==> tmp/single-line.java <==
class ClazzB<R extends A, S extends B<T>, T extends C<T>, U extends D, W extends E, X extends F, Y extends G, Z extends H> extends OtherClazz<S> implements I<T> {
public void method(Type<Q, R> x) {
// ... code ...
}
}
The above is just a best effort, without writing a parser for the language and with only the OP's posted sample input/output to go on for what needs to be handled.
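For the Javadoc and annotation false positives reported in the question, one possible tightening is to start only on lines that begin with a declaration keyword, and to count only the commas at the top level of the <...> group that directly follows the type name. This is a sketch, not a parser: the function names list_generic_headers and type_param_commas, the list of modifiers, and the choice to also accept interface are my assumptions, and block comments inside a header are not handled.

```shell
# Hypothetical helper: print headers of types with more than two type parameters.
list_generic_headers() {
  awk '
    # Count commas at depth 1 of the <...> group that directly follows the
    # type name; commas inside nested <...> or after the group are ignored.
    function type_param_commas(s,    i, c, depth, n) {
      if (!match(s, /(class|interface)[ \t]+[A-Za-z0-9_]+[ \t]*/)) return 0
      s = substr(s, RSTART + RLENGTH)
      if (substr(s, 1, 1) != "<") return 0
      n = 0; depth = 0
      for (i = 1; i <= length(s); i++) {
        c = substr(s, i, 1)
        if (c == "<") depth++
        else if (c == ">") { if (--depth == 0) break }
        else if (c == "," && depth == 1) n++
      }
      return n
    }
    # Start only where a line begins with optional modifiers and then the
    # keyword, so Javadoc text such as "on this class." is not a start.
    !inDef && /^[ \t]*((public|protected|private|abstract|final|static)[ \t]+)*(class|interface)[ \t]+[A-Za-z0-9_]/ {
      inDef = 1
      def = out = ""
      fname = FILENAME
      sub(".*/", "", fname)
    }
    inDef {
      out = out fname ":" FNR ": " $0 ORS
      line = $0
      sub("//.*", "", line)   # drop line comments (approximate)
      def = def line " "
      if (index(line, "{")) {
        # more than two type parameters means at least two top-level commas
        if (type_param_commas(def) >= 2) printf "%s", out
        inDef = 0
      }
    }
  ' "$@"
}
```

It could be called as list_generic_headers tmp/*.java, or the awk program could be saved to a file and run through find ... -exec awk -f as above. Counting only depth-1 commas means that bounds such as B<T, U> and a list like implements I, J no longer influence the result.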
Note: Presence of comments can cause these solutions to fail.
With ripgrep (https://github.com/BurntSushi/ripgrep):
rg -nU --no-heading '(?s)class\s+\w+\s*<[^{]*,[^{]*,[^{]*>[^{]*\{' *.java
-n enables line numbering (this is the default if output is to the terminal)
-U enables multiline matching
--no-heading: by default, ripgrep displays matching lines grouped under the filename as a header; this option makes ripgrep behave like GNU grep, with a filename prefix for each output line
[^{]* is used instead of .* to prevent matching , and > elsewhere in the file; otherwise lines like public void method(Type<Q, R> x) { will get matched
The -m option can be used to limit the number of matches per input file, which gives the additional benefit of not having to search the entire input file
If you use the above regexp with GNU grep, note that:
grep matches only one line at a time. If you use the -z option, grep will consider ASCII NUL as the record separator, which effectively gives you the ability to match across multiple lines, assuming the input doesn't have NUL characters that can prevent such matching. Another effect of the -z option is that a NUL character will be appended to each output result (this could be fixed by piping the results to tr '\0' '\n')
The -o option will be needed to print only the matching portion, which means you won't be able to get a line number prefix
-P isn't needed; grep -zoE 'class\s+\w+\s*<[^{]*,[^{]*,[^{]*>[^{]*\{' *.java | tr '\0' '\n' will give you a similar result to the ripgrep command. But you won't get a line number prefix, the filename prefix will appear only once for each matching portion instead of on each matching line, and you won't get the rest of the line before class and after {
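A small sketch of the -z behaviour described above, assuming GNU grep (the demo file and the grep_headers name are made up for illustration):

```shell
# Hypothetical demo file with a header spread over two lines.
demo="$(mktemp -d)"
printf '%s\n' 'class ClazzA<R extends A,' 'S extends B, T extends C> {' '}' > "$demo/ClazzA.java"

# -z: the whole file is one NUL-terminated record, so [^{]* can span lines
# -o: print only the matching part; tr turns the trailing NUL into a newline
grep_headers() {
  grep -zoE 'class\s+\w+\s*<[^{]*,[^{]*,[^{]*>[^{]*\{' "$@" | tr '\0' '\n'
}
grep_headers "$demo/ClazzA.java"
```

With a single file argument there is no filename prefix; the match itself still contains the original line break, so the header comes out on two lines.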