I want to make a Java-based web crawler for an experiment. I've heard that writing a web crawler in Java is the way to go if it's your first time. However, I have two important questions.
How will my program 'visit' or 'connect' to web pages? Please give a brief explanation. (I understand the basics of the layers of abstraction from the hardware up to the software; here I am interested in the Java abstractions.)
What libraries should I use? I assume I need a library for connecting to web pages, one for the HTTP/HTTPS protocol, and one for HTML parsing.
A web crawler is basically a program that navigates the web and finds new or updated pages for indexing. The crawler begins with a set of seed URLs (often popular websites) and follows hyperlinks depth-first or breadth-first to discover new pages.
Yes, there are many powerful Java libraries used for web scraping. Two such examples are jsoup and HtmlUnit. These libraries help you connect to a web page and offer many methods to extract the desired information.
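For example, fetching a page and reading its title takes only a few lines with jsoup. A minimal sketch (the URL is just a placeholder):

import java.io.IOException;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class JsoupFetch {
    public static void main(String[] args) throws IOException {
        // connect() opens an HTTP(S) connection; get() downloads and parses the page
        Document doc = Jsoup.connect("https://stackoverflow.com/").get();
        System.out.println("Title: " + doc.title());
    }
}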
Here are the basic steps to build a crawler (see the sketch after this list):
Step 1: Add one or several URLs to the list of URLs to be visited.
Step 2: Pop a link from the URLs to be visited and add it to the list of visited URLs.
Step 3: Fetch the page's content and scrape the data you're interested in with the ScrapingBot API.
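Put together, those steps are just a queue of URLs to visit plus a set of visited URLs. Below is a minimal breadth-first sketch; it uses jsoup for the fetching step instead of the ScrapingBot API (any fetcher slots in there), and the seed URL and page limit are arbitrary choices:

import java.io.IOException;
import java.util.ArrayDeque;
import java.util.HashSet;
import java.util.Queue;
import java.util.Set;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class SimpleCrawler {
    public static void main(String[] args) {
        Queue<String> toVisit = new ArrayDeque<>();   // Step 1: URLs to be visited
        Set<String> visited = new HashSet<>();        // Step 2: visited URLs
        toVisit.add("https://stackoverflow.com/");

        while (!toVisit.isEmpty() && visited.size() < 50) {
            String url = toVisit.poll();              // Step 2: pop a link...
            if (!visited.add(url)) continue;          // ...and record it as visited
            try {
                Document doc = Jsoup.connect(url).get();   // Step 3: fetch the page
                System.out.println(url + " -> " + doc.title());
                for (Element link : doc.select("a[href]")) {
                    toVisit.add(link.attr("abs:href"));    // enqueue outgoing links
                }
            } catch (IOException e) {
                System.err.println("Failed to fetch " + url);
            }
        }
    }
}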
Crawler4j is the best solution for you.
Crawler4j is an open-source Java crawler which provides a simple interface for crawling the web. You can set up a multi-threaded web crawler in five minutes!
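Here is a sketch of that setup, adapted from crawler4j's README (the package names assume the edu.uci.ics.crawler4j artifact; the storage folder, seed URL, domain filter, and thread count are placeholders):

import edu.uci.ics.crawler4j.crawler.CrawlConfig;
import edu.uci.ics.crawler4j.crawler.CrawlController;
import edu.uci.ics.crawler4j.crawler.Page;
import edu.uci.ics.crawler4j.crawler.WebCrawler;
import edu.uci.ics.crawler4j.fetcher.PageFetcher;
import edu.uci.ics.crawler4j.robotstxt.RobotstxtConfig;
import edu.uci.ics.crawler4j.robotstxt.RobotstxtServer;
import edu.uci.ics.crawler4j.url.WebURL;

public class MyCrawler extends WebCrawler {
    @Override
    public boolean shouldVisit(Page referringPage, WebURL url) {
        // Only follow links that stay on the seed domain
        return url.getURL().toLowerCase().startsWith("https://stackoverflow.com/");
    }

    @Override
    public void visit(Page page) {
        System.out.println("Visited: " + page.getWebURL().getURL());
    }

    public static void main(String[] args) throws Exception {
        CrawlConfig config = new CrawlConfig();
        config.setCrawlStorageFolder("/tmp/crawler4j/"); // intermediate crawl data
        PageFetcher pageFetcher = new PageFetcher(config);
        RobotstxtServer robotstxtServer = new RobotstxtServer(new RobotstxtConfig(), pageFetcher);
        CrawlController controller = new CrawlController(config, pageFetcher, robotstxtServer);
        controller.addSeed("https://stackoverflow.com/");
        controller.start(MyCrawler.class, 4); // 4 concurrent crawler threads
    }
}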
There are also other Java-based web crawler tools worth a look, each with a brief explanation of what it does.
This is how your program can 'visit' or 'connect' to web pages:
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.MalformedURLException;
import java.net.URL;

try {
    URL url = new URL("http://stackoverflow.com/");
    // openStream() issues the HTTP GET and returns the response body as a stream
    try (BufferedReader reader = new BufferedReader(
            new InputStreamReader(url.openStream()))) {
        String line;
        while ((line = reader.readLine()) != null) {
            System.out.println(line);
        }
    } // try-with-resources closes the stream even if reading fails
} catch (MalformedURLException mue) {
    mue.printStackTrace();
} catch (IOException ioe) {
    ioe.printStackTrace();
}
This will download the HTML source of the page.
For HTML parsing, use a dedicated parser (sketched below).
Also take a look at jSpider and jsoup.
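For instance, once you have accumulated the downloaded lines into a string, jsoup can parse it and pull out the links. A small sketch (the HTML literal is a stand-in for whatever you downloaded):

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class ParseExample {
    public static void main(String[] args) {
        // In the snippet above, you would collect the printed lines into this string
        String html = "<html><body><a href='/questions'>Questions</a></body></html>";
        // The base URL lets jsoup resolve relative links to absolute ones
        Document doc = Jsoup.parse(html, "http://stackoverflow.com/");
        for (Element link : doc.select("a[href]")) {
            System.out.println(link.attr("abs:href")); // prints absolute URLs
        }
    }
}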