
Setting a per-request proxy (or rotating proxies) with .NET Flurl/HttpClient

Tags: c#, flurl

I know that with the Flurl HTTP .NET library I can set a global proxy by using a custom HttpClientFactory, but is there a way to choose a custom proxy for each request?

With many other programming languages, setting a proxy is as easy as setting an option. For example, with Node.js I can do:

const request = require('request');
let opts = { url: 'http://random.org', proxy: 'http://myproxy' };
request(opts, callback);

The ideal way to do that with Flurl would be something like this, which is currently not possible:

await "http://random.org".WithProxy("http://myproxy").GetAsync();

I also know that creating a FlurlClient/HttpClient for every request is not an option, because of the socket exhaustion issue, which I've experienced myself in the past as well.

The scenario for this is when you need to have a pool of proxies that are rotated in some way, so that each HTTP request potentially uses a different proxy URL.

mcont asked Jan 01 '23 18:01


1 Answer

So after some discussion with the Flurl creator (#228 and #374), the solution we've come up with is to use a custom FlurlClient manager class, which is in charge of creating the required FlurlClients and the linked HttpClient instances. This is needed because each FlurlClient can only use one proxy at a time, due to how the .NET HttpClient is designed.

If you're looking for the actual solution (and code), you can skip to the end of this answer. The following section explains the first approach I tried and why it fell short, which helps in understanding the final design.

[UPDATE: I've also built an HTTP client library that takes care of all the stuff below, allowing you to set a per-request proxy out of the box. It's called PlainHttp.]

So, the first explored idea was to create a custom FlurlClientFactory that implements the IFlurlClientFactory interface.

The factory keeps a pool of FlurlClients, and when a new request needs to be sent, the factory is invoked with the Url as the input parameter. Some logic is then performed to decide whether the request should go through a proxy or not. The URL could potentially be used as the discriminator for choosing the proxy to use for the particular request. In my case, a random proxy would be chosen for each request, and then a cached FlurlClient would be returned.

In the end, the factory would create:

  • at most one FlurlClient per proxy URL (which will be then used for all the requests that have to go through that proxy);
  • a set of clients for "normal" requests.

Some code for this solution can be found here. After registering the custom factory, there would not be much else to do. Standard requests like await "http://random.org".GetAsync(); would be automagically proxied, if the factory decided to do so.

Unfortunately, this solution has a drawback. It turns out that the custom factory is invoked multiple times while Flurl builds a single request. In my experience, it is called at least 3 times. This can lead to issues, because the factory might not return the same FlurlClient for the same input URL (for example, when the proxy is chosen at random on every call).

The solution

The solution is to build a custom FlurlClientManager class, to completely bypass the FlurlClient factory mechanism and keep a custom pool of clients that are provided on demand.

While this solution is specifically built to work with the awesome Flurl library, a very similar thing can be done using the HttpClient class directly.

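using System;
using System.Collections.Concurrent;
using System.Net;
using System.Net.Http;
using Flurl;
using Flurl.Http;

/// <summary>
/// Static class that manages cached IFlurlClient instances
/// </summary>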
public static class FlurlClientManager
{
    /// <summary>
    /// Cache for the clients
    /// </summary>
    private static readonly ConcurrentDictionary<string, IFlurlClient> Clients =
        new ConcurrentDictionary<string, IFlurlClient>();

    /// <summary>
    /// Gets a cached client for the host associated to the input URL
    /// </summary>
    /// <param name="url"><see cref="Url"/> or <see cref="string"/></param>
    /// <returns>A cached <see cref="FlurlClient"/> instance for the host</returns>
    public static IFlurlClient GetClient(Url url)
    {
        if (url == null)
        {
            throw new ArgumentNullException(nameof(url));
        }

        return PerHostClientFromCache(url);
    }

    /// <summary>
    /// Gets a cached client with a proxy attached to it
    /// </summary>
    /// <returns>A cached <see cref="FlurlClient"/> instance with a proxy</returns>
    public static IFlurlClient GetProxiedClient()
    {
        string proxyUrl = ChooseProxy();

        return ProxiedClientFromCache(proxyUrl);
    }

    private static string ChooseProxy()
    {
        // Do something and return a proxy URL
        return "http://myproxy";
    }

    private static IFlurlClient PerHostClientFromCache(Url url)
    {
        return Clients.AddOrUpdate(
            key: url.ToUri().Host,
            addValueFactory: u => {
                return CreateClient();
            },
            updateValueFactory: (u, client) => {
                return client.IsDisposed ? CreateClient() : client;
            }
        );
    }

    private static IFlurlClient ProxiedClientFromCache(string proxyUrl)
    {
        return Clients.AddOrUpdate(
            key: proxyUrl,
            addValueFactory: u => {
                return CreateProxiedClient(proxyUrl);
            },
            updateValueFactory: (u, client) => {
                return client.IsDisposed ? CreateProxiedClient(proxyUrl) : client;
            }
        );
    }

    private static IFlurlClient CreateProxiedClient(string proxyUrl)
    {
        HttpMessageHandler handler = new SocketsHttpHandler()
        {
            Proxy = new WebProxy(proxyUrl),
            UseProxy = true,
            PooledConnectionLifetime = TimeSpan.FromMinutes(10)
        };

        HttpClient client = new HttpClient(handler);

        return new FlurlClient(client);
    }

    private static IFlurlClient CreateClient()
    {
        HttpMessageHandler handler = new SocketsHttpHandler()
        {
            PooledConnectionLifetime = TimeSpan.FromMinutes(10)
        };

        HttpClient client = new HttpClient(handler);

        return new FlurlClient(client);
    }
}

This static class keeps a global pool of FlurlClients. As with the previous solution, the pool consists of:

  • one client per proxy;
  • one client per host for all the requests that mustn't go through the proxy (this is actually the default factory strategy of Flurl).

In this implementation of the class, the proxy is chosen by the class itself (using whatever policy you want, e.g. round robin or random), but it can be adapted to take a proxy URL as the input. In that case, remember that with this implementation clients are never disposed after they're created, so if the set of proxy URLs is large or changes over time, the cache keeps growing and you may want to add some eviction or disposal logic.
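As an illustration of the round-robin policy mentioned above, here is a minimal sketch of a thread-safe proxy rotator. RoundRobinProxyPool and its members are hypothetical names, not part of Flurl or .NET, and the proxy URLs you pass in are placeholders.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

/// <summary>
/// Hypothetical helper (not part of Flurl): hands out proxy URLs in round-robin order.
/// </summary>
public sealed class RoundRobinProxyPool
{
    private readonly string[] _proxies;
    private int _counter = -1;

    public RoundRobinProxyPool(IEnumerable<string> proxyUrls)
    {
        if (proxyUrls == null) throw new ArgumentNullException(nameof(proxyUrls));
        _proxies = new List<string>(proxyUrls).ToArray();
        if (_proxies.Length == 0)
            throw new ArgumentException("At least one proxy URL is required.", nameof(proxyUrls));
    }

    /// <summary>Returns the next proxy URL; safe to call from multiple threads.</summary>
    public string Next()
    {
        // Interlocked.Increment wraps around on overflow; the unsigned cast keeps the index non-negative.
        uint ticket = unchecked((uint)Interlocked.Increment(ref _counter));
        return _proxies[ticket % (uint)_proxies.Length];
    }
}
```

With something like this in place, ChooseProxy() could simply return the result of Next() on a static pool instance, and ProxiedClientFromCache would then reuse the cached client for that URL.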

This implementation also uses the SocketsHttpHandler.PooledConnectionLifetime option, available since .NET Core 2.1, to solve the DNS issues that arise when your HttpClient instances have a long lifetime. On .NET Framework, the ServicePoint.ConnectionLeaseTimeout property should be used instead.
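For readers not using Flurl, the same idea can be sketched with HttpClient alone. This is only a rough equivalent under my own assumptions: ProxiedHttpClientCache is a hypothetical name, it requires .NET Core 2.1 or later for SocketsHttpHandler, and, like the class above, it never disposes cached clients.

```csharp
using System;
using System.Collections.Concurrent;
using System.Net;
using System.Net.Http;

/// <summary>
/// Hypothetical HttpClient-only counterpart of the manager class:
/// caches one HttpClient per proxy URL so connections are reused.
/// </summary>
public static class ProxiedHttpClientCache
{
    private static readonly ConcurrentDictionary<string, Lazy<HttpClient>> Clients =
        new ConcurrentDictionary<string, Lazy<HttpClient>>();

    public static HttpClient GetClient(string proxyUrl)
    {
        if (proxyUrl == null) throw new ArgumentNullException(nameof(proxyUrl));

        // GetOrAdd may run its factory more than once under contention, but only one
        // Lazy is stored, so only one HttpClient is actually built per proxy URL.
        return Clients.GetOrAdd(proxyUrl, key => new Lazy<HttpClient>(() => Create(key))).Value;
    }

    private static HttpClient Create(string proxyUrl)
    {
        var handler = new SocketsHttpHandler
        {
            Proxy = new WebProxy(proxyUrl),
            UseProxy = true,
            // Recycle pooled connections periodically so DNS changes are picked up.
            PooledConnectionLifetime = TimeSpan.FromMinutes(10)
        };

        return new HttpClient(handler);
    }
}
```

A proxied request would then look like await ProxiedHttpClientCache.GetClient("http://myproxy").GetAsync("http://random.org");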

Using the manager class is easy. For normal requests, use:

await FlurlClientManager.GetClient(url).Request(url).GetAsync();

For proxied requests, use:

await FlurlClientManager.GetProxiedClient().Request(url).GetAsync();
mcont answered Jan 31 '23 15:01