Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Slow Ruby Regex Becomes Fast with Odd Change

Tags:

regex

ruby

pcre

I've been debugging a site to find the source of long page loading times, and I've narrowed it down to a regex that's used to extract URLs from text:

/(?:([\w+.-]+):\/\/|(?:www\.))[^\s<]+/g

This takes about 3 seconds to run on a large block of text. I found out that if I add the inverse of the first clause to the start of the regex ((?:[^\w+.-]|^)), it runs almost instantly:

/(?:[^\w+.-]|^)(?:([\w+.-]++):\/\/|(?:www\.))[^\s<]+/gx

It seems to me like the added clause shouldn't affect the regex at all, since nothing could cause that clause to fail (as those characters would be matched by the "[\w+.-]++" clause). Why does this make the regex run so much faster?

Edit

Some people have asked for an example of what I'm trying to do. To simplify things and to address the concerns people had in the comments, I'll be using the following two regexes:

# slow one
/(?:([\w+.-]+):\/\/|(?:www\.))[^\s<]+/g

# fast one
/[^\w+.-](?:([\w+.-]+):\/\/|(?:www\.))[^\s<]+/g

Fire up IRB/Pry and throw some text in a variable (this is a scrubbed version of what is actually searched against):

text = <<END_OF_TEXT
Unable to deliver message to [email protected]. Error message:  request: <soap:Envelope xmlns:soap=";http://schemas.xmlsoap.org/soap/envelope/&quot; xmlns:t=";http://schemas.microsoft.com/exchange/services/year/types&quot; xmlns:m=";http://schemas.microsoft.com/exchange/services/year/messages"><soap:Header&gt;<t:RequestServerVersion Version="ExchangeYear"/></soap:Header><soap:Body><m:CreateItem MessageDisposition="SendAndSaveCopy"><m:SavedItemFolderId><t:DistinguishedFolderId Id="stuff"/></m:SavedItemFolderId><m:Items><t:Message><t:MimeContent>RGF0ZTogRnJpLCAwMyBBcHIgMjAxNSAxNDo0MzozMCArMDMwMA0KRnJvbTogPT91dGYtOD9RPz1EMD05ND1EMD1BMT1EMD1BMl89M4j5Ba0fQrvz8atXqDIHQS4xT5dBOrGbeSsUfHFTfj6eP8blEZKl16Pgp4iA0AcFvJtCeC6s3Iq5GJbVXivtrpyKa5n3yB0f6xUQdFc95hTUleo12k0MH3rRwi0RX8wxyRaWUH81yjiXRmcjeTWAhtOCoDVb7oxOmIZAXNVQAMh05JitBFSUwVvZQuXPOo7BfGsIog4rjacpj743JsmuHxuYSl7WZQj7hV9wxvVSpE78Ey6uLAAL3yCBQ41EOSG5alLJeUOb7LVTlPQL6cauwRDUERZ5UYlJzYXj26hfrpzIVL15RlzQyLwt0cFFLcOsKNBwVXoyRyB784mqhJ7Ks9pFngzk9GZQ23M9ivSD5tDvc2083K7DPgfNThy9ev64jKZ7Ktex2ljsBovyDK9zr9RLWTViuoRjpltzZ8efRu6cBMppofm5DxbQbowvP5nRXSdS9ay9gfZ6Z2Sl3mO5W4LQh6xOE2uCqLNCeQSVWqUzf7dHyLp1RE6br76Rok1rhE8xi7NomNWViQb3ZA45gbUsY0UqvhrsgVfGZ5z5XuyzezQ9u8sHxSOGoK4XgfoZbOboOgxtB07JNx0CtGupENtOCH3Et4lGoNmJ1Mb7DaUAVNEf3m90bHm1M2d8QX2Bik6fvk9TguIKWH6syPFKJMKQrR5zIouNEyqqERYjZIJtRl6AQ4DZQd2iKt1ENZm8XZJXQhtNoE9h3DhWLDsN26cipUd6Y0abkZQX9ObR4J8ULscFmZvkh0MDS0Grpx8zwxn0Mg9bDGqbbJ97iGCb0DkhiU1TsqEZSfZzRB9c8PRba8QrlZ4FapbE1tRswDe2MPSEjJCLUJIdnHDNmBfA807NMVfwYS8FF5fG63XIruKACsXjxiKq8KF4ciXpEuM2jJcp2fCqOD0t4OUXCnGvodu7for9xKA8JWU9tJYZEAnNUKrWK8yh18pXltvElyVRNfnYXkaqFWA7Py6AmdoCoKPL3zBJh9AkoK8QpZfRgQeG5XQ4WzQlG2Zhsz7lcA9v3uyv2g4kh8HAzzZ1c6fyKP4M32yS09tN2N91aMyWTvBEic60FHZbZsnUowoSgi1kBeSE4IV3BWp6wly2Z149sUshIuBxC3IdVGA2cHHpk0aQgdIPJkqniWATqiMpNhsTtvqZAxlhTY4ELxJWuOBgfEkh5nGG3MfRcGirPRqLLFGiOn0i9HnUzsKZ4hc9jNVKUQfrQgUEbH7Y6Zck7gCgxP72JgBulHZYUPwJMWSDdlUl2LYROVFbGphx6CwCCKG0yiD1ImNu1JLr4J9fZRFLHlaDorSNCCNf6ERCBUlOIVYe2NLxQrUwMFnE6WDuNJI8WTO2jhm5PugypaQHBuUwcIvz5bhQD3PRcmJJieDTw4tbakH8NKl2KYJ63Yvsi2TP1AEuCm6EW5AVlxaDr2HqEjUdisK1TuOv2Y3vOpHjxwQGwnDAi61K2leqkNkCB4La7y2OHQVrVcEjALKF9cdR0tcfrTWXqoTPWevti8QPxkZDdBlImvMLaN1JH1XVv0XQoE5Qbvvv4RyxhGmFQV36VsdwE1s794QHwUl3dBxIZAYe5jqXc6VK90wRgvU8rDByY5WB3F0s2Bi7UeuLZnzHddTTBMiMU0yTXCQTBfx2OKM5PoIZdDbCCALMGbMwFRldlt8Z61vZEKZDjiJOkXCphAyNoY1K0vZJIGVDZjzNPRMy9fvQvOliREksdaKGYhJrpUsXGsDJLv5w7DcFygeOZPAkuacXCtLwYExbRd9SUqgOsCKRUbw7ZA4ysEVL1D8rpRbJeKKYKYIyOd9W2BEf5uGQ10YRRSzl8l87n28XApT5d8vYIznXp1zxs2jU0uQKd0NgvslVVUGRo3o17xGOnNBJM5QvyYnEG0FZnhXbIExq2zr0H1Km0OgEU1W7pFYyUE0spZwVtTazYD6TZPXUfj4jYS2vgnGRnHYfVAEH6Ufyl7VfcGbqsF8C5wTBLCSB854DBOI65NWEfJPmypwTUU5dkpk34DY9PEtb0w1LnLPAYlG35wlKPx63QExGVJgBu9rPRYX33NH8lPVeuHIXxdCZdvthOly2hVs6PsDU5WJxyIhIhvIauwY7nBx2BkGNfpoDX7Qb21MOUwWXGmwGBXIThZpP5oiUJTkVq1p3QsMl9cIlFzmvJvGMFHIca4ZZGZq4ukbFXOZJkJ5UWP0CYccv0HQPZjhpDJfnJNSctArpNmYGDAlFgkgrb24deZj0pNE1qfByF1H092bb9ZNIiBwp3d3av1S2thX6sKycCWibrF0MhqaEPGP016E018I4GUCk4IkuQ32ZMEAwzLVpjd9eT0pWMC6eeGy7EVobYs9OdZazxJQUOYc2JXGbdBJ1WK0BNrFANa3MzPUZBpw7tG98Bz3htPIAYRURwPqWFMAfwz0TvJZrLrVQUoQFBDl2mmdI1Fw0CoBnhC6w7XtjNKxfYWroixWIM6knPS1XcEDZoxSEUQGoUIzO2fK3RU2z0h4dbqUNfXCM7tLmIj6xqLlpbWZTy8XEAWq0bmABYy1VFv90yF0HeEF4laiKmYhQ5TyMJzS4XX77xIrLQNzDO03zWr1JMyucczShje0tZptKMGawoqWUyPFmbgzxTyjWjiLSWO6vxFl27SKbiERw74bqnjZ0hUWXsHHAnlCwl6nCfFvY52q6skalWZGBdiG7O4NbG8WUZxpcvtEu7xI9qYkECV8ThoicSudB0LV7T5VUKZvAwsOFcQ6L5kGJGxCS4dIwARllkvXmdKUyAYXGxk7IPoSJWcb4kjoxrlhoIvs8TlM4LfKBBYjOkfuz5XKCNEWnfDGDeWkjVagatxpbiDmDl8OhHppg6VB7LapvLIHolfqEmgAelTjhrTNBMHVypFk1945p9gDH6szpHkW5AADAy5o0rmGlUisiOG9zzEaFqMhCYIhW7Iqjd1SxBRrCPXxpe08kH6cYKObmHo1cerTBVml5BMUbmaxVG3X6VVKjFRpZD6mHOiFXvt0zwr69RntM9YvaUk31cIXOjF8rZ0VAxsnvYXD1QWtjhtY55L1DEzcSFhjiQChmBK4vUEaWksCbxYCeN8lVvCGaZ7wxHRNMdwrG7d0QHrw0C39LPZHGSjKYzbgx1JpM88dmQu2yNkrOuXJUYSJS6LBYGrIHCzJzebmmAMVjWIiCb6B5jccssUgn0Yl14PaP3LxG8QBsRlWjxfyz9b3aP4nvg4Db39hIB9cBRoqdooaa9FAv9HiVPEfiIA2N6q2DzaJFQglEcLVWXOVTuT3EAmc0Q2OqJE6WNlHFwVPXylalY4IDjEquQH4TZLjFwrUcNx3lTgSymASFmrUipau4QkL1PCrXY3AfqusoU1WRczs2ZRZhBkTu6SUGyQuyohomIXvD6AjRx6eKQwzd4VdfauUads6v7CsryyU2tJim4IQge9wLjJd5HbD2BeRjbUH918fL4BreR7Kv2wMNXydJkS7URtW3CH3VImZqUMwJCsGtD158GcDrST7A81MdjezzrBFpH43eQIXfVnKYiv7zBHLvtaJ2NaoSGC7FhzfpCjsbZFMlZkiaQykCZp5pUfH1izWvC5fpJ9Cwkc4lEMcKGLPpFq6vugPYpktDLNaAvyDZfVK8XixTzpR3DAQ5ehWsd98lAmUElUu4SlB7EuB2QglcstnnXZRCqA2jfYpGoKsMPRIYOswHEqJbiBQdvr6i8LNcyhlCQEcqufQJDSampQQTFTYlqn4XI80n9h2U3eAS5xIqCELMJaBKJIjDFBH8bTXB1RzGUL3OnIZ39tkJH1Uf2ygSlJKHTlpCSgcLxCnSFoYKzNEybkHIW6teq3lGtE0Asdsbb7ys3ictlJ7omJET0iJrSoZf9S4KCJUgBbRn3FDHTp75XWofO7kw6WpYHl3LZOAOvVabmUu9jAdtxwbsx0y5SnmtFsp40cM3uWAsIM7j6Nx3144hgKQD72VuHpeFCCj2CrtERhVxkDyAV2EmroGmhom3rlkdeyjS44eee36KR5xB6q4nxl7qdXvlchQKoWb24Q5DnDzdmIGWh6EeDHse8MLlvtlKZhsRAH3TiTdqKHN1uLDCvS9PUz7i7PzbmFC3sWVxeG2FtRzQEw6BzOlQizXBClak69r5oJ5t7Hcx1Rlv9ktHX3A1rwM8S3eTmzGuZIxXSCZRddtrbdMxAyiYuH7lhJRGbC3x5AZgcqgYHSOJa9mmh5ouVgMoxj7Pt5qHSlIJ8E4YWnLGNmQyRVou06G6Dtu3EYmtUcZ1VgnMeCdwkBBPe0bkFAbCiTRfYTgC3oyftSxW4HF5WhN5yORlTfg4rtrWOC5rWsJZiuZSvV0l9IwdkPA04n1f0ryz0Eo7EtHf5EQEIHc66340SBjKQrChypJIw77QRCBvT8bJHRHjnZytfj1clmK20A1cAJaEncUrm0FusmiQSAkZgO86abE1tmgGh27p0O71EpQ1jo13MONtsltNrsec99X2qKWec3BqalQZzss93iJYgRzKbOgf4xkOCT9xM44p55JtFA3n4AXI12yJvplL6QixN045qeqC6ssMhJE2Y8H2EhHRIYBfNpLZ3xzEXH6aeywGF3optfPN5STtwlPXFdwlMwMXgp7Av5zuhaE7SBjgb5yL9FsX1XyffxeTtXQp00Er5JJvOkeP6H1N97GX0oOj1pfC2mexPVbbgDmoY6gyTx3fw2kYFqS0laELEtium2Jy5G1QIyisvaXAUjkLS3zo1wnJPzM48VApY2OqTsZlwWZAZgk8BKMh4U7XKij7cRWJsnKAR0yIFmMWwkUmao0X8AMc4Ki114cNx1HIeGuvpIZvedCHazFBpCzwjg6DZXHFYCB0yMGILd7R5D7LHcaVtlCq4ezPCULVpD2ZsB7bSnnTj8boYmzAWaoTJwUM9h1sEuiP4klUAj4YjHj0gdBYSgGInkMC9v64iago5lxNHKsF1rIy5tWvXmEbRbcbTPmYfuy3z101KCzAfGZkWNVbWVnV0gAbMf4405gU1HYDgJFXWZgaeHdRYyhcEXkDwEBHetGF97e9GcbvMgV35n9kmWUmtAEVAmk3mNEP1dPBiuQmKVQdaJem6aOMskQEMfC8rH9ILNshh9wYqLCtpvSFlrUODHdE1dkJJUdZVfF9DJZwOZx47VFfIak4NZSaP9iGT56aY6OEqKGC4zqxDxMVWHhdE1QWpGjcEoZj0Tfn1cCZ6iTl4UHqDWK3o0MDuS16OkU0tTn7qYuBdokaVAWcdTFtoWdLeFK87ZUjax99jFeBKNEdubnWHohy4kRovMIcRZ2tq55Ix5Dmakg1juNBBlyCJ1QccOjPWIXgzsnbcgTKYuh2EwkWDoaFD3WjZMPkMLQZxr5L7z1OBDtcOfwmxp6MOGemt9lBGpDwK5LZlmpHB0jF8q1Iovdfbv3Nkg44AAjqV3oqHE41QBd34xUVlNVpNzDAyeU9reNmTE7y6A2GqUNPjerMPFp2Rj6ksqDs8JMuz4pSX4r72NlNGyOcE0dRLysM2AQwV1hI7xNPc6Jx60tqIaDeStJ9f8rLb0TjymgmsDKOFIT2SQK6R1nbORU2vhgyunLpnaw9oX41qYNYV8cttRNqtTwmlBf6Bt6jv9rJI2HBajBuBwaDpuSrsLEoWARJMImuo4djeKMm8tJHv3aj4k9qe68umdbSxSTBpffb1846CbNWnvUIki4VDEYP85c9CrRat6sk2oWH1w5NGMe5AYsR5schIVEIpf4OfHpaWHPT316ip1r3dgsKD95KZ25FzmPdtRZajIL25f5ZMzgpXZliwrThurFBDPRZVpiiZB1uEPWIktHU3e8u1B7Ug8qg3IQkEeXk73ir1Npe2ZpI4JYKkNkksrQjjchMJOWLkHzHoAQfuXhPfdVVUDfvVteBgXV2KB2XJKe73pSPqIeQhL8ozu7VjpBr9qoXW2UhtWzTCSDUjokLNta7KnEmqEZbcVyaQMfS0cb0GpM1B0JGXimkCiiQm22P4Hj4dsVJenXUnjqpLHBvFFgRspNqsMLaIqBdE1dLqcREtGPhy3dFta6OppJ8q4IhmsEqkLLu43LMzDH1p9e1xatH5iSiUNn8S1jhEIKEe1LGAO5VcYvds0WcJS0JchSDLvlIwQsb5xTAvocdWFICvIfCWpnmCkoCKl777LZ74G78STTjPubAaME3x9cliLzMIVXuGqZZ8zwpM2D71TIAgBy49IFvqrTMm7Rw50I5hZOwEMmCna4lzEKAdIMYiSrD0XHOA9XdGj6xj91zch0qR05HRLfJeIZzuiLR80B7QjxI84OgD7nLZrVuWxjVoOCvx8sR8vQcGClXUmMpB3RZfamBEswBsvaUKVPiHDTp0QpMKUtMeX9LzNLSrq7WgNugolPz2MpsGSNKYKREvhvTFAcxBc7ZjpflosHH4OvaQ0DUzgxp3mbZpY7eeHLeLT2DMODiYBD8QPaLusJXTvRovJviw8v0DG7A7s0qTfhidyiFVoWJvnZ4n3QSOh79XDlx5fAL1oZjQdEHEOpzGtuhXqCnJhvhhPN3vybKfyIvyJzIOu0NNSPe7P8jFOyLzOKiHMMSR0QG4vZhp3winzD6yCuq8tFo5p0jktwjvArc1OME09KdXyEgIY1JNANsHJiSTmnvRkXg0UyoZX48SdjAnDxlKLzRfT5128hIMRQXpi8RDI4SraijcX91If4NX7nm4K2AruWqbLnUTFiaXzcLBPitp9Ij5KyH3sxspAnykxHFTLfqPDv2Q6QBAyMvDw2TkDMB4dsJAmbiDelw8B01xbdm0j8f4Kemcqy8mjJlEX6cb9lSdqJnYeDisEXEsbqgVnS1ZejTomV0sjFYV3BGZ7oFwtiZa7MjnLhcYQAaaacw3lpSiRqM4yFvPrbV2ZAMl1vpd3YULaO36WZHRvUi8qe1Xwwmj4CHBDeX2moaIdlKxDlksKwvLi9C0hvOXdEBQiBWLA3AUO3pXGs9qIYY0BHolqWQCnXDMUcJBHgGaiT1CRXLydzNk0A3i8QXINindQsCuive0xjpb7YJYzpu7zlXYgmctprr6szyhLBIditlsuTAgu832tLUrnnKc1W4JHh5i9892V0FoD8ct9DOKUlB804IH9douZ97giyttRaIQXPr4DsQyYS25sDsC1h6bFsCqPIqXXbVHHims4hrrWz8f9kPlkBoEeIs5wFDCdmSnGE1hZI95NIH0JYnoBIsTRKLA3pwOlp6M1hvsXr3MIUONrmoZbHUdGyGhuUqeCDC6WM9bfqakuSVEfmf5nO9ayGrrPH5jfnVcbhMapBWAqp59gjAMbpPgYyD5pqD7apEEM66gEhwTLGWIUmrNL2SRTxmkA5BjBPiwmAjMeQFxdi1fU3CONJAR5vqL7mmOCs3nNjNDrMJrUztVBfcydUz5QKW0S045Qs2f8oskGtIolrChroR4zLkvFkW0EJOTNMw5H3ntK6QRDgJDFyfBFOx7r80oYpzPBT3kUi1E7glttb986fOE5nlEeUoK1a23u3gAuCVLfj8eeovwPUgIzvWMvzfKWHPoNoJ41I0FJgr6M59sskx93wX0Olvcm2Jg9Vn5kWIUvQ7A1OYx9p1iLa4UHtiS71l4SYEyiLJWayxixuUGqrnR43eFezXLwf9N0b6LwsPf1b0xkgtFKFF8WR9V1k5VsoQDSxPkn0bXOksiQuvbRdLIMLGSefKY4sUUGYD3I01KicMj8R0yU3hpmZJyqX5meQzZjGNvkTMQfTPTY5jWkPFdRTAR8Y0WGYbw2LSjQU1yH0cE02IhOoRbjrdvq6gVuHx1my8BVHbSNSlp3IfzV6KAfqZQ9OgmXgnndfQ1IE0vdhQUsY5OkAhxlqdEUSL9tA5m8RP3qsgOwiQcu6CKWlS26lK2AetDh2r1njq13KOWL4rKikQaP0qxpS1z8oWSNwCURLEUzeESLuIiuv9drIn5HvjfrmgVFdhfi03MmuI2LLA69UbMsA8GhFki07Ssx4W0WBgYuhYSgh84SZMRyd9n3R5mjqdIW4fUORe6Ql3P0JFYjJTRKeakoqupL6DaYVYGmaxDY77bepszXnDJnbhPi18NiGt11vygNPvw8ppijWSaXxP3pb94IdjATK4f5886oyTSkhhYidWK6qQfXwsDuf8hr3Zhv5Vd5IC7hjnK82XYFkMu0slr2LyqsVnvPvdLFvnxdazJMtwBG1f3ddCDWOgGEtXDLps3dQ6HOxJGSrT9WIGSiuii3ypKAUc1uesbVte228SDOZcAfVMut395Ulw4X8VRw3szjRKKmlOvRgTrC5yYONhXiLC0VLv74wTkLG3QTEPuqKGXJszO0mTjbUmIcGlX4q7sLxNwrZnwDOJPJXh12cwCYJBsqToxoL9tstwxB3x99QzOQMuCAR8BdywJcO6wmc1n0fwVnK7tBalXMiTv8Fns0qURWzagMEOiN488KS0Qj54TRG6AJxWULfa9kfCbXxZycud5yPbmUcE6IQFnNTaKmj7m0U6t9mA5fPveDDtyUqrstxyr4WMtw4FiBwjpNpraKWnzBfi4OBK4iYaCJdPJKRrpaAQyGSJyGedQ9AgFHYh9EZopKZgH5pcBnf2oHfhuBTq0NJGmrKusb4nC1PGjV1jFGpF3r1RgYtWMQte6KUQNCDSW0XGegmU2VVrboOaeAaMWM23WXRxa92e6zMsYxzLgMUpAmXnonfIr9ZPAjzwx0UXLqnWZP99s6Da9DewN2EKXEXQgzllbLdr61pNasr0KeyOq1sRWLuQzXrvergG5q4GKUopJBH02sAFfTUcaxgbTvRUxwqTbvQksLz7KwsirrtzuaxTk5WpL8yg51mjPrvyLUhCGFv7IvQible3seUkmqea9eKwDfvc9ZJNzOzWTkIL5VygG7onS1dlkp7bYFeLl6n2Iy9XKAVDrzSH5zzCEPoihK2NftnXrPwf4YoEstg1zl93WCMJC3mingiyZ3ILTq7hEDvJWzDdtKP5OIlMFHlrexwg2ej5aoa6YO7oi9PgzjfSVlGmGbqv0IggxHhe0FP43CcaqJUfTzYekLHOmhk4to5Lf7ttISxAda3bQOSkxYR4gS7z9GCCgsArwzfgfmw42YyGwydDTMQuSrnLvJXLXvaO8Yn22gkQ3XZ22axJhQaqcAb4lw3oDTeQwSxFDxly8J4U4Vm71rKMWxAyfzDYkRMQMMehpw5bCPLuYUOBYTBIWdQtr4HXze3FNHRDdWAud7EorALu9Q4IwNrvf0Fy0nPivrrLEjE5wBJDH1usLMMTizZSCvpKscVN6NBk2ll8PmEQG01D9lCTOUIXpbbOatipjSTSHgR3lHt30rkmskRXxz6aVYzYaLmBDucf5vhv9IWORxRsP0KfkgqzfoZi9dJ3RZoPaVJ2WoRCwGsFIx8cVPsF6L67kSTNuci9B0TbFUeuaCS1bauz4IUaWB4UZnZOY0hMtDYxc1zOSBM2h8x1QVeOxAbI74Sr6d768RWzx6sSShJ0RIS36zlLFmJ106ogFEqfA81dhSVJflBuFcSHMPwyvkD9YNiIWJCkP8kdoHASwdde7hdnom1OVKZSvjJxibdAJhKGwYdSy4YUsBPGXCtemsMp8Zn0xoQc5nhI2fPkrLPwFMftO4J2IO0sj7rQvtlYylTZFMHzaG5VDzrZrFMRPEnnjAkoKDFhfiXY9iMAK8busZaHjhANoQ5l0ewEkDCqxaU5ej8EffvmEruywI3luXyGkEUpef3qKev3w4Wf5umX5mYQncTAfqmxp0kn1iedSIfJFVu7Tm0IBAuir0T3XakyRgCEyVvytFNwiuvuVDw2HCmfgAnB7NtRvG8045XyirFBpLbgefOVkYh4IK1svdL86Dlh5Jo5mRhBTNf1nEmHcylgMJUWVB0RaVa8d4KwR2Y7cag7BSlC6IXl2Bxh1TZau4TUpSnikx7JtC42nkCSYFJbq1FIHMu2xrNhlrjgvNSjjipLOyv7uOzHUrgoCTcYYFik77ncAbCbFSxC4sjYAWR7rIQBC6Ghyw8eBdCdQyVwznjeeUFWpcWMiJC0C8ZywQllwj2eU0gsZjFHxPsMlU9v5yHJ5un2HlqDMHknrlXWGKctXt0xwcVQx0agFVJh7uRVNiYU5eATpDUPj8MBHekFD9m65mVTbNFOcpxKHliJqjglUmz0TCqCTlrRyjiNM7csOq6BWDt2jyhp0JTpZPTL161cmwgQWfcVPqv0lyNKWeSb8zVr86H9jxBTRu4CfOW7KorFEtkIVqDzynx3c33ruKPapMe83M4rNeXc8enDEMJ34z7urZDCmqXhgghwV8jzG959csv6J2kvfBLoTCH4lm0RRpWVkkx1oGVPTA9UiqDjv10qNXYCF0RUn6vsqGcz213KSfC4vG3X2pSnmk0Au8dEjA0mzhILy0z72zHLStcIuH6ShW3R1xQYr6bdGnsGn34aiz5ztJaVcwszOg31QXUJ4nRCMrVmkdkC5LuhpQcl3vKnzXxPEDHF6jscBfSYCEVNRmV1x1eXtNePKJMfSTvn4NUCitvqYrXVhrlfEIdANYxy5Z8wTaZh2fn3G4jKj4356Ar5bmpRjgcWyGvGjEILm0mknGSNMD2498vzP2wQO9rnM1tTbcVBckAgZCOzW3eYs0tJ25u7uhbLxxJLK3Z5laGpYL3QSUxiaPX1Che7fnMIL20jC9cJ3kUfzuILxSaFXdTuPxz2xox4JZ4yfdvaGDozG4DsTA0o0ZYSQQ8i2l7zNYQ136wputv5lfVrDGZ2bniRuAx99qKFrdTaYy4RuGdBDqS1g9OBwS6Wol0TgA291cWuxgINnACTQT5jcGQ0Z1ovcCkXjYR3Uk2vSz5a7pWxT27ZOSyGZ4s7g48aBaQCQIX9W5P235Aksw3AF0t2FTzYaVLwe5gYnZEtmdeh8CZOH8ZoDh3hcmEKbAH1kxHYqUzileGXWtGU8sR171Wf5Q6CXVnVL3gKJwUyLjESokumSA3vAJXysiuowShTVu8YBj5qqMzq1Jkuc6pg8xAOxfSrQOmDp6ul2uCtQWk5GcGQxzIxwH8RFhWj5p9zFbjHwkgZYDQoBpaZjXGbLYQUbWm3eCp5OTayu5FBljTodojLXszR789I87EadRyK72h4sSaXP5xvOY55puL3JowcjGdR2WdUT2fFIi0Rnr9uvXhzIuLgvpThUXpwHerVwJY5fTKE8ousbnMbUOCIbMjgJXwfey7ukYiTpm7TiRKGdeX3VukYQzKnOnt68IGljXFSJdg6xFMCfMG1pMXIKVc2BSoRgv0mQwljkgd1kNNUzzxHL1qiiH3ZtTD28uRvkUqcFFuKRogN7wtx4TRSePhK0kktAkVPdR1NhpJ3XZvHdpvvPsv16E86p6jIPwlZtPTmJCXUO0CjMIofAmRG2pmcXzxbKwGqn8yIvVy1QkXGc4f81Cd3sKlQIPYIO9HpkVqX8VdilJYyDQNNxqTiU6OOjigP7yF6ND2wXZScCdpsIS7eMNwoZRKJb3ccoioR15fCuHE8WuxuRR0hTC4D2cgPaFhRDKSBNm4EreCCQMC7Y8Bz7w8lLkE2fu3dmgcS3lC0a9XlnprVL9cpKP2mq8BWrGnWz05fyu41iCwGLseWcnqL8wD7zYwymlO1ptmWIOXVFbD6bgfGclFb5mtGsytRzBccdNE2gKIWTShnXPzsCYWsGAyxTPY1k5LPXZLEAayzmxJExoORuG82I5Aqy1yzcr7ew7mUeejyeJfrWPqL2zQCi6c6AMmaVN8BkLrviWw9DYp2slT5QCzNdcJDgC8pV6epUc7QxBTttuU8zfjwQ4ORYNnpA27xngdO8yIYxCand8ajx5kXpcpBPAELg4LJcPDLZ4HqGSUnEE4cUxrEuSnO1dXWOaTlxaiKjiScTM3SpYZrlOm7yhdce4xxvReEVFkHw6ykbfgo1TEk8wDgmIHBtPOoFOvQFCgrRmi0zAYyYRFiRGX3OyHKGs1qAoEWt4cOG6UJZpSGK8jz9BYag57lTE1yck9r29Y8UbsvLvI1NSLhJQLNk9gmLHRr0iGV7QRpYCkdcTz7eWO6VrmfFq42ngWid4gKSBqi0ts7dGYiv361JyHOKA3soVgjJ51dyJ64zdSpuJoa7HpKeUFfmhRA7uq6Ztsg1vmoVBjRdZe6SOLCtux2Cw4HbxDSEJBlVVLr99atQ2POJjzC3p6H5bpJK2HJBFWQJgtHF1WorwFNeL855c3LdIbfS5gU3EGxrfqowdYcC3UdaoLrRBFIjOFlzHXohh7Bo0IWHrFZpSt9QjD5eIvHXoLH4EZCrfNLLzHpBP7IIrKNGpVkDjAJ9soXmcmIJ5xt1hyriyho5X3N8hbuandC8cGRhGH4ba182PxBEbnvIVbZ9jX9hKjjRgABwr3GSPSUvQGbm0aj9myAnieCLALeVmNNGH92sO9FBY2whV3rcJehz9Q0onTQi2ABBXxQYVJMS7xF62g9gTIYHKZAHfu3MlNTqBaYz2N8zVwWsC1hgJnjHcACjMDENgZaCa373ZbPYLPTHqCecOYTGDTOH1cxYsxyjbsJUBvh2OYGZ7ZmBonymiDV6qiaoKDU1RED6gJTKStyGQ8xdZafxDMxPJx1djx4QzpmJEXODrwRR05UpwxCkWH4nG0RqIOE4keNNRx18Xcx9e8DDWnzNeNOW9fQ6klmcLbZRIXsy7Tg7nGdONu8TJkgojnHFbZ3mIBrkmg5HXRRTRIVLKHmNV3JsfEAqMo3eu4d5f8EL1IbAl5sdbbcf9zLbCjJoS0uRxuAXBsEPqlMYiN5krG4dvUtgNzjBoddnPXhAl3OTx04KA2K2W5wtrfcuYD4uthnWbr9fchyvE99UkfthE8dsAuOIX4yiJyrJ3JjLZndVrYhjfb5HrgEXkLgpzywOGzGqzCRwyN1Nmnkpj8zZwxwhtSZkiyafemzBx7pspb0Hr1l2eEjXucPnvccxtAYzTK7fFHDGE3VAe3HawVUikXXPArgK0YfRZWUo3uwJYRe9UspKHswg2pCQgPOJeVryJWgwRfooZdwTzmJO2noRdVLBFrRUQweyOzU4lgVBxVx72fvr4Dj35kd9mXqUq9fzvRBmNBrsTAiOhGc1Xq5z4C0NUvZEOSJa9AEkvCoiMyvj4Q77mJjr8Y0SXAbujXFyL4pQPfiVk3pTLHNSy0UEj3NHdtCmDmn2k5AimwL3VaXXack396CsdgMqTYfeRlNrqyaz8cRAjBKBH3vfrzlxvLs2A7Hk1lMVxCI71YccSW3R6W9uAuWUREXTJpOtCgN4xtLTjQx7CrO4AiIB8XsHROooTmWHaRQZ254NO17hbtgfvUZNlHnRzF6hreYDkUSaKb9AWWQkxYCNKeayjJgrXplfvXQfH6B7jXVXdvhjgOajJsbRCmx5POMmqQKrecNPRcPE9TMOAPLEyHIEd2qp9z576s91Jkiw3xg1RqlkKoY4L5yv8TdKe3BuN4SFNxmbI7KeAM2O9AVInsH37Hlw5cBgFRCoxrOfs9bv7TrjcusBQsSu5JWJ8xAlN4cJt66SHol1j7QHHXHRHZCkdNqGpIG0uwC8jNKVQN32UZxKwNBXJhGkopXu1z4JoWh5fh19ojUVoDVnTGaLqv7cSfofOTBEII4FdfEy04LAzEOMT8Hp1ZkBpHjwJ8IdJ2fX3XZ9b6tcCK2Fv37vAxijBFYaMgUW1Ll4GEgcft599wtnaU1ICeWMJM2tvjQwWUMw0JvtdV06wvBTmlnN35IhWn8wUUDGtwt62ItvhPqSgNHkKR8zaPKvB5Dclnfpw10zaHrQhwdtsxucTqrtdgCe4d5UZNCbZZvMIdgQscG6I0o72ksErRemzSMwW4i2Zejzuwr1N5Ow6jR1BI8EIRRWoDJOQRbMb41ETfiObhOAUOeMafT7NmNHbDncm2VelfwV56pJa1FuAjD9Y7rYvGoE7iJ7wDKLSbgBQN94r1gnRjjixsXr7RCEVz7BZ1YNxpD5BdKvfT4KsJcfh9gYYsLzAc1Hk5ZpBCpZO1QBl2LLeS7HKmWEaZhxVxB1B3tTJz2YGdYtLTByvZeiv3lffi5FTFoUmB1Re9j29FcgWGITRdOcdVG9hbVyZUVQOYKJUtfwRugsoKGsuQx3dvr5VgpAyYXO6ybEqtjgT2YxtsasAXToO5XViTRy3oVkUZvjfSYBdoUbPT2UYr0P4L42tFw4pWi2FEwIU0cKUC7LJSl1TjJB85ZL41J8qUFlwl8MCw1gDMpMnVIdW38R3XqtqUNbRbOvgayTWGiBO1kWDi40Mr8jzRIKxDKJAur0YySwaVocY37ZRelZftCeziaRHX3D3LBY31DiVVwHxHwU0ZLFBohb5KYA0s0W6zdxp7pDUg8KvVRB7l6RqbJnILOH2mUTrSUHBTTLkGh45LtBRwh9pH78Ay4ztrEqetR33Cv0jMy9lVibFxM7PMPPdu9m0HOJVm33Uh0NSV7yVobcp525djoX6vFdmKIJDh8N447bd1DCO28TXxYgfg7sjcVGbzHJNgsM1VHTl5PzYlZzWxxNVGTKK3I70GKY6b7SsksarR1Zk3oo8Qtzr3fq2gxQpF9lDO62IxwfJqyeQplNH7yTeVvgFpnCSLx00P74GwGsr4rmc76lYXpiacpMEjgYHNcKry03zGIc6zVtbtyVsOGFZiwQezl4U2h18iwLRVswJlFhED2cqeUX7cGfv5G15iL3xYBtsSE4dCqzikh8eupJBuZLPuRY4kcSbzLFJufOusgSvb7WRfyN2IhWA8mIPiQseUywNbxi3QXXFdAmPwMC1i9JQ8Hx2tWzyeM0MleFXLs0kYCJgrKqLuSIwhA3oc4uXzbSz58oGIMTW4aE25zZXLUwXCcSRf2NMjI6xOkoycrZ8NGiV6PATiSQkGq7MxW5qc7SqEsrDSJRviAf2q1mVegXqKBRxOY3zR3SMalTGPaeM9P9HJQISqlPzOX3hOwNi6vR5cypj91hRoVkUBbZaG9yAdzkfQLrnt9NXHl2E8j0mPAH39tQSlBIirKMcmc29GSHoTAsfA3XxQz4i5X5l45Oxn9tdhFtFnPDwfAh6kFWMpXpKCyguqQ9N4w7W2en9305aKhYkzn5yWPtOt6LnDh7wHupLrDP9chweXsqqtxr46XuORLctrWlwIYVLnOG80RYOLghr4F2dyfAer9aMBaSPpybx1JFgOOICmPCRG8SiQQpX3sPNmT3Cj36vTi2OC6hM0RxgWOFqrMmwAMm4wBB0ukft22DLX7QbKVXkiNdoVDq6Q0BIF9U8gV4E40kvqqnwM56loEfBuYiFePURX1aw9iR3rRhaeY0CHwfV7BEyqKNUFDDzjXzCSV2dtxcv0MJd6mu1XhkICEt1z288tjPrVv2j1qfe6U9vHrO9jGJIFCAuwJe7ouLXAycZDqZ9iWvJETOU4sYGWJFFzFJeHsA63xfX96Q0E4WJTkhKE8OGm0WrhglV5ZAnwbNQBDZ52fuw3Y3rQXiU87KCZTJoWs8GP3QQKVMSR6sFtAGyEJ3dOIk8tMMYxhcZPj8cDQ3BIA4qIBRywGTmpiOZ7sYS7zO5Df55IAeNDUvmETy5QQku6kQjfeseFqgrGmvdE17mD6zCcDmspf6IFZWEBC2pb55VhOIfX3Q5tfnfQBmisyirJZKy9QTTNwl09MomyBKw8XeBBgoc7pd2CRh17QT194OyAlwYNDM4KBOAqOAPkTmUhipiFzDemBPUDN3mVuVNVx7ZaJpOJF6aEvolnT9PiFMLRbOkFgiZAsQRIEyqdlTLt4fDpD0U9SNfYDmm092bwmLhKSvfQNmnXaHUz21ooTAkFcIVPatIhayhRO5MCFPjEf3Fwdp2brMVMI6C4ic4Win5qw9WiEBn5NqNAS4HSmX7GnFYvhs1OpJ3wZIEyT0CDj0woaBen5j4aAqXdpzFNWvxDL2UrhRSIFiJ5ELoER2nWqFEK8fOd5mFSXcqJlcejevLOqmubhuYdVMqtQ2WlRDjUNJVgAQ5JxE5c49B9R3aLy1hNyedOS1ApHn0RJ6iAxWMLFOsiajlhu4vCP1uNe8EMbAnu0pmaRAs0e5OpPpJkS91DTcZQ822fZ5tjkY7ATkxJcLDDt5pVKLTIs6HnAlrzVFjnUEvMrQSIo9xbDptTNOMqQ4shGSYe8QiAW4vLLoOvjJkXrghwWUnvsYcLKvwAQ0cTNE7g1xX6KOBDdBT8rvGDrAW1803702BohvPMX3ES5elx5B4NkL3OUoAP2b9jdzzodGhl7IdyEUPtA9QqwLVV1rGMFpNbxow5E5jVAgJWvOnjoe12JQCMCXoSWWAPZDsE4mxD3HxshEChzpgKulSsr5eKwmLMAKaYz5GCzIXdfTP9r628oezuYzxSvevAx1K6k5JqGgiKcZ7kFDXfNrr7UnpzgMN0kpgmdjgnKMJvViedVmlwJ2VX5CtZWCg7VthFpQYkxj2xca1IfTUrb8rk01x9q2BO7j0FliN4fY09cTfuCzl9E49HKFpkLKznUa4itV7KUIx1KMpxI0Ya7pX0ahwXG3imxmOYxBN0PtilhKeQ97N7QwdQdmSvapAp4auFBSyfSWsVqDg0JRRODf6VRsLkkM7iiB94X2lIYiLA1EnNqAloGAHOltdg1F2FMfKRLh518xhJBdBegSfzAQbBGZ7f88mYi4Ei55FQW8p5bdGbcubd8M7sdWLcjavWNgtEfr1wr2SwV5FFRWZVu313BQtPrZ2ODBepRy3Tu0HZPE1GCsNJvZMiKIWpLSa5PJRl5HAPfJVbwCLFNrRpFyT1Ks8G6knrdjURUPLPfYITjIPH49yz7NTViCmiWkDc7GFSn6PRd1hh22c8GmWzIiYIORcIGzZrkwYu3L2ECIFhmH4CWMlrTztubS5UgFYdF9jREkAkjfLp76QPEqf4TABPSHtK7uucZw5rkjuoFQNdhauUWvJcleChUAsGlliLcxDajTkzvYBUunt7GCHuRPYanIIbSZiOLl5GtbxnspE5QYf582X4yWFz6ohCNFhBwfXUOTqKWhFMH6Z7QN2YqWpFkktii51bj4eDXXK7bstq9zoPKB7FW7KZtPGEAAr7VV5ALuKmkFkW60rcG6xHsmxiv1ZQp3ejvz5NquIzSLoCiGK08BIBVN8yTqJLRZM3aX9U4EcNPLHXXtC3pOdgwUa99n4fwYx1ddOIADFhQDVLHQy31RWFPipqRUWSQR2AYDlNdjmdyTqJBe5CO91cswO60hk7FAHjGlu48zZgco4oDD7tLn9BSafwh9uxhNjhTXsWbeMO1t4DJ8dAZd6BGThjcfLCPFawfBfSvTNt73Cq6AkmRMSRdxnwPjAYbL3Cm4wPRJ2sKK5raRuC1MNY126QD7GzDugGoOYgF0cTckbeu6VaofNMkqyDqBdOVpQPXHZX68wbqTKoKsv3x95vxWAXVN4TyjtsWySFPKIlbQhojHJ3c975jlCo3rhyCquVnm9FiYnCY80aRqG0ABcSAlqUHQljbHPRqzVaIKXWCheEps4cSdb4RNRHyyrf84380almWd9DmTqfCOtJE3KI8dcuptL7KEtJe87Ggfls04N1syrYEY0ZOg6l4EQBc2J3aM5jquilOc2yeGtoPdb2dyv8mScjtE4msoH9GNrjZOMHDX3i9okVVwqnyqJhADSg5EzK5EpEfG7yxKgcYjrs9pOAnNs5OzhWlf6YcZ4VhflXm4amuY5Tf7AbhuxOjxiFglR1oxPLRMOBqEOlYwTe1Hb7u4f3pm4fX5w0GczNdojb7IffhYwZ2zju8gQEidySAoxgWQAiZmSxeJ21WCE4YcDulyL7NeqSeHN3tdmNfMQ9nwkjm8geu40n3YZBeNJZkuLlNvrxx0Ah8lVBuAC6BfWPkcKWsUQYX8rShiWIbHJJrYzO0D27TGWPZE1rbL8z4zuqebzAh5ZDy2JtQ217FvlFOTol0kJVjB6QE9Sm7P4jEKG4RNSdKO8Ixb6dAETYOKSoC3iFvjHyykZjovvDCACkUneTLgVhrAL8Cminu4Vsbsmz8SMHtVv3MTE6TlWJMksOT591LeJqLzdwc7gWZYRczzbY5btLbVyCUKkCShiiz8NU5xlLNtl0bXfxPrxyskz4ZzMZEFNM5ryjJUasT3pSnuYjcwdEK5ri6FaIArc24MWcmWJivb8NyLyabrksePJORoLbHRRnMbMmYM49JZPcUa2Hf7Jc4zIn0Mf4emMXWpKJPuR9qkhIIMaA8cWM9T3mgprc8Unb4LGqhDXptyr1dmEYKPZRSZv0ILKZSeSWZVYTZ1HWoSlVZFg5EiKj9QFBtthcCZdrz3sc8ugcdRboHXGEH5SSbfZVM9cy9JOtL2nO2ZDmMUx3dSZm8zXIBL5Upom8Z1PHkGIvXa8qgh4bJDzSTIyrLL9TCHSCiltWW2MzDkcZu0eBKyKrpJ9jiqnvLgBN6tsuDkhtBKk1IYi9MYEYE6YjiRSQhZubtUVYKopCpRM4igc8StqsjSRvZShI4ei2cYCW0yKQmhKq6QeiJFECrUHXrZK0fE3eAuXNunXrPcXK8daorknhmyOIXO2xa1CtEwCwAd4cfDx8sJ3RZIdSwnKTLqMQN38ovShWkiIGpFMpTi4LUYl32DMKwjodXGHslYJaSxv02qTo2kheqhcfqlQrEQQJ5tvdmLKGaoCKYiwAoVHgDS6QD6q21YTGzRzAYcdNR7RwVwY0LBH6Jd7hHRriexAG8uvEgLRdKEvVoAsowVVsgfaKNSQBQgV6bAHgVZYobCPbsY3OCbVxSyjcbWCtJ4u5dtpFrRE05hulVBaXXw24tsoT1139nWeiDllBxqFCOlWphpb77PhzUPegsI7YDPKiXFqQB7PsAvhHVNm9X04FN4PPHTRchrRmc8BQ1xfP36mVCMTN1W5zJovdFyvd5EG4kymc456KMefh7oOCZZP0JVDc4WiNmK1pdN492zBOTY1Jbsi5HQXSs5lpYHlH56AdvaAbeLSt1izQ2Lbk5V6UgiBL5AwgZmHRkdhAcZYW0elh5CjdKT1cg1ngJMbSHKl756Pyioc73ZQvdXOuwHK7ClMQ6VuNdJnoy0sExffYE1AKO4ORHJnbXwjqZnmNJuYP6FR3vzCv4OXgzg6sh24oy9ofUYX6VMdi90UabMOcA9YIzfTKCtA9dxgvVHnUigz97dkthRKk71br0wJTsv01Sk9KjD6CdNSoWaB1tm3T6ADV3dA9DKj2oX1YsFovUgcDtmkhjmC7hdKi6Up4PeIYlfvg8B3cZtncNy9bhWHHtOw1PtKLTcgPpJyW2a8SjsyNxnLLtJosf6gJLrPyYAA55q87bDTYS6iUHriTnrjlRTPGNqWRMsPWLRxKboaHKnCC8RBPAFOgiJhNROJ2rGDWrKSWnaedxVczwoCswxbBmcCwr6YysIgDih5ewdgZvCWaR4gakVHyyMcVsFhOaR6UZRV4cXcREanMAbMPNnyrZXCybbVua5kgXMKbYehmM8qZZi8rhkbcjquAFQnEUjWYhL4vZk2PR8wk0TIMkRx0NWSAR8PeJkRx03JAj8aSsgMWtxCfX1yMR02IXYT5kEMKws1Cq8bUG1NMYfkzgcgKJeHpvZzWPmhhqSKUOZnqvng5RgkUjygToobecX80JbjD7rwezbeZaLy4Irl82fX1FUyrSu5xEwVTRpnkGG6BMLKqDdUhuTu7ZcmztjDuxP0w4MaE2yrbbkctbNsaHTEtIzjFbKdocRZvQjJlW2bn3uYKiGXoVo806B49oRmZToPgsUJ8i3wBXRDnfscilqCMfjsOcXo8i2PjLvXJbUZSlJbWeDAVzyBUrYQamdcZlKM1Aqnte2hwqus2J9fq4PbIxhXgDt2z6pEqhPHiiAdzoesKxW38g22DFSrnYwryekslCqlQhVkdAkoDULSlR5gz4SQyVobvWalGpAo2MEwmLaT70qItgCHv0TVGvmQRyTzK201LGrd8slwwmHl9yYiQgu2nrNC3WvetveHdAP7UlhnY1CtNG9tQFHzmefW6ohg9v0
END_OF_TEXT

Use the slow regex on it and note how slow it is:

text.gsub(/(?:([\w+.-]+):\/\/|(?:www\.))[^\s<]+/).to_a

Use the fast regex and note how fast it is:

text.gsub(/[^\w+.-](?:([\w+.-]+):\/\/|(?:www\.))[^\s<]+/).to_a

I figured out that this problem is specific to the type of data I used in the example (not a lot of spaces). If you run it against RFC 3986, which is much longer, both versions are equally fast.

like image 382
Yahooguntu Avatar asked Apr 28 '15 21:04

Yahooguntu


1 Answers

The first pattern is slow because it starts with an alternation and the first branch of the alternation is very permissive since it allows any number of words characters or dots or hyphens. Consequence, this alternation takes a lot of time/steps before failing.

The second pattern is faster because (?:[^\w+.-]|^) (that is an alternation too) works like a kind of anchor. Indeed, even it is an alternation, it is quickly tested because the first branch matches only one character and the second is a zero-width assertion. So it takes less time/steps to fail. (in particular because it must be followed by a word character or a dot or an hypĥen, that is a binding condition)

But you can write this pattern in a better way. Since your are looking for urls, you can be more precise for the begining: the url can begin with, lets say, "http", "ftp", "sftp", "gopher", "www" (feel free to add other schemes if needed). So you can describe the start with:

(?:https?:\/\/|ftp:\/\/|sftp:\/\/|gopher:\/\/|www\.)

To limit the cost of the alternation (5 branches to test at each positions in the string) you can use two tricks:

  • you can use a word boundary to quickly skip positions that are not the start or the end of a word:

    \b(?:https?:\/\/|ftp:\/\/|sftp:\/\/|gopher:\/\/|www\.)

  • you can add a lookahead with the first letter of each branches, to quickly avoid uneeded positions in the string without to test the five branches:

    \b(?=[fghsw])(?:https?:\/\/|ftp:\/\/|sftp:\/\/|gopher:\/\/|www\.)

So you can write a more efficient pattern like this:

/\b(?=[fghsw])(?:https?:\/\/|ftp:\/\/|sftp:\/\/|gopher:\/\/|www\.)[^\s<]+/

In short: a pattern is efficient when it fail fast at bad positions in the string.

An other possible design that uses more memory and needs to check if the capture group exists for each match, but that is faster:

/[^ghsfw]*+(?:\B[ghsfw][^ghsfw]*)*+|\b((?:https?:\/\/|ftp:\/\/|sftp:\/\/|gopher:\/\/|www\.)[^\s<"&]+)/

(the idea is to divide the pattern in two main branches, the first one describes all that you want to avoid, and the second describes the urls. The effect is quick jumps to key positions in the string)

Note: when patterns begin to be long, you can use the free-spacing mode (or comment mode...) for readability and maintainability:

/(?x)
\b (?=[fghsw])
(?:
    https?:\/\/  |
    ftp:\/\/     |
    sftp:\/\/    |
    gopher:\/\/  |
    www\.
)
[^\s<]+/

or you can use a formatted string and a join as suggested by Cary Swoveland in comments.

like image 181
Casimir et Hippolyte Avatar answered Nov 06 '22 22:11

Casimir et Hippolyte