Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

preg_match find and replace a string pattern

I have a wordpress database which has some embeded iframes from sound cloud. I want the the iframes to be replaced with some sort of shortcode. I have even created a shortcode and it works pretty good.

The problem is that i have an old database with approx 2000 posts which has already embeded codes. What i want to do is to write a code so that it would replace the iframe with the shortcode.

Here is the code which i am using to find the url from the content but it always returns blank.

$string = 'Think Kavinsky meets Futurecop! meets your favorite 80s TV show theme song and you might be pretty close to Swedish producer Johan Bengtsson\'s retro project, <a href="https://soundcloud.com/daataa"><strong>Mitch Murder</strong></a>. Title track, "The Touch," is genuinely lighthearted and fun, crossing over from 80s synth work into a bit of French Touch influence; also including a big time guitar solo straight out of your dad\'s record collection. B-side "Race Day" could very easily be the soundtrack to a video montage of all of your favorite beach scenes from every 80s movie you\'ve ever watched, or as the PR put it, "quite possibly a contender to be the title screen music to a Wave Race 64 sequel." Sounds awesome to me. Also included in this package out today on <a href="https://soundcloud.com/maddecent/">Mad Decent</a>\'s Jeffree\'s sub-label are two remixes of the A-side from Lifelike and Nite Sprite. Download below.
<iframe src="https://w.soundcloud.com/player/?url=http%3A%2F%2Fapi.soundcloud.com%2Fplaylists%2F8087281&amp;color=000000&amp;auto_play=false&amp;show_artwork=true" frameborder="no" scrolling="no" width="100%" height="350"></iframe>';

preg_match("/url=(.*?)/", $string, $matches);

print_r($matches);

The above code doesn't work and i am not so familiar with regex so if any one can figure out what is wrong here then it would be great. And also if anyone can guide me the right process to do this then that would be great.

like image 654
Nirmal Ram Avatar asked Aug 13 '13 12:08

Nirmal Ram


3 Answers

For a one-time fix, you might consider an SQL solution. Some assumptions with the following SQL:

  • There is only ONE iframe per post to be replaced (the SQL can be run multiple times if there are posts with more than one iframe).
  • The iframes to be replaced ALL are in the form:

<iframe src="https://w.soundcloud.com/player/?url="..." other-stuff</iframe>

  • All you care about is what's between the quotes for the url parameter
  • The end result is [soundcloud url="..."]

If all of this is true, then the following SQL should do the trick. It can be tweaked if you want a different shortcode, etc.

Be sure to backup your wp_posts table before performing ANY mass update.

CREATE TABLE wp_posts_backup SELECT * FROM wp_posts
;

Once the backup is complete, the following SQL should fix all of your posts in a single shot:

UPDATE wp_posts p

   SET p.post_content = CONCAT( SUBSTRING_INDEX( p.post_content, '<iframe src="https://w.soundcloud.com/player/?url=', 1 )
                               ,'[soundcloud url="'
                               , REPLACE( REPLACE(
                                 SUBSTRING_INDEX( SUBSTR( p.post_content
                                                        , LOCATE( '<iframe src="https://w.soundcloud.com/player/?url=', p.post_content ) + 50
                                                        )
                                                , '&amp;', 1
                                                )
                               , '%3A', ':' ), '%2F', '/' )
                               ,'?'
                               ,SUBSTRING_INDEX( SUBSTR( p.post_content
                                                       , LOCATE( '<iframe src="https://w.soundcloud.com/player/?url=', p.post_content ) + 50
                                                       + LOCATE( '&amp;', SUBSTR( p.post_content
                                                                                , LOCATE( '<iframe src="https://w.soundcloud.com/player/?url=', p.post_content ) + 50
                                                                                )
                                                               ) + 4
                                                       )
                                               , ' ', 1
                                               )
                               ,']'
                               ,SUBSTR( p.post_content, LOCATE( '</iframe>', p.post_content ) + 9 )
                              )

 WHERE p.post_content LIKE '%<iframe src="https://w.soundcloud.com/player/?url=%</iframe>%'
;

I would suggest you TEST a few posts before running this against all of them. An easy way to test would be to add the following to the WHERE clause above (immediately before ';') changing '?' to the Post ID(s) to be tested.

AND p.ID IN (?,?,?)

If for any reason you need to restore your posts, you can do something like:

UPDATE wp_posts p
  JOIN wp_posts_backup b
    ON b.ID = p.ID
   SET p.post_content = b.post_content
;

One other thing to consider. I wasn't sure if you wanted to pass on the parameters that are currently a part of the url, so I included them. You can easily remove them by changing:

                               ,'?'
                               ,SUBSTRING_INDEX( SUBSTR( p.post_content
                                                       , LOCATE( '<iframe src="https://w.soundcloud.com/player/?url=', p.post_content ) + 50
                                                       + LOCATE( '&amp;', SUBSTR( p.post_content
                                                                                , LOCATE( '<iframe src="https://w.soundcloud.com/player/?url=', p.post_content ) + 50
                                                                                )
                                                               ) + 4
                                                       )
                                               , ' ', 1
                                               )
                               ,']'

to:

                           ,'"]'

resulting in:

UPDATE wp_posts p

   SET p.post_content = CONCAT( SUBSTRING_INDEX( p.post_content, '<iframe src="https://w.soundcloud.com/player/?url=', 1 )
                               ,'[soundcloud url="'
                               , REPLACE( REPLACE(
                                 SUBSTRING_INDEX( SUBSTR( p.post_content
                                                        , LOCATE( '<iframe src="https://w.soundcloud.com/player/?url=', p.post_content ) + 50
                                                        )
                                                , '&amp;', 1
                                                )
                               , '%3A', ':' ), '%2F', '/' )
                               ,'"]'
                               ,SUBSTR( p.post_content, LOCATE( '</iframe>', p.post_content ) + 9 )
                              )

 WHERE p.post_content LIKE '%<iframe src="https://w.soundcloud.com/player/?url=%</iframe>%'
;

Updated to allow for no parameters in the url

UPDATE wp_posts p

   SET p.post_content = CONCAT( SUBSTRING_INDEX( p.post_content, '<iframe src="https://w.soundcloud.com/player/?url=', 1 )
                               ,'[soundcloud url="'
                               , REPLACE( REPLACE(
                                 SUBSTRING_INDEX(
                                     SUBSTRING_INDEX( SUBSTR( p.post_content
                                                            , LOCATE( '<iframe src="https://w.soundcloud.com/player/?url=', p.post_content ) + 50
                                                            )
                                                    , '&amp;', 1
                                                    )
                                                , '"', 1
                                                )
                               , '%3A', ':' ), '%2F', '/' )
                               ,'"]'
                               ,SUBSTR( p.post_content, LOCATE( '</iframe>', p.post_content ) + 9 )
                              )

 WHERE p.post_content LIKE '%<iframe src="https://w.soundcloud.com/player/?url=%</iframe>%'
;

Good luck.

like image 24
gwc Avatar answered Nov 07 '22 14:11

gwc


Since you're working with HTML here, I would recommend using DOM functions:

$doc = new DOMDocument;
$doc->loadHTML($string);

foreach ($doc->getElementsByTagName('iframe') as $iframe) {
    $url = $iframe->getAttribute('src');
    // parse the query string
    parse_str(parse_url($url, PHP_URL_QUERY), $args);
    // save the modified attribute
    $iframe->setAttribute('src', $args['url']);
}

echo $doc->saveHTML();

This outputs the full document, so you would need to trim it down:

$body = $doc->getElementsByTagName('body')->item(0);
foreach ($body->childNodes as $node) {
    echo $doc->saveHTML($node);
}

Output:

<p>Think Kavinsky meets Futurecop! meets your favorite 80s TV show theme song and you might be pretty close to Swedish producer Johan Bengtsson's retro project, <a href="https://soundcloud.com/daataa"><strong>Mitch Murder</strong></a>. Title track, "The Touch," is genuinely lighthearted and fun, crossing over from 80s synth work into a bit of French Touch influence; also including a big time guitar solo straight out of your dad's record collection. B-side "Race Day" could very easily be the soundtrack to a video montage of all of your favorite beach scenes from every 80s movie you've ever watched, or as the PR put it, "quite possibly a contender to be the title screen music to a Wave Race 64 sequel." Sounds awesome to me. Also included in this package out today on <a href="https://soundcloud.com/maddecent/">Mad Decent</a>'s Jeffree's sub-label are two remixes of the A-side from Lifelike and Nite Sprite. Download below.
<iframe src="http://api.soundcloud.com/playlists/8087281" frameborder="no" scrolling="no" width="100%" height="350"></iframe></p>
like image 178
Ja͢ck Avatar answered Nov 07 '22 14:11

Ja͢ck


This should do for what you've specified

$new_string = preg_replace('/(?:<iframe[^\>]+src="[^\"]*url=([^\"]*soundcloud\.com[^\"]*))"[^\/]*\/[^\>]*>/i', '[soundcloud url="$1"]', $string);

It's limited to iframes with the url=…soundcloud… part in the src attribute and replaces the entire iframe code with [soundcloud url="{part after url=}"]

like image 41
vollie Avatar answered Nov 07 '22 13:11

vollie