
Issue with str_replace

I'm a beginner programmer building a fairly simple website scraper that stores information in a private MySQL database, to learn more about programming.

Here's the HTML I am trying to scrape:

<li id="liIngredient" data-ingredientid="3914" data-grams="907.2">
                <label>
                    <span class="checkbox-formatted"><input id="cbxIngredient" type="checkbox" name="ctl00$CenterColumnPlaceHolder$recipeTest$recipe$ingredients$rptIngredientsCol1$ctl01$cbxIngredient" /></span>
                    <p class="fl-ing" itemprop="ingredients">
                        <span id="lblIngAmount" class="ingredient-amount">2 pounds</span>
                        <span id="lblIngName" class="ingredient-name">ground beef chuck</span>

                    </p>
                </label>
            </li>

<li id="liIngredient" data-ingredientid="5838" data-grams="454">
                <label>
                    <span class="checkbox-formatted"><input id="cbxIngredient" type="checkbox" name="ctl00$CenterColumnPlaceHolder$recipeTest$recipe$ingredients$rptIngredientsCol1$ctl02$cbxIngredient" /></span>
                    <p class="fl-ing" itemprop="ingredients">
                        <span id="lblIngAmount" class="ingredient-amount">1 pound</span>
                        <span id="lblIngName" class="ingredient-name">bulk Italian sausage</span>

                    </p>
                </label>
            </li>

After scraping the data, I am trying to use str_replace to get rid of everything except the amount and the name: "2 pounds ground beef" in the first example, or "1 pound bulk Italian sausage" in the second.

Here's my attempt:

$ingredients = str_replace('#<label>\s<span class="checkbox-formatted"><input id="cbxIngredient" type="checkbox" name=".*?" /></span>\s<p class="fl-ing" itemprop="ingredients">\s#', null, $ingredients);
              echo $ingredients;

In theory, this should remove everything up to the span with id="lblIngAmount". Where am I going wrong? The text is exactly the same before and after the str_replace call. Why?

Thanks for any and all help! If you need any more details, I'll be glad to give them!

Muhambi asked May 11 '26 03:05


2 Answers

Don't use regex to parse HTML.

See How to parse HTML.

Regex would work in this specific case, but since this is a learning project, you want to do it right.
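
For illustration only, here is a minimal sketch of why the original call changed nothing and what the regex route would look like. The markup is a shortened stand-in for the scraped HTML (the name attribute is abbreviated), and the pattern uses \s* rather than \s, on the assumption that the real page has newlines and indentation between the tags.

```php
<?php
// Shortened stand-in for the scraped markup; the name attribute is abbreviated.
$ingredients = <<<'HTML'
<label>
    <span class="checkbox-formatted"><input id="cbxIngredient" type="checkbox" name="ctl01" /></span>
    <p class="fl-ing" itemprop="ingredients">
        <span id="lblIngAmount" class="ingredient-amount">2 pounds</span>
HTML;

// \s* matches any run of whitespace, including the newlines and indentation between tags.
$pattern = '#<label>\s*<span class="checkbox-formatted"><input id="cbxIngredient" type="checkbox" name=".*?" /></span>\s*<p class="fl-ing" itemprop="ingredients">\s*#';

// str_replace() looks for the pattern as a literal string, so nothing is replaced.
$literal = str_replace($pattern, '', $ingredients);
var_dump($literal === $ingredients); // bool(true)

// preg_replace() treats the same string as a regular expression.
$stripped = preg_replace($pattern, '', $ingredients);
echo $stripped, "\n"; // <span id="lblIngAmount" class="ingredient-amount">2 pounds</span>
```

This only strips the leading markup; the closing tags and the second span would each need their own pattern, which is a good part of why a parser is the better tool here.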

Sylverdrag answered May 13 '26 15:05


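str_replace() treats its first argument as a literal string, not as a pattern, so your regex never matches and the text comes back unchanged. To apply a regular expression you want preg_replace(). Note also that \s matches a single whitespace character, so you would need \s* or \s+ to cover the newlines and indentation between the tags. That said, you should not really be using regular expressions to manipulate HTML. Use PHP's DOMDocument instead.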
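
A rough sketch of the DOMDocument approach, assuming the scraped page looks like the snippet in the question. The $html sample is trimmed down from it (the checkbox input is omitted), and the variable names are only for illustration.

```php
<?php
// Trimmed copy of the markup from the question (the checkbox input is omitted).
$html = <<<'HTML'
<li id="liIngredient" data-ingredientid="3914" data-grams="907.2">
    <label>
        <p class="fl-ing" itemprop="ingredients">
            <span id="lblIngAmount" class="ingredient-amount">2 pounds</span>
            <span id="lblIngName" class="ingredient-name">ground beef chuck</span>
        </p>
    </label>
</li>
<li id="liIngredient" data-ingredientid="5838" data-grams="454">
    <label>
        <p class="fl-ing" itemprop="ingredients">
            <span id="lblIngAmount" class="ingredient-amount">1 pound</span>
            <span id="lblIngName" class="ingredient-name">bulk Italian sausage</span>
        </p>
    </label>
</li>
HTML;

// Scraped pages are rarely valid HTML (the ids repeat here), so collect
// libxml's parser warnings internally instead of letting PHP print them.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
$ingredients = [];

// Visit each ingredient <li>, then look up the two spans relative to it.
foreach ($xpath->query('//li[@data-ingredientid]') as $li) {
    $amount = $xpath->query('.//span[@class="ingredient-amount"]', $li)->item(0);
    $name   = $xpath->query('.//span[@class="ingredient-name"]', $li)->item(0);

    $ingredients[] = trim(
        ($amount ? $amount->textContent : '') . ' ' . ($name ? $name->textContent : '')
    );
}

print_r($ingredients);
// [0] => 2 pounds ground beef chuck
// [1] => 1 pound bulk Italian sausage
```

Selecting by class rather than by id sidesteps the fact that the page reuses the same id on every row, and the strings in $ingredients are ready to insert into MySQL without any tag stripping.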