 

Issue with str_replace

I'm a beginner programmer writing a fairly simple website scraper that stores information in a private MySQL database, as a way to learn more about programming.

Here's the HTML I am trying to scrape:

<li id="liIngredient" data-ingredientid="3914" data-grams="907.2">
                <label>
                    <span class="checkbox-formatted"><input id="cbxIngredient" type="checkbox" name="ctl00$CenterColumnPlaceHolder$recipeTest$recipe$ingredients$rptIngredientsCol1$ctl01$cbxIngredient" /></span>
                    <p class="fl-ing" itemprop="ingredients">
                        <span id="lblIngAmount" class="ingredient-amount">2 pounds</span>
                        <span id="lblIngName" class="ingredient-name">ground beef chuck</span>

                    </p>
                </label>
            </li>

<li id="liIngredient" data-ingredientid="5838" data-grams="454">
                <label>
                    <span class="checkbox-formatted"><input id="cbxIngredient" type="checkbox" name="ctl00$CenterColumnPlaceHolder$recipeTest$recipe$ingredients$rptIngredientsCol1$ctl02$cbxIngredient" /></span>
                    <p class="fl-ing" itemprop="ingredients">
                        <span id="lblIngAmount" class="ingredient-amount">1 pound</span>
                        <span id="lblIngName" class="ingredient-name">bulk Italian sausage</span>

                    </p>
                </label>
            </li>

After scraping the data, I am trying to use str_replace to strip everything except the ingredient text: "2 pounds ground beef chuck" in the first example, or "1 pound bulk Italian sausage" in the second.

Here's my attempt:

$ingredients = str_replace('#<label>\s<span class="checkbox-formatted"><input id="cbxIngredient" type="checkbox" name=".*?" /></span>\s<p class="fl-ing" itemprop="ingredients">\s#', null, $ingredients);
echo $ingredients;

In theory, this should remove everything up to the <span id="lblIngAmount"> element. Where am I going wrong? The text is the same before and after the str_replace call. Why?

Thanks for any and all help! If you need any more details, I'll be glad to give them!

Muhambi asked May 11 '26 03:05


2 Answers

Don't use regex to parse HTML.

See How to parse HTML.

Regex would work in this specific case, but since this is a learning project, you want to do it right.
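A minimal sketch of the parser-based approach, using PHP's built-in DOM extension (DOMDocument plus DOMXPath) and assuming the markup is shaped like the two <li> blocks in the question (the checkbox span is trimmed here). The function name extractIngredients is hypothetical, and the XPath expressions assume the class names fl-ing, ingredient-amount and ingredient-name stay stable on the scraped site.

```php
<?php
// Sample markup trimmed from the question: one <li> per ingredient.
$html = <<<'HTML'
<ul>
<li id="liIngredient" data-ingredientid="3914" data-grams="907.2">
    <label>
        <p class="fl-ing" itemprop="ingredients">
            <span id="lblIngAmount" class="ingredient-amount">2 pounds</span>
            <span id="lblIngName" class="ingredient-name">ground beef chuck</span>
        </p>
    </label>
</li>
<li id="liIngredient" data-ingredientid="5838" data-grams="454">
    <label>
        <p class="fl-ing" itemprop="ingredients">
            <span id="lblIngAmount" class="ingredient-amount">1 pound</span>
            <span id="lblIngName" class="ingredient-name">bulk Italian sausage</span>
        </p>
    </label>
</li>
</ul>
HTML;

// Hypothetical helper: returns "amount name" for every ingredient paragraph.
function extractIngredients(string $html): array
{
    $doc = new DOMDocument();
    // Scraped pages are rarely valid HTML (this one repeats id="liIngredient"),
    // so collect libxml warnings internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $result = [];
    // Each ingredient sits in a <p class="fl-ing"> with an amount span and a name span.
    foreach ($xpath->query('//p[contains(@class, "fl-ing")]') as $p) {
        $amount = $xpath->query('.//span[contains(@class, "ingredient-amount")]', $p)->item(0);
        $name = $xpath->query('.//span[contains(@class, "ingredient-name")]', $p)->item(0);
        if ($amount === null || $name === null) {
            continue; // skip paragraphs that do not have both parts
        }
        $result[] = trim($amount->textContent) . ' ' . trim($name->textContent);
    }
    return $result;
}

foreach (extractIngredients($html) as $line) {
    echo $line, "\n";
}
```

This prints one line per ingredient. Because the parser works on the document tree, it does not care how the site indents its markup or orders attributes, which is exactly where a hand-written regex tends to break.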

Sylverdrag answered May 13 '26 15:05


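You want preg_replace(), because str_replace() searches for its first argument literally and does not interpret regular expressions. However, you really should not be using regular expressions to manipulate HTML. Use PHP's DOMDocument instead.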
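A small sketch of the str_replace() versus preg_replace() difference, using a fragment shaped like the question's HTML (the input name is shortened for illustration). Two assumptions beyond the answer: the question's pattern uses a single \s between tags, but the real markup has a newline plus several spaces there, so even preg_replace() would find no match; the sketch changes those to \s*. It also passes '' instead of null as the replacement, since passing null to these functions is deprecated as of PHP 8.1.

```php
<?php
// A fragment shaped like one ingredient in the question (input name shortened).
$html = <<<'HTML'
<label>
    <span class="checkbox-formatted"><input id="cbxIngredient" type="checkbox" name="ctl00$cbxIngredient" /></span>
    <p class="fl-ing" itemprop="ingredients">
        <span id="lblIngAmount" class="ingredient-amount">2 pounds</span>
HTML;

// The question's pattern, with each \s changed to \s* so it also matches
// the newline plus indentation between tags.
$pattern = '#<label>\s*<span class="checkbox-formatted"><input id="cbxIngredient" type="checkbox" name=".*?" /></span>\s*<p class="fl-ing" itemprop="ingredients">\s*#';

// str_replace() looks for the pattern text character for character; that exact
// string never occurs in the HTML, so the input comes back unchanged.
$literal = str_replace($pattern, '', $html);

// preg_replace() treats the same string as a PCRE pattern and removes the match.
$regex = preg_replace($pattern, '', $html);

var_dump($literal === $html);
echo $regex, "\n";
```

Even when it works, this stays fragile: any change in attribute order, quoting or whitespace on the site breaks the pattern, which is why the DOMDocument route is preferable.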

