
How to remove "\n\r\n" and extra spaces from BeautifulSoup output

Tags:

python-3.x

I have code like this:

from bs4 import BeautifulSoup
import requests
import re

page = open('doc1.html','rb').read()
soup = BeautifulSoup(page,'lxml')
# print(soup.prettify())

# eng = soup.find_all(string = re.compile("righteou"))
# print(eng)

# heb = soup.findAll('p',{'dir':'RTL'})
# print(heb)
list = []
all_tr = soup.findAll('tr')
for td in all_tr:
    all_td = soup.findAll('td')
    d = {
        'hob': all_td[0].text.strip(),
        'english': all_td[1].text.strip(),
    }
    list.append(d)
print(list)

My output looks like this:

[{'hob': 'עִנְיָן שֶׁנִּיתְּנָה הַתּוֹרָה עַל הַר סִינַי דַּוְקָא,', 'english': '\n\r\n                    We need to understand\r\n                    \r\n      
              the idea that the Torah was given specifically on Mount\r\n                        Sinai,\r\n                    '}, {'hob': 'עִנְיָן שֶׁנִּיתְּנָה הַתּוֹרָה עַל הַר סִינַי דַּוְקָא,', 'english': '\n\r\n                    We need to understand\r\n                    \r\n                        the idea that the Torah was given specifically on Mount\r\n                        Sinai,\r\n                    '}, {'hob': 'עִנְיָן שֶׁנִּיתְּנָה הַתּוֹרָה עַל הַר סִינַי דַּוְקָא,', 'english': '\n\r\n                    We need to understand\r\n                    \r\n                        the idea that the Torah was given specifically on Mount\r\n                        Sinai,\r\n                    '}, {'hob': 'עִנְיָן שֶׁנִּיתְּנָה הַתּוֹרָה עַל הַר סִינַי דַּוְקָא,', 'english': '\n\r\n                    We need to understand\r\n                    \r\n                        the idea that the Torah was given specifically on Mount\r\n                        Sinai,\r\n                    '}, {'hob': 'עִנְיָן שֶׁנִּיתְּנָה הַתּוֹרָה עַל הַר סִינַי דַּוְקָא,', 'english': '\n\r\n                    We need to understand\r\n                    \r\n                        the idea that the Torah was given specifically on Mount\r\n                        Sinai,\r\n                    '}, {'hob': 'עִנְיָן שֶׁנִּיתְּנָה הַתּוֹרָה עַל הַר סִינַי דַּוְקָא,', 'english': '\n\r\n                    We need to understand\r\n                    \r\n                        the idea that the Torah was given specifically on Mount\r\n                        Sinai,\r\n                    '}, {'hob': 'עִנְיָן שֶׁנִּיתְּנָה הַתּוֹרָה עַל הַר סִינַי דַּוְקָא,', 'english': '\n\r\n                    We need to understand\r\n                    \r\n                        the idea that the Torah was given specifically on Mount\r\n                        Sinai,\r\n                    '}, {'hob': 'עִנְיָן שֶׁנִּיתְּנָה הַתּוֹרָה עַל הַר סִינַי דַּוְקָא,', 'english': '\n\r\n       
             We need to understand\r\n                    \r\n                        the idea that the Torah was given specifically on Mount\r\n                        Sinai,\r\n                    '}, {'hob': 'עִנְיָן שֶׁנִּיתְּנָה הַתּוֹרָה עַל הַר סִינַי דַּוְקָא,', 'english': '\n\r\n                    We need to understand\r\n                    \r\n                        the idea that the Torah was given specifically on Mount\r\n                        Sinai,\r\n                    '}, {'hob': 'עִנְיָן שֶׁנִּ...................................................................................................................................................................................................................................................

I want to remove the \n and \r characters (and the extra spaces between them) from the output so that my file will be clean. How can I do this?


1 Answer

Split the words and join them with a space.

'english':" ".join(all_td[1].text.split())

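Calling split() with no arguments splits on any run of whitespace, so this drops every "\n" and "\r", collapses repeated spaces into a single space, and trims the leading and trailing whitespace.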
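As a quick check of the split-and-join idea, here is a minimal sketch that applies it to one of the 'english' strings from the question's output. The helper name normalize_whitespace is made up for this illustration and is not part of BeautifulSoup.

```python
def normalize_whitespace(text):
    """Collapse every run of whitespace (spaces, \\n, \\r, \\t) into one space."""
    # str.split() with no arguments splits on any whitespace run and
    # discards empty pieces, so leading/trailing whitespace disappears too.
    return " ".join(text.split())


# One 'english' value copied from the output shown in the question.
raw = (
    "\n\r\n                    We need to understand\r\n"
    "                    \r\n"
    "                        the idea that the Torah was given specifically on Mount\r\n"
    "                        Sinai,\r\n                    "
)

cleaned = normalize_whitespace(raw)
print(cleaned)
# We need to understand the idea that the Torah was given specifically on Mount Sinai,
```

Note that .strip() alone, as used in the question, only trims the two ends of the string, which is why the "\r\n" sequences in the middle survived.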

Nihal Sangeeth answered Nov 29 '25 01:11
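A side note that goes beyond the original answer: the output in the question repeats the same row in every dictionary. That is probably because the loop calls soup.findAll('td') on the whole document instead of on the current row, so all_td[0] and all_td[1] are always the first two cells of the page. The sketch below searches inside each tr and applies the same split-and-join cleanup. The HTML string is an invented stand-in for doc1.html, whose real structure is not shown in the question, and it uses the built-in "html.parser" so that lxml is not required.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for doc1.html; the real file is not shown in the question.
html = """
<table>
  <tr><td dir="RTL">alef</td><td>
      first
          row
  </td></tr>
  <tr><td dir="RTL">bet</td><td>
      second   row
  </td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")

rows = []
for tr in soup.find_all("tr"):
    # Search inside the current row only, not the whole document.
    cells = tr.find_all("td")
    if len(cells) < 2:
        continue  # skip header or malformed rows
    rows.append({
        "hob": " ".join(cells[0].get_text().split()),
        "english": " ".join(cells[1].get_text().split()),
    })

print(rows)
# [{'hob': 'alef', 'english': 'first row'}, {'hob': 'bet', 'english': 'second row'}]
```

The result is collected in a variable called rows rather than list, since naming a variable list hides Python's built-in list type for the rest of the script.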


