I suspect this is related to the parser that BS (BeautifulSoup) uses to read the HTML. The parser options are documented here, but if you're like me (on OS X), getting a lenient parser may take a bit of work.
The BS4 documentation page above points out that, by default, BS4 uses Python's built-in HTML parser. If you're on OS X, the Apple-bundled Python is 2.7.2, and the built-in parser in Python versions before 2.7.3 is not very lenient with malformed HTML. I hit this same problem, so I upgraded my version of Python to work around it. Doing the upgrade in a virtualenv minimizes disruption to your other projects.
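To check whether the interpreter you're running is affected, you can compare its version against the cutoffs. Below is a minimal sketch. It assumes the thresholds the BS4 docs give for the built-in parser (lenient from 2.7.3 on the 2.x line and from 3.2.2 on 3.x), and `builtin_parser_is_lenient` is a hypothetical helper name, not part of BS4:

```python
import sys

def builtin_parser_is_lenient(version=None):
    """Return True if this Python's built-in html.parser is past the cutoff
    the BS4 docs cite for leniency (assumed: 2.7.3 on 2.x, 3.2.2 on 3.x)."""
    # Default to the running interpreter's (major, minor, micro) version.
    v = tuple(version) if version is not None else tuple(sys.version_info[:3])
    if v[0] == 2:
        return v >= (2, 7, 3)
    return v >= (3, 2, 2)

if __name__ == "__main__":
    print(sys.version.split()[0], "lenient built-in parser:", builtin_parser_is_lenient())
```

On the Apple-bundled 2.7.2, `builtin_parser_is_lenient((2, 7, 2))` returns `False`, which is the case where upgrading Python or switching parsers helps.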
If upgrading Python sounds like a pain, you can switch over to the lxml parser instead:
pip install lxml
And then try:
from bs4 import BeautifulSoup
soup = BeautifulSoup(html, "lxml")
Depending on your scenario, that might be good enough. I found this annoying enough to warrant upgrading my version of Python. Using virtualenv, you can migrate your packages fairly easily.
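If the same script has to run on machines where lxml may not be installed, passing `"lxml"` unconditionally makes BS4 raise `bs4.FeatureNotFound`. One way around that is to pick the parser at runtime. The sketch below assumes you'd rather fall back to the built-in `"html.parser"` than fail; `pick_parser` and `make_soup` are hypothetical helper names:

```python
import importlib.util

def pick_parser():
    """Return "lxml" if the lxml package is importable, else the built-in "html.parser"."""
    if importlib.util.find_spec("lxml") is not None:
        return "lxml"
    return "html.parser"

def make_soup(html):
    # Imported lazily so pick_parser() can be used even where bs4 isn't installed.
    from bs4 import BeautifulSoup
    return BeautifulSoup(html, pick_parser())
```

The fallback only hides the missing dependency; on an old interpreter such as 2.7.2 you'd still get the less lenient built-in parser, so installing lxml (or upgrading Python) remains the real fix.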