Ultimate link preview in Python
Introduction
Recently, I stumbled upon the problem of link previews. Friends of mine are creating a website that showcases their favourite design inspiration and tools.
Similar to this link preview, they want the website to show other websites that they are fond of.
However, you can already see a problem: there is no picture of the website in this link. It should look like this:
So to do this, do we really need to take a screenshot of each website by hand? Is there a better way to do it?
Get the full code here: https://github.com/lxve-commits/link-preview
Problem dimensions
How to get website description
How to get website description in Python
Starting off, I searched the web for Python crawlers and came across the requests module together with BeautifulSoup: https://github.com/hackersandslackers/beautifulsoup-tutorial/tree/master/scraper
I simply modified the code to add exception handling:
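A minimal sketch of what such a scraper can look like, assuming the description lives in an og:description or description meta tag. The function names extract_description and fetch_description and the 10-second timeout are my own placeholders, not taken from the linked tutorial:

```python
import requests
from bs4 import BeautifulSoup


def extract_description(html):
    """Return the page description from raw HTML, or None if there is none."""
    soup = BeautifulSoup(html, "html.parser")
    # Try the Open Graph tag first, then the classic meta description.
    for attrs in ({"property": "og:description"}, {"name": "description"}):
        tag = soup.find("meta", attrs=attrs)
        if tag and tag.get("content"):
            return tag["content"].strip()
    return None


def fetch_description(url, timeout=10):
    """Download a page and return its description, or None on a request error."""
    try:
        response = requests.get(url, timeout=timeout)
        response.raise_for_status()
    except requests.RequestException as error:
        # Covers connection errors, timeouts, invalid URLs and HTTP error codes.
        print(f"Could not fetch {url}: {error}")
        return None
    return extract_description(response.text)
```

Keeping the parsing separate from the download means one unreachable website only produces a None entry instead of stopping the whole run.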
Take a picture of how a website looks to actual people in Python
Next, I researched ways to add an image to the link preview. Some websites already include an image for link previews, like this one:
However, on the website we are building, we want to show what other websites actually look like:
One way to take screenshots automatically is to use Selenium WebDriver. This tool actually opens up a window in your own browser (kind of freaky, to be honest).
So next I researched a way to get the links out of our Discord channel and implemented Selenium WebDriver in Python to take screenshots.
Please change the window_size according to the format of the screenshots you want to take.
If you only need to take screenshots of small pages, you can decrease the page_load_timeout to speed up the program.
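As a rough sketch of how the two settings fit together, assuming Selenium 4 and a local Chrome installation: the helper names screenshot_filename and take_screenshot and the default values below are illustrative assumptions, so adjust them to your own setup.

```python
from urllib.parse import urlparse


def screenshot_filename(url):
    """Derive a file name such as 'example.com.png' from a URL."""
    host = urlparse(url).netloc or "screenshot"
    return f"{host.replace(':', '_')}.png"


def take_screenshot(url, window_size=(1280, 800), page_load_timeout=30):
    """Open url in Chrome and save a screenshot; return the file name or None."""
    # Imported here so the helper above also works without Selenium installed.
    from selenium import webdriver
    from selenium.common.exceptions import WebDriverException

    driver = webdriver.Chrome()
    try:
        # The window size determines the format of the screenshot.
        driver.set_window_size(*window_size)
        # Give up on pages that take longer than this many seconds to load.
        driver.set_page_load_timeout(page_load_timeout)
        driver.get(url)
        filename = screenshot_filename(url)
        driver.save_screenshot(filename)
        return filename
    except WebDriverException as error:
        # Timeouts and unreachable pages end up here.
        print(f"Could not capture {url}: {error}")
        return None
    finally:
        # Always close the browser window, even after an error.
        driver.quit()
```

Calling take_screenshot("https://example.com", window_size=(1920, 1080), page_load_timeout=10) would, under these assumptions, write example.com.png to the current directory.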
Avoiding cookie messages
Lastly, as we want everything to look clean, we need to avoid annoying cookie banners like this one:
My approach is to iterate through every button on the page and check whether its text matches a confirmation message.
Hint: You can delete languages from the cookie_shorthand_matches or add your own.
The matching works really well on some pages, but there are edge cases where a website requires the user to first accept and then save the settings, which this program does not yet handle.
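The button matching described above could be sketched roughly like this. The phrases inside cookie_shorthand_matches are example values I picked for illustration, and is_confirmation and dismiss_cookie_banner are hypothetical helper names; the driver argument is assumed to be a Selenium WebDriver instance.

```python
# Example phrases per language; delete languages or add your own.
cookie_shorthand_matches = {
    "en": ["accept", "agree", "allow", "got it", "ok"],
    "de": ["akzeptieren", "alle akzeptieren", "zustimmen"],
    "fr": ["accepter", "tout accepter", "j'accepte"],
}


def is_confirmation(text):
    """Return True if a button label looks like a cookie confirmation."""
    # Lowercase the label and collapse whitespace before comparing.
    label = " ".join(text.lower().split())
    # Skip empty labels and long texts that are unlikely to be buttons.
    if not label or len(label) > 40:
        return False
    for phrases in cookie_shorthand_matches.values():
        for phrase in phrases:
            # Match the phrase alone or as the first words of the label.
            if label == phrase or label.startswith(phrase + " "):
                return True
    return False


def dismiss_cookie_banner(driver):
    """Click the first visible button that looks like a confirmation."""
    from selenium.common.exceptions import WebDriverException
    from selenium.webdriver.common.by import By

    for button in driver.find_elements(By.TAG_NAME, "button"):
        try:
            if button.is_displayed() and is_confirmation(button.text):
                button.click()
                return True
        except WebDriverException:
            # The button went stale or is covered; try the next one.
            continue
    return False
```

Matching only at the start of the label keeps texts such as "Accepted payment methods" from being clicked by accident. Banners built from links or divs instead of button elements are not found by this sketch.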
Scraping Discord messages using Python
I found this awesome tutorial on how to get the contents of one page:
To get all the links, however, it was necessary to iterate through the channel using the last message id.
To get the full chat of a Discord channel, use the Python code below:
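A sketch of the pagination idea, assuming the Discord HTTP API endpoint for channel messages, which returns messages newest first and accepts a before parameter. The function names are hypothetical, and session is assumed to be a requests.Session whose Authorization header you have already set:

```python
import re

API_URL = "https://discord.com/api/v10/channels/{channel_id}/messages"
# Simple pattern; it may include trailing punctuation in a link.
URL_PATTERN = re.compile(r"https?://[^\s<>]+")


def fetch_page(session, channel_id, before=None, limit=100):
    """Fetch up to `limit` messages older than the message id `before`."""
    params = {"limit": limit}
    if before is not None:
        params["before"] = before
    url = API_URL.format(channel_id=channel_id)
    response = session.get(url, params=params, timeout=10)
    response.raise_for_status()
    return response.json()


def iter_messages(get_page):
    """Yield every message by paging backwards with the last message id."""
    before = None
    while True:
        page = get_page(before)
        if not page:
            return  # An empty page means the start of the channel is reached.
        yield from page
        # The last message of a page is the oldest one; continue before it.
        before = page[-1]["id"]


def extract_links(messages):
    """Collect the unique links from message contents, in order of appearance."""
    links = []
    for message in messages:
        for link in URL_PATTERN.findall(message.get("content", "")):
            if link not in links:
                links.append(link)
    return links


# Possible usage (CHANNEL_ID and the session setup are placeholders):
#   session = requests.Session()
#   session.headers["Authorization"] = "..."
#   pages = lambda before: fetch_page(session, CHANNEL_ID, before)
#   links = extract_links(iter_messages(pages))
```

Passing the page getter into iter_messages keeps the loop independent of the network, so it can be tried out with a fake getter before pointing it at a real channel.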
In the future, I might extend this project to automatically create entries on the Webflow website and to run in the cloud.
I hope you have enjoyed the tutorial :)
Check out the full code of this project at https://github.com/lxve-commits/link-preview.git