A long time ago I created rank-es, a news aggregation site that extracted links from Menéame (a Spanish reddit-like site), re-ranked the posts using Twitter and Facebook share metrics, and used a simple linear function to age links, so the front page was constantly refreshing. I liked the idea of having no registered users, instead using the massive amount of information already provided by established social networks to rank the articles. There is no downvoting, only upvoting by sharing.
It was good for a while, until a change in Spain's Intellectual Property Law turned that experiment into a potential minefield:
> The Spanish government has successfully passed a new copyright law which imposes fees for online content aggregators such as Google News, in an effort to protect its print media industry.
>
> The new intellectual property law, known popularly as the “Google Tax” or by its initials LPI, requires services which post links and excerpts of news articles to pay a fee to the organisation representing Spanish newspapers, the Association of Editors of Spanish Dailies (known by its Spanish-language abbreviation AEDE). Failure to pay up can lead to a fine of up to €600,000.
So I called it quits. I shut down the project, which was running on Google App Engine at the time, but kept the code public in case someone wanted to continue it.
Fast-forward about one year, to today. I decided to try again, this time ranking the news directly from the sources and adding a commenting system. In the end, it was not that hard.
The idea is simple: retrieve news from different sources, rank them using Twitter and Facebook share counts, apply a simple linear function to penalize old links (removing them after one or two days), and then build a front page like this one.
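To make the aging step concrete, here is a minimal sketch of one plausible linear penalty; the constants and function name are mine, not the project's:

```python
import time

# Illustrative constants: how fast a link decays and when it expires.
DECAY_PER_HOUR = 0.02
MAX_AGE_HOURS = 48  # "one or two days"

def aged_score(shares, published_ts, now=None):
    """Share-based score with a linear age penalty; None means expired."""
    now = now if now is not None else time.time()
    age_hours = (now - published_ts) / 3600.0
    if age_hours > MAX_AGE_HOURS:
        return None  # drop the link from the front page
    return shares * (1.0 - DECAY_PER_HOUR * age_hours)
```

With these numbers a link loses about 2% of its score per hour and disappears after two days.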
I have tried to keep it as simple as possible. I have not used a database server. Instead, I am using Python's `sqlite3` module and a single DB file to store both current links and expired ones.
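A sketch of what that single-file storage can look like (the actual schema in the repository may differ):

```python
import sqlite3

conn = sqlite3.connect("links.db")  # one file holds everything
conn.execute("""
    CREATE TABLE IF NOT EXISTS links (
        url       TEXT PRIMARY KEY,
        title     TEXT,
        source    TEXT,
        shares    INTEGER,
        published REAL,               -- Unix timestamp
        expired   INTEGER DEFAULT 0   -- flipped to 1 when the link ages out
    )
""")
conn.commit()
```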
In the current experiment there are four news sources (see the fetching sketch after the list):
- The New York Times, a widely read general news site.
- Phys.org, for science news.
- Wired, for tech.
- The Intercept, a relatively small outlet.
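Each source is presumably polled through its RSS feed. Here is a sketch using the third-party feedparser package; the feed URLs are illustrative and may not match what the project actually uses:

```python
import feedparser  # pip install feedparser

# Illustrative feed URLs; the project may use different ones.
FEEDS = {
    "nytimes":   "https://rss.nytimes.com/services/xml/rss/nyt/HomePage.xml",
    "phys":      "https://phys.org/rss-feed/",
    "wired":     "https://www.wired.com/feed/rss",
    "intercept": "https://theintercept.com/feed/",
}

def fetch_entries():
    """Yield (source, title, url) for every entry in every feed."""
    for source, url in FEEDS.items():
        feed = feedparser.parse(url)
        for entry in feed.entries:
            yield source, entry.title, entry.link
```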
It is easy to see that one cannot work with absolute sharing metrics. At the time of this writing, with the items currently in my DB file, articles from The New York Times are shared 4,990 times on average, but Wired articles only 580. If we used only those raw numbers, the Gray Lady would take most or all of the links on the front page. As a normalization step, I simply divide each article's share count by the mean share count of its site. Something a bit more elaborate could have been used (z-scores, for instance), but this is good enough. Basically, it helps build a front page that lists the most relevant articles from each site.
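The normalization itself is a few lines; a sketch, under the assumption that each link is stored as a (source, url, shares) tuple:

```python
from collections import defaultdict

def normalize(items):
    """Divide each link's shares by the mean shares of its source."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for source, _url, shares in items:
        totals[source] += shares
        counts[source] += 1
    means = {s: totals[s] / counts[s] for s in totals}
    return [(source, url, shares / means[source])
            for source, url, shares in items]
```

With the averages quoted above, a typical New York Times link and a typical Wired link both end up near 1.0, so neither site crowds out the other.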
Also, thanks to Disqus, it is possible to add comments for each news article on the front page. This is optional and depends on the configuration settings.
`html_generator.py` simply writes static HTML pages. You can run it directly on the server, or run it locally and then copy the result.
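The generator can be very small. A sketch of the idea (the template, file names, and the Disqus toggle are illustrative, not the actual contents of `html_generator.py`):

```python
import html

DISQUS_SHORTNAME = ""  # hypothetical setting; non-empty enables comments

PAGE = """<!DOCTYPE html>
<html><head><meta charset="utf-8"><title>Front page</title></head>
<body><ol>
{items}
</ol></body></html>"""

def write_front_page(links, path="index.html"):
    """Render the ranked (url, title) pairs as a static page."""
    rows = "\n".join(
        '<li><a href="{}">{}</a></li>'.format(
            html.escape(url, quote=True), html.escape(title))
        for url, title in links
    )
    # Per-article pages would append the standard Disqus embed snippet
    # here when DISQUS_SHORTNAME is set.
    with open(path, "w", encoding="utf-8") as f:
        f.write(PAGE.format(items=rows))
```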
If you are interested in this, you can download the source code. Feel free to fork it and send me whatever improvements you can think of. The project's GitHub page includes a README file with the necessary instructions to get it up and running. Enjoy!
This, of course, is hardly a replacement for sites like reddit. One of the best things about reddit is that you can discover things you would not have found otherwise. However, if you want to quickly build a forum for commenting on current news (and you have a good set of sources that will provide you with relevant articles), this can be a good option.