One of the things I realized after moving into industry is that, contrary to a widely held belief in academia, it is important to stay on top of the latest research not only in the fields that directly affect my line of work but also in less closely related areas. It is very difficult to do a competent job by relying exclusively on the tricks you learned in graduate school. I am still very much a neophyte in machine learning, but even just to organize my own thinking when I work through textbooks, I find it important to know which topics and questions people are currently paying attention to at the research frontier. Besides, new problems come up at work every day, and you never know where you will find the tip that helps you solve them.[1] In short, reading abstracts of the newest research in statistics and machine learning cannot hurt.
To make the process easier, I spent my Saturday morning writing a Twitter bot (@arXivStats) in Python that calls the arXiv API periodically and tweets all new papers that are published in the stat category. The structure is very simple: a class for inputting data (papers) that collects and parses the response of the API using BeautifulSoup, and a class for outputting data (tweet) that transforms the dictionary returned by the papers class into a list of tweets and publishes them through the tweepy module. A cron job sitting on my Raspberry Pi runs the script every 24 hours. Just another step toward making my Twitter feed my main source of scientific and technical literature.
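For readers who want a feel for the fetch, parse, and tweet pipeline, here is a minimal sketch of that structure. It is not the bot's actual code. To stay dependency-free it parses the Atom response with the standard library's xml.etree.ElementTree instead of BeautifulSoup, and it takes the publishing step as a plain callable instead of calling tweepy directly. The function names, the stat.ML category, and the 280-character limit are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode
from urllib.request import urlopen

ATOM = "{http://www.w3.org/2005/Atom}"  # namespace of the Atom feed
API_URL = "http://export.arxiv.org/api/query"


def build_query(category="stat.ML", max_results=10):
    """Return a query URL for the newest submissions in one category.

    The category is an assumption; the post only says "the stat category".
    """
    params = {
        "search_query": f"cat:{category}",
        "sortBy": "submittedDate",
        "sortOrder": "descending",
        "max_results": max_results,
    }
    return f"{API_URL}?{urlencode(params)}"


def fetch_url(url):
    """Download the feed as text (needs network access)."""
    with urlopen(url) as response:
        return response.read().decode("utf-8")


def parse_feed(xml_text):
    """Turn an Atom feed into a list of {'title', 'link'} dictionaries."""
    root = ET.fromstring(xml_text)
    papers = []
    for entry in root.iter(f"{ATOM}entry"):
        # Titles may wrap across lines in the feed; collapse the whitespace.
        title = " ".join(entry.findtext(f"{ATOM}title", default="").split())
        link = entry.findtext(f"{ATOM}id", default="").strip()
        papers.append({"title": title, "link": link})
    return papers


def to_tweets(papers, limit=280):
    """Format each paper as 'title link', truncating the title to fit.

    The limit is applied to the raw string length, a simplification.
    """
    tweets = []
    for paper in papers:
        room = limit - len(paper["link"]) - 1  # one space before the link
        title = paper["title"]
        if len(title) > room:
            title = title[: max(room - 3, 0)].rstrip() + "..."
        tweets.append(f"{title} {paper['link']}")
    return tweets


def run(post, fetch=fetch_url):
    """Fetch, parse, and hand each tweet to `post`.

    `post` is a hypothetical callable taking one string; in a real bot it
    would wrap whatever tweepy call publishes a status.
    """
    for tweet in to_tweets(parse_feed(fetch(build_query()))):
        post(tweet)
```

Calling run(print) prints the tweets instead of publishing them, which is a convenient dry run before wiring in real credentials. Keeping the network fetch and the publishing step as arguments also makes the parsing and formatting easy to test offline.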
The source code can be found here.
[1] It is uncanny how many things I have figured out just by taking a quick look at an introductory textbook on statistical mechanics.