Running the LDA algorithm on a Raspberry Pi: The resin.io way

Ilias is a new member of our team and an experienced data scientist. As is customary with any new member, he built a project using resin.io and gave us his thoughts on the experience:

I’ve always wanted to play with a Raspberry Pi, mainly because of the many OS images out there that enable you to use your device as a lightweight Linux machine, a home entertainment center or whatever you like. However, this was never a strong enough motivation for me to buy a Raspberry Pi.

What would be a strong motivation instead? To have a device, connected to the network, easily manageable from a web interface, ready to accept my code with a simple git push and update on the fly! Well, this is something that resin.io provides. And the whole procedure for the aforementioned setup is smooth and works like a charm, thanks to the detailed documentation.

As I started to familiarize myself with the resin.io platform, it took me only a few minutes to connect my device to the platform and deploy a very simple project.
All you have to do is sign up and log in to https://resin.io, download the appropriate image for your device, burn it to your SD card and then you are ready to launch your first application. The platform will identify your device and then give you access to a git repository, ready for you to push your code. The update and deployment to the device is then performed… “auto-magically”.

Deploying a simple project wasn’t enough, though. The device that the Resin team kindly provided me with was a Raspberry Pi 2 Model B, which comes with a quad-core CPU and 1GB of RAM.

Writing a program that would take advantage of the powerful Pi 2 capabilities to the fullest and deploying it easily with resin.io was the next logical step.

TL;DR: The task was to deploy a Machine Learning algorithm to the device that would read data and produce a specific result. To accomplish that, I wrote a Node.js application and deployed it to the Raspberry Pi. The whole process was a piece of cake, since resin.io does all the heavy lifting for you.

In the spirit of testing a “heavy” Machine Learning algorithm, I chose to apply Latent Dirichlet Allocation (LDA) to a bunch of recent tweets, mainly regarding the Bitcoin crypto-currency and the situation in Eastern Europe and the Middle East.

LDA is a “topic modeling” method, which means that it is able to detect “abstract” topics in a collection of text documents. In this case, we treat each tweet as a single document, resulting in 10,000 documents that constitute the input dataset. All tweets are in English and no pre-processing was performed. Skipping the underlying assumptions of LDA, and for the sake of simplicity, we can treat LDA as a statistical method that takes documents as input and produces a list of topics. Each topic is represented by a handful of high-probability words, which means that those words are the best candidates for describing that topic.
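To make the “words of high probability” idea concrete, here is a minimal sketch. The topic and its probabilities are made up for illustration (they are not results from the experiment), and `topWords` is a hypothetical helper name:

```javascript
// Hypothetical word probabilities for a single topic (illustrative numbers only).
const topic = {
  bitcoin: 0.12,
  price: 0.08,
  the: 0.05,
  wallet: 0.04,
  market: 0.07,
  exchange: 0.06
};

// Return the n most probable words of a topic, highest probability first.
function topWords(wordProbs, n) {
  return Object.entries(wordProbs)
    .sort((a, b) => b[1] - a[1])
    .slice(0, n)
    .map(([word]) => word);
}

console.log(topWords(topic, 5));
// → [ 'bitcoin', 'price', 'market', 'exchange', 'the' ]
```

Note that a filler word such as “the” can end up among the top words of a topic when, as here, no pre-processing (for example stop-word removal) is applied.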

I ran the model for five topics and asked for the top five words for each one.
This process took a while. Even on modern workstations such a process would take some time to complete.
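For readers curious about what such a run involves, below is a rough sketch of LDA using collapsed Gibbs sampling, one common way to fit the model. It is not the code used for this project: the function names (`lda`, `topTerms`), the prior values `alpha` and `beta`, the iteration count and the four toy “tweets” are all assumptions made for illustration.

```javascript
// Minimal collapsed Gibbs sampler for LDA (illustrative sketch, not the project's code).
// docs: array of strings, K: number of topics, alpha/beta: assumed Dirichlet priors.
function lda(docs, K, iterations = 200, alpha = 0.1, beta = 0.01, random = Math.random) {
  // Naive whitespace tokenization; no other pre-processing.
  const vocab = new Map();
  const words = [];
  const tokens = docs.map(doc =>
    doc.toLowerCase().split(/\s+/).filter(Boolean).map(w => {
      if (!vocab.has(w)) { vocab.set(w, words.length); words.push(w); }
      return vocab.get(w);
    })
  );
  const V = words.length;

  // Count tables: topics per document, words per topic, tokens per topic.
  const docTopic = docs.map(() => new Array(K).fill(0));
  const topicWord = Array.from({ length: K }, () => new Array(V).fill(0));
  const topicTotal = new Array(K).fill(0);

  // Start from a random topic assignment for every token.
  const z = tokens.map((doc, d) => doc.map(w => {
    const k = Math.floor(random() * K);
    docTopic[d][k]++; topicWord[k][w]++; topicTotal[k]++;
    return k;
  }));

  const weights = new Array(K);
  for (let it = 0; it < iterations; it++) {
    tokens.forEach((doc, d) => doc.forEach((w, i) => {
      // Take the token's current assignment out of the counts.
      let k = z[d][i];
      docTopic[d][k]--; topicWord[k][w]--; topicTotal[k]--;

      // Unnormalized probability of each topic for this token.
      let sum = 0;
      for (let t = 0; t < K; t++) {
        weights[t] = (docTopic[d][t] + alpha) * (topicWord[t][w] + beta) / (topicTotal[t] + V * beta);
        sum += weights[t];
      }

      // Draw a new topic in proportion to the weights and restore the counts.
      let r = random() * sum;
      k = 0;
      while (k < K - 1 && r >= weights[k]) { r -= weights[k]; k++; }
      z[d][i] = k;
      docTopic[d][k]++; topicWord[k][w]++; topicTotal[k]++;
    }));
  }
  return { words, topicWord };
}

// For each topic, list the n words assigned to it most often.
function topTerms(model, n) {
  return model.topicWord.map(counts =>
    counts.map((count, w) => [model.words[w], count])
      .sort((a, b) => b[1] - a[1])
      .slice(0, n)
      .map(([word]) => word));
}

// Toy input, made up for this sketch; the real dataset had 10,000 tweets.
const tweets = [
  "bitcoin price rises again",
  "bitcoin wallet exchange price",
  "crisis talks in europe",
  "middle east crisis talks continue"
];
const model = lda(tweets, 2, 100);
console.log(topTerms(model, 3));
```

The sampler is random, so the printed words vary between runs. The experiment described here used five topics and five words per topic; every iteration revisits every token in every document, which is why the running time grows quickly with the size of the dataset.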

I created this project as a Node.js application, with the corresponding package.json file defining the dependencies and all other required information. Pushing the code to the device works magically with a simple git push via resin.io. I did not even have to transfer the dependent libraries to the device; Resin does this for you as well. So, I “git pushed” and waited…

Excitement, excitement, excitement! Within minutes of the push, I had the code deployed and executing on the Raspberry Pi! I had to wait a bit since it was a heavy algorithm. But when it finished, 9 minutes after it started, I was simply impressed. Seeing a complex algorithm run on such a device, with technologies that a few years ago were just the realm of imagination, and seeing the results on my web interface was simply fascinating!

For comparison, I also used an older Raspberry Pi model (B+) to perform the same task.
It took 30 minutes, which is to be expected, given that the B+ model comes with a slower CPU and the older ARMv6 architecture, compared to the ARMv7 of the Raspberry Pi 2.

Here’s an example table showing five topics described by five words of highest probability:

To sum up, with resin.io, you can easily deploy your own application on your device and keep it up-to-date with a simple git push command. As for the project I described above, it was a big surprise for me to see one of my favorite algorithms running on a Raspberry Pi. What was particularly interesting was that the algorithm managed to distinguish the topics related to Bitcoin from those related to the crises in Europe and the Middle East.

Regarding performance, it seems that, at least for the aforementioned experiment, the Raspberry Pi 2 is roughly three times faster than its predecessor (9 minutes versus 30). What’s up next? I don’t know, since there are numerous possibilities. From connecting sensors and monitoring events to data mining and artificial intelligence…

You can also find Ilias’s project on Hackster.io. Happy hacking!

Have questions, or just want to say hi? Find the team on our community chat.

