GRASBOCK

Interpals (Social Media) Crawler



I needed a break from university work, so I tried a small side project. There is a platform for international chat called [interpals](https://www.interpals.net), which I still visit from time to time. It is also very easy to crawl! Someone [crawled 16k profiles](https://www.reddit.com/r/dataisbeautiful/comments/31j66u/i_crawled_16k_profiles_on_interpalsnet_to_find/) in the past and found out which countries were blocked most often.

I started out a bit smaller and collected the 12k most recent profiles. The Search tab shows at most 9999 profiles, so to spot all active users I would have to collect the newest accounts, as well as the most recent logins, every day. When crawling by visiting people’s profiles, I unfortunately ran into rate limits. Nevertheless, I gained some experience with the process. Using Rust was a good choice: it made sure my error handling was reliable and my code sufficiently tested, which turned out to be really helpful for fixing bugs quickly.
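Rate limits are exactly where explicit error handling pays off. Below is a minimal sketch, not the crawler’s actual code, of retrying a fetch with exponential backoff. The `FetchError` enum, the `fetch_with_backoff` function and its parameters are hypothetical, and the fetch is simulated by a closure instead of a real HTTP request, so it assumes nothing about how interpals actually signals a rate limit.

```rust
use std::thread::sleep;
use std::time::Duration;

/// Hypothetical error type for a single profile fetch (not from the original crawler).
#[derive(Debug, PartialEq)]
enum FetchError {
    /// The server signalled too many requests (e.g. an HTTP 429; this is an assumption).
    RateLimited,
    /// Any other failure, which is not worth retrying.
    Other(String),
}

/// Calls `fetch` until it succeeds, retrying only on `RateLimited` and doubling
/// the wait after each rate-limited attempt. Gives up after `max_attempts` calls.
fn fetch_with_backoff<T, F>(
    mut fetch: F,
    max_attempts: u32,
    base_delay: Duration,
) -> Result<T, FetchError>
where
    F: FnMut() -> Result<T, FetchError>,
{
    let mut delay = base_delay;
    let mut attempt = 1;
    loop {
        match fetch() {
            Ok(value) => return Ok(value),
            // Rate-limited and attempts left: wait, then double the next wait.
            Err(FetchError::RateLimited) if attempt < max_attempts => {
                sleep(delay);
                delay *= 2;
                attempt += 1;
            }
            // Out of attempts, or a non-retryable error: hand it to the caller.
            Err(err) => return Err(err),
        }
    }
}

fn main() {
    // Simulated fetch: rate-limited twice, then returns a fake profile page.
    let mut calls = 0;
    let result = fetch_with_backoff(
        || {
            calls += 1;
            if calls < 3 {
                Err(FetchError::RateLimited)
            } else {
                Ok(String::from("<html>profile</html>"))
            }
        },
        5,
        Duration::from_millis(10),
    );
    println!("{:?} after {} calls", result, calls);
}
```

Only rate-limit errors are retried; anything else is returned immediately, and the caller has to deal with the `Result` before it can use the page.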

Note: these stats are not representative of the entire interpals community. They are just the new joiners from the last two days.

The ratio of male to female users is surprisingly balanced: 52% are male.

Indonesia seems to be by far the most common country.

| Country | New joiners |
|---|---:|
| Indonesia | 2322 |
| Turkey | 993 |
| United States | 824 |
| Morocco | 517 |
| Russia | 494 |
| Brazil | 485 |
| India | 410 |
| Algeria | 334 |
| Egypt | 272 |
| Thailand | 255 |
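A table like this boils down to counting profiles per country and sorting by count. A minimal sketch, assuming each crawled profile has been parsed into a hypothetical `Profile` struct with a `country` field; `top_countries` is likewise an illustrative name:

```rust
use std::collections::HashMap;

/// Minimal stand-in for a crawled profile; the real crawler's fields are unknown.
struct Profile {
    country: String,
}

/// Counts profiles per country and returns the `n` most common, largest first.
/// Ties are broken alphabetically so the output is deterministic.
fn top_countries(profiles: &[Profile], n: usize) -> Vec<(String, usize)> {
    let mut counts: HashMap<&str, usize> = HashMap::new();
    for p in profiles {
        *counts.entry(p.country.as_str()).or_insert(0) += 1;
    }
    let mut ranked: Vec<(String, usize)> = counts
        .into_iter()
        .map(|(country, count)| (country.to_string(), count))
        .collect();
    ranked.sort_by(|a, b| b.1.cmp(&a.1).then_with(|| a.0.cmp(&b.0)));
    ranked.truncate(n);
    ranked
}

fn main() {
    // Made-up sample, not crawled data.
    let profiles: Vec<Profile> = ["Indonesia", "Turkey", "Indonesia", "Brazil", "Turkey", "Indonesia"]
        .iter()
        .map(|&c| Profile { country: c.to_string() })
        .collect();
    for (country, count) in top_countries(&profiles, 2) {
        println!("{}: {}", country, count);
    }
}
```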

It would be more interesting to look at these numbers per capita, i.e. normalized by each country’s population.
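Normalizing per capita means dividing each count by the country’s population. A minimal sketch; `per_million` and `rank_per_capita` are hypothetical names, and since population figures are not part of the crawl (they would have to come from an external source), the usage example uses made-up countries rather than real data:

```rust
/// New joiners per one million inhabitants.
fn per_million(joiners: u64, population: u64) -> f64 {
    (joiners as f64 * 1_000_000.0) / population as f64
}

/// Ranks countries by new joiners per million inhabitants, highest first.
/// Each row is (country, joiners, population); rows with a population of 0 are skipped.
fn rank_per_capita(rows: &[(&str, u64, u64)]) -> Vec<(String, f64)> {
    let mut ranked: Vec<(String, f64)> = rows
        .iter()
        .filter(|row| row.2 > 0)
        .map(|&(country, joiners, population)| (country.to_string(), per_million(joiners, population)))
        .collect();
    ranked.sort_by(|a, b| b.1.total_cmp(&a.1));
    ranked
}

fn main() {
    // Made-up countries and populations, purely to show the calculation.
    let rows: [(&str, u64, u64); 2] = [
        ("Country A", 500, 10_000_000),
        ("Country B", 300, 2_000_000),
    ];
    for (country, rate) in rank_per_capita(&rows) {
        println!("{}: {:.1} new joiners per million inhabitants", country, rate);
    }
}
```

Here the smaller country ranks first even though it has fewer new joiners in absolute terms, which is the effect a per-capita view would reveal.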

The oldest person (who might not be pretending) was 80 years old.

About 2500 people join the site every day. This actually surprised me and made me reconsider whether crawling the entire site is worth the effort.
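With about 2500 sign-ups per day and the Search tab’s cap of 9999 profiles, one capped listing of the newest accounts spans roughly 9999 / 2500 ≈ 4 days, so a daily crawl overlaps with the previous one and should not miss new accounts as long as the sign-up rate stays in that range. A minimal sketch of that arithmetic and of merging overlapping daily snapshots, assuming each profile can be identified by a numeric ID; `days_covered`, `merge_snapshot` and the IDs are hypothetical:

```rust
use std::collections::HashSet;

/// Number of days of sign-ups that fit into one capped search listing.
fn days_covered(search_cap: u32, joins_per_day: u32) -> f64 {
    f64::from(search_cap) / f64::from(joins_per_day)
}

/// Adds one day's snapshot of (hypothetical) profile IDs to the set of IDs
/// seen so far and returns how many of them were not seen before.
fn merge_snapshot(seen: &mut HashSet<u64>, snapshot: &[u64]) -> usize {
    let mut new = 0;
    for &id in snapshot {
        // `insert` returns true only if the ID was not in the set yet.
        if seen.insert(id) {
            new += 1;
        }
    }
    new
}

fn main() {
    println!("one listing covers {:.1} days", days_covered(9999, 2500));

    let mut seen = HashSet::new();
    // Overlapping made-up snapshots from two consecutive days.
    let day1: [u64; 3] = [101, 102, 103];
    let day2: [u64; 4] = [102, 103, 104, 105];
    println!("day 1: {} new", merge_snapshot(&mut seen, &day1));
    println!("day 2: {} new", merge_snapshot(&mut seen, &day2));
    println!("total unique profiles: {}", seen.len());
}
```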

When you visit someone’s profile, they get a notification. Since the crawler visited about 2400 profiles, I got plenty of attention in return:

Viewed 213 times today

I didn’t want to spend too much time on it for now, but one could use multiple accounts to overcome the rate limits and run the program on a server to continuously monitor activity. One could also use the Search with carefully chosen filters to find many people who might be inactive.