I recently left Twitter. The reasons why probably need little explanation for a contemporaneous reader in the know.
In short, Mastodon is now my main short-form writing outlet. What Mastodon didn’t have though… was bat photos.
Bats of Twitter
Twitter has a lot of accounts that post bat photos, be they for education, charity or just for bat enthusiasts like myself. Think your Bat Conservation International, your Bat World Sanctuary, your Occassional Bat.
Very relatable.
Having a steady stream of them little critters having a nice time throughout the day, it was lovely.
Bats aren’t the only animals to get this treatment. Do you want photos of capybaras? How about fennec foxes? Possums?
Unfortunately, Twitter’s looming API changes will likely see the end of these kinds of accounts, and many are yet to make the move over to the fediverse. My stream of bats was drying up, and I was big sad.
Then I remembered that I know how to code things.
Building the bat bot
I have a bunch of prior experience using Mastodon’s API. I’d used it before for random side projects like Rainbow Dashboard, Just One Toot, and a Mastodon-based TweetDeck clone that I was working on until someone beat me to it.
Using Mastodon’s API is, remarkably… simple.
And so was writing bat-bot. To be fair, I cheated a little and used a Node wrapper around the Mastodon API, but pretty much identical functionality can probably be achieved using JavaScript’s Fetch API without much more code.
All-in-all it took about 20 minutes to have a working prototype that could pick a random image, assemble metadata about it into a human-readable format, upload and post it to Mastodon. It is remarkably simple. It took longer to get my botsin.space registration approved.
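Not the actual bat-bot code, but a rough sketch of that pick, describe, upload and post flow using plain Fetch instead of a wrapper. It assumes Node 18 or later (for the global fetch, FormData and Blob). The photo shape, file paths and helper names are made up for illustration, and the endpoints are Mastodon's media upload and status endpoints as I understand them.

```javascript
// Sketch only: the photo shape, file paths and helper names are hypothetical.

// Pick one entry from the pool. `rng` is injectable so the choice can be tested.
function pickRandom(items, rng = Math.random) {
  return items[Math.floor(rng() * items.length)];
}

// Turn a photo's metadata into a human-readable caption with attribution.
function formatCaption(photo) {
  return [
    photo.species,
    `Photo: ${photo.photographer} (${photo.license})`,
    photo.source,
  ].join("\n");
}

// Upload the image, then post a status that references it.
// `instance` is a base URL such as "https://example.social" (placeholder).
async function postPhoto(instance, token, photo) {
  const { readFile } = await import("node:fs/promises");
  const auth = { Authorization: `Bearer ${token}` };

  // Step 1: upload the media file as multipart form data.
  const form = new FormData();
  const fileName = photo.file.split("/").pop();
  form.append("file", new Blob([await readFile(photo.file)]), fileName);
  form.append("description", formatCaption(photo)); // doubles as alt text here
  const upload = await fetch(new URL("/api/v2/media", instance), {
    method: "POST",
    headers: auth,
    body: form,
  });
  if (!upload.ok) throw new Error(`Media upload failed: ${upload.status}`);
  const media = await upload.json();

  // Step 2: post the status with the uploaded media attached.
  const post = await fetch(new URL("/api/v1/statuses", instance), {
    method: "POST",
    headers: { ...auth, "Content-Type": "application/json" },
    body: JSON.stringify({ status: formatCaption(photo), media_ids: [media.id] }),
  });
  if (!post.ok) throw new Error(`Posting failed: ${post.status}`);
  return post.json();
}
```

Usage would be something like `postPhoto(instance, token, pickRandom(photos))`, with the token coming from the bot account's application settings. Large uploads may still be processing when the upload call returns, so a sturdier version would wait for that before posting.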
For years, on and off, I’ve had a Twitter bot named 9000_beeps that took my regular tweets and remixed them into something new. This wouldn’t be my first rodeo managing an automated account, which means I knew exactly what I hated about doing it: keeping it running.
Having a program that runs from now until the end of time is a pain in the ass when it comes to resource management, garbage collection and not accidentally overstepping usage limits.
My Raspberry Pi would usually die after a few weeks of running 9000_beeps, crashing from the built-up load of unrecoverable memory usage. I wanted to avoid problems like this altogether, so I instead turned to a recently made friend: GitHub Actions.
With Actions, the posting script could be turned on once an hour, do its thing, and get turned off again. I wouldn’t even need to host it, GitHub would handle everything but the actual code. We’ll get back to how that went in a bit.
Eternal gathering of the spotless bats
With that done, now came the awfully time-consuming part: collating content.
I knew already from past side projects—namely my cartoon-themed placeholder image services placeponi.es and placeholder.rocks—that collating graphics is incredibly tedious.
Bat bot would be no exception.
If anything, it would be worse. For cartoons, every image would reliably be under the same copyright. Fan wikis and sites would have already collated galleries of the stuff that could be downloaded fairly easily. (Unless they were on Fandom. Screw Fandom.)
This would not be the case here. Each photo would need individual attribution, and I couldn’t just scrape online media galleries like Wikimedia Commons or the Encyclopedia of Life because I’d decided to have standards:
Principle 1: Living
First, photos must depict living, healthy bats. This is a celebration of a lot of wonderful, often endangered, species. I don’t want to include dead or sick bats, taxidermy models, or those infected by white-nose syndrome.
Principle 2: Loved
Second, many photos show bats in unnatural situations, such as inside cages, in human settlements or being held by people. Sometimes this is justified, such as when a bat is being transported, studied or is a rescue. Other times, however, this can be evidence of mistreatment or exploitation.
Where possible, I want to ensure that the bat is being handled responsibly by people who know what they’re doing. This is hard to judge and impossible to be sure of, but if the photo is of a known biologist cautiously holding a bat in a darkened cave, that’s a better sign than it being a tourist dangling a bat from their fingers in the middle of a sunny marketplace.
Principle 3: Licensed
Third, I wanted to be sure I could actually use all of the images. In the wild world of animal-posting Twitter, there’s usually very little regard given to proper attribution and licensing. That’s no good. I want my images to be public domain, licensed under Creative Commons, or otherwise useable under some permissive license.
This, unfortunately, excluded me from using the amazing works of Dr. Merlin Tuttle, probably the world’s pre-eminent bat biologist and photographer. He has a wonderful gallery, but at $10 per image, that’s an expense well outside the scope of this project.
I wish that I could say I wrote some awesome code to do all of this image gathering and data collection for me, but I didn’t. I spent hours collating bat photos from a variety of sources, manually documenting photographers and licenses along the way. That’s just life sometimes.
Teething problems
Bat bot, now renamed Hourly Bats and hosted at the imaginatively named @batstbatsbats, went live early the next afternoon. And it went pretty well! Except…
Problem 1: GitHub’s cron queue
I’d set up my posting GitHub Action so that it posted once an hour, at the top of the hour. Technically I was sneaky, knowing that lots of people would put theirs at exactly the top of the hour, so I actually put mine at 59 minutes past the hour to try and jump the queue. That’d show them, I thought.
It didn’t show them. It turns out that cron tasks on GitHub work more like a queue. You can request that a job happens at the top of the hour, but that isn’t when your code will run—that’s only when your code will get added to the queue, and that queue is frequently anywhere from 10 to 35 minutes long.
There isn’t a whole lot you can do about this. Maybe queue the task super early and somehow keep it running until it hits the time you actually want to post? Seems wasteful. I figured this was a minor enough problem to just ignore. Bat pictures are not time-sensitive, I just want one posted every hour or so.
Problem 2: Unstable server, unstable API
One hour, a bat picture didn’t get posted.
A quick check found that the API request had timed out. botsin.space had been under some load at the time, so a request timing out wouldn’t be out of the ordinary. I re-ran the posting action manually and it worked fine, so I just called this a fluke.
If I wanted to be entirely hands-off about it, I could add something that detects if the API request failed and retries it until it succeeds, but that’s one of those problems that sounds simple but is actually a rabbit hole of unexpected complexity. If I have anything misconfigured or botsin.space is having issues, I’m suddenly responsible for a Denial of Service attack on their servers, and that’s generally frowned upon.
Like the cron issue, I figured a rare missed post was probably not worth the complexity and resource-related risk of trying to fix it in code.
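For the curious, the usual way to keep a retry from turning into accidental hammering is to cap the number of attempts and space them out. This is a minimal sketch of that idea, not something the bot actually does. The function names, attempt count and delays are all hypothetical.

```javascript
// Sketch only: names and default values are illustrative assumptions.

// Delays (in ms) to wait between attempts: doubling each time, up to a cap.
// For N attempts there are N - 1 waits.
function backoffDelays(maxAttempts, baseMs = 1000, capMs = 30000) {
  const delays = [];
  for (let i = 0; i < maxAttempts - 1; i++) {
    delays.push(Math.min(capMs, baseMs * 2 ** i));
  }
  return delays;
}

// Run `task` up to `maxAttempts` times, then give up and rethrow the last error.
// Giving up is the important part: a misconfigured bot fails a few times and stops.
async function withRetry(task, maxAttempts = 3, baseMs = 1000, capMs = 30000) {
  const delays = backoffDelays(maxAttempts, baseMs, capMs);
  let lastError;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await task();
    } catch (err) {
      lastError = err;
      if (attempt < delays.length) {
        await new Promise((resolve) => setTimeout(resolve, delays[attempt]));
      }
    }
  }
  throw lastError;
}
```

With the defaults, a failing post would be tried three times over roughly three seconds and then dropped, which is a long way from a Denial of Service but also won't outlast a real outage.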
Problem 3: Repetition, repetition, repetition
There was quickly some repetition, which wasn’t unexpected. The photo pool was still pretty small and mathematically there would be repeats, that’s just how randomness is, but it wasn’t a good initial look. I briefly pondered ways of reducing repetition, perhaps by keeping track of what had recently been posted and rerolling if a repeat was chosen, but as the bot worked via GitHub Actions and was pretty much memory-less, I quickly realised that would stray from my original desire to not have to keep the darn thing running constantly.
I did make some changes, replacing the time-based seed with the random npm package, which promises some form of uniform distribution.
This is theoretically a ‘more random’ solution, but as it remains stateless, repetition is still statistically unavoidable. I figured I could live with repetition. It’s not as if everyone was going to see every single post, anyway, right?
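There is a way to cut down repeats without storing anything, although it isn't what the bot does: derive the pick from the clock. The sketch below shuffles the pool with a seed that changes once per full pass and uses the current hour to walk through that shuffle, so each photo appears once per pass. All names are hypothetical, and the small seeded generator (commonly known as mulberry32) stands in for any seedable PRNG.

```javascript
// Sketch only: an alternative technique, not the bot's actual selection code.

// Small seedable PRNG returning floats in [0, 1).
function mulberry32(seed) {
  let a = seed >>> 0;
  return function () {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Fisher-Yates shuffle of the indexes 0..n-1 using the supplied generator.
function shuffledIndexes(n, rng) {
  const order = Array.from({ length: n }, (_, i) => i);
  for (let i = n - 1; i > 0; i--) {
    const j = Math.floor(rng() * (i + 1));
    [order[i], order[j]] = [order[j], order[i]];
  }
  return order;
}

// Whole hours since the Unix epoch; this is the only "state" involved.
function currentHourIndex(now = Date.now()) {
  return Math.floor(now / 3_600_000);
}

// Every run within the same hour picks the same index, and consecutive
// hours walk through one shuffled pass of the pool before reshuffling.
function pickIndexForHour(hourIndex, poolSize) {
  const cycle = Math.floor(hourIndex / poolSize);
  const order = shuffledIndexes(poolSize, mulberry32(cycle));
  return order[hourIndex % poolSize];
}
```

The catch is that it leans on the job running once per clock hour. A run that slips across an hour boundary can skip or double up a photo, the last photo of one pass can match the first of the next, and adding photos to the pool reshuffles everything.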
Problem 4: The undocumented zone
Another issue occurred the next day where an image was rejected by the API because it was too large. Unusually, this wasn’t because of filesize, but because of the image’s ‘physical’ dimensions. In this case, the image was 5184×3456 pixels in size.
This was a bit weird, as Mastodon’s API docs make no mention of any such limitation, and the more ‘user-friendly’ user guide only mentions the default configuration’s 8 megabyte filesize limit.
It turns out that Mastodon does have a limitation on the dimensions of uploaded images, the (hardcoded!) MAX_MATRIX_LIMIT setting, which is simply the pixel area of the image: width multiplied by height.
The limit is 16,777,216 pixels. My rejected bat photo was 17,915,904 pixels. Heck.
Mastodon resizes all uploaded imagery to 1920×1080 anyway, so shrinking down the source images wouldn’t have a tangible effect on the bot’s output. So I did that.
Still, this is an interesting omission in Mastodon’s developer documentation. Someone should fix that.
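The area check is easy to reproduce before uploading. This is a minimal sketch using the limit quoted above. The function names are hypothetical, and treating an image exactly at the limit as acceptable is an assumption on my part.

```javascript
// Sketch only: the limit is the figure quoted in this post; names are illustrative.
const MAX_MATRIX_LIMIT = 16_777_216; // pixels, i.e. 4096 x 4096

// Pixel area of an image: width multiplied by height.
function pixelArea(width, height) {
  return width * height;
}

// True when the image's area is within the limit (assumed inclusive).
function fitsMatrixLimit(width, height, limit = MAX_MATRIX_LIMIT) {
  return pixelArea(width, height) <= limit;
}

// Dimensions scaled down, keeping the aspect ratio, so the area fits the limit.
// Both sides shrink by sqrt(limit / area), rounded down.
function shrinkToLimit(width, height, limit = MAX_MATRIX_LIMIT) {
  if (fitsMatrixLimit(width, height, limit)) return { width, height };
  const scale = Math.sqrt(limit / pixelArea(width, height));
  return {
    width: Math.max(1, Math.floor(width * scale)),
    height: Math.max(1, Math.floor(height * scale)),
  };
}
```

For the rejected photo, `pixelArea(5184, 3456)` gives 17,915,904, which is over the limit, and `shrinkToLimit(5184, 3456)` returns slightly smaller dimensions to hand to whatever image tool does the actual resizing.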
Problem 5: null
I wouldn’t expect that passing the null keyword as a value to an API would actually post the word ‘null’, but apparently, Mastodon’s API does that. Pass an empty string instead. Now you know.
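A tiny guard makes this hard to get wrong. My guess, and it is only a guess, is that something along the way turns the value into a string, which is exactly what JavaScript does when asked. The helper name below is hypothetical.

```javascript
// Sketch only: `statusText` is a hypothetical helper, not part of any library.

// Fall back to an empty string when the caption is null or undefined.
function statusText(caption) {
  return caption ?? "";
}

// What naive string conversion does to null, for comparison.
const stringified = String(null); // the four-letter word "null"
```

Wrapping every optional text field in `statusText(...)` before it goes into the request means a missing caption is posted as nothing at all.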
Build more bots!
I don’t really have a conclusion to this, but if I were to scrounge one from the depths of literary consciousness, it would be a call to action to build more cool, useful and creative Mastodon bots.
Mastodon’s API is just… really easy to use. You don’t need to do any jumping through authorisation hoops to get an access token, the usage limits tend to be quite liberal, there’s no paywall, and there’s no need to even register an account if you’re dealing entirely with public data.
(That said: There is an unwritten rule in the fediverse that content scrapers are not welcome. Many Mastodon users value their privacy and don’t take kindly to their posts being copied without explicit permission.)
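As a harmless example of the no-account-needed part, here is a sketch that reads an instance's own metadata rather than anyone's posts. It assumes Node 18 or later for the global fetch. The endpoint path and the field names under `configuration.media_attachments` are how I understand Mastodon's instance endpoint to be shaped, so check them against the current API docs. The helper names and the example.social address are placeholders.

```javascript
// Sketch only: field names are assumptions about the instance metadata shape.

// Build the URL of the instance metadata endpoint for a given server.
function instanceUrl(instance) {
  return new URL("/api/v2/instance", instance).toString();
}

// Pull the image upload limits out of an instance metadata object,
// returning null for anything the server doesn't report.
function readImageLimits(info) {
  const media = info?.configuration?.media_attachments ?? {};
  return {
    sizeLimit: media.image_size_limit ?? null,
    matrixLimit: media.image_matrix_limit ?? null,
  };
}

// Fetch the metadata with no access token at all.
async function fetchImageLimits(instance) {
  const res = await fetch(instanceUrl(instance));
  if (!res.ok) throw new Error(`Instance lookup failed: ${res.status}`);
  return readImageLimits(await res.json());
}
```

Calling `fetchImageLimits("https://example.social")` should resolve to an object with the two limits, or nulls if the server is older or configured differently.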
Still, Mastodon is simple enough for all manner of fun little side-projects and it’s certainly a nice place to do that kinda thing. Dare I say, often nicer than Twitter ever was.
Anyway, go code cool shit.