1,000,000 Observations with Sounds on iNaturalist!

Last week, iNaturalist hit an exciting milestone—1,000,000 observations with sound!

While this is a small fraction of the number of image-based observations, it’s a significant contribution to global biodiversity monitoring. In fact, measured over the past decade, iNaturalist is now the second-largest provider of sound recordings to the Global Biodiversity Information Facility (GBIF). While initiatives like Macaulay Library at Cornell Lab of Ornithology and FrogID at the Australian Museum also contribute vast numbers of sound-generated point records to GBIF, the datasets on the graph below are unique in sharing the sound recordings themselves with GBIF.

The Growing Role of Sound on iNaturalist

Sound is becoming an increasingly important tool for biodiversity documentation on iNaturalist. Here's how it's being used and our vision for the future.

Using iNaturalist to Record and Annotate Sounds: Case Study from Panama

To explore how iNaturalist is helping record and annotate sounds, we spoke with Brian Gratwicke (@briangratwicke), a long-time iNaturalist user and amphibian conservation lead at the Smithsonian’s National Zoo and Conservation Biology Institute. In Altos de Campana National Park, Panama, where amphibian populations have been devastated by chytrid fungus, Brian and his colleagues Roberto Ibáñez (@ibanezr) and Jorge Guerrel (@jorge_guerrel) have been using sound recordings to make remarkable discoveries.

Recently, the team rediscovered and recorded calls from the Boquete rocket frog (Silverstoneia nubicola), a species that hadn’t been detected in the park for years. They also recorded the calls of the abundant Rainforest rocket frog (Silverstoneia flotator), which has a very similar call. Roberto Ibáñez, a leading expert on frog calls in Panama who has been studying them since the 1980s, is one of the few who can distinguish these species by sound alone.

So far, around 100 contributors have submitted 261 sound observations of 47 out of 188 frog species from Panama. Our goal is to make iNaturalist an even more valuable tool for collecting sound vouchers and annotations, which we hope will attract more experts like Roberto to share their amphibian call expertise on iNaturalist.

Looking Ahead: Sound and AI on iNaturalist

The future of sound on iNaturalist is bright. Grant Van Horn (@gvanhorn), a longtime collaborator on iNaturalist's computer vision projects and creator of Merlin Sound ID, recently worked with iNaturalist staff member Alex Shepard (@alexshepard) and colleagues from the University of Massachusetts Amherst to publish a paper on the iNaturalist Sounds Dataset. This paper, focused on building sound datasets for advancing AI sound models, was just accepted to NeurIPS 2024, one of the world’s top conferences on machine learning and AI. A preprint on arXiv will be available later this month and we’ll share the link here once it’s live.

Our long-term vision is to elevate sound to the same status as images on iNaturalist. We’re committed to developing tools that will make it easier for the community to record, annotate, and showcase sounds. We aim to leverage these data to power the next generation of AI sound models. These models will not only enhance the iNaturalist platform but also be shared with the broader scientific and conservation community.

By the end of 2024, we project that iNaturalist’s computer vision and geo models will cover 100,000 species. Even building an AI sound model capable of accurately identifying 10% of that—around 10,000 species—could be transformative for bioacoustics research.

Join Us in Shaping the Future of Bioacoustics

Can the iNaturalist community rally to generate the data needed for a 10,000-species sound model? We believe the answer is yes. With the right tools, outreach, and collaboration, we can achieve this ambitious goal together. Let's continue working together to expand the power of sound in conservation and biodiversity research!

Tips for Contributing Sound Observations

Identifying species by sight can be tricky, and sound adds an extra layer of challenge! Follow these simple tips to make identification easier for the iNaturalist community:

  1. Recording Techniques: Get as close as possible to your subject without disturbing it. Stand still and keep quiet to minimize background noise like footsteps, clothing rustle, or other sounds that could obscure your subject’s sound. Point your microphone toward the sound source, which may mean pointing the bottom of your phone toward your subject. Aim for recordings of at least 10 seconds—or ideally 30 seconds if the subject stays put—as longer samples can help with identification.
  2. Recording Diversity: To help us build a complete picture of each species’ sounds, record different individuals across various locations and times of year. Many shorter recordings from diverse settings are far more useful than a few lengthy ones from the same spot.
  3. Background Species: While it’s not required, going above and beyond by adding notes about any background species you hear can be incredibly valuable. Even when these sounds overlap with your target subject, they provide important context about the environment and help future listeners better interpret your recording. This extra detail also contributes to the development of machine learning models that recognize all species vocalizing, not just the target species.
  4. File Format: If you’re uploading sounds recorded outside the iNaturalist app, please use WAV files with a minimum sample rate of 44.1kHz.
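
For recordings made outside the app, the sample-rate suggestion in tip 4 and the length suggestion in tip 1 can be checked before uploading. The following is a minimal sketch using Python's standard `wave` module. The function name `check_wav` and the returned fields are illustrative assumptions, not an iNaturalist tool, and the sketch only reads uncompressed PCM WAV files.

```python
import wave

MIN_SAMPLE_RATE_HZ = 44_100  # minimum sample rate suggested in tip 4
MIN_DURATION_S = 10          # minimum recording length suggested in tip 1


def check_wav(path):
    """Report whether a PCM WAV file meets the suggested rate and length."""
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        duration = wav.getnframes() / rate
    return {
        "sample_rate_hz": rate,
        "duration_s": duration,
        "rate_ok": rate >= MIN_SAMPLE_RATE_HZ,
        "length_ok": duration >= MIN_DURATION_S,
    }
```

Calling `check_wav("recording.wav")` on a hypothetical file returns a small dictionary; a `False` under `rate_ok` or `length_ok` only means the file falls short of the suggestions above, not that it cannot be uploaded.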
Posted on October 4, 2024 11:32 PM by loarie

Comments

The reason I don't upload audio: I have to extract it from video, which is very time-consuming... Observations with sound are definitely important and awesome, but someday... video support with a limited length? 🥺 (Yeah... even compressed, video would take a lot of storage.)

Posted by miyrumiyru 18 days ago

I do a lot of sound recording because I'm a Merlin user. But nobody IDs sound on iNaturalist, so I didn't really think it was being used. I have a huge backlog I could upload. Are there any taxa you'd especially like to see documented? I'll try to get a start this weekend.

Posted by nancylightfoot 18 days ago

We have good bird audio identifiers in my state of Minnesota. So I'm lucky there.

Way less lucky with mammal sounds (mostly rodents - chipmunks, etc.) and insects (grasshoppers, cicadas, etc.).

I also hold off on audio unless I'm personally curious but I'm willing to collect more if we can get more identifiers.

Posted by mmmiller 18 days ago

Why not mp3 files? I use a small portable Olympus WS853 recorder (when uploading files outside of my iNat app). I know the mp3s are compressed files but they serve the purpose of identifications most of the time. Why let perfection be the enemy of the good? :)

Posted by ragkannan 18 days ago

Yay!! Congrats!!

Posted by texas_nature_family 18 days ago

@ragkannan I think WAV is just a preference, not a requirement.

Anyway, the future plans with sound are so awesome -- I can't wait for the day when there are enough sound observations for an identification model to be functional! Especially hoping for spectrogram display to be a thing someday, like Macaulay Library -- it's super super helpful when doing identifications by sound.

Posted by cigazze 18 days ago

Agreed @cigazze! Spectrograms are SO helpful!

Posted by texas_nature_family 18 days ago

As someone with ~3000 sound observations and ~4000 sound identifications, I have a few suggestions that will help:
1) Fix the poor sound quality problem of the app and/or uploading. Compression artifacts are easily heard and valuable high frequencies are being discarded.
2) Enable a built-in spectrogram analyzer. This will greatly speed identifications.
3) Make it easier to harvest the vast quantity of bycatch species in existing sound recordings. It could be made much easier.

Posted by dan_johnson 18 days ago

One tricky group to deal with is bats. Since bat sounds aren't audible to humans, some people just upload spectrograms instead. My preferred solution is to upload them at 1/10th speed. I wonder though if this would potentially cause problems for the AI models or if they would be able to recognize the patterns regardless. Any bat people here? What are your thoughts?

Posted by zygy 18 days ago
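
Slowing a recording to 1/10th speed, as @zygy describes above, is commonly called time expansion. One simple way to do it for an uncompressed WAV file is to keep the samples unchanged and declare a lower sample rate in the header. The sketch below uses Python's standard `wave` module; the function name `time_expand` is a hypothetical name chosen for illustration, and whether iNaturalist or any future sound model handles such files well is the open question raised in the comment.

```python
import wave


def time_expand(src_path, dst_path, factor=10):
    """Copy a PCM WAV file, declaring a sample rate `factor` times lower.

    The samples are untouched. Only the header changes, so playback is
    `factor` times slower and every frequency is divided by `factor`.
    """
    with wave.open(src_path, "rb") as src:
        params = src.getparams()
        frames = src.readframes(params.nframes)

    new_rate = params.framerate // factor
    if new_rate < 1:
        raise ValueError("factor is too large for this sample rate")

    with wave.open(dst_path, "wb") as dst:
        dst.setnchannels(params.nchannels)
        dst.setsampwidth(params.sampwidth)
        dst.setframerate(new_rate)
        dst.writeframes(frames)
    return new_rate
```

With a 192 kHz source, for example, a 45 kHz call would play back at 4.5 kHz, well within human hearing. Because no samples are discarded, the original can be recovered by restoring the original sample rate.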

@zygy for bats in Africa there is https://www.inaturalist.org/people/jakob

Posted by dianastuder 17 days ago

I try to upload sound files for bird observations with additional confirmation from Xeno-canto and Merlin/BirdNET whenever possible. Let us try to build a better library of sounds so that the software can be trained better; images with sound will definitely help as well. Looking forward to iNaturalist's own sound recognition tool in the near future. Cheers!

Posted by gs5 17 days ago

We also have almost a thousand observations of plants with sound. While there are some legit ones (popping seeds, for example), I clicked on a handful of random ones and it was always some 2-second random noise that someone uploaded by mistake along with some photos. Not sure what to do about this, but at this point the collection of sounds on iNat is clearly not very clean.

Posted by opisska 17 days ago

@opisska Research Grade should deal with that issue. If there are Research Grade sounds for plants with no detectable plant noise, identify the other species that can be heard, or mark No Evidence of Organism. Needs ID sounds are unlikely to be used for training.

Posted by deboas 17 days ago

@deboas - the observations have many good pictures and then there is the nonsensical sound recording. So they are fully OK being research grade, but if you are looking for sounds, they are not helpful.

Posted by opisska 17 days ago

@opisska Got you! I can see that being a problem. I can see a case for marking "no" for "Evidence related to a single subject" in such situations.

Posted by deboas 17 days ago

@deboas - I just don't want to casualise an entire observation for this. But as far as I know, it's not possible to flag just a single item in an observation, right?

Posted by opisska 17 days ago

I understand that Merlin uses the spectrogram image for analysis rather than the audio directly. Is the plan to do the same with AI/CV and iNat audio? I know on the forum discussion about incorporating spectrograms there were concerns about bats and other high-pitched species, similar to what @zygy expressed above. But as others have said, this update is exciting!

I've been working on learning and recording singing insects in Ontario, but I find it more slow and tedious than photo ID because of a combination of lack of identifiers and inability to visualize the sounds (I find I learn things a lot easier visually than aurally).

Posted by upupa-epops 17 days ago

I'll be very excited to see audio AI come to the platform in one way or another. Merlin is great for birds, but I've often wished I had something similar for frogs, mammals, and insects.

Regarding plant observations with unrelated audio clips, this feels like another case where per-photo annotation (as opposed to per-observation) would be helpful in sorting and filtering data.

Posted by guerrichache 17 days ago

I can't speak for the academic quality of these observations, but gosh do I enjoy listening to the birds and frogs and bugs from around the world!

Posted by sambiology 17 days ago

So I just uploaded an audio file. I was trying to capture some owls, but by the time I figured out whether I was recording and uploading properly, I was only able to capture insects or frogs in the background. Is this type of audio upload useful? And if so, are there any suggested projects to add it to?

Posted by tommy_dye 17 days ago

In that case, just upload the observations for the taxa that you can hear in the recording and not the owl (i.e., if you can't ID the frogs, use 'Anura'; for the insects, 'Insecta'; ...).

Posted by glebglub 16 days ago

Few people identify calls on iNat. I requested a feature long ago that would be the equivalent of the compare feature, but giving the calls of nearby species rather than the images. It is very difficult to source calls of species, especially in poorly studied regions (e.g. Africa). iNaturalist could revolutionize this aspect of biodiversity data: as more species are contributed, the site becomes an invaluable resource for identifying sounds, and more people will be encouraged to add sounds if they get identified. Please can we get something like this implemented? AI training requires enough data and we just don't have enough in many places yet.

Thanks, excited for the future!

Posted by alexanderr 16 days ago

@miyrumiyru
To avoid having to extract audio from video, I use the following audio recording app on Android which works well for me. It comes with a widget option that allows you to start/stop recording by just tapping the widget without even having to open the app. And then I also take a photo of the location to use for the GPS location tag:
https://github.com/FossifyOrg/Voice-Recorder

Or to very quickly and easily extract the audio from video files, I use the following converter to convert MP4 videos to AAC audio as accepted by iNaturalist:
https://www.mediahuman.com/audio-converter/

@ragkannan
How well particularly the higher frequencies are retained will depend on the bit rate, so a high enough bit rate MP3 might be OK in theory. But all lossy compression formats are optimised for human ears, and discard a lot of data that is inaudible to us right across the frequency spectrum. Whether that additional data might be useful for an identification algorithm I have no idea, but I suspect any algorithm developed/trained is unlikely to be able to rely on such data to be effective, considering that the majority of user submissions are more likely to be from lossy compression formats (such as extracted from videos).

Also, MP3 is quite an old and inefficient format (though also encoder dependent). I would recommend any of OGG/OPUS/AAC as typically much better choices for lossy compression, depending on what hardware and software compatibility you require. OPUS is the most modern and best performing (particularly at very low bit rates), and already in common usage by web conferencing / VoIP apps and streaming services like YouTube, but still not as universally supported as OGG or AAC (M4A), which are still good choices for maintaining backwards compatibility (and aren't significantly worse than OPUS at the medium/higher bit rates you're likely to be using for recordings in any case).

For lossless compression use FLAC. The suggestion to upload uncompressed WAV audio is a bit strange being particularly and unnecessarily wasteful on bandwidth/resources. I can't think of any good reason for exposing end users to uncompressed media formats.

It would be really interesting to hear more about what frequency range is under consideration for analysis, how this compares to the range of species covered and the challenges of recording species exceeding the limits of human hearing (such as bats being a familiar example), the associated limitations of recording hardware/software specifications and possible alternatives (such as the dedicated bat detectors – could people submit artificially down-shifted audio like this??), bit-rate requirements per species etc. etc. ☺

Posted by bsteer 16 days ago

This is great! I can't wait! I've actually been using Merlin to record/identify sounds, and then I download, and clip/enhance the audio files. This is seriously a feature I've been hoping for!!!

Posted by dreadhorn 16 days ago

@bsteer Oh, thanks for great tips! :D

Posted by miyrumiyru 16 days ago

This is a really fascinating frontier in the citizen science space. When it comes to sound, I am an enthusiastic Merlin user. The just-in-time identifications serve a similar purpose as the iNaturalist CV for images, so hearing that there are plans to build a native iNat AI sound model is really heartening. I also didn’t know that iNat audio files are directly shared with GBIF, how cool!

Sound has been underrated on iNat for a while now, I think because of the quality issues and lack of associated visual representations (i.e. spectrograms) as others have mentioned. But sound is a vital tool for observing wildlife in the real world and there are certainly identifier experts out there who are just waiting on an improved workflow to get cracking. I see there’s already been work to make the observer side of things easier—I just tried out the “Record Sound” feature in the iOS app and it’s pretty snappy!

The flexibility of being able to observe any taxon remains one of iNat’s greatest strengths, so it will be awesome to see where we go from here with sound 🙌

Posted by featherenthusiast 16 days ago

I just use Merlin for all of my sound uploads, even if it isn't a bird; it's just too convenient since it can export directly to iNat.

Very rarely do I have to manually edit a sound file to make it more audible; I think the last one I did was for a southern flying squirrel, if it had been something more commonly observed I would not have bothered.

Posted by lothlin 15 days ago

I think one of the big obstacles to people IDing sounds on iNat is just the volume. Sometimes you have to crank up the volume to the max in order to hear anything, and it can be really irritating to forget to lower the volume and nearly go deaf when you listen to the next observation or play some other audio. Normalizing the audio volume (like what YouTube does to all of the videos uploaded there) would be a really great first step in encouraging people to ID more sound observations.

Posted by davidenrique 15 days ago
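
To make the normalization idea above concrete, here is a minimal sketch of peak normalization for a 16-bit PCM WAV file, using only Python's standard library. It is an illustration under stated assumptions, not a description of how iNaturalist or YouTube process audio: the function name `normalize_peak` and the `target_peak` default are hypothetical, and peak normalization is a simpler technique than loudness-based normalization.

```python
import array
import sys
import wave


def normalize_peak(src_path, dst_path, target_peak=0.9):
    """Scale a 16-bit PCM WAV so its loudest sample reaches target_peak
    of full scale. Returns the gain that was applied."""
    with wave.open(src_path, "rb") as src:
        params = src.getparams()
        if params.sampwidth != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        samples = array.array("h", src.readframes(params.nframes))
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian

    peak = max((abs(s) for s in samples), default=0)
    gain = 1.0 if peak == 0 else target_peak * 32767 / peak

    # Apply the gain and clamp to the valid 16-bit range.
    scaled = array.array(
        "h", (max(-32768, min(32767, round(s * gain))) for s in samples)
    )
    if sys.byteorder == "big":
        scaled.byteswap()

    with wave.open(dst_path, "wb") as dst:
        dst.setparams(params)
        dst.writeframes(scaled.tobytes())
    return gain
```

A single gain for the whole file keeps the relative loudness of everything in the recording intact. The trade-off is that one loud click or bump sets the peak, so a recording that is otherwise very quiet may barely change.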

Audio contributions might be more convenient if we could just directly upload the video, and have iNat automatically strip out the audio, converting it into the optimal audio format. It's time-consuming doing this all manually.

Posted by lesfreck 15 days ago

I have been recording sounds using my iPhone's built-in app "Voice Memos". I really like the editing feature on this app. But the biggest impediment for me is that the date, time, and location aren't preserved on the recording. So I have to add these manually on the iNat upload page. Can anyone suggest an app for iPhone that preserves these meta-data, and that also allows easy editing? Or some other efficient workaround?

Posted by johndreynolds 15 days ago

I think sound might be easier for AI to identify.

Posted by bellskimmer 15 days ago

I'm new here. I believe this community will assist my quest for knowledge.

Posted by jekwu-95_ 14 days ago

@johndreynolds I use the workaround mentioned in my previous comment above – take a photo at the location to save the date, time and location metadata. Then you can upload that as an observation (I first crop it down to make it a very small file size) to get that data set in place, move the audio recording into that observation on the submission page (i.e. combine them), and then just delete the photo from it such that you are left with the audio recording alone, but with the associated metadata still in place.

Posted by bsteer 14 days ago

I like this because there is definitely a gap when it comes to sound identification. We have a little frog here that 'sings' in bushes at night that many people think is a bird; Merlin - which does so well with birds - doesn't register the sound at all, since it's not a bird.

A few thoughts, from a design-thinking angle, if the sky is the limit --

Would love some native ways to edit audio in iNaturalist, which would include basics like cropping starting and end points, and more complex areas like noise reduction, high and low gate filters, normalization, etc.
iNaturalist might serve audio files best if multiple IDs can be added to a file - outside, we can take a picture of a single organism and try to focus on it, but sound can be quite different; a single recording of any space I visit may have cicadas, traffic noise, a crow calling, a few frogs, etc.

I have occasionally submitted audio files when I hear something that's too interesting not to submit, but do a fair amount of noise reduction on account of living in an urban area. It's a challenge to listen to audio files from my area without the noise reduction for this reason.

Posted by scarletskylight 14 days ago

Thanks for the suggestion, @bsteer. I'll try that.

Posted by johndreynolds 13 days ago

I use the same method as @bsteer for my audio files and process them pretty much the same way. I also use it for all my Canon photos, which don't have geolocation. The extra hack I use is, when I'm taking a photo purely for geolocation or date/time info, I either hold my hand in front of the lens or I take a photo of the pavement or some blank surface. When I first started, I would come across a photo of 'stuff' and wonder: did I see something in that photo, or was it just a geo photo? Now, when I'm sorting through my phone photos, I can easily see 'nothing here but location'. I also take these types of photos between similar organisms. Say I'm photographing lots of insects on a group of flowers. I follow one insect till I'm satisfied. Then I take a 'break' photo (my hand, the path, etc.) before going to another insect. It makes it much easier to keep the groups of similar-looking insects (or other things) separate.

Posted by mmmiller 13 days ago

"File Format: If you’re uploading sounds recorded outside the iNaturalist app, please use WAV files with a minimum sample rate of 44.1kHz." A WAV file could not be uploaded (over 20MB?) so I converted it to MP3. How can I shrink a WAV file from Merlin on an Android phone?

Posted by optilete 13 days ago

@optilete
iNaturalist should accept WAV files:
"We accept JPG, PNG, GIF, WAV, AAC, MP3, and MP4 (audio only)"
[it would be useful to add FLAC, OPUS and OGG to this list for audio, along with WEBP and JPEG XL for images]

I tested it now with a very small (1 second) WAV file and it appears to work, but it wouldn't surprise me if you run into issues trying to upload long WAV files which could be quite large (and hence it is not a good choice of format for this purpose). If you were trying to do it on a mobile data connection, perhaps it is worth trying again with a higher bandwidth WiFi connection?

Alternatively you could try using an online audio converter, or an app, to convert it to M4A [AAC encoding in an MP4 container, as referred to as "MP4 (audio only)" above] for a significantly smaller file size. There are many different options for this, but e.g.:
https://online-audio-converter.com/

Posted by bsteer 12 days ago
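
The concern about long WAV files in the exchange above follows from simple arithmetic: uncompressed PCM audio grows linearly with duration, sample rate, channel count, and bit depth. The sketch below works through it. The helper names are illustrative, the 44-byte figure is the size of a minimal PCM WAV header (real files may carry extra metadata), and the 20 MB limit is only the guess made by @optilete, not a confirmed iNaturalist limit.

```python
WAV_HEADER_BYTES = 44  # size of a minimal PCM WAV header


def wav_size_bytes(duration_s, sample_rate=44_100, channels=1, bits_per_sample=16):
    """Approximate size in bytes of an uncompressed PCM WAV file."""
    bytes_per_second = sample_rate * channels * bits_per_sample // 8
    return WAV_HEADER_BYTES + round(duration_s * bytes_per_second)


def max_duration_s(limit_bytes, sample_rate=44_100, channels=1, bits_per_sample=16):
    """Longest recording, in seconds, that fits under limit_bytes."""
    bytes_per_second = sample_rate * channels * bits_per_sample // 8
    return (limit_bytes - WAV_HEADER_BYTES) / bytes_per_second


# One minute of 16-bit mono audio at 44.1 kHz.
print(wav_size_bytes(60))  # 5292044 bytes, roughly 5.3 MB

# HYPOTHETICAL limit: 20 MB is the commenter's guess, not a documented value.
ASSUMED_LIMIT_BYTES = 20 * 1000 * 1000
print(round(max_duration_s(ASSUMED_LIMIT_BYTES)))  # 227 seconds of mono audio
```

Under these assumptions a mono recording would reach the guessed limit in a little under four minutes, and a stereo recording in half that time, which is consistent with the suggestion to convert long recordings to a compressed format.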

How come iNaturalist doesn't do ID suggestions for sounds like Merlin does? I am using both apps now, Merlin for ID and iNaturalist for submitting the record.

Posted by henk1 3 days ago

@Miyrumiyru: Extracting audio from video too hard? Get Audacity (free and professional), drop the video file into the Audacity window, optional edit, export audio, done!

Posted by tuoichen 3 days ago

I upload lots of recordings all the time. I didn't think they were given much importance, since other users don't interact with my recordings, but I'll keep uploading them.

Posted by alvaro_atrogularis 3 days ago

@henk1 make sure you are not blindly (deafly?) accepting the Merlin identifications, otherwise we will end up with AI errors reinforcing AI errors. Merlin is good, and improving, but it makes frequent mistakes.

Posted by deboas 3 days ago

@deboas Agree with this for sure. Even eBird cautions against reporting Merlin-identified species not independently verified by the observer. My favorite Merlin “mistake” is the long list of birds I get whenever I let it listen to a Northern Mockingbird 😂

@henk1 iNaturalist does not yet have a working AI identification mechanism for sound, but per the post it seems there are plans to get this implemented soon! iNat’s also historically been a more photo-based application so there’ll be more work to get audio support up and running at the same level, chiefly improving the quality of observations, UX of the identification process, quantity of observations available to train an AI on, etc.

Posted by featherenthusiast 3 days ago

@johndreynolds @bsteer you can skip taking a photo to get location + time recorded by making an observation with 'no media' (at least this is how it works for me on iPhone), and then add the edited audio after that. No need to take weird photos to get the location recorded that way. :) I hope this helps with your important work!

Posted by sudenkorentoko 2 days ago

@sudenkorentoko : yeah, the key for the tip that works for you is that you're using the app. My suggestions were for those who upload on the web interface. We need those 'weird photos'. :-) All of my photos and audio files get uploaded from my computer at home.

Posted by mmmiller 2 days ago

I have been stymied as to why I cannot hear the sound recordings when trying to ID others' observations. I have the sound on and turned up, but hear nothing (using iNaturalist on my laptop for better viewing and sound). I also do not download my own observations straight from my iPhone, but download later through my laptop, which does not allow me to add audio as an iNat file. If it did, I could be adding a lot more quality observations of bird calls with exact locations.

Posted by wildmare64 1 day ago

I often have trouble hearing other audio file observations. I have a pretty decent audio setup on my computer and there are times when I crank the volume up full (which would hurt my ears for any other normal audio file) but still can't hear anything. If I'm feeling determined (or, alternately, generous), I will sometimes download the file onto my computer and pop it into Audacity. It's clear that there is almost no volume on those files, and once I amplify them, I can usually at least determine 'bird, insect, etc.' But I don't think that's a reasonable workaround. The next person to listen won't be able to hear anything unless they also download and amplify it. I can't add the amplified file to the observation. And I'm not sure it's the best use of my time, all things considered. The observer heard something, but their recording device, for whatever reason, just didn't pick it up at an adequate volume.

Posted by mmmiller 1 day ago
