We released a new computer vision model today. It covers 86,861 taxa, up from 84,878. This new model (v2.12) was trained on data exported on February 2, 2024.
Here's a graph of the model release schedule since early 2022 (segments extend from data export date to model release date) and of how the number of species included in each model has increased over time.
Thanks to some work by the team described here, we are going to start posting accuracy estimates with these model releases. Each estimate is computed against 1,000 random Research Grade observations in each group that the model did not see during training. The paired bars below compare the average accuracy of model 2.11 with that of the new model 2.12. Each bar shows the accuracy from Computer Vision alone (dark green), Computer Vision + Geo (green), and Computer Vision + Geo + Cropping Change (light green). "Cropping Change" is a slight modification to the way images are prepared before they are sent to the CV model that resulted in an average 2.1% improvement.
Overall, the average accuracy of 2.12 is 89.1%. You can see that the average accuracy varies by taxonomic group and continent, from as low as the 60s for Africa to as high as the 90s for Europe and North America. Also note that, on average, 2.12 is 1.1% more accurate than 2.11, which is consistent with our goal of keeping model accuracy roughly stable as we continue to add thousands of taxa each month (as described here, we probably expect <2% variance, all other things being equal).
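To make the per-group numbers concrete, here is a minimal sketch of how an accuracy figure of this kind could be computed from a held-out sample. It is not iNaturalist's evaluation code: the function name `accuracy_by_group`, the record layout, and the toy data are all assumptions made for illustration.

```python
from collections import defaultdict


def accuracy_by_group(records):
    """Return the fraction of correct predictions per group.

    records: iterable of (group, true_taxon, predicted_taxon) tuples.
    This layout is hypothetical and chosen only for the example.
    """
    totals = defaultdict(int)
    hits = defaultdict(int)
    for group, truth, predicted in records:
        totals[group] += 1
        if predicted == truth:
            hits[group] += 1
    return {group: hits[group] / totals[group] for group in totals}


# Made-up held-out observations, not real iNaturalist data.
sample = [
    ("Europe", "species_a", "species_a"),
    ("Europe", "species_b", "species_b"),
    ("Africa", "species_c", "species_d"),
    ("Africa", "species_c", "species_c"),
]

per_group = accuracy_by_group(sample)
print(per_group)
```

Comparing two model versions would then amount to running the same held-out sample through both and subtracting the per-group results.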
Here is a sample of new species added to v2.12:
Comments
Some beautiful species added this time! The fact that % of accuracy grows with added taxa is amazing!
Always interesting to see.. love to see that some of the arachnids I ID made it into the new model
So... non-ecologist, non-biologist, non-scientist here. Is there a point where knowing how to ID things just won't matter anymore because AI can do it? There must be some distinction that still makes it essential for people to be able to do it... but when i use iNat nowadays, many of my observations have been solved for me by the model, and therefore I feel like I don't need to actually learn.
But I want to. Please tell me that this (amazing) tech won't make the expert knowledge of thousands obsolete.
@qworkqwork The system doesn't work without humans, and not all species are visually distinct. The current model can't even ID mallards in flight.
I would love to learn more about what "Cropping Change" entails, as it seems to have a significant positive effect on arachnid IDs, and maybe some other groups that I don't care as much about ;)
For the 1000 observations used to make the graph, does that include taxa which are not in the training set, and which the CV will necessarily misidentify? Or only taxa which the CV is trained on?
Computer Vision + Geo (geen),
Thanks for writing the blog post with background information about the model.
Why should accuracy vary between continents?
** Are African organisms more difficult to identify?
*** are there too many, too similar organisms?
*** are the organisms too variable?
** Are African observers bad at taking diagnostic photographs?
** Are African identifications less reliable?
Or is this on the entire dataset, and thus simply a case of fewer observations = less training = less precision?
@tonyrebelo I suspect it is the latter: many African endemic species do not yet have enough photos to be included in the CV training data. I don't really think there's anything intrinsic about your flora that's more difficult than others. However, I do suspect that if they ran this at a country-by-country level rather than continent level, South Africa would have a much higher CV accuracy than others on the continent.
We need more identifiers for Africa. We are steadily adding new species for Africa. 152 African plants this time - of those 109 in Southern, so 43 with more in North than 'central' Africa I think - can't find a filter for those two?
And better internet - still dealing with undersea cable breaks off West Africa right now. I think tropical Africa is the difficult bit. Could we see accuracy separately for North, Southern and 'central' Africa? Or North, South, East, West and Sahara - the numbers will vary widely.
@kevinfaccenda - i see:
But surely the test should only be on species known to the model? If unknown species are included the CV will by definition fail (except in cases where both CV and identifiers make the same mistake (e.g. ID a rare species not in the model as a more common one in the model)).
Otherwise the measure is merely an index of species known to the model. For the test to be meaningful, it has to be "1000 random RG observations for species that the model has previously been trained on"?
There's no "should", just how it is. Most observations come from tourists, who usually don't have the means to know the local biota by name, especially in tropical forest. They add the majority of observations and rely on identifiers, but there are no experts and thus no list to compare your photos to, so even experts from neighbouring countries can't help much in such a diverse region with high levels of endemism. It's a circle of unidentified photos that just grows and grows until someone comes and works on it steadily.
The CV will often be confused by photos and, when unsure, suggest an NA species. If the training sample comes from only a few observers, there is a higher chance that a new photo will confuse it. This bias will persist until the species has many more observers, but it's better that the species is in the model as it is than not suggested at all.
PS: it would also be nice to see the progress of the model every now and again as the proportion of species in the AI model versus the total species with RG observations on iNaturalist, both by taxonomy and by continent.
@marina_gorbunova - that is a totally different issue: the proportion of observations that are Research Grade.
What I thought we were interested in here is how accurate the AI CV is with making identifications. By definition the AI CV will fail 100% of the time on observations it has not been trained on. (Well perhaps it may pick up cases where identifiers have made a mistake, so perhaps it will only fail in 95-99% of cases, but by the methods mentioned above it will fail 100%).
So measuring the efficiency of the AI has two components: the species it has not been trained on (where it will fail 100%) versus those it has been trained on, where we expect an efficiency of 80-90% at present. As the training algorithms improve, we can expect that to rise to 90-95% and better.
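The two-component argument above can be written as a weighted average. The sketch below is only an illustration: `overall_accuracy` is a hypothetical helper, and the 75% share and 90% accuracy are made-up inputs, not figures from the post.

```python
def overall_accuracy(share_trained, acc_trained, acc_untrained=0.0):
    """Weighted average of accuracy on trained and untrained taxa.

    share_trained: fraction of test observations whose taxon is in the model.
    acc_trained:   accuracy on those observations.
    acc_untrained: accuracy on the rest (0.0 if the model always fails there).
    """
    if not 0.0 <= share_trained <= 1.0:
        raise ValueError("share_trained must be between 0 and 1")
    return share_trained * acc_trained + (1.0 - share_trained) * acc_untrained


# Hypothetical inputs: 75% of test observations are of trained taxa,
# and the model is right on 90% of those.
mixed = overall_accuracy(0.75, 0.90)   # roughly 0.675
trained_only = overall_accuracy(1.0, 0.90)
print(round(mixed, 3), round(trained_only, 3))
```

Under these assumptions, a test set restricted to trained taxa reports the 90% directly, while a mixed test set reports a lower figure that partly reflects coverage rather than model quality.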
@tonyrebelo They of course check the known species, but as I clarified, the threshold for a species to be learnt is pretty low now, so your photo can be confusing for the model. Hell, it sometimes can't ID cropped photos of mallards or crows because of a slight angle. Plants can be much more confusing at different stages of life, or if you photograph a leaf or flower separately instead of the whole plant. You get the idea.
(It is also important to bear in mind, with the 1000 random sample, that if people are using the AI to make the IDs, then comparing user IDs to AI IDs to test whether the AI is correct is not really an independent test.)
(Moreover, we may categorize the AI ID as wrong if the improved version changes its ID to a different species from the one the AI previously suggested to users, and to which the users have incorrectly agreed. But I imagine (hope) such cases will be relatively rare.)
@marina_gorbunova - Sorry: I have read your explanation again and it is still not clear to me. I think the tourism aspect and the ID aspect are not relevant.
Are you suggesting that because Africa has fewer observations, it has a smaller training set and a greater proportion of observations of those species encountered subsequently by the AI will be of untrained perspectives/facets/details?
OK - I can see that, but it would be nice to know if it actually applies. Especially at a level of 20% - I could understand 1-5% but 20% of trained cases seems a bit high for any explanation that I can think of.
100 pictures is usually about 60 obs
why did this palm need 137 obs?
https://www.inaturalist.org/taxa/364450-Borassus-akeassii
@tonyrebelo I meant that a smaller subset of observers produces similar photos for many observations, so my guess is that a new photo can be challenging for the system if this time the pixels just don't align the same way. Another thing is that when an ID is made by humans from multiple photos, each photo contains a piece of information, but the CV model doesn't know that. For hard-to-ID species like insects that's a problem everywhere, and more so if more of the African observations taken into the model are like that. But it's just speculation on what could play a role.
Judging by how the % is lower where biodiversity is higher, the answer is connected to that. It seems that only NA and Europe have something other regions don't have yet; I guess the number of observations?
@dianastuder - 2 obvious reasons: 1 - lots of observations were posted recently; 2 - lots of IDs were made recently.
I am sure that there are other explanations, but in this case (Borassus akeassii) almost 50 observations were posted in the last two weeks.
@tonyrebelo
Q: Why should accuracy vary between continents?
A: I think the more species the model has the better it can separate them based on AI. So Africa will catch up as observations and validations keep coming for (endemic) species of this continent.
@loarie Thanks for this great update, also nice presentation of all these metrics
@qworkqwork If it is possible, it is still a long way off. Accurately identifying plants and animals often relies on characteristics that are difficult to capture in a single photo: you can't have a super zoomed-in photo of the flower petals while also having a zoomed-out photo of the entire tree. Some characteristics are impossible to capture on camera, such as animal sounds, behaviour, or internal genitalia. The most accurate case I can envisage is perfect ID accuracy for well-taken and scaled photos of organisms with visual differences. This would require a very large, well-curated, very complete (no missing species) data set, which I do not think is possible in this lifetime. This would also exclude photos in poor light, photos from unusual angles, etc.
Not to mention that even if you did have this perfect model, new species are constantly being described and taxonomy of existing species is in flux, which would complicate matters.
@tonyrebelo One factor that could be at play (not just in Africa, but across the world) is that I think areas/groups with higher species diversity will be more challenging for the CV. For instance, if there is a genus with only 5 members (or an area where only five members are present), and 4 of the five are included in the CV, it has a better chance of IDing correctly. If there's a genus with 50 members (or an area where all of these are present) and only 4 are included in the CV, it may be more challenging to get correct as the CV doesn't know how to create rules that distinguish those other species (making it easier for the CV to confuse them with species it does know about). This doesn't necessarily mean Africa would be tougher to ID specifically, but tropical areas with higher diversity and lower proportional taxon coverage in the model should probably be expected to have lower ID accuracy. I would expect this would also interact with fewer observations/observers/identifiers which we also know is an issue.
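The arithmetic behind the 5-member and 50-member example can be sketched as follows. `coverage` is a hypothetical helper, and reading the result as "share of the genus the model can name at all" is a simplification that ignores how often each species is actually observed.

```python
def coverage(species_in_model, species_in_genus):
    """Fraction of a genus's species that the model has been trained on."""
    if species_in_genus <= 0:
        raise ValueError("species_in_genus must be positive")
    if not 0 <= species_in_model <= species_in_genus:
        raise ValueError("species_in_model must be between 0 and species_in_genus")
    return species_in_model / species_in_genus


# The two cases from the comment: 4 species in the model in both.
small_genus = coverage(4, 5)    # 4 of 5 members covered
large_genus = coverage(4, 50)   # 4 of 50 members covered
print(small_genus, large_genus)
```

With the same four species in the model, coverage drops from 80% to 8%, which is one way to see why a species-rich region with the same number of trained taxa may score lower.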
@cthawley - none of this is an issue if the AI is only tested on the species that it has been trained on. The other species, and unidentified species, and unknown species, and higher species richness, are not an issue as they should not be in the testing set. (the AI will be wrong with IDing them 100% of the time: we know that already)
I agree that the model will never be able to identify unknown species to species level (or other levels that aren't in its training set), but I believe its accuracy for included taxa can change based on the inclusion of other taxa in the training set in at least two ways.
1) The model can ID species (or taxa) that aren't in its training set to a broader level - eg, IDing a known or unknown species to the correct genus. While this isn't 100% "correct", it also isn't wrong and is very useful. These correct genus or family ID suggestions greatly speed up the IDing process on iNat by getting observations to IDers who are expert in those taxa. I'm not sure how this type of higher level suggestion from the model is considered in the accuracy calculations above, but it is important to consider when assessing model performance. For instance, if a model suggests Genus A for an observation of species A. beeus, I wouldn't consider that "wrong" in the same way as an ID of species A. ceeus would be.
2) If the model has better sampling of the members of a genus, it is likely going to have higher performance in making correct, less specific IDs, as the rules/criteria it develops during training will better describe the genus (or other higher taxon). Likewise, a model can make different rules/classifiers for identifying species A and B when it is trained on a set that includes only species A and B vs. when it is trained on a set that includes species A, B, C, D, and E. Depending on how the model is trained, this could be true even if exactly the same photos were used for species A and B in the training set. The rules/classifiers that the model creates when trained on the higher diversity set may be more accurate/specific for species A and B, even when measuring performance on just the observations of those species, and disregarding performance on taxa outside the training set (though they could be worse too, depending on the specific situation).
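One way to capture the distinction in point 1 is to score a suggestion against the whole lineage of the true taxon rather than against the species alone. This is a sketch only: `score_suggestion`, the three labels, and the lineage list are assumptions for illustration, and nothing here describes how the accuracy figures in the post were actually calculated. The names "Genus A", "A. beeus" and "A. ceeus" are the placeholders from the comment, and "Family X" is an extra placeholder.

```python
def score_suggestion(true_lineage, suggestion):
    """Classify a suggestion as exact, coarser-but-correct, or wrong.

    true_lineage: taxon names from broad to specific, ending with the
                  true taxon, e.g. ["Family X", "Genus A", "A. beeus"].
    suggestion:   the taxon name suggested by the model.
    """
    if not true_lineage:
        raise ValueError("true_lineage must not be empty")
    if suggestion == true_lineage[-1]:
        return "exact"
    if suggestion in true_lineage[:-1]:
        return "correct but coarser"
    return "wrong"


lineage = ["Family X", "Genus A", "A. beeus"]
results = [score_suggestion(lineage, s) for s in ("A. beeus", "Genus A", "A. ceeus")]
print(results)
```

Reporting the three categories separately would show how much of the "error" is really a useful genus- or family-level suggestion.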
Question about the Cropping Change. Was this also applied to the training data? If not, would that make sense?
I checked new taxa for where I live, Pennsylvania in the US. I love being reminded that even an area that often seems so commonplace to me still holds so many species that are rare or rarely observed. Very motivating, and a nice reminder that literally everyone lives in a place with new and meaningful things to see and record. :-)
@qworkqwork Don't worry. :-) iNat is one of the few entities making valuable, reasonable, non-hyped use of machine learning as a tool, not as a replacement for anything at all. It's more like another kind of field guide. It's a technology that's been around for decades, but is now able to use faster computers. And it's tremendously fun and rewarding for the people who are making good use of it. Technically, there's still no such thing as artificial intelligence, and I love that iNat correctly labels their tool as "computer vision" or "machine learning". As a good rule of thumb, don't trust anyone who calls their stuff "AI". ;-)
Are there plans to announce the date of the next time data will be exported to train the next version of the model? There are a few taxa whose geographic ranges could use some cleanup to train the model better. For example, Cantharellus cibarius is a Eurasian mushroom species that has similar-looking relatives in North America, and because North American specimens have long been misidentified as C. cibarius, the iNat algorithm suggests this species in North America based on geographic location even though this species has never been confirmed to occur on the continent as far as I'm aware. If we knew when the data would be exported, we could clean up the data before that date. Just my 2 cents :)
@ground_grazer They've been aiming for updates every month. I wouldn't worry about the exact date too much and just curate that group. Once all the bad IDs are gone, it will update in due course.
Good to see 4 common Australian Megachile up and ready to make my life easier! ;)
@ground_gazer That is why I check Pending / About status for a short leaderboard. And then push hard with @mentions to add IDs to another / more obs if they are still pending. Sometimes there is low hanging fruit waiting to be picked, perhaps in the next taxon level up.
I’m really happy that Anadara chemnitzii has been added! A few identifiers and I have gone through a lot of observations and brought the species from ~30 observations to more than 180. Still, there are roughly 70 observations of A. chemnitzii that are stuck at a higher rank because of a previous misidentification.
@wendyjegla It is Artificial General Intelligence that is likely not known to exist yet. Artificial intelligence has been around for a long time (some debate from the 1950s) and machine learning is a category or technique within the broader field of artificial intelligence.
How close are we to getting any more of the Fontinalis species on the CV? The most common N. American taxa like F. novæ-angliæ and F. hypnoides are currently at 119 and 94 observations respectively. Currently, the CV thinks all Fontinalis species are F. antipyretica (rather than a genus-level suggestion), which produces a lot of false positives to negotiate for the small few of us who ID the world’s Fontinalis observations.
F. novæ-angliæ has 54 RG and F. hypnoides 45 RG observations. If more of the observations could be confirmed, both could be included in the next training.
But would having all 15 Fontinalis species identified as 3 species rather than 1 species not make it more difficult to pick up misidentifications?
I'd like to recognize the efforts of @jf920, whose work on identifying mostly cultivated observations of Chlorophytum comosum and Chlorophytum laxum has led to the addition of the latter species to the computer vision model. Hopefully that will now mean that most people uploading photos of their spider plant/pongol sword will get an accurate computer vision ID, and that will mean less work for human identifiers to correct misidentifications.
@dianastuder: I think that palm wasn't added earlier because it just received a lot more observations and IDs. Of the current 138 verifiable observations of that taxon, only 90 were in the database on February 2 and only 84 on December 31, 2023 (the cut-off date for the previous CV model).
Most of these 84 observations appear to have received their first species-level ID in the past few months, so it seems plausible that on December 31, there were fewer than 100 photos associated with verifiable observations identified as Borassus akeassii.
(I see now that @tonyrebelo made similar comments earlier. I guess this is a good example of how adding observations and IDs translates into improvements in the CV model.)
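The kind of check described above, counting how many observations existed by each export date, can be sketched like this. `count_available` is a hypothetical helper, the dates in `toy_dates` are made up, and treating the cut-off as inclusive is an assumption; only the two cut-off dates come from the thread.

```python
from datetime import date


def count_available(observation_dates, cutoff):
    """Count observations created on or before the cut-off date."""
    return sum(1 for created in observation_dates if created <= cutoff)


# Made-up creation dates, for illustration only.
toy_dates = [
    date(2023, 11, 5),
    date(2023, 12, 31),
    date(2024, 1, 20),
    date(2024, 2, 2),
    date(2024, 2, 15),
]

previous_export = date(2023, 12, 31)  # cut-off for the previous model
current_export = date(2024, 2, 2)     # export date for v2.12

before_previous = count_available(toy_dates, previous_export)
before_current = count_available(toy_dates, current_export)
print(before_previous, before_current)
```

The same count, restricted to observations that already had a species-level ID at the cut-off, would show whether a taxon had reached the threshold in time for a given model.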