What is the history of the tool?
The first version of Denelezh was released in March 2017. The second version, Denelezh 2.0, was released in April 2018 with many improvements:
- The gender gap by Wikimedia project is available.
- Statistics cover all humans in Wikidata (not only those with gender + year of birth + country of citizenship + occupation).
- There is no limit to the year of birth (statistics about humans born before 1600 are available).
- Occupations are deduced using the property « subclass of ».
Each version was detailed in a blog post:
- First version, March 2017: A tool to estimate the gender gap in Wikidata and Wikipedia.
- Second version, April 2018: Denelezh 2.0, a transitional version.
How are the statistics generated?
Denelezh uses weekly dumps of Wikidata. It gathers statistics about humans:
- An item is a human if at least one of its best values for the property instance of is human.
- A human whose single best value for the property gender is female is a female; if it is male, a male; if it is any other value, an other. A human with zero or several best values for the property gender has no gender in Denelezh.
- A human has a year of birth if the property date of birth has a best value with sufficient precision. If there are several best values, the year is used only if it is the same for all of them; otherwise the human has no year of birth.
- All normal and preferred values of the property country of citizenship are used as country of citizenship for each human.
- All normal and preferred values of the property occupation are used as occupation for each human.
- Parent occupations are deduced using the property subclass of. An occupation has to be used directly at least once to appear in Denelezh.
- Countries used fewer than 1,000 times are discarded; starting with the dump from 2018-04-09, only countries used fewer than 200 times are discarded.
- Occupations used (directly or indirectly) fewer than 1,000 times are discarded.
- A sitelink represents a page in a Wikimedia project (including all Wikipedias, but also Wikisource, Wikiquote, ...) for a given item.
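The occupation rules above (parent occupations deduced through subclass of, the direct-use requirement, and the 1,000 threshold) can be sketched as follows. This is only an illustration under stated assumptions, not Denelezh's actual code: the function names, the `parents` mapping, and the choice to count each human at most once per occupation are hypothetical.

```python
from collections import Counter


def ancestors(occupation, parents):
    """Return every occupation reachable from `occupation` through
    'subclass of' links, excluding `occupation` itself.

    `parents` is a hypothetical mapping: occupation -> direct parents.
    """
    seen = set()
    stack = list(parents.get(occupation, ()))
    while stack:
        current = stack.pop()
        if current in seen or current == occupation:
            continue  # already visited, or a cycle back to the start
        seen.add(current)
        stack.extend(parents.get(current, ()))
    return seen


def occupation_counts(humans, parents, threshold=1000):
    """Count humans per occupation, directly or through a sub-occupation.

    `humans` is an iterable of occupation lists, one list per human.
    Assumption: a human counts at most once for a given occupation.
    """
    direct = Counter()  # occupations stated on the human's own item
    total = Counter()   # direct use plus use through sub-occupations
    for occupations in humans:
        direct_set = set(occupations)
        direct.update(direct_set)
        expanded = set(direct_set)
        for occupation in direct_set:
            expanded |= ancestors(occupation, parents)
        total.update(expanded)
    # Keep occupations used directly at least once and often enough overall.
    return {
        occupation: count
        for occupation, count in total.items()
        if direct[occupation] >= 1 and count >= threshold
    }
```

With a small threshold, an occupation that only ever appears as a parent (never directly on a human) is dropped even when its indirect count is high, which is what the direct-use rule describes.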
What are the ranks in Wikidata?
To sum up, each statement in Wikidata has a rank:
- deprecated: the value is incorrect
- normal (the default rank): the value is correct
- preferred: the value is the best among the correct values
The best values of a property in an item are the values with the preferred rank if any exist; otherwise, they are the values with the normal rank. Deprecated values are never best values.
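As an illustration, the best-value rule and the gender and year-of-birth rules that rely on it might look like the sketch below. The statement layout (`rank` and `value` keys) is a simplified stand-in for Wikidata's JSON, and the function names are hypothetical. Treating year precision (code 9 in Wikidata's data model) as "sufficient", and ignoring less precise best values, are assumptions; the tool's exact behaviour is not specified here.

```python
FEMALE = "Q6581072"  # Wikidata item for "female"
MALE = "Q6581097"    # Wikidata item for "male"
YEAR_PRECISION = 9   # Wikidata precision code for "year" (assumed threshold)


def best_values(statements):
    """Return the preferred values if any exist, else the normal ones.

    Each statement is a simplified dict: {"rank": ..., "value": ...}.
    Deprecated statements are never returned.
    """
    preferred = [s["value"] for s in statements if s["rank"] == "preferred"]
    if preferred:
        return preferred
    return [s["value"] for s in statements if s["rank"] == "normal"]


def gender_of(statements):
    """Classify a human from the statements of the property gender."""
    best = best_values(statements)
    if len(best) != 1:
        return None  # zero or several best values: no gender
    if best[0] == FEMALE:
        return "female"
    if best[0] == MALE:
        return "male"
    return "other"


def year_of_birth(statements):
    """Return the year shared by all sufficiently precise best values.

    Values are simplified dicts: {"year": int, "precision": int}.
    """
    years = {
        value["year"]
        for value in best_values(statements)
        if value["precision"] >= YEAR_PRECISION
    }
    return years.pop() if len(years) == 1 else None
```

For example, an item with a normal-rank male statement and a preferred-rank female statement is counted as female, because the preferred value is the only best value.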
How can I get the number of biographies across all Wikipedias?
You can't (or you have to do it manually). At the moment, Denelezh provides statistics about a specific Wikimedia project or all Wikimedia projects, not a subset of them.
What is the origin of the name Denelezh?
Denelezh means Humanity in Breton.