Data analysis of developer job posts from Stack Overflow

I analyzed developer job posts from Stack Overflow to draw out insights about them, such as which technologies are the most popular and where the jobs are located around the world, as visualized on a map.

After much data processing, I finally got the graphs I was looking for. There are too many to show here, so I will present only the most interesting ones. The complete set is on my personal website, along with explanations of how I produced the data needed to generate the graphs. The project is implemented in Python 3 and uses several modules (e.g. BeautifulSoup4, matplotlib, plotly) to build the whole pipeline that generates the maps, graphs and reports.

Contents

1. Sources of data and some initial stats
2. Bar chart: popularity of technologies based on number of occurrences in job posts
3. World map of job posts
4. Interactive scatter plot: Average mid-range salaries for different technologies
5. TODOs

Sources of data and some initial stats

The maps and graphs are generated from a total of 1000 job posts taken from Stack Overflow's developer jobs website and RSS feed, published from 2018-09-18 to 2018-09-28 and covering 41 countries. The vast majority (90%) of job posts come from the USA and Europe. The USA has the most job posts (377, about 38%), followed by Germany (241, about 24%). The following table summarizes these stats:

Table 1 Stats about sources of data
Sources of data: Stack Overflow's RSS feed and jobs website
Number of job posts: 1000
Published dates: 2018-09-18 to 2018-09-28
Number of countries: 41
Top 5 countries based on number of job posts: United States (377), Germany (241), United Kingdom (84), Netherlands (43), Switzerland (34)
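
The country shares follow directly from the counts in Table 1. Here is a minimal sketch of that arithmetic, using only the numbers from the table; the variable and function names are my own for illustration and are not taken from the project's code.

```python
# Counts copied from Table 1; all names here are illustrative only.
TOTAL_JOB_POSTS = 1000
top_countries = {
    "United States": 377,
    "Germany": 241,
    "United Kingdom": 84,
    "Netherlands": 43,
    "Switzerland": 34,
}


def share(count, total=TOTAL_JOB_POSTS):
    """Return the percentage of job posts represented by `count`."""
    return 100 * count / total


# Share of each top-5 country, and of the five countries combined.
shares = {country: share(n) for country, n in top_countries.items()}
top5_share = share(sum(top_countries.values()))

for country, pct in shares.items():
    print(f"{country}: {pct:.1f}%")
print(f"Top 5 combined: {top5_share:.1f}%")
```

The five countries listed account for 779 of the 1000 job posts.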

Bar chart: popularity of technologies based on number of occurrences in job posts

A technology is described by a tag in a job post and can be, for example, a programming language, an OS, or a web framework. The popularity of a technology is determined by the number of job posts in which it occurs. Each technology is counted at most once per job post, i.e. a job post can't be associated with the same technology more than once. The following table gives some stats about the technologies found in the job posts:

Table 2 Stats about technologies found in job posts
Number of technologies: 568
Number of job posts with at least one technology tag: 999
Published dates: 2018-09-18 to 2018-09-28

Top 20 most popular technologies based on number of occurrences across Stack Overflow job posts
More graphs @ my personal website.
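
The counting rule described in this section (a technology counts at most once per job post) can be sketched as follows. This is only an illustration under my own assumptions: the `job_posts` structure, the sample tags and the function name `count_technologies` are hypothetical and not the project's actual data or code.

```python
from collections import Counter

# Hypothetical job posts: each one carries the list of tags found in the post.
job_posts = [
    {"id": 1, "tags": ["python", "django", "python"]},  # duplicate tag on purpose
    {"id": 2, "tags": ["javascript", "reactjs"]},
    {"id": 3, "tags": ["python", "javascript"]},
]


def count_technologies(posts):
    """Count in how many posts each technology appears (at most once per post)."""
    counts = Counter()
    for post in posts:
        # set() drops repeated tags so a post contributes 1 per technology
        counts.update(set(post["tags"]))
    return counts


tech_counts = count_technologies(job_posts)

# most_common(n) returns the n technologies with the highest counts,
# which is the kind of ranking a "top 20" bar chart would be built from.
print(tech_counts.most_common(2))
```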

World map of job posts

As stated earlier, the vast majority of job posts are concentrated in the USA and Europe, more specifically on the USA's East and West coasts and in Germany. Usually, a job post has a single job location, but there are a few cases of one job post with two or more job locations. The following table gives some stats about the countries found in the job posts:

Table 3 Stats about countries found in job posts
Number of countries: 41
Number of job posts: 999
Published dates: 2018-09-18 to 2018-09-28

Distribution of Stack Overflow job posts around the world

NOTE: Each dot on the map represents a particular job location (or address) from a job post and the size of the dot gives the relative importance of the job location compared to other job locations. Thus, a bigger dot for a particular location means that more job posts are associated with this location compared to other locations having smaller dots.
US map @ my personal website.
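
To make the dot-size idea concrete, here is a rough sketch of one way to turn job locations into relative dot sizes. The post does not say how the sizes are actually scaled, so the linear scaling, the size bounds, the sample locations and the function name `marker_sizes` are all assumptions of mine.

```python
from collections import Counter

# Hypothetical job locations, one entry per (job post, location) pair.
locations = ["Berlin", "New York", "Berlin", "London", "Berlin", "New York"]


def marker_sizes(locs, min_size=4.0, max_size=20.0):
    """Map each location to a dot size that grows linearly with its post count.

    The busiest location gets `max_size`; the size bounds are arbitrary
    illustrative values.
    """
    counts = Counter(locs)
    if not counts:
        return {}
    largest = max(counts.values())
    return {
        loc: min_size + (max_size - min_size) * n / largest
        for loc, n in counts.items()
    }


sizes = marker_sizes(locations)
for loc, size in sizes.items():
    print(f"{loc}: {size:.1f}")
```

A dictionary like `sizes` could then be passed as the marker sizes of whatever plotting library draws the map.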

Interactive scatter plot: Average mid-range salaries for different technologies

Since companies provide a salary range (i.e. the minimum and maximum salaries a candidate could expect to receive) and not a precise number, it is difficult to get a good measure of the salary for a job post. Thus, I converted the min and max salaries into a single number: the mid-range salary, which is defined as mid = (min + max) / 2. Not all job posts provide a salary, and there can only be one salary per job post. The following table gives some stats about the salaries associated with technologies found in job posts:

Table 4 Stats about technologies associated with salaries
Number of technologies with salaries: 272
Number of job posts with technologies and salaries: 195
Published dates: 2018-09-18 to 2018-09-28


NOTE 1: The y-axis represents the number of job posts used for computing the average mid-range salary for the given technology, e.g. JavaScript is associated with an average mid-range salary of $84,255 based on 42 job posts.
NOTE 2: The average mid-range salary of an item (e.g. an industry or a skill) is computed by grouping all occurrences of the same item along with their mid-range salaries and taking the average of those mid-range salaries.
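
The mid-range formula and the grouping described in NOTE 2 can be sketched like this. The sample posts, field names and function names are hypothetical, chosen only to illustrate the computation; they are not the project's actual data or code.

```python
from collections import defaultdict

# Hypothetical job posts with a salary range and technology tags.
job_posts = [
    {"tags": ["python", "django"], "min_salary": 60000, "max_salary": 80000},
    {"tags": ["python"], "min_salary": 90000, "max_salary": 110000},
    {"tags": ["javascript"], "min_salary": None, "max_salary": None},  # no salary
]


def mid_range(min_salary, max_salary):
    """mid = (min + max) / 2, as defined in the text."""
    return (min_salary + max_salary) / 2


def average_mid_range_by_technology(posts):
    """Return {technology: (average mid-range salary, number of posts used)}."""
    mids = defaultdict(list)
    for post in posts:
        if post["min_salary"] is None or post["max_salary"] is None:
            continue  # posts without a salary are skipped
        mid = mid_range(post["min_salary"], post["max_salary"])
        for tag in set(post["tags"]):  # each technology once per post
            mids[tag].append(mid)
    return {tag: (sum(v) / len(v), len(v)) for tag, v in mids.items()}


averages = average_mid_range_by_technology(job_posts)
for tech, (avg, n_posts) in averages.items():
    print(f"{tech}: {avg:.0f} (based on {n_posts} job posts)")
```

Each technology ends up with two values: the average mid-range salary and the number of job posts behind it, the latter being what NOTE 1 says is shown on the y-axis.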

TODOs

  • Since not every job post provides a salary range, the number of posts with salaries that I could analyze is very small. I plan on integrating more data from Stack Overflow and from other sources.
  • I want to automate as much of the data updating, processing and displaying as possible, so that I can publish weekly updates without much intervention and see how the job market changes over time.
