Reading a Git repo’s commit history with Pandas efficiently

There are many reasons to analyze a version control system such as your Git repository. See, for example, Adam Tornhill’s book “Your Code as a Crime Scene” or his upcoming book “Software Design X-Rays” for plenty of inspiration:

You can analyze knowledge islands, distinguish frequently changing code from stable code, and identify code that is temporally coupled to other code.

Having the necessary data for those analyses in a Pandas DataFrame lets you quickly gain insights into the evolution of your software system in various ways…
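To make the idea concrete, here is a minimal sketch of how commit history could end up in a DataFrame. It is not necessarily the approach taken in the full post. It assumes the log was produced with `git log --no-merges --numstat --pretty=format:"--%h--%aI--%aN"`; the sample text, the `parse_git_log` helper and the column names are made up for illustration.

```python
import pandas as pd

# Hypothetical sample of what
#   git log --no-merges --numstat --pretty=format:"--%h--%aI--%aN"
# could print. In practice you would capture the real output,
# e.g. with subprocess.run(..., capture_output=True, text=True).
SAMPLE_LOG = """\
--a1b2c3d--2017-06-01T10:00:00+02:00--Alice
10\t2\tsrc/app.py
3\t0\tREADME.md

--d4e5f6a--2017-06-02T11:30:00+02:00--Bob
1\t1\tsrc/app.py
"""


def parse_git_log(text):
    """Turn numstat log text into one DataFrame row per changed file."""
    rows = []
    sha = timestamp = author = None
    for line in text.splitlines():
        if line.startswith("--"):
            # Commit header line: --<sha>--<ISO date>--<author>
            sha, timestamp, author = line[2:].split("--", 2)
        elif line.strip():
            # Numstat line: <additions> TAB <deletions> TAB <path>
            additions, deletions, path = line.split("\t", 2)
            rows.append((sha, timestamp, author, additions, deletions, path))

    df = pd.DataFrame(
        rows,
        columns=["sha", "timestamp", "author", "additions", "deletions", "path"],
    )
    df["timestamp"] = pd.to_datetime(df["timestamp"], utc=True)
    # Git prints "-" for binary files; coerce those to NaN.
    df["additions"] = pd.to_numeric(df["additions"], errors="coerce")
    df["deletions"] = pd.to_numeric(df["deletions"], errors="coerce")
    return df


commits = parse_git_log(SAMPLE_LOG)

# Number of distinct commits touching each file: a first, rough
# indicator of frequently changing code.
changes_per_file = commits.groupby("path")["sha"].nunique()
```

From a frame like this, questions such as "which files change most often" or "who touched which file" become simple `groupby` calls.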

Mining performance hotspots with JProfiler, jQAssistant, Neo4j and Pandas – Part 2: Root Cause Analysis

All the work so far was just there to get a nice graph model that feels more natural. Now comes the analysis part: as mentioned in the introduction, we want not only the hotspots that signal that something awkward happened, but also

the trigger in our application of the hotspot combined with
the information about the entry point (e.g. where in our application the problem happens) and
(optionally) the request that causes the problem (to be able to localize the problem)…