Decreasing use of the git exe in GitPHP

I’m actually not getting screwed by work this month, so I have a branch I’m working on where I’m changing the way GitPHP loads data from a project.  I’m moving as much of the data loading as possible into PHP itself, by having PHP directly access the objects and packfiles in the git project, rather than relying on calls to the git executable.  The packfile loading code is based on Glip, but these changes do not use the Glip library itself.

The majority of the work is done.  I’m going to be merging it into master relatively soon, with the intent of including the changes in the next release.  Since this is such a fundamental change to the way data is loaded, I’m providing a configuration option to fall back to using the git executable (like GitPHP has always done) in case there are issues, and I’m considering providing this option on a per-project basis.

Update 7/17: This is merged into master. Any beta testers willing to test the new loading code on their repositories by running from git master would be appreciated… maybe we can knock out any major compatibility issues early.

Q: Why?
A: Several reasons:

  1. Compatibility.  The fewer calls I have to make to the shell, the less I have to worry about the differences between Linux and Windows, and about breaking one or the other (usually Windows, since I develop on Linux).
  2. Performance.  Every shell execution is a fork() off of the webserver’s process.  When you run lots of shell commands one after another, fork()-ing the web server process each time, the delay is noticeable.
  3. Security.  While the good security practice would be to make your git repositories read-only to the webserver user, I bet very few people actually do that.  Passing user input down to a shell command is always a dangerous thing to do, and potentially exposes your server to hackers.  Unfortunately, I’ve always put functionality above security when calling the git executable, and I’d like to change that.
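
To make the security point concrete, here is a minimal sketch (in Python for brevity, not GitPHP’s actual PHP code) of checking a user-supplied object hash before it gets anywhere near a command line.  The function name build_cat_file_args is hypothetical, and the assumption is that the input is meant to be a full 40-character SHA-1.

```python
import re

# Assumption: the user input is supposed to be a full SHA-1 object name,
# which is exactly 40 lowercase hex digits.
_HASH_RE = re.compile(r"[0-9a-f]{40}")


def build_cat_file_args(user_hash):
    """Return an argument list for `git cat-file -p <hash>`.

    Raises ValueError if the input is anything other than a 40-digit hex hash.
    """
    if not _HASH_RE.fullmatch(user_hash):
        raise ValueError("not a valid object hash")
    # Returning a list instead of one command string lets the caller run the
    # command without a shell, so nothing in the input is ever shell-parsed.
    return ["git", "cat-file", "-p", user_hash]
```

Anything that fails the check never reaches the executable at all, which is the same effect that reading objects directly gives you for free.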

Q: Why didn’t you use Glip?
A: I experimented with using Glip.  In doing so, I found a couple of things:

  1. Glip is incomplete.  It doesn’t load all of the objects that can exist in a git project, and is also missing a number of features I would need for it to replace the git executable.
  2. Glip provides its own API to access objects in a git project – Blob, Commit, Tree, etc.  These objects were very similar to the git object representations I had created for GitPHP, but the API was different enough to make them incompatible.  In loading data using Glip, I found myself creating Glip objects, manually picking values off their properties just to stuff them into my own properties, and then discarding the Glip objects.  Not only that, but the way it stored data in properties was different enough from my git objects that I ended up having to do redundant conversions.  Glip would load data in git’s internal representation and convert it to PHP’s internal representation, and then I would extract those values and convert them back into git’s internal representation for use in my objects.  It just felt like a lot of redundant glue code.  I have no doubt that Glip would be great if you were writing a project from scratch based on it, but ripping apart and rewriting all of GitPHP’s model code is not something I’m looking to do anytime soon.

Q: Does this mean the git executable is no longer required?
A: Right now the git exe is still required.

Q: Does this mean we may not need the git executable in the future?
A: I don’t want to say never, but this is unlikely.  While loading data from plain git objects is significantly faster in raw PHP, this also means that the processing has to be done in PHP.  There are a number of cases where large amounts of processing in PHP actually make it slower than just biting the bullet and calling the git executable:

  1. The git rev-list command (used for the shortlog/log) has a --skip option, which skips a given number of commits down in the log.  This is used for the next/prev paging in the log in GitPHP.  In raw PHP, we don’t have this option – we have to walk all the way down the commit log ourselves.  So in PHP, if you want to get page 5 of the log (commits 401-500), you have to walk the log and load commits 1-500, and discard the first 400.  So you can imagine that it gets really slow when you get down to page 21, and you have to load the first 2100 commits and discard the first 2000.  And walking the log isn’t just following each parent link – it’s any commit reachable from the tip, which includes all merged branches and any of their reachable commits.  The current implementation actually uses raw PHP for the early pages of the log, but when skipping a significant number of commits, falls back on the git executable in order to keep performance reasonable.
  2. Searching for a commit, committer, or author requires loading every single commit in the history.
  3. Grepping inside files requires loading up the contents of every single file in a tree and searching every line.
  4. Getting the history of a file requires reading every single commit in the entire history and reading each commit’s tree to see if it touched this file.  That also doesn’t take into account things like detecting renames.
  5. Getting the blame of a file requires getting the entire history of a file and diffing every commit with its parent to figure out the changes to the file.
  6. Diffing a file would require me to either write my own diff algorithm (non-trivial) or require an external PHP extension like xdiff, which I don’t want to do.
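
The paging cost in item 1 can be sketched roughly as follows.  This is an illustration in Python, not GitPHP’s code: log_page and the parents mapping are hypothetical names, and the walk is a plain breadth-first traversal, whereas git itself orders the log by commit date.

```python
from collections import deque


def log_page(parents, tip, page, per_page=100):
    """Return (commits on the 1-based page, number of commits visited).

    parents maps a commit id to the list of its parent ids.
    """
    skip = (page - 1) * per_page
    seen = {tip}
    queue = deque([tip])
    visited = 0
    result = []
    while queue and len(result) < per_page:
        commit = queue.popleft()
        visited += 1
        # Everything before the requested page is loaded and thrown away.
        if visited > skip:
            result.append(commit)
        # Follow every parent, so merged branches are walked as well.
        for parent in parents.get(commit, []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return result, visited
```

On a linear history, asking for page 5 visits 500 commits to return 100 of them, and page 21 visits 2100.  The work grows with the page number, which is why handing deep pages to git rev-list --skip is the cheaper choice.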

Q: Are there any additional requirements?
A: You’ll need Zlib support in your PHP, since git objects are zlib-compressed.
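
For a sense of what reading an object directly involves, here is a minimal sketch (Python, not GitPHP’s PHP code) of loading a loose object.  It assumes the standard loose-object layout: the file lives at objects/<first 2 hex digits>/<remaining 38>, and its decompressed contents are "<type> <size>", a NUL byte, and then the body.  The function name read_loose_object is hypothetical, and packfiles are not handled.

```python
import os
import zlib


def read_loose_object(git_dir, sha1):
    """Read a loose (unpacked) git object and return (type, content)."""
    path = os.path.join(git_dir, "objects", sha1[:2], sha1[2:])
    with open(path, "rb") as f:
        raw = zlib.decompress(f.read())
    # The header is "<type> <size>" terminated by a single NUL byte.
    header, _, content = raw.partition(b"\x00")
    obj_type, size = header.split(b" ")
    if int(size) != len(content):
        raise ValueError("object size does not match its header")
    return obj_type.decode("ascii"), content
```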

Q: Are there any limitations?
A: The new loading code won’t read packfiles larger than 2GB.  This is a limitation in Glip, too.  It’s because PHP’s fseek() (or actually most operations in PHP) tops out at 2GB (2*1024^3 bytes, the range of a signed 32-bit offset).
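
The arithmetic behind that limit, as a small sketch in Python.  It assumes the ceiling comes from signed 32-bit file offsets; packfile_is_readable is a hypothetical helper, not something in GitPHP.

```python
import os

# A signed 32-bit integer tops out at 2**31 - 1, one byte short of
# 2 * 1024**3 (2 GB).
MAX_INT32 = 2**31 - 1


def packfile_is_readable(path, limit=MAX_INT32):
    """Return True if every byte offset in the file fits within the limit."""
    return os.path.getsize(path) <= limit
```

A check like this could decide up front whether to read a packfile directly or fall back to the git executable.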

I’ll update this post with any other FAQs I can think of.

GitPHP 0.2.4

GitPHP 0.2.4 is out. I’ve still been pretty busy; I apologize.

  • Chinese translation – thanks to seefan
  • Side-by-side diffs for blobdiff and commitdiff – based on work by Mattias Ulbrich and Tanguy Pruvot
  • Fix clone and push urls using ssh colon notation – thanks to mdevilz
  • Allow specifying hashbases by branch/tag name in URLs – previously you were required to specify the entire ref name, e.g. hb=refs/heads/master. Now you can just specify the head/tag name, e.g. hb=master.
  • Fix crash in blobdiff_plain when specifying hashbase without a hash – this prevented you from using urls with just a filename and hashbase id or ref name, which was supposed to find the right file hash automatically
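
The short hashbase names mentioned above could be resolved along these lines.  This is only a sketch in Python: resolve_hashbase and the refs mapping are hypothetical, and trying heads before tags is an assumption rather than GitPHP’s documented lookup order.

```python
def resolve_hashbase(hb, refs):
    """Map an hb URL value to an object hash, or None if nothing matches.

    refs maps full ref names (e.g. "refs/heads/master") to hashes.
    """
    # Try the value as a full ref name first, then as a head, then as a tag.
    for candidate in (hb, "refs/heads/" + hb, "refs/tags/" + hb):
        if candidate in refs:
            return refs[candidate]
    return None
```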

Release is on the GitPHP page and bugs can be reported on the bugtracker.

GitPHP 0.2.3

I’ve released GitPHP 0.2.3. Unfortunately I’ve had a pretty busy couple months at work, so I didn’t really get to any of the enhancements that I intended to get to this release. 0.2.3 has several bugfixes that I wanted to get out, rather than delaying the release and making you wait for them.

  • Degrade gracefully when the system doesn’t have posix functions
    Previously there was a bug where the page crashed when trying to display the project owner on systems without the PHP posix module installed (Windows systems, and several Linux distributions). If you don’t have this module then the owner still won’t display unless you set the gitweb.owner config value or override it in projects.conf.php, but at least the page won’t stop rendering.
  • Make the language setting a permanent cookie
    Previously, the language cookie was a session cookie, so when you came back to the site you had to change your language again. Now it’s a permanent cookie with a 1 year expiration.
  • Escape HTML in the title header
    Previously, commit messages weren’t escaped in the title bar (the gray bar with the commit message at the top of the page, below the nav), so commit messages with HTML in them caused weird things to happen. Now the HTML displays properly.
  • Add attribution and link back to GitPHP page in the footer
    This is by request, I swear. If you don’t like it, I don’t mind if you delete it from the footer and go back to the old appearance.
  • Run javascript project livesearch after the user stops typing
    Previously the javascript livesearch on the project list ran after every letter you typed. This caused a severe performance hit on larger project lists, since the browser would churn through the huge list of projects many times in succession. Now it runs after the user stops typing.
  • Remove containing tag from object cache
    Previously an object’s containing tag was stored in the object cache. The containing tag is actually not immutable data, and so there are certain cases where caching this would cause the display to get out of sync. Now, this is no longer cached. I would suggest clearing your cache after upgrading to be safe.
  • Improve performance of projects with lots of tags or heads
    Previously there was a severe performance problem with projects with many tags or heads (hundreds). This was because it was attempting to load all tags and all their data into memory. For example, the git repo for the git program has almost 400 tags, and took almost 2 minutes to load any page. Now, it only loads the data for the tags it needs so it’s a lot better.
  • Clean up stylesheets
    Previously the stylesheet was a mess and was really difficult to read. I’ve cleaned it up, organized it, and broken it out into two separate stylesheets – a functional stylesheet (gitphp.css) which contains declarations needed for GitPHP to lay out and work properly, and a look and feel stylesheet (gitphpskin.css) which customizes the colors and styling. The stylesheet config option now controls the location of the skin stylesheet – so you can make a copy, customize the look and feel, and point the stylesheet config option at the new skin stylesheet. The config option is backwards compatible, to handle the fact that the stylesheet name changed but people may still have the option pointing at the old single stylesheet from previous example configs.

As always, the release is available on the GitPHP page and bugs can be reported on the bugtracker.
