Posts Tagged ‘gitphp’

GitPHP 0.2.6

I’ve released GitPHP 0.2.6. This release has more internal changes than end-user ones, but there are still a few enhancements and fixes for users and administrators.

  • Enhancements:
    • Upgrade to Smarty 3. Some templates have been reorganized to take advantage of Smarty 3’s hierarchical template features
    • RequireJS is now used to serve javascript. RequireJS is a module loader that has allowed me to restructure the javascript code into clean reusable modules. My javascript code before was… let’s just say gross. Using RequireJS does have some other benefits too:
      • RequireJS loads modules asynchronously and in parallel, which improves page load time. Many browsers still load javascript files from the web server one at a time, blocking each load until the previous one has finished.
      • The RequireJS javascript minification process (which I do for all packaged releases) will combine all modules for a page and their dependencies into a single file before compression, which reduces the number of separate files the user’s browser has to load.
    • GitPHP can now read project-specific config options from the git config in the repository directory itself (e.g. gitphp.git/config), so that project-specific settings carry across multiple gitphp installations. This file can be edited directly or accessed using the ‘git config’ command. The section is [gitphp], and the config keys are the project-specific settings currently supported in projects.conf.php. See the comments in projects.conf.php.example for more information. The project-specific config file overrides global gitphp settings in gitphp.conf.php, but install-specific settings in projects.conf.php override the project-specific config file.
    • GitPHP can now use the Google Libraries API to serve the jQuery library, by setting the ‘googlejs’ config value. Using the Google Libraries API allows you to offload serving that library from your web server, and allows users to benefit from a single cached file for all sites using the Google Libraries API. The library is served from Google’s servers, which means this won’t work if you’re running on an intranet without outside internet access.
    • Signed-off-by lines in commit messages and PGP blocks in tags are styled differently to differentiate them from the actual commit/tag message
    • The shortlog now displays abbreviated hashes for each commit. The abbreviation length is read from the project’s git config file (core.abbrev setting), defaulting to 7. By default, abbreviated hashes are not checked for collisions. A gitphp config setting, ‘uniqueabbrev’, has been added which will turn on collision checking when abbreviating hashes. Note that this is performance intensive because it needs to search every hash in the project, which is why it’s off by default. You might be better served just increasing the minimum abbreviation length.
    • Japanese translation, thanks to Ishikawa Mutsumi
  • Fixes:
    • Fix a collision when multiple users were downloading an uncached version of the exact same snapshot at the exact same time
    • Fix direct line links on non-GeSHi blob pages, thanks to Steve Clay
    • Fix order of shortlog/log commits when a branch is rebased
    • Fix trimming of multibyte commit messages in shortlog, thanks to Ishikawa Mutsumi
    • Fix handling of git’s commit encoding header in commit messages
    • Fix handling of non-ASCII filenames in tree view. Based on a fix by sh2ka
    • Re-enable whitespace trimming to decrease the size of HTML files served. It was accidentally disabled during the big rewrite a year ago
    • Fix a potential XSS vulnerability
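The settings precedence described above for project-specific config can be sketched in Python. This is a hypothetical illustration, not GitPHP’s PHP code: it approximates git’s config syntax with Python’s configparser (fine for simple files, but not for every git config feature, such as includes or escaped values), and the keys `owner`, `cloneurl` and `compat` are example names only.

```python
import configparser


def read_gitphp_section(config_text):
    """Return the [gitphp] section of a repository's git config as a dict.

    configparser only approximates git's config syntax: strict=False tolerates
    repeated sections, '=' is the only delimiter, and interpolation is off.
    """
    parser = configparser.ConfigParser(
        interpolation=None, strict=False, delimiters=("=",)
    )
    parser.read_string(config_text)
    if not parser.has_section("gitphp"):
        return {}
    return dict(parser.items("gitphp"))


def effective_settings(global_conf, repo_conf, install_conf):
    """Merge settings from lowest to highest precedence.

    gitphp.conf.php (global) < repo [gitphp] section < projects.conf.php.
    Each later layer overwrites earlier ones key by key.
    """
    merged = {}
    for layer in (global_conf, repo_conf, install_conf):
        merged.update(layer)
    return merged


if __name__ == "__main__":
    repo_config = (
        "[core]\n"
        "\tbare = true\n"
        "[gitphp]\n"
        "\towner = Alice\n"
        "\tcloneurl = git://example.com/gitphp.git\n"
    )
    print(effective_settings(
        global_conf={"cloneurl": "git://example.com/default.git"},
        repo_conf=read_gitphp_section(repo_config),
        install_conf={"owner": "Bob"},
    ))
```

In this example the repository’s cloneurl beats the global default, while the install-specific owner from projects.conf.php beats the repository’s value.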

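The shortlog’s hash abbreviation can be sketched roughly in Python. The function name and signature are hypothetical; `min_len` stands in for core.abbrev (default 7) and `unique` for the ‘uniqueabbrev’ setting. The scan over every other hash is what makes collision checking expensive.

```python
def abbreviate(full_hash, all_hashes, min_len=7, unique=False):
    """Abbreviate full_hash to min_len characters.

    If unique is True, keep lengthening the prefix until no other hash in
    all_hashes starts with it. This checks every hash in the project, which
    is why it is costly on large repositories.
    """
    if not unique:
        return full_hash[:min_len]
    others = [h for h in all_hashes if h != full_hash]
    for length in range(min_len, len(full_hash)):
        prefix = full_hash[:length]
        if not any(h.startswith(prefix) for h in others):
            return prefix
    return full_hash
```

Raising `min_len` costs nothing per commit, while `unique=True` rescans the whole hash list for every abbreviation.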
Smarty 3 was a difficult transition that’s been in the works for a long time. Smarty 3 was a complete rewrite of Smarty 2 – and as I certainly found out during my big GitPHP rewrite, a complete rewrite tends to introduce slight unintentional functional changes. And for Smarty, as a library, those slight functional changes manifest themselves as bugs in the consuming app. I’ve spent a long time hunting Smarty 3 bugs in GitPHP using every single possible configuration combination I could think of. I believe I’ve gotten all the ones I can find, but I can’t possibly reproduce every single setup and git repo out there – so if you run into issues, by all means let me know.

Smarty 2’s compiled templates and cached files are incompatible with Smarty 3, so for safety I would suggest deleting the compiled templates in the templates_c/ directory, any cached files in the cache/ directory, and bouncing Memcache if you use it. Smarty 3’s template compilation takes longer than before because of the hierarchical templates. However, this only happens the first time a page is loaded after the contents of templates_c are deleted; you won’t have to wait for it again.

The release is on the GitPHP page and bugs can be filed on the bugtracker.

GitPHP 0.2.5

I’ve released GitPHP 0.2.5. I’ve had a little more free time recently than I’ve had for the past couple versions, so this version has some more changes than the previous couple releases.

  • Enhancements:
    • Move much of the data loading to raw PHP file parsing instead of relying on the git executable. This gives an enormous performance boost, especially on webservers that have very expensive process forking (I’m looking at you, Apache)
    • Add a ‘compat’ config option, which can be specified globally or per-project, to fall back on the old method of loading data if you run into issues with the previous enhancement. (The raw PHP loading code will not process repository packfiles larger than 2GB.)
    • Add a ‘largeskip’ config option. With raw PHP data loading, we have to walk commits down the log as you page further back in the history, so performance degrades the further back you go. This option sets the threshold at which gitphp switches to the git executable instead.
    • Display merge commits in the shortlog with grayed-out titles, thanks to Tanguy Pruvot
    • Subdirectory snapshots now include the subdirectory as part of the archive name
    • Support for the xdiff php extension. If you have the xdiff php extension installed, that will be used for diffing, which is faster and completely eliminates the need to have a separate diff executable or temp dir configured.
    • Support for loading the project list from an SCM-Manager config. Only projects marked public are loaded. Based on work done by Craig Sparks
    • Add debug info when searching a directory for projects. If some or all of your projects aren’t appearing when you let GitPHP find all projects in the project root, turn on the ‘debug’ config option to see more about what it’s doing.
    • Benchmarking info is now turned on separately from debug info, with a new config option ‘benchmark’
    • Cache list of projects if the object cache is turned on, instead of searching the project root every time. Based on work done by Tanguy Pruvot
    • Archives are now delivered incrementally to the user’s browser, instead of being loaded entirely into memory first. This avoids PHP out-of-memory errors when trying to snapshot a very large project. Note: as a result of this change, snapshot tarballs are no longer cached to Memcache, even if Memcache support is turned on. They will always be cached to the cache directory on disk.
    • Minify CSS for performance
    • Support tags pointing directly to blobs. This isn’t commonly done but can be used to embed something like a GPG key in a repository
    • Allow displaying a website URL for a project. Because this can’t really be automatically calculated the way clone/push urls can, this is a per-project setting only.
  • Fixes:
    • Fixed how the default diff executable is determined if not specified in the config
    • Fixed issues diffing when the temp dir had spaces in the name (common on Windows)
    • Fixed issues diffing on Windows when the temp dir didn’t have a trailing backslash
    • Avoid floods of warning messages when the fileinfo magic database is incorrectly compiled
    • Avoid warning messages when the project doesn’t have a description file. (This sometimes happens with repositories created by third-party software rather than the standard git program.)
    • Fixed some display issues on the project list with owner/age/links columns wrapping too much when the project has an extremely long description
    • Avoid warning messages when listing all projects in the projectroot and the webserver user doesn’t have read access to one of the directories. Based on a fix by Justyn Shull
    • Fixed an issue where using the diff link next to a single file on the commit page showed the wrong commit at the top of the diff
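The incremental snapshot delivery can be illustrated with a small Python sketch: the archive is copied to the client in fixed-size chunks, and each chunk is also written to an on-disk cache file, so the whole archive is never held in memory. The function name and the 8192-byte chunk size are assumptions for illustration; in practice `source` could be the stdout pipe of `git archive --format=tar <tree-ish>`.

```python
import io


def stream_archive(source, client, cache_file=None, chunk_size=8192):
    """Copy an archive from source to client chunk by chunk.

    Only one chunk is held in memory at a time. If cache_file is given,
    each chunk is also written there so later requests can be served
    from disk. Returns the number of bytes copied.
    """
    total = 0
    while True:
        chunk = source.read(chunk_size)
        if not chunk:
            break
        client.write(chunk)
        if cache_file is not None:
            cache_file.write(chunk)
        total += len(chunk)
    return total


if __name__ == "__main__":
    # In-memory stand-ins for the archive pipe, HTTP response and cache file.
    data = b"x" * 20000
    response, cache = io.BytesIO(), io.BytesIO()
    print(stream_archive(io.BytesIO(data), response, cache))
```

With a real subprocess, `subprocess.Popen([...], stdout=subprocess.PIPE).stdout` can be passed as `source` unchanged.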

As always, the release is on the GitPHP page and bugs can be filed on the bugtracker. If you need support you can always email me, or if you would prefer a more public discussion, you can use the forums.

Q: You’ve had a smarty 3 branch in progress for a while now. Why haven’t you merged it yet? Smarty 3 is perfect and amazing!!!1!eleven1one
A: Not quite. Smarty 3 is a complete rewrite and, just like the complete GitPHP rewrite a year ago, has some bugs and some differences in behavior. These changes in Smarty 3 actually broke some of the functionality in GitPHP. I’ve been making changes to accommodate the new behavior and doing hacks to work around Smarty 3 bugs, and I’m still not convinced I’ve found all of them yet. Additionally, Smarty 3’s cache infrastructure is completely different from Smarty 2’s, which renders all custom cache code useless. In particular, this means that GitPHP Memcache support does not work at all with Smarty 3. The upgrade to Smarty 3 makes development easier but has no real discernible effect for end users. I refuse to do an ‘upgrade’ that will negatively impact end users, especially when that negative impact is lost functionality.

Decreasing use of the git exe in GitPHP

I’m actually not getting screwed by work this month, so I have a branch I’m working on where I’m changing the way GitPHP loads data from a project.  I’m moving as much of the data loading as possible into php itself, by having PHP directly access the objects and packfiles in the git project, rather than relying on calls to the git executable.  The packfile loading code is based off of Glip, but these changes are not actually using the Glip library itself.
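To give a feel for what reading objects directly involves, here is a minimal Python sketch for loose objects only (packfiles need considerably more code). It relies on git’s loose-object layout: each object lives at objects/<first two hex digits>/<remaining 38>, and its file is the zlib-deflated bytes `<type> <size>\0<content>`. The function names are hypothetical, not GitPHP’s.

```python
import hashlib
import os
import zlib


def loose_object_path(git_dir, sha):
    """Path of a loose object: objects/ab/cdef... for hash abcdef..."""
    return os.path.join(git_dir, "objects", sha[:2], sha[2:])


def parse_loose_object(compressed):
    """Inflate a loose object and split it into (type, content)."""
    raw = zlib.decompress(compressed)
    header, _, content = raw.partition(b"\x00")
    obj_type, size = header.split(b" ")
    if int(size) != len(content):
        raise ValueError("corrupt object: size header does not match content")
    return obj_type.decode("ascii"), content


def object_id(obj_type, content):
    """SHA-1 object name: hash of the uncompressed header plus content."""
    header = f"{obj_type} {len(content)}".encode("ascii") + b"\x00"
    return hashlib.sha1(header + content).hexdigest()


def read_loose_object(git_dir, sha):
    """Read and parse one loose object from a repository directory."""
    with open(loose_object_path(git_dir, sha), "rb") as f:
        return parse_loose_object(f.read())
```

No process is forked here: one file read and one zlib inflate replace a `git cat-file` call.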

The majority of the work is done.  I’m going to be merging it into master relatively soon, with the intent of including the changes in the next release.  Since this is such a fundamental change to the way data is loaded, I’m providing a configuration option to fall back to using the git executable (like GitPHP has always done) in case there are issues, and I’m considering providing this option on a per-project basis.

Update 7/17: This is merged into master. I’d appreciate it if any beta testers would try the new loading code on their repositories by running from git master… maybe we can knock out any major compatibility issues early.

Q: Why?
A: Several reasons:

  1. Compatibility. The fewer calls I have to make to the shell, the less I have to worry about the differences between Linux and Windows, and about breaking one or the other (usually Windows, since I develop on Linux)
  2. Performance.  Every shell execution is a fork() off of the webserver’s process.  When you run lots of shell commands one after another, fork()-ing the web server process each time, the delay is noticeable.
  3. Security. While good security practice would be to make your git repositories read-only to the webserver user, I bet very few people actually do that. Passing user input down to a shell command is always dangerous, and potentially exposes your server to hackers. Unfortunately, I’ve always put functionality above security when calling the git executable, and I’d like to change that.
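For the cases where git still has to be called, a minimal Python sketch of the safer pattern is to validate user-supplied input against a strict pattern and pass git an argument list rather than a shell string. The hash pattern and function names are illustrative assumptions, not GitPHP code.

```python
import re
import subprocess

# Abbreviated or full lowercase SHA-1 hex. Rejects shell metacharacters and
# anything starting with '-', which git could otherwise parse as an option.
HASH_RE = re.compile(r"[0-9a-f]{4,40}")


def cat_file_args(git_dir, object_hash):
    """Build argv for `git cat-file -p <hash>` after validating the hash."""
    if not HASH_RE.fullmatch(object_hash):
        raise ValueError(f"not an object hash: {object_hash!r}")
    return ["git", f"--git-dir={git_dir}", "cat-file", "-p", object_hash]


def run_git(args):
    """Run git without a shell: with an argument list and the default
    shell=False, no argument is ever interpreted by /bin/sh."""
    return subprocess.run(args, capture_output=True, check=True).stdout
```

Validation and the argument list guard against different problems: the list stops shell injection, and the pattern stops a crafted value from being read by git as an option.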

Q: Why didn’t you use Glip?
A: I experimented with using Glip. In doing so, I found a couple of things:

  1. Glip is incomplete.  It doesn’t load all of the objects that can exist in a git project, and is also missing a number of features I would need for it to replace the git executable.
  2. Glip provides its own API to access objects in a git project – Blob, Commit, Tree, etc.  These objects were very similar to the git object representations I had created for GitPHP, but the API was different enough to make them incompatible.  In loading data using Glip, I found myself creating Glip objects, manually picking out values off of the properties just to stuff them into my own properties, and discarding the Glip objects.  Not only that, but the way it stored data in properties was different enough from my git objects that I ended up having to do redundant conversions.  Glip would load data in git’s internal representation, convert them to php’s internal representation, and then I would extract those values and convert them back into git’s internal representation for use in my objects.  It just felt like a really redundant amount of glue code.  I have no doubt that Glip would be great if you were writing a project from scratch based off it, but ripping apart and rewriting all of GitPHP’s model code is not something I’m looking to do anytime soon.

Q: Does this mean the git executable is no longer required?
A: Right now the git exe is still required.

Q: Does this mean we may not need the git executable in the future?
A: I don’t want to say never, but this is unlikely. While loading data from plain git objects is significantly faster in raw PHP, this also means that the processing has to be done in PHP. There are a number of cases where large amounts of processing in PHP actually make it slower than just biting the bullet and calling the git executable:

  1. The git rev-list command (used for the shortlog/log) has a --skip option, which skips a certain number of commits down in the log. This is used for the next/prev paging in the log in GitPHP. In raw PHP, we don’t have this option – we have to walk all the way down the commit log ourselves. So in PHP, if you want to get page 5 of the log (commits 401-500), you have to walk the log and load commits 1-500, and discard the first 400. You can imagine that it gets really slow when you get down to page 21, where you have to load the first 2100 commits and discard the first 2000. And walking the log isn’t just following each parent link – it’s any commit reachable from the tip, which includes all merged branches and any of their reachable commits. The current implementation actually uses raw PHP for the early pages of the log, but when skipping a significant number of commits, it falls back on the git executable in order to keep performance reasonable.
  2. Searching for a commit, committer, or author requires loading every single commit in the history.
  3. Grepping inside files requires loading up the contents of every single file in a tree and searching every line.
  4. Getting the history of a file requires reading every single commit in the entire history and reading each commit’s tree to see if it touched this file.  That also doesn’t take into account things like detecting renames.
  5. Getting the blame of a file requires getting the entire history of a file and diffing every commit with its parent to figure out the changes to the file.
  6. Diffing a file would require me to write my own diff algorithm (non-trivial), or to require an external php extension like xdiff, which I don’t want to do.
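The paging trade-off in item 1 can be sketched in Python. This is a hypothetical model, not GitPHP’s code: commits are an in-memory dict of hash to (commit time, parents), the walk visits every reachable commit newest-first (an approximation of rev-list’s default ordering), and `largeskip` is assumed to be the skip count above which the git executable takes over, using rev-list’s `--skip` and `--max-count` options.

```python
import heapq
import subprocess


def walk_log(commits, tip, skip, count):
    """Walk every commit reachable from tip, newest first, and return
    commits skip+1 .. skip+count. The skipped commits still have to be
    loaded and walked, which is why deep pages get slow.

    commits maps hash -> (commit_time, [parent hashes]).
    """
    seen = {tip}
    queue = [(-commits[tip][0], tip)]  # max-heap on commit time
    page = []
    position = 0
    while queue and len(page) < count:
        _, sha = heapq.heappop(queue)
        if position >= skip:
            page.append(sha)
        position += 1
        for parent in commits[sha][1]:
            if parent not in seen:  # merged branches share ancestors
                seen.add(parent)
                heapq.heappush(queue, (-commits[parent][0], parent))
    return page


def rev_list_page(git_dir, tip, skip, count):
    """Let git do the skipping instead of walking in-process."""
    args = ["git", f"--git-dir={git_dir}", "rev-list",
            f"--skip={skip}", f"--max-count={count}", tip]
    return subprocess.run(args, capture_output=True, text=True,
                          check=True).stdout.split()


def log_page(commits, tip, skip, count, largeskip, fallback):
    """Use the in-process walk for shallow pages, fallback for deep ones."""
    if skip > largeskip:
        return fallback(tip, skip, count)
    return walk_log(commits, tip, skip, count)
```

To use git for deep pages, pass `functools.partial(rev_list_page, git_dir)` as `fallback`; the in-process walk only pays off while `skip` stays small.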

Q: Are there any additional requirements?
A: You’ll need Zlib support in your PHP, since git objects are zlib-compressed.

Q: Are there any limitations?
A: The new loading code won’t read packfiles larger than 2GB. This is a limitation in Glip, too. It’s because php’s fseek() (or actually most operations in php) top out at 2GB (2*1024^3).
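One place the 2GB boundary shows up in git’s own formats is the version 2 pack index (.idx). Each object’s packfile offset is stored as a 4-byte big-endian value, and when its most significant bit is set, the low 31 bits instead index a table of 8-byte offsets, used for objects at offsets of 2^31 bytes (2GB) or more. Supporting large packs therefore means handling those 64-bit offsets and seeking past 2^31 - 1, which is out of reach where PHP integers are 32-bit. The Python sketch below decodes one entry; the function name and the pre-sliced byte tables are assumptions.

```python
import struct

LARGE_OFFSET_FLAG = 0x80000000  # MSB of a 4-byte offset entry

# 2 * 1024**3 == 2**31: the first offset that no longer fits in 31 bits.
TWO_GB = 2 * 1024 ** 3


def pack_offset(offset_table, large_offset_table, index):
    """Return the packfile offset of the object at position index.

    offset_table: the v2 .idx table of 4-byte big-endian offsets.
    large_offset_table: the following table of 8-byte big-endian offsets.
    """
    (value,) = struct.unpack_from(">I", offset_table, 4 * index)
    if value & LARGE_OFFSET_FLAG:
        # The low 31 bits are an index into the 8-byte offset table.
        large_index = value & 0x7FFFFFFF
        (value,) = struct.unpack_from(">Q", large_offset_table, 8 * large_index)
    return value
```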

I’ll update this post with any other FAQs I can think of.
