C# Code coverage approaches and unit tests

Monday, September 14th, 2009

This weekend I wrote a set of classes to [Determine and police password strength]. The interesting goal I set myself was to get 100% code coverage using the MS unit test framework. Sounds easy! How hard can that be? Honestly, until now I’ve focused more on TDD tests than on code coverage.

Well, firstly I have to say it’s not that easy :-) , and secondly it really shows the true benefit of [TDD] – more on that later. So on to the discoveries!

Being a TDD advocate I started off with an interface and then the tests before the code. So far so good – then on to the code. As a Spongebob episode would say “Several hours later” we were done. So crank up the tests and we are looking golden.

Unfortunately the code coverage wasn’t half as good as I had hoped for. So the next couple of hours were spent updating the tests for coverage, mostly around a number of edge cases involving exceptions thrown for invalid parameters. Interestingly, a couple of the gaps that the TDD approach had missed were inside “for loops”, where parts of the loop were never executed because my test case was, unknowingly, the best case. The revealing part for me was that TDD is really a “business” driven approach, while code coverage is believed to be a programmer’s indicator of ‘quality’. So while TDD (at least the first pass I end up doing) meets the business needs, it is very hard to get 100% coverage from business scenarios alone.

Finally I got it up to 99.84% coverage. This last holdout was VERY annoying: looking at the code coverage highlighting, everything was green, with no red or yellow sections. Hrrrmm. OK, it was a simple function that had this one block of unexecuted code: a switch statement taking an enum as a parameter. So let’s have a look (note I’ve stripped a lot of the content and comments out for readability):

   public enum PasswordStrengthIndex
   {
      None = 0,
      Weak = 1,
      Medium = 2,
      Strong = 3,
      MostStrong = 4,
   }

      public void ResetToDefinedPolicyStrength(PasswordStrengthIndex strength)
      {
         switch (strength)
         {
            case PasswordStrengthIndex.None:
               MinimumPasswordLength = 0;
               break;
            case PasswordStrengthIndex.Weak:
               MinimumPasswordLength = 4;
               break;
            case PasswordStrengthIndex.Medium:
               MinimumPasswordLength = 6;
               break;
            case PasswordStrengthIndex.Strong:
               MinimumPasswordLength = 8;
               break;
            case PasswordStrengthIndex.MostStrong:
               MinimumPasswordLength = 12;
               break;
         }
         return;
      }

The test case iterated through the enum values and verified the output matched the expectations. Still I had this annoying block that was unexecuted. The next step was the Jimmy Neutron “Think Think Think”, a-ha! No default handler, so add one. Hmmm, I can’t add one that can actually be exercised, as I’ve already used all the values in my enum. Well, adding a default handler to any one of those case statements still didn’t resolve the issue. Now it was getting personal! I was dangerously close to calling in technical gurus and venting my disgust! So before making the calls it was time to search Google for “Code Coverage Switch” – and boy, Google never ceases to amaze me.

Firstly it confirmed my suspicion that it was indeed the switch statement; second was a link that explained it in detail. Basically the default handler was not getting called. There were a number of ‘breaking’ workarounds in the Google answers (such as changing the enum to add another element to be used in the default case), but the one that I ‘liked’ the most was to force the cast (I would link to the original author out of respect, but I can’t find it now :-( ).

      public void ResetToDefinedPolicyStrength(PasswordStrengthIndex strength)
      {
         switch (strength)
         {
            case PasswordStrengthIndex.None:
               MinimumPasswordLength = 0;
               break;
            case PasswordStrengthIndex.Weak:
               MinimumPasswordLength = 4;
               break;
            case PasswordStrengthIndex.Medium:
               MinimumPasswordLength = 6;
               break;
            case PasswordStrengthIndex.Strong:
               MinimumPasswordLength = 8;
               break;
            case PasswordStrengthIndex.MostStrong:
               MinimumPasswordLength = 12;
               break;
            default:
               throw new ArgumentException("Supplied strength is not recognized as enum", "strength");
         }
         return;
      }

      [TestMethod()]
      [ExpectedException(typeof(ArgumentException))]
      public void ResetToDefinedPolicyStrengthBogus()
      {
         PasswordPolicy policy = new PasswordPolicy();
         policy.ResetToDefinedPolicyStrength((PasswordStrengthIndex)666);
      }

Look at the last statement of the listing above. It uses a cast to force the enum to hold a number outside the range of values it defines – ta da! On a partially related note, this reminds me that enums are definitely interesting beasts that can bite you if you have compiled against one and its definition then changes in a different assembly.
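
The cast works because a C# enum is only a set of named constants over an integral type, so any value of that underlying type can be cast to it whether or not a member is declared for it. Here is a minimal sketch of how code could spot such a value up front; the EnumCastDemo class and its IsKnownStrength helper are names I have made up for illustration and are not part of the password classes:

```csharp
using System;

public enum PasswordStrengthIndex
{
   None = 0,
   Weak = 1,
   Medium = 2,
   Strong = 3,
   MostStrong = 4,
}

public static class EnumCastDemo
{
   // Hypothetical helper: true only when the value maps to a
   // member that is actually declared on the enum.
   public static bool IsKnownStrength(PasswordStrengthIndex strength)
   {
      return Enum.IsDefined(typeof(PasswordStrengthIndex), strength);
   }
}
```

With this in place, EnumCastDemo.IsKnownStrength((PasswordStrengthIndex)666) comes back false while every declared member comes back true, which is the same condition the default handler guards against.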

So now we have got to 100% code coverage – woo hoo! The realization (which is obvious and well known to many) is that without TDD you would most likely end up writing tests that cover your code but miss the business requirements. So any programmers out there who think they have good code coverage but are not using TDD are likely to have a false belief in their code because the tool says 100% green (unless they are coding gurus!): they are only really testing what they have written, not the business requirements. On the flip side, just writing TDD tests without checking coverage is highly unlikely to hit all the ‘code’ edge cases (which are different from ‘business’ edge cases). So the conclusion is that a blend of the two is best, yet still not perfect. You have to start with TDD and then move to code coverage: business first, code second.

I’m hoping at some point management teams will start to understand that even with 100% executed code coverage there can still be bugs, as it’s a data world we live in! Obviously I’m an optimist!

Gareth

Better way to determine and police Password Strengths

Sunday, September 13th, 2009

Perhaps my Google search mojo has been acting up, but I could not find a good, robust C# implementation for checking password strength (in fact I really couldn’t find much beyond cut & paste copies of information entropy implementations). They were all predicated on the relatively standard assumption that all submitted passwords are random – uh huh!

For starters I recommend reading the article [http://en.wikipedia.org/wiki/Password_strength]. This is a good article covering the relative strengths of passwords, and gives a guide for determining the strength of a random password and a human derived password.

The major problem with passwords is that humans need to remember them, or else they write them down. In an interesting technology twist, historically you only had to worry about your co-workers accessing or abusing your password, because there was implicit physical security in place: you could only log on if you were physically in the office. At that time your biggest threat was your co-workers. Unfortunately that secondary defense of physical location has effectively been removed by the internet and VPN technology, so your threat count has increased from the people you work with to the entire world! Add to this that these people are financially motivated and can directly target you – it’s a whole lot scarier out there now!

So before jumping into the implementations we need to go through the well known things to avoid in order to improve password strength:

  • Avoid sequences – keyboard or alphabet based (abcd, qwert, 1234, !@#$% etc)
  • Avoid dictionary words, especially common ones! Be aware that common misspellings are also used in dictionary based attacks – so unless your misspelling is VERY unusual then you can expect it to be in a dictionary!
  • Avoid leet/1337 password substitution of words (eg P@ssw0rd, M1cr0$0ft, 0\/\/n3d). Again these are now all in dictionaries, so while they may be harder to brute force, they are pretty trivial for a dictionary attack. Of course it doesn’t hurt to be 1337, but it really doesn’t help defend against a targeted attack.
  • Avoid team names, socials, license names etc.

Things to do to minimize your exposure if a password is compromised:

  • Use different passwords for different online accounts
  • Avoid using information about you that can readily be found on the web as part of a password reset scheme: DOB, where you were born, school name etc.
  • If any account needs the most rigorous password control it is your email account. Nearly every online system ties back to an email account. If you need to reset a password, it normally goes to your email address. If that is compromised then that is really the opening of Pandora’s box.

Alright, let’s start with the weakest ‘safe’ approach – Information entropy:

  • This strength calculation only holds true for ‘random’ passwords. No human (at least that I know) can really generate a random password on their own. The best approach that I’m aware of is to start up Notepad and get your two year old to start smacking your keyboard, then take this text, randomly change the case of characters and insert special characters. Unfortunately this is still weak, because we have two hands and the keyboard is naturally divided into where your hands go, so the result is not as randomly distributed as people would think – nor would I recommend it! At least you have a starting point, but then you have to write it down!
  • [0-9] – 10 possible symbols per character – 3.32 bits of base2 log entropy
  • [a-z] – 26 possible symbols per character- 4.7 bits of base2 log entropy
  • [A-Z] – 26 possible symbols per character- 4.7 bits of base2 log entropy
  • [A-Z, 0-9] – 36 possible symbols per character- 5.17 bits of base2 log entropy
  • [A-Z,a-z] – 52 possible symbols per character- 5.7 bits of base2 log entropy
  • [A-Z, a-z, 0-9] – 62 possible symbols per character- 5.95 bits of base2 log entropy
  • [A-Z, a-z, 0-9, Special] – 94 possible symbols per character – 6.55 bits of base2 log entropy
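
The figures in the list are just the base 2 logarithm of the number of possible symbols, and the total for a password is that figure multiplied by its length. As a rough sketch (the EntropyEstimate class and its method names are mine, purely for illustration), the arithmetic looks like this:

```csharp
using System;

public static class EntropyEstimate
{
   // Bits of entropy carried by one character drawn uniformly
   // at random from an alphabet of alphabetSize symbols.
   public static double BitsPerSymbol(int alphabetSize)
   {
      if (alphabetSize < 1)
         throw new ArgumentOutOfRangeException("alphabetSize");
      return Math.Log(alphabetSize, 2);
   }

   // Total bits for a password of the given length. Only meaningful
   // if every character really was chosen at random.
   public static double TotalBits(int length, int alphabetSize)
   {
      if (length < 0)
         throw new ArgumentOutOfRangeException("length");
      return length * BitsPerSymbol(alphabetSize);
   }
}
```

EntropyEstimate.BitsPerSymbol(94) works out to roughly 6.55, matching the last row of the list.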

So we can see that a strong password built from completely random information is hard to generate on our own, yet this approach is what is most commonly used in web applications to determine password strength. It is not strong enough, because humans are naturally not random. Using this theory, the following non-random passwords generate results that imply the passwords are strong:

  • 12345678901234567890 – 20*3.32 => 66.4 bits of entropy
  • !!!!!!!!!!!!!!!!!!!! – 20 * 6.55 => 131.0 bits of entropy
  • !@#$%^&*() – 10 * 6.55 => 65.5 bits of entropy
  • qwertyuiop[]qwertyuiop[] – 24 * 6.55 => 157.2 bits of entropy

The more astute among us will see that the last two passwords were generated by running a finger along a row of a US keyboard. Entering the 24 character password took under 3 seconds. So if anyone saw someone entering a password like this at work or in a library, it’s pretty easy to duplicate. Plainly, human users are going to opt for the easiest way to remember and enter a password – this will never be random!

So to help keep our users from becoming victims we have to try to take the ‘easy’ route away from them. We have to assume the password is not going to be mathematically random, so we need to start from a different position. We have to remove the human weaknesses that the ‘black hats’ are looking to exploit.

So going back to the beginning of the article, we are going to create an interface to define a ‘password policy’ that gives us a way to help enforce stronger passwords, or at least allows systems to set up a common language for handling passwords.

   /// <summary>
   /// Interface for defining a password policy
   /// </summary>
   /// <remarks>
   /// This security policy determines whether passwords
   /// meet pre-determined complexity requirements.
   ///
   /// If this policy is enabled, passwords must meet the
   /// following minimum requirements:
   ///
   /// Not contain the user's account name or parts of the
   /// user's full name that exceed four consecutive
   /// characters.
   /// Be at least <see cref="MinimumPasswordLength"/>
   /// characters in length
   /// Contain characters from three of the following
   /// four categories:
   /// English uppercase characters (A through Z)
   /// English lowercase characters (a through z)
   /// Base 10 digits (0 through 9)
   /// Non-alphabetic characters (for example, !, $, #, %)
   ///
   /// Complexity requirements are enforced when passwords
   /// are changed or created.
   /// </remarks>
   public interface IPasswordPolicy : IPolicy
   {
      /// <summary>
      /// Indicates the minimum password strength index for
      /// this policy (see PasswordStrengthIndex)
      /// </summary>
      /// <remarks>
      /// This value is based on a calculation of
      /// information entropy after sequences
      /// and dictionary words have been
      /// removed.
      /// </remarks>
      /// <value>
      /// The minimum index of the password strength.
      /// </value>
      PasswordStrengthIndex MinimumPasswordStrengthIndex
      {
         get;
         set;
      }

      /// <summary>
      /// Gets or sets the minimum length of the password.
      /// </summary>
      /// <value>The minimum length of the password.</value>
      int MinimumPasswordLength
      {
         get;
         set;
      }

      /// <summary>
      /// Gets or sets the maximum length of the password.
      /// </summary>
      /// <value>The maximum length of the password.</value>
      int MaximumPasswordLength
      {
         get;
         set;
      }

      /// <summary>
      /// If policy requires mixed case
      /// </summary>
      /// <value>true if policy needs mixed case</value>
      bool RequireMixedCase
      {
         get;
         set;
      }

      /// <summary>
      /// If policy needs digits
      /// </summary>
      /// <value>true if policy needs digits.</value>
      bool RequireDigits
      {
         get;
         set;
      }

      /// <summary>
      /// If policy needs special characters
      /// </summary>
      /// <value>
      /// true if special characters are required
      /// </value>
      bool RequireSpecialCharacters
      {
         get;
         set;
      }

      /// <summary>
      /// Indicates if the username needs to be additionally
      /// supplied to verify the password complexity against
      /// </summary>
      /// <value>
      /// true if the username is required to check the password against
      /// </value>
      bool RequireUsernameToCheckPasswordAgainst
      {
         get;
         set;
      }

      /// <summary>
      /// Gets or sets the maximum count of characters
      /// in a sequence
      /// </summary>
      /// <value>The maximum count of characters
      /// in a sequence.</value>
      int MaximumCharacterSequenceCount
      {
         get;
         set;
      }

      /// <summary>
      /// The duration of the lockout in minutes.
      /// </summary>
      /// <remarks>
      /// This security setting determines the number of
      /// minutes a locked-out account remains locked
      /// out before automatically becoming unlocked.
      /// The available range is from 0 minutes through
      /// 99,999 minutes.
      /// If you set the account lockout duration to less
      /// than zero, the account will be locked out until an
      /// administrator explicitly unlocks it. If an account
      /// lockout threshold is defined, the account lockout
      /// duration must be greater than or equal to
      /// the reset time.
      /// </remarks>
      /// <value>The duration of the lockout.</value>
      int LockoutDuration
      {
         get;
         set;
      }

      /// <summary>
      /// Gets or sets the lockout threshold.
      /// </summary>
      /// <remarks>
      /// This security setting determines the number of
      /// failed logon attempts that causes a user
      /// account to be locked out. A locked-out
      /// account cannot be used until it is reset
      /// by an administrator or until the
      /// lockout duration for the account has expired. You
      /// can set a value between 0 and 999 failed
      /// logon attempts. If you set the value to 0,
      /// the account will never be locked out.
      /// </remarks>
      /// <value>The lockout threshold.</value>
      int LockoutThreshold
      {
         get;
         set;
      }

      /// <summary>
      /// Reset account lockout after X minutes
      /// </summary>
      /// <remarks>
      /// This security setting determines the number of
      /// minutes that must elapse after a failed logon
      /// attempt before the failed logon attempt
      /// counter is reset to 0 bad logon attempts.
      /// The available range is 1 minute to
      /// 99,999 minutes.
      /// </remarks>
      /// <value>Minutes before the failed logon counter resets.</value>
      int LockoutResetInMinutes
      {
         get;
         set;
      }
   }

You can see this password policy template extends the initial outline to not only provide guidance for the number of entropy bits, but allows for the policy to cover the lock out strategy in the case of incorrect password handling and password expiry approaches. If you look at the source code you will also see the options that are available, but for the sake of this article we are trying to keep on point :-)
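
To make one of the rules in the interface remarks concrete, the “three of the following four categories” check could be sketched as below. This is only my illustration of that one rule, with made-up names (CharacterCategories, CountCategories, MeetsThreeOfFour), and it assumes that anything which is not an English letter or a digit counts as non-alphabetic:

```csharp
using System;

public static class CharacterCategories
{
   // Counts how many of the four categories from the interface
   // remarks appear at least once in the password.
   public static int CountCategories(string password)
   {
      if (password == null)
         throw new ArgumentNullException("password");

      bool upper = false, lower = false, digit = false, other = false;
      foreach (char c in password)
      {
         if (c >= 'A' && c <= 'Z') upper = true;
         else if (c >= 'a' && c <= 'z') lower = true;
         else if (c >= '0' && c <= '9') digit = true;
         else other = true; // assumption: everything else is "non-alphabetic"
      }

      int count = 0;
      if (upper) count++;
      if (lower) count++;
      if (digit) count++;
      if (other) count++;
      return count;
   }

   // Hypothetical rule check: at least three of the four categories.
   public static bool MeetsThreeOfFour(string password)
   {
      return CountCategories(password) >= 3;
   }
}
```

Note that P@ssw0rd sails through this check with all four categories, which is exactly why the category rule on its own is not enough.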

So on to the actual strength testing, which oddly is rather simple at the end of the day. We are going to use an interface definition (IPassword) for the password processor (it makes testing & mocking easier) so we can actually have multiple implementations (think MEF!). Now the actual implementation:

  1. Check for sequences using various lookup tables. If a sequence is detected and it is longer than the policy allows, the password fails the policy. The tables include:
    • Alphabetic + numeric sequence
    • QWERTY US Keyboard
    • QWERTY UK Keyboard
    • AZERTY Keyboard
  2. Perform a simple DecodeEliteEncoding pass, then a simple hardcoded dictionary match against well known, super common passwords
  3. If supplied (and if required) compare password elements to the user name

The end implementation is still fairly simple and it would be fairly easy to improve on. The most obvious improvements are to support a custom dictionary and add more keyboard sequences. Other extensions would be to store the passwords and become a real password token service. We can leave it up to the reader to provide an implementation of IPassword that calls the Google password rating service rather than the above implementation:

https://www.google.com/accounts/RatePassword?Passwd=csharphacker

All good stuff! I hope this (and the source code) helps people provide a better approach to strengthening passwords.

[Download source code here]

The linked source code is liable to change over time so check back often. The source code uses the Microsoft testing framework and currently has 100% code coverage! Although I don’t think 100% is all that people think it is.

Finally the goal is to make everything a more secure place – and in reality the best approach is to use a strong memorable password in conjunction with a hardware token that changes every minute.

As always feedback is welcome!

Gareth

We all wanted a reason to install Win 7 – right?

Tuesday, August 4th, 2009

Ok I admit I’ve been watching the clock tick closer to August 6th :-) , but if you work with virtual machines (and you know who you are!) and have ever been frustrated by the speed, or by it just not feeling normal, then check out the boot to VHD feature!

Seriously cool – I mean seriously! Native booting to a VHD on your disk. I want to buy a beer for whoever thought of that and got management to agree to put it into Windows 7 (think HyperVisor)!

I LOVE IT!!

GZipStream is helpful, but has some missing features

Monday, July 27th, 2009

I recently had to work around a problem in a particularly ugly way (which I won’t detail :-) ), so after that painful experience I opted to create a class to solve my specific issue in a sane and reusable manner! Out of this unexpected need the class “GZipHelper” was born. This is really just a wrapper around the base .Net System.IO.Compression.GZipStream. It was kind of a sad day, as I really didn’t want to be writing this type of wrapper code; I was hoping it would just be natively available in the existing GZipStream class so I could get on with solving the real business problem at hand.

Firstly it should be said that the standard GZipStream provides the functionality I’m sure the MS engineers expected of it, which was HTTP based compression (at least I think that was its expected purpose). However, it is certainly not a fully featured class that is easy to use for programmers looking for quick & helpful access to GZip compression.

Specifically, the problem I needed to solve was finding out how big any given “.GZ” file would be when decompressed, without fully reading and decompressing the file. It seemed trivial enough, since “gzip.exe -l” does what I needed, but no amount of hunting within MSDN helped. So on to the ever handy GZip Wikipedia entry, which detailed enough of the file format and provided the reference to the “GZIP file format specification version 4.3”.

So armed with this information we can start to decode the GZip file format to extract the length. In fact this class will check the file to see if it is GZip compressed, and return the decompressed length if it is or the regular file length if it is not.
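
The trick relies on two facts from the format specification: a GZip member starts with the bytes 0x1f 0x8b, and its last four bytes (ISIZE) hold the uncompressed size, least significant byte first, modulo 2^32. A minimal sketch of just that lookup, using a hypothetical GZipLength class rather than the actual GZipHelper code, and assuming a file with a single GZip member under 4GB uncompressed:

```csharp
using System;
using System.IO;

public static class GZipLength
{
   // Returns the decompressed length recorded in a .gz file, or the
   // plain file length when the GZip magic bytes are missing.
   public static long GetDecompressedLength(string filename)
   {
      using (FileStream stream = File.OpenRead(filename))
      {
         // A GZip member has a 10 byte header and an 8 byte trailer,
         // and starts with ID1 = 0x1f, ID2 = 0x8b.
         if (stream.Length < 18 ||
             stream.ReadByte() != 0x1f ||
             stream.ReadByte() != 0x8b)
         {
            return stream.Length;
         }

         // ISIZE is stored in the last four bytes of the file.
         stream.Seek(-4, SeekOrigin.End);
         byte[] isize = new byte[4];
         int read = 0;
         while (read < 4)
         {
            int n = stream.Read(isize, read, 4 - read);
            if (n == 0) throw new EndOfStreamException();
            read += n;
         }

         // Assemble the little-endian value by hand.
         uint size = (uint)isize[0]
                   | ((uint)isize[1] << 8)
                   | ((uint)isize[2] << 16)
                   | ((uint)isize[3] << 24);
         return size;
      }
   }
}
```

Because only the first two and last four bytes are read, the cost is the same however large the compressed file is.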

The following class functions have been implemented (see the bottom of the article for the link to the full project):

   /// <summary>
   /// Utility class to help with managing GZip (.gz) files in .Net
   /// </summary>
   /// <remarks>
   /// This is a trivial wrapper class on top of <see cref="GZipStream"/> that does a little magic
   /// under the covers by looking at the underlying data format and retrieves the
   /// stored data information within the GZip compressed file.
   /// </remarks>
   public class GZipHelper
   {
      /// <summary>
      /// Gets the compressed file details
      /// </summary>
      /// <param name="filename">The filename.</param>
      /// <returns>True if file exists, else false</returns>
      public bool GetFileDetails(string filename);

      /// <summary>
      /// Gets the compressed file information from a file stream
      /// </summary>
      /// <param name="fileStream">The file stream.</param>
      /// <remarks>
      /// Definitions provided by RFC 1952 -GZIP File Format Specification (May 1996).
      /// Coding was performed against ftp://ftp.isi.edu/in-notes/rfc1952.txt
      /// </remarks>
      public void GetFileInformation(FileStream fileStream);

      /// <summary>
      /// Compresses the file.
      /// </summary>
      /// <param name="filename">The filename.</param>
      /// <param name="overWriteExisting">if set to <c>true</c> [over write existing].</param>
      /// <returns></returns>
      public void CompressFile(string filename, bool overWriteExisting);

      /// <summary>
      /// Decompresses the file.
      /// </summary>
      /// <param name="filename">The filename.</param>
      /// <param name="overWriteExisting">if set to <c>true</c> [over write existing].</param>
      /// <returns></returns>
      public bool DecompressFile(string filename, bool overWriteExisting);

      /// <summary>
      /// Returns a seekable stream into either a file or compressed file (defaults read-only)
      /// </summary>
      /// <remarks>
      /// Decompresses the stream into a <see cref="MemoryStream"/> if the file is compressed
      /// otherwise just returns back a regular <see cref="FileStream"/> as a <see cref="Stream"/>
      /// </remarks>
      /// <param name="filename">The filename to open.</param>
      /// <returns>Reference to opened stream</returns>
      public Stream GetSeekableStream(string filename);
   }

In addition to these methods the following properties are available:

  • CompressedLength – Size of the compressed file (or regular file size if not compressed)
  • DecompressedLength – Size of the file if it were uncompressed (or regular file size if not compressed)
  • IsTextFile – Indicates if GZip thought the file was text based, potentially leading to better compression
  • CompressionModeValue – Numeric indication of the compression mode used
  • CRC16Present – Indicates a CRC16 is available for the file
  • ExtraFieldsPresent – Additional meta fields are available in the file
  • FileNamePresent – GZip contains the original file name
  • FileCommentPresent – Compressed file has a comment associated with it
  • IsCompressed – Indicates if the file is GZip compressed or not
  • CompressedDate – If stored this is the date the file was compressed.
  • CRC32 – CRC32 value associated with the file
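
Most of these properties come straight out of the fixed 10 byte header described in the format specification (ID1, ID2, CM, FLG, a 4 byte MTIME, XFL and OS). The following is a rough sketch of that decoding under my own naming; the GZipHeaderSketch class is hypothetical and is not the GZipHelper source:

```csharp
using System;

public class GZipHeaderSketch
{
   public bool IsCompressed;
   public int CompressionModeValue;
   public bool IsTextFile;
   public bool CRC16Present;
   public bool ExtraFieldsPresent;
   public bool FileNamePresent;
   public bool FileCommentPresent;
   public DateTime? CompressedDate;

   // Parses the fixed header: ID1 ID2 CM FLG MTIME(4) XFL OS.
   public static GZipHeaderSketch Parse(byte[] header)
   {
      GZipHeaderSketch info = new GZipHeaderSketch();
      if (header == null || header.Length < 10 ||
          header[0] != 0x1f || header[1] != 0x8b)
      {
         return info; // not GZip: IsCompressed stays false
      }

      info.IsCompressed = true;
      info.CompressionModeValue = header[2]; // 8 means deflate

      byte flags = header[3];
      info.IsTextFile = (flags & 0x01) != 0;         // FTEXT
      info.CRC16Present = (flags & 0x02) != 0;       // FHCRC
      info.ExtraFieldsPresent = (flags & 0x04) != 0; // FEXTRA
      info.FileNamePresent = (flags & 0x08) != 0;    // FNAME
      info.FileCommentPresent = (flags & 0x10) != 0; // FCOMMENT

      // MTIME: seconds since 1970-01-01 UTC, least significant
      // byte first. Zero means no timestamp was stored.
      uint mtime = (uint)header[4]
                 | ((uint)header[5] << 8)
                 | ((uint)header[6] << 16)
                 | ((uint)header[7] << 24);
      if (mtime != 0)
      {
         DateTime epoch = new DateTime(1970, 1, 1, 0, 0, 0, DateTimeKind.Utc);
         info.CompressedDate = epoch.AddSeconds(mtime);
      }
      return info;
   }
}
```

The CRC32 and the decompressed length are not in the header at all; they live in the 8 byte trailer at the end of the file.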

Along with the project there are MSTest harnesses to test the class (trivial implementations). So the features of the class are:

  • Can trivially determine a true file size (regardless if it was compressed via GZip or is uncompressed). This makes your code path much more readable if you are dealing with mixed file types.
  • Provides a seekable stream into the compressed file via a MemoryStream. The key is that you don’t need to worry about the compression (unless you are reading in BIG files), as you will get back a Stream for either a regular file or a compressed file, and both support seeking. This can be handy if your program assumes it can Seek in the stream and you need to access GZip files!
  • Trivial Decompress file, this also honors the CompressedDate. If that date is set then the decompressed file has that creation date.
  • Trivial Compress file. Unfortunately at the time of writing I’ve not updated the header to include the date of the compressed file. This may come in a later version (and if so I’ll update the blog :-) – but definitely no promises!).

Simple example usages are (taken straight from the unit tests!):


// Perform a file compression
GZipHelper actual = new GZipHelper();
actual.CompressFile(_fileName, true);

// Perform a file decompression
GZipHelper actual = new GZipHelper();
string fileName = "CSharpHackerSmallTest.txt.gz";
actual.DecompressFile(fileName, true);

// Get a seekable stream
GZipHelper actual = new GZipHelper();
using (Stream dataStream = actual.GetSeekableStream("CSharpHackerSmallTest.txt.gz"))
{
    // Silly seek - but it just shows it can be done
    dataStream.Seek(0, SeekOrigin.Begin);
    StreamReader sr = new StreamReader(dataStream);
    string contents = sr.ReadToEnd();

    Assert.AreEqual(119, contents.Length);
}

// Gets natural decompressed file length from a compressed file.
GZipHelper actual = new GZipHelper();
actual.GetFileDetails("CSharpHackerSmallTest.txt.gz");
Assert.AreEqual(119, actual.DecompressedLength);
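
For those curious how a seekable stream can come out of a forward-only GZipStream, the idea is simply to inflate everything into a MemoryStream first. Here is a minimal sketch of that idea; SeekableGZip.OpenSeekable is a hypothetical stand-in and not the GetSeekableStream source, and it assumes the file really is GZip data and fits in memory:

```csharp
using System.IO;
using System.IO.Compression;

public static class SeekableGZip
{
   // Inflates a GZip file fully into memory and returns a stream
   // that supports Seek, positioned at the start of the data.
   public static Stream OpenSeekable(string filename)
   {
      MemoryStream buffer = new MemoryStream();
      using (FileStream file = File.OpenRead(filename))
      using (GZipStream gzip = new GZipStream(file, CompressionMode.Decompress))
      {
         byte[] chunk = new byte[4096];
         int read;
         while ((read = gzip.Read(chunk, 0, chunk.Length)) > 0)
         {
            buffer.Write(chunk, 0, read);
         }
      }
      buffer.Position = 0;
      return buffer;
   }
}
```

The trade-off is memory: the whole decompressed file is held at once, which is why this is only comfortable for files that are not BIG.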

Finally it should be noted that by all accounts the standard implementation of GZipStream in the base .Net libraries (actually the DeflateStream) has a problem when attempting to compress random or already compressed data. There is a Microsoft Connect article [http://connect.microsoft.com/VisualStudio/feedback/ViewFeedback.aspx?FeedbackID=93930] that details the issue. To quote it:

The GZipStream and DeflateStream classes can _significantly_ increase the size of “compressed” data. That means, they don’t just add a few header bytes as stand-alone compressors do, but they _inflate_ the data by as much as 50%. This is apparently because these classes do not check for incompressible data which is a standard feature of all stand-alone compressors. Both classes work fine when the data actually can be compressed.

Please refer to this thread for more details:

http://forums.microsoft.com/MSDN/ShowPost.aspx?PostID=179704&SiteID=1

The base implementation worked for me and met my specific needs without bringing in any third party DLLs, which incidentally also has the nice benefit, for those looking to bring this into proprietary software, of avoiding any licensing discussions with supervisors! If you want a more robust GZipStream implementation you can check out http://dotnetzip.codeplex.com/. This apparently has a drop in replacement, but this class could still be useful even if you use that drop in replacement as well.

I hope this helps someone out there :-)

[Download GZipHelper (Source + Project) Here]

This download link will always have the latest and greatest version.

Gareth