Will deep understanding still be valuable?

This morning GitHub made a big announcement: Introducing GitHub Copilot: your AI pair programmer. Everybody's talking about it. And for good reason -- it looks really cool.

But my own reactions are mixed. I admire the accomplishment, and I am eager to try it, but I am also troubled by the apparent trend.

This blog entry is my attempt to write about that. As I begin, I hope the paragraphs below favor questions over judgments. I hope to express my feelings and perspectives without being critical of others. I hope to write something that is not just another "get off my lawn" rant. Let's see if I succeed.

In short: It looks to me like AI-assisted software development is just getting started, and will probably become a very big thing. And, it feels to me like yet another step toward shallow understanding in our field.

Hmmm. That last sentence looks kinda harsh. I wonder if I've already crossed the line I didn't want to cross.

Let me try to say this another way.

In my nearly 4 decades of writing code, I have consistently found that the most valuable thing is to know how things work. Nothing in software development is more effective than the ability to see deeper. To borrow Joel Spolsky's terminology, I claim that almost all abstractions are leakier than you think, so it is valuable to see through them.

  • Knowing how to allocate and free memory is one thing. Much better is to have an understanding of how the memory allocator works. That is the kind of depth that gives me what I need to make decisions.

  • Writing SQL statements is fairly easy. But there is a lot going on under the hood. The whole experience goes so much better if I understand indexes and table scans and lock escalation.

  • Networking code? Don't even get me started.

I am utterly convinced that deep understanding is important.

But increasingly, I feel like I'm swimming upstream. It seems like most people in our industry care far more about "how to do" rather than "how does it work".

And yes, there are good reasons for this. People have jobs. An employer's expectations are typically about getting things done.

So I am not saying it is unimportant to know how to do things. Rather, what I'm saying is that after I understand how things work, seeing how to do something is usually trivial. And the next time I need to figure out "how to do", it will go faster.

But the world and I seem to be at odds about this. I feel like I crash into this conflict every. single. day.

  • A lot of programming documentation is structured around the steps to follow to perform a certain job, but that is almost never what I'm looking for.

  • On question/answer sites, it seems like every interesting question yields responses from people saying "You shouldn't do that", which seems unhelpful.

  • After years of seeing people say "Don't write your own crypto", I made peace with it, but now I see "Don't write your own X" for all kinds of X. I mean, if we want X in the world, somebody has to write it, right?

  • Last week I was trying to confirm that the CIL castclass instruction returns the same object reference it was given (when it doesn't throw). Previous people who asked this question were mostly told some form of "you shouldn't need to know that". But surely somebody on this planet needs to know that?

  • Most people facing a software problem will move forward as soon as they are unblocked, without stopping to learn why the given solution worked.

I feel like I'm drowning in a sea of people saying "I did (whatever) and the problem went away".

Most online interaction I have with other software developers results in me feeling different and alone.

But I remain "utterly convinced". I have bet my career on the importance of depth, and I will continue to do so. And, when given the opportunity to guide and mentor younger people, I steer them along the same path.

And yet, as the trends in the industry seem to move away from me, I am forced to wonder about how well my own experience will map onto a very different future. Am I giving newbies bad advice when I suggest (for example) that they learn what's really going on with async/await?

Sometimes I worry that my posture is a form of gatekeeping. I don't want to be someone who insists that everybody's path needs to mirror mine. If you are having an enjoyable and successful software career without studying stuff like two's-complement arithmetic and B-trees, I am happy for you. But do I believe, broadly speaking, that your career would take another positive step every time you learn more? Yeah, I do.

Simply put, I see two basic possibilities here. Either I am correct, and technical depth is still important even as fewer people value it, or I am a dinosaur, and nature has selected my kind for extinction.

So... GitHub Copilot looks like fun. But the day is coming soon when I'm going to see somebody respond to a coding question with "Why are you asking this? Just use the AI pair programmer." And I'm probably going to throw a tantrum.


Introducing open source Windows 10 PowerToys

Yesterday the Windows Team announced the first preview and code release of PowerToys for Windows 10. This first preview includes two utilities: the Windows key shortcut guide and Fancy Zones.

Many years ago there was PowerToys for Windows 95 and frankly, it's overdue that we have them for Windows 10 – and bonus points for being open source!

These tools are also open source and hosted on GitHub! Maybe you have an open source project that's a "PowerToy?" Let me know in the comments. A great example of a PowerToy is something that takes a Windows feature and turns it up to 11!

EarTrumpet is a favorite example of mine of a community "PowerToy." It takes the volume control and the Windows audio subsystem and tailors it for the pro/advanced user. You should definitely try it out!

As for these new Windows 10 Power Toys, here’s what the Windows key shortcut guide looks like:

PowerToys - Shortcut Guide

And here's Fancy Zones. It's very sophisticated. Be sure to watch the YouTube video to see how to use it.

Fancy Zones

To kick the tires on the first two utilities, download the installer here.

The main PowerToys service runs when Windows starts and a user logs in. When the service is running, a PowerToys icon appears in the system tray. Selecting the icon launches the PowerToys settings UI. The settings UI lets you enable and disable individual utilities and provides settings for each utility. There is also a link to the help doc for each utility. You can right click the tray icon to quit the Power Toys service.

We'd love to see YOU make a PowerToy and maybe it'll get bundled with the PowerToys installer!

How to create new PowerToys

See the instructions on how to install the PowerToys Module project template.
Specifications for the PowerToys settings API.

We ask that before you start work on a feature that you would like to contribute, please read our Contributor's Guide. We will be happy to work with you to figure out the best approach, provide guidance and mentorship throughout feature development, and help avoid any wasted or duplicate effort.

Additional utilities in the pipeline are:

If you find bugs or have suggestions, please open an issue in the Power Toys GitHub repo.



© 2019 Scott Hanselman. All rights reserved.
     

Announcing WPF, WinForms, and WinUI are going Open Source

Buckle up friends! Microsoft is open sourcing WPF, Windows Forms (winforms), and WinUI, so the three major Windows UX technologies are going open source! All this is happening on the same day as .NET Core 3.0 Preview 1 is announced. Madness! ;)

.NET Core 3 is a major update which adds support for building Windows desktop applications using Windows Presentation Foundation (WPF), Windows Forms, and Entity Framework 6 (EF6). Note that .NET Core 3 continues to be open source and runs on Windows, Linux, Mac, in containers, and in the cloud. In the case of WPF/WinForms/etc you'll be able to create apps for Windows that include (if you like) their own copy of the .NET Framework for a clean side-by-side install and even faster apps at run time. The Windows UI XAML Library (WinUI) is also being open sourced AND you can use these controls in any Windows UI framework.

That means your (or my!) WPF/WinForms/WinUI apps can all use the same controls if you like, using XAML Islands. I could take the now 10 year old BabySmash WPF app and add support for pens, improved touch, or whatever makes me happy!

WPF and Windows Forms projects are run under the .NET Foundation which also announced changes today and the community will guide foundation operations. The .NET Foundation is also changing its governance model by increasing the number of board members to 7, with just 1 appointed by Microsoft. The other board members will be voted on by the community! Anyone who has contributed to a .NET Foundation project can run, similar to how the Gnome Foundation works! Learn more about the .NET Foundation here.

On the runtime and versioning side, here's a really important point from the .NET blog that's worth emphasizing IMHO:

Know that if you have existing .NET Framework apps that there is not pressure to port them to .NET Core. We will be adding features to .NET Framework 4.8 to support new desktop scenarios. While we do recommend that new desktop apps should consider targeting .NET Core, the .NET Framework will keep the high compatibility bar and will provide support for your apps for a very long time to come.

I think of it this way. The .NET Framework on Windows is slowing down. It's not going anywhere but it's rock solid and stable. But if you've got a desktop app and you want the latest and greatest features, new controls, new C#, etc language features, then use .NET Core as it's going to be updated more often. The new hotness is going to come in .NET Core first and that's the fresh tech train going forward, but .NET Framework 4.8 is still a fundamental part of Windows itself. Target the .NET that is right for your project and/or business.

I don't want to hear any of this "this is dead, only use that" nonsense. We just open sourced WinForms and have already taken Pull Requests. WinForms has been updated for 4k+ displays! WPF is open source, y'all! Think about the .NET Standard and how you can run standard libraries on .NET Framework, .NET Core, and Mono - or any ".NET" that's out there. Mono is enabling running .NET Standard libraries via WebAssembly. To be clear - your browser is now .NET Standard capable! There are open source projects like https://platform.uno/ and Avalonia and Ooui taking .NET in new and interesting places. Blazor makes Web UIs in .NET with (preview/experimental) client support with Web Assembly and server support included in .NET 3.0 with Razor Components. Only good things are coming, my friends!

.NET ALL THE THINGS

.NET Core runs on Raspberry Pi and ARM processors! .NET Core supports serial ports, IoT devices, and there's even a System.Device.GPIO (General Purpose I/O) package! Go explore https://github.com/dotnet/iot to really get your head around how much cool stuff is happening in the .NET space.

I want to encourage you to go check out Matt Warren's extremely well-researched post "Open Source .NET - 4 years later" to get a real visceral sense of how far we've come as a community. You'll be amazed!

Now, go play!

Enjoy.



© 2018 Scott Hanselman. All rights reserved.
     

Which Test Cases Should I Automate?

When someone asks “I have a big suite of manual tests; which tests (or worse, which test cases) should I automate?”, I often worry about several things.

The first thing is that focusing on test cases is often a pretty lousy way to think about testing.  That’s because test cases are often cast in terms of following an explicit procedure in order to observe a specific result.  At best, this confirms that the product can work if someone follows that procedure, and it also assumes that any result unobserved in the course of that procedure and after it is unimportant.

The trouble is that there are potentially infinite variations on the procedure, and many factors might make a difference in a test or in its outcome. Will people use the product in only one way? Will this specific data expose a problem?  Might other data expose a problem that this data does not?  Will a bug appear every time we follow this procedure? The test case often actively suppresses discovery. Test cases are not testing, and bugs don’t follow the test cases.

Second: testing is neither manual nor automated. A test cannot be automated.  Elements of the procedure within the test (in particular, checks for specific facts) can be automated.  But your test is not just the machine performing virtual key presses or comparing an output to some reference.

Your test is a process of activity and reasoning:  analyzing risk; designing an experiment; performing the experiment; observing what happens before, during, and after the experiment; interpreting the results; and preparing a relevant report.  This depends on your human intentions, your mindset, and your skill set.  Tools can help every activity along the way, but the test is something you do, not something the machine does.

Third:  lots of existing test cases are shallow, pointless, out of date, ponderous, inefficient, cryptic, and unmotivated by risk.  Often they are focused on the user interface, a level of the product that is often quite unfriendly to tools. Because the test cases exist, they are often pointlessly repeated, long after they have lost any power to find a bug.  Why execute pointless test cases more quickly?

It might be a much better idea to create new automated checks that are focused on specific factors of the product, especially at low levels, with the goal of providing quick feedback to the developer.  It might be a good idea to prepare those checks as a collaboration between the developer and the tester (or between two developers, one of whom is in the builder’s role, with the other taking on a testing role). It might be a good idea to develop those checks as part of the process of developing some code. And it might be a really good idea to think about tools in a way that goes far beyond faster execution of a test script.

So I encourage people to reframe the question.  Instead of thinking “what (existing) test cases should I automate?”, try thinking:

  • What reason do we have for preserving these test cases at all? If we’re going to use tools effectively, why not design checks with tools in mind from the outset?  What tool-assisted experiments could we design to help us learn about the product and discover problems in it?
  • What parts of a given experiment could tools accelerate, extend, enhance, enable, or intensify?
  • What do we want to cover? What product factors could we check? (A product factor is something that can be examined during a test, or that might influence the outcome of a test.)
  • To what product factors (like data, or sequences, or platforms,…) could we apply tools to help us to induce variation into our testing?
  • How could tools help us to generate representative, valid data and exceptional or pathological data?
  • In experiments that we might perform on the product, how might tools help us make observations and recognize facts that might otherwise escape our notice?  How can tools make invisible things visible?
  • How can tools help us to generate lots of outcomes that we can analyze to find interesting patterns in the data?  How can tools help us to visualize those outcomes?
  • How could the tool help us to stress out the product; overwhelm it; perturb it; deprive it of things that it needs?
  • How can developers make various parts of the system more amenable to being operated, observed, and evaluated with the help of tools?
  • How might tools induce blindness to problems that we might be able to see without them?
  • What experiments could we perform on the product that we could not perform at all without help from tools?
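
To make a couple of these questions concrete, here is a minimal sketch of a tool generating both representative and pathological data and then checking a few product factors on every input. Everything in it is an assumption made for illustration: `normalize_whitespace` is a hypothetical function under test, and the list of awkward inputs is just a starting point, not a complete risk analysis.

```python
import random

def normalize_whitespace(text: str) -> str:
    """Hypothetical function under test: collapse runs of whitespace."""
    return " ".join(text.split())

def generate_inputs(seed: int, count: int) -> list[str]:
    """Mix exceptional or pathological data with randomized data."""
    rng = random.Random(seed)  # seeded so any failure can be reproduced
    pathological = ["", " ", "\t\n", "a" * 10_000, " lead", "trail ", "a\u00a0b"]
    alphabet = "ab \t\n"
    randomized = [
        "".join(rng.choice(alphabet) for _ in range(rng.randint(0, 30)))
        for _ in range(count)
    ]
    return pathological + randomized

def check(text: str) -> list[str]:
    """Return the expectations that one input violates (empty if none)."""
    out = normalize_whitespace(text)
    problems = []
    if out != out.strip():
        problems.append("leading or trailing whitespace survived")
    if "  " in out or "\t" in out or "\n" in out:
        problems.append("whitespace run was not collapsed")
    if normalize_whitespace(out) != out:
        problems.append("not idempotent")
    return problems

# Map each failing input to its problems; a human still decides what matters.
failures = {t: check(t) for t in generate_inputs(seed=1, count=200) if check(t)}
```

The checks here only report facts. Deciding which inputs are worth generating, and whether a reported violation is actually a problem, remains the tester's job.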

Remember:  if your approach to testing is responsible, clever, efficient, and focused on discovering problems that matter, then tools will help you to find problems that matter in an efficient, clever, and responsible way.  If you try to “automate” bad testing, you’ll find yourself doing bad testing faster and worse than you’ve ever done it before.

More reading:
A Context-Driven Approach to Automation in Testing
Manual and Automated Testing
The End of Manual Testing

Learn about Rapid Software Testing

Sign up for the RST class, taught by both James Bach and Michael Bolton, in Seattle September 26-28, 2018.  It’s not just for testers!  Anyone who wants to get better at testing is welcome and encouraged to attend—developers, managers, designers, ops people, tech support folk…


Notes on Unit Testing and Other Things

People often ask me what constitutes a good amount of unit testing. I think the answer is usually a very high level of coverage but it’s fair to say that there are smart people that disagree. The truth is probably that there is not one answer to this question. Notwithstanding this, I have been able to give some pretty consistent guidance with regard to unit testing which I’d like to share. And not surprisingly it doesn’t prescribe a specific level of testing.

No coverage is never acceptable

While it’s true that some things are harder to test than others I think it’s fair to say that 0% coverage is universally wrong. So the minimum thing I can recommend is everything have at least some coverage. This helps in a number of ways not the least of which is that each subsequent test is much easier to add.

Corollary: “My code can’t be unit tested” is never acceptable

While you may choose to test more lightly (see below) it is always possible to unit test code; there are no exceptions. No matter how crazy the context, it can be mocked. No matter how entangled the global dependencies, they can be faked. The greater the mess, the more valuable it will be to disentangle the problems to create something testable. Refactoring code to enhance its testability is inherently valuable and getting the tests to “somewhere good” is a great way to quantify the value of refactoring that might otherwise go unnoticed.

Unit testing drives good developer behavior overall.

Unit Tests Cross Check Complex Intent

The very best you can do with a unit test, in fact the only thing you can do, is verify that the code does what you intended it to do. This is not the same as verifying correctness but it goes a long way.

  • you create suitable mocks to create any simulated situation you need
  • you observe the side-effects on your mocks to validate that your code is taking the correct actions
  • you observe the publicly visible state of your system to ensure that it is moving correctly from one valid state to the next valid state
  • you add test-only methods as needed (if needed) to expose properties that need testing but are otherwise not readily visible with the public contract (it’s better if you do this minimally, but for instance test constructors are a common use case)

Given that unit tests are good for the above you then have to consider “Where will those things help?”

When creating telemetry you can observe that you log the correct things the correct number of times.

  • shipping logging that is wrong is super common and it can cost days or weeks to find out, fix it, and get the right data…
  • it doesn’t take weeks to unit test logging thoroughly, so great candidate to move fast by testing
  • logged data that is “mostly correct” is likely to become the bane of your existence
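
As an illustration of the telemetry point, here is a small sketch using Python's standard `unittest.mock`. The `upload` function, the `telemetry` logger name, and the event names are all hypothetical; the only thing being shown is that a test can pin down what gets logged and how many times.

```python
import logging
from unittest import mock

logger = logging.getLogger("telemetry")  # hypothetical logger name

def upload(items, send):
    """Hypothetical function under test: logs one event per item sent."""
    for item in items:
        send(item)
        logger.info("item_uploaded id=%s", item)
    logger.info("upload_finished count=%d", len(items))

def test_upload_logs_each_item_once():
    send = mock.Mock()
    # Replace logger.info with a mock so every call is recorded.
    with mock.patch.object(logger, "info") as info:
        upload(["a", "b"], send)
    # The exact events, in order, and no extras.
    assert info.call_args_list == [
        mock.call("item_uploaded id=%s", "a"),
        mock.call("item_uploaded id=%s", "b"),
        mock.call("upload_finished count=%d", 2),
    ]
    assert send.call_count == 2

test_upload_logs_each_item_once()
```

A test like this fails the moment someone logs an event twice or drops the final summary event, which is the kind of "mostly correct" data the bullets above warn about.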

When you have algorithms with complex internal state, the state can be readily verified.

  • the tests act as living documentation for the state transitions
  • they protect you against future mistakes
  • well meaning newcomers and others looking to remove dead code and/or refactor can do so with confidence
  • if you’re planning such a refactor, “tests first, refactor second” is a great strategy

When you have algorithms with important policy, that policy can be verified.

  • many systems require “housekeeping” like expiration, deletion, or other maintenance
  • unit tests can help you trigger those events even if they normally take days, weeks, years
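
One common way to trigger such events on demand is to inject the clock. The sketch below assumes a hypothetical `ExpiringCache` and `FakeClock`; neither comes from a real library, and production code would pass something like `time.monotonic` instead of the fake.

```python
class ExpiringCache:
    """Hypothetical cache whose entries expire after ttl_seconds."""

    def __init__(self, ttl_seconds, clock):
        self._ttl = ttl_seconds
        self._clock = clock          # injected so tests can control time
        self._entries = {}

    def put(self, key, value):
        self._entries[key] = (value, self._clock())

    def get(self, key):
        entry = self._entries.get(key)
        if entry is None:
            return None
        value, stored_at = entry
        if self._clock() - stored_at >= self._ttl:
            del self._entries[key]   # housekeeping: drop the stale entry
            return None
        return value


class FakeClock:
    """Test double: time only moves when the test says so."""

    def __init__(self):
        self.now = 0.0

    def __call__(self):
        return self.now

    def advance(self, seconds):
        self.now += seconds


DAY = 24 * 60 * 60
clock = FakeClock()
cache = ExpiringCache(ttl_seconds=30 * DAY, clock=clock)
cache.put("token", "abc")
clock.advance(29 * DAY)
still_there = cache.get("token")   # one day before expiry
clock.advance(1 * DAY)
after_expiry = cache.get("token")  # exactly at the 30-day limit
```

Thirty days of policy run in microseconds, and the boundary (expire at exactly the limit, or just after it?) becomes an explicit decision the test records.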

When you have outlying but important success/failure cases, they can be verified.

  • similar to the above the most exotic cases can be tested
  • important failure cases where we need to purge the cache or do some other cleanup action that are not exactly policy but are essential for correctness can be tested
  • these cases are hard to force in end to end tests and easily overlooked in manual testing
  • in general, situations that are not on the main path but are essential are the most important to verify
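
Here is a rough sketch of forcing one of those off-the-main-path cases. The `Store` class and its backend are hypothetical, and the purge-on-failed-write rule is an assumed requirement chosen for the example; the mock simply makes a rare failure happen on demand.

```python
from unittest import mock

class Store:
    """Hypothetical component: caches reads and must purge on a failed write."""

    def __init__(self, backend):
        self._backend = backend
        self._cache = {}

    def read(self, key):
        if key not in self._cache:
            self._cache[key] = self._backend.load(key)
        return self._cache[key]

    def write(self, key, value):
        try:
            self._backend.save(key, value)
        except OSError:
            self._cache.clear()   # cached data may no longer match the backend
            raise
        self._cache[key] = value

def test_failed_write_purges_cache():
    backend = mock.Mock()
    backend.load.return_value = "old"
    store = Store(backend)
    assert store.read("k") == "old"

    backend.save.side_effect = OSError("disk full")   # force the rare failure
    try:
        store.write("k", "new")
    except OSError:
        pass
    else:
        raise AssertionError("expected the write to fail")

    backend.load.return_value = "fresh"
    assert store.read("k") == "fresh"      # cache was purged, so we reload
    assert backend.load.call_count == 2

test_failed_write_purges_cache()
```

Getting a real disk to fill up at the right moment in an end to end test is hard; here it is one line.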

Areas under heavy churn can have their chief modes of operation verified.

  • anywhere there is significant development there is the greatest chance of bugs
  • whatever mistakes developers tend to make, write tests that will find them and stop them
  • even if developer check-ins are 99% right that means in any given week something important is gonna bust, that’s the math of it…
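
The math behind that last bullet can be sketched in a few lines. The 100 check-ins per week is an assumed figure, not one from the text, and the calculation assumes each check-in succeeds or fails independently, which real check-ins only approximate.

```python
def chance_of_at_least_one_bad(per_checkin_success: float, checkins: int) -> float:
    """Probability that at least one of `checkins` independent check-ins is wrong."""
    all_good = per_checkin_success ** checkins
    return 1.0 - all_good

# Assumed team throughput: 100 check-ins per week, each 99% likely to be right.
weekly_risk = chance_of_at_least_one_bad(0.99, 100)
```

Under those assumptions `weekly_risk` comes out at roughly 63%, so a bad check-in in any given week is more likely than not.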

In case of complex threading, interleaves can be verified.

  • By mocking mutex, critical section, or events, every possible interleave can be simulated
  • weakness: you can only test the interleaves you thought of (but that goes a long way)
  • that’s actually the universal weakness of unit tests
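
A simplified way to picture this: instead of mocking a real mutex, the sketch below models each thread as a fixed sequence of steps and enumerates every ordering of those steps. This is a swapped-in technique chosen to keep the example short, not the mocking approach described above, and the unsynchronized counter increment is a hypothetical piece of code under test.

```python
from itertools import combinations

def interleavings(n_a, n_b):
    """Yield every order of n_a steps of thread 'A' and n_b steps of thread 'B'."""
    total = n_a + n_b
    for a_slots in combinations(range(total), n_a):
        slots = set(a_slots)
        yield ["A" if i in slots else "B" for i in range(total)]

def run_unlocked(schedule):
    """Simulate two unsynchronized increments following the given schedule."""
    shared = {"counter": 0}
    local = {}  # per-thread value read from the shared counter
    for thread in schedule:
        if thread not in local:
            local[thread] = shared["counter"]      # step 1: read
        else:
            shared["counter"] = local[thread] + 1  # step 2: write
    return shared["counter"]

# Each thread has two steps (read, then write), giving six possible schedules.
outcomes = {tuple(s): run_unlocked(s) for s in interleavings(2, 2)}
lost_updates = [s for s, result in outcomes.items() if result != 2]
```

Of the six schedules, only the two where one thread finishes before the other starts leave the counter at 2; the other four lose an update. The same weakness applies: the enumeration only covers the steps that were modeled.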

The above are just examples, the idea being that you use the strength areas of unit tests cross-checked against your code to maximize their value. Even very simple tests that do stuff like “make sure the null cases are handled correctly” are super helpful because it’s easy to get stuff wrong and some edge case might not run in manual testing. These are real problems that cause breakage in your continuous integration and customer issues in production. Super dumb unit tests can stop many of these problems in their tracks.

Once you have some coverage it’s easy to look at the coverage reports and then decide where you should invest more and when. This stuff is great for learning the code and getting people up to speed!

A few notes on some of the other types of testing might be valuable too.

When to Consider Integration Tests

If you can take an entire subsystem and run it largely standalone, or wired out for logging, or anything at all like that really, it creates a great opportunity for what I’ll loosely call “integration testing.” I’m not sure that term is super-well defined actually, but the idea is that more live components can be tested. For instance, maybe you can use real backend servers, or developer servers, and maybe drive your client libraries without actually launching the real UI — that would be a great kind of integration test. Maybe this is done with test users and a stub UI; maybe it’s some light automation; maybe a combination. These test configurations can be used to validate large swaths of code, including communication stacks and server reconfigurations.

It’s important to note that no amount of unit testing can ever tell you if your new server topology is going to work. You can certainly verify that your deployment is what you think it is with something kind of like a unit test (I can validate my deployment files at least) but that isn’t really the same thing.

Something less than your full stack, based on your real code, can go a long way to validating these things. Perhaps without having to implicate UI or other systems that add unnecessary complexity and/or failure modes to the test.

This level of testing can also be great for fuzzing, which is a great way to create valid inputs and/or stimulation that you didn’t think of in unit tests. When fuzzing finds failures, you may want to add new validations to your suite if it turns out you have a big hole.

When to Consider End to End Tests

Again, it would be a mistake to think that these tests have no place in a testing ecosystem. But like unit tests, it’s important to play to their strengths. Do not use them to validate internal algorithms; they’re horrible at that. They don’t give you instant API level failures near the point of failure, there’s complex logging and what not.

But consider this very normal tools example: “I am adopting these new linker flags for my main binary” — in that situation the only test that can be used is an end to end test. Probably none of the unit tests even build with those optimizations on, and if they did they would not be any good at validation anyway, the flags probably behave differently with mini-binaries. Whereas a basic end-to-end suite of “does it crash when I try these 10 essential things” is invaluable.

Likewise, performance, power, and efficiency tests are usually end to end — because the unit tests don’t tell us what we need to know.

Fuzzing may have to happen at this level, but it can be challenging to do all your fuzzing via the UI. Repro steps can get harder and harder as we go down this road as well. External failures that aren’t real problems are also more likely to crop up.

Conclusion

It’s no surprise that a blend of all of these is probably the healthiest thing. Getting to greater than 0% unit-test-coverage universally (i.e. for all classes) is a great goal, if only so that you are then ready to test whatever turns out to be needed.

Generally, people report that adding tests consistently finds useful bugs, and getting them out, while making code more testable, is good for the code base overall. As long as that continues to be the case on your team it’s probably prudent to keep investing in tests, but teams will have to consider all their options to decide how much testing to do and when.

One popular myth, that unit tests have diminishing returns, really should be banished. The thing about unit tests is that any given block of code is about as easy to test as any other. Once you have your mocks set up you can pretty much force any situation. The 100th block of code isn’t harder to reach than the 99th was and it’s just as likely to have bugs in it as any other. People who get high levels of coverage generally report things like “I was sure the last 2 blocks were gonna be a waste of time and then I looked and … the code was wrong.”

But even if returns aren’t diminishing that doesn’t mean they are free. And bugs are not created equal. So ultimately, the blend is up to you.


Introducing Nullable Reference Types in C#

Today we released a prototype of a C# feature called “nullable reference types“, which is intended to help you find and fix most of your null-related bugs before they blow up at runtime.

We would love for you to install the prototype and try it out on your code! (Or maybe a copy of it! 😄) Your feedback is going to help us get the feature exactly right before we officially release it.

Read on for an in-depth discussion of the design and rationale, and scroll to the end for instructions on how to get started!

The billion-dollar mistake

Tony Hoare, one of the absolute giants of computer science and recipient of the Turing Award, invented the null reference! It’s crazy these days to think that something as foundational and ubiquitous was invented, but there it is. Many years later in a talk, Sir Tony actually apologized, calling it his “billion-dollar mistake”:

I call it my billion-dollar mistake. It was the invention of the null reference in 1965. At that time, I was designing the first comprehensive type system for references in an object oriented language (ALGOL W). My goal was to ensure that all use of references should be absolutely safe, with checking performed automatically by the compiler. But I couldn’t resist the temptation to put in a null reference, simply because it was so easy to implement. This has led to innumerable errors, vulnerabilities, and system crashes, which have probably caused a billion dollars of pain and damage in the last forty years.

There’s general agreement that Tony is actually low-balling the cost here. How many null reference exceptions have you gotten over the years? How many of them were in production code that was already tested and shipped? And how much extra effort did it take to verify your code and chase down potential problems to avoid even more of them?

The problem is that null references are so useful. In C#, they are the default value of every reference type. What else would the default value be? What other value would a variable have, until you can decide what else to assign to it? What other value could we pave a freshly allocated array of references over with, until you get around to filling it in?

Also, sometimes null is a sensible value in and of itself. Sometimes you want to represent the fact that, say, a field doesn’t have a value. That it’s ok to pass “nothing” for a parameter. The emphasis is on sometimes, though. And herein lies another part of the problem: Languages like C# don’t let you express whether a null right here is a good idea or not.

Yet!

What can be done?

There are some programming languages, such as F#, that don’t have null references or at least push them to the periphery of the programming experience. One popular approach instead uses option types to express that a value is either None or Some(T) for a given reference type T. Any access to the T value itself is then protected behind a pattern matching operation to see if it is there: The developer is forced, in essence, to “do a null check” before they can get at the value and start dereferencing it.
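
For readers who have not met option types, here is a rough sketch of the idea. It is written in Python rather than F#, purely as an illustration; the `Some` and `Nothing` classes are hypothetical stand-ins (Python reserves `None`, hence the different name), and an `isinstance` check plays the role of the pattern match.

```python
from dataclasses import dataclass
from typing import Generic, TypeVar, Union

T = TypeVar("T")

@dataclass(frozen=True)
class Some(Generic[T]):
    """Wraps a value that is definitely present."""
    value: T

@dataclass(frozen=True)
class Nothing:
    """Represents the absence of a value."""

Option = Union[Some[T], Nothing]

def find_name(names: dict[int, str], user_id: int) -> Option[str]:
    """Hypothetical lookup that never hands back a bare null."""
    return Some(names[user_id]) if user_id in names else Nothing()

def greeting(result: Option[str]) -> str:
    # The value is only reachable after checking which case we have.
    if isinstance(result, Some):
        return "Hello, " + result.value.upper()
    return "Hello, stranger"

known = greeting(find_name({1: "ada"}, 1))
unknown = greeting(find_name({1: "ada"}, 2))
```

The caller cannot get at `value` without first establishing that the result is a `Some`, which is the "forced null check" the paragraph describes.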

But that’s not how it works in C#. And here’s the problem: We’re not going to add another kind of nulls to C#. And we’re not going to add another way of checking for those nulls before you access a value. Imagine what a dog’s breakfast that would be! If we are to do something about the problem in C#, it has to be in the context of existing nulls and existing null checks. It has to be in a way that can help you find bugs in existing code without forcing you to rewrite everything.

Step one: expressing intent

The first major problem is that C# does not let you express your intent: is this variable, parameter, field, property, result etc. supposed to be null or not? In other words, is null part of the domain, or is it to be avoided?

We want to add such expressiveness. Either:

  1. A reference is not supposed to be null. In that case it is alright to dereference it, but you should not assign null to it.
  2. A reference is welcome to be null. In that case it is alright to assign null to it, but you should not dereference it without first checking that it isn’t currently null.
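
The two intents can be pictured with Python type hints, used here only as an analogy and not as the C# syntax under discussion. Python does not enforce these annotations at runtime; the assumption is that a static checker such as mypy is run over the code and would flag passing `None` to the first function.

```python
from typing import Optional

def shout(name: str) -> str:
    # Intent 1: 'name' is not supposed to be None, so it is
    # dereferenced directly and callers should not pass None.
    return name.upper()

def shout_if_present(name: Optional[str]) -> str:
    # Intent 2: 'name' is welcome to be None, so it is checked
    # before being dereferenced.
    if name is None:
        return ""
    return name.upper()

loud = shout("hello")
quiet = shout_if_present(None)
```

The annotation is where the intent lives: the reader and the checker can both see which of the two contracts a given parameter follows.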

Reference types today occupy an unfortunate middle ground where both null assignment and unchecked dereferencing are encouraged.

Naively, this suggests that we add two new kinds of reference types: “safely nonnullable” reference types (maybe written string!) and “safely nullable” reference types (maybe written string?) in addition to the current, unhappy reference types.

We’re not going to do that. If that’s how we went about it, you’d only get safe nullable behavior going forward, as you start adding these annotations. Any existing code would benefit not at all. I guess you could push your source code into the future by adding a Roslyn analyzer that would complain at you for every “legacy” reference type string in your code that you haven’t yet added ? or ! to. But that would lead to a sea of warnings until you’re done. And once you are, your code would look like it’s swearing at you, with punctuation? on! every? declaration!

In a certain weird way we want something that’s more intrusive in the beginning (complains about current code) and less intrusive in the long run (requires fewer changes to existing code).

This can be achieved if instead we add only one new “safe” kind of reference type, and then reinterpret existing reference types as being the other “safe” kind. More specifically, we think that the default meaning of unannotated reference types such as string should be non-nullable reference types, for a couple of reasons:

  1. We believe that it is more common to want a reference not to be null. Nullable reference types would be the rarer kind (though we don’t have good data to tell us by how much), so they are the ones that should require a new annotation.
  2. The language already has a notion of – and a syntax for – nullable value types. The analogy between the two would make the language addition conceptually easier, and linguistically simpler.
  3. It seems right that you shouldn’t burden yourself or your consumer with cumbersome null values unless you’ve actively decided that you want them. Nulls, not the absence of them, should be the thing that you explicitly have to opt in to.

Here’s what it looks like:

class Person
{
    public string FirstName;   // Not null
    public string? MiddleName; // May be null
    public string LastName;    // Not null
}

This class is now able to express the intent that everyone has a first and a last name, but only some people have a middle name.

Thus we get to the reason we call this language feature “nullable reference types”: Those are the ones that get added to the language. The nonnullable ones are already there, at least syntactically.
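The analogy with nullable value types mentioned in the list above can be sketched side by side. This assumes the `#nullable enable` directive from the version of the feature that later shipped in C# 8.0 (the prototype described here uses a compiler switch instead); NullableAnalogy and its methods are made-up names for illustration.

```csharp
#nullable enable
using System;

public static class NullableAnalogy
{
    // Nullable value type: already in the language today.
    public static int DaysOrDefault(int? days) => days ?? 0;

    // Nullable reference type: same ? syntax, same "deal with null first" habit.
    public static string NameOrDefault(string? name) => name ?? "(none)";
}
```

The analogy is about syntax and intent only. An int? is a different runtime type from int, whereas string? is just string with an annotation that the compiler uses for warnings.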

Step two: enforcing behavior

A consequence of this design choice is that any enforcement will add new warnings or errors to existing code!

That seems like a breaking change, and a really bad idea, until you realize that part of the purpose of this feature is to find bugs in existing code. If it can’t find new problems with old code, then it isn’t worth its salt!

So we want it to complain about your existing code. But not obnoxiously. Here’s how we are going to try to strike that balance:

  1. All enforcement of null behavior will be in the form of warnings, not errors. As always, you can choose to run with warnings as errors, but that is up to you.
  2. There’s a compiler switch to turn these new warnings on or off. You’ll only get them when you turn it on, so you can still compile your old code with no change.
  3. The warnings will recognize existing ways of checking for null, and not force you to change your code where you are already diligently doing so.
  4. There is no semantic impact of the nullability annotations, other than the warnings. They don’t affect overload resolution or runtime behavior, and generate the same IL output code. They only affect type inference insofar as it passes them through and keeps track of them in order for the right warnings to occur on the other end.
  5. There is no guaranteed null safety, even if you react to and eliminate all the warnings. There are many holes in the analysis by necessity, and also some by choice.

To that last point: Sometimes a warning is the “correct” thing to do, but would fire all the time on existing code, even when it is actually written in a null safe way. In such cases we will err on the side of convenience, not correctness. We cannot be yielding a “sea of warnings” on existing code: too many people would just turn the warnings back off and never benefit from it.

Once the annotations are in the language, it is possible that folks who want more safety and less convenience can add their own analyzers to juice up the aggressiveness of the warnings. Or maybe we add an “Extreme” mode to the compiler itself for the hardliners.
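The fourth tenet above, that annotations have no semantic impact beyond warnings, can be checked with a small sketch. It assumes the `#nullable enable` directive from the version that later shipped in C# 8.0; NoRuntimeEffect is a made-up name.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

public static class NoRuntimeEffect
{
    // The ? on the element type exists only for the compiler's analysis.
    // At runtime both lists are plain List<string>.
    public static bool SameRuntimeType()
    {
        var nullableNames = new List<string?>();
        var nonNullNames = new List<string>();
        return nullableNames.GetType() == nonNullNames.GetType();
    }
}
```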

In light of these design tenets, let’s look at the specific places we will start to yield warnings when the feature is turned on.

Avoiding dereferencing of nulls

First let’s look at how we would deal with the use of the new nullable reference types.

The design goal here is that if you mark some reference types as nullable, but you are already doing a good job of checking them for null before dereferencing, then you shouldn’t get any warnings. This means that the compiler needs to recognize you doing a good job. The way it can do that is through a flow analysis of the consuming code, similar to what it currently does for definite assignment.

More specifically, for certain “tracked variables” it will keep an eye on their “null state” throughout the source code (either “not null” or “may be null“). If an assignment happens, or if a check is made, that can affect the null state in subsequent code. If the variable is dereferenced at a place in the source code where its null state is “may be null“, then a warning is given.

void M(string? ns)            // ns is nullable
{
    WriteLine(ns.Length);     // WARNING: may be null
    if (ns != null) 
    { 
        WriteLine(ns.Length); // ok, not null here 
    } 
    if (ns == null) 
    { 
        return;               // not null after this
    }                         
    WriteLine(ns.Length);     // ok, not null here
    ns = null;                // null again!
    WriteLine(ns.Length);     // WARNING: may be null
}

In the example you can see how the null state of ns is affected by checks, assignments and control flow.
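For readers who want something to compile, here is a small runnable variation on the same idea, assuming the `#nullable enable` directive from the version that later shipped in C# 8.0. NullStateDemo is a made-up name. Both methods take a nullable parameter and are written so that no "may be null" warning should be needed.

```csharp
#nullable enable
using System;

public static class NullStateDemo
{
    public static int LengthOrMinusOne(string? ns)
    {
        if (ns == null)
        {
            return -1;      // early exit: ns is "not null" after this
        }
        return ns.Length;   // dereference happens only on the checked path
    }

    // The existing ?. and ?? operators never dereference a null either.
    public static int LengthViaOperators(string? ns)
        => ns?.Length ?? -1;
}
```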

Which variables should be tracked? Parameters and locals for sure. There can be more of a discussion around fields and properties in “dotted chains” like x.y.z or this.x, or even a field x where the this. is implicit. We think such fields and properties should also be tracked, so that they can be “absolved” when they have been checked for null:

void M(Person p)
{
    if (p.MiddleName != null) 
    {
        WriteLine(p.MiddleName.Length); // ok
    }
}

This is one of those places where we choose convenience over correctness: there are many ways that p.MiddleName could become null between the check and the dereference. We would be able to track only the most blatant ones:

void M(Person p)
{
    if (p.MiddleName != null)
    {
        p.ResetAllFields();             // can't detect change
        WriteLine(p.MiddleName.Length); // ok 
        
        p = GetAnotherPerson();         // that's too obvious
        WriteLine(p.MiddleName.Length); // WARNING: saw that! 
    }
}

Those are examples of false negatives: we just don’t realize you are doing something dangerous, changing the state that we are reasoning about.
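A runnable sketch of the first case shows what such a false negative means at runtime. It assumes the `#nullable enable` directive from the version that later shipped in C# 8.0, and it fills in a hypothetical body for ResetAllFields, which the example above leaves undefined. The check passes, the call clears the field behind the compiler's back, and the dereference throws.

```csharp
#nullable enable
using System;

public class Person
{
    public string FirstName = "";
    public string? MiddleName;
    public string LastName = "";

    // Hypothetical implementation: clears the one nullable field.
    public void ResetAllFields()
    {
        MiddleName = null;
    }
}

public static class FalseNegativeDemo
{
    // Returns true if the dereference after the check blew up anyway.
    public static bool ThrowsAfterReset(Person p)
    {
        if (p.MiddleName != null)
        {
            p.ResetAllFields();            // the analysis does not look inside
            try
            {
                _ = p.MiddleName.Length;   // treated as "not null", but it is null
            }
            catch (NullReferenceException)
            {
                return true;
            }
        }
        return false;
    }
}
```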

Despite our best efforts, there will also be false positives: Situations where you know that something is not null, but the compiler cannot figure it out. You get an undeserved warning, and you just want to shut it up.

We’re thinking of adding an operator for that, to say that you know better:

void M(Person p)
{
    WriteLine(p.MiddleName.Length);  // WARNING: may be null
    WriteLine(p.MiddleName!.Length); // ok, you know best!
}

The trailing ! on an expression tells the compiler that, despite what it thinks, it shouldn’t worry about that expression being null.
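One plausible source of such false positives is a null check that lives in a helper method, where the flow analysis of the calling code cannot see it. The sketch below assumes the `#nullable enable` directive from the version that later shipped in C# 8.0; HasMiddleName and KnowBetterDemo are made-up names.

```csharp
#nullable enable
using System;

public class Person
{
    public string FirstName = "";
    public string? MiddleName;
    public string LastName = "";
}

public static class KnowBetterDemo
{
    // Hypothetical helper: the check happens here, out of sight of the caller.
    public static bool HasMiddleName(Person p) => p.MiddleName != null;

    public static int MiddleNameLength(Person p)
    {
        if (!HasMiddleName(p))
        {
            return 0;
        }
        // We know the helper checked it, so we tell the compiler not to worry.
        return p.MiddleName!.Length;
    }
}
```

The ! is a promise, not a check: if the helper is ever changed so that it no longer guarantees a middle name, the dereference fails at runtime with no warning.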

Avoiding nulls

So far, the warnings were about protecting nulls in nullable references from being dereferenced. The other side of the coin is to avoid having nulls at all in the nonnullable references.

There are a couple of ways null values can come into existence, and most of them are worth warning about, whereas a couple of them would cause another “sea of warnings” that is better to avoid:

  1. Assigning or passing null to a non-nullable reference type. That is pretty egregious, right? As a general rule we should warn on that (though there are surprising counterarguments to some cases, still under debate).
  2. Assigning or passing a nullable reference type to a nonnullable one. That’s almost the same as 1, except you don’t know that the value is null – you only suspect it. But that’s good enough for a warning.
  3. A default expression of a nonnullable reference type. Again, that is similar to 1, and should yield a warning.
  4. Creating an array with a nonnullable element type, as in new string[10]. Clearly there are nulls being made here – lots of them! But a warning here would be very harsh. Lots of existing code would need to be changed – a large percentage of the world’s existing array creations! Also, there isn’t a really good workaround. This seems like one we should just let go.
  5. Using the default constructor of a struct that has a field of nonnullable reference type. This one is sneaky, since the default constructor (which zeroes out the struct) can even be implicitly used in many places. Probably better not to warn, or else many existing struct types would be rendered useless.
  6. Leaving a nonnullable field of a newly constructed object null after construction. This we can do something about! Let’s check to see that every constructor assigns to every field whose type is nonnullable, or else yield a warning.

Here are examples of all of the above:

void M(Person p)
{
    p.FirstName = null;          // 1 WARNING: it's null
    p.LastName = p.MiddleName;   // 2 WARNING: may be null
    string s = default(string);  // 3 WARNING: it's null
    string[] a = new string[10]; // 4 ok: too common
}

struct PersonHandle
{
    public Person person;        // 5 ok: too common
}

class Person
{
    public string FirstName;     // 6 WARNING: uninitialized
    public string? MiddleName; 
    public string LastName;      // 6 WARNING: uninitialized
}
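As a sketch of how the class could be written so that case 6 has nothing to complain about, and of what the two deliberate holes (cases 4 and 5) look like at runtime, consider the following. It assumes the `#nullable enable` directive from the version that later shipped in C# 8.0; the constructor and HolesDemo are additions made for this illustration.

```csharp
#nullable enable
using System;

public class Person
{
    public string FirstName;
    public string? MiddleName;
    public string LastName;

    // Every nonnullable field is assigned here, so none is left uninitialized.
    public Person(string firstName, string lastName)
    {
        FirstName = firstName;
        LastName = lastName;
    }
}

public struct PersonHandle
{
    public Person person;
}

public static class HolesDemo
{
    // Case 4: elements of a fresh array are null despite the element type.
    public static bool ArrayElementIsNull()
    {
        string[] a = new string[10];
        return a[0] is null;
    }

    // Case 5: a zeroed-out struct leaves its reference fields null.
    public static bool StructFieldIsNull()
    {
        PersonHandle handle = default(PersonHandle);
        return handle.person is null;
    }
}
```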

Once again, there will be cases where you know better than the compiler that either a) that thing being assigned isn’t actually null, or b) it is null but it doesn’t actually matter right here. And again you can use the ! operator to tell the compiler who’s boss:

void M(Person p)
{
    p.FirstName = null!;        // ok, you asked for it!
    p.LastName = p.MiddleName!; // ok, you handle it!
}

A day in the life of a null hunter

When you turn the feature on for existing code, everything will be nonnullable by default. That’s probably not a bad default, as we’ve mentioned, but there will likely be places where you should add some ?s.

Luckily, the warnings are going to help you find those places. In the beginning, almost every warning is going to be of the “avoid nulls” kind. All these warnings represent a place where either:

  1. you are putting a null where it doesn’t belong, and you should fix it – you just found a bug! – or
  2. the nonnullable variable involved should actually be changed to be nullable, and you should fix that.

Of course as you start adding ? to declarations that should be allowed to be null, you will start seeing a different kind of warning, where other parts of your existing code are not written to respect that nullable intent, and do not properly check for nulls before dereferencing. That nullable intent was probably always there but was inexpressible in the code before.

So this is a pretty nice story, as long as you are just working with your own source code. The warnings drive quality and confidence through your source base, and when you’re done, your code is in a much better state.

But of course you’ll be depending on libraries. Those libraries are unlikely to add nullable annotations at exactly the same time as you. If they do so before you turn the feature on, then great: once you turn it on you will start getting useful warnings from their annotations as well as from your own.

If they add annotations after you, however, then the situation is more annoying. Before they do, you will “wrongly” interpret some of their inputs and outputs as non-null. You’ll get warnings you didn’t “deserve”, and miss warnings you should have had. You may have to use ! in a few places, because you really do know better.

After the library owners get around to adding ?s to their signatures, updating to their new version may “break” you in the sense that you now get new and different warnings from before – though at least they’ll be the right warnings this time. It’ll be worth fixing them, and you may also remove some of those !s you temporarily added before.
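The before-and-after of a library adding its ?s can be sketched with two versions of a made-up lookup API. Everything here is hypothetical (NicknamesV1, NicknamesV2, Consumer), and the sketch assumes the `#nullable enable` directive from the version that later shipped in C# 8.0. The null! in V1 stands in for a library that has not been annotated yet: its signature reads as non-null even though the method can return null.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

// Hypothetical library, before annotations: the signature promises too much.
public static class NicknamesV1
{
    static readonly Dictionary<string, string> Known =
        new Dictionary<string, string> { ["Margaret"] = "Peggy" };

    public static string Find(string name)
        => Known.TryGetValue(name, out var nick) ? nick : null!;
}

// The same library after its owners added the ? where it belongs.
public static class NicknamesV2
{
    static readonly Dictionary<string, string> Known =
        new Dictionary<string, string> { ["Margaret"] = "Peggy" };

    public static string? Find(string name)
        => Known.TryGetValue(name, out var nick) ? nick : null;
}

public static class Consumer
{
    // Against V1: no warning to guide us, yet this throws for unknown names.
    public static int LengthV1(string name) => NicknamesV1.Find(name).Length;

    // Against V2: the return type says "may be null", so we handle it.
    public static int LengthV2(string name) => NicknamesV2.Find(name)?.Length ?? 0;
}
```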

We spent a large amount of time thinking about mechanisms that could lessen the “blow” of this situation. But at the end of the day we think it’s probably not worth it. We base this in part on the experience from TypeScript, which added a similar feature recently. It shows that in practice, those inconveniences are quite manageable, and in no way inhibitive to adoption. They are certainly not worth the weight of a lot of extra “mechanism” to bridge you over in the interim. The right thing to do if an API you use has not added ?s in the right places is to push its owners to get it done, or even contribute the ?s yourself.

Become a null hunter today!

Please install the prototype and try it out in VS!

Go to github.com/dotnet/csharplang/wiki/Nullable-Reference-Types-Preview for instructions on how to install and give feedback, as well as a list of known issues and frequently asked questions.

Like all other C# language features, nullable reference types are being designed in the open here: github.com/dotnet/csharplang.
We look forward to walking the last nullable mile with you, and getting to a well-tuned, gentle and useful null-chasing feature with your help!

Thank you, and happy hunting!

Mads Torgersen, Lead Designer of C#
