Insight, feedback, and shared experience from the Indelible team.

We have witnessed the in-source/out-source/in-source process for several Security Operations Centers (SOCs) over the years. One of the most important lessons learned about in-sourcing comes from the staffing approach when standing up this capability.

Starting with junior resources

Several organizations believe they can save money by hiring L1s, interns, or low-level analysts at the start. While this does seem to make some sense because it allows you to ‘get in the game’ with a reduced up-front investment (right after you spent millions on putting all the pieces together!), it may not work out as well as one might think. In one specific case, a customer had hired 7 L1s (juniors) to work in a Security Operations Center without any senior guidance. The plan was to [eventually] hire a lead to manage the team and mature the program, but what may have eluded this particular team is that the lead they were seeking would be taking on 8 distinct responsibilities walking in the door – day 1. (YIKES! reference: “Giving up my life for my job!”)

The results at the end of this path are predictably chaotic and painful, rife with disappointment, failed objectives, and frustration. Placing the entire burden of training juniors, implementing improvements, being the escalation point for incidents, and collaborating with other teams on one person simply isn’t going to work, and few people will sign up for this (reference: “Why can’t I find anyone to take my awesome SOC lead job!?”).

Why it hurts…

We all hate to hear someone tell us “you have to crawl before you can walk or run”, but expectations need to at least make sense, or you are going to hurt yourself. Let’s look at some lessons learned:

  1. It typically takes 6 to 12 months for an L1 to gain enough experience to develop the mindset and independence to operate as a functional part of the SOC. This is especially true when there are no documented procedures. (But this totally never happens.. <.<)
  2. Junior staff need an escalation point for their questions and challenges. If senior resources don’t have bandwidth due to conflicting priorities and workload, the process breaks down, morale falls, and the team loses impetus. Juniors may leave due to lack of development opportunity.
  3. The senior resource can experience burnout from long hours, slow progress, frustrated leadership, and an inability to move things forward.
  4. You may lose traction and progress when you get to your first SOC anniversary, have no strong result to show, and half the team leaves (reference: “Getting and keeping my resume up to date!”)

In this approach to staffing security operations, after 18 to 24 months a few of the junior resources will become skilled while others will continue to require more guidance, ultimately hurting team performance and limiting the overall program evolution. Again, this is because there is so little guidance and leadership direction in the beginning that the program foundation to build on simply isn’t there! At worst, someone may just call a ‘do-over’.

What seemed to make perfect sense sets you back, and winds up being more costly than starting with expensive senior resources.

Cohorts – a better approach?

Below is just an example approach to solving the described challenges. While your mileage may vary, the idea is to set expectations around a long-term strategy that both fulfills the enterprise’s need and sets reasonable expectations for staff.

Example of what a successful 24 month timeline to in-source Security Operations looks like.

For example, an organization can consider the following as an approach (modified to meet their specific needs).

  • Bring in at least two senior resources and two junior resources at the start, with some automation tooling. Ideally, <shameless_plug>you get help from people who have been down this road</shameless_plug>. Pair a senior resource with a junior resource, forming two teams.
  • Have the teams share responsibility for assessing current state while managing day to day issues. The two junior resources can help while they learn.
  • Develop a road map of what is needed to make more junior resources successful and efficient when they start.
  • Implement iteratively based on priority and then begin on-boarding more resources.
  • Conduct retrospectives and feedback sessions with the junior resources during the whole process.
  • During this time frame, the initial two junior resources can provide guidance to new associates being on-boarded while the senior resources continue heavy lifting.

Key takeaways

The key here: set reasonable expectations with leadership for program success. An organization fares far better if it retains senior resources versus burning them out and being forced to find new ones.

Begin with Critical Services, onboard in cohorts, then steadily mature the team and capabilities. This is how you accelerate progress versus investing with little result to show. Quick and frequent win stories make life better for everyone!

Indelible can help you rapidly develop a custom plan that fits your organization for maximum success.

Cybersecurity Market Gap: Buying to a vision, or buying to buy…

More and more I’m witnessing conversations with customers that are asking: “OK – what did I get?” They are spending millions on “solutions”, but cannot quantify any sense of “feeling more secure”. For me, this precipitates some questions:

  • Does the outcome of the purchase or product match expectations? Your boss’s expectations?
  • What do you think went wrong? What went right?
  • How can you recover a fumble in the most efficient, expedient way possible?
  • How do you prove and show value? Did the “solution” solve a problem?

A Sobering Story

I wanted to share a story that explains the kind of problems we want to help our customers solve. I met with executives for a large firm last year that shared a pretty shocking situation with me (in confidence, so no names or hints):

  1. We decided to build a SOC and spent $15m (on a very sexy facility with screens, teaming rooms, etc.)
  2. The guy who led the build is no longer with the company
  3. We are unable to demonstrate any real value to the board because we are still unsure what to even put in the SIEM
  4. The 5 interns out on the floor today are smart, but we simply cannot produce results

How did they get there?

When we start the “SOC build” discussion, we ask a few key questions (and this is a tiny subset that should demonstrate obvious value):

  1. What is your vision for what your SOC will do?
    1. This helps set the stage for services, which in turn helps define the supporting technology to go into the SOC
    2. Also, this is where you may start making decisions about what you will build, what you will buy, and where you might buy today, build tomorrow as an interim solution
    3. One size does not fit all, but understanding what a SOC can do (https://www.mitre.org/publications/all/ten-strategies-of-a-world-class-cybersecurity-operations-center), then matching a vision to what capabilities are desired, is a good first step (e.g. the SOC may or may not do vulnerability management. One client we came across did not do Incident Response out of their SOC)
  2. What kind of coverage do you want?
    1. There should be a resourcing plan based on 5×9, 24×7, or some other desired scenario commensurate with the investment appetite (I have witnessed a complete miss on exploring the requirements for this, and see a failed expectation that 5 interns could deliver anything of value)
    2. This takes some up-front thought, especially since you are not going to start where you want to wind up; it’s a journey
  3. How will success be measured?
    1. Defining desired capabilities gives us some sense of when we are getting close
    2. Effective metrics give us a success story to share at all levels (operations, management, executives) http://seanmason.com/2014/07/14/incident-response-metrics/
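
To make the coverage question concrete, here is a rough staffing sketch in Python. It is only an illustration: the 40-hour work week, the 80% availability factor (time left after leave, training, and meetings), and the function name are assumptions of ours, not figures from any client.

```python
import math


def analysts_needed(coverage_hours_per_week, seats=1,
                    hours_per_analyst=40.0, availability=0.8):
    """Minimum headcount to keep `seats` analyst positions staffed.

    `availability` is an assumed fraction of paid hours actually spent
    on shift after leave, training, and meetings (hypothetical default).
    """
    effective_hours = hours_per_analyst * availability
    return math.ceil(coverage_hours_per_week * seats / effective_hours)


# 5x9 coverage: 45 hours per week for a single seat.
five_by_nine = analysts_needed(5 * 9)

# 24x7 coverage: 168 hours per week for a single seat.
round_the_clock = analysts_needed(24 * 7)

print(five_by_nine, round_the_clock)
```

Under these assumptions, keeping a single seat filled around the clock already takes six people, before counting any senior escalation point. That is one way to see why five interns were never going to deliver 24×7.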

Conclusion

When you read “Programmatic Security or Transformation” on this site, the above approach is what we mean. “Measurable” is another key concept. You cannot manage or convey any sense of value on something you cannot measure.

If any of this hits close to home, please contact us as soon as you can, and let’s start a discussion about how we can solve these problems together.

Carric Dooley

carric@indelible.global

Clearly, ‘geeks’ and executives speak a very different language. I remember witnessing some executive conversations early in my career that were utterly confounding! We were speaking past each other, and it wasn’t very helpful. So, I wanted to explain the key service focus for Indelible in a way that I hope makes more sense to those who haven’t necessarily lived Cybersecurity for the past 20 years.

Enterprise Risk Management Program

One of the challenges with IT security is the perception that it’s “an IT problem”. Looking at how companies manage risk at the corporate level, it’s clear someone with the technical expertise and experience needs to be in the conversation (in the same way we would look to the CFO to help understand Market Risk like currency exchange, liquidity, credit spread, etc.), but I think there is a frequent failed assumption that the CISO truly understands the business. In some cases, he/she may not have a security background, and remains “out of sorts”.

  • Is your information security organization and strategy aligned to your business?
  • Are you compliant where you need to be?
  • Can you explain this to the board? Can you prove it in a way that makes sense to them? (the litmus: how restrictive is your security budget if typical spend is 6% – 8% of the IT budget?)
  • Do you feel it is generally understood which IT systems support business-critical processes? Are they adequately protected? How do you know?
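
The budget litmus above is simple arithmetic, and a small Python sketch makes it easy to run against your own numbers. The 6% to 8% band comes from the text. The function name and the sample budgets are hypothetical.

```python
def budget_litmus(security_budget, it_budget, low=0.06, high=0.08):
    """Compare security spend, as a share of IT spend, to the typical band."""
    share = security_budget / it_budget
    if share < low:
        return "below typical"
    if share > high:
        return "above typical"
    return "within typical"


# Hypothetical figures: $500k of security spend against a $10m IT budget.
print(budget_litmus(500_000, 10_000_000))
```

A share below the band does not prove the program is underfunded, but it is a number the board can understand and a reasonable way to open the conversation.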

Maybe you need a vCISO arrangement that yields both cost effectiveness and world-class advice – or you want consultative support for your CISO. Consultants have typically worked on many environments, so a lot of their value comes from that perspective.

Industrial Control Systems

There is a 20+ year chasm between OT and IT in most cases. If you have fabs, plants, manage water or electricity, refine something – you have OT environments. These environments tend to have different requirements from a typical IT environment, and they represent how the company makes money. If we break them – everyone gets thrown out of the pool.

  • Is there true separation between IT and OT (in a world where the term “air gap” seems to have taken on a different meaning)?
  • Does anyone look at the site HazOp manuals and consider how catastrophic incidents could manifest in the real world through Cyber? (e.g. computers control mixing, temperature, pressure, and process, and these computers become more connected every day. How might a hacker release a cloud of toxic gas that could harm employees or the neighbouring town?)

Incident Response (a.k.a. “Detect and Respond”)

I have spent a lot of time just pondering security, so at one point, when considering the question “why do we care about security?”, it dawned on me that it all comes down to the incident. We want to either avoid/prevent incidents, or we want to detect and respond such that an incident has no serious impact on our business. Most of us in the industry feel you can only invest so many dollars in avoidance before you have to accept that bad things really do happen to good people. So, what is IR, and are we doing it? If you don’t have a dedicated security team, then it’s a definite NO. If you do, and you don’t have something like a SOC with someone researching the threat landscape, and handling and reporting on inbound attacks, then it’s a “probably not”.

  • What is IR? It’s a capability that, put simply, is about the previously mentioned detect and respond.
  • Do you have a hunt team or capability that you have either built or contracted? (Keep in mind that “hunting” is not using known IoCs to find evil in your environment. It’s applying what we know about what attackers do to the things we see to identify infected endpoints. E.g. a known Windows 7 computer is connecting to the proxy with a browser calling itself “Chameleon Web Browser v1.7” – probably not.)
  • Do you have a threat intelligence capability? Sometimes we take someone who seems smart and says they want to do this for you, but can they really deliver? It’s a specific skill set… How many of us work on our own car, or fix our own computer – or try to do our own electrical and plumbing?? Why would we assume the outcome would be good if the person delivering doesn’t have experience?
  • Is there someone in charge of ensuring lessons learned post-incident are baked into tools and process?
  • Do you have people that know how to pick apart malware, or do deep host/network forensics (disk, memory, and protocol analysis)?
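
The hunting example in the list above (a known Windows 7 machine presenting an odd browser name to the proxy) can be sketched in a few lines of Python. This is a minimal illustration, not a detection product: the asset inventory, the host names, the expected-marker table, and the record layout are all hypothetical stand-ins for whatever your inventory and proxy logs actually provide.

```python
# Hypothetical asset inventory: hostname -> operating system on record.
ASSET_OS = {"ws-0142": "Windows 7", "ws-0388": "Windows 10"}

# Hypothetical table of user-agent substrings we expect to see per OS.
EXPECTED_AGENT_MARKERS = {
    "Windows 7": ("Windows NT 6.1",),
    "Windows 10": ("Windows NT 10.0",),
}


def suspicious_requests(proxy_records):
    """Return proxy records whose user agent does not fit the host's known OS.

    Hosts missing from the inventory have no expected markers, so their
    traffic is flagged as well.
    """
    hits = []
    for record in proxy_records:
        os_name = ASSET_OS.get(record["host"])
        markers = EXPECTED_AGENT_MARKERS.get(os_name, ())
        if not any(marker in record["user_agent"] for marker in markers):
            hits.append(record)
    return hits


records = [
    {"host": "ws-0142",
     "user_agent": "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36"},
    {"host": "ws-0142", "user_agent": "Chameleon Web Browser v1.7"},
]

flagged = suspicious_requests(records)
for record in flagged:
    print(record["host"], record["user_agent"])
```

Note that nothing here relies on a known-bad indicator. The check only asks whether what we see fits what we know about the machine, which is the distinction between hunting and IoC matching.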

My hope is that this explanation makes the technical description of these services more accessible to a wider audience.

If we are concerned about the grid being destroyed by a coronal mass ejection (CME) or an electromagnetic pulse (EMP), we need to pay attention to “the cyber threat” as well.

I recently presented to a group of vendors in the ICS space and my key point was: operational technology (OT) is where cyber meets physical. I’ll explain:

In the Industrial Control System (ICS) verticals (power, water, manufacturing, mining, oil & gas) historically human beings were used (“take ‘at shovel, and put ‘er to some good use!”) in a process to produce gold, oil, a car – whatever. Then we started adding mechanical components like engines to speed things up (you might have heard the story of John Henry going head-to-head against a steam drill – and winning… but losing).

Today – we use robots to put bolts in holes, and we use computers to control these robots and monitor processes like: mixing chemicals, measuring temperature, pressure, vibration, etc, and these computers often run WINDOWS!

In a practical example, consider the origin of the Stuxnet malware. It was first discovered as the root cause for the Iranian nuclear program “setbacks”. Check out the movie Zero Days if you want to learn more, but essentially, centrifuges that spin too slow or too fast tend to blow up, which is bad for soft squishy stuff in immediate proximity – like humans.

What do you think happens if chemicals are mixed wrong (have you ever mixed ammonia and bleach in the bathroom and wondered why you started feeling funny?), or the vibration/pressure sensor is either disabled or reports that “everything is fine” right up until it explodes? How happy were the 800k people who lost power in the Ukraine about 2 years ago because they got infected? I bet there was at least SOME unpleasant impact beyond a few people having to huddle around a candle until the power came back on.

Additionally, these systems tend to be fragile. When they are being assessed, you can’t even do a basic discovery scan on them because you might cause a device to burp when it’s, say, pouring molten metal into a form. This is the equivalent of not being able to take something through a damp room to test if it is waterproof.

For numerous reasons, this is the prevalent state for many OT environments out there, not just our grid and critical infrastructure. Shockingly, the healthcare industry is often way behind the curve on cyber security. This is yet another example of where the potential impact in an incident will be loss of life.

The idea is not to incite panic, but we likely won’t seek change for things of which we are ignorant.

 

Story of John Henry

https://en.m.wikipedia.org/wiki/John_Henry_(folklore)

 

Movie: Zero Days

http://www.imdb.com/title/tt5446858/

Plan for Crisis

Pre-incident planning is the proverbial “ounce of prevention is worth a pound of cure”. If we look at the timeline for Puerto Rico, they were hit on Sep 20. As of Nov 11 I’m still reading stories of how they are “just starting to get more services online” (there was a story dated Oct 3 about getting some customers back online, then AT&T sent a COW (“cellular on wings/wheels”), and now a story says “70% of the population is back online”). 70%? After 47 days?

What if the COWs had already been down there? What about the day-old story of Google Loon?

I’m not criticizing, but what was the impact of WEEKS of no modern infrastructure? The death toll seems to be hard to pin down, but it started at 16, and has jumped to at least 45, but apparently none of the data is dependable. How many people died from heat exhaustion, thirst, hunger, exposure because they didn’t have access to water, food, electricity to power climate control? Or because they could not call for help?

We seem to be learning a few lessons, but who wants to wait WEEKS after a crisis to have modern necessities?

An ounce of prevention…