“Ship faster.” “Design for the cloud first.” In modern software development, imperatives like these mean that security and operations often take a back seat to feature delivery. DevSecOps is the trendy buzzword in this arena, but implementing DevSecOps requires breaking down every organizational silo. When a business’s overall security posture is in a critical state, this isn’t always an option.
Emphasizing development cycles while neglecting security and operations can leave organizations vulnerable to attack. A single publicly disclosed compromise can shatter customer trust and pose an existential threat to the business. Organizations that scale their operations and security in step with their software engineering will be far better positioned to grow and retain customers, and their trust, over the long term.
What Is SecOps?
SecOps is shorthand for security operations: implementing and maintaining security systems and processes, and continuously monitoring them to ensure an effective overall security posture against potential attacks. Although it can be tempting to reduce SecOps to “operations with a security focus,” doing so ignores the value to be gained from improving both operations and security as separate parts of a larger whole. Improving operational culture has the implicit benefit of creating a better security posture, while operationalizing security integrates security objectives into the day-to-day work of operations teams.
When Operational Culture Goes Bad
What does bad operational culture look like? It can manifest in several ways: unaddressed security vulnerabilities (usually a result of poorly crafted procedures), poor visibility into production environments, or manual security processes that create bottlenecks as the organization grows. Operational security can also be hindered by a lack of shared knowledge across teams and limited use of tools like contextual, searchable logs or dashboards. A classic example is alert fatigue: Do vulnerability alerts get ignored? Are logs stored and never seen again? Those are key indicators of a bad operational culture.
Focusing on operational culture can benefit every team involved in software engineering. Having effective operational processes, tooling, and systems is a foundational component not only in maintaining the security and stability of existing systems, but also in delivering new, working products and features quickly.
Why Not DevSecOps?
One of the immediate questions that comes to mind at this point is: “What about DevSecOps?” It’s a valid question; hot on the heels of the DevOps movement has been the security-focused philosophy of shifting security objectives left via DevSecOps. It is by no means a simple proposition, however, and it can be a tall ask for teams that are already changing their software development culture.
For most organizations, the transition to a DevOps-focused culture is a gradual process, with some growing pains along the way. Attempting to implement DevSecOps without the learning and culture building that comes from establishing a strong DevOps foundation is a recipe for failure. Most organizations in a pre-DevOps state need to do the work of implementing basic DevOps workflows; there aren’t a lot of shortcuts for making cultural changes. It’s these cultural changes that make adopting DevSecOps that much easier, or even possible. Automating and integrating security objectives into software development has the greatest impact when done within a healthy DevOps environment.
Frankly, there are plenty of low-hanging-fruit fixes that engineering organizations can undertake for quick security and operations wins without implementing DevSecOps. A quick example: do all of your systems log to a centralized, searchable interface? If not, fix that and you’ve improved your operations and security posture without making major organizational changes. DevSecOps is still an ideal to strive for, but engineering teams should pull the simpler operations and security levers first.
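As a rough illustration of the centralized logging fix, the sketch below emits each log record as one JSON object per line, a shape that most log aggregators can index and search. It uses only the Python standard library. The service name `billing-api`, the field names, and the `build_logger` helper are assumptions made for the example, and the in-memory buffer stands in for whatever shipping mechanism a real deployment would use.

```python
import io
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object."""

    def __init__(self, service: str) -> None:
        super().__init__()
        self.service = service  # hypothetical tag identifying the emitting service

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "ts": self.formatTime(record, "%Y-%m-%dT%H:%M:%S"),
            "level": record.levelname,
            "service": self.service,
            "logger": record.name,
            "message": record.getMessage(),
        }
        return json.dumps(payload)


def build_logger(service: str, stream) -> logging.Logger:
    """Return a logger that writes JSON lines to the given stream."""
    logger = logging.getLogger(service)
    logger.setLevel(logging.INFO)
    logger.handlers.clear()  # avoid duplicate handlers if called twice
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter(service))
    logger.addHandler(handler)
    logger.propagate = False
    return logger


# The buffer is a stand-in for stdout or a log shipper's input.
buffer = io.StringIO()
log = build_logger("billing-api", buffer)
log.info("login failed for user_id=%s", 42)

# Each line parses back into a dictionary with consistent keys.
event = json.loads(buffer.getvalue())
```

If every service emits the same keys, a single query in the central interface can filter by `service` or `level` across the whole infrastructure.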
Scalability is obviously a must-have in certain contexts, such as WAAP (web application and API protection). Clearly, organizations need their Web Application Firewall (WAF), protection against DDoS attacks, and other web security technologies to autoscale in response to attacks.
But all too often, software engineering organizations neglect to spend enough time focusing on their operational processes, tooling, and knowledge. It’s an understandable oversight; startups that are trying to get an MVP in front of customers and investors are focused on creating polished, working features. Unfortunately, small operational annoyances for small teams can become major scaling bottlenecks down the road.
Lean teams, like startups, often don’t have a dedicated infrastructure or DevOps engineer as part of the founding team. Software engineers need to build instrumentation interfaces into their applications early in the lifecycle of the product: they may not be used immediately, but they will make scaling crucial operations tooling—like monitoring and observability systems—so much easier as the infrastructure grows. Frameworks like OpenTelemetry can provide standardized interfaces for developers to instrument their applications.
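To make the idea of an early instrumentation interface concrete, here is a minimal sketch of a decorator that records call counts, error counts, and cumulative latency per operation. It is deliberately not the OpenTelemetry API: the `Metrics` class, the `instrument` decorator, and the `checkout` function are hypothetical names, and the in-memory counters stand in for a real metrics backend that could be swapped in later.

```python
import time
from collections import defaultdict
from functools import wraps


class Metrics:
    """In-memory stand-in for a real metrics backend (hypothetical)."""

    def __init__(self) -> None:
        self.calls = defaultdict(int)
        self.errors = defaultdict(int)
        self.seconds = defaultdict(float)

    def instrument(self, name: str):
        """Decorator that counts calls and errors and accumulates latency."""
        def decorator(func):
            @wraps(func)
            def wrapper(*args, **kwargs):
                start = time.perf_counter()
                try:
                    return func(*args, **kwargs)
                except Exception:
                    self.errors[name] += 1
                    raise
                finally:
                    # Runs on success and on failure alike.
                    self.calls[name] += 1
                    self.seconds[name] += time.perf_counter() - start
            return wrapper
        return decorator


metrics = Metrics()


@metrics.instrument("checkout")
def checkout(cart_total: float) -> float:
    """Hypothetical business function used only to exercise the decorator."""
    if cart_total < 0:
        raise ValueError("cart total cannot be negative")
    return round(cart_total * 1.2, 2)


checkout(10.0)
try:
    checkout(-1.0)
except ValueError:
    pass
```

Because application code depends only on the decorator, the counters behind it can be replaced by a standardized framework without touching the instrumented functions.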
Another area of focus for scaling operations is knowledge sharing. Teams should build shared knowledge quickly, and in a form that can be organized, searched, and read asynchronously by others. Many engineers have worked in an environment where one highly tenured engineer knows all the ins and outs of the system, along with all the little tricks and hacks to fix any error. Inevitably, a problem arises while that engineer is on vacation, sick, or, worse yet, gone from the company, and the remaining engineers are left in a mad scramble to piece together the knowledge needed to restore functionality. Shared knowledge tools like Confluence or Notion, combined with a cultural focus on effective documentation, help avoid these scenarios.
Outages and Disaster Recovery
During an outage, every minute spent working out which alerting or monitoring system holds useful data is a minute not spent on resolution and, ultimately, a worse user experience. Establishing a limited number of approved tools and dashboards helps deliver a single-pane-of-glass experience, so that everyone operates from a shared, canonical understanding of the issue. This is especially important for web security, since real-time traffic visibility and control are vital during attacks.
On this theme, another major consideration should be putting in place some form of disaster recovery. At the very least, a small team of software engineers should perform a basic thought experiment: “If the entire application went down or was compromised, what is the bare minimum needed to restore it to working functionality?” Unfortunately, very little thought often goes into even basic disaster recovery operations until an actual disaster happens, and at that point it’s usually too late to stage an effective recovery that provides business continuity. Crafting good operational documentation, as well as implementing immutability in development and infrastructure, can help make disaster recovery operations easier. Ideally, engineers are running “gameday” recovery scenarios early and often in the application lifecycle.
The magic of SecOps is that security and operations both benefit from a focus on operational excellence. Operations teams improve their response times and accuracy in dealing with outages and potential security events, and security teams can maintain a strong security posture while still maintaining a focus on providing a good user experience.
Security needs to strike a careful balance between risk and convenience. In reality, this is an exercise that continues in perpetuity; the modern software engineering landscape is always changing, bringing new security threats and operational issues that must be addressed. Heavy-handed security policies will significantly reduce risks, but at what cost? If developers feel constrained by security rules, they will find shortcuts and sidestep prescribed processes, or, in a worst-case scenario, even seek a new role outside the organization. Conversely, tailoring security entirely for maximum convenience will likely result in a poor security posture. The best security teams actively and consistently evaluate the risk/convenience trade-off in their environments.
Security also needs to be able to scale with the development teams and the applications that they support. That means focusing on operational scaling; automation and knowledge sharing are critical parts of this strategy. Manual security processes will be a bottleneck at some point if the process stays manual while the engineering team inevitably grows. Good security leadership will position their teams to provide value continuously at different stages of company and infrastructure maturity. This means a willingness to scale and iterate as the situation demands it: targeting the right skill sets when hiring, maintaining a superior developer experience, and having a solid operational foundation in place are all critical elements. Bad security leadership will only wonder how they can scale up their team’s ability to say “no” more often.
From a technical perspective, the primary theme for a security team that wants to scale operationally should be centralization. Modern security teams are often thinly staffed relative to the development teams they are expected to support, so any tool that can provide management leverage at scale should be employed.
- Identity: Gone are the days of separate user accounts on every server and node. Developer identities and access should be managed from a centralized interface. If an identity is compromised, security should be able to revoke its access to every system from a single tool. Try FoxPass and Okta to get started.
- Observability/Monitoring: If operations, development, and security teams all use different tools to provide visibility into the health and status of the infrastructure, converging on a clear and consistent picture of an operational or security issue will be that much harder. Standardizing on something like Prometheus/Grafana permits centralization of monitoring.
- Knowledge: Depending on implicit or tribal knowledge in engineering is a short road to operational pain and suffering. Any team with internal customers and/or infrastructure responsibilities should maintain an up-to-date and well-organized documentation hierarchy using software like Confluence or Notion.
- Communication: Converging on specific topics or issues, like an outage or compromise, depends on a unified and timestamped communication medium with well-defined processes and rules for event management. If there are two or more separate communication tools in use, incident response will be more chaotic and less effective. Tools like Slack are instrumental in this regard.
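The identity item in the list above can be sketched as a single directory that knows every grant, so that one call revokes a compromised identity everywhere. This is a toy model under stated assumptions: `AccessDirectory`, its methods, and the identity and system names are all hypothetical and do not reflect the API of any real identity provider.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class AccessDirectory:
    """Hypothetical single source of truth for who can reach which system."""

    grants: Dict[str, Set[str]] = field(default_factory=dict)

    def grant(self, identity: str, system: str) -> None:
        """Record that an identity may access a system."""
        self.grants.setdefault(identity, set()).add(system)

    def is_allowed(self, identity: str, system: str) -> bool:
        """Check access against the central record."""
        return system in self.grants.get(identity, set())

    def revoke_all(self, identity: str) -> List[str]:
        """Remove every grant for an identity and report what was revoked."""
        return sorted(self.grants.pop(identity, set()))


directory = AccessDirectory()
directory.grant("dev-alice", "prod-db")
directory.grant("dev-alice", "ci-runner")
directory.grant("dev-bob", "ci-runner")

# One call removes the compromised identity from every system it could reach.
revoked = directory.revoke_all("dev-alice")
```

The returned list doubles as an audit record of exactly which systems the identity could reach at the moment of revocation.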
Not Just Another Ops
Security Operations isn’t just another trendy concept in the “-Ops” pile. There are lots of marketing angles for defining SecOps, but at its core it is a shared focus on operational excellence that benefits both security and operations. While DevSecOps is likely the ultimate goal for most software-based organizations, SecOps can be a significant stop on that journey.