Skip to content

RESOLVED: Blog offline for maintenance

Announcements

  • 0 Votes
    1 Posts
    219 Views

    Once in every while, you encounter a repetitive issue that no matter what you try to do to resolve it, the problem manifests itself over and over again - sometimes, even on a daily basis. Much of how the issue is remediated really depends on the person assigned to the task.

    You might be puzzled at why I’d write about something like this, but it’s a situation I see constantly - one I like to refer to as “over thinker syndrome”. What do I mean by this ? Here’s the theory. Some people are very analytical when it comes to problem solving. Couple that with technical knowledge and you could land up with a situation where something relatively simple gets blown out of all proportion because the scenario played out in the mind is often much further from reality than you’d expect. And the technical reasoning is usually always to blame. Sometime around 2007, a colleague noticed that the Exchange Server (2003 wouldn’t you know) would suddenly reboot half way through a backup job. Rightly so, he wanted to investigate and asked me if this would be ok. Anyone with an ounce of experience knows that functional backups are critical in the event of a disaster - none more so than I - obviously, I have the go ahead. One bright spark in my team suggested a reboot of the server, which immediately prompted the response

    “…it’s rebooting itself every day, so how will that help ?”

    The investigation

    Joking aside, we’ve all heard the “have you rebooted” question touted at some point during helpdesk discussions, but this one was different. A system rebooting itself is usually symptomatic of an underlying issue somewhere, and my team member was ready for the task ahead. Stepping up to the plate, he asked if it was ok to install some monitoring software on the server. Usually, installing additional software components in a production server without testing first is a non-starter, but seeing as we needed to get this resolved as quickly as possible to reinstate the nightly backup (which incidentally hasn’t run successfully for 3 days by now), I provided approval to proceed without question. There’s a leap of faith at this point, as you could cause more problems than that you actually set out to resolve in the first place, but, as with anything related to information technology, someone’s you have to accept an element of risk. The software itself was actually for the RAID controller and motherboard  The assigned technician had already decided it was related to something along the lines of a faulty RAM module, or perhaps an issue with the controller itself. My thoughts leaned elsewhere already at this point - is the server reboots itself at exactly the same time every day then there is an established pattern which should be investigated first. It’s a logical approach, but it’s a common trait for technical support staff to sometimes think outside of the box - or in this case, outside of the building. Not wanting to push my opinion, or trample on anyone’s toes, I decided to remain quiet and see just how far this would go before intervention was required.

    In this case, not very far. The following morning after another unannounced nightly reboot, the error “the previous shutdown at [insert time and date here] was unexpected” showed up in the event log. No real surprises there, and once again, exactly the same time as the previous night. I asked my technician for an update, and he informed me that he believed that the memory was faulty and somehow causing the server to blue screen and reboot. That was actually a reasonable response and so I commended him on his research and findings, but also reminded him to perform a manual backup so that we at least had something to revert to in the event of a failure. Later that afternoon, the same tech approached me and said that he had ordered some replacement memory, and wanted to arrange downtime to fit it. Trying to keep a poker face and remain passive, I agreed and the memory was replaced the same evening around 10pm. At 2am the following morning, kaboom ! - the server rebooted itself again. Not wanting to admit defeat, our courageous tech suggested that the problem could be due to the system overheating. Another fair point, but not realistic as you’d see this in event log as a thermal shutdown. I willingly entertained this, and allowed investigations into the CPU temperature to begin - after another manual backup. Unsurprisingly, the temperature data returned no smoking gun, so that was abandoned. The next port of call was to reapply the service pack. Now, I’ll admit that this used to fix a multitude of issues under Windows NT Server (particularly Service Pack 4) but not under Windows 2003. I declined this for obvious reasons - if you reapply the service pack, you run the risk of overwriting key DLL files that could (and often will) render Exchange inoperable. Not being prepared to introduce an unprecedented risk into what was already becoming something of a showcase, I suggested that we look elsewhere.

    The exasperation

    The final (and honestly more realistic suggestion) was to enable verbose logging in Exchange. This is actually a good idea, but only if you suspect that the information store could be the issue. Given the evidence, I wasn’t convinced. If there was corruption in the store, or on any of the disks, this would show itself randomly through the day and wouldn’t wait until 2am in the morning. Not wanting to come across as condescending, I agreed, but at the same time, set a deadline to escalation. I wasn’t overly concerned about the backups as these were being completed manually each day whilst the investigations were taking place. Neither was I concerned at what could be seen at this point as wasting someone’s time when you think you may have the answer to what now seemed to be an impossible problem. This is where experience will eclipse any formal qualifications hands down. Those with university degrees may scoff at this, but those with substantially analytical thinking patterns seem to avoid logic like the plague and go off on a wild tangent looking for a dramatically technical explanation and solution to a problem when it’s much simpler than you’d expect. Hence the title of this article - Avoid the “bulldozer to find a china cup” scenario. After witnessing another pained expression on the face of my now exasperated and exhausted tech, I said “let’s get a coffee”. In agreement, he followed me to the kitchen and then asked me what I thought the problem could be. I said that if he wanted my advice, it would be to step back and look at this problem from a logical angle rather than technical. The confused look I received was priceless - the guy must have really though I’d lost the plot. After what seemed like an eternity (although in reality only a few seconds) he asked me what I meant by this. “Come with me”, I said. Finishing his coffee, he diligently followed me to the server room. Once inside, I asked him to show me the Exchange Server. Puzzled, he correctly pointed out the exact machine. I then asked him to trace the power cables and tell me where they went.

    As with most server rooms, locating and identifying cables can be a bit of a challenge after equipment has been added and removed, so this took a little longer than we expected. Eventually, the tech traced the cables back to

    …an old looking UPS that had a red light illuminated at the front like it had been a prop in a Terminator film.

    The realisation

    Suddenly, the real cause of this issue dawned on the tech like a morning sunrise over the Serengeti. The UPS that the Exchange Server was unexpectedly connected to had a faulty battery. The UPS was conducting a self test at 2am each morning, and because the bypass test failed owing to the burnt battery, the connected server lost power and started back up after the offending equipment left bypass mode and went online.

    Where is this going you might ask ?  Here’s the moral of this (particular, and many others like it) story

    Just because a problem involves technology, it doesn’t mean that the answer has to be a complex technical one Logic and common sense has a part to play in all of our lives. Sometimes, it makes more sense just to step back, take a breath, and see something for what it really is before deciding to commit It’s easy to allow technical expertise to cloud your judgement - don’t fall into the trap of using a sledgehammer to break an egg You cannot buy experience - it’s earned, gained, and leaves an indelible mark

    Let’s hear your views. Did you ever come across a situation where no matter what you tried, nothing worked ? Did the solution turn out to be much simpler than you’d have ever thought ?

  • 0 Votes
    1 Posts
    342 Views

    expert.webp
    One thing I’ve seen a lot of over my career is the “expert” myth being touted on LinkedIn and Twitter. Originating from psychologist K. Anders Ericsson who studied the way people become experts in their fields, and then discussed by Malcolm Gladwell in the book, “Outliers“, “to become an expert it takes 10,000 hours (or approximately 10 years) of deliberate practice”. This paradigm (if you can indeed call it that) has been adopted by several so called “experts” - mostly those within the Information Security and GDPR fields. This article isn’t about GDPR (for once), but mostly those who consider themselves “experts” by virtue of the acronym. Prior to it’s implementation, nobody should have proclaimed themselves a GDPR “expert”. You cannot be an expert in something that wasn’t actually legally binding until May 25 2018, nor can you have sufficient time invested to be an expert since inception in my view. GDPR is a vast universe, and you can’t claim to know all of it.

    Consultant ? Possibly, yes. Expert ? No.

    The associated sales campaign isn’t much better, and can be aligned to the children’s book “Chicken Licken”. For those unfamiliar with this concept, here is a walkthrough. I’m sure you’ll understand why I choose a children’s story in this case, as it seems to fit the bill very well. What I’ve seen over the last 12 months had been nothing short of amazing - but not in the sense of outstanding. I could align GDPR here to the PPI claims furore - for anyone unfamiliar with what this “uprising” is, here’s a synopsis.

    The “expert” fallacy

    Payment Protection Insurance (PPI) is the insurance sold alongside credit cards, loans and other finance agreements to ensure payments are made if the borrower is unable to make them due to sickness or unemployment. The PPI scandal has its roots set back as far as 1998, although compensatory payments did not officially start until 2011 once the review and court appeal process was completed. Since the deadline for PPI claims has been announced as August 2019, the campaign has become intensively aggressive, with, it would seem, thousands of PPI “experts”. Again, I would question the authenticity of such a title. It seems that everyone is doing it, therefore, it must be easy to attain (a bit like the CISSP then). I witnessed the same shark pool of so called “experts” before, back in the day when Y2K was the latest buzzword on everyone’s lips. Years of aggressive selling campaigns and similarly, years of FUD (Fear, Uncertainty, Doubt - more effectively known as complete bulls…) caused an unprecedented spike that allowed companies and consultants (several of whom had never been heard of before) to suddenly appear out of the woodwork and assume the identity of “experts” in this field. In reality, it’s not possible to be a subject matter expert in a particular field or niche market unless you have extensive experience. If you compare a weapons expert to a GDPR “expert”, you’ll see just how weak this paradigm actually is. A weapons expert will have years of knowledge in a field, and could probably tell you which gun discharged a bullet just by looking at the expended shell casing. I very much doubt a self styled GDPR expert can tell you what will happen in the event of an unknown scenario around the framework and the specific legal rights (in terms of the individual who the data belongs to) and implications for the institution affected. How can they when nobody has even been exposed to such a scenario before ? This makes a GDPR expert in my view about as plausible as a Brexit expert specialising in Article 50.

    What defines an expert ?

    The focal point here is in the comparison. A weapons expert can be given a gun and a sample of shell casings, then asked to determine if the suspected weapon actually fired the supplied ammunition or not. Using a process of proven identification techniques, the expert can then determine if the gun provided is indeed the origin. This information is derived by using established identity techniques from the indentations and markings in the shell casing created by the gun barrel from which the bullet was expelled, velocity, angle, and speed measurements obtained from firing the weapon. The impact of the bullet and exit damage is also analysed to determine a match based on material and critical evidence. Given the knowledge and experience level required to produce such results, how long do you think it took to reach this unrivalled plateau ? An expert isn’t solely based on knowledge. It’s not solely based on experience either. In fact, it’s a deep mixture of both. Deep in the sense of the subject matter comprehension, and how to execute that same understanding along with real life experience to obtain the optimum result. Here’s an example   An information technology expert should be able to

    Identify and eliminate potential bottlenecks Address security concerns, Design high availability Factor in extensible scalability Consider risk to adjacent and disparate technology and conduct analysis Ensure that any design proposal meets both the current criteria and beyond Understand the business need for technology and be able to support it

    If I leveraged external consultancy for a project, I’d expect all of the above and probably more from anyone who labels themselves as an expert - or for that fact, an architect. Sadly, I’ve been disappointed on numerous occasions throughout my career where it became evident very quickly that the so called expert (who I hasten to add is earning more an hour than I do in a day in most cases) hired for his “expertise and superior knowledge” in fact appears to know far less than I do about the same topic.

    How long does it really take to become an expert ?

    I’ve been in the information technology and security field since I was 16. I’m now 47, meaning 31 years experience (well, 31 as this year isn’t over yet). If you consider that experience is acquired during an 8 hour day, and used the following equation to determine the amount of years needed to reach 10,000 hours

    10000 / 8 / 365 = 3.4246575342 - for the sake of simple mathematics, let’s say 3.5 years.

    However, in the initial calculation, it’s 10 years (using the basis of 90 minutes invested per day) - making the expert title when aligned to GDPR even more unrealistic. As the directive was adopted on the 27 April 2016, the elapsed time period isn’t even enough to carry the first figure cited at 3.5 years, irrespective of the second. The reality here is that no amount of time invested in anything is going to make your an expert if you do not possess the prerequisite skills and a thorough understanding based on previous events in order to supplement and bolster the initial investment. I could spend 10,000 practicing a particular sport - yet effectively suck at it because my body (If you’ve met me, you’d know why) isn’t designed for the activity I’m requesting it to perform. Just because I’ve spent 10,000 hours reading about something doesn’t make me an expert by any stretch of the imagination. If I calculated the hours spanned over my career, I would arrive at the below. I’m basing this on an 8 hour day when in reality, most of my days are in fact much longer.

    31 x 365 x 8 = 90,520 hours

    Even when factoring in vacation based on 4 weeks per year (subject to variation, but I’ve gone for the mean average),

    31 x 28 X 8 = 6,944 hours to subtract

    This is only fair as you are not (supposed to be) working when on holiday. Even with this subtraction, the total is still 83,578 hours. Does my investment make me an expert ? I think so, yes - based on the fact that 31 years dedicated to one area would indicate a high level of experience and professional standard - both of which I constantly strive to maintain. Still think 10,000 hours invested makes you an expert ? You decide ! What are your views around this ?

  • 0 Votes
    1 Posts
    180 Views

    dc1.webp
    Why is it that all outages seem to happen at 5:30pm on a Friday afternoon ? Back in the day during 1998 when DEC (yeah, I’m old - shoot me) was still mainstream and Windows NT Server 4.0 was the latest and greatest, I was working for a commodity trading firm in the West End as an IT Manager. The week had typically gone by with the usual activity - nothing too major to report apart from the odd support issue and the usual plethora of invoices that needed to be approved. Suddenly, one of my team emerged from the comms room and informed me that they had spotted a red light on one of the disks sitting in the Exchange server. I asked which disk it was, and said we’d need to get a replacement.

    For those who haven’t been in this industry for a years (unlike me) DEC (Digital Equipment Corporation) was a major player in previous years, but around 1998 started to struggle - it was then acquired by Compaq (who later on down the line in 2002 were acquired themselves by Hewlett Packard). This server was a beast - a DEC server 5000 the size of an under the counter fridge with a Mylex DAC960 RAID controller. It was so large, it had wheels with brakes. And, like a washing machine, was incredibly heavy. I’m sure the factory that manufactured servers in the 90’s used to pour concrete in them just for a bit of fun…

    Here’s a little glimpse for nostalgia purposes
    decserver5000.webp
    Those who remember DEC and it’s associated Mylex DAC960 RAID controller will also recall that the RAID5 incarnation was less than flawless. In modern RAID deployments, if a disk was marked as faulty or defunct, the controller would effectively blacklist the disk meaning that if it were to be removed then reinserted, any bad blocks would not be copied into the array hence causing corruption - it would be rejected.

    Well, that’s how modern controllers work. Unfortunately, the DAC960 controller was one of those boards that when coupled with old firmware and the NT operating system created the perfect storm. It was relatively well documented at the time that plugging a faulty drive back into an array could cause corruption and spell disaster. My enterprising team member had spotted the red light on the drive, then decided to eject it out of the array. For some unknown reason, instead of taking it back to his desk to order a replacement, he reinserted the it back into the array. Now, for those of you that actually remember the disks that went inside a DEC server 5000, you’ll know that these things were like bricks in plastic containers. They were around 3 inches in height, about 6 inches long, and quite heavy. These drives even had a eject clip on each side meaning that you had to press both sides of the disk carrier and then slide out the drive before it could be fully removed. Inserting a replacement drive required much the same effort (except in reverse), and provided a satisfyingly secure “clunk” as the interface of the drive made contact with the RAID controller bus.

    No sooner had I said the words

    “…please tell me you didn’t plug that disk back in……”

    to my team member, our central helpdesk number lit up like a Christmas tree in Times Square with users complaining they couldn’t get into email. I literally ran into the comms room and found the server with all drive bays lit solidly as if suspended in its own cryogenic state. For sake of schematics, a standard RAID5 configuration looks like the below. Essentially, the “p” component is parity. This is the stripe that contains information about the array and is spread across all disks that are members. In the event that one fails, the data is still held across the remaining drives, meaning still accessible - with a reduction in performance. The data is written across the disks in one write like a stripe (set).
    raid5_ok.webp
    At this point I’d already realised that the array had been corrupted by the returning faulty disk, and the bad stripe information was now resident on all the remaining drives. Those who understand RAID will know that if one drive in a RAID5 set fails, you still have the other remaining drives as a resilient array - but not if they are all corrupted. What I am alluding to here is shown below. The stripe was now unreadable, therefore, none of the disks were accessible
    raid5_broken.png.webp
    The server had completely frozen up and would not respond. I’m no fan of force powering a server off in the best of circumstances, but what choice did we have ?

    The server was powered off, then turned back on again. I really was hoping that this was just a system freeze and a reboot would make all our problems go away. The less naïve and experienced part of me dragged my legs towards the backup storage area (yes, we had a rotation pool of 2 weeks on site and 2 weeks off), and started collecting the previous day’s backup from the safe. As it stands, this was clearly the next logical step. Upon restart, we were met with the below shortly after NTOSKERNEL completed it’s checks
    bsod.webp
    (Not the actual BSOD of course - camera phones didn’t exist in 1998 - but as close as it gets)

    Anyone familiar with the Windows operating system will have bumped into this at some point in their career, and by the more commonplace acronym BSOD (Blue Screen Of Death). Either way, it’s never a good sign when you are trying to recover a system. One of the best messages displayed by a BSOD is

    IRQL_NOT_LESS_OR_EQUAL

    I say “best” with a hint of sarcasm of course as this message is completely useless and doesn’t mean anything to anyone as such. As the internet back in 1998 was fairly infantile, gaining a decent insight wasn’t as simple or clear cut as it is today. Looking at the problem from a sensible angle, it was fairly obvious that the DAC960 controller had either failed completely, or couldn’t read the disks and caused the crash. Deciding not to invest too much time in getting this system back to life, I fired up it’s dormant sister (yes, we had two fridges :)) which started with no issues. This secondary server was originally purchased to split the load of the mailboxes across two servers for resilience purposes, but never came to fruition owing to a backlog of other projects that were further up the chain of importance. Had this exercise have taken place, only 50% of the office would have been impacted - typical.

    With the server started, we then began the process of installing Exchange. Don’t get too excited - this was Exchange 5.5 and didn’t have any formal link to Active Directory, so it was never going to be the case of installing Exchange in disaster recovery mode, then playing back the database. Nope. This was going to be a directory restore first, followed by the Information Store.

    With Exchange installed and the previous service packs and hotfixes applied (early versions of Exchange had a habit of not working at all after a restore unless the patching​ level was the same), BackupExec 6.2 (yes, I know) was set to restore to an alternative Exchange server, and the tapes loaded into the robotic arm cradle. In hindsight, it would have been a better option to install BackupExec on the Exchange server itself, and connect the tape drive to the SCSI connector. However, can you find a cable when you really need one ? In any case, the server was SCSI2 when the loader was SCSI1. This should have set alarm bells ringing at the time, but with the restore started, we went back to our seats - I then began the task of explaining to senior management about the cause of the outage and what we were doing to resolve the problem. As anyone with experience of Microsoft systems knows, attempting to predict the time to restore or copy anything (especially back in the 90’s) wasn’t a simple task, as Windows had a habit of either exaggerating the time, or sitting there not responding for ages.

    Rather like a 90’s Wikipedia, NT wasn’t known for it’s accuracy.

    I called home and solemnly declared I was in for a long night. It’s never easy explaining the reasons why or attempting to justify the reasons you need to work late to family members, but that’s another story. Checking on the progress of the restore, we were averaging speeds of around 2Mbps ! Cast your mind back to 1998 and think of the surrounding technology. Back in the (not so good) old days, modern switching technology and 10Gbps networks were non existent. We were stuck with old 3Com 10Mbps hubs and an even slower Frame Relay connection (256k with 128k ISDN backup) as the gateway. To make matters worse, our internet connection was based on dialup technology using a SHIVA LanRoverE. Forget 1Gb fibre - this thing dished out an awesome [sic] 33.6k speed or even 56k if you were using ISDN. Web Pages loading in about 20 seconds was commonplace - downloading drivers was an absolute nightmare as you can imagine.

    Back to the restore. Having performed the basic math, and given the size of the databases (around 70Gb on a DLT 40 that was compressed to 80Gb), this was going to take over 24 hours. If you think about how hubs used to work, this meant that the 10Mbps speed of the device was actually shared across all 24 ports. This effectively reduces the port speed to 0.42Mbps - and that really depends on what the other ports are doing at the time. The restore rate remained at around 2Mbps for hours, and rather than everyone sit there watching water evaporate, I sent home the remaining staff and told them to be on standby for the entire weekend. I really couldn’t stomach food at this point, and ended up working into the night on other open tasks in an effort to catch up. I ended up falling asleep at my desk around 2am, and then being woken by the sound of my mobile (a Nokia of course) ringing. Looking at the clock, it was 5am. Checking the restore, it had progressed to the information store itself and was around 60% completed. After another 15 hours in the office, the restore finally completed.

    Having restarted all of the Exchange services, even the information store came up, which really was good news. However, browsing through the mailboxes I noticed that only a quarter of the 250+ I was expecting were listed. Not knowing much about the Exchange back end at the time, I contacted a so-called Exchange specialist based in Switzerland (in case you’re wondering, we were a Swiss headquartered entity, and all external support came from there). This Exchange specialist informed me that the backup hadn’t completed properly, and a set of commands needed to be run in BackupExec to resolve this. Of course, this also meant that the restore process had to be restarted - there goes another 24+ hours, I thought to myself. With the new “settings applied” and the restore process restarted, I decided that I wasn’t going to sit in the office for another day waiting for the restore to complete, and so I decided to call one of my team to come in and occupy the watchtower.

    Getting hold of someone was much more difficult than I had imagined. After letting the remainder of the team go, they all forged an exodus to the nearest door like iron filings to a magnet. So much for team ethic I thought. Eventually, I managed to get hold of a colleague who, after much griping, agreed to come into the office. I wouldn’t have minded as much if he didn’t live less than 15 minutes away, but that’s another story. My colleague arrived around 30 minutes later, and then I left the office. Getting home wasn’t a simple task. In the UK, there are often engineering works taking place over the weekend - particularly on the tube, and in most cases, local rail providers also - mine included. What should have taken about 2 hours maximum took 4, and by the time I got home, I flopped into bed exhausted. Needless to say this didn’t go down particularly well with my wife who saw me last on the previous morning - especially as after 3 hours of restlessness and a general inability to sleep, I was called by senior management - and was asked to go back in.

    By now, my already frustrated wife’s temperature went from 36.9c to an erupting volcano equivalent in less than a split second. I fully appreciated her response, but I was young (well, younger), eager to impress, and also had a sense of ownership. After a somewhat heated exchange, I left for the office. I arrived in much the same time as it took me to get home in the first place, and found that the restore was of course still running. My colleague made some half baked excuse that he needed to leave the office as he had a “family emergency”. Not really in the mood to argue this, I let him leave. I then got on a conference call with the consultant we had been using. Unsurprisingly, the topic of the restore time came up.

    “…You have a very slow network…” said the consultant.

    “…No s**t Sherlock…”  I thought. “…Do you honestly think I’m sitting here for my health ? …”

    I politely “agreed”.

    Eventually, the restore process completed. With a sudden feeling of euphoria, I went back into the comms room to start the services and… to my dismay, found once again that only a third of the recipients appeared in the directory. The term “FFS” didn’t go anywhere near being an accurate portrayal of my response. I was brutally upset. Hopelessly crushed. On the verge of losing it… (ok, perhaps that’s overkill). There had to be a reason for this. Something we’d missed, or just didn’t understand. I went looking for answers on a 1998 version of Yahoo (actually, I think it may have been Lycos), and found an article relating to the DS/IS Consistency Adjuster in Exchange 5.5 - this isn’t the exact resource I found, but it goes a long way to describe the fundamental process. The upshot is that the consistency adjuster needed to be run to synchronise the once orphaned mailboxes with the directory service. This entire process took​ a couple of hours - whilst that seems inconceivable to even the extreme Luddite, this is 1998 with SCSI1 drives, a Pentium II Processor, and 512Mb ram.

    After the process completed (which incidentally looked like this)
    dsisadjuster.webp
    I could then see all mailboxes ! After performing several somersaults around the office (just kidding here, but I can tell you I felt like doing it), I confirmed with a 25% random user test that I had access to mailboxes. Unfortunately, I couldn’t see any new mail arriving, but that was only due to a stalled mail connector on the server in Switzerland that received external mail. After a quick reboot of this gateway, mail began to flow. After around an hour of testing, I was happy that everything was working as expected. As for the consultant who had just wasted hours of my life, it’s just as well he wasn’t in the same country as me, let alone room. I went home elated - to an extremely angry wife. She’s since forgiven me of course, and now looking back, I really appreciate why - she was looking out for me, and concerned - I just didn’t appreciate that at the time.

    Come Monday morning, users were back into email with everything working as expected. An emergency Exchange backup had been run, and I was in the process of writing up my postmortem report for senior management. I then got a phone call. Anyone remember a product by Fenstrae called Faxination ? This was peered with Exchange 5.5, and had stopped working since the crash. The head of operations demanded that this was resolved as a priority… Another late night… another argument at home, but that’s a story for another day.

  • 0 Votes
    1 Posts
    192 Views

    bg-min-dark.webp
    It’s a common occurrence in today’s modern world that virtually all organisations have a considerable budget (or a strong focus on) information and cyber security. Often, larger organisations spend millions annually on significant improvements to their security program or framework, yet overlook arguably the most fundamental basics which should be (but are often not) the building blocks of any fortified stronghold.

    We’ve spent so much time concentrating on the virtual aspect of security and all that it encompasses, but seem to have lost sight of what should arguably be the first item on the list – physical security. It doesn’t matter how much money and effort you plough into designing and securing your estate when you consider how vulnerable and easily negated the program or framework is if you neglect the physical element. Modern cyber crime has evolved, and it’s the general consensus these days that the traditional perimeter as entry point is rapidly losing its appeal from the accessibility versus yield perspective. Today’s discerning criminal is much more inclined to go for a softer and predictable target in the form of users themselves rather than spend hours on reconnaissance and black box probing looking for backdoors or other associated weak points in a network or associated infrastructure.

    Physical vs virtual

    So does this mean you should be focusing your efforts on the physical elements solely, and ignoring the perimeter altogether ? Absolutely not – doing so would be commercial suicide. However, the physical element should not be neglected either, but instead factored into any security design at the outset instead of being an afterthought. I’ve worked for a variety of organisations over my career – each of them with differing views and attitudes to risk concerning physical security. From the banking and finance sector to manufacturing, they all have common weaknesses. Weaknesses that should, in fact, have been eliminated from the outset rather than being a part of the everyday activity. Take this as an example. In order to qualify for buildings and contents insurance, business with office space need to ensure that they have effective measures in place to secure that particular area. In most cases, modern security mechanisms dictate that proximity card readers are deployed at main entrances, rendering access impossible (when the locking mechanism is enforced) without a programmed access card or token. But how “impossible” is that access in reality ?

    Organisations often take an entire floor of a building, or at least a subset of it. This means that any doors dividing floors or areas occupied by other tenants must be secured against unauthorised access. Quite often, these floors have more than one exit point for a variety of health and safety / fire regulation reasons, and it’s this particular scenario that often goes unnoticed, or unintentionally overlooked. Human nature dictates that it’s quicker to take the side exit when leaving the building rather than the main entrance, and the last employee leaving (in an ideal world) has the responsibility of ensuring that the door is locked behind them when they leave. However, the reality is often the case instead where the door is held open by a fire extinguisher for example. Whilst this facilitates effective and easy access during the day, it has a significant impact to your physical security if that same door remains open and unattended all night. I’ve seen this particular offence repeatedly committed over months – not days or weeks – in most organisations I’ve worked for. In fact, this exact situation allowed thieves to steal a laptop left on the desk in an office of a finance firm I previously worked at.

    Theft in general is mostly based around opportunity. As a paradigm, you could leave a £20 note / $20 bill on your desk and see how long it remained there before it went missing. I’m not implying here that anyone in particular is a thief, but again, it’s about opportunity. The same process can be aligned to Information security. It’s commonplace to secure information systems with passwords, least privilege access, locked server rooms, and all the other usual mechanisms, but what about the physical elements ? It’s not just door locks. It’s anything else that could be classed as sensitive, such as printed documents left on copiers long since forgotten and unloved, personally identifiable information left out on desks, misplaced smartphones, or even keys to restricted areas such as usually locked doors or cupboards. That 30 second window could be all that would be required to trigger a breach of security – and even worse, of information classed as sensitive. Not only could your insurance refuse to pay out if you could not demonstrate beyond reasonable doubt that you had the basic physical security measures in place, but (in the EU) you would have to notify the regulator (in this case, the ICO) that information had been stolen. Not only would it be of significant embarrassment to any firm that a “chancer” was able to casually stroll in and take anything they wanted unchallenged, but significant in terms of the severity of such an information breach – and the resultant fines imposed by the ICO or SEC (from the regulatory perspective – in this case, GDPR) – at €20m or 4% of annual global (yes, global) turnover (if you were part of a larger organisation, then that is actually 4% of the parent entity turnover – not just your firm) – whichever is the highest. Of equal significance is the need to notify the ICO within 72 hours of a discovered breach. In the event of electronic systems, you could gain intelligence about what was taken from a centralised logging system (if you have one – that’s another horror story altogether if you don’t and you are breached) from the “electronic” angle of any breach via traditional cyber channels, but do you know exactly what information has taken residence on desks ? Simple answer ? No.

    It’s for this very reason that several firms operate a “clean desk” policy. Not just for aesthetic reasons, but for information security reasons. Paper shredders are a great invention, but they lack AI and machine learning to wheel themselves around your office looking for sensitive hard copy (printed) data to destroy in order for you to remain compliant with your information security policy (now there’s an invention…).

    But how secure are these “unbreakable” locks ? Despite the furore around physical security in the form of smart locks, thieves seem to be able to bypass these “security measures” with little effort. Here’s a short video courtesy of ABC news detailing just how easy it was (and still is in some cases) to gain access to hotel rooms using cheap technology, tools, and “how-to” articles from YouTube.

    Surveillance systems aren’t exempt either. As an example, a camera system can be rendered useless with a can of spray paint or even something as simple as a grocery bag if it’s in full view. Admittedly, this would require some previous reconnaissance to determine the camera locations before committing any offence, but it’s certainly a viable prospect of that system is not monitored regularly. Additionally, (in the UK at least) the usage of CCTV in a commercial setting requires a written visible notice to be displayed informing those affected that they are in fact being recorded (along with an impact assessment around the usage), and is also subject to various other controls around privacy, usage, security, and retention periods.

    Unbreakable locks ?

    Then there’s the “unbreakable” door lock. Tapplock advertised their “unbreakable smart lock” only to find that it was vulnerable to the most basic of all forced entry – the screwdriver. Have a look at this article courtesy of “The Register”. In all seriousness, there aren’t that many locks that cannot be effectively bypassed. Now, I know what you’re thinking. If the lock cannot be effectively opened, then how do you gain entry ? It’s much simpler than you think. For a great demonstration, we’ll hand over to a scene from “RED” that shows exactly how this would work. The lock itself may have pass-code that “…changes every 6 hours…” and is “unbreakable”, but that doesn’t extend to the material that holds both the door and the access panel for the lock itself.

    And so onto the actual point. Unless your “unbreakable” door lock is housed within fortified brick or concrete walls and impervious to drills, oxy-acetylene cutting equipment, and proximity explosive charges (ok, that’s a little over the top…), it should not be classed as “secure”. Some of the best examples I’ve seen are a metal door housed in a plasterboard / false wall. Personally, if I wanted access to the room that badly, I’d go through the wall with the nearest fire extinguisher rather than fiddle with the lock itself. All it takes is to tap on the wall, and you’ll know for sure if it’s hollow just by the sound it makes. Finally, there’s the even more ridiculous – where you have a reinforced door lock with a viewing pane (of course, glass). Why bother with the lock when you can simply shatter the glass, put your hand through, and unlock the door ?

    Conclusion

    There’s always a variety of reasons as to why you wouldn’t build your comms room out of brick or concrete – mostly attributed to building and landlord regulations in premises that businesses occupy. Arguably, if you wanted to build something like this, and occupied the ground floor, then yes, you could indeed carry out this work if it was permitted. Most data centres that are truly secure are patrolled 24 x 7 by security, are located underground, or within heavily fortified surroundings. Here is an example of one of the most physically secure data centres in the world.

    https://www.identiv.com/resources/blog/the-worlds-most-secure-buildings-bahnhof-data-center

    Virtually all physical security aspects eventually circle back to two common topics – budget, and attitude to risk. The real question here is what value you place on your data – particularly if you are a custodian of it, but the data relates to others. Leaking data because of exceptionally weak security practices in today’s modern age is an unfortunate risk – one that you cannot afford to overlook.

    What are your thoughts around physical security ?

  • 87 Votes
    123 Posts
    10k Views

    @crazycells yes, sadly, we live in a world where racism is present in all walks of life, not excluding the Internet where it is arguably the worst purely because of the ability to remain anonymous and hide behind a keyboard.

    If this is what large scale language models are using for enrichment, then it’s no wonder we’ve landed up here.

  • 0 Votes
    3 Posts
    286 Views

    @justoverclock yes, completely understand that. It’s a haven for criminal gangs and literally everything is on the table. Drugs, weapons, money laundering, cyber attacks for rent, and even murder for hire.

    Nothing it seems is off limits. The dark web is truly a place where the only limitation is the amount you are prepared to spend.

  • 0 Votes
    1 Posts
    347 Views

    tech.jpeg
    Ever heard of KISS ? Nope - not these guys

    kiss.jpeg
    What I’m referring to is the acronym was reportedly coined by Kelly Johnson, lead engineer at the Lockheed Skunk Works (creators of the Lockheed U-2 and SR-71 Blackbird spy planes, among many others), which formed the basis of the relationship between the way things break, and the sophistication available to repair them. You might be puzzled at why I’d write about something like this, but it’s a situation I see constantly – one I like to refer to as “over thinker syndrome”. What do I mean by this ? Here’s the theory. Some people are very analytical when it comes to problem solving. Couple that with technical knowledge and you could land up with a situation where something relatively simple gets blown out of all proportion because the scenario played out in the mind is often much further from reality than you’d expect. And the technical reasoning is usually always to blame.

    Some years ago in a previous career, a colleague noticed that the Exchange Server (2003 wouldn’t you know) would suddenly reboot half way through a backup job. Rightly so, he wanted to investigate and asked me if this would be ok. Anyone with an ounce of experience knows that functional backups are critical in the event of a disaster – none more so than I – obviously, I gave the go ahead. One bright spark in my team suggested a reboot of the server, which immediately prompted the response

    “……it’s rebooting itself every day, so how will that help ?”

    There’s always one, isn’t there ? The final (and honestly more realistic suggestion) was to enable verbose logging in Exchange. This is actually a good idea, but only if you suspect that the information store could be the issue. Given the evidence, I wasn’t convinced. If there was corruption in the store, or on any of the disks, this would show itself randomly through the day and wouldn’t wait until 2am in the morning. Not wanting to come across as condescending, I agreed, but at the same time, set a deadline to escalation. I wasn’t overly concerned about the backups as these were being completed manually each day whilst the investigations were taking place. Neither was I concerned at what could be seen at this point as wasting someone’s time when you think you may have the answer to what now seemed to be an impossible problem. This is where experience will eclipse any formal qualifications hands down. Those with university degrees may scoff at this, but those with substantially analytical thinking patterns seem to avoid logic like the plague and go off on a wild tangent looking for a dramatically technical explanation and solution to a problem when it’s much simpler than you’d expect.

    After witnessing the pained expression on the face of my now exasperated and exhausted tech, I said “let’s get a coffee”. In agreement, he followed me to the kitchen and then asked me what I thought the problem could be. I said that if he wanted my advice, it would be to step back and look at this problem from a logical angle rather than technical. The confused look I received was priceless – the guy must have really though I’d lost the plot. After what seemed like an eternity (although in reality only a few seconds) he asked me what I meant by this. “Come with me”, I said. Finishing his coffee, he diligently followed me to the server room. Once inside, I asked him to show me the Exchange Server. Puzzled, he correctly pointed out the exact machine. I then asked him to trace the power cables and tell me where they went.

    As with most server rooms, locating and identifying cables can be a bit of a challenge after equipment has been added and removed, so this took a little longer than we expected. Eventually, the tech traced the cables back to

    ………an old looking UPS that had a red light illuminated at the front like it had been a prop in a Terminator film.

    Suddenly, the real cause of this issue dawned on the tech like a morning sunrise over the Serengeti. The UPS that the Exchange Server was unexpectedly connected to had a faulty battery. The UPS was conducting a self test at 2am each morning, and because the bypass test failed owing to the burnt battery, the connected server lost power and started back up after the offending equipment left bypass mode and went online.

    Where is this going you might ask ? Here’s the moral of this particular story

    Just because a problem involves technology, it doesn’t mean that the answer has to be a complex technical one Logic and common sense has a part to play in all of our lives. Sometimes, it makes more sense just to step back, take a breath, and see something for what it really is before deciding to commit It’s easy to allow technical expertise to cloud your judgement – don’t fall into the trap of using a sledgehammer to break an egg You cannot buy experience – it’s earned, gained, and leaves an indelible mark
  • 0 Votes
    1 Posts
    222 Views

    I’m excited to announce that a new blog section has been added 😛 The blog is actually using Ghost and not NodeBB, and also sits on it’s own subdomain of https://content.sudonix.com (if you ever fancy hitting it directly).

    We’ve moved all the blog articles out of the existing category here, and migrated them to the Ghost platform. However, you can still comment on these articles just like they were part of the root system. If you pick a blog article whilst logged in

    7e61c35b-2304-4c06-bda2-34da52252e1a-image.png

    Then choose the blog article you want to read

    7ca5089e-cf7e-4050-b951-5426fd1c6ec3-image.png

    Once opened, you’ll see a short synopsis of the article

    1bc086b4-5968-4e81-bc47-70de263b2275-image.png

    Click the link to read the rest of the post. Scroll down to the bottom, and you’ll see a space where you can provide your comments ! Take the time to read the articles, and provide your own feedback - I’d love to hear it.

    3f712e7c-475d-42d4-a5ca-b4becff6cc2e-image.png

    The blog component is not quite finished yet - it needs some polish, and there’s a few bugs scattered here and there, but these will only manifest themselves if a certain sequence of events is met.