LOPSA-NJ News Aggregator
The woes of a small infrastructure admin…
Before I start, I just want you to know that I’m not whining, I just thought I’d give this as an example of some of the things that people who run small infrastructures are left out of…
Today I’m sitting in the office in NJ, doing work as normal. What I’d prefer to be doing is going to the IT Roadmap Conference & Expo in NYC. According to the website, it’s “designed for IT professionals who want to cover multiple industry topics in one day”. That sounds like something I’d be interested in!
Essentially, it’s a sales pitch, or a series of sales pitches. I don’t know if I’m in the market for what they’re selling, but I’d like to go find out what is being offered. All the same, I like to keep my eyes on the horizon, because things have a habit of coming up quick on us in IT, and if we don’t familiarize ourselves with the likely technology of the next few years, then we’ll be caught with our pants down. So I wanted to see what people were selling.
The conference is free. All you have to do is fill out the application for registration. Unfortunately, I don’t qualify:
Dear Matt,
Thank you for your interest in Network World Live’s IT Roadmap Conference & Expo in New York.
Unfortunately, after reviewing the information that you submitted, we determined that at this time, we are not able to confirm your seat on a complimentary basis.
As we noted on the registration form, this event is geared towards network and IT professionals in end-user type companies who actively purchase products and services – or – who will be doing so in the near future. We have a limited number of complimentary seats reserved for attendees who meet this criteria.
…
snip
…
Walk-ins or ineligible applicants arriving at the conference facility will NOT be admitted on the day of the event.
Thank you,
IT Roadmap Team
Network World Events & Executive Forums
(emphasis theirs)
Well, I do actively purchase technologies and products, but not at the scale that they’re looking for, I suppose. I don’t have 50 data centers, or “20,000 or more” servers, so I don’t get to go to their party and look at the toys.
It’s unfortunate for them and me, but somehow I think I’ll live. I just wanted to give you a tangible example of…well…I won’t go so far as to say discrimination, but maybe exclusion, that we small admins deal with from vendors.
LOPSA Conference schedule published!
Back to our normally scheduled blog posts
Or as (ir)regular as they normally are. I really hope that you enjoyed the flashback week, and got something useful from it. I’m going to try to do it again next year on the first full week of March.
Now it’s just back to the daily grind for me. I’ve been rehashing some Nagios configuration and I’ve unearthed an ancient relic! How fun! Configuration archaeology is a hobby of mine, and to find a gem that hasn’t (as far as I can tell) been mentioned on the official site since 2002? That’s GREAT! I’ve still got to go through the source code to make sure that it doesn’t do anything interesting, but it’s out of my config now.
As it turns out, my recent attention to Nagios is multifaceted. I’m cleaning up the config and tightening up the alert rules, but I’m also going to be giving a 45-minute talk at the Professional IT Community Conference in May. If you’re in the northeast US, you should definitely make it! And you should hurry and register while the early bird special is going!
Designing a Secure Linux System
Bruce Schneier’s blog post about the Mariposa Botnet has an interesting discussion in the comments about how to make a secure system [1]. Note that the threat is considered to be remote attackers, which means viruses and trojan horses – including infected files run from USB devices (i.e. you aren’t safe just because you aren’t on the Internet). The threat we are considering is not people who can replace hardware in the computer (people who have physical access to it, which includes people who have access to where it is located or who are employed to repair it). This is the most common case: the risk involved in stealing a typical PC is far greater than whatever benefit might be obtained from the data on it – a typical computer user is at risk of theft only for the resale value of a second-hand computer.
So the question is, how can we most effectively use free software to protect against such threats?
The first restriction is that the hardware in common use is cheap and has little special functionality for security. Systems that have a TPM seem unlikely to provide a useful benefit due to the TPM being designed more for Digital Restrictions Management than for protecting the user – and due to TPM not being widely enough used.
The BIOS and the Bootloader
It seems that the first thing that is needed is a BIOS that is reliable. If an attacker manages to replace the BIOS then it could do exciting things like modifying the code of the kernel at boot time. It seems quite plausible for the real-mode boot loader code to be run in a VM86 session and to then have its memory modified before it switches to protected mode. Every BIOS update is a potential attack. Coreboot replaces the default PC BIOS; it initialises the basic hardware and then executes an OS kernel or boot loader [2] (the Coreboot Wikipedia page has a good summary). The hardest part of the system startup process is initialising the hardware, and Coreboot has that solved for 213 different motherboards.
If engineers were allowed to freely design hardware without interference then probably a significant portion of the computers in the market would have a little switch to disable the write line for the flash BIOS. I heard a rumor that in the days of 286 systems a vendor of a secure OS shipped a scalpel to disable the hardware ability to leave protected mode, cutting a track on the motherboard is probably still an option. Usually once a system is working you don’t want to upgrade the BIOS.
One of the payloads for Coreboot is GRUB. The GRUB Feature Requests page has as its first entry “Option to check signatures of the bootchain up to the cryptsetup/luksOpen: MBR, grub partition, kernel, initramfs” [3]. Presumably this would allow a GPG signature to be checked so that a kernel and initrd would only be used if they came from a known good source. With this feature we could boot only a known good kernel.
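As a rough illustration of the "only boot known good files" idea, here is a minimal sketch. It is not how GRUB does (or would do) it: it swaps the GPG signature check for a plain SHA-256 digest comparison, the file names and the manifest are hypothetical, and a real check would have to run in the boot loader with the manifest itself signed.

```python
import hashlib


def sha256_of(data: bytes) -> str:
    """Hex SHA-256 digest of a blob (stand-in for a real signature check)."""
    return hashlib.sha256(data).hexdigest()


def boot_chain_ok(files: dict, known_good: dict) -> bool:
    """Return True only if every expected boot file is present and unmodified."""
    for name, expected_digest in known_good.items():
        blob = files.get(name)
        if blob is None or sha256_of(blob) != expected_digest:
            return False
    return True


# Hypothetical boot files and a manifest recorded while the system was trusted.
kernel = b"pretend kernel image"
initrd = b"pretend initrd image"
manifest = {"vmlinuz": sha256_of(kernel), "initrd.img": sha256_of(initrd)}

good = boot_chain_ok({"vmlinuz": kernel, "initrd.img": initrd}, manifest)
tampered = boot_chain_ok({"vmlinuz": kernel + b"!", "initrd.img": initrd}, manifest)
```

With these inputs `good` is true and `tampered` is false, because a single changed byte in the kernel changes its digest.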
How to run User Space
The next issue is how to run the user-space. There has been no shortage of Linux kernel exploits and I think it’s reasonable to assume that there will continue to be a large number of exploits. Some of the kernel flaws will be known by the bad guys for some time before there are patches, some of them will have patches which don’t get applied as quickly as desired. I think we have to assume that the Linux kernel will be compromised. Therefore the regular user applications can’t be run against a kernel that has direct hardware access.
It seems to me that the best way to go is to have the Linux kernel run in a virtual environment such as Xen or KVM. That means you have a hypervisor (Xen+Linux or Linux+KVM+QEMU) that controls the hardware and creates the environment for the OS image that the user interacts with. The hypervisor could create multiple virtual machines for different levels of data in a similar manner to the NSA NetTop project. This isn’t really a required part of solving the general secure Internet terminal problem, but as it would be a tiny bit of extra work you might as well do it.
One problem with using a hypervisor is that the video hardware tends to want to use features such as bus-mastering to give best performance. Apparently KVM has IOMMU support so it should be possible to grant a virtual machine enough hardware access to run 3D graphics at full speed without allowing it to break free.
Maintaining the Virtual Machine Image
Google has a good design for the ChromiumOS in terms of security [4]. They are using CGroups [5] to control access to device nodes in jails, RAM, CPU time, and other resources. They also have some intrusion detection which can prompt a user to perform a hardware reset. Some of the features would need to be implemented in a different manner for a full desktop system but most of the Google design features would work well.
For an OS running in a virtual machine, when an intrusion is detected it would be best to have the hypervisor receive a message by some defined interface (maybe a line of text printed on the “console”) and then terminate and restart the virtual machine. Dumping the entire address space of the virtual machine would be a good idea too. With typical RAM sizes at around 4G for laptops and desktops, and typical storage sizes at around 200G for laptops and 2T for new desktops, it should be easy to store a few dumps in case they are needed.
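The console-message interface could look something like the sketch below. Everything named here is an assumption for illustration: the marker string, and the `dump_memory` and `restart_vm` callbacks, which stand in for whatever the hypervisor actually provides.

```python
from typing import Callable, Iterable

# Hypothetical marker; the only requirement is that guest and hypervisor agree on it.
INTRUSION_MARKER = "INTRUSION-DETECTED"


def watch_console(lines: Iterable[str],
                  dump_memory: Callable[[], None],
                  restart_vm: Callable[[], None]) -> bool:
    """Read guest console lines; on the marker, dump RAM and then restart the VM.

    Returns True if the marker was seen, False if the console ended without it.
    """
    for line in lines:
        if line.strip() == INTRUSION_MARKER:
            dump_memory()   # keep the address space for later analysis
            restart_vm()    # then throw the running image away
            return True
    return False


# Usage with stand-in callbacks that only record what would have happened.
events = []
triggered = watch_console(
    ["booting", "login: user", "INTRUSION-DETECTED", "never read"],
    dump_memory=lambda: events.append("dump"),
    restart_vm=lambda: events.append("restart"),
)
```

The dump is taken before the restart so that the evidence is not lost when the virtual machine is terminated.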
The amount of data received by a typical ADSL link is not that great. Apart from the occasional big thing (like downloading a movie or listening to Internet radio for a long time) most data transfers are from casual web browsing which doesn’t involve that much data. A hypervisor could potentially store the last few gigabytes of data that were received which would then permit forensic analysis if the virtual machine was believed to be compromised. With cheap SATA disks in excess of 1TB it would be conceivable to store the last few years of data transfer (with downloaded movies excluded) – but such long-term storage would probably involve risks that would outweigh the rewards, probably storing no more than 24 hours of data would be best.
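Keeping only the last 24 hours of received data amounts to a time-bounded buffer. Here is a minimal in-memory sketch; the class name and the use of plain numeric timestamps are my own assumptions, and a real hypervisor would write to disk rather than hold everything in RAM.

```python
from collections import deque


class CaptureWindow:
    """Keep received chunks that are no older than max_age_s seconds."""

    def __init__(self, max_age_s: float):
        self.max_age_s = max_age_s
        self._chunks = deque()  # (timestamp, data) pairs, oldest first
        self.size = 0           # total bytes currently retained

    def add(self, timestamp: float, data: bytes) -> None:
        """Record a chunk, then drop anything that has aged out."""
        self._chunks.append((timestamp, data))
        self.size += len(data)
        self._expire(timestamp)

    def _expire(self, now: float) -> None:
        while self._chunks and now - self._chunks[0][0] > self.max_age_s:
            _, old = self._chunks.popleft()
            self.size -= len(old)

    def contents(self) -> bytes:
        """Everything still retained, in arrival order."""
        return b"".join(data for _, data in self._chunks)


# Timestamps are seconds since an arbitrary start.
window = CaptureWindow(max_age_s=24 * 3600)
window.add(0, b"morning")
window.add(12 * 3600, b"noon")
window.add(25 * 3600, b"next day")  # "morning" is now more than 24 hours old
```

After the third call the first chunk has been discarded, so only the "noon" and "next day" data remain available for forensic analysis.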
Finally, in terms of applying updates and installing new software, the only way to do this would be via the hypervisor as you don’t want any part of the virtual machine to be able to write to its data files or programs. So if the user selects to install a new application then the request “please install application X” would have to be passed to the hypervisor. After the application is installed a reboot of the virtual machine would be needed to apply the change. This is a common experience for mobile phones (where you even have to reboot if the telco changes some of their network settings) and it’s something that MS-Windows users have become used to – but it would get a negative reaction from the more skilled Linux users.
Would this be Accepted?
The question is, if we built this would people want to use it? The NetTop functionality of having two OSs interchangeable on the one desktop would attract some people. But most users don’t desire greater security and would find some reason to avoid this. They would claim that it lowered the performance (even for aspects of performance where benchmarks revealed no difference) and claim that they don’t need it.
At this time it seems that computer security isn’t regarded as a big enough problem for users. It seems that the same people who will avoid catching a train because one mugging made it to the TV news will happily keep using insecure computers in spite of the huge number of cases of fraud that are reported all the time.
- [1] http://www.schneier.com/blog/archives/2010/03/mariposa_botnet.html
- [2] http://www.coreboot.org/Welcome_to_coreboot
- [3] http://grub.enbug.org/FeatureRequests
- [4] http://sites.google.com/a/chromium.org/dev/chromium-os/chromiumos-design-docs/system-hardening
- [5] http://lxr.linux.no/linux+v2.6.30/Documentation/cgroups/
Storage vs RAM Size
In a comment on my post Shared Objects and Big Applications about memlockd [1] mic said that they use memlockd to lock the entire root filesystem in RAM. Here is a table showing my history of desktop computers with the amounts of RAM, disk capacity, and CPU power available. All systems better than a 386-33 are laptops – a laptop has been my primary desktop system for the last 12 years. The columns for the maximum RAM and disk are the amounts that I could reasonably afford if I used a desktop PC instead of a laptop and used the best available technology of the day – I’m basing disk capacity on having four hard drives (the maximum that can be installed in a typical PC without extra power cables and drive controller cards) and running RAID-5. For the machines before 2000 I base the maximum disk capacity on not using RAID as Linux software RAID used to not be that good (lack of online rebuild for starters) and hardware RAID options have always been too expensive or too lame for my use.
Year | CPU | RAM | Disk | Maximum RAM | Maximum Disk
1988 | 286-12 | 4M | 70M | 4M | 70M
1993 | 386-33 | 16M | 200M | 16M | 200M
1998 | Pentium-M 233 | 96M | 3G | 128M | 6G
1999 | Pentium-2 400 | 256M | 6G | 512M | 40G
2000 | Pentium-2 600 | 384M | 10G | 512M | 150G
2003 | Pentium-M 1700 | 768M | 60G | 2048M | 400G
2009 | Pentium-M 1700 | 1536M | 100G | 8192M | 4500G
2010 | Core 2 Duo T7500 2200 | 5120M | 100G | 8192M | 6000G
The above graph shows how the modern RAM capacities have overtaken older disk capacities. So it seems that a viable option on modern systems is to load everything that you need to run into RAM. Locking it there will save spinning up the hard drive on a laptop. With a modern laptop it should be possible to lock most of the hard drive contents that are regularly used (i.e. the applications) into RAM and run with /home on an SD flash storage device. Then the hard drive would only need to be used if something uncommon was accessed or if something large (like a movie) was needed. It also shows that there is potential to run diskless workstations that copy the entire contents of their root filesystem when they boot so that they can run independently of the server and only access the server for /home.
Note that the size of the RAM doesn’t need to be larger than the disk capacity of older machines (some of the disk was used for swap, /home, etc). But when it is larger it makes it clear that the disk doesn’t need to be accessed for routine storage needs.
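The comparison can also be checked directly from the table. The sketch below uses the RAM and Disk columns (not the maximums), converts G to M with the assumption that 1G = 1024M, and reports for each machine the most recent earlier machine whose whole disk would fit in its RAM.

```python
# (year, RAM in M, disk in M) from the RAM and Disk columns of the table.
# Assumption: 1G = 1024M for the conversion.
MACHINES = [
    (1988, 4, 70),
    (1993, 16, 200),
    (1998, 96, 3 * 1024),
    (1999, 256, 6 * 1024),
    (2000, 384, 10 * 1024),
    (2003, 768, 60 * 1024),
    (2009, 1536, 100 * 1024),
    (2010, 5120, 100 * 1024),
]


def latest_disk_fitting_in_ram(machines):
    """Map each year to the latest earlier year whose disk fits in that year's RAM."""
    result = {}
    for i, (year, ram, _disk) in enumerate(machines):
        fits = [y for (y, _r, d) in machines[:i] if d <= ram]
        result[year] = max(fits) if fits else None
    return result


overtaken = latest_disk_fitting_in_ram(MACHINES)
```

For example, `overtaken[2010]` is 1998: the 5120M of RAM in the 2010 laptop is larger than the 3G disk of the 1998 one, but smaller than the 6G disk of 1999.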
I generated the graph with GnuPlot [2], the configuration files I used are in the directory that contains the images and the command used was “gnuplot command.txt”. I find the GnuPlot documentation to be difficult to use so I hope that this example will be useful for other people who need to produce basic graphs – I’m not using 1% of the GnuPlot functionality.
Tom @ Usenix LISA 2010, San Jose, CA, Nov 7-12, 2010
Tom @ LOPSA PICC in NJ, May 7-8, 2010
Countdown to LOPSA PICC!
Flashback: Burnout and the toll it takes
You are probably a human. At least, the statistical odds are in your favor. As a human, you experience stress, and how you react to it plays a large part in determining how happy you are. System administrators deal with stress particularly poorly, in general. We assume the role of hero and that’s that. Do what it takes, bask in whatever glory accompanies the successful completion of our task.
There is no downtime in that equation. Immediately following those emergencies, most of us drink depressants to bring ourselves down. On normal days, we require morning stimulants to bring ourselves up. I highly suspect that some of us are so called “adrenaline junkies” from the relative high that we get when there’s an immediate problem that no one can solve but ourselves.
This is unhealthy.
What we really need is to be able to step back and look at the pattern in our lives and say I don’t want to live with this stress.
When it first hit me that stress is probably the biggest single microproblem for admins, I wrote the following. I hope you find it relevant.
Jack Hughes, over at the Tech Teapot, mentions a very appropriate subject for too many systems administrators: burnout.
As sysadmins, we’re nearly always the go-to person for whatever happens. After a while, we start to get used to it, and lots of times, we can develop a hero complex, carrying the weight of the world on our shoulders, at least in our minds. This isn’t healthy for a lot of reasons, the most important of which is your health.
Here’s an example of what taking your job too seriously can do to you:
Not to ruin the ending, but the most disgusting part is that, while the guy was taking medical leave, his company fired him. To be completely honest, he’s much better off without a company like that, and if your company would do the same thing, then so are you.
To quote Peter Gibbons, “We don’t have a lot of time on this earth. We weren’t meant to spend it this way. Human beings were not meant to sit in little cubicles staring at computer screens all day…”
Even one of the most preeminent Systems Administrators around, Tom Limoncelli, advocates leaving the pressure at work when you head home. For those of us on call 24/7/365, that can be a little hard, but it’s important to try.
Tonight's LOPSA-NJ Chapter meeting
Opera and Trusting Applications vs Trusting Servers
I have just read an interesting article about the Opera browser [1]. The article is very critical of Opera-Mini on the iPhone for many reasons – most of which don’t interest me greatly. There are lots of technical trade-offs that you can make when designing an application for a constrained environment (EG a phone with low resolution and low bandwidth).
What does interest me is the criticism of the Opera Mini browser for proxying all Internet access (including HTTPS) through their own servers, which has been getting some traction around the Internet. Now it is obvious that if you have one server sitting on the net that proxies connections to lots of banks then there will be potential for abuse. What apparently isn’t obvious to as many people is the fact that you have to trust the application.
Causes of Software Security Problems
When people think about computer security they usually think about worms and viruses that exploit existing bugs in software and about Trojan horse software that the user has to be tricked into running. These are both significant problems.
But another problem is that of malicious software releases. I think that this is significantly different from Trojan horses because instead of having an application which was written for the sole purpose of tricking people (as is most similar to Greek history) you have an application that was written by many people who genuinely want to make a good product but you have a single person or small group that hijacks it.
Rumor has it that rates well in excess of $10,000 are sometimes paid for previously unknown security vulnerabilities in widely used software. It seems likely that a programmer who was in a desperate financial situation could bolster their salary by deliberately putting bugs in software and then selling the exploits. This would not be a trivial task (making such bugs appear to be genuine mistakes would take some skill) – but there are lots of people who could do it and plausibly deny any accusation other than carelessness. There have been many examples of gambling addicts who have done more foolish things to fund their habit.
I don’t think it’s plausible to believe that every security flaw which has been discovered in widely used software was there purely as the result of a mistake. Given the huge number of programmers who have the skill needed to deliberately introduce a security flaw into the source of a program and conceal it from their colleagues I think it’s quite likely that someone has done so and attempted to profit from it.
Note that even if it could be proven that it was impossible to profit from creating a security flaw in a program that would not be sufficient to prove that it never happened. There is plenty of evidence of people committing crimes in the mistaken belief that it would be profitable for them.
Should We Trust a Proprietary Application or an Internet Server?
I agree with the people who don’t like the Opera proxy idea, I would rather run a web browser on my phone that directly accesses the Internet. But I don’t think that the web browser that is built in to my current smart-phone is particularly secure. It seems usual for a PC to need a security update for the base OS or the web browser at least once a year while mobile phones have a standard service life of two years without any updates. I suspect that there is a lot of flawed code running on smart phones that never get updated.
It seems to me that the risks with Opera are the single point of failure of the proxy server in addition to the issues of code quality, while the risk with the browser that is on my smart-phone is just the quality of the code. I suspect that Opera may do a better job of updating their software to fix security issues so this may mitigate the risk from using their proxy.
At the moment China is producing a significant portion of the world’s smart-phones. Some brands like LG are designed and manufactured in China, others are manufactured in China for marketing/engineering companies based in Europe and the US. A casual browse of information regarding Falun Gong makes the character of the Chinese leadership quite clear [2], I think that everything that comes out of China should be considered to be less trustworthy than equivalent products from Europe and the US. So I think that anyone who owns a Chinese mobile phone and rails against the Opera Mini hasn’t considered the issue enough.
I don’t think it’s possible to prove that an Opera Mini with its proxy is more or less of a risk than a Chinese smart-phone. I’m quite happy with my LG Viewty [3] – but I wouldn’t use it for Internet banking or checking my main email account.
Also we have to keep in mind that mobile phones are really owned by telephone companies. You might pay for your phone or even get it “unlocked” so you can run it on a different network, but you won’t get the custom menus of your telco removed. Most phones are designed to meet the needs of telcos not users and I doubt that secure Internet banking is a priority for a telco.
Update: You can buy unlocked mobile phones. But AFAIK the Android is the only phone which might be described as not being designed for the needs of the telcos over the needs of the users. So while you can get a phone without custom menus for a telco, you probably can’t get a phone that was specifically designed for what you want to do.
The Scope of the Problem
Mobile phones are not the extent of the problem. I think that anyone who buys a PC from a Chinese manufacturer and doesn’t immediately wipe the hard drive and do a fresh OS install is taking an unreasonable risk. The same thing goes for anyone who buys a PC from a store where it’s handled by low wage employees. I can imagine someone on a minimum income accepting a cash payment to run some special software on every PC before it goes out the door – that wouldn’t be any more difficult or risky than the employees who copy customer credit card numbers (a reasonably common crime).
It’s also quite conceivable that any major commercial software company could have a rogue employee who is deliberately introducing bugs into its software. That includes Apple. If the iPhone OS was compromised before it shipped then the issue of browser security wouldn’t matter much.
I agree that having the minimum possible number of potential security weak points is a good idea. They should allow Opera Mini users to select that HTTPS traffic should not be proxied. But I don’t think that merely not using a proxy would create a safe platform for Internet banking. In terms of mobile phones most things are done in the wrong way to try and get more money out of the users. Choose whichever phone or browser you want and it will probably still be a huge security risk.
Harald Welte is doing some really good work on developing free software for running a GSM network [4]. But until that project gets to the stage of being widely usable I think that we just have to accept a certain level of security risk when using mobile phones.
Flashback: Infrastructure Upgrades through Forest Fires
It’s the end of a long day. You lean back in your chair, sigh, and you’re glad it’s time to go home. Someone asks you what you did all day. You just sort of shake your head and say “fought fires”.
Fire fighting, as a sysadmin, means you don’t make any progress. You only work very hard to stay where you are. Working against entropy is difficult, and it can take a lot out of you. Some days are harder than others.
One day in early June, not long after I started this blog, I experienced a major setback. Also, a major power outage. Our entire backup facility lost power, and what’s worse, the generator refused to kick on. Our secondary site was down hard for days, until the power was restored to the downtown area of the village we were located in.
During the problem, though, we were able to turn a major issue into a net gain. Read on for the rest of the story…
It’s funny, sometimes, how we tolerate suboptimal or downright malproductive arrangements in our infrastructures, just because it’s inconvenient or inopportune to do it the “right way”. It seems like “the right way” either never comes, due to projects getting phased out, or it gets fixed during a cataclysmic upheaval, when it has become an immediate concern.
The case in point is my mail server. We have an A and a B MX record. Originally the B MX just stored mail until the A came back up, then it would get delivered. Everyone checks mail on A, so it can’t really be down during the day, and about 6 months ago, the office that B was at relocated and B was never set up again. This left us with just A. To make matters worse, A was old enough that it was physically located in our backup site, which used to be our primary site. This was suboptimal. Of course there was talk about moving it to the primary site, but when could a maintenance window be created? And we’d risk the entire period of non-connectivity when it was being moved. No, management said, let’s just leave it where it is.
Great strategy. It actually worked fine though, until this weekend.
I came in on Saturday, ready to do some major work on the blade systems I’m building for our new site. I sat down at my desk, ready to dive into work. Since I was alone, Raiders of the Lost Ark was playing on the laptop. I had just logged into the first server when the lights went off, and the telltale screech and whine from the server room told me that we’d lost main power.
In Granville, OH, that’s not a strange thing. We’ve got backup AC and a backup generator, so I wasn’t worried. It does have to be manually started, so I jogged into the server room and turned on the CFL floor lamp. At least I tried to. I looked at the generator control panel and it confirmed my fears. No generator power.
I tried for several minutes to start it, but nothing gave me the impression that anything would change, so I called my boss to let him know the situation, and that I was going to start shutting down machines. Since the only critical thing was mail, I suggested that he change DNS to point to an as-yet unassigned IP at the colocation, and that I could setup a postfix process there to queue the mail. He said that it would work, but he suggested an alternative approach.
Why not relocate the physical mail server to the colocation? A lightbulb went off. Of course, not only could I take care of that long standing problem, but because there was no power at all in the datacenter, the normal policy of no-downtime-for-repairs-and-upgrades was out the window.
The next morning, I left work to go home at 5am. The previous 15 hours had been spent completely overhauling the backup datacenter. With the mail relocated to the primary facility, once the power came on in the backup, I had free rein to cull everything unnecessary that had been accumulating.
There is now a pile of power, ethernet, and copper/fiber cables covering a square yard or so, around 6 inches deep. I took out something like 96 ports worth of switches, multiple servers, KVMs, fiber switches, and general cruft. The servers are also arranged so that no half-depth servers are hiding between full-depth ones. That was always a pet peeve of mine.
I thought about it while I was doing this, and if fighting normal issues is considered firefighting, then what I went through should have been considered forest fire fighting. And just like a forest fire, good can come from it. It takes the massive heat of a forest fire to crack open some pine cones. It also takes massive infrastructure downtime to make significant changes.
Zeno Place, San Francisco, CA
Flashback: DNS names for internal hosts
This is a short bit that I wrote when I was considering overhauling the internal naming scheme at my company. We used to use an odd mishmash of names, and we used to have multiple invented internal DNS names that referred to the physical location. And I don’t mean things like “location.example.com” (that might make sense!). I mean it would be as if General Motors had “boston.gm” and “tijuana.gm” and “tokyo.gm”. Nonsensical in a lot of ways (particularly now that TLDs can be bought for a song; well, an expensive song).
Anyway, I was curious how other people did it, so I asked. As it turns out, this post originally aired in July of 2008. I would guess that I had a couple of hundred readers. That’s a good range of experience to draw from, but I wanted a more broad view, so I submitted it to slashdot. And it got on the front page.
Thanks to Slashdot, this entry originally received 43 comments, which is right around 30 more than the next most popular story at that point. I’ve had a lot of people tell me that they found me because of that front page article. I didn’t submit it to drive people to the blog; I really did want to hear what people were doing with their own networks. Driving people to the blog was a completely satisfactory side effect, though.
Before you leave this page, make sure to check out the original and read the comments. There are a lot of funny (and interesting) ideas!
Enjoy!
Bob Plankers, over at The Lone Sysadmin wrote a couple days ago about getting busted while reading the wiki page on X-Men. He tried to cover it up by claiming to be researching future host names. Quick thinking, Bob. Good job!
It does bring up a good point, though. Internal naming schemes are something that everyone has an opinion on, and a load of suggestions.
At various places, I’ve used Greek/Roman gods, Simpsons characters, beer companies, wine labels, and fish.
At my current company, we used the beer and wine names. We absorbed another company that used fish. It worked fine for a while, but we grew in terms of servers and locations until it got unwieldy to remember A) all the names, and B) what each name did. You’d also start to get very similar names after a while. We’ve now got 4 physical locations, soon to be 5, and something like 50-60 servers (not counting network devices). No one would be able to keep them all straight (including the admin).
To improve the situation, we’re in the process of changing to location-based hostnames with a flat internal domain structure. For example, the secondary application server in Ohio is oh-app2, with the fake internal domain name trailing. The alpha site’s primary fileserver is a-fs1.
It’s nowhere near as fun as “wolverine.internal.com” but it certainly does tell you where you’re connecting to and what the machine does. What makes it interesting is when you go changing things like CVS repositories on people’s machines, mail servers, etc. The policy we’ve taken is to alias the old information to the new, and slowly phase out the old method.
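One nice side effect of a scheme like this is that a script can take a name apart. Here is a small sketch; the exact pattern (lowercase site code, hyphen, lowercase role, number) is my guess from the two examples above, not a written-down standard.

```python
import re

# Assumed pattern: <site>-<role><index>, e.g. oh-app2 or a-fs1.
HOSTNAME_RE = re.compile(r"^(?P<site>[a-z]+)-(?P<role>[a-z]+)(?P<index>\d+)$")


def parse_hostname(name: str):
    """Split a location-based hostname into (site, role, index)."""
    match = HOSTNAME_RE.match(name)
    if match is None:
        raise ValueError(f"{name!r} does not follow the site-roleN scheme")
    return match["site"], match["role"], int(match["index"])


ohio_app = parse_hostname("oh-app2")   # ("oh", "app", 2)
alpha_fs = parse_hostname("a-fs1")     # ("a", "fs", 1)
```

A name from the old scheme, such as "wolverine", raises `ValueError`, which is a cheap way to find hosts that haven't been renamed yet.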
What do you use as internal naming systems? What do you think would make an excellent scheme? Make sure to check the list to make sure it hasn’t been done before!
Time management is like juggling 5 balls
Flashback: HOWTO: Punch down blocks for in-building wiring
Today’s flashback is also going to be a HOWTO, and vaguely related to yesterday’s Rackmount HOWTO. Today I’m including a HOWTO for understanding how building wiring works.
When I first looked at house wiring in a moderately complex 8 story building, I was sort of mystified. It was only after literally tracing wires and numbers around the various wiring closets that I understood what was happening.
This howto deals specifically with 66-blocks, but 110 blocks are also becoming common, just not in my neck of the woods.
Enjoy!
Punch down blocks are used when you need to run wires long distances, typically between distribution points (things like the MDF, or Main Distribution Facility, otherwise known as the main telco room on the primary floor of the building), comms closets, and the like. They are used, rather than normal RJ45 jacks, because they are simpler, less prone to breaking, and don’t introduce much, if any, extraneous electrical interference to the wire.
For the next few paragraphs, please refer to the following picture, which is a clear, understandable example of a punch down block:
Those grey things in the middle are termination points for single wires. When you deal with punch down blocks, you deal in pairs of wires, and you get one wire to one grey clip. Pretend there is an imaginary line down the middle of that block, because the pairs on the left are separate from the pairs on the right. If you number a line going across, 1 2 3 4, 1 and 2 are a pair, 3 and 4 are an entirely different pair. In the picture, you can tell, because of the numbering scheme they’ve used. There are 4 clips for a pair, but only 2 clips are wired in the beginning. Each pair is numbered, so the phone company can say “Turn on pair 3044”.
Now, at our building, the phone company’s lines are on the right side of the wall. On the left side of the wall is another huge array of punchdown blocks. These are for the “house wiring”. When they built the building, they pre-ran hundreds of pairs of wires to each floor from this room, so that they wouldn’t have to redo it every time someone ordered a T1. Each wire to each floor is terminated to a pair on the left side of the wall in exactly the same manner as the one on the right (including leaving one set empty).
To connect the two sides, you run a twisted pair of wires (it looks like you took a section of cat5, stripped off the sheathing, and just used one set of wires) from the right side of the wall (from the pair of grey clips we left open on #3044) to the left side of the wall (say #514, the 14th pair to the 5th floor, again using the empty grey clips). If you look again at that picture, you can see 3043 has been wired across, because all 4 wires are clipped in, but 3044 has not, since only the rightmost clips have wires.
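The house pair numbering in that example (#514 being the 14th pair to the 5th floor) can be decoded with a little arithmetic. This is only a sketch of how I read our building’s numbers: it assumes the last two digits are the pair and whatever comes before them is the floor. Other buildings may number things differently, and the function name is made up.

```python
def decode_house_pair(number):
    """Split a house pair number into (floor, pair).

    Assumes the last two digits are the pair on that floor and the
    leading digits are the floor, e.g. 514 -> floor 5, pair 14.
    """
    floor, pair = divmod(number, 100)
    if floor < 1 or pair < 1:
        raise ValueError(f"{number} is not a valid house pair number")
    return floor, pair


floor, pair = decode_house_pair(514)
print(f"floor {floor}, pair {pair}")
```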
At this point, you have two wires coming in from the phone company to the punch down blocks on the right. Then you’ve got wires connecting those punch down blocks to the “house wiring” punchdown blocks on the left. Then you’ve got vertical wiring up to the floor that the wire ends at.
In the comms closet on that floor (also known as the IDF, or intermediate distribution facility), you have a very similar situation. On the right hand side, you’ve got the punch down block where the vertical cabling from the MDF terminates, and on the left, you’ve got a punch down block where the actual wires that end up in your office are terminated. You use another twisted pair of wires to connect the two sides, and at that point, the wire that ends up in your office is connected directly to the phone company, albeit through several punch down blocks and lots of wire.
Now, when it comes into your office, hopefully someone has had the courtesy to install a patch panel for you. The patch panel looks like this on the front:
and this on the back:
As you can tell from the photo of the back, wires are typically matched up color for color when it comes to straight CAT5 cables. When it comes to things like wiring T1s, you’re only using two wires, so as long as you remember which one goes to what wire, you’re ok.
So, in review, we’ve got phone company wires coming in, and terminated in the MDF. They’re connected across to the house wiring, which is run vertically to the IDF, and from the IDF, it goes to your space. All of this is accomplished with those magic little grey clips.
Now, if only the wires would go in there. It turns out that there’s a trick. Or a tool, really, called a punch down tool (creative, eh?). The cheapest punch down tool I’ve ever seen is a buck. It’ll work in a pinch, but the one you want is here:
The way you use it is to arrange the wire you want to punch down against the metal clip. There’s a very thin slit in the clip where the wire will end up. Press the tip of the punch down tool against the clip, and push. The spring loaded mechanism (in the expensive tool) or your elbow grease (in the $.99 model) will push the wire to the bottom of the slit, and in the process, scrape away the plastic or teflon sheathing on the wire, allowing the metals to make contact. The expensive model will then use the spring action to slice the extra wiring off the end, eliminating extraneous electrical interference (when you’re dealing with hundreds of feet worth of cable, this is a good thing). In the cheap model, I’d recommend an Xacto knife to do the job.
As for maintenance, there’s not really much that can go wrong in a patch panel, as long as no one comes in and starts pulling on wires. Typically there’s a plastic case that goes over the entire block to prevent accidental snags from pulling wires loose.
The best advice is to document everything you can. Leave a hard copy of the documentation in the comms closet so that you can see what’s been done. Lots of times, the telephone tech will “tag” the lines that he’s installed on the right hand side. The tag usually has the numbers of the pairs that are activated on the telco side, and the phone numbers (or circuit IDs) that match those pairs.
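If you’d rather keep that documentation somewhere you can regenerate the hard copy from, something as small as the following Python sketch would do. The record layout, the names, and the circuit ID are all placeholders I made up; only the 3044 to 514 cross-connect comes from the example above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CrossConnect:
    telco_pair: int   # pair number on the phone company side
    house_pair: int   # pair number on the house wiring side
    circuit_id: str   # phone number or circuit ID from the tech's tag


def render_sheet(connections):
    """Return a plain-text sheet, sorted by telco pair, ready to print."""
    header = f"{'TELCO':>6}  {'HOUSE':>6}  CIRCUIT"
    rows = [
        f"{c.telco_pair:>6}  {c.house_pair:>6}  {c.circuit_id}"
        for c in sorted(connections, key=lambda c: c.telco_pair)
    ]
    return "\n".join([header, *rows])


# Placeholder data: the circuit ID here is invented for the example.
log = [CrossConnect(telco_pair=3044, house_pair=514, circuit_id="EXAMPLE-T1")]
print(render_sheet(log))
```

Print it, date it, and tape it inside the closet door every time something changes.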
(Photos courtesy of lil 1/2 pint, techmsg, dmitrybarsky)
PICC – Please Come Speak!
(PICC is a regional sysadmin conference to be held in central NJ on May
7-8, 2010. I’m on the planning committee. http://picconf.org)
Today is the deadline for proposals for papers, talks, and such.
We’re a little low on submissions so I’d like to make one more “beg”. We’d love to have a talk about PHP for sysadmins, something fun you’ve done with Arduino, your favorite JS library, a walk-through on setting up Google Apps. Demo your favorite open source project, or propose a panel of people to talk about something you find interesting (I can help find others for your panel). It is an excellent way to spread the word about a project you are involved with.
We’ve tried to make the proposal process really easy. Just send your
contact info and topic plus a 1-2 paragraph description to
submissions@lopsanj.org
For more info, contact me and/or view:
http://lopsanj.org/events/picc10/cfp
BTW, today is the deadline but we can grant extensions to anyone that writes and asks.
Submit your talk proposals to PICC today!
Flashback: HOWTO: Racks and Rackmounting
This is flashback week, and today I’m including a HOWTO that I originally wrote in June of 2008 called “HOWTO: Racks and Rackmounting”. I had decided that no one was focusing on the physical aspect of system administration, even though most of us still have to deal with it. So I put together this information in the hopes that it would be useful.
Since it was originally published, it’s been viewed over 2,000 times and is still in the top 10 blog entries, which means that still, not enough people are covering the physical infrastructure aspect of system administration.
Enjoy!
I’m going to start a special feature on Fridays. It’s going to be sharing the sorts of tips that systems admins need to know, but can’t learn in a book. There are so many things that you learn on the job, figure out on your own, or run across on the net which make you realize that you’ve been doing something wrong for years. Sometimes you learn about things that you might have had no clue about. For instance, I just found out that you can do snapshots with LVM.
Anyway, this Friday, I’m going to be showing you what I know about server racks.
I started out on a network that had a bunch of tower machines on industrial shelves; the sort you pick up at Harbor Freight or Big Lots. When we moved to racks and rackmount servers, it was like a whole new world.
The first difference is form-factor. Tower servers are usually rated by the “tower” descriptive. Full tower, half tower, mid-tower. Rack Servers are sized according to ‘U’s, short for “Rack Unit”. It’s equivalent to 1 3/4 inches, so a 2U server is 3.5” tall. The standard width for rackmount servers is 19” across. Server racks vary in depth, between 23 and 36”, with deeper being more common.
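The rack unit math is simple enough to sketch in a few lines of Python. The 1.75 inch figure is from above; the 42U rack in the usage lines is just an assumed example size, and the function names are mine.

```python
RACK_UNIT_INCHES = 1.75  # height of one rack unit (1U)


def height_inches(units):
    """Height in inches of equipment that occupies `units` rack units."""
    return units * RACK_UNIT_INCHES


def free_units(rack_units, installed):
    """Rack units left over after installing equipment of the given U sizes."""
    used = sum(installed)
    if used > rack_units:
        raise ValueError("equipment needs more space than the rack has")
    return rack_units - used


print(height_inches(2))           # a 2U server is 3.5 inches tall
print(free_units(42, [2, 2, 1]))  # assumed 42U rack with two 2U and one 1U box
```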
Instead of shelves for each server, rack hardware holds the server in place, usually suspended by the sides of the machine. They allow the server to slide in and out, sometimes permitting the removal of the server’s cover to access internal components. Different manufacturers have different locking mechanisms to keep the servers in place, but all rack kits I’ve seen come with instructions.
Installing the rack nuts is made easier with a specialized tool. I call it the “rack tool”, but I’m sure there’s another name. The rack nut is placed with the inside edge clip in place, through the hole. The tool is inserted through the hole, grabs the outside clip, and then you pull the hook towards you. This pulls the outside clip to the front of the hole, securing the nut in place.
A typical server will require eight nuts, usually at the top and bottom of each rack unit, on the right and left sides, front and back. Each rack unit consists of three square holes, and a rack nut is put in the top and bottom of both the right and the left sides. Several pieces of networking equipment have space for four screws, but I’ve found that they stay in place fine with two. I can’t really recommend it for other people, but if you’re low on rack nuts, it’s better than letting the switches just sit there (and it almost always seems like you have fewer rack nuts than you need once your rack starts growing). If you only use two screws to hold in your networking equipment, make sure it’s the bottom two. The center of gravity of a rackmount switch is always behind the screws, so if the top screws hold it up, the bottom has a tendency to swing out, and that’s not good for your rack or your hardware.
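Since you always seem to run short, it’s worth doing the arithmetic before you order. Here’s a rough Python sketch using the numbers above: eight nuts for a typical server, and four (or two, if you’re cutting corners) for a piece of networking equipment. The function name and the defaults are mine, and real hardware will vary.

```python
NUTS_PER_SERVER = 8  # top and bottom, left and right, front and back


def nuts_needed(servers, switches, nuts_per_switch=4):
    """Estimate how many rack nuts a rack full of gear will need."""
    if nuts_per_switch not in (2, 4):
        raise ValueError("switches are assumed to take either 2 or 4 nuts")
    return servers * NUTS_PER_SERVER + switches * nuts_per_switch


print(nuts_needed(10, 2))                     # fully screwed-in switches
print(nuts_needed(10, 2, nuts_per_switch=2))  # bottom two screws only
```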
While I’m on the subject of switches, let me give you this piece of advice. Mount your switches in the rear of the rack. It seems obvious, but you have no idea how many people mount them on the front in the beginning because “it looks cooler” and then regret it when they continually have to run cable through the rack to the front.
Once your rack starts to fill out, heat will become an issue. Aligning your rack with your air conditioner is another bit of common sense that’s frequently ignored. Air goes into the servers through the front, and hot air leaves through the back. This means that when you cool your rack, you should point the AC towards the front of your rack, not the back.
Air comes in here… And leaves back here
It’s probably no surprise to anyone who’s used a computer, but the cables seem to have a mind of their own, and nowhere is it more apparent than a reasonably full server rack. Many higher-end solutions provide built-in cable management features, such as in-cabinet runs for power cables or network cables, swing arms for cabling runs, and various places to put tie-downs.
There is no end-all-be-all advice to rack management, but there are some tips I can give you from my own experience.
Use Velcro for cabling that is likely to change in the next year. Permanent or semi-permanent cabling can deal with plastic zipties, as long as they aren’t pulled too tight, but anytime you see yourself having to clip zipties to get access to a cable, use Velcro. It’s far too easy to accidentally snip an Ethernet cable in addition to the ziptie.
Your rackmount servers will, in many cases, come with cable management arms. Ignore them. Melt them down or throw them away, but all they’ve ever done for me is block heat from escaping out the back.
Label everything. That includes both ends of the wires. Do this for all wires, even power cables (or especially power cables). Write down which servers are powered by which power sources.
If you have a lot of similar servers, label the back of the servers too. Pulling the wrong wire from the wrong server is not my idea of a good time.
Keep your rack tool in a convenient, conspicuous spot. I ran a zip tie through the side of the rack, and hang mine there.
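For the labeling tip, it helps to generate both ends of a cable’s label from the same record so they can’t disagree. This is a small Python sketch; the label format and the endpoint names are invented for the example, not any kind of standard.

```python
def cable_labels(end_a, end_b):
    """Return the label text for each end of a cable.

    Each label reads "this end -> far end", so whoever is holding the
    cable can see where the other end goes.
    """
    return (f"{end_a} -> {end_b}", f"{end_b} -> {end_a}")


# Hypothetical endpoints: a server NIC and a switch port.
label_a, label_b = cable_labels("oh-app2:eth0", "oh-sw1:port12")
print(label_a)
print(label_b)
```

The same function works for power cables; just use the power source and outlet as one of the endpoints.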
(Some photos were courtesy of Ronnie Garcia via Flickr)
When Blog Entries Ruled the Earth…
So, this coming May 14th will be Standalone Sysadmin’s 2nd Blogiversary, meaning it’s been around since 2008. When I first started it, I had a lot more time to write in it than I do now, which means that I wrote more entries, and because I was pretty much bursting with information that I’d never told anyone, a lot of the things that I wrote were probably more useful than the stuff that I put out on a daily basis.
I switched this blog from Blogger to its own domain in July of 2009, and even though I did import all of my old articles (they’re available in the convenient sidebar over there —–>) it takes extra effort to go back and read old entries (not to mention that there are over 400 of them! Yikes).
So this leads to the inefficient situation we have before us. I wrote a lot of (what I hope is) useful stuff, but I wrote it before the blog got popular, and most of you have never seen it. This week, I’m going to try to fix that.
I’m taking a week off of writing actual blog entries (honestly? not much of a change, I know. Sorry about that) to post some old stuff that I suspect very few of you were around for. I’ve also encouraged some other bloggers to do the same, because I’d love to see some of their great old blog entries that are still useful today. I know they’re out there, but none of us have time to weed through the histories to find them. This week is about bringing them into the open.
So please forgive me if this week seems like a flashback episode on your favorite sitcom, but I sincerely hope that you find the posts useful and informative. Thanks!
(Incidentally, while looking for the picture of the t-rex included above, I found this. wow!)
