I could have sworn I posted this earlier in the week, but I don’t see it in my blog anywhere. So, here it is for real.
Last week we were in what I would call a “very dangerous position.” The RAID controller on our VM host began throwing PCI parity errors, which REALLY doesn’t go over well with our Linux host OS. This past Saturday night, I was able to take the host machine down and make it right. I shut our VMs down and copied them over to a separate disk array to keep them safe. Once that was done I went ahead and swapped the High Point RR2320 card out for an Adaptec 3805. The Adaptec card got very good reviews from other Newegg customers, and the price/features balance was a deal maker.
After getting the build started on the new RAID 50 array (w00t for background builds), I did a reinstall of Fedora 8, formatted the new array, and started bringing our VMs back. After installing VMware Server and the web UI, I took a deep breath and pressed the start button to begin bringing the VMs online. I got more and more relieved as each VM came back to life, and I probably broke out in a cheer once we were 100% back online. All that was left after that was to install NRPE so that our Nagios box could monitor the health of the VM host.
Some thoughts from this project…
- The timing couldn’t have been any better… since there was no church on Sunday, I was able to take 75% of our systems down with minimal impact on our users.
- I was REALLY wrestling with the thought of trying ESXi out on the host instead of Fedora…I imagine it would have worked. However, the ability to monitor the host’s hardware with Dell OpenManage is trump. Although OM is not supported on Fedora, it will run if all the dependencies are satisfied.
- I’m glad we’ve got a coffee machine in the church…I wouldn’t have made it through the night otherwise.
- My initial plan was to install CentOS on this server, but I had problems with GRUB hanging at boot time after the installation. I went back to Fedora because I just didn’t have time to mess around.
- I’m very thankful that my fiancée (or wife, depending on when you read this) understands what I do and accepts the fact that I’m passionate about this stuff.
- Our server room’s AC works very well… almost too well. Again, hot coffee was a plus.
|
I love being able to take a break from ministry to rest up and get rejuvenated. This year is going to be even better due to the fact that I will be going on vacation with my wife. :-) We’re planning on going up to Maine for a week. Nice walks, cool weather, and probably more lobster than we can handle…. I can’t wait!
This year is quite a bit different because it’s the first year I’ve been out of the state and totally out of driving range of the office. As you might expect, my boss is a little nervous about whether or not things stay running while I’m gone — I can’t say I blame him…this past week had me concerned as well. Everything appears to be in good shape now though, and I feel much better about leaving than I did 5 days ago. UPS shutdown scripts and backups are checking out good, and Todd’s got the documentation he needs to hopefully solve any situations that come up while I’m gone.
I’m ready to go on this vacation for quite a few reasons…first and foremost since Jess and I are getting married this weekend, this vacation is doubling as our honeymoon. I can’t wait to be able to spend this time with her as we officially start our life together. (I think the drive up there is going to be 1/3 the fun.) I’ve never been to Maine, but I know we’re going to have a fantastic time.
Secondly… I’m ready to get away. I would be lying if I said I haven’t been feeling a lot of stress lately or if I said my brain wasn’t ready to reduce itself to a pile of mush. There’s been a whole lot on my mind the past few weeks. I need to put all that aside for a while and unplug my brain as much as I can: no work-related computer use (except if Todd calls), no ActiveSync, and probably no Twitter or blogging either (as if THAT will make a dent in my posting frequency). My goal at this point is to try as hard as I can to remain “dark” for at LEAST the first 11 days. Is that possible…I don’t know. Am I going to try…you bet.
Lastly, I’m ready to rest. Maybe you can relate to this and maybe you can’t, but I have a VERY hard time shutting my brain down. And I’ve not been sleeping well lately. At all. Between wedding prep, moving out of my apartment, and late-night fixes on our infrastructure… man… my body has not handled it well at all. Obviously this goes hand-in-hand with the stress that I’ve been feeling. Either way, I’m ready to relax.
All this to say that I’m very excited to be starting on this journey this weekend. I’ve prepped about as much as I know how, and gotten as much tied up as I can before leaving. There’s still plenty of things to do, but all those things will still be here when I return.
|
I forget exactly how it was that I stumbled onto the Stuff Christians Like blog, but I thoroughly enjoy reading it. Jon’s posts expose Christianese tendencies that just about all of us are familiar with. Most of the time they’re funny, and I dig that. Sometimes they take a serious turn, and I dig that as well.
I thought a recent post of Jon’s had some IT flavor to it, so I decided to reference it here: "Holy quotes at the end of emails"
|
Last week I worked on what I thought was an open-and-shut VM storage project for our VM host server: I had decided to replace the CERC RAID controller on our Dell PE1800 because the controller was not playing well with Linux. This was causing a bigtime bottleneck, sometimes bringing disk I/O to a complete stop while the controller munched on data. The solution was to replace both the controller and the connected disks with a High Point 2320, which would give us a good chunk of VM storage on a speedy RAID-50 array.
So last week, I powered down our 10 VMs and moved them to a different server for storage. After all the VMs had transferred, I pulled the CERC controller and the old disks that were with it and installed the 2320 and 8 new HDDs; 6 for RAID-50, one hot spare, and one OS disk that would attach to the Poweredge’s on-board SATA controller. The project took most of the night, and after 12 hours and a few mugs of coffee, I had our critical production VMs back online on their sleek new array. Disk performance was outstanding compared to what it was on RAID-5 and as a result, the load numbers on the server dropped drastically. I called the project a success.
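For anyone wondering how much space a six-disk RAID-50 array actually leaves for VMs, here is a rough sketch of the arithmetic. It assumes the six disks are split into two RAID-5 legs of three, which is the usual layout for six disks but is not something the post confirms, and the 500 GB disk size is a made-up number used only for illustration.

```python
def raid50_usable_disks(total_disks: int, legs: int = 2) -> int:
    """Number of disks' worth of usable space in a RAID 50 array.

    RAID 50 stripes (RAID 0) across several RAID-5 legs, and each
    leg gives up one disk's worth of space to parity.
    """
    if legs < 2:
        raise ValueError("RAID 50 needs at least two RAID-5 legs")
    if total_disks % legs != 0:
        raise ValueError("disks must divide evenly across the legs")
    per_leg = total_disks // legs
    if per_leg < 3:
        raise ValueError("each RAID-5 leg needs at least three disks")
    return total_disks - legs


# Hypothetical disk size; the post does not say what the drives were.
DISK_GB = 500

usable = raid50_usable_disks(6, legs=2)
print(f"{usable} of 6 disks usable, about {usable * DISK_GB} GB")
```

Under those assumptions, two disks' worth of space goes to parity and four are left for data, with the hot spare and the OS disk sitting outside the array.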
A couple days later while in the server room, I noticed that the blue status light on the Poweredge had started to flash orange. Thinking it might just be a loose panel somewhere, I inspected the server and found nothing. Some after-hours testing revealed that if I totally removed power from the server, I could get rid of the orange light. However, within a few minutes of running, the light would go back to flashing orange. Now I don’t know how you feel about your car’s "Check Engine" light being on constantly while you’re driving, but I don’t like it at all, and I equated this situation with just that. With no apparent sign of trouble beyond the flashing light, I made a note to run a hardware diagnostic on the server next week.
Fast-forward a bit to this past Tuesday. I woke up to the sound of my phone ringing. I was planning on going in to work a little late, but my plans were shot down by what Todd told me over the phone. He was unable to access his Quickbooks server, and Exchange was unreachable as well. A quick look at my phone showed alerts from our monitoring service around 5:30am…Exchange had apparently gone down and never came back up.
Once I got to my desk and fired up a VMware console, I was greeted by a myriad of errors for each VM. Apparently the High Point RAID card was causing some I/O issues and PCI errors, enough so that Linux had unmounted the array, instantly killing about 10 VMs. The only good news in this situation so far is that since this appeared to have happened at 5am, the nightly backups that were run the previous night would be enough to make Exchange almost completely current. Holding my breath, I restarted the VM host, and watched as each VM came back to life. After some looking around, each machine appeared to be in good form. All I could say was "wow." We dodge way too many bullets.
I had to take some time later in the day to get my head around this situation. The prospect of doing another "RAID Transplant" just 2 weeks after changing drives/controller was very frustrating to think about, but I couldn’t just let it sit. I’m about to go on a 2-week honeymoon. I don’t want my boss to have to call me during that time to ask for help, and I certainly don’t want to come back early to do any fixes.
So, beginning tomorrow night, I’ll be taking our VM host down yet again to change the RAID controller out. The project will start with transferring all of our VMDKs to the backup server just to be safe. Then, I’ll be swapping the High Point 2320 for an Adaptec 3805 and transferring all the VMDKs back onto the new array (if needed). I’ll probably finish up by Sunday evening (I plan on taking a break to fish in the afternoon).
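One way to be sure the "just to be safe" copies really are safe before touching the controller is to compare checksums of each VMDK against its copy. The sketch below is only an illustration of that idea, not what was actually run here. The function names are invented for the example, and it assumes the backup is reachable as an ordinary directory (say, a mounted share) with the same folder layout as the source.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1024 * 1024) -> str:
    """Hash a file in chunks so multi-gigabyte VMDKs never sit in RAM."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_bad_copies(source_dir: Path, backup_dir: Path) -> list[str]:
    """Return relative names of .vmdk files missing or different in the backup."""
    bad = []
    for src in sorted(source_dir.rglob("*.vmdk")):
        rel = src.relative_to(source_dir)
        copy = backup_dir / rel
        if not copy.is_file() or sha256_of(src) != sha256_of(copy):
            bad.append(str(rel))
    return bad
```

Calling `find_bad_copies(Path("/vmstore"), Path("/mnt/backup/vmstore"))` (both paths hypothetical) with the VMs powered off should return an empty list before the old array gets wiped.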
Usually I feel pretty good about fixes, and maybe it’s just the looming "deadline" I have coming up, but I feel like I’m about to throw a Hail Mary pass on this one. The blinking orange light and the errors only started after the High Point card was put in, but I’ve seen stranger acts of coincidence. If we still wind up getting PCI errors after the card is swapped out, we’ll need a bigger badder solution.
|
I’ve been looking for ways to make my life as an IT guy easier. I know, what a fresh concept, right?
Most of the time I find myself either in the middle of a big project or fighting small fires disguised as helpdesk tickets. The past 2-3 weeks have seen me doing little-to-no helpdesk work, and so I’ve taken the opportunity to look at various incarnations of monitoring and alerting tools. As I’ve looked around, the traits for monitoring/alerting software match the model of most other software: "Inexpensive, easy to configure, good features…choose two." That left me with two options…either spend money I don’t have, or get into things that I don’t know much about. I decided to take a dive into the open source world to experiment with a couple monitoring tools, which meant brushing up on my *nix-fu and swimming through documentation and forums.
My first endeavor was setting up Cacti - a pretty well-known graphing tool that allows you to create rather sexy-looking RRD graphs of just about any metric you can grab with SNMP and WMI (you need to do some wrenching for WMI though). In addition, Cacti has a very large user community and is very well-documented. Chances are that someone’s made a graph or data template for whatever you’d like to monitor. I’ve currently got Cacti up on a test box and am very pleased with it so far. It took a while for me to even partially get my head around how data flows through Cacti from creating the initial query to putting that data on a graph. Once I have it nailed down, I’ll explain it in terms that are easy for me (and hopefully you) to remember. After Jess and I get back from Maine, I hope to get a fully-documented production box up and running.
Cacti is great for graphing, but unless I apply some add-ons (which I’m not ready for yet), I don’t get notified when thresholds are crossed or when a host or service drops out completely. I set up a Nagios box to fill in the gaps that Cacti leaves open. Nagios is another open-source monitoring tool that allows you to keep close track of stats and services, and allows you to set thresholds for "warning" and "critical" statuses. It’s very flexible and allows you to monitor just about any SNMP or WMI element that you want. Depending on where you download it from (the VMware Appliances page has it available in a number of flavors), you can use a WebGUI to edit the config files. Personally, I found that to be more complicated than just creating/editing config files by hand. Again, once you understand the way the configurations work and everything "clicks" it becomes a piece of cake.
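To make the "warning" and "critical" idea a little more concrete, here is a minimal sketch of what a Nagios-style check boils down to: take a measurement, compare it against two thresholds, print one line of status text, and report the state as an exit code (0 for OK, 1 for WARNING, 2 for CRITICAL, 3 for UNKNOWN). The load-average check and the threshold values are made up for illustration and are not taken from any real config.

```python
import os

# Exit codes that Nagios expects from any plugin.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def classify(value: float, warn: float, crit: float) -> int:
    """Map a measurement onto a Nagios state using two thresholds."""
    if value >= crit:
        return CRITICAL
    if value >= warn:
        return WARNING
    return OK


def check_load(warn: float = 4.0, crit: float = 8.0) -> tuple[int, str]:
    """Check the 1-minute load average (Unix-only; thresholds are made up)."""
    try:
        load1 = os.getloadavg()[0]
    except (AttributeError, OSError):
        return UNKNOWN, "LOAD UNKNOWN - could not read load average"
    state = classify(load1, warn, crit)
    label = ("OK", "WARNING", "CRITICAL")[state]
    return state, f"LOAD {label} - load1={load1:.2f}"


def main() -> int:
    """Print the status line and hand back the state for use as an exit code."""
    state, message = check_load()
    print(message)
    return state
```

A script meant to be run by Nagios or NRPE would finish with `sys.exit(main())` (after an `import sys`) so that the state comes back as the process exit code.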
I’m hoping that these two solutions will help keep things running smoothly. So far the price has been right. Like I said, I had a LOT of brushing-up to do on my Linux (thank goodness for the ability to snapshot a VM while it’s hot. I’ve definitely used that feature while getting back into the swing of things), and I’ve definitely felt some frustration while getting my head around how Nagios and Cacti are configured. It’s been worth it though, and I’m anxious to see how far I can go with these products.
|
Filed Under ( video) by Dave Mast on June-30-2008
When our control room for live video was installed, it was installed in a hurry. Now, that’s not a finger-point at anyone who was involved; we really only had about 5 days to make it work before Opening Day in the new building. Since then, equipment, functionality, and (you guessed it) more cable has been added to the mess of audio and video cable running along the wall.
About 6 weeks ago, we began plans for a project to clean this all up. The big goals were…
- Move our audio operations to the adjacent room, essentially creating the ability to have a broadcast mix.
- Move a majority of our production gear to a rack, and relocate the cabling in order to clean things up.
The big deal of this project was the time window. The control room simply must be operational on the weekends, so we had a limited amount of time to do what we needed to do without putting the service in jeopardy. We ended up splitting the project across 2 weeks: The first week would be dedicated to audio work (the smaller half of the project), and the more intense video half of the project would take place on the second week.
During "Audio Week" (June 16-20), we also did some preliminary stuff for "Video Week" (June 23-27); this included tracking down and labeling every incoming and outgoing video/audio line, as well as installing a new Mid-Atlantic ERK Series rack for the video gear to move into. Quite a few of our video lines came up to the control room through the floor (the control room is on a mezzanine), so installing the rack also meant cutting some holes in the floor with a core drill so that wires could be pulled back into the floor and up through the new holes to the rack. The concrete floor of the control room sits over top of a drop ceiling, so this would be easy in theory.
The first step was to prep the floor area where the rack was going and cut the holes that the wire would pass through. God had a hand in this project from start to finish, and it became evident on the morning I was headed towards NPCC with the core drill that I had just rented. Dave H., one of our control room operators, was already on the scene and ready to help out with the drilling. Not only that, but Dave H. has extensive experience with concrete, and this was HUGE because I had absolutely none of that, much less any experience operating a core drill. The morning would not have gone nearly as smoothly had Dave not been there.
After the holes were cut and the rack was secured to the floor, we ran some 22/2 cable between the control room desk, the rack, and the new "broadcast mix room." Once we did some soldering and verified that all our connections were in good shape, we took a break for the weekend.
The video rebuild was a bit more involved. There were quite a few lines to move from their old locations, and the goal was to get everything functional by Thursday. The first step involved gutting the entire control area and moving all our video gear to temporary storage.
Once all the cables were disconnected, they were pulled down through their conduits to the drop ceiling area below…
After all that, we began the process of moving gear and wire into the new rack.
We did run into a couple small snags with some of our CAT5 that is used to send XGA signals to the stage, but no show-stopping problems. And again, God’s hand was very evident this entire week. Volunteers made themselves available to help with wire pulling and cleanup and just about everything worked the first time around. On a side note, I claimed a huge personal victory this week by going up 30 or so feet in a scissor lift to install a lipstick camera on the auditorium ceiling without freaking out (I battle acrophobia and have a tendency to "lock up" at around 15-20 feet).
We ended up finishing the project Thursday night around 6:30pm.
There’s still a little bit of touch-up work to do for aesthetics and friendliness, but functionally everything checks out good and signal is flowing everywhere it needs to be.
Some thoughts from this project:
- I’m getting better at detailed planning. I’ve always been good at execution, but often times I would jump into something without nailing down the details, and it would result in a lot of problems. I feel real good about how this project panned out.
- The volunteers that helped in this project were invaluable and absolutely crucial to its completion. Thank you guys!
- The HD-SDI signal format is my friend. One 75-Ohm cable for 1080i is sweet.
- Next time we do a big wiring project, we need to rock up some cable trays.
- Zip ties are so out.
- Velcro is so in.
- I used to be able to bust out all sorts of productivity by going all night — not the case anymore. I really had to pace myself on this project and quit when I was tired.
Looking forward to the next project!
(Want more pictures? They can be found here.)
|
How nifty is that?
The story starts last Wednesday when Brandon (one of our music/video interns) and I went to Big Jim’s for lunch. For reasons unknown, I took my car keys out of my pocket and put them in the cup holder of his car. We had lunch, we came back to work, all was well … except that I forgot that my keys were in Brandon’s car.
I had forgotten all about my keys until after band rehearsal which was around 12:30am. Brandon had already left, which didn’t matter because I had long forgotten that my keys were still in his car. After searching everywhere I could think of looking (I had only been in a couple areas all day), I gave up hope of finding my keys. It was 2:30am, and I was exhausted. I found my way to a room with a comfy couch and crashed there for the night, hoping to get some decent rest.
I woke up about 6 hours later (I actually had a shower and a change of clothes), and for no apparent reason, updated my Facebook status to something to the effect of "Dave has lost his car keys." About 15 minutes later, Brandon (who is way more into Facebook than I am) showed up at my desk with a grin, plopped my car keys down on my desk and said "Did you lose these? I saw your Facebook status and remembered the keys in my cup holder."
So there you have it — I experienced a tangible benefit from being in a social network. I wish I could say that it wouldn’t be the last time, but the jury’s still out on that.
|
Filed Under ( multi-site) by Dave Mast on June-4-2008
It’s coming up on 3am, and I’m just now starting to wind down a little from tonight’s work night. Although my project for this evening didn’t exactly turn out as I planned, it capped off a pretty productive day. The highlight of the day has been our IT operations meeting. We had a good bit of discussion today on multi-site and what we’ll need to implement to be ready. Some good discussion…
- Network reconfiguration (addressing and masking needs tweaked - already on my list)
- Data link for streaming and network access
- Who provides it?
- What if what we need is not available? (we’re rural, so that’s possible) Do we call the power company? Pull fiber in? Load-balance multiple instances of available connections?
- Phone system changeover (we will be dropping our PBX and going VoIP)
- Increasing systems availability (just because the power goes out here doesn’t mean it’s out everywhere)
- Centralized storage - it sure would be nice, but what does it look like for us?
Another big item will be getting our CMS (F1) ready for this transition. There’s a lot of background stuff that needs to happen ahead-of-time for this one.
I know even this is just the tip of the iceberg when it comes to the big picture … but are there any IT-related items that I’m missing on this?
|
I’ve been thinking a bit about multi-campus issues lately. We’re not there yet, but I think sooner or later we will be, and I want to be ready to roll when it happens. Being a dude that has no formal IT training and has only managed a single-site network, I’m both excited and a little anxious about the prospect of taking our IT and video infrastructure outside the walls of what would eventually be the "Central Campus."
Last week I had a chance to talk with Jared B. from NewSpring Church and get my mind around what it will take to stream live buffered video from one campus to another. The tools that they’re using are amazing, and seem to be just the fit for what I like to call "North Point-style video" (IMAG on the sides with an HD image of the communicator in the center). The idea of being able to get our services streamed to another location is insanely cool to me, and next week, I’ll be calling around to see just how big of a data pipe I can bring in to NewPointe, as this project will require a HEAP of bandwidth for both locations. (One of our contractors told us there’s fiber possibly running along our road already, how sweet would it be to ride that.)
Another item to consider will be our network. When it was set up in our current building, I wasn’t even thinking about multi-site ventures. As a result, the network isn’t in prime condition to be bridged off just yet. I want to keep our domain intact across our organization, so eventually I’ll need to convert everything from 10.0.0.0/8 to 10.0.0.0/16 and then begin testing site-to-site stuff. I really don’t want to spend on new firewall hardware unless it’s absolutely necessary, so I’m gonna be banking on Brutus (our pfSense box) to make it happen.
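Here is a quick sketch of why shrinking the mask matters for multi-site. One reading of the plan is that the current site keeps 10.0.0.0/16 and each new campus gets its own /16 out of the same 10.x space, so the sites never overlap and can be routed to each other. The campus names below are placeholders, and Python's standard `ipaddress` module is used only to do the math.

```python
import ipaddress

# The whole private range the post says is in use today.
flat = ipaddress.ip_network("10.0.0.0/8")

# Carve it into /16 blocks: 2**(16 - 8) = 256 of them,
# each holding 2**(32 - 16) = 65,536 addresses.
blocks = list(flat.subnets(new_prefix=16))

# Hypothetical campus names, handed blocks in order.
campuses = ["central", "second-campus"]
plan = dict(zip(campuses, blocks))

for name, net in plan.items():
    print(f"{name}: {net} ({net.num_addresses} addresses)")
```

With a flat /8 at the first site, every 10.x address looks "local" and there is nothing left to route to a second site. Once each campus sits in its own /16, a site-to-site link only needs one route per campus.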
Like I said, this stuff has me a little anxious, but VERY psyched at the same time. We don’t have a set date or anything like that for acquiring a second campus, but if I can get some preparations done before that all hits, I’ll feel real good.
|
For me, blogging is a bit like exercising: If I stop doing it, I eventually find it very hard to get back in the groove. There’s been a lot going on lately at NewPointe, and I’ve had a tough time breaking away to write anything even resembling a blog post.
I’m going to make an effort to get back into posting. There’s a lot of cool stuff going on at NP that’s worth talking about, so bear with me as I try to get my blog on again.
|
|
|