/* Partykof - Managing information and Technology */
In this blog, I summarize some of my work so far and the issues I face every day in my work as an IT professional.
You are welcome to follow, comment and share with others. If you want to drop me a private note, send me an e-mail.

Saturday, March 26, 2011

NetApp FAS or IBM N-Series, that is the question

Around this April, IBM is finally about to announce its N-Series equivalents to the FAS3270. This raises the question again: which is the better one to acquire?

I've used them both and have seen advantages to both platforms. Besides the late arrival of the N-series equivalents of the FAS and its derivatives, there is the issue of support, which is provided by NetApp engineers. But the most troubling issue is IBM's commitment to the N-series over the next few years, given their Storwize product line.

In every recent session I had with IBM, the V7000 storage system was offered first. That product line is aimed at mid-range disk systems, just as the N-series has been positioned so far. When I point out that a SAN environment is a must, the XIV product line is suggested.

In my honest opinion, it seems that IBM is keeping N-series customers on a low flame, just until they have a better solution for the SMB and mid-range market.

For someone who has been an IBM customer for many years, it seems unfair: on one side they show commitment to the N-Series by introducing new products, and on the other side they develop and strongly market their in-house alternative. Maybe it is NetApp who should raise this question on behalf of the customers and ask IBM to share their 5-year roadmap for both product lines, so that we as customers know what to choose at this point in time.

What is your insight? You are welcome to leave your opinion in the comments below.


Wednesday, December 15, 2010

Mobile devices - The backdoor to your enterprise

In the past few years, mobile devices, and smartphones in particular, have presented a new threat of information leaking from the organization. The CEO's iPhone is lost and his emails are now exposed to the world, or maybe worse, the sysadmin's BlackBerry with some passwords or other important data is stolen. There are many ways to minimize the risk by remotely revoking or erasing the device, and I am not going to discuss those.
The huge install base of iPhone and Android devices is very appealing to hostile entities who wish to penetrate your organization's shield and retrieve information, or maybe just damage it. Be they cybercriminals or cyberwar soldiers, these mobile devices have become their gateway to your fortress, and they are not that well protected.
Apple offers the iTunes store, where you can download thousands of applications for your iPhone. Are these apps secure? Well, some are, and whether or not most of the others are really harmful, let's just say for the sake of the discussion that they are secure.
Google offers a huge marketplace for Android applications, developed under the umbrella of the open source community, which allows anyone to develop a wide variety of apps. Are these apps verified as secure? They might be, but then again they're not really checked, at least not each and every one.
Let's take a simple scenario where an app is checked and seems secure. But what if someone creates two apps that are each harmless on their own, but when put together on the same device open a hole for a dropper? A dropper is a payload carrier that lets hackers place any kind of code they wish in order to hijack your device for their needs.
The problem becomes very clear when you jailbreak an iPhone. In that case, the jailbreak application or its process can leave a hole for that same purpose. Later on, a cracked app that you find on Cydia, or one you downloaded from a torrent site somewhere, can be this mobile Trojan horse.
The next stage will be for this Trojan to collect your stored credentials to gain access to your corporate network, or maybe to place another dropper that will plant a Trojan the minute you plug your iPhone into your computer, and they're in.

For now there is no real way to identify a jailbroken iPhone remotely, since Apple cannot keep up with those who develop the jailbreaks. Some even say that Apple is silently dropping the SDK that other companies used to develop products that would block it.
There is not much to do about that except be smart. Here are some tips I could think of that might assist in this situation, at least until someone comes up with a solid solution.

  1. Communicate and educate users about these threats, so they will be aware of the consequences.
  2. Set a policy that allows only iPhones that have not gone through the jailbreak process to be connected to your servers.
  3. Consider using an anti-virus application on your mobile devices. 
  4. Recommend that the iPhone be used for business purposes only – well, as much as possible.
  5. Enforce password access to unlock these devices.
  6. Purchase and install certified apps only.
  7. Make sure you can remotely disable and wipe the device in case it is lost or stolen.

I guess there are many more that others could think of. If any come to mind, you can leave a comment or send me a note and I will include them.

Be safe.


Friday, December 3, 2010

Naming conventions in the IT environment

This post is intended to provide a common set of guidelines that are useful when handling a large number of records in your IT environment, such as usernames, computer names, devices and other records. It does so by applying a naming convention to these records and explaining the standard settings that will help others understand the conventions and schema. One common way of differentiating between elements is by suffix, which applies if your environment spreads across multiple countries or domain names. The scope of this post is prefix differentiation, which is the problem in local environments or databases.

Not much attention is given to naming conventions in the IT environment, especially when starting out small. I remember that my first network carried the names of characters from Joseph Heller's novel, Catch-22. At first it was just for fun, and it was very easy to remember Yossarian, Milo, Orr, and majorx4 (Major Major Major Major). As my network got bigger, I ran out of funny names to choose from, and things started to get complicated. I started using characters from Greek, Roman and Viking mythology, until it became a nightmare. I had to come up with a method that would help me identify nodes without keeping look-up tables in my head, so I started looking for some kind of format, which later on became very useful when I used an asset management tool.

In a large network it is very common to use some kind of database that holds records, be it an LDAP directory (such as Active Directory), Yellow Pages (YP/NIS), DNS, DHCP or a CMDB. This is why it is necessary to keep unique values in your environment for records such as computer names, usernames, asset tags and email addresses, so you can differentiate between them.

Reasons for using naming conventions:

  1. The need for standards and uniformity
  2. The use of logic to quickly identify objects
  3. Granular differentiation of elements, for versioning, locating and security reasons
  4. Uniqueness of records in databases such as IDM, ITAM and the others mentioned above

There are many ways you can differentiate between elements; here are some examples.

Physical differentiation – by the location of the object such as:

  • Subsidiary city
  • Building number
  • Floor/Level number
  • Room number
  • Factory line number

Logical differentiation – by the relation of the object such as:

  • Ownership - Owner user, Department, Organization Unit or Cost center
  • Type – Printer, Server, Computer, Switch, Filer, Desktop, Laptop, Phone or Tablet
  • Function – Email, DB, Web or File Servers
  • Permissions – Anonymous, Standard, Administrator user and so on

Figure 1: Physical and Logical differentiation in Top to Bottom view

Here are some examples that might clarify the idea. The first examples relate to user and employee names. I'll use my domain as an example, but it can be any domain. The primary objective is to have uniformity in the convention when selecting computer names, usernames, email addresses or any other identifiers, as they may affect their use in an identity management tool.
Let's take John Doe for example. Let's say that John belongs to our Chicago office, his office is in building A, he works in the marketing group, and he has a laptop, a portable projector and a mobile phone.

His employee name – Should be recorded as John Doe, not john doe, John doe, Dow john, J0hN doW! or any other combination. If you have another John Doe in your company, you may use his middle initial or any other distinguishing detail.

His user name – Can be set by using his surname and the first letter of his first name, such as doej, or the other way around, johnd, or by adding another identifier, such as doej01.
His email address – should be set from his employee name, such as John.Doe@partykof.com.

  1. You should avoid using his username as the external email address, as it can give away his username, which makes it easier for hackers to brute-force their way in.
  2. I prefer using the dot (.) to separate the first name from the surname, reserving the underscore (_) to separate multiple first names and the dash (-) for concatenated surnames. For example, John_Michael.Doe-Benz@partykof.com
  3. In cases where the user has a very long name, such as John's, it might be wise to shorten the email address: JM.Doe-Benz@partykof.com
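
The username and email rules above can be sketched as two small shell functions. This is only an illustration: the function names make_username and make_email are made up for this example, and the sketch covers just the simple "surname plus first initial" and "First.Last@domain" cases.

```shell
#!/bin/sh
# Hypothetical helpers illustrating the convention; not taken from any tool.

# Username: surname followed by the first letter of the first name, lower case.
make_username() {
    first=$1 last=$2
    initial=$(printf '%s' "$first" | cut -c1)
    printf '%s%s\n' "$last" "$initial" | tr '[:upper:]' '[:lower:]'
}

# Email: First.Last@domain, built from the employee name, not the username.
make_email() {
    printf '%s.%s@%s\n' "$1" "$2" "$3"
}

make_username John Doe              # prints doej
make_email John Doe partykof.com    # prints John.Doe@partykof.com
```

Keeping the two functions separate mirrors the advice above: the email address is derived from the employee name, so it does not expose the login name.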
Now for John's devices: the little background we have might help us decide on suitable names for them.

His devices might be named as:

His Laptop - chamkt-doej-lt
His Projector - chamkt-doej-pj
His Mobile Phone - chamkt-doej-mo

I used the following schema:
CH for the Chicago branch
A for building A
MKT for marketing
DOEJ for his username
MO for mobile, PJ for projector, LT for laptop.
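
As a rough sketch of how this schema could be applied consistently, the function below joins the elements in top-to-bottom order. The name device_name and its argument order are assumptions made for this example only.

```shell
#!/bin/sh
# Hypothetical helper (not from any tool): build a device name as
# <site><building><department>-<username>-<type>, all in lower case.
device_name() {
    site=$1 building=$2 dept=$3 user=$4 type=$5
    printf '%s%s%s-%s-%s\n' "$site" "$building" "$dept" "$user" "$type" |
        tr '[:upper:]' '[:lower:]'
}

device_name CH A MKT DOEJ LT   # John's laptop: prints chamkt-doej-lt
device_name CH A MKT DOEJ MO   # John's mobile phone: prints chamkt-doej-mo
```

Generating the names from one function, instead of typing them by hand, is one way to keep the prefix order identical across every record.
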
Some other devices around John might be:

chamkt-prt1  - his departmental printer
chamkt-plt1 - his departmental plotter
chamkt-fs1 - his departmental file server
cha-sw-core1 - his building's network core switch
ch-srv-ex1 - his branch's Exchange server

Other areas in IT where you can use naming conventions in a similar way are:

  • Storage Systems – Filers, Aggregates, Volumes, Luns, Folders
  • Storage Networks – Fabrics, Zones, Switches, WWNs, WWPNs
  • Networks – WAN and LAN elements, VLAN, VPN, DMZ, firewalls, Routers, Access Points.
  • Applications - Databases, Tables

Now remember, these are only examples. You should choose your own schema for your naming conventions, as best suits your organization. There are, however, some basic rules you should comply with.

Basic Rules

  • Avoid using non-alphanumeric characters; use only letters (A-Z), numbers (0-9) and the hyphen (-) in your computer names. Underscores and other characters may cause problems with DNS services.
  • Use up to 15 characters for computer names, as some services such as NetBIOS and WINS are not compatible with longer names.
  • Avoid using duplicate names, even at different levels where they are permitted. In some cases they can cause mix-ups, such as in the case of AD forests and OUs.
  • Avoid schemes that will lock you in, for example in cases of mergers.
  • Although AD supports it, a user name should not contain a space (for example, "user name"), as many systems do not support it.
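
The first two rules are easy to check automatically. Below is a minimal sketch, assuming that hyphens are acceptable (the device names earlier in this post use them) and that the 15 character limit applies to the whole name. The function name valid_computer_name is made up for this example.

```shell
#!/bin/sh
# Hypothetical check: letters, digits and hyphens only, at most 15 characters.
valid_computer_name() {
    name=$1
    [ -n "$name" ] || return 1              # empty names are rejected
    [ "${#name}" -le 15 ] || return 1       # NetBIOS-style length limit
    case $name in
        *[!A-Za-z0-9-]*) return 1 ;;        # a character outside the allowed set
    esac
    return 0
}

for n in chamkt-doej-lt cha_mkt_prt1 chamkt-doej-laptop; do
    if valid_computer_name "$n"; then
        echo "$n: ok"
    else
        echo "$n: rejected"
    fi
done
```

In this run the first name passes, the second is rejected because of the underscores, and the third because it is longer than 15 characters.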

Other Guidelines

  • Keep names as short and meaningful as possible
  • When using usernames within the computer name, remember to change the computer name when you assign it to a different user.
  • Build your naming conventions in a top to bottom hierarchy, your prefix should start with the top element.

Some References:

  1. Naming conventions in Active Directory for computers, domains, sites, and OUs
  2. Special characters in user ID and passwords


Naming records in a consistent and logical way will help you distinguish between records at a glance. Naming records according to agreed conventions will make the task much easier for all IT parties, streamline the adoption of management applications or new systems, and allow simple expansion of your organization.


Monday, August 23, 2010

Server management – In and out of band infrastructure – Part 1

One of the numerous tasks of an administrator is to access and control IT assets across the entire organization, be it inside a local data center or in a remote office.
Located somewhere in the Middle East, I was responsible for a full production corporate data center sited in Santa Clara, CA, and supported another engineering data center in Shenzhen, China. This drove me to find a solution that would allow me and my group complete access in order to maintain service availability.
Usually the focus on enabling access to a server is based on the criticality of the application running on the server or the service it provides. To deliver the highest possible level of availability, you need to minimize the downtime of a service: first you need to know that it is down, then you should figure out a plan for how to repair it. The time measured from the failure notification to the repair is called the Mean Time To Repair (MTTR). MTTR should be as short as possible, but it is really defined by two objectives.
  1. Recovery time objective (RTO)
  2. Recovery point objective (RPO)

Discussing these objectives could spread across several posts, but this post can assist with minimizing the RTO, and it deals with normal operation and not only crisis situations.
For the sake of this discussion I will present several options for connecting to a server in order to manage it, though in practice only the options that allow recovery of the service should really be implemented. To define which connections you require, you need to come up with failure scenarios and decide which connection will be used to overcome each failure.

The connections to a server are divided into two major categories:
  1. Out of Band infrastructure (OOBI) – utilizes a management channel that is isolated from the data channels.
  2. In Band infrastructure – allows management through the regular data channels, such as the Ethernet network, to the managed device.

Figure 1: Server interfaces for management

Out of Band interfaces

  • VGA – This port is the display graphics output of the server, based on a 15-pin VGA connector to a display monitor, just as you would use on a desktop. Together with a keyboard and mouse connected to the server, you get a full graphical interface to manage the local operating system (OS). Some new systems offer a DVI or HDMI interface instead of the VGA port.
  • Modem – A modem is an interface connected to the server that allows remote dial-in using a different network than the standard data network. It might be a standard telephone line, an ISDN connection or a GSM/UMTS/HSDPA wireless connection.
  • IPMI/BMC – A newer concept based on a System-on-Chip integrated into the server’s motherboard. It allows an IP-based connection to the system and is independent of the OS status, which means it works even if the OS is down. This interface provides access to the system platform, BIOS settings and a remote screen view in graphical or text mode.
  • Serial – An interface that allows a connection based on a serial protocol such as RS-232 and provides access to the system console. A legacy device such as Digital’s VT100 could provide a basic terminal interface, usually text based, for performing administrative tasks on the local system.
  • USB – The USB interface is utilized in several ways: connecting the keyboard and mouse in VGA mode, or connecting an external modem or a remotely managed UPS that can be used to power down the system.
  • Power – “He who controls the power controls the device”. Power is the fundamental element of every electronic device. If you can power the device on or off, you have basic management over it, as in the case of remote routers that are not responding.
  • Vendor Specific – Major vendors developed dedicated interfaces to manage their appliances or devices. These implementations vary from on-board solutions such as Oracle’s (Sun) LOM and HP iLO to dedicated additions to the system such as IBM’s RSA card or Dell’s DRAC.

In band interfaces
  • Ethernet – Using the system’s network interface, the operating system can run an application that provides management capability for the system OS and its hardware platform. Options range from the Simple Network Management Protocol (SNMP), Telnet and SSH to a full graphical Remote Desktop connection (RDP), VNC or in some cases a vendor's proprietary agent, such as IBM Tivoli or BMC PATROL.
  • IPMI over Ethernet – In some cases, hardware manufacturers use the Ethernet interface to provide access to the hardware platform’s System-on-Chip. It is assigned a different IP address from the OS, so it remains reachable in case the OS fails. This solution is very useful in dense environments and is used to save cables, switches or even ports on the actual hardware. What it gains in savings it loses in security, which is why some advise a separate management network to limit access.
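
To give a feel for what the IPMI and Power interfaces offer in practice, here is a small sketch built on the open source ipmitool client, which this post does not otherwise cover. The BMC address and the admin user are made-up placeholders, and the wrapper name bmc_power is an assumption for this example. With the -E option, ipmitool reads the password from the IPMI_PASSWORD environment variable.

```shell
#!/bin/sh
# Sketch only: the BMC user below is a placeholder, adjust to your environment.
BMC_USER=${BMC_USER:-admin}

# bmc_power HOST [status|on|off|cycle]
# Queries or changes the chassis power state through the BMC, whether or not
# the operating system on the server is running.
bmc_power() {
    host=$1
    action=${2:-status}
    ipmitool -I lanplus -H "$host" -U "$BMC_USER" -E chassis power "$action"
}

# Example (needs a reachable BMC, so it is left commented out):
# bmc_power 10.0.0.50 status
```

Because the request goes to the BMC's own IP address, the same call works when the OS has crashed, which is exactly the failure scenario the out of band channel is there for.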

Connecting a single server to different kinds of management interfaces can contribute a lot to the cable sprawl in your data center.

A sample of a complete solution would look something like this (keep in mind that this is for a single server):

   Figure 2: A single server management architecture

Keeping track of every server and the different ways to connect to it becomes a very difficult task. Just try to imagine it: you keep a table of the users and passwords, the IPMI IP addresses, the modem extensions, the power switches and the VGA monitors for each connection. This can become a real headache.
Fortunately, there are many great solutions available today from vendors such as Avocent, ATEN, MRV and many more that allow us to minimize this sprawl, such as multiplexed KVM switches, serial console servers and even an IPMI portal.
In my next post - Server management – In and out of band infrastructure – part 2, I will cover such solutions.


Friday, July 30, 2010

Troubleshooting problems in Linux, based on a sample of the DD-WRT web GUI not responding

In this post I am going to present a sample troubleshooting procedure for a Linux box whose web interface suddenly stops responding after a few weeks of normal operation. I will present the use of basic tools that are usually embedded in any Linux box, and of an external monitoring tool based on MRTG.

I use a Linksys WRT54GS wireless router running the DD-WRT v24-sp1 mega firmware. It is a small appliance based on the Broadcom BCM4712 chip, running a scaled-down Linux OS. Since I installed this version I have noticed that once in a while I am unable to access the web interface of the router. The simplest solution was to power cycle the router by pulling its power plug, but that meant getting to my router, which sits somewhere in the attic. I decided to try and figure out what was causing it.

First, I configured SSH access to the router, so I would be able to remotely connect to it, and reboot it in case I needed to. I also configured SNMP monitoring for it, to collect statistics of its performance.
Once the problem recurred, I was able to connect to the router and run a simple top command to see what processes were running and whether that could help me figure out the problem.

Figure 1: Console view of top output

Immediately I noticed that the router load was high, and that the process causing it was the web server daemon, httpd, which was consuming 98.2% of the CPU.
Wondering when the problem started, I turned to the RRD graph and noticed that it had been going on for more than 3 weeks, since the beginning of week 28.

Figure 2: Weekly view of router CPU load

In Figure 2, you can clearly see that the router load rose dramatically above the load value of 1, which means that the CPU was working at 100% and was queuing processes, which in turn means performance degradation.
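
The same rule of thumb can be turned into a quick check. The sketch below assumes a single-core CPU, as on this router, and assumes awk is available on the box; the function name is_overloaded is made up for this example.

```shell
#!/bin/sh
# is_overloaded LOAD [CORES]
# Succeeds when the load average exceeds the number of CPU cores, which means
# processes are waiting in the run queue. awk is used for the decimal compare.
is_overloaded() {
    awk -v load="$1" -v cores="${2:-1}" 'BEGIN { exit !(load > cores) }'
}

# On Linux, the first field of /proc/loadavg is the 1-minute load average.
if [ -r /proc/loadavg ]; then
    read -r load1 _rest < /proc/loadavg
    if is_overloaded "$load1" 1; then
        echo "load $load1: CPU saturated, processes are queuing"
    else
        echo "load $load1: within capacity"
    fi
fi
```

On a machine with more cores, pass the core count as the second argument, since a load of 1 only means full utilization on a single core.
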
I tried correlating the problem with a memory or traffic incident at the time the problem started. Figure 3 shows the memory utilization of the router and Figure 4 shows inbound and outbound traffic on the router's WAN bridge.

Figure 3: Weekly view of router memory usage

Figure 4: Weekly view of traffic on WAN interface

Looking at the beginning of week 28 in both graphs, I found no relation to any incident at the time the problem started, nor any sign that these parameters would cause this problem.

Another factor that might have an effect is the system's disk capacity, but on such a small router the whole file system is always shown as 100% full, so this would not indicate a problem.

With no luck figuring out the cause of the problem, only the symptom, I googled it, and guess what: it is a known issue. According to others in the DD-WRT community, the problem is caused by intensive use of P2P services, and currently there is no resolution for it other than using the Mini firmware version.
Since I need the Mega firmware version for VPN and VoIP, I cannot afford to downgrade my router, so the best way is to live with it. To make life easier, I wrote a small script that I can run remotely to restart the web service without even having to log in to the router interactively.
   #!/bin/sh
   # Restart the DD-WRT web server daemon
   stopservice httpd
   startservice httpd
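
As a sketch of the "run remotely" part, the same two commands can be passed to ssh in one go from another machine. The router address below is a placeholder, and key-based SSH login for root is assumed so that no password prompt appears; the function name restart_router_httpd is made up for this example.

```shell
#!/bin/sh
# Sketch: ROUTER is an assumed user@address, change it to match your router.
ROUTER=${ROUTER:-root@192.168.1.1}

# Run the stop and start commands on the router without an interactive login.
restart_router_httpd() {
    ssh "$ROUTER" 'stopservice httpd; startservice httpd'
}

# Example (needs a reachable router, so it is left commented out):
# restart_router_httpd
```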

You can view a nice reference for doing this procedure in this Link

In summary, although this is only a small Linux box, or a router, the basic procedure for identifying a problem or its symptoms is the same: you should look at the system in normal operation and compare any irregularity to that steady state. Using MRTG tools to collect statistics for reference is very important and useful for troubleshooting and capacity planning.


IBM acquires Storwize, A real-time in-line lossless data compression

A new announcement is spreading across all storage magazines: IBM announced today that it has decided to acquire Storwize, which provides real-time data compression technology.

About Storwize
Storwize, headquartered in Marlborough, MA, with an R&D office in Israel, provides online storage optimization through real-time data compression. Storwize's Random Access Compression Engine™ (RACE), applied in its STN appliances, transparently (in-line) compresses primary storage by up to 80 percent. They promise random access and deterministic, lossless data compression with no reduction in performance.

Key Values
The value of the Storwize solution is based on three points.
  1. It is based on existing industry LZ compression algorithms, such as the one used in standard tape backup operations, but its revolutionary idea is that it compresses in real time with no data loss.
  2. It is very simple to deploy: it is a plug-and-play solution that is seamless to day-to-day operation and is installed in less than 30 minutes. Compression can begin immediately for new data; old data is compressed seamlessly over time.
  3. It presents immediate ROI: it allows significant savings from day one and enables bigger operational capacity in storage and performance with the current investment.
 Figure 1: Typical Storwize solution
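
To put the compression figure in perspective, here is a back-of-the-envelope calculation. It assumes the quoted percentage is the share of space saved, so 80 percent means the data occupies one fifth of its original size. The function name effective_capacity is made up for this example, and real ratios depend heavily on the type of data.

```shell
#!/bin/sh
# effective_capacity RAW PERCENT_SAVED
# Amount of logical data that fits on RAW units of disk when compression
# removes PERCENT_SAVED percent of it (integer arithmetic, rounded down).
effective_capacity() {
    raw=$1 saved=$2
    if [ "$saved" -lt 0 ] || [ "$saved" -ge 100 ]; then
        echo "percent saved must be between 0 and 99" >&2
        return 1
    fi
    echo $(( raw * 100 / (100 - saved) ))
}

effective_capacity 10 80   # best case quoted above: 10TB holds 50TB of data
effective_capacity 10 50   # a more modest ratio: 10TB holds 20TB of data
```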

Advantages with current storage investment 
An implementation of the Storwize solution will provide the following benefits.
  • Compress the data on existing network storage systems and save on the next disk purchases.
  • Compress data going into the storage systems, which extends the performance capacity of the current systems for a longer period and delays the acquisition of new ones.
  • An immediate boost to the user experience, as a result of the reduced load on the storage system.
  • Reduce the need for users to sort, delete or compress their files to keep within their current quota, freeing them from tedious tasks to focus on real work.
  • Recovery time from tapes will be reduced dramatically, as less data will be transferred from the tapes to the disks.
  • Power saving (green computing): when using fewer disks to store compressed data, you save the power of the disk shelves that were needed for the uncompressed capacity.
  • Smaller footprint (floor space savings): when using fewer disk shelves, you delay the need to expand the expensive data center floor space.

Risks to consider 
As with any new solution integrated into your computing environment, and since it is a relatively new technology on the market, it presents several risks that must be addressed, or that you should at least be aware of.
  • The appliance is placed in-line between your network switch and the storage system, which means it is another failure point in the critical path of your environment. - Precaution: Make sure you deploy Storwize’s fault-tolerant solution to avoid a single point of failure.
  • An overlooked compression configuration could result in data loss. - Precaution: Set configuration control procedures and change management to avoid faults.
  • Introduction of a totally new system with no prior experience. - Precaution: Seriously consider holding training sessions for the IT personnel who will manage this environment.
  • Scale-out lockdown when using the Storwize solution with new NAS technologies, as there is no support for a global/shared namespace. - Precaution: Consider deploying this solution on isolated controllers, at least until Storwize offers a solution for persistent namespaces.

With over 18 months of experience working with this solution, I can say that the results were great. I saw a very good compression ratio on typical data on the storage systems, along with performance improvements. Some configuration issues were discovered early in the deployment; however, they were immediately resolved by Storwize. This solution is indeed revolutionary in its concept and its results. It presents many advantages and some risks, which should be addressed as advised if this solution is to be considered.


Tuesday, July 20, 2010

Configuring a server for optimal performance

The preceding posts have illustrated the major building blocks that affect server configuration; I explained the importance of each one and the priority of adding it to the system.
If you missed them you can check these links:
In this final post of server configurations, I will present examples of configurations and areas where they should be applied.

Major Configurations
The configuration of a server is derived from its target application requirements. There are four major configurations:

  1. Maximum Performance 
  2. Balanced Performance 
  3. Maximum Capacity 
  4. RAS configurations

Maximum Performance 
    This configuration is intended to get the maximum CPU frequency and maximum memory bandwidth. It usually uses a low memory count, as you populate only one DIMM per channel (i.e. 6 DIMMs overall). The common use for such servers is High Performance Computing (HPC) in research organizations, the oil & gas industry and chip design.
 Figure 1:  Maximum Performance

Best configuration at the time of publishing this post:
  • CPU - Intel Xeon X5680 (3.33GHz), 6 cores per processor.
  • Memory - 6 PC3-10600 DIMMS (such as Kingston KVR1333D3D4R9SK3/24G) to allow 48GB of RAM, at 10.6GB/s bandwidth to memory.

  Balanced Performance 
    This configuration is focused on a balance between the maximum CPU frequency and the maximum capacity of memory. It usually uses a medium memory count, up to 96GB per host. The common use for such servers is virtualization and other standard enterprise applications.
 Figure 2:  Balanced Performance

Best configuration at the time of publishing this post:
  • CPU - Intel Xeon X5680 (3.33GHz), 6 cores per processor.
  • Memory - 2 DPC, 12 PC3-8500 DIMMS (such as Kingston KVR1066D3Q8R7SK3/24G) to allow 96GB of RAM, at 8.5GB/s bandwidth to memory. 

  Maximum Capacity
    This configuration is focused on supporting the maximum capacity of memory with considerable compute power. It is usually designed to use as much as 144GB of RAM per host (288GB with the upcoming 16GB modules). The common use for such servers is very large scale database servers.
 Figure 3:  Maximum Capacity

Best configuration at the time of publishing this post:
  • CPU - Intel Xeon X5680 (3.33GHz), 6 cores per processor.
  • Memory - 3 DPC, 18 PC3-8500 DIMMS (such as Kingston KVR1066D3Q8R7SK3/24G) to allow 144GB of RAM, at 6.4GB/s bandwidth to memory. 
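
The capacity and bandwidth figures in the three configurations above follow from simple arithmetic, sketched below. The sketch assumes a two-socket server with three memory channels per processor and 8GB DIMMs, which is what the DIMM counts above imply. It also assumes that the 6.4GB/s figure at three DIMMs per channel corresponds to the modules running at 800 MT/s. The function names are made up for this example.

```shell
#!/bin/sh
# Peak bandwidth of one DDR3 channel in MB/s: transfers per second (MT/s)
# times 8 bytes, since each channel is 64 bits wide.
channel_bandwidth_mb() {
    echo $(( $1 * 8 ))
}

# Total memory in GB:
# sockets x channels per socket x DIMMs per channel (DPC) x DIMM size in GB.
total_memory_gb() {
    echo $(( $1 * $2 * $3 * $4 ))
}

channel_bandwidth_mb 1333    # 1 DPC: prints 10664 (about 10.6GB/s)
channel_bandwidth_mb 1066    # 2 DPC: prints 8528  (about 8.5GB/s)
channel_bandwidth_mb 800     # 3 DPC: prints 6400  (6.4GB/s)

total_memory_gb 2 3 1 8      # Maximum Performance: prints 48
total_memory_gb 2 3 2 8      # Balanced Performance: prints 96
total_memory_gb 2 3 3 8      # Maximum Capacity: prints 144
total_memory_gb 2 3 3 16     # with 16GB modules: prints 288
```

The trade-off is visible in the numbers: every extra DIMM per channel adds capacity but lowers the speed at which the channel runs.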

 RAS Configuration
    RAS stands for Reliability, Availability and Serviceability.  Although the ECC technology offers error correction, it does not provide any failover capability. Replacing a DIMM in case of failure requires a power down of the system. The RAS configurations offer three memory protection options:
    1. Online spare memory mode
    2. Mirrored memory mode
    3. Lockstep memory mode
              This configuration uses only two out of the three channels.

     Figure 4:  RAS configuration

       Online spare memory mode
        In this mode, one of the channels is designated as a spare. This channel is not used in normal system operation. If a working DIMM exceeds the threshold of correctable memory errors, the system switches to the standby channel and the faulty channel is taken offline.
         Mirrored memory mode
        In this mode, the same data is written to each channel and the read is alternated between the two channels. If a working DIMM exceeds the threshold of correctable memory errors in one of the channels, the faulty channel is taken offline and the system switches to using only one channel. 

         Lockstep memory mode
        This mode uses two memory channels at a time, and they work as a single channel. Each read and write operation moves a data word two channels wide, which provides double 8-bit error correction within a single DRAM. This mode is the most reliable, but it reduces the maximum memory capacity, as the third channel is not used.

      By now you should have the tools to configure your server for the optimal performance you will need for your application. You should focus on the application's memory requirements and start from that point to configure how much memory you should use and in which configuration of ranking and population.