Unix and Linux Systems

2008/03/26



I'm beginning to really hate the old SCO systems that
are left out there. However, when they call with a problem,
I have to help, right? Well of course I do. But I can't
help cringing sometimes..


This call was from a Boston client who honestly is trying
to get off this creaking old boat. They can't seem to find
off-the-shelf software that will meet their needs, but they
have hired a company to write them a new app. Unfortunately
it will be Windows, but that is what it is, right? Anyway,
while the app development proceeds, the old SCO has to keep running.
Early on a Friday morning, it stopped.


Well, not exactly. It stopped letting people log in. People
who were already logged in could work, but received a message
described to me as "something like 'system database not allocated'".


Well, OK - normally I'd complain and say "Please give me the EXACT
message you saw", but for this I knew it had to be something
in the TCB (Trusted Computing Base) being messed up. SCO has
built-in tools to examine and even fix up minor problems, so
I felt I might be able to lead someone through fixing it,
but on the other hand this stuff can sometimes get nasty so I
wasn't sure. I had another problem too: I had to be somewhere else
in fifteen minutes and I wouldn't be able to talk anyone through
anything until I was done with that.


"How many people are still logged in?" I asked.


"Four"


Well, heck, that's not so bad. It's Friday, a slow day for them
as it is for most businesses, and they only usually have about
twelve or so people working on that system anyway. I asked if
they could limp along for a few hours. Yes, they could.. but
please hurry.


So I went off to my meeting, but couldn't help thinking about
how I'd approach this problem. Most likely, I could just have
them run "integrity" to get a list of damaged files and restore
them from backup. Most likely..


That's assuming the backup is good, of course. What if the backup
has messed up files or the problem is really somewhere else? I
can look at a tcb file and know what I'm looking at, but it
would just be gobbledy-gook to the person at their end, so I'd
have to get them to print things and fax them to me.. yuck. I
shortly convinced myself that it was better to go in to the job.


After finishing up my other business, I called and explained that.
I said that I probably could fix this over the phone, but I was
a little hesitant because it could get nasty, and I'd rather just
come in. The people at the other end immediately agreed: they really
didn't want to be led through anything anyway. By the way: they
were now down to two people working because the other two had
"accidentally" logged out.. no, I don't know how you "accidentally"
log out either.


An hour later I was there. I ran "integrity" and it pointed
to /etc/auth/system/default as the problem. I looked at
it and found it zero length.. how could that happen?
That file should look something like this:



default:\
:d_name=default:\
:u_pwd=*:\
:u_priority#0:u_cmdpriv=audittrail,su,queryspace,printqueue,mem,terminal:\
:u_syspriv=execsuid,nopromain,chmodsugid,chown:\
:u_minchg#0:u_maxlen#8:u_exp#0:u_life#0:\
:u_pickpw:u_genpwd:u_restrict@:u_nullpw:\
:u_suclog#0:u_unsuclog#0:u_maxtries#99:u_lock:\
:u_singleuserpswd:u_secclass=c2:u_integrity@:u_tcbpw@:\
:u_pwseg#1:\
:t_logdelay#1:t_maxtries#99:t_login_timeout#60:\
:chkent:

I've never seen a system glitch just zero that, so I was suspicious.
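
For anyone who finds that entry to be gobbledy-gook: it's a termcap-style record. As I read it, fields are separated by colons, "name=value" is a string, "name#n" is a number, a bare name is a true flag, and a trailing "@" turns the flag off. Here's a minimal Python sketch of that reading. The function name parse_authcap_entry is my own invention for illustration, not an SCO tool, and it makes no attempt to handle every corner of the real format.

```python
def parse_authcap_entry(text):
    """Parse one termcap-style entry into (entry_name, capabilities).

    Assumed field rules (my reading of the format, not a full spec):
      name=value -> string, name#n -> integer,
      name       -> True,   name@  -> False
    """
    # A trailing backslash continues the entry onto the next line.
    joined = "".join(line.strip().rstrip("\\") for line in text.splitlines())
    fields = [f for f in joined.split(":") if f]
    if not fields:
        # A zero-length file gives us nothing at all to work with.
        return None, {}
    caps = {}
    for field in fields[1:]:
        if "=" in field:
            key, value = field.split("=", 1)
            caps[key] = value
        elif "#" in field:
            key, value = field.split("#", 1)
            caps[key] = int(value)
        elif field.endswith("@"):
            caps[field[:-1]] = False
        else:
            caps[field] = True
    return fields[0], caps


if __name__ == "__main__":
    sample = "default:\\\n\t:d_name=default:\\\n\t:u_maxtries#99:u_lock:u_restrict@:\\\n\t:chkent:\n"
    name, caps = parse_authcap_entry(sample)
    print(name, caps)
```

Feed it an empty string, which is what this file held when I arrived, and you get no entry name and no capabilities back.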


"Who's got root?"


It was a short list.


"I think somebody meant to clear something else and somehow cleared this.. or.."


The "or" would be file system corruption. Possible, but there were no
other indications of that. Nothing in "syslog" or "messages" indicated
any other problems. I restored the file from backup and people
immediately could log in. I watched log files for any other problem
and ran "integrity" again. No problems..
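
"integrity" is the right tool for this on SCO, but the general idea of hunting for files that have been zeroed is easy to sketch. This is a minimal Python version, assuming you have a Python 3 interpreter somewhere that can see the files (not something I'd count on with an old SCO box). The name find_empty_files is hypothetical, and it only looks at file size, nothing more.

```python
import os


def find_empty_files(root):
    """Return a sorted list of zero-length regular files under root."""
    empty = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                if os.path.isfile(path) and os.path.getsize(path) == 0:
                    empty.append(path)
            except OSError:
                # Unreadable or vanished file: skip it rather than abort.
                continue
    return sorted(empty)


if __name__ == "__main__":
    # /etc/auth/system is the directory from this story; adjust as needed.
    for path in find_empty_files("/etc/auth/system"):
        print("zero length:", path)
```

It wouldn't tell you who zeroed the file, of course, but it would have pointed at the same place.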


I hung out a while checking this and that.. nothing seemed odd, so
I really think someone had to have done this by mistake.. I have no
idea what they thought they were doing.. it couldn't have been
sabotage because it was too limited.. surely anyone wanting to
cause damage would have done more.. well, I would think so anyway.


So that's where I left it. I'll check in on it a few times over the
next week, but I really don't think there is anything wrong.. just
an "accident".





















