12. Troubleshooting

Q: What are these Nasty Messages about Inodes, Blocks, and the Like?
Q: Why Do FTP Transfers Seem to Hang?
Q: Why Does Free Dump Core?
Q: Why Does Netscape Crash Frequently?
Q: Why Won't My FTP or Telnet Server Allow Logins?
Q: How Do I Keep Track of Bookmarks in Netscape?
Q: Why Does the Computer Have the Wrong Time?
Q: Why Don't Setuid Scripts Work?
Q: Why Is Free Memory as Reported by free Shrinking?
Q: Why Does the System Slow to a Crawl When Adding More Memory?
Q: Why Won't Some Programs (e.g., xdm) Allow Logins?
Q: Why Do Some Programs Allow Logins with No Password?
Q: Why Does the Machine Run Very Slowly with GCC / X / ...?
Q: Why Does My System Only Allow Root Logins?
Q: Why Is the Screen All Full of Weird Characters Instead of Letters?
Q: If I Screwed Up the System and Can't Log In, How Can I Fix It?
Q: What if I Forget the root Password?
Q: What's This Huge Security Hole in rm!?!?!
Q: Why Don't lpr and/or lpd Work?
Q: Why Are the Timestamps on Files on MS-DOS Partitions Set Incorrectly?
Q: Why is My Root File System Read-Only?
Q: What Is /proc/kcore?
Q: Why Does fdformat Require Superuser Privileges?
Q: Why Doesn't My PCMCIA Card Work after Upgrading the Kernel?

Q: What are these Nasty Messages about Inodes, Blocks, and the Like?

A: You may have a corrupted file system, probably caused by not shutting Linux down properly before turning off the power or resetting. You need to use a recent shutdown program to do this; for example, the one included in the util-linux package, available on sunsite and tsx-11.

If you're lucky, the program fsck (or e2fsck or xfsck as appropriate if you don't have the automatic fsck front-end) will be able to repair your file system. If you're unlucky, the file system is trashed, and you'll have to re-initialize it with mkfs (or mke2fs, mkxfs, etc.), and restore from a backup.

NB: don't try to check a file system that's mounted read/write. This includes the root partition, if you don't see

 VFS: mounted root ... read-only

at boot time.
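
One way to confirm how a file system is mounted before running fsck is to read the kernel's mount table. The sketch below assumes a /proc/mounts file laid out as "device mountpoint type options ..."; the function name mount_mode is made up for this example and is not part of any package.

```shell
# mount_mode MOUNTPOINT [TABLE]
# Print "ro" or "rw" for MOUNTPOINT, reading TABLE (default: /proc/mounts).
# Hypothetical helper; assumes fields are: device mountpoint type options ...
mount_mode() {
    awk -v mp="$1" '
        $2 == mp {
            # The fourth field is a comma-separated option list.
            n = split($4, opts, ",")
            mode = "unknown"
            for (i = 1; i <= n; i++)
                if (opts[i] == "ro" || opts[i] == "rw") mode = opts[i]
        }
        # If several lines match, the last one read wins.
        END { if (mode != "") print mode; else print "not mounted" }
    ' "${2:-/proc/mounts}"
}
```

Typing mount_mode / should then report whether the root file system is currently read-only (ro) or read/write (rw).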

Q: Why Do FTP Transfers Seem to Hang?

A: FTP transfers that die suddenly are due, apparently, to some form of overrunning buffer. It occurs both with Linux and Microsoft servers. On Linux systems, the problem seems to occur most commonly with the distribution's server software.

If you receive ftp: connection refused errors, then the problem is likely due to a lack of authentication. Refer to Why Won't My FTP or Telnet Server Allow Logins?.

One remedy is to replace the distribution FTP server with the Linux port of the OpenBSD FTP server. The home page is: http://www.eleves.ens.fr:8080/home/madore/programs/.

To install the BSD server, follow the installation instructions, and refer to the manual pages for inetd and inetd.conf. (If you have the newer xinetd, see below.) Be sure to tell inetd to run the BSD daemon alone, not as a subprocess of, for example, tcpd. Comment out the line that begins ftp in the /etc/inetd.conf file and replace it with a line similar to (if you install the new ftpd in /usr/local/sbin/):

 # Original entry, commented out.
 #ftp stream tcp nowait root /usr/sbin/tcpd /usr/sbin/in.ftpd

 # Replacement entry (the word after the server path is argv[0]):
 ftp stream tcp nowait root /usr/local/sbin/ftpd ftpd -l

The replacement daemon will become effective after rebooting or sending (as root) a SIGHUP to inetd, e.g.:

 # killall -HUP inetd

To configure xinetd, create an entry in /etc/xinetd.d per the instructions in the xinetd.conf manual page. Make sure, again, that the command-line arguments for ftpd are correct, and that you have installed the /etc/ftpusers and /etc/pam.d/ftp files. Then restart xinetd with the command: /etc/rc.d/init.d/xinetd restart. The command should report "OK," and the restart will be noted in the system message log.
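
As a rough illustration, an xinetd entry for the replacement daemon might look like the one printed by the sketch below. The attribute names (socket_type, protocol, wait, user, server, server_args, disable) are standard xinetd.conf attributes; the path /usr/local/sbin/ftpd and the -l flag are carried over from the inetd example, and the function name print_ftp_entry is invented here. Check the result against the xinetd.conf manual page before using it.

```shell
# print_ftp_entry [SERVER_PATH]
# Write a candidate /etc/xinetd.d/ftp entry to standard output.
# Hypothetical helper; nothing is installed or restarted.
print_ftp_entry() {
    server=${1:-/usr/local/sbin/ftpd}
    cat <<EOF
service ftp
{
    socket_type = stream
    protocol    = tcp
    wait        = no
    user        = root
    server      = $server
    server_args = -l
    disable     = no
}
EOF
}
```

Redirecting the output to a scratch file first makes it easy to review the entry before copying it into /etc/xinetd.d as root.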

Q: Why Does Free Dump Core?

A: In Linux 1.3.57 and later, the format of /proc/meminfo was changed in a way that the implementation of free doesn't understand.

Get the latest version, from metalab.unc.edu, in /pub/Linux/system/Status/ps/procps-0.99.tgz.

Q: Why Does Netscape Crash Frequently?

A: Netscape shouldn't crash, if it and the network are properly configured. Some things to check:

  • Make sure that the MOZILLA_HOME environment variable is correctly set. If you installed Netscape under /usr/local/netscape/, for example, that should be the value of MOZILLA_HOME. Set it from the command line (e.g., export MOZILLA_HOME="/usr/local/netscape" under bash) or add it to one of your personal or system initialization files. Refer to the manual page for your shell for details.

  • If you have a brand-new version of Netscape, try a previous version, in case the run-time libraries are slightly incompatible. For example, if Netscape version 4.75 is installed (type "netscape --version" at the shell prompt), try installing version 4.7. All versions are archived at ftp://ftp.netscape.com/.

  • Netscape uses its own Motif and Java Runtime Environment libraries. If a separate version of either is installed on your system, ensure that they aren't interfering with Netscape's libraries; e.g., by un-installing them.

  • Make sure that Netscape can connect to its default name servers. The program will appear to freeze and time out after several minutes if it can't. This indicates a problem with the system's Internet connection; likely, the system can't connect to other sites, either.
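
The first check in the list above is easy to automate. This is only a sketch, and the function name check_mozilla_home is invented for the example; it tests nothing beyond whether the variable is set and names an existing directory.

```shell
# check_mozilla_home: report whether MOZILLA_HOME names an existing directory.
# Hypothetical helper; returns non-zero when the setting looks wrong.
check_mozilla_home() {
    if [ -z "${MOZILLA_HOME:-}" ]; then
        echo "MOZILLA_HOME is not set"
        return 1
    elif [ ! -d "$MOZILLA_HOME" ]; then
        echo "MOZILLA_HOME points to a missing directory: $MOZILLA_HOME"
        return 1
    fi
    echo "MOZILLA_HOME looks usable: $MOZILLA_HOME"
}
```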

Q: Why Won't My FTP or Telnet Server Allow Logins?

A: This applies to server daemons that respond to clients, but don't allow logins. On new systems that have Pluggable Authentication Modules installed, look for a file named "ftp" or "telnet" in the directory /etc/pam/ or /etc/pam.d/. If the corresponding authentication file doesn't exist, the instructions for configuring FTP and Telnet authentication, and other PAM configuration, should be in /usr/doc/pam-version, where version is the installed PAM version. Refer also to the answer for FTP server says: "421 service not available, remote server has closed connection.".

If it's an FTP server on an older system, make sure that the account exists in /etc/passwd, especially anonymous.

This type of problem may also be caused by a failure to resolve the host addresses properly, especially if using Reverse Address Resolution Protocol (RARP). The simple answer to this is to list all relevant host names and IP addresses in the /etc/hosts files on each machine. (Refer to the example /etc/hosts and /etc/resolv.conf files in Sendmail Pauses for Up to a Minute at Each Command.) If the network has an internal DNS, make sure that each host can resolve network addresses using it.

If the host machine doesn't respond to FTP or Telnet clients at all, then the server daemon is not installed correctly, or at all. Refer to the manual pages: inetd and inetd.conf on older systems, or xinetd and xinetd.conf, as well as ftpd, and telnetd.
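
A quick way to see whether a PAM authentication file exists for a service is sketched below. The directories searched by default are the two named in this answer; the function name pam_service_file is an assumption made for this example.

```shell
# pam_service_file SERVICE [DIR...]
# Print the path of the first PAM file found for SERVICE, or fail.
# Hypothetical helper; searches /etc/pam.d and /etc/pam unless told otherwise.
pam_service_file() {
    service=$1
    shift
    [ $# -gt 0 ] || set -- /etc/pam.d /etc/pam
    for dir in "$@"; do
        if [ -f "$dir/$service" ]; then
            printf '%s\n' "$dir/$service"
            return 0
        fi
    done
    echo "no PAM file for $service" >&2
    return 1
}
```

For example, pam_service_file ftp and pam_service_file telnet each print the file that was found, or a complaint on standard error if there is none.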

Q: How Do I Keep Track of Bookmarks in Netscape?

A: This probably applies to most other browsers, too. In the Preferences/Navigator menu, set your home page to Netscape's bookmarks.html file, which is located in the .netscape (with a leading period) subdirectory. For example, if your login name is smith, set the home page to:

 file:///home/smith/.netscape/bookmarks.html

Setting up your personal home page like this will present you with a nicely formatted (albeit possibly long) page of bookmarks when Netscape starts. And the file is automatically updated whenever you add, delete, or visit a bookmarked site.

Q: Why Does the Computer Have the Wrong Time?

A: There are two clocks in your computer. The hardware (CMOS) clock runs even when the computer is turned off, and is used when the system starts up and by DOS (if you use DOS). The ordinary system time, shown and set by date, is maintained by the kernel while Linux is running.

You can display the CMOS clock time, or set either clock from the other, with /sbin/clock (now called hwclock in many distributions). Refer to: man 8 clock or man 8 hwclock.

There are various other programs that can correct either or both clocks for system drift or transfer time across the network. Some of them may already be installed on your system. Try looking for adjtimex (corrects for drift), Network Time Protocol clients like netdate, getdate, and xntp, or an NTP client-server suite like chrony. Refer to How Do I Find a Particular Application?.

Q: Why Don't Setuid Scripts Work?

A: They aren't supposed to. This feature has been disabled in the Linux kernel on purpose, because setuid scripts are almost always a security hole. Sudo and SuidPerl can provide more security than setuid scripts or binaries, especially if execute permissions are limited to a certain user ID or group ID.

If you want to know why setuid scripts are a security hole, read the FAQ for news:comp.unix.questions.

Q: Why Is Free Memory as Reported by free Shrinking?

A: The "free" figure printed by free doesn't include memory used as a disk buffer cache, shown in the buffers column. If you want to know how much memory is really free, add the buffers amount to free. Newer versions of free print an extra line with this info.

The disk buffer cache tends to grow soon after starting Linux up. As you load more programs and use more files, the contents get cached. It will stabilize after a while.
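
The addition described above can also be done straight from /proc/meminfo. This sketch assumes the "Label: value kB" layout used by later kernels, with MemFree: and Buffers: lines; the function name really_free_kb is invented for the example.

```shell
# really_free_kb [FILE]
# Sum the MemFree and Buffers figures (in kB) from a meminfo-style file.
# Hypothetical helper; FILE defaults to /proc/meminfo.
really_free_kb() {
    awk '$1 == "MemFree:" || $1 == "Buffers:" { total += $2 }
         END { print total + 0 }' "${1:-/proc/meminfo}"
}
```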

Q: Why Does the System Slow to a Crawl When Adding More Memory?

A: This is a common symptom of a failure to cache the additional memory. The exact problem depends on your motherboard.

Sometimes you have to enable caching of certain regions in your BIOS setup. Look in the CMOS setup and see if there is an option to cache the new memory area which is currently switched off. This is apparently most common on a '486.

Sometimes the RAM has to be in certain sockets to be cached.

Sometimes you have to set jumpers to enable caching.

Some motherboards don't cache all of the RAM if you have more RAM per amount of cache than the hardware expects. Usually a full 256K cache will solve this problem.

If in doubt, check the manual. If you still can't fix it because the documentation is inadequate, you might like to post a message to news:comp.os.linux.hardware giving all of the details (make, model number, date code, etc.), so other Linux users can avoid it.

Q: Why Won't Some Programs (e.g., xdm) Allow Logins?

A: You are probably using programs that don't understand shadow passwords on a system that uses shadow passwords.

If so, you have to get or compile a shadow password version of the programs in question. The shadow password suite can be found at ftp://tsx-11.mit.edu/pub/linux/sources/usr.bin/shadow/. This is the source code. The binaries are probably in linux/binaries/usr.bin/.

Q: Why Do Some Programs Allow Logins with No Password?

A: You probably have the same problem as in Why Won't Some Programs (e.g., xdm) Allow Logins?, with an added wrinkle.

If you are using shadow passwords, you should put a letter x or an asterisk in the password field of /etc/passwd for each account, so that if a program doesn't know about the shadow passwords it won't think it's a passwordless account and let anyone in.
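
To find accounts that would look passwordless to a program unaware of shadow passwords, something like the following sketch can be used. It relies only on the standard colon-separated passwd layout; the function name empty_password_accounts is made up for this example.

```shell
# empty_password_accounts [FILE]
# List accounts whose password field (the second colon-separated field)
# is empty. Hypothetical helper; FILE defaults to /etc/passwd.
empty_password_accounts() {
    awk -F: '$1 != "" && $2 == "" { print $1 }' "${1:-/etc/passwd}"
}
```

Any account it prints should get an x or an asterisk in that field, as described above.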

Q: Why Does the Machine Run Very Slowly with GCC / X / ...?

A: You may have too little real memory. If you have less RAM than all the programs you're running at once, Linux will swap to your hard disk instead and thrash horribly. The solution in this case is to not run so many things at once or buy more memory. You can also reclaim some memory by compiling and using a kernel with fewer options configured. See How To Upgrade/Recompile a Kernel.

You can tell how much memory and swap you're using with the free command, or by typing:

 $ cat /proc/meminfo

If your kernel is configured with a RAM disk, this is probably wasted space and will cause things to go slowly. Use LILO or rdev to tell the kernel not to allocate a RAM disk (see the LILO documentation or type man rdev).

Q: Why Does My System Only Allow Root Logins?

A: You probably have some permission problems, or you have a file /etc/nologin.

In the latter case, put rm -f /etc/nologin in your /etc/rc.local or /etc/rc.d/* scripts.

Otherwise, check the permissions on your shell, and any file names that appear in error messages, and also the directories that contain these files, up to and including the root directory.
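
Checking permissions all the way up to the root directory is tedious by hand. A minimal sketch, with invented function names, that lists a path and each of its parent directories and then runs ls -ld on every one:

```shell
# ancestors PATH
# Print PATH and each parent directory, ending with / (or . for a
# relative path). Hypothetical helper.
ancestors() {
    p=$1
    while :; do
        printf '%s\n' "$p"
        if [ "$p" = "/" ] || [ "$p" = "." ]; then
            break
        fi
        p=$(dirname "$p")
    done
}

# show_perms PATH
# Show owner and mode for PATH and every directory above it.
show_perms() {
    ancestors "$1" | while IFS= read -r p; do
        ls -ld "$p"
    done
}
```

For example, show_perms /bin/sh lists /bin/sh, /bin, and / in turn, so a directory that ordinary users cannot search stands out.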

Q: Why Is the Screen All Full of Weird Characters Instead of Letters?

A: You probably sent some binary data to your screen by mistake. Type echo -e '\033c' to fix it. Many Linux distributions have a command, reset, that does this.

If that doesn't help, try a direct screen escape command: echo 'Ctrl-V Ctrl-O'.

This resets the default font of a Linux console. Remember to hold down the Control key while typing the letter, rather than pressing Ctrl and then V separately. The sequence Ctrl-V Esc c causes a full screen reset. If there's data left on the shell command line after displaying a binary file, press Ctrl-C a few times to restore the shell command line.

Another possible command is an alias, sane, that can work with generic terminals:

 $ alias sane='echo -e "\033c";tput is2;
 > stty sane line 1 rows $LINES columns $COLUMNS'

The alias is enclosed in single quotes, not backticks. The line break is included here for clarity, and is not required.

Make sure that $LINES and $COLUMNS are defined in the environment with a command similar to this in ~/.bashrc (csh users need the equivalent setenv commands in ~/.cshrc),

 $ LINES=25; export LINES; COLUMNS=80; export COLUMNS

using the correct numbers of $LINES and $COLUMNS for the terminal.

Finally, the output of stty -g can be used to create a shell script that will reset the terminal:

  1. Save the output of stty -g to a file. In this example, the file is named termset:

     $ stty -g >termset 

    The output of stty -g (the contents of termset) will look something like:

     500:5:bd:8a3b:3:1c:7f:15:4:0:1:0:11:13:1a:0:12:f:17:16:0:0:73

  2. Edit termset to become a shell script; adding an interpreter and stty command:

     #!/bin/bash
     stty 500:5:bd:8a3b:3:1c:7f:15:4:0:1:0:11:13:1a:0:12:f:17:16:0:0:73

  3. Add executable permissions to termset and use as a shell script:

     $ chmod +x termset
     $ ./termset

[Floyd L. Davidson, Bernhard Gabler]

Q: If I Screwed Up the System and Can't Log In, How Can I Fix It?

A: You did create an emergency floppy (or floppies), right? Reboot from an emergency floppy or floppy pair, for example, the Slackware boot and root disk pair in the install subdirectory of the Slackware distribution.

There are also two do-it-yourself rescue disk creation packages in ftp://metalab.unc.edu/pub/Linux/system/recovery/. These are better because they have your own kernel on them, so you don't run the risk of missing devices and file systems.

Get to a shell prompt and mount your hard disk with something like

 $ mount -t ext2 /dev/hda1 /mnt

Then your file system is available under the directory /mnt and you can fix the problem. Remember to unmount your hard disk before rebooting (cd somewhere else first, or it will say it's busy).

Q: What if I Forget the root Password?

A:

Warning: Incorrectly editing any of the files in the /etc/ directory can severely screw up a system. Please keep a spare copy of any files you edit in case you make a mistake.

If your Linux distribution permits, try booting into single-user mode by typing single at the LILO boot: prompt. With more recent distributions, you can boot into single-user mode when prompted by typing linux 1, linux single, or init=/bin/bash.

If the above doesn't work for you, boot from the installation or rescue floppy, switch to another virtual console with one of Alt-F1 through Alt-F8, and then mount the root file system on /mnt. Then proceed with the steps below to determine if your system has standard or shadow passwords, and how to remove the password.

Using your favorite text editor, edit the root entry of the /etc/passwd file to remove the password, which is located between the first and second colons. Do this only if the password field does not contain an x. If it does contain an x, see below.

 root:Yhgew13xs:0:0: ...

Change that to:

 root::0:0: ...

If the password field contains an x, then you must remove the password from the /etc/shadow file, which is in a similar format. Refer to the manual pages: man passwd, and man 5 shadow.

[Paul Colquhuon, Robert Kiesling, Tom Plunket]
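
Where editing by hand is awkward, the change to the password field described above can also be made with awk. The sketch below is only an illustration: the function name clear_password_field is invented, it reads a passwd-format or shadow-format file on standard input (the password is the second field in both), and it writes the result to standard output rather than editing anything in place.

```shell
# clear_password_field USER
# Copy standard input to standard output, emptying the second
# colon-separated field on USER's line. Hypothetical helper; it never
# touches the original file.
clear_password_field() {
    awk -F: -v OFS=: -v user="$1" '
        $1 == user { $2 = "" }   # blank the password field for USER only
        { print }                # every other line passes through unchanged
    '
}
```

With the root file system mounted on /mnt, the output of clear_password_field root, fed from /mnt/etc/passwd, can be saved to a scratch file and inspected before it replaces the original. Keep the spare copy mentioned in the warning either way.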

Q: What's This Huge Security Hole in rm!?!?!

A: No, there isn't. You are obviously new to unices and need to read a good book to find out how things work. Clue: the ability to delete files depends on permission to write in that directory.
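
The clue can be checked with a harmless experiment in a scratch directory. This is a sketch with an invented function name, demo_rm; it creates a file with no permissions at all and shows that it can still be removed, because the directory that holds it is writable.

```shell
# demo_rm: show that removing a file depends on the directory's
# permissions, not the file's. Prints "removed" if a file with mode 000
# could still be deleted from a directory we can write to.
demo_rm() {
    dir=$(mktemp -d) || return 1
    : > "$dir/locked"
    chmod 000 "$dir/locked"      # no read or write permission on the file
    rm -f "$dir/locked"          # allowed, since the directory is writable
    if [ -e "$dir/locked" ]; then
        echo "kept"
    else
        echo "removed"
    fi
    rm -rf "$dir"
}
```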

Q: Why Don't lpr and/or lpd Work?

A: First make sure that your /dev/lp* port is correctly configured. Its IRQ (if any) and port address need to match the settings on the printer card. You should be able to dump a file directly to the printer:

 $ cat the_file >/dev/lp1

If lpr gives you a message like "myname@host: host not found" it may mean that the TCP/IP loopback interface, lo, isn't working properly. Loopback support is compiled into most distribution kernels. Check that the interface is configured with the ifconfig command. By Internet convention, the network number is 127.0.0.0, and the local host address is 127.0.0.1. If everything is configured correctly, you should be able to telnet to your own machine and get a login prompt.

Make sure that /etc/hosts.lpd contains the machine's host name.

If your machine has a network-aware lpd, like the one that comes with LPRng, make sure that /etc/lpd.perms is configured correctly.

Also look at the Printing HOWTO. Refer to Where can I get the HOWTO's and other documentation?.

Q: Why Are the Timestamps on Files on MS-DOS Partitions Set Incorrectly?

A: There is a bug in the program clock (often found in /sbin). It miscounts a time zone offset, confusing seconds with minutes or something like that. Get a recent version.

Q: Why is My Root File System Read-Only?

A: To understand how you got into this state, see EXT2-fs: warning: mounting unchecked file system.

Remount it. If /etc/fstab is correct, you can simply type:

 mount -n -o remount /

If /etc/fstab is wrong, you must give the device name and possibly the type, too, e.g.:

 mount -n -o remount -t ext2 /dev/hda2 /

Q: What Is /proc/kcore?

A: None of the files in /proc are really there. They're all "pretend" files made up by the kernel to give you information about the system, and they don't take up any hard disk space.

/proc/kcore is like an "alias" for the memory in your computer. Its size is the same as the amount of RAM you have, and if you read it as a file, the kernel does memory reads.

Q: Why Does fdformat Require Superuser Privileges?

A: The system call to format a floppy can only be done as root, regardless of the permissions of /dev/fd0*. If you want any user to be able to format a floppy, try getting the fdformat2 program. This works around the problems by being setuid to root.

Q: Why Doesn't My PCMCIA Card Work after Upgrading the Kernel?

A: The PCMCIA Card Services modules, which are located in /lib/modules/version/pcmcia, where version is the version number of the kernel, use configuration information that is specific to that kernel image only. The PCMCIA modules on your system will not work with a different kernel image. You need to upgrade the PCMCIA card modules when you upgrade the kernel.

When upgrading from older kernels, make sure that you have the most recent version of the run-time libraries, the modutils package, and so on. Refer to the file Documentation/Changes in the kernel source tree for details.

Important: If you use the PCMCIA Card Services, do not enable the Network device support/Pocket and portable adapters option of the kernel configuration menu, as this conflicts with the modules in Card Services.

Knowing the PCMCIA module dependencies of the old kernel is useful. You need to keep track of them. For example, if your PCMCIA card depends on the serial port character device being installed as a module for the old kernel, then you need to ensure that the serial module is available for the new kernel and PCMCIA modules as well.

The procedure described here is somewhat kludgey, but it is much easier than re-calculating module dependencies from scratch, and making sure the upgrade modules get loaded so that both the non-PCMCIA and PCMCIA modules are happy. Recent kernel releases contain a myriad of module options, too many to keep track of easily. These steps use the existing module dependencies as much as possible, instead of requiring you to calculate new ones.

However, this procedure does not take into account instances where module dependencies are incompatible from one kernel version to another. In these cases, you'll need to load the modules yourself with insmod, or adjust the module dependencies in the /etc/conf.modules file. The Documentation/modules.txt file in the kernel source tree contains a good description of how to use the kernel loadable modules and the module utilities like insmod, modprobe, and depmod. Modules.txt also contains a recommended procedure for determining which features to include in a resident kernel, and which to build as modules.

Essentially, you need to follow these steps when you install a new kernel.

  • Before building the new kernel, make a record with the lsmod command of the module dependencies that your system currently uses. For example, part of the lsmod output might look like this:

     Module         Pages            Used by
     memory_cs      2                0
     ds             2                [memory_cs]  3
     i82365         4                2
     pcmcia_core    8                [memory_cs ds i82365] 3
     sg             1                0
     bsd_comp       1                0
     ppp            5                [bsd_comp] 0
     slhc           2                [ppp] 0
     serial         8                0
     psaux          1                0
     lp             2                0
    

    This tells you for example that the memory_cs module needs the ds and pcmcia_core modules loaded first. What it doesn't say is that, in order to avoid recalculating the module dependencies, you may also need to have the serial, lp, psaux, and other standard modules available to prevent errors when installing the pcmcia routines at boot time with insmod. A glance at the /etc/modules file will tell you what modules the system currently loads, and in what order. Save a copy of this file for future reference, until you have successfully installed the new kernel's modules. Also save the lsmod output to a file, for example, with the command: lsmod >lsmod.old-kernel.output.

  • Build the new kernel, and install the boot image, either zImage or bzImage, to a floppy diskette. To do this, change to the arch/i386/boot directory (substitute the correct architecture directory if you don't have an Intel machine), and, with a floppy in the diskette drive, execute the command:

     $ dd if=bzImage of=/dev/fd0 bs=512

    if you built the kernel with the make bzImage command, and if your floppy drive is /dev/fd0. This results in a bootable kernel image being written to the floppy, and allows you to try out the new kernel without replacing the existing one that LILO boots on the hard drive.

  • Boot the new kernel from the floppy to make sure that it works.

  • With the system running the new kernel, compile and install a current version of the PCMCIA Card Services package, available from metalab.unc.edu as well as other Linux archives. Before installing the Card Services utilities, change the names of /sbin/cardmgr and /sbin/cardctl to /sbin/cardmgr.old and /sbin/cardctl.old. The old versions of these utilities are not compatible with the replacement utilities that Card Services installs. In case something goes awry with the installation, the old utilities won't be overwritten, and you can revert to the older versions if necessary. When configuring Card Services with the make config command, make sure that the build scripts know where to locate the kernel configuration, either by using information from the running kernel, or telling the build process where the source tree of the new kernel is. The make config step should complete without errors. Installing the modules from the Card Services package places them in the directory /lib/modules/version/pcmcia, where version is the version number of the new kernel.

  • Reboot the system, and note which, if any, of the PCMCIA devices work. Also make sure that the non-PCMCIA hardware devices are working. It's likely that some or all of them won't work. Use lsmod to determine which modules the kernel loaded at boot time, and compare it with the module listing that the old kernel loaded, which you saved from the first step of the procedure. (If you didn't save a listing of the lsmod output, go back and reboot the old kernel, and make the listing now.)

  • When all modules are properly loaded, you can replace the old kernel image on the hard drive. This will most likely be the file pointed to by the /vmlinuz symlink. Remember to update the boot sector by running the lilo command after installing the new kernel image on the hard drive.

  • Also look at the questions, How do I upgrade/recompile my kernel? and Modprobe can't locate module, "XXX," and similar messages.
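
The comparison of module listings in the steps above can be sketched as follows. It assumes two saved lsmod listings whose first line is a header and whose first column is the module name, as in the sample output; the function names and the file name lsmod.new-kernel.output are invented for this example.

```shell
# module_names FILE
# Print the sorted module names from a saved lsmod listing,
# skipping the header line. Hypothetical helper.
module_names() {
    awk 'NR > 1 { print $1 }' "$1" | sort
}

# missing_modules OLD NEW
# Print modules that appear in listing OLD but not in listing NEW.
missing_modules() {
    old_list=$(mktemp) || return 1
    new_list=$(mktemp) || return 1
    module_names "$1" > "$old_list"
    module_names "$2" > "$new_list"
    comm -23 "$old_list" "$new_list"   # lines unique to the old listing
    rm -f "$old_list" "$new_list"
}
```

After saving the new kernel's listing with lsmod >lsmod.new-kernel.output, running missing_modules lsmod.old-kernel.output lsmod.new-kernel.output names the modules that still need to be loaded.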