Thursday, 24 June 2010

An exciting day in the life of a scientist

or: How to kill a few hours trying to get publication-quality figures out of Matlab

This is really just a boring moan; blogging as therapy.

Well, yesterday I had the excitement of getting proofs for an 'in press' article. Virtually no errors to be corrected, but what was this list of queries from the publisher? Ah, the figures. Resolution too low. Well, that should be easy to fix; I'd do it first thing. Or so I thought.

9.30 a.m. The paper has an unusually large number of figures, eight, some in colour. All were created in Matlab and saved in .tiff format. I was pretty proud of generating the figures in Matlab. Graphics in Matlab is a bit of a nightmare and takes some time to learn, but once learned, you can generally create figures that are more complex than those produced by the other applications that I know. The proofs have come from Developmental Science, but they tell me the figures are too low resolution, even though I'd selected a 'no compression' option when saving them.

Coincidentally, I have another article that is under consideration by Journal of Neuroscience, who also mention their stringent requirements for figure quality, and point me to a website, Cadmus, that will explain what is required and how to do it. Oh good, I think. Someone will help walk me through how to get good quality figures. Ha ha ha.

Cadmus has a list of programs and formats that are supported. Alas, Matlab is not among them. But Adobe Illustrator is. We have a copy of that. I used to have it on my machine, but uninstalled it, because I never used it and I got fed up when graphics files defaulted to opening in it, which took ages. Tracked down the CD, reinstalled it (compact version). Right, I think, Matlab will allow me to export a file in .ai format, and then I will be OK. Ha ha ha.

9.45 I start with the simplest figure I have: a simple black and white line drawing with a couple of text labels. I save it in .ai format.
When I click on it, Adobe Illustrator tries to open it, but first tells me it has 'unrecognised fonts' (Arial?) and then says it can't open it. OK, I think. But I can open an .eps or .tiff file in Illustrator, and I can also save my Matlab figure in those formats. But once again, I get strange messages about wrong fonts, and for the .eps version, what appears on the screen bears no resemblance to the original.

I look again at what Cadmus says about Adobe Illustrator. Oh dear. "PLEASE NOTE: When creating graphics in illustration programs such as Adobe Illustrator with the intention of outputting to an imagesetter or platesetter, it is extremely important that the person creating the illustration have a thorough understanding of the details of imaging in a prepress environment. There are an abundance of complex problems that can occur at output if paths are set up improperly, colors are indicated incorrectly, or other elements are constructed improperly. Trapping issues can also present problems if not addressed. The more complicated your illustration becomes, the greater the probability of problems at output, and therefore the need for more expertise and experience in creating the files."

Decide that I had better try another option, since I have never used Adobe Illustrator and would not recognise a platesetter if I stumbled over one. But, I think, there is a helpful application associated with Cadmus that allows you to check your files. And I can just open my .eps file from there. Having gone through the usual round of registering, thinking of a password, getting email confirmation of the account, etc., I am into 'Rapid Inspector'. I try opening my .tiff file. FAIL, says Rapid Inspector. Resolution too low. OK, how about the .eps version? Ah, says Rapid Inspector: "Rapid Inspector found an image with CMYK color. CMYK color is not supported. Acceptable color space include(s): Spotcolor, Lineart, Grayscale, RGB." But this is a black and white figure!
I spend some time in Matlab trying to sort this one out, but with no success. My own fault, but I can't find the script that I made to generate the figure in the first place, and I will need to redo it with different fonts etc. So I waste 10 minutes tracking it down and resolve once again always to save my programs in sensible places with sensible names. I go on to the web to find out how to change the colormap to gray. Re-run the program, save the figure, and try it again in Rapid Inspector. It still tells me I have CMYK color. It also complains about my fonts: "Rapid Inspector detected that some or all fonts are missing from this file. To pass inspection, all fonts must be embedded. The following fonts are not embedded: Helvetica." That's odd, as I was using Arial, not Helvetica. Try a few more runs of the program with different fonts. It still doesn't like my fonts.

10.45 Time to do a Google search about how to save a Matlab figure with embedded fonts. Well, it is nice to know I am not alone, and that many others have had this problem over the years. Several complain that it is about time Matlab did something about it. One helpful person, Oliver Woodford, has written a routine called export_fig, which is freely available: http://www.mathworks.com/matlabcentral/fileexchange/23629
Excellent. But, he explains, if you want to use it to create the kinds of files I need, you need to download two other applications from other sources. Fortunately, I already have the first, but the second, xpdf, is one of those applications that makes the non-geek's heart sink when you go to the download webpage and find, instead of clear instructions about what to do, a whole list of possibilities. I fear that the one I probably need ends in .tar.gz. I've tangled with these things before but can never quite remember what to do with them.

11.30 After a bit of fiddling about, I save the .tar.gz file, then try to extract the contents.
A few failures as I do something wrong, and then at last I have it. But I am not sure I have it in the right place, and no indication is given as to where it should be saved. I've just stuck it in my Matlab program folder.

12.00 OK, I should be all set, so now let's look at the examples of how to use export_fig. The nice helpful man who wrote the script has clearly been through everything I have, and more. He writes: "Exporting a figure from MATLAB the way you want it (hopefully the way it looks on screen), can be a real headache for the unitiated, thanks to all the settings that are required, and also due to some eccentricities (a.k.a. features and bugs) of functions such as print. The first goal of export_fig is to make transferring a plot from screen to document, just the way you expect (again, assuming that's as it appears on screen), a doddle." This is looking more promising... Print out the instructions: 13 pages of them.

12.30 Took a break to look at some interesting data: what I ought to be doing instead of this rubbish.

14.30 OK, back to export_fig. First attempt failed. Matlab can't find export_fig. I need to put the script somewhere else. OK, eventually sorted that by putting all the export_fig .m files into the Matlab folder in My Documents. All going very well so long as I am exporting to .png format. But I want .eps. When I try that, the program complains it needs pdftops. So where the hell is that? I will have a hunt. Found it, but it is a .cc file, which does not seem to be recognised by Matlab. So I have now spent more time on the website looking for a .m file. Doesn't appear to be one. Gave up and decided to try a .tiff file. Hah! Nasty bossy Rapid Inspector says PASS. Hooray! But it turned out I was reading in a different file, created with the same name in May. Back to my .tiff option in export_fig. This fails the resolution test, even though it is specified as max quality.

15.00 Have a cup of tea. Back to trying the .eps option. Can't work out how to use the xpng file.
Program stops and asks for pdftops. I have located pdftops.m and pdftops.cc but neither seems to be what it wants. As far as I can see from looking at the code, it wants an .exe file. The web tells me that a .cc file is a C++ file. In some desperation I tried renaming the .cc file as .exe, but that did not work. Decide to write to the author of the script, having read all the comments on the program and found that nobody else is having problems. Send the email. It bounces. I had mistakenly included a full stop at the end of the email address. Try to resend to the correct address: email keeps autocompleting to the address with the full stop. After 2 tries, get into 'frequent contacts' in the address book and delete the entry, so can now send the email.

15.35 I need another cup of tea to calm down. So now trying figure 2, a coloured headplot. Already have it as .tiff; it looks very nice. Rapid Inspector tells me FAIL! Resolution is too low. I try saving as .png. Get a lovely looking picture. Rapid Inspector won't read it. Try exporting from Microsoft image reader to .tiff and then reading it in. Now I get: "alpha_planes: Rapid Inspector found extra color channels within this image. Extra color channels are also known as Alpha channels. Alpha channels are not supported. Please use an image editor to remove alpha channels from this file. resolution: Rapid Inspector found a low-resolution (RGB) image (96 DPI). The minimum required resolution for this type of image is 300 DPI."

16.30 The wonderful Oliver Woodford replied and explained patiently how to cope with the pdftops thing. I downloaded it. Still did not work. Downloaded it again to the location he had said his was in. It works!! And the figures it creates are acceptable to the wretched Rapid Inspector.

Verdict

I'm really grateful to those who have produced free software that helped me deal with this. But I am really annoyed on two counts. First, Matlab is an expensive package.
It does wonderful things and I love it to bits as a programming tool, but its graphics are not easy to use. People have been complaining for at least 2 years about the difficulty of generating high-resolution output, yet nothing has been done. It should be a high priority for the Matlab developers to fix this so that there is a simple command to generate this kind of output.

Second, the Journal of Neuroscience exemplifies a trend in many journals to make authors do a lot of work that would, in the old days, have been done by copy-editors and other professionals. Scientists are supposed to have skills in graphic design and programming on top of all their other accomplishments. Some journals do still accept figures in a range of formats and look after any conversion from their end. But increasingly, the onus is put on authors. There appears to be no correlation between the wealth of a journal and the amount of help it will give to authors; in fact, if there is a correlation, I suspect it is inverse. Journal of Neuroscience charges hefty fees for just submitting a paper, let alone publishing it, with added costs of $1000 per figure unless first and last authors are members of the Society for Neuroscience. We are not, and we have lots of colour figures. So our grant will be spent on shoring up J. Neuroscience rather than employing a vacation student for a few weeks.

I reckon that on a 1-10 scale of geekiness I am a 6-7, and I am struggling. I am a full-time researcher with good support. I am a reasonable programmer. But I've got lots of colleagues who are trying to produce papers who are closer to a 1 or 2 on the geekiness scale, have little or no support, and are trying to fit research around busy teaching commitments. How on earth can they cope with all of this?
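Rapid Inspector's verdict on the headplot (found "96 DPI", minimum "300 DPI") comes down to simple arithmetic: the number of pixels across a bitmap must be at least the printed width in inches times the required dpi. The MATLAB sketch below works this through for a figure printed 17 cm wide; that width is an assumption chosen for illustration, not a journal specification.

```matlab
% Pixels needed for a bitmap figure to meet a minimum print resolution.
% Assumption: the figure is printed 17 cm wide (hypothetical width).
width_cm   = 17;
width_in   = width_cm / 2.54;              % 1 inch = 2.54 cm
dpi_needed = 300;                          % minimum Rapid Inspector asked for (RGB)
dpi_screen = 96;                           % resolution Rapid Inspector reported

px_needed = ceil(width_in * dpi_needed);   % pixels across required at 300 dpi
px_screen = round(width_in * dpi_screen);  % pixels across a 96-dpi export gives

fprintf('Need %d px across; a 96-dpi export gives about %d px.\n', ...
        px_needed, px_screen);

% Relabelling a 96-dpi file as 300 dpi adds no detail: the same pixels
% would simply print at roughly a third of the intended width.
printed_width_cm = px_screen / dpi_needed * 2.54;
fprintf('At 300 dpi those %d px print only %.1f cm wide.\n', ...
        px_screen, printed_width_cm);
```

This is why the fix has to happen at export time, when the pixels are generated (for example with the -r option of Matlab's print function), rather than by converting or re-tagging an existing low-resolution file.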

15 comments:

  1. Sounds horrendous. I agree that there is now a need for scientists to be programming and graphic design experts. But it's the costs that particularly irk me. Costs just to submit??? It's like a cover charge at a club! At least there you're guaranteed a bit of a dance. The power in the scientist-journal marriage has always been with the journal, via the peer-reviewers/editors. But it now appears it is actually with *the journal* (in particular, their commercial activities). I don't think any of us suppose that the peer-review process isn't flawed and that it creates a lop-sided literature (though it's the best we have). But these extra costs just magnify that effect.
    And I'd put you more at the 5-6 mark on the geekiness scale.

  2. Just passing, thought I'd fill in some of the blanks!

    A .tar file is an (uncompressed) file which archives many different files, so it is just a convenient way to join files together. A .gz file is a single file compressed using gzip. Often both are used together to compress a load of files. (Unix systems have tools for dealing with these, so .tar.gz is a sign of a Unix/Linux geek. Oh, actually that includes Macs.) I think WinZip can deal with such files under Windows (which I assume you're using, given the reference to .exe).

    Okay, so .cc files have to be compiled to run. Doing so is typically a royal pain involving running a makefile using a "make" command: basically a script which says compile file A, then file B which depends on A, link with library C, and so on. (Makefiles and .cc files are plain text, so you can have a peek using any text editor.) Well, except that instead of doing all this it will typically complain that you don't have particular libraries... (Some keywords there for future reference!) It's been a long time since I've bothered compiling anything (written by anyone else) from scratch: I do what you did and find an already compiled solution.

    But yeah, what a waste of time. Thanks for sharing the protocol report though :-)

  3. Well, I thought the only value of the blog was to emote, but I now have to do another figure and couldn't remember what actually worked. Then remembered I had recorded the identity of the relevant script, as well as my anguish here!

  4. Just came across your post here through Tom Webb's blog at Nature Network. I'm a bit out of my depth here, but I used to do the figures for Journal of General Virology so might be able to help.

    I understand Matlab can export to PDF; although they can be unwieldy, PDFs are usually pretty high quality. You can then import them into Adobe Illustrator for further editing, or free software like Inkscape/OpenOffice (doesn't work as well, though). GIMP can also open PDFs but not edit them; you have to convert the file to a graphic format first, meaning you'll lose the editable text, but that's OK for some tasks.

    Best, Nico

  5. Following up from and expanding upon Nico's comment, I recommend printing to pdf from Matlab using "print -dpdf my_figure.pdf" and then importing the PDF into OpenOffice using the PDF-import plug-in from here: http://extensions.services.openoffice.org/project/pdfimport

    You can then edit the PDFs in OpenOffice, and can save and export them to lots of different formats. PDF-export is built-in to OpenOffice and works well. Both OpenOffice and the PDF-import plug-in are free and work on Macs, PCs and Linux.

    I find that the importing and editing in OpenOffice comes in very handy for combining different Matlab plots into a multi-panel figure, and for tweaking things such as axis-label placement. E.g.:
    http://dl.dropbox.com/u/700503/whole_brain_vs_Heschl_comparison.pdf

  6. Many thanks to both Nico and Rajeev. I will certainly explore the options you describe.
    But please note, the issue is not whether one can create beautiful, multipanel figures. I had already done that when I started out on 24th June. Now there is this additional requirement that the figures conform to very specific journal requirements, and, as I found, you often don't know if your figures will fit until you have been through a great deal of trial and error.

  7. That's a good point about the need to meet obscure journal requirements. I can't guarantee that PDFs produced via the Matlab-OpenOffice route will be acceptable to all journals, but they've been fine for the journals that I've sent them to. In general, PDF is a very good format to submit, because it is the most widely-used format in publishing (more so than EPS, which is PDF's long-lost ancestor) and it doesn't suffer from the resolution-too-low problem. That's because PDFs of Matlab graphs are in vector format (meaning that they are made up of the actual plotted lines) rather than a bitmap format such as TIFF (a pixel-grab of the image, which turns the actual plotted lines into fuzzy renditions of the pixels that those lines happen to lie on).

  8. Holy moly, what kind of racket gets away with charging $1000 per figure?! In my field, it does not cost anything to submit nor add figures. Yowza. You'd think giving them free content would be enough, no?

  9. I've been once again grappling with this issue, and have learned more useful things from Herb Jurkiewicz at Univ. Western Australia.
    If you use the O.Woodford Matlab routine to create high res .eps files, it works but you can't see what you have created very easily.
    You can convert from this version to .tiff if you have Adobe Illustrator, as follows.
    1. In Illustrator, create a new print document, using defaults.
    2. Choose File > Place and select the .eps file you created.
    3. Select the figure that you can now see.
    4. Use File > Export to save it as a .tiff. You can select color mode and resolution at this point.
    Holy Moly indeed.
    But more and more journals are now making these demands, so I guess we all have to wise up or give up on figures.

  10. P.S. Herb has just made the point that correct sizing needs to be done to the placed .eps file *before* it is exported as a .tif out of Illustrator.

  11. I'd reiterate what Rajeev said about PDF and vector formats: Windows is really bad at supporting EPS, and bitmap formats like TIFF, PNG, etc. aren't a great idea for any sort of publication where there may be rescaling required or where there is a requirement for high resolution. PDF is a well-supported vector format with effectively infinite resolution -- and fonts can be embedded: win!

    I recall directly embedding PDFs in documents back when I had a Mac, and I think I have dragged them into OpenOffice docs since then, but I'm not sure if MS Word & friends support embedded PDF graphics. They *should*, but then Windows *should* be able to view EPS without special programs... I'm assuming here that you submit papers in Word format, which may not be true. In my field (particle physics) pretty much everything is done with LaTeX, and including PDF graphics (or pretty much any common format) is very straightforward with pdflatex.

    Re. Matlab being crap, there is also the free Octave program, and the free Python matplotlib/pylab package: I'm not sure what is used for graphics output on the former, but have had good experiences with the latter, whose graphics commands are designed to look like Matlab's.

    FWIW, I think Geek Scale is probably the sort of quantity in which the point where you self-rate as approaching 10 usually indicates that a whole new vista of geekery opens up and you put yourself back down to a 3 again ;) Like pretty much any kind of expertise.

  12. deevybee: Export_fig will directly convert an eps to a tiff file when you specify the painters renderer. It can export in cmyk colourspace and gets the resolution values right too. You can therefore avoid using Illustrator for this step.

  13. I had gone through agonies converting *.fig to anything publishing-worthy and have come up with the following simple solution: instead of saving the *.fig as another format, I choose 'Copy Figure' (from the Edit menu of the Matlab figure window). This then copies with marvellously good resolution to CorelDraw, which is my program of choice.

    What if Oxford labs unite and request that the journals relevant to them employ more production editors... threatening with a submission boycott?

  14. This area is clearly ripe for a major paradigm shift. The publishing companies have a stranglehold on the flow of information. Where are the voices of our scientific and political leaders when we need a top-down solution to this problem? I hope they are not beholden to the publishing companies to maintain the status quo, but considering how lobbying controls everything these days, I would not be surprised. The only solution then is a bottom-up paradigm shift. There must be something afoot by now.

  15. Hi,
    I feel you! No need to add anything to that.

    On the practical side:
    Neglecting the color space issue, a relatively simple way to get further in Matlab is to save a high-resolution TIFF directly via the print function (-r sets the resolution in dpi), added after the last (still open) figure:

    print(gcf,'title.tiff','-dtiff','-r1000');

    To control the figure dimensions, I always start figures with:
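    figpath = [pwd filesep]; figtit = 'fig1'; % example folder and file name; set your own
    figsize = [17 20]; % width and height in cm
    set(gcf, 'Units', 'centimeters', 'Position', [0 0 figsize]); % optionally: axis square
    set(gcf, 'PaperPositionMode', 'auto'); % print at the on-screen size
    set(findall(gcf, '-property', 'FontSize'), 'FontSize', 8); % 8-pt text throughout
    saveas(gcf, [figpath, figtit, '.fig'], 'fig'); % editable copy for later re-export
    print(gcf, [figpath, figtit, '.tiff'], '-dtiff', '-r1000'); % 1000-dpi TIFF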

    Always keep the script, and the .fig to redo things into other formats, just in case.

    But I will check out export_fig; it seems more flexible and complete.

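Pulling together the routes suggested in the comments (printing to a vector PDF, a high-resolution TIFF via print, and export_fig with the painters renderer), here is a minimal MATLAB sketch. The plot, axis labels and file names are hypothetical. export_fig (File Exchange #23629) is optional and is only called if it is on the path; its EPS and TIFF output also depend on the external helpers described in the post (pdftops and, as far as I know, Ghostscript), so treat that step as an assumption about your setup.

```matlab
% Minimal sketch: export one figure via the routes suggested in the comments.
% Assumptions: plot content and file names are hypothetical; export_fig is
% optional and only used if it is found on the MATLAB path.
fig = figure('Visible', 'off');           % draw off-screen
plot(1:10, (1:10).^2, 'k-');              % simple black-and-white line plot
xlabel('Trial'); ylabel('Score');         % hypothetical axis labels
set(findall(fig, '-property', 'FontSize'), 'FontSize', 8);

% Built-in print(): vector PDF (painters renderer) and a 300-dpi TIFF.
print(fig, 'my_figure.pdf', '-dpdf', '-painters');
print(fig, 'my_figure.tif', '-dtiff', '-r300');

% export_fig route (comment 12): painters renderer, EPS plus 300-dpi TIFF.
% Output stays RGB here, since Rapid Inspector rejected CMYK in the post;
% export_fig also offers '-cmyk' for journals that require it.
if exist('export_fig', 'file') == 2
    export_fig('my_figure_ef', '-eps', '-tif', '-painters', '-r300', fig);
end

close(fig);
```

The PDF keeps the plotted lines as vectors, so it sidesteps the resolution check altogether. The TIFF's pixel count is fixed at export time by the -r value, so that is the setting to raise if a bitmap is still rejected.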