susam
I learnt C, more than 20 years ago, from the book The C Programming Language written by Brian W. Kernighan and Dennis M. Ritchie, also known as K&R. I read the book almost cover to cover all the way from the preface at the beginning to its three appendices at the end while solving all the exercises that each chapter presented. As someone who knew very little about programming languages back then, this book was formative in my journey of becoming a programmer.

Appendix A (Reference Manual) of the book broadened my outlook on programming languages by providing me a glimpse of what goes into formally specifying a programming language. Section A.12 (Preprocessing) of this appendix specifies trigraph sequences. Quoting from the section:

> Preprocessing itself takes place in several logically successive phases that may, in a particular implementation, be condensed.

> 1. First, trigraph sequences as described in Par.A.12.1 are replaced by their equivalents. Should the operating system environment require it, newline characters are introduced between the lines of the source file.

Section A.12.1 (Trigraph Sequences) then describes trigraph sequences in more detail. Quoting this section:

> The character set of C source programs is contained within seven-bit ASCII, but is a superset of the ISO 646-1983 Invariant Code Set. In order to enable programs to be represented in the reduced set, all occurrences of the following trigraph sequences are replaced by the corresponding single character. This replacement occurs before any other processing.

  ??=  #
  ??/  \
  ??'  ^
  ??(  [
  ??)  ]
  ??!  |
  ??<  {
  ??>  }
  ??-  ~
> No other such replacements occur.

> Trigraph sequences are new with the ANSI standard.

bradford
Trigraphs make this obfuscated C submission possible: (https://gist.github.com/Property404/e31b99deb3527159e183)

I've pasted it here for convenience (formatting fixed, thanks child comment!):

   //  Are you there god??/
   ??=define _(please, help)
   ??=define _____(i,m, v,e,r,y) r%:%:m
   ??=define ____ _____(a,f,r,a,i,d)
   main(__)<%____(!_(-~-??-((-~-??-!__<<-
   ??-!!__)<<-??-(!!__<<!!__))+-~-~-??--~-~
   -~-~-~-~-??-(-~-~-~-~-??-!!__<<-~!!__),-
   ??-!__))<%??>%>_(__,___)??<____
   (printf("please let me die??/r%d bottle%s"
   " of bee%s""""??/n",(!(___
   %-~-~!!___))?--__+!___++:__+!___++,!(__-!!___)
   &&___%-~-~!!___??!??!!(___%-~-~!!___??!??!__
   -(-~!!___))?"":"s",___%-~-??-!!___<-??-!!___?
   "r on the wall":"eeeeeeer! Take one down,pass ??/
   it around")&&__&&_(__,___),"mercy I'm in pain")??<??>??>
rdlw
See also: "What is the "-->" operator in C++?"

https://stackoverflow.com/q/1642028

layer8
From the ASCII Wikipedia page (https://en.wikipedia.org/wiki/ASCII#7-bit_codes):

> Almost every country needed an adapted version of ASCII, since ASCII suited the needs of only the US and a few other countries. For example, Canada had its own version that supported French characters.

> Many other countries developed variants of ASCII to include non-English letters (e.g. é, ñ, ß, Ł), currency symbols (e.g. £, ¥), etc. See also YUSCII (Yugoslavia).

> It would share most characters in common, but assign other locally useful characters to several code points reserved for "national use". […]

> Because the bracket and brace characters of ASCII were assigned to "national use" code points that were used for accented letters in other national variants of ISO/IEC 646, a German, French, or Swedish, etc. programmer using their national variant of ISO/IEC 646, rather than ASCII, had to write, and, thus, read, something such as

  ä aÄiÜ = 'Ön'; ü
instead of

  { a[i] = '\n'; }
> C trigraphs were created to solve this problem for ANSI C, although their late introduction and inconsistent implementation in compilers limited their use. Many programmers kept their computers on US-ASCII, so plain-text in Swedish, German etc. (for example, in e-mail or Usenet) contained "{, }" and similar variants in the middle of words, something those programmers got used to. For example, a Swedish programmer mailing another programmer asking if they should go for lunch, could get "N{ jag har sm|rg}sar" as the answer, which should be "Nä jag har smörgåsar" meaning "No I've got sandwiches".
dhosek
One of the challenges of | is that it was never entirely clear whether the ASCII | should be equivalent to EBCDIC’s | or ¦. As I recall, Waterloo C wanted ¦ as its vertical bar character, although I could be wrong. On the IBM system that I used back in the 80s, we had ASCII terminals which were run through a muxer to the actual system (which was part of the magic that allowed it to have thousands of concurrent users all getting real-time access—a lot of UI was offloaded to these systems which were essentially minicomputers on their own).
NegativeLatency
There's also iso646.h, which lets you write some particularly Python-looking code:

  #include <iso646.h>
  #include <stdbool.h>
  #include <stdio.h>
  #define is ==
  
  bool is_whitespace(int c) {
    if (c is ' ' or c is '\n' or c is '\t') {
      return true;
    }
    return false;
  }
  
  int main() {
    int current, previous = ' ';  // treat the start of input as following whitespace
  
    while ((current = getchar()) not_eq EOF) {
      if (is_whitespace(current) and not is_whitespace(previous)) {
        putchar('\n');
      } else {
        putchar(current);
      }
      previous = current;
    }
  
    return 0;
  }
chromatin
Wow, and I thought I knew C pretty well. Great post.

edited to add: I really like "Modern C" and just re-checked -- no mention of the preprocessor feature!

https://hal.inria.fr/hal-02383654/file/ModernC.pdf

billpg
"There's a problem. Some machines don't have some braces and vertical bars and such. We'll have to add keywords like OR and BEGIN and END."

"Are question marks fine?"

"Yes."

"I'll come up with something."

cl3misch
This reminds me of a comment on a Python discussion from more than 2 years ago, which I often think about:

"Whether it's computer languages or human ones, as soon as you get into a discussion about the correct parsing of a statement, you've lost and need to rewrite in a way that's unambiguous. Too many people pride themselves on knowing more or less obscure rules and, honestly, no one else cares."

https://news.ycombinator.com/item?id=23051202

kbob
I'd say, "Congratulations! You're one of today's lucky 10,000!", but trigraphs aren't really much fun. Just another reminder that C is old, and computing is even older.

I've used uppercase-only terminals, and I've used ancient C, but not at the same time.

DonHopkins
Years ago I wrote a perfectly reasonable comment like /* WTF??!?!!?!???? */ and the old C compiler complained about "invalid trigraph". A syntax error in the middle of a comment!

Took me a while to figure out that "trigraph" was referring to some part of "??!?!!?!????" and not "WTF".

Agentlien
Every time I hear about trigraphs I think of this horror:

http://stackoverflow.com/questions/53315710/ddg#53315821

FabHK
There are two aspects to this, the trigraph, and using the short circuiting behaviour of the binary logic operator for control flow.

The latter is a very common idiom in Julia code, which I found obscure and puerile at first (“look how smart I am”), but have come to appreciate as concise and natural by now.

For example:

  function fact(n::Int)
     n >= 0 || error("n must be non-negative")
     n == 0 && return 1
     n * fact(n-1)
  end
https://docs.julialang.org/en/v1/manual/control-flow/#Short-...
divbzero
In addition to trigraphs, there is apparently a set of C alternative tokens, defined as follows:

  #define and &&
  #define and_eq &=
  #define bitand &
  #define bitor |
  #define compl ~
  #define not !
  #define not_eq !=
  #define or ||
  #define or_eq |=
  #define xor ^
  #define xor_eq ^=
I suppose that allows for code like this:

  if (x or not y or not z) {
      return 1;
  }
https://en.wikipedia.org/wiki/C_alternative_tokens
curling_grad
Anecdote: An online judge website (which is pretty well known in Korea) has an easy problem[0] asking to write a program which adds "??!" to the input. A lot of beginners' C/C++ submissions got a "Wrong Answer" verdict because of trigraphs.

[0]: https://www.acmicpc.net/problem/10926

hgs3
Reminds me of the "goes to" operator [1]

[1] https://stackoverflow.com/questions/1642028/what-is-the-oper...

cesaref
This sort of practice goes back to BCPL, which Wikipedia says is the first braced programming language. Because { and } weren't universally available, compilers also supported the sequences $( and $) to represent them, which were typeable and printable on just about anything.

https://en.wikipedia.org/wiki/BCPL

This is the earliest example of this sort of thing I'm aware of. Is there an earlier one?

Also, BCPL supported // for comments, again, probably the first use of this sequence.

virtualritz
> Has Microsoft Windows finally been open-sourced or where did this come from?

This comment on the SO post made my day. :D

anfractuosity
In gcc I got:

    1.c:1:11: warning: trigraph ??< ignored, use -trigraphs to enable [-Wtrigraphs]
Out of curiosity, is there a preprocessor directive to enable support?
sargstuff
from [1], trigraphs or not:

  int main() {
     [](){}()
  }
is still weird.

Wonder if there will be a request for an emacs macro to handle the replaced cpp trigraphs? [2]

[1] https://zygoloid.github.io/cppcontest2018.html [2] https://www.emacswiki.org/emacs/CppTemplate

Waterluvian
If we deprecated trigraphs and removed that step from the compiler would it speed compilation up much? I’m going to guess maybe by milliseconds?
chris_wot
C++17 removed trigraphs. Sadly, this will no longer work.
olliej
Oh trigraphs may you never die
jawadch93
sr.ht