An interesting read: "N4024: Distinguishing coroutines and fibers" (http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2014/n4024.pdf).
Lua currently has only coroutines, but fibers can be implemented on top of coroutines (you just need a single coroutine acting as the scheduler).
Still, both use a "cooperative" strategy (both require working tasks to "yield").
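To make the "fibers on top of coroutines" idea concrete, here is a minimal sketch of a round-robin scheduler. The names new_scheduler, spawn, run and worker are hypothetical, and I assume the main coroutine plays the role of the scheduler; it is an illustration, not a reference implementation.

```lua
-- Minimal round-robin "fiber" scheduler built on plain Lua coroutines.
-- All names (new_scheduler, spawn, run, worker) are hypothetical.
local function new_scheduler()
  local ready = {}   -- FIFO queue of runnable fibers (coroutines)
  local sched = {}

  -- Register a function as a new fiber.
  function sched.spawn(fn)
    ready[#ready + 1] = coroutine.create(fn)
  end

  -- Resume each fiber in turn until all of them have finished.
  function sched.run()
    while #ready > 0 do
      local co = table.remove(ready, 1)
      local ok, err = coroutine.resume(co)
      if not ok then
        error(err, 0)                 -- propagate a fiber's error
      end
      if coroutine.status(co) == "suspended" then
        ready[#ready + 1] = co        -- it yielded: queue it again
      end
    end
  end

  return sched
end

local log = {}
local sched = new_scheduler()

-- Each worker records a step, then cooperatively yields to the scheduler.
local function worker(name, steps)
  return function()
    for i = 1, steps do
      log[#log + 1] = name .. i
      coroutine.yield()
    end
  end
end

sched.spawn(worker("a", 2))
sched.spawn(worker("b", 3))
sched.run()
print(table.concat(log, " "))  -- a1 b1 a2 b2 b3
```

Note that the interleaving only happens because every worker calls coroutine.yield() itself: a worker that never yields would keep the scheduler loop from ever running again.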
But fibers are also implemented in OSes (e.g. Windows, which provides a scheduler that can be tuned to work like coroutines: the scheduler has no choice but to transfer control from one coroutine back to the blocked caller coroutine that used "resume").
Still, there's no guarantee that any coroutine or fiber will ever give control back: a single coroutine or fiber can still completely freeze all the others (including the scheduler's coroutine managing the "fibers"). That's where both remain unusable when we want to serve multiple clients.
The alternative using processes is too costly (it takes too many resources, and context switching is much slower than with coroutines because preemption can occur at any time), although there's little difference in context-switching performance between threads and processes. However, processes allow stricter separation of memory use and allow managing quotas: no single process will consume all the resources needed by other processes, which remains possible with threads unless the threads' scheduler (working in a separate process) sets and manages limits between threads in the same process.
Neither threads nor processes can be implemented with fibers or coroutines, but threads can be emulated by processes (using resource-sharing mechanisms).
And for servers (notably application servers that manage many users with distinct resource quotas on memory, I/O, networking, or time) and for applications that should benefit from parallelism, running everything in a single thread (of a single process) is bad: it does not scale well with multicore CPUs or multiple CPUs, and does not allow easy dynamic transfer of execution from one host to another in a computing grid. Fibers and coroutines do not offer what is needed to guarantee a good response time: we need threads at a minimum, i.e. preemption on time slots, without having to reach a point where workers "yield".
Multithreading (or multiprocessing if one wants stricter isolation) is also needed for tracing and debugging. Lua is not easy to trace or debug: implementing a debugger requires a modification inside the Lua VM (running in its own thread) in order to force control over to a debugger thread. That thread does not itself live in Lua space, so debugging by users themselves does not really work, and what the debugger does cannot be natively implemented in user space and written in Lua itself.
This has a consequence: Lua programs are much more difficult to debug, hard to tune, and almost impossible to scale up. Lua runs poorly on multicore CPUs or in multi-CPU symmetric systems and is not well suited to CPU-intensive programs. True parallelism becomes almost impossible except for very small fragments, or by using a custom library that delegates some work to a dedicated CPU or GPU and waits/blocks for completion, or that delegates the work to a remote host via network APIs.
Networking is also still not very good, and both coroutines and fibers are extremely fragile against DoS attacks (this also makes Lua unsuitable for many app servers, notably for the web).
Even though I am not asking for fiber support (OS fibers could be used by Lua implementations to implement Lua coroutines), there's a very good use case at least for threads (systems without multithreading can still use processes if they are multiprocess-capable), and parallelism should be less limited than just small fragments of code (such as "vector" processing with specific instruction sets for CPUs and GPUs).
Many programs now use parallelism extensively and would not even run without it (e.g. most games): they need at least threads. Web servers need at least threads as well. But I do not see why we cannot create a Lua "coroutine" with a parameter allowing it to be preempted (with a timeout event), thereby allowing Lua to create a new thread (or a process, if there are memory mappings to perform data exchanges between processes). True async I/O and networking listeners will need it as well. And a guaranteed response time is also needed for work with strict time requirements (e.g. multimedia rendering: users don't want to see their video freeze, and don't want audio to suddenly block with frequent silences): Lua alone is not sufficient as long as it remains self-contained in a single thread (with absolutely no isolation at all between workers).
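For comparison, the closest thing I know of in stock Lua to a "coroutine with a timeout parameter" is an instruction-count hook set with debug.sethook. The sketch below uses a hypothetical helper, run_with_budget, and an arbitrary budget of 1000000 VM instructions. It only aborts a runaway coroutine: it cannot resume it later, it does not interrupt a blocking C function, and a pcall inside the task could catch the error. So it is a watchdog, not the real preemption asked for above.

```lua
-- Hypothetical helper: run fn in a coroutine with a VM instruction budget.
-- The count hook fires after `budget` instructions and raises an error
-- inside the coroutine, which makes coroutine.resume return false.
local function run_with_budget(fn, budget)
  local co = coroutine.create(fn)
  debug.sethook(co, function()
    error("instruction budget exceeded", 0)  -- level 0: no position prefix
  end, "", budget)
  return coroutine.resume(co)
end

-- A well-behaved task finishes long before the budget is used up.
local ok1, res1 = run_with_budget(function() return 42 end, 1000000)

-- A task that never yields is aborted instead of freezing the caller.
local ok2, res2 = run_with_budget(function()
  while true do end
end, 1000000)

print(ok1, res1)  -- true    42
print(ok2, res2)  -- false   instruction budget exceeded
```

This keeps one bad worker from freezing the scheduler forever, but the aborted work is lost, which is precisely why it is no substitute for threads with time-slot preemption.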