6 Implementing Algorithms in Scala
7 Files and Subprocesses
8 JSON and Binary Data Serialization
9 Self-Contained Scala Scripts
10 Static Build Pipelines
The second part of this book explores the core tools and techniques necessary for writing Scala applications that run on a single computer. We will cover algorithms, files and subprocess management, data serialization, scripts, and build pipelines. This part builds towards a capstone project in which we write an efficient incremental static site generator in Scala.
6.1 Merge Sort
6.2 Prefix Tries
6.3 Breadth First Search
6.4 Shortest Paths
def breadthFirstSearch[T](start: T, graph: Map[T, Seq[T]]): Set[T] = {
  val seen = collection.mutable.Set(start)
  val queue = collection.mutable.ArrayDeque(start)
  while (queue.nonEmpty) {
    val current = queue.removeHead()
    for (next <- graph(current) if !seen.contains(next)) {
      seen.add(next)
      queue.append(next)
    }
  }
  seen.toSet
}
</> 6.1.scala
Snippet 6.1: a simple breadth-first-search algorithm we will implement using Scala in this chapter
In this chapter, we will walk you through the implementation of a number of common algorithms using the Scala programming language. These algorithms are commonly taught in schools and tested at professional job interviews, so you have likely seen them before.
By implementing them in Scala, we aim to get you more familiar with using the Scala programming language to solve small problems in isolation. We will also see how some of the unique language features we saw in Chapter 5: Notable Scala Features can be applied to simplify the implementation of these well-known algorithms. This will prepare us for subsequent chapters which will expand in scope to include many different kinds of systems, APIs, tools and techniques.
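To see how the `breadthFirstSearch` function from Snippet 6.1 behaves, here is a minimal usage sketch on a small hand-made directed graph. The function body follows Snippet 6.1; the `BfsExample` wrapper object and the node names are illustrative assumptions, not part of the chapter. Note that, like Snippet 6.1, it calls `graph(current)`, so every node needs an entry in the map (even an empty one) or the lookup throws.

```scala
import scala.collection.mutable

object BfsExample {
  // Same algorithm as Snippet 6.1: returns every node reachable from `start`
  def breadthFirstSearch[T](start: T, graph: Map[T, Seq[T]]): Set[T] = {
    val seen = mutable.Set(start)          // nodes already discovered
    val queue = mutable.ArrayDeque(start)  // frontier, processed first-in first-out
    while (queue.nonEmpty) {
      val current = queue.removeHead()
      for (next <- graph(current) if !seen.contains(next)) {
        seen.add(next)
        queue.append(next)
      }
    }
    seen.toSet
  }

  // A small directed graph: "e" points into the graph, but nothing points at "e"
  val graph: Map[String, Seq[String]] = Map(
    "a" -> Seq("b", "c"),
    "b" -> Seq("d"),
    "c" -> Seq("d"),
    "d" -> Seq(),
    "e" -> Seq("a")
  )

  def main(args: Array[String]): Unit = {
    println(breadthFirstSearch("a", graph)) // a, b, c and d (print order may vary)
    println(breadthFirstSearch("e", graph)) // all five nodes
  }
}
```

Because the search only follows outgoing edges, starting from `"a"` never reaches `"e"`, while starting from `"e"` reaches everything.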
7.1 Paths
7.2 Filesystem Operations
7.3 Folder Syncing
7.4 Simple Subprocess Invocations
7.5 Interactive and Streaming Subprocesses
@ os.walk(os.pwd).filter(os.isFile).map(p => (os.size(p), p)).sortBy(-_._1).take(5)
res60: IndexedSeq[(Long, os.Path)] = ArrayBuffer(
  (6340270L, /Users/lihaoyi/test/post/Reimagining/GithubHistory.gif),
  (6008395L, /Users/lihaoyi/test/post/SmartNation/routes.json),
  (5499949L, /Users/lihaoyi/test/post/slides/Why-You-Might-Like-Scala.js.pdf),
  (5461595L, /Users/lihaoyi/test/post/slides/Cross-Platform-Development-in-Scala.js.pdf),
  (4576936L, /Users/lihaoyi/test/post/Reimagining/FluentSearch.gif)
)
</> 7.1.scala
Snippet 7.1: a short Scala code snippet to find the five largest files in a directory tree
Working with files and subprocesses is one of the most common things you do in programming: from the Bash shell, to Python or Ruby scripts, to large applications written in a compiled language. At some point everyone will have to write to a file or talk to a subprocess. This chapter will walk you through how to perform basic file and subprocess operations in Scala.
This chapter finishes with two small projects: building a simple file synchronizer, and building a streaming subprocess pipeline. These projects will form the basis for Chapter 17: Multi-Process Applications and Chapter 18: Building a Real-time File Synchronizer.
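Snippet 7.1 relies on the third-party os-lib library. For comparison, here is a rough equivalent of the same "five largest files" query written against the JDK's `java.nio.file` APIs, which ship with every JVM. This is a swapped-in technique, not os-lib; `LargestFiles` and `largest` are hypothetical names for this sketch, and it assumes Scala 2.13 or later for `scala.jdk.CollectionConverters` and `scala.util.Using`. Unlike the os-lib version, it has to close the stream returned by `Files.walk` explicitly.

```scala
import java.nio.file.{Files, Path, Paths}
import scala.jdk.CollectionConverters._
import scala.util.Using

object LargestFiles {
  // Returns the `n` largest regular files under `root`, biggest first,
  // as (sizeInBytes, path) pairs, mirroring the shape of Snippet 7.1's result
  def largest(root: Path, n: Int): Seq[(Long, Path)] =
    Using.resource(Files.walk(root)) { stream => // Files.walk's stream must be closed
      stream.iterator().asScala
        .filter(p => Files.isRegularFile(p))
        .map(p => (Files.size(p), p))
        .toVector // materialize before the stream is closed
        .sortBy(-_._1)
        .take(n)
    }

  def main(args: Array[String]): Unit =
    largest(Paths.get("."), 5).foreach { case (size, path) => println(s"$size\t$path") }
}
```

The os-lib version reads more directly, which is one of the reasons this chapter uses it; the JDK version is useful mainly when you cannot add dependencies.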
8.1 Manipulating JSON
8.2 JSON Serialization of Scala Data Types
8.3 Writing your own Generic Serialization Methods
8.4 Binary Serialization
@ val output = ujson.Arr(
    ujson.Obj("hello" -> "world", "answer" -> 42),
    true
  )

@ output(0)("hello") = "goodbye"

@ output(0)("tags") = ujson.Arr("awesome", "yay", "wonderful")

@ println(output)
[{"hello":"goodbye","answer":42,"tags":["awesome","yay","wonderful"]},true]
</> 8.1.scala
Snippet 8.1: manipulating a JSON tree structure in the Scala REPL
Data serialization is an important tool in any programmer's toolbox. While variables and classes are enough to store data within a process, most data tends to outlive a single program process, whether it is saved to disk, exchanged between processes, or sent over the network. This chapter will cover how to serialize your Scala data structures to two common data formats, textual JSON and binary MessagePack, and how you can interact with the structured data in a variety of useful ways.
The JSON workflows we learn in this chapter will be used later in Chapter 12: Working with HTTP APIs and Chapter 14: Simple Web and API Servers, while the binary serialization techniques we learn here will be used later in Chapter 17: Multi-Process Applications.
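Section 8.3 covers writing your own generic serialization methods. As a rough illustration of the underlying idea, here is a minimal typeclass-based JSON writer using only the standard library. `JsonWriter`, `Serialize.toJson` and the instances below are hypothetical names for this sketch, not the `ujson`/uPickle API, and string escaping is deliberately simplified: only backslashes and double quotes are escaped.

```scala
// A typeclass describing how to render a value of type T as JSON text
trait JsonWriter[T] {
  def write(value: T): String
}

object JsonWriter {
  // Escapes only backslashes and double quotes; a simplification for this sketch
  private def quote(s: String): String =
    "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\""

  // Base instances for primitive types
  implicit val intWriter: JsonWriter[Int] = (i: Int) => i.toString
  implicit val boolWriter: JsonWriter[Boolean] = (b: Boolean) => b.toString
  implicit val stringWriter: JsonWriter[String] = (s: String) => quote(s)

  // Derives a writer for Seq[T] from a writer for T (JSON array)
  implicit def seqWriter[T](implicit w: JsonWriter[T]): JsonWriter[Seq[T]] =
    (xs: Seq[T]) => xs.map(w.write).mkString("[", ",", "]")

  // Derives a writer for Map[String, T] from a writer for T (JSON object)
  implicit def mapWriter[T](implicit w: JsonWriter[T]): JsonWriter[Map[String, T]] =
    (m: Map[String, T]) =>
      m.map { case (k, v) => quote(k) + ":" + w.write(v) }.mkString("{", ",", "}")
}

object Serialize {
  // One generic entry point that works for any T with a JsonWriter in scope.
  // JsonWriter is invariant, so pass a Seq, not a List, to hit seqWriter.
  def toJson[T](value: T)(implicit w: JsonWriter[T]): String = w.write(value)
}
```

The compiler assembles the writer for a nested type such as `Map[String, Seq[Int]]` from the smaller instances at compile time, which is the same basic mechanism that lets a single generic method serialize many different data types.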
9.1 Reading Files Off Disk
9.2 Rendering HTML with Scalatags
9.3 Rendering Markdown with Commonmark-Java
9.4 Links and Bootstrap
9.5 Optionally Deploying the Static Site
os.write(
  os.pwd / "out" / "index.html",
  doctype("html")(
    html(
      body(
        h1("Blog"),
        for ((_, suffix, _) <- postInfo)
        yield h2(a(href := ("post/" + mdNameToHtml(suffix)))(suffix))
      )
    )
  )
)
</> 9.1.scala
Snippet 9.1: rendering an HTML page using the third-party Scalatags HTML library
Scala scripts are a great way to write small programs. Each script is self-contained: it can download its own dependencies when necessary and make use of both Java and Scala libraries. This lets you write and distribute scripts without spending time fiddling with build configuration or library installation.
In this chapter, we will write a static site generator script that uses third-party libraries to process Markdown input files and generate a set of HTML output files, ready for deployment on any static file hosting service. This will form the foundation for Chapter 10: Static Build Pipelines, where we will turn the static site generator into an efficient incremental build pipeline by using the Mill build tool.
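Snippet 9.1 refers to `postInfo` and `mdNameToHtml` without defining them here. One plausible shape, assumed for this sketch rather than taken from the chapter, is that posts are Markdown files named like `1 - My First Post.md`, that `postInfo` holds `(prefix, suffix, fileName)` tuples sorted by their numeric prefix, and that `mdNameToHtml` turns a post title into a URL-friendly file name. The hypothetical `PostNames` object below works on plain file-name strings instead of `os.Path` values so it needs no third-party libraries.

```scala
object PostNames {
  // Hypothetical helper: "My First Post" -> "my-first-post.html"
  def mdNameToHtml(name: String): String =
    name.replace(" ", "-").toLowerCase + ".html"

  // Hypothetical naming convention "<number> - <title>.md"; the pattern must match the whole name
  private val PostName = """(\d+) - (.+)\.md""".r

  // Returns (prefix, suffix, fileName), or None if the name doesn't follow the convention
  def parse(fileName: String): Option[(String, String, String)] =
    fileName match {
      case PostName(prefix, suffix) => Some((prefix, suffix, fileName))
      case _                        => None
    }

  // Skips non-post files and sorts numerically, so "10 - ..." comes after "2 - ..."
  def postInfo(fileNames: Seq[String]): Seq[(String, String, String)] =
    fileNames.flatMap(parse).sortBy(_._1.toInt)
}
```

Sorting on `_._1.toInt` rather than the raw prefix string matters once there are ten or more posts, since `"10"` sorts before `"2"` as a string.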
10.1 Mill Build Pipelines
10.2 Mill Modules
10.3 Revisiting our Static Site Script
10.4 Conversion to a Mill Build Pipeline
10.5 Extending our Static Site Pipeline
import mill._

def srcs = T.source(millSourcePath / "src")

def concat = T{
  os.write(T.dest / "concat.txt", os.list(srcs().path).map(os.read(_)))
  PathRef(T.dest / "concat.txt")
}
</> 10.1.scala
Snippet 10.1: the definition of a simple Mill build pipeline
Build pipelines are a common pattern: you have files and assets you want to process, and you want to do so efficiently, incrementally, and in parallel. This usually means only re-processing files when they change, and re-using the already processed assets as much as possible. Whether you are compiling Scala, minifying JavaScript, or compressing tarballs, many of these file-processing workflows can be slow. Parallelizing these workflows and avoiding unnecessary work can greatly speed up your development cycle.
This chapter will walk through how to use the Mill build tool to set up these build pipelines, and demonstrate the advantages of a build pipeline over a naive build script. We will take the simple static site generator we wrote in Chapter 9: Self-Contained Scala Scripts and convert it into an efficient build pipeline that can incrementally update the static site as you make changes to the sources. We will be using the Mill build tool in several of the projects later in the book, starting with Chapter 14: Simple Web and API Servers.
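The core idea behind only re-processing files when they change can be illustrated without Mill. The toy sketch below, which is not how Mill is implemented, remembers a SHA-256 hash of each input's contents and re-runs a processing step only for inputs whose hash differs from the previous build. `IncrementalBuilder`, `build` and `output` are hypothetical names, and inputs are modelled as an in-memory `Map` from file name to contents to keep the example self-contained.

```scala
import java.nio.charset.StandardCharsets
import java.security.MessageDigest
import scala.collection.mutable

// Toy incremental builder: caches each output alongside the hash of the input that produced it
class IncrementalBuilder(process: String => String) {
  private val hashes = mutable.Map.empty[String, String]  // name -> hash seen at last build
  private val outputs = mutable.Map.empty[String, String] // name -> cached output

  private def sha256(s: String): String =
    MessageDigest.getInstance("SHA-256")
      .digest(s.getBytes(StandardCharsets.UTF_8))
      .map(b => "%02x".format(b & 0xff))
      .mkString

  // Builds all inputs and returns the names that were actually re-processed
  def build(inputs: Map[String, String]): Set[String] = {
    val changed = inputs.collect {
      case (name, contents) if !hashes.get(name).contains(sha256(contents)) => name
    }.toSet
    for (name <- changed) {
      hashes(name) = sha256(inputs(name))
      outputs(name) = process(inputs(name))
    }
    // Drop cached state for inputs that no longer exist
    for (name <- hashes.keySet.toSet -- inputs.keySet) {
      hashes.remove(name)
      outputs.remove(name)
    }
    changed
  }

  def output(name: String): Option[String] = outputs.get(name)
}
```

A second `build` call with unchanged inputs does no processing at all, and editing one input re-processes only that input. A real build tool such as Mill also has to track dependencies between steps and run independent steps in parallel, which this sketch does not attempt.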