Question: Assuming you have a set of n letter dictionary words ($S$), a start word ($s_0 \in S$) and a target word ($s_t \in S$) find the shortest chain of words ($C$) where one letter differs between consecutive words in the chain and the first and last words in the chain are $s_0$ and $s_t$ respectively.
Eg $S = [cat, cab, car, tat, tar, bar], s_0 = cat, s_t = tar $
$ => C = [cat -> car -> bar -> tar]$
To solve this programatically, we can see that this is a graph programming problem - we can build a graph representing the data then find the shortest path within the graph from $s_0$ to $s_t$.
Scala has no graph implementation in the standard collections, but there is a neat little project called scala-graph which you can use.
The interesting part is the computeEdges function. Here is a naive implementation:
For every word in the list we check it against every other unvisisted word in the list meaning that this algorithm is O(n^2) where n = the number of words in the dictionary.
Eg
cat is compared to [ cab, car, tat, tar, bar ]
cab is compared to [ car, tat, tar, bar ]
car is compared to [ tat, tar, bar ]
tat is compared to [ tar, bar ]
tar is compared to [ bar ]
Example: [ cat, cab, car, tat, tar, bar ]
Step 1:
Firstly we compute 'subs' for each word in the dictionary
'cat' -> [ '?at' , 'c?t' , 'ca?' ]
'cab' -> [ '?ab' , 'c?b' , 'ca?' ]
etc...
Step 2:
We build a hashmap from the subs
'?at' -> [ 'cat', 'tat' ]
'c?t' -> [ 'cat' ]
'ca?' -> [ 'cat', 'cab', 'car' ]
'?ab' -> [ 'cab' ]
'c?b' -> [ 'cab' ]
etc...
Step 3:
Now we build up the list of edges. For each word, we get 3 subs for each word. Then we look up each of these 3 subs in the above hashmap to get all the nodes for which the current words has an edge with. We keep a list of nodes we have visited so that we don't add each node twice (the edges are undirected).
Full code is here.
Eg $S = [cat, cab, car, tat, tar, bar], s_0 = cat, s_t = tar $
$ => C = [cat -> car -> bar -> tar]$
Top level solution
To solve this programatically, we can see that this is a graph programming problem - we can build a graph representing the data then find the shortest path within the graph from $s_0$ to $s_t$.
Scala has no graph implementation in the standard collections, but there is a neat little project called scala-graph which you can use.
def computeEditDistance(start: String, end: String, dictionary: Set[String]): Option[Int] = {
val graph = Graph.from(dictionary, computeEdges(dictionary))
val path = graph.get(start).shortestPathTo(graph.get(end))
path.map(_.edges.size)
}
How to compute the edges
The interesting part is the computeEdges function. Here is a naive implementation:
private def computeEdgesSlowly(dictionary: Set[String]): List[UnDiEdge[String]] = {
val (_, edges)
= dictionary.foldLeft (dictionary.drop(1), List.empty[UnDiEdge[String]]) {
case (acc, word) =>
val (unvisitedWords, edgeList) = acc
val edgesForThisWord = unvisitedWords.collect {
case (w) if differsByOneLetter(w, word)
=> word ~ w
}
(unvisitedWords.drop(1), edgeList ++ edgesForThisWord)
}
edges
}
For every word in the list we check it against every other unvisisted word in the list meaning that this algorithm is O(n^2) where n = the number of words in the dictionary.
Eg
cat is compared to [ cab, car, tat, tar, bar ]
cab is compared to [ car, tat, tar, bar ]
car is compared to [ tat, tar, bar ]
tat is compared to [ tar, bar ]
tar is compared to [ bar ]
Cleverer algorithm for computing edges
It is possible to compute the edges in O(n) which is a huge improvement on the implementation above. :D :DExample: [ cat, cab, car, tat, tar, bar ]
Step 1:
Firstly we compute 'subs' for each word in the dictionary
'cat' -> [ '?at' , 'c?t' , 'ca?' ]
'cab' -> [ '?ab' , 'c?b' , 'ca?' ]
etc...
val mapOfSubs = dictionary.map { word =>
val subs = (0 until word.length) map { case i =>
word.patch(i, Seq('?'), 1)
}
word -> subs
}.toMap
Step 2:
We build a hashmap from the subs
'?at' -> [ 'cat', 'tat' ]
'c?t' -> [ 'cat' ]
'ca?' -> [ 'cat', 'cab', 'car' ]
'?ab' -> [ 'cab' ]
'c?b' -> [ 'cab' ]
etc...
val cachedMap = mutable.HashMap.empty[String, Set[String]]
dictionary.foreach { word =>
val subs = mapOfSubs.get(word)
subs.foreach(o => o.foreach {
sub =>
cachedMap(sub)
= Set(word) ++ cachedMap.getOrElse(sub, Set.empty[String])
})
}
Step 3:
Now we build up the list of edges. For each word, we get 3 subs for each word. Then we look up each of these 3 subs in the above hashmap to get all the nodes for which the current words has an edge with. We keep a list of nodes we have visited so that we don't add each node twice (the edges are undirected).
val visited = mutable.Seq.empty[String]
dictionary.flatMap { word =>
word :+ visited
val subs = mapOfSubs(word)
subs.flatMap { s =>
val links = cachedMap(s)
links.collect {
case l if !visited.contains(l)
=> word ~ l // ~ = undirected edge }
}
}.toSeq
Full code is here.
No comments:
Post a Comment