.
.
.
.

regular-languages-and-finite-automata

Regular Languages and Finite Automata


Introduction

Alphabet (Σ) : An Alphabet is a set of finite number of symbols.

Example: Σ = {a,b} is alphabet with two symbols a and b.


String: A string is any sequence of symbols over the given alphabet Σ.

Example: abb is a string over Σ = {a,b}


Empty string (ε or λ): Empty string is a string with no symbol over any alphabet. Empty string is also called as "Null String".

|ε| = 0 (i.e, empty string length is 0)


Length of string : The number of symbols in the sequence of given string is called "length of a string".

Example : |abb| = 3, for the string abb over Σ = {a, b}


Substring of a string : Any sequence of zero or more consecutive symbols of the given string over an alphabet is called a "Substring of a string".

Example: For string abb over Σ = {a, b}, the possible substrings are: ε , a, b, ab, bb and abb.


Prefix of a string : Any substring with the sequence of beginning symbols of a given string is called a "Prefix".

Example: For string "abb", the possible prefixes of abb are: ε , a, ab, abb.


Suffix of a string : Any substring with the sequence of trailing (ending) symbols of a given string is called a "Suffix".

Example : For string abb, the possible suffixed are: ε , b, bb, abb.


Power of an Alphabet: Consider Σ = {a, b}.

The following are powers of an alphabet over Σ.


Σ^0 = {ε} : zero length string


Σ^1 = {a,b}: set of 1-length strings


Σ^2 = {aa, ab, ba, bb}: set of 2-length strings


Σ^* = the set of all strings over Σ = Σ^0 U Σ^1 U Σ^2 U Σ^3 U ....


Σ^+ = Σ^1 U Σ^2 U Σ^3 U .... = Σ* - Σ^0 = Σ* - {ε} 


Language : A set of strings over the given alphabet Σ is called a "language" (or collection of strings).

Example: Let Σ = {a,b}. Then Language L = {ab, aab}


Grammar: Grammar Contain set of rules to generate the strings of a language. Grammar is a set of 4 tuples. Grammar G = {V, T, P, S} where V is set of non-terminals, T is set of terminals, P is set of rules and S is start symbol (S ε V).

Each rule of the grammar appears as : X -> Y, where X and Y are any sequence of terminals and non-terminals.


Automaton:

(i) Automaton Acts as Recognizer or Accepter: An automaton is a machine that accept or recognizes the strings of a language (L) over an input alphabet Σ.


(ii) Automaton Acts as producer or enumerator or transducer: An automaton can also produce the output over any alphabet D for the given input alphabet Σ.


Formal Languages: Formal language is a set of all stings where each string is restricted over a particular forms of a string with the given input.

Equivalence of language, grammar and automata

Every formal language can be generated by equivalent grammar and be accepted by the equivalent automaton as shown in above table.


Chomsky Hierarchy: All formal languages are divided into 4 classes by chomsky and those class hierarchy known as "Chomsky-Hierarchy".


Type 3 ⊂ Type 2 ⊂ Type 1 ⊂ Type 0.


Types of Automaton:

(i) Finite Automaton (FA): An automaton that accepts a regular language is called a "FA".

Types of Finite Automaton:

(a) Deterministic FA (DFA)

(b) Non-deterministic FA (NFA or NDFA)

L(DFA) = L(NFA)

i.e., both types of finite automata have same expressive power.


(ii) Push Down Automaton (PDA): An automaton that accepts a context free language is called a "PDA".

Types of PDA:

(a) Deterministic PDA (DPDA).

(b) Non-deterministic PDA (NPDA or PDA).

L(DPDA) < L(NPDA) i.e., DCFL's ⊂ CFL's

Class of languages accepted by DPDA's are proper subset of class of languages accepted by NPDA's.

The class of NPDA's have more expressive power than the class of DPDA's


(iii) Linear Bound Automaton (LBA): An automaton that accepts a context sensitive language is called a "LBA".


(iv) Turing Machine (TM): An automaton that accepted a recursive enumerable language is called a "TM". TM can accept a REL and TM can enumerate a REL.

Type of TM:

(a) Deterministic TM (DTM)

(b) Non-deterministic TM (NTM)

L(NTM) = L(DTM)


Types of Grammars:

(i) Regular Grammar (RG) or type-3: A grammar is regular grammar iff it is either left linear grammar or right linear grammar. Regular grammar is a grammar where all rules are restricted as following.

Either V -> VT* | T* [Left linear grammar].

or

V -> T*V | T* [Right linear grammar]

where V is any variable and T any terminal.


(ii) Context Free Grammar (CFG) or Type-2: Every rule of CFG is restricted as V -> (V U T)*


(iii) Context Sensitive Grammar (CSG) or Type-1:

Rule : X -> Y

Where |X| <= |Y|, X ∈ (V U T)* and Y ∈ (V U T)* and X must contain atleast one varible.

If NULL productions are present in the grammar, it may or may not be CSG.


(iv) Recursive Enumerable Grammar (REG) or Type-0:

Rule: X -> Y

Where X ∈ (V U T)*, Y ∈ (V U T)* and X must contain atleast one variable.



Equivalences of Language, Automata & Grammar:

(i) Regular Language (Type-3 language or finite state language):

 ≡

Finite Automaton [DFA ≡ NFA ≡ ∈-NFA]

 ≡

Regular Grammar (Type-3 grammar)

 ≡

Left linear Grammar

 ≡

Right linear Grammar

 ≡

Regular Expression



(ii) Context Free language (Type-2 language):

 ≡

Push Down Automata [NPDA or PDA]

 ≡

Context Free Grammar



(iii) CSG ≡ LBA ≡ CSL (Type-1 language)



(iv) REG ≡ TM ≡ REL (Type-0 language)



Expression Power of Automata: Expression power of a machine is the class or set of languages accepted by the particular type of machines.

FA < DPDA < PDA < LBA < TM. FA is less powerful than any other machine and TM is more powerful than any other machine.


(i) Type 3 class ⊂ Type 2 class ⊂ Type 1 class ⊂ Type 0 class

(ii) FA ≡ TM with read only tape ≡ TM with unidirectional tape ≡ TM with finite tape ≡ PDA with finite stack.

(iii) PDA ≡ FA with stack

(iv) TM ≡ PDA with additional stack ≡ FA with two stacks



Applications of Automata:

(i) Finite automata can be used in the following cases:

(a) To design a lexical analysis of a compiler

(b) To recognize the pattern using regular expressions

(c) To design the combination and sequential circuits using Moore/mealy machines.


(ii) PDA can be used in the following cases:

(a) To design the parsing phase of a compiler (syntax analysis)

(b) To implement stack applications

(c) To evaluate arithmetic expressions

(d) To solve the Tower of Hanoi problem


(iii) Linear Bounded Automata can be used in the following cases:

(a) To construct syntactic parse trees for semantic analysis of the compiler.

(b) To implement genetic programming.


(iv) Turning Machine can be used in the following cases:

(a) To solve any recursively enumerable problem

(b) To understand complexity theory

(c) To implement neural nets

(d) To implement artificial Intelligence

(e) To implement Robotic applications



Remember:


NFA has same power as of DFA


DPDA has less power than NPDA


DTM has same power as of NTM


Mealy and Moore machines have same power


Mealy machine: Output depends on current state and current input


Moore machine: Output depends on current state only.


Finite automaton does not have infinite memory required for comparison.


Every finite language is regular but converse need not be true.


Language may be infinite but variable, terminals and production rules are finite for every grammar.


Regular grammar is either left linear or right linear or both.


Every NFA can be converted to DFA and then to minimal DFA.


If NFA has n-states then the equivalent DFA has atmost 2^n states.


Let w be a string of length n. Then number of possible substring including ε is almost n(n+1)/2 +1.


Number of prefixes for the given n-length string = (n + 1)


Number of suffixes for the given n-length string = (n + 1)


Finite Automaton (FA)



Finite automata is categorized into two types:

(i) Finite automata with output: This is of two types Mealy machines and Moore machines.

(a) In Mealy machine the output is associated with transition (state and i/p symbol)

(b) In the Moore machine output is associated with the state.


(ii) Finite automata without output: This is of two types DFA and NFA

(a) DFA: It is deterministic finite automata. For every input symbol in DFA there is an exactly one transition from each state of finite automata. It accepts all strings that are present in the language by halting in a final stage and rejects all strings that are not present in the language by halting in a non-final state.

(b) NFA: It is non deterministic finite automata. For every input symbol in NFA there is at least one transition or no transitions from each state of finite automata. It accepts all strings that are present in the language by halting in a final state and rejects all strings not present in the language either by halting in a final state or by halting in a dead configuration.

(c) ε-NFA: It is a non deterministic finite automata which include ε transitions between states.


Model of Finite Automata:


(i) Input tape contains a string over the given input alphabet.

(ii) Read head is a pointer which reads one input symbol at a time and moves to the next symbol.

(iii) Finite control contains set of states and a transition function to take an action based on current state and input symbol scanned.


Deterministic Finite Automata (DFA or FA)

For every input symbol of alphabet there is an exactly one transition from every state of finite automata is called as DFA.

DFA covers transitions for all valid and invalid strings in the given regular language.

For every valid string, DFA reaches to one of its Final state and for every invalid string it reaches to non-final state.


Specifications of DFA

DFA = (Q,Σ,δ,q0,F) is 5-tuple notation.

Where Q is set of finite states,

Σ is input alphabet contains finite number of input symbols

δ is transition function defined over transitions of FA for state and input

δ: Q x Σ -> Q

q0 is initial state or start state of DFA, q0 ∈ Q

F is set of final states of DFA


Compound FA (F1 x F2)

Let F1 and F2 are two DFA's. Then operations defined over FA are (F1 U F2), (F1 ∩ F2), (F1 - F2), and (F2 - F1). Construction of compound FA for the given F1 and F2 as follows:

1. The number of states in compound FA(F1 x F2) is equal to m x n, where m is the number of states of F1 and n is the number of states of F2.

2. Initial state of compound FA is combination of initial state of F1 and initial state of F2.

3. Let Q1 = {q0,q1,q2} is set of states of F1, and Q2 = {qa,qb} is set of states of F2. Let q0 be the start state of F1 and qa be the start of F2. Then Q1 x Q2 = {(q0,qa), (q0, qb), (q1, qa), (q1, qb), (q2, qa), (q2, qb)}, where (q0, qa) is an initial state of compound FA.

4. Let q2 is final of F1 and qb is final of F2. Final states depends on type of compound finite automata (F1 x F2)

(a) F1 U F2 final states: Make those states of (Q1 x Q2) as final, if final state of F1 or final state of F2 contains in any of the states. Such a FA accepts L(F1) U L(F2).

(b) F1 ∩ F2 final states: Make those states of (Q1 x Q2) as final, if only final state of F1 and final state of F2 contains in any of the states. Such a FA accepts L(F1) ∩ L(F2).

(c) F1- F2 final states: Make those states of (Q1 x Q2) as final, if only final state of F1 but not final state of F2 contains in any of the states. Such a FA accepts L(F1) - L(F2).

(d) F2 - F1 final states: Make those states of (Q1 x Q2) as final, if only final state of F2 but not final state of F1 contains in any of the states. Such a FA accepts L(F2) - L(F1).


Non-deterministic Finite Automata (NFA or NDFA)

For every valid string there exists a path from initial state that reaches to final state. NFA covers transitions for all valid strings only in the given regular language.


Specification of NFA without Null Moves

NFA = (Q, Σ, δ, q0, F) is 5-tuple notation.

Where Q is set of finite states,

Σ is input alphabet contains finite number of input symbols

δ is transition function defined over transitions of NFA for state and input

δ: Q x Σ ->2Q


Epsilon NFA (ε-NFA) or NFA

For every valid string there exists a path from initial state that reaches to final state. ε-NFA is same as NFA but it can include epsilon transitions.


Specification of ε-NFA

ε-NFA = (Q, Σ, δ, q0, F) is 5-tuple notation.

Where δ: Q x Σ U{ ε }->2Q


Regular Expressions

It is a declarative way of representation of a regular language. Regular expression is a compact formula that generates exactly those strings which are in language.

·        Operators of Regular Expressions: The operators in regular expressions are: ( * ) is kleene closure,  ( . ) is concatenations operator and ( + ) is union operator.

·        Important Notations of Regular Expressions: Some of the important notations compared to regular sets are:


Properties of Regular Expression Operators:

(a)   Union operator satisfies commutative property and associative property.

(b)  The concatenation operator satisfies associative property but not commutative property

(c)   Both left and right distributive property of concatenation over union are satisfied. R(s + t) = rs + rt and (s + t)r = sr + tr

(d)  Both left and right distributive properties of union over concatenation are not satisfied. St + r ≠ (s + r) (t + r) and r + st ≠ (r + s) (r + t)

(e)  Union operator satisfies idempotent property but the concatenation operator does not satisfy. i.e.  r + r = r but r . r ≠ r

(f)    Identity property:

R + ∅ = ∅ + R = R: ∅ acts as identity element with respect to union operation

ε . R = R . ε = R: Epsilon (ε) acts as identity element with respect to concatenation. 


Equivalences of Languages (or regular expressions):

(a)   L(r1 + r2) = L(r1) U L(r2)

(b)  L(r1 . r2) = L(r1) . L(r2)

(c)   L(r*)=(K(r))*

(d)  r1.r2 ≠ r2.r1

(e)  r1(r2r3) = (r1r2)r3

(f)    r1(r2+r3)= r1r2+r1r3

(g)   r1+r2r3  ≠ (r1+r2) (r1+r3)

(h)  ∅ + r = r

(i)     ∅ . r = ∅ = r. ∅

(j)    λr = rλ = r

(k)   λ* = λ

(l)     ∅*= λ

(m) r + r = r

(n)  r.r ≠ r

(o)  r*. r* = r*

(p)  r*. r+ = r+

(q)  (r*)* = r*

(r)    (r)* = r*

(s)   λ + rr* = r* = λ + r+

(t)    p(qp)* = (pq)*p

(u)  (p + q)* = (p*q*)*  = (p* + q*)*

(v)   (p + q)*p*q* = (p + q)*


Closure Properties of Regular Languages

* The set of regular languages is closed under the union operation.

* Finite union is closed for regular languages but infinite union is not closed.

* The set of regular languages is closed under the intersection operation.

* Regular languages are closed under finite intersection but not under infinite intersection.

* The set of regular languages is closed under the difference operation.

* The set of regular languages is closed under the complement operation.

* The set of regular languages is closed under the concatenation operation.

* The set of regular languages is closed under the kleene closure operation.

* The set of regular languages is closed under the positive closure operation.

* The set of regular languages is closed under the prefix operation as well as suffix operation.

Init(L) = Prefix(L) = {u | uv is in regular language L}

Suffix(L) = {v | uv is in regular language L}

* The set of regular languages is closed under the reversal operation.

* Half of a Regular Language is also regular: Let L is a regular language.

Then Half(L) = {u | uv ∈ L and |u| = |v|}.

* One third of a Regular Language is also regular: Let L is a regular language. Then onethird(L) = {u | uvw ∈ L and |u| = |v| =|w|}.

* Quotient of two Regular Languages is also regular: L1/L2 = {x | xy ∈ L1 for some y ∈ L2}.

* Subsequence of a Regular Language is also regular language: Subsequence(L) = {w | w is obtained by removing symbols from anywhere of a string y ∈ L2}.

* Sub-word of a Regular Language is also regular: Sub-word(L) = {v | for some u and for some w, uvw is in L}.

* Homomorphism of two Regular Languages is also regular.

* Inverse Homomorphism of two Regular languages is also regular.

* Substitution of two Regular Languages is also regular.

* Shuffle of two Regular Languages: Shuffle(L1,L2) = {x1y1x2y2…xkyk|x1x2…xk is in L1 and y1y2…yk is in L2 } is also regular.

* Symmetric difference of two Regular Languages is also regular.

* If L is regular then L is also a Regular Language: L = {w | ww in L}.

* If L regular then Max(L) is also a Regular Language.

* If L is regular then Min(L) is also regular. Min(L) = {w | w ∈ L and there is no proper prefix of w is in L}.

* Regular language is not closed under subset, superset, infinite union as well as infinite intersection. i.e. a subset or superset of a regular languages need not be regular.


Decision Properties of Regular Language

Membership: It is decidable.

Emptiness (Is L empty?): Checking whether the given regular language is empty or not. It is decidable.

Finiteness (Is L finite?): Checking whether the given language is finite or not. It is decidable.

Equivalence of two Regular Languages (Is L1 ≡ L2?): Checking whether the given two regular languages are equal or not. It is decidable.

Subset Operation over Two Regular Languages (Is L1 ⊆ L2?): Checking whether the regular language L1 is subset of a regular language L2 or not. (L1 ⊆ L2) ≡ (L1 ∩ L2 = ⊖). It is decidable.


Mealy and Moore Machines

Mealy and Moore machines are acts as language generators but not language recognizers. There are no final states in Moore and Mealy machine.

Mealy Machine: In Finite automata, if output symbol is associated with transition (state and input) then such automata is called as Mealy machine.

Moore Machine: In Finite automata if output symbol is associated with each state then such automata is called as Moore machine.


Specification of Mealy and Moore Machines:

M = (Q, Σ, Δ, δ, λ, q0) is a 6-tuple notation.


Where,

Q is set of finite states,

Σ is input alphabet contains finite number of input symbols.

Δ is output alphabet which contains output symbols.

δ is transition function. δ: Q x Σ -> Q

λ is output function: λ: Q -> Δ* (Moore machine)

           λ: Q x Σ -> Δ* (Mealy machine)

q0 is initial state or start of Moore machine, q0 ∈ Q

Pumping Lemma

Pumping lemma can be used to prove that a given language is non-regular (using proof by contradiction). Then there exist a constant 'n' such that for every string 'w' in L of length 'n' or more the following holds. Let w has three parts x, y and z.

1. w=xyz, where w ∈ L and | w | >= n.

2. For all i >= 0, xyiz ∈ L

3. |y| > 0.

4. Optional condition |xy| <= n.


Examples: (Non regular languages)

        L = {ambm|i>=0}

        L = {ap|p is a prime}

        L = {ambn|m>n}

        L = {ambn|m<n}

        L = {ambn|m!=n}

        L = {w | w ∈ (a + b)*} and w has #a’s equal to #b’s}

        L= {ww|w ∈ (a + b)*}

        L = {an2}

        L = {a2n}

        L= {anbncn}


Minimization of FA

Finite automata need to be minimized to reduce number of states of DFA.


Minimization methods: Partition method (state equivalence), table filling method and Myhill-Nerode theorem.

Myhill-Nerode Theorem: Let L be a regular language, then it contains finite number of equivalence classes. Union of all Myhill-Nerode equivalence classes gives universal language. All the strings of universal language partitioned into a finite number of Myhill-Nerode equivalence classes, if L is regular. The number of Myhill-Nerode equivalence classes in a regular language ≡ Number of states in the unique minimal DFA corresponding to that language.