Write calculator using lexx and yacc

Write calculator using lexx and yacc

Objective: The goal of this homework is to acquaint yourself with two very powerful tools for the generation of compilers: lex and yacc. When a program is presented to a compiler, the compiler has to divide the program into meaningful units, called tokens, and then discover the relationship among these units. The algorithm that divides the program into units is called a lexical analyzer, scanner, or lexer.

Once the tokens are identified, the compiler needs to find the expressions, statements, declarations, blocks, and procedures in the program. This task is the function of a parser. The parser requires a list of rules that define the relationships among the tokens. This list of rules is called a grammar.

Lex is a program that takes a set of descriptions of possible tokens and produces a C routine that implements a scanner. Yacc takes a concise description of a grammar and produces a C routine that implements a parser for the grammar.

Lex and yacc are tools which can be used to solve a variety of interesting problems, even outside the realm of compilers. We are sure that once you understand how they function, you will find that the use of lex and yacc will save you many programming hours.

Equipment and software:

  • For this assignment, you can use any Linux machine that has lex and yacc (or flex and

bison) installed.

  • You may use a Windows 10 machine provided that you install “windows subsystem for

linux”, which is basically an Ubuntu Linux system.

  • You may use a Mac OS X machine provided that you install Xcode.

Tasks:

All the files needed for this assignment can be found in the attachments. An example lex specification cal.l. You can build a standalone scanner starting from this file to see how the scanner works. Follow these instructions to build the scanner:

  1. Invoke this command to convert a lex specification file into a C source file. “$ lexcal.l”
  2. Now you must compile this C source file into an executable program using the following command: “$ gcc -o scanner lex.yy.c”
  3. Now you can run the executable scanner and play with it by typing in text and checking if it is recognized as a token. $ ./scanner

We recommend that you consult the cal.l file as you play with the scanner to understand how the tokens are specified. Make sure you try operators, words, parenthesis, etc. The scanner will output the type of symbol that it recognizes for the input that you type. You can use CTRL-D to terminate the scanner.

Task 1: For this task you are asked to develop an infix calculator. You should develop two files:

infix.l and infix.y. The file infix.l is going to be the lex file written to match the patterns that are required to generate the tokens needed to implement a simple calculator. The file infix.y described the rules of the grammar that implement the calculator. You should customize the attached Makefile to generate the calculator. Make sure you run the make clean command to remove older versions of the files then type make:

$ make clean

$ make

The infix calculator should work like following. You invoke the calculator by running the program ”./infix”. If you type ”3 + 2” the calculator answers with ”= 5”; if you type ”x = 4” followed by “3 * x” the calculator responds with ”= 12”. Here is a demonstration:

$ ./infix

3+2

=5

x=4

=4

3*x

=12

You should be able to do the following operations:

  1. Addition, subtraction, multiplication, and division.
  2. Unary bitwise operator NOT. Use the following symbol: ‘!’. The operator should

precede the operand.

  1. The power operator (e.g 2 to the power of 4). Use the following symbol: **.
  2. Support parentheses “(“ and “)”, and can warn unmatched parentheses.
  3. Support variables whose name must start with lower case alphabet characters “a-z”.

Note: The parser should NOT recognize symbols consisting of two characters with spaces in between them as valid (e.g. “* *”).

We suggest you look at the cal.l and cal.y files to try to understand how the calculator was specified. 

Task 2: Modify your infix calculator to print out the prefix form of infix expressions. In particular, all parentheses should have been removed in the prefix form. Also spaces should be inserted when needed to properly delimit expression terms. For examples:

  • “2+2” => “+ 2 2”
  • “x=3” => “= x 3”
  • “!x” => “!x”

Solution 

task1

In task 1, a lexer (infix.l) and parser (infix.y) are coded to calculate

infix expressions, which may contain those operator + – * / (**) ! =.

The = operator also creates variables and their values.

To support variables, in the infix.l, tokens of type VAR are generated

for a string starting with a lower case letter and containing only letters,

digits or _.

To store the value of variables, in the infix.y, a linked list of variable

nodes are created. In this data structure, each node contains the name of

a var and its value. Meanwhile, the function get_var is defined to find

the value of a variable by traversing the linked list. The function put_var

puts a new variable into the list or updates the current value of the variable.

The yacc rules can access variable values via these functions. 

infix.l 

%{

#include <stdlib.h> /* for atoi call */

#include <string.h> /* for strcpy call */

#define NODEBUG /* for debuging: print tokens and their line numbers */

#define NUMBER 258 /* copy this from cal.tab.c */

#define VAR 259

#define EXP 260

typedef union { /* copy this from cal.tab.c */

int d;

char var[32];

} YYSTYPE;

YYSTYPE yylval; /* for passing value to parser */

extern intlineNum; /* line number from cal.tab.c */

%}

%%

[ \t]+ {}

[\n] { lineNum++; return ‘\n’; }

“(” {

#ifdef DEBUG

printf(“token ‘(‘ at line %d\n”, lineNum);

#endif

return ‘(‘;

}

“)” {

#ifdef DEBUG

printf(“token ‘)’ at line %d\n”, lineNum);

#endif

return ‘)’;

}

“+” {

#ifdef DEBUG

printf(“token ‘+’ at line %d\n”, lineNum);

#endif

return ‘+’;

}

“-” {

#ifdef DEBUG

printf(“token ‘-‘ at line %d\n”, lineNum);

#endif

return ‘-‘;

}

“*” {

#ifdef DEBUG

printf(“token ‘*’ at line %d\n”, lineNum);

#endif

return ‘*’;

}

“/” {

#ifdef DEBUG

printf(“token ‘/’ at line %d\n”, lineNum);

#endif

return ‘/’;

}

[0-9]+ {

#ifdef DEBUG

printf(“token %s at line %d\n”, yytext, lineNum);

#endif

yylval.d = atoi(yytext);

return NUMBER;

}

“!” {

#ifdef DEBUG

printf(“token ‘!’ at line %d\n”, lineNum);

#endif

return ‘!’;

}

“**” {

#ifdef DEBUG

printf(“token ‘**’ at line %d\n”, lineNum);

#endif

return EXP;

}

[a-z][a-zA-Z0-9_]* {

#ifdef DEBUG

printf(“token ‘%s’ at line %d\n”, yytext,  lineNum);

#endif

strcpy(yylval.var, yytext);

return VAR;

}

“=” {

#ifdef DEBUG

printf(“token ‘=’ at line %d\n”, lineNum);

#endif

return ‘=’;

}

%%

intyywrap() { /* need this to avoid link problem */

return 1;

}

infix.y 

%{

#include <stdio.h>

#include <stdlib.h>

#include <ctype.h>

#include <string.h>

intlineNum = 1;

void yyerror(char *ps, …) { /* need this to avoid link problem */

printf(“%s\n”, ps);

}

/* save the variable to value mapping */

struct node {

struct node *next;

char name[32];

intval;

};

struct node *vars = NULL;

struct node *find_var(const char name[]) {

struct node *it = vars;

while (it != NULL &&strcmp(it->name, name) != 0) {

it = it->next;

}

return it;

}

intget_var(const char name[]) {

/* if not found, return 0. */

struct node *it = find_var(name);

if (it == NULL) {

printf(“undefined variable %s, use default value 0\n”, name);

return 0;

}

return it->val;

}

void put_var(const char name[], intval) {

struct node *it = find_var(name);

if (it == NULL) {

it = (struct node *)malloc(sizeof(struct node));

strcpy(it->name, name);

it->next = vars;

vars = it;

}

it->val = val;

}

%}

%union {

int d;

char var[32];

}

// need to choose token type from union above

%token <d> NUMBER

%token <var> VAR

%token ‘(‘ ‘)’

%left ‘+’

%left ‘-‘

%left ‘*’

%left ‘/’

%left ‘!’

%left EXP

%type <d>calexp factor term base

%start multiple

%%

multiple : stmt multiple |

;

stmt : assign ‘\n’ | cal ‘\n’

;

assign : VAR ‘=’cal

{ put_var($1, $3); }

;

cal : exp

{ printf(“=%d\n”, $1);

$$ = $1;

}

;

exp : exp ‘+’ factor

{ $$ = $1 + $3; }

| exp ‘-‘ factor

{ $$ = $1 – $3; }

| factor

{ $$ = $1; }

;

factor : factor ‘*’ term

{ $$ = $1 * $3; }

| factor ‘/’ term

{ $$ = $1 / $3; }

| term

{ $$ = $1; }

;

term : ‘!’ term

{ $$ = ~$2; }

| term EXP base

{ int r = 1;

for (int i = 0; i < $3; i++) {

r = r * $1;

}

$$ = r;

}

| base

{ $$ = $1; }

;

base : NUMBER

{ $$ = $1; }

| VAR

{ $$ = get_var($1); }

| ‘(‘ exp ‘)’

{ $$ = $2; }

| ‘(‘ exp

{ printf(“unmatched parenthesis\n”); $$ = $2; }

;

%%

int main() {

yyparse();

return 0;

}

task2 

In task 2, the lexer and parser are coded to translate infix expressions

to prefix expressions. For example, 3 + 2 will result in + 3 2.

To implement this, most yacc rules are coded to return a string. In this

way, we can use string functions (for example, sprintf, strcpy) to manipulate

the resulting strings of rules together to construct the prefix version of

an infix expression. 

infix.l 

%{

#include <stdlib.h> /* for atoi call */

#include <string.h> /* for strcpy call */

#define NODEBUG /* for debuging: print tokens and their line numbers */

#define NUMBER 258 /* copy this from cal.tab.c */

#define VAR 259

#define EXP 260

typedef union { /* copy this from cal.tab.c */

char var[32];

} YYSTYPE;

YYSTYPE yylval; /* for passing value to parser */

extern intlineNum; /* line number from cal.tab.c */

%}

%%

[ \t]+ {}

[\n] { lineNum++; return ‘\n’; }

“(” {

#ifdef DEBUG

printf(“token ‘(‘ at line %d\n”, lineNum);

#endif

return ‘(‘;

}

“)” {

#ifdef DEBUG

printf(“token ‘)’ at line %d\n”, lineNum);

#endif

return ‘)’;

}

“+” {

#ifdef DEBUG

printf(“token ‘+’ at line %d\n”, lineNum);

#endif

return ‘+’;

}

“-” {

#ifdef DEBUG

printf(“token ‘-‘ at line %d\n”, lineNum);

#endif

return ‘-‘;

}

“*” {

#ifdef DEBUG

printf(“token ‘*’ at line %d\n”, lineNum);

#endif

return ‘*’;

}

“/” {

#ifdef DEBUG

printf(“token ‘/’ at line %d\n”, lineNum);

#endif

return ‘/’;

}

[0-9]+ {

#ifdef DEBUG

printf(“token %s at line %d\n”, yytext, lineNum);

#endif

strcpy(yylval.var, yytext);

return NUMBER;

}

“!” {

#ifdef DEBUG

printf(“token ‘!’ at line %d\n”, lineNum);

#endif

return ‘!’;

}

“**” {

#ifdef DEBUG

printf(“token ‘**’ at line %d\n”, lineNum);

#endif

return EXP;

}

[a-z][a-zA-Z0-9_]* {

#ifdef DEBUG

printf(“token ‘%s’ at line %d\n”, yytext,  lineNum);

#endif

strcpy(yylval.var, yytext);

return VAR;

}

“=” {

#ifdef DEBUG

printf(“token ‘=’ at line %d\n”, lineNum);

#endif

return ‘=’;

}

%%

intyywrap() { /* need this to avoid link problem */

return 1;

}

infix.y 

%{

#include <stdio.h>

#include <stdlib.h>

#include <ctype.h>

#include <string.h>

intlineNum = 1;

void yyerror(char *ps, …) { /* need this to avoid link problem */

printf(“%s\n”, ps);

}

%}

%union {

char buffer[32];

}

// need to choose token type from union above

%token <buffer> NUMBER

%token <buffer> VAR

%token ‘(‘ ‘)’

%left ‘+’

%left ‘-‘

%left ‘*’

%left ‘/’

%left ‘!’

%left EXP

%type <buffer> assign calexp factor term base

%start multiple

%%

multiple : stmt multiple |

;

stmt : assign ‘\n’ { printf(“%s\n”, $1); }

| cal ‘\n’ { printf(“%s\n”, $1); }

;

assign : VAR ‘=’cal

{ sprintf($$, “= %s %s”, $1, $3); }

;

cal : exp { strcpy($$, $1); }

;

exp : exp ‘+’ factor

{ sprintf($$, “+ %s %s”, $1, $3); }

| exp ‘-‘ factor

{ sprintf($$, “- %s %s”, $1, $3); }

| factor

{ strcpy($$, $1); }

;

factor : factor ‘*’ term

{ sprintf($$, “* %s %s”, $1, $3); }

| factor ‘/’ term

{ sprintf($$, “/ %s %s”, $1, $3); }

| term

{ strcpy($$, $1); }

;

term : ‘!’ term

{ sprintf($$, “! %s”, $2); }

| term EXP base

{ sprintf($$, “** %s %s”, $1, $3); }

| base

{ strcpy($$, $1); }

;

base : NUMBER

{ strcpy($$, $1); }

| VAR

{ strcpy($$, $1); }

| ‘(‘ exp ‘)’

{ strcpy($$, $2);}

| ‘(‘ exp

{ printf(“unmatched parenthesis\n”); strcpy($$, $2); }

;

%%

int main() {

yyparse();

return 0;

}