
Google Meet Media API - Technical Analysis for Recording Application

CRITICAL FINDING: Wrong Repository

The /Users/maro/Projects/codeseed/Google-Meet-Bot repository is NOT the official Google Meet Media API. It is a Python Selenium-based screen-automation bot that uses sounddevice to capture system audio, a completely different approach.

The official Google Meet Media API is:

  • GitHub: https://github.com/googleworkspace/meet-media-api-samples
  • Language: TypeScript (web) and C++ reference implementations
  • Status: Developer Preview (requires enrollment in the Google Workspace Developer Preview Program)


1. TypeScript Reference Client Structure

Repository Organization (web/samples directory):

web/
├── samples/          # TypeScript reference implementation
│   ├── src/          # Source TypeScript files
│   ├── app.yaml      # Google App Engine deployment config
│   ├── webpack       # Build tool for bundling
│   └── package.json  # Node.js dependencies
├── external/         # WebRTC dependencies
└── cpp/             # C++ reference client (alternative)

Key TypeScript Modules/Classes:

MeetMediaApiClient (Primary Interface)

interface MeetMediaApiClient {
  // Connection management
  joinMeeting(config: MeetMediaClientRequiredConfiguration): Promise<void>
  leaveMeeting(): Promise<void>

  // Layout/video assignment
  createMediaLayout(request: MediaLayoutRequest): MediaLayout
  applyLayout(layout: MediaLayout): Promise<void>

  // Observable properties (reactive data streams)
  mediaEntries: Subscribable<MediaEntry[]>
  meetStreamTracks: Subscribable<MeetStreamTrack[]>
  participants: Subscribable<Participant[]>
  presenter: Subscribable<MediaEntry | undefined>
  screenshare: Subscribable<MediaEntry | undefined>
  sessionStatus: Subscribable<MeetSessionStatus>
}

MeetMediaClientRequiredConfiguration

interface MeetMediaClientRequiredConfiguration {
  accessToken: string           // OAuth token
  meetingSpaceId: string        // Meeting ID from Meet REST API
  enableAudioStreams: boolean   // Enable audio capture
  numberOfVideoStreams: number  // 1-3 video streams
  logsCallback: (logEvent: LogEvent) => void  // Logging handler
}

MediaEntry (Participant Stream Data)

interface MediaEntry {
  participant: BaseParticipant  // Participant metadata
  session: string               // Session ID
  sessionName: string           

  // Audio
  audioMeetStreamTrack: MeetStreamTrack | undefined
  audioMuted: boolean

  // Video
  videoMeetStreamTrack: MeetStreamTrack | undefined
  videoMuted: boolean

  // State
  isPresenter: boolean
  screenShare: boolean
  mediaLayout: MediaLayout
}

MeetStreamTrack (Actual Media Stream)

interface MeetStreamTrack {
  mediaEntry: MediaEntry
  mediaStreamTrack: MediaStreamTrack  // Standard WebRTC MediaStreamTrack
}

2. Audio/Video Stream Capture & Processing

WebRTC Architecture:

The API uses WebRTC with Virtual Streams managed by a Selective Forwarding Unit (SFU):

Audio Streams:

  • Exactly 3 audio virtual streams (fixed requirement)
  • Each stream has a static SSRC (Synchronization Source)
  • CSRC headers identify the true speaker when streams switch
  • Opus codec required (48kHz, with inband FEC)
  • Dynamic speaker switching: SFU sends 3 loudest speakers
// Configure 3 audio transceivers in the offer
pc.addTransceiver('audio', { direction: 'recvonly' });
pc.addTransceiver('audio', { direction: 'recvonly' });
pc.addTransceiver('audio', { direction: 'recvonly' });

Video Streams:

  • 1-3 video virtual streams (configurable)
  • VP8, VP9, and AV1 codecs required
  • Dynamic participant switching based on relevance
  • Can request specific participants via Video Assignment API
// Configure video transceivers
pc.addTransceiver('video', { direction: 'recvonly' });
pc.addTransceiver('video', { direction: 'recvonly' });
pc.addTransceiver('video', { direction: 'recvonly' });

Stream Processing Flow:

  1. WebRTC Offer/Answer: Client creates SDP offer → Send to connectActiveConference() → Receive SDP answer
  2. ICE Connection: Establish peer connection to Meet servers
  3. Data Channels: Open ordered data channels (session-control, media-stats, participants, media-entries, video-assignment)
  4. Media Arrival: Subscribe to meetStreamTracks for MediaStreamTrack objects
  5. Extract Frames: Attach tracks to a <video> element, record them with MediaRecorder, or read raw frames with MediaStreamTrackProcessor

3. Participant Tracking & Speaker Identification

Participant Data Structure:

interface BaseParticipant {
  name: string                    // Resource name
  participantKey: string          // Unique participant ID

  // User identity (one of):
  signedInUser?: {
    user: string                  // Email/user ID
    displayName: string
  }
  anonymousUser?: {
    displayName: string
  }
  phoneUser?: {
    displayName: string
  }
}

CSRC-Based Speaker Identification:

  • CSRC (Contributing Source): 32-bit identifier in RTP packet headers
  • Unique per participant, constant during their session
  • Maps to participant via MediaEntry.participant
// MediaEntry contains CSRC mapping
interface MediaEntry {
  participant: BaseParticipant  // Links to participant metadata
  audioCsrc?: number            // Audio CSRC value
  videoCsrcs?: number[]         // Video CSRC values
  audioMuted: boolean
  videoMuted: boolean
}

Real-time Tracking:

// Subscribe to participant updates
client.participants.subscribe((participants: Participant[]) => {
  participants.forEach(p => {
    // displayName lives on the user-identity variant, not on BaseParticipant itself
    const user = p.participant.signedInUser ?? p.participant.anonymousUser ?? p.participant.phoneUser;
    console.log('Participant:', user?.displayName);
    p.mediaEntries.forEach(entry => {
      console.log('  CSRC:', entry.audioCsrc, 'Muted:', entry.audioMuted);
    });
  });
});

// Subscribe to media entries (active streams)
client.mediaEntries.subscribe((entries: MediaEntry[]) => {
  entries.forEach(entry => {
    if (entry.audioMeetStreamTrack) {
      // This participant is currently audible (one of the 3 loudest)
      const user = entry.participant.signedInUser ?? entry.participant.anonymousUser ?? entry.participant.phoneUser;
      console.log('Active speaker:', user?.displayName);
    }
  });
});
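In a browser client, the CSRC values currently present on an audio receiver can be read with the standard `RTCRtpReceiver.getContributingSources()` call; resolving them to display names is then a pure lookup. A minimal sketch under that assumption (the `csrcToName` map and `resolveActiveSpeakers` helper are illustrative, not part of the Meet API):

```typescript
// Resolve the display names behind the CSRCs an audio receiver reports.
// RTCRtpContributingSource.source carries the 32-bit CSRC value.
interface ContributingSourceLike {
  source: number;       // CSRC value from the RTP header
  audioLevel?: number;  // 0.0-1.0, when the browser exposes it
}

function resolveActiveSpeakers(
  sources: ContributingSourceLike[],
  csrcToName: Map<number, string>
): string[] {
  return sources
    .map(s => csrcToName.get(s.source))
    .filter((name): name is string => name !== undefined);
}

// In a live client this would be polled per audio receiver:
//   const sources = receiver.getContributingSources();
//   const speakers = resolveActiveSpeakers(sources, csrcToName);
```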

4. WebRTC Data Channel Architecture

Required Data Channels (all must be ordered):

session-control

  • Purpose: Session lifecycle management
  • Messages: Leave requests/responses, session status updates
    sessionControlChannel = pc.createDataChannel('session-control', {ordered: true});
    

media-stats

  • Purpose: Upload WebRTC statistics for diagnostics
  • Configuration: Server sends upload interval and allowlist
    mediaStatsChannel = pc.createDataChannel('media-stats', {ordered: true});
    

participants (Server-initiated)

  • Purpose: Receive participant join/leave events
  • Messages: ParticipantsChannelToClient with participant resources

media-entries (Server-initiated)

  • Purpose: Receive media stream assignments
  • Messages: MediaEntriesChannelToClient with media entry resources

video-assignment (Optional)

  • Purpose: Request specific participants' video
  • Messages: SetVideoAssignmentRequest with canvas assignments

Data Channel Protocol:

  • Format: JSON-encoded protobuf-like structures
  • Bidirectional: Client sends requests, server sends resources
  • Resource Snapshots: Full state + incremental deltas
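The snapshot-plus-delta pattern boils down to keeping a keyed map of resources and applying each channel message on top of it. A sketch of that bookkeeping (the `ResourceUpdate` shape with `resources` and `deletedResources` fields is an illustrative assumption, not the exact wire format):

```typescript
// Maintain local resource state from snapshot + delta messages
// arriving on a server-initiated data channel.
interface Resource {
  name: string;                 // resource name, used as the map key
  [field: string]: unknown;
}

interface ResourceUpdate {
  resources?: Resource[];       // new or updated resources
  deletedResources?: string[];  // names of removed resources
}

function applyUpdate(state: Map<string, Resource>, update: ResourceUpdate): void {
  for (const r of update.resources ?? []) {
    state.set(r.name, r);       // insert or overwrite
  }
  for (const name of update.deletedResources ?? []) {
    state.delete(name);
  }
}

// Wiring in a live client:
//   channel.onmessage = (e) => applyUpdate(participants, JSON.parse(e.data));
```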

5. Incremental Recording vs Full Download

Current API Limitations:

The Meet Media API does NOT provide built-in incremental S3 upload. You must implement this yourself using:

Approach 1: MediaRecorder API (Browser)

const mediaRecorder = new MediaRecorder(mediaStream, {
  mimeType: 'video/webm;codecs=vp9,opus',
  videoBitsPerSecond: 2500000
});

let chunkNumber = 0;
mediaRecorder.ondataavailable = async (event) => {
  if (event.data.size > 0) {
    // Upload chunk to MinIO incrementally
    await uploadToMinIO(event.data, `recording-chunk-${chunkNumber++}.webm`);
  }
};

// Request data every 5 seconds
mediaRecorder.start(5000);
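Browser support for WebM codec combinations varies, so the mimeType above should be negotiated rather than hard-coded. A small helper using the standard `MediaRecorder.isTypeSupported` feature test (the candidate list is an assumption; the feature test is injected so the helper also works outside a browser):

```typescript
// Pick the first MIME type the current browser can record.
// In a page, pass MediaRecorder.isTypeSupported as the feature test.
function pickRecorderMimeType(
  isTypeSupported: (mime: string) => boolean,
  candidates: string[] = [
    'video/webm;codecs=vp9,opus',
    'video/webm;codecs=vp8,opus',
    'video/webm',
  ]
): string | undefined {
  return candidates.find(isTypeSupported);
}

// Usage in a browser:
//   const mimeType = pickRecorderMimeType(t => MediaRecorder.isTypeSupported(t));
//   const recorder = new MediaRecorder(stream, mimeType ? { mimeType } : {});
```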

Approach 2: Custom Frame Processing

// Process raw video frames (MediaStreamTrackProcessor is part of the
// insertable-streams API and currently Chromium-only)
const videoTrack = mediaStreamTrack;  // a video-kind MediaStreamTrack
const processor = new MediaStreamTrackProcessor({ track: videoTrack });
const reader = processor.readable.getReader();

while (true) {
  const {value: frame, done} = await reader.read();
  if (done) break;

  // Extract frame data, encode, upload to MinIO
  const imageData = await extractFrameData(frame);
  await uploadFrameToMinIO(imageData);
  frame.close();
}

Approach 3: Server-Side Muxing (Node.js)

Run a Node.js/C++ server that:

  1. Joins the meeting using the Meet Media API
  2. Receives the WebRTC streams
  3. Uses FFmpeg to mux the streams into containers
  4. Streams chunks to MinIO S3 using multipart upload

// Node.js example with fluent-ffmpeg and the AWS SDK v3 (works against MinIO)
import ffmpeg from 'fluent-ffmpeg';
import { S3Client, CreateMultipartUploadCommand, UploadPartCommand } from '@aws-sdk/client-s3';

// Fragmented MP4 can be piped as it is produced; the segment muxer
// cannot, because it writes separate output files.
const output = ffmpeg()
  .input(audioStream)
  .input(videoStream)
  .outputOptions([
    '-movflags frag_keyframe+empty_moov',
    '-f mp4'
  ])
  .on('end', () => console.log('Recording complete'))
  .pipe();  // returns a readable PassThrough stream

// Stream chunks to MinIO as multipart upload parts.
// Note: S3 requires every part except the last to be at least 5 MiB,
// so real code must buffer chunks up to that size before uploading.
let partNumber = 1;
output.on('data', async (chunk) => {
  await s3Client.send(new UploadPartCommand({
    Bucket: 'recordings',
    Key: `meeting-${meetingId}.mp4`,
    UploadId: uploadId,  // obtained from a prior CreateMultipartUploadCommand
    PartNumber: partNumber++,
    Body: chunk
  }));
});

6. Authentication Flow

OAuth 2.0 Requirements:

Scopes:
- https://www.googleapis.com/auth/meetings.conference.media.readonly
- https://www.googleapis.com/auth/meetings.space.readonly

Flow:

  1. Get OAuth Token: Use implicit grant flow (web) or service account (server)
  2. Create/Get Meeting Space: Use Meet REST API to get meeting space ID
  3. Join Conference: Call connectActiveConference() with OAuth token in request
  4. Maintain Token: Refresh before expiry (typically 1 hour)

TypeScript Example:

// 1. OAuth (implicit grant for web)
const oauth2Endpoint = 'https://accounts.google.com/o/oauth2/v2/auth';
const params = {
  client_id: YOUR_CLIENT_ID,
  redirect_uri: YOUR_REDIRECT_URI,
  response_type: 'token',
  scope: 'https://www.googleapis.com/auth/meetings.conference.media.readonly ' +
         'https://www.googleapis.com/auth/meetings.space.readonly'
};

// 2. Resolve the meeting code to a space resource name
// (spaces.get accepts the meeting code in place of the space ID)
const meetingCode = 'abc-defg-hij';
const response = await fetch(
  `https://meet.googleapis.com/v2/spaces/${meetingCode}`,
  {headers: {Authorization: `Bearer ${accessToken}`}}
);
const {name} = await response.json(); // spaces/XYZ

// 3. Connect to conference
const client = new MeetMediaApiClient();
await client.joinMeeting({
  accessToken: accessToken,
  meetingSpaceId: name,
  enableAudioStreams: true,
  numberOfVideoStreams: 3,
  logsCallback: (log) => console.log(log)
});

7. Important Gotchas & Limitations

🚨 Critical Limitations:

  1. Developer Preview Only
     • Requires enrollment in the Google Workspace Developer Preview Program
     • All participants must also be enrolled
     • Not production-ready

  2. Consumer Meeting Restrictions
     • For Gmail (@gmail.com) meetings, the meeting organizer must be present to consent
     • The bot's connection is rejected if the organizer leaves

  3. Security Restrictions
     • Cannot join encrypted meetings
     • Cannot join meetings with watermarks
     • Cannot join if underage accounts are present

  4. Virtual Stream Caps
     • Audio: exactly 3 streams (no more, no fewer)
     • Video: 1-3 streams maximum
     • No access to all participants simultaneously in large meetings

  5. CORS Requirements
     • The web client must be deployed (localhost won't work)
     • Must use HTTPS
     • Recommended: deploy to Google App Engine

  6. Codec Requirements
     • Must support Opus (audio) and VP8/VP9/AV1 (video)
     • H.264 is supported but not required

  7. No Built-in Recording
     • The API provides raw streams only
     • You must implement your own recording/muxing/upload logic
     • No native support for multi-speaker unmixing

  8. Participant Switching
     • Active speakers change dynamically
     • CSRC values must be tracked to identify who is speaking
     • No guarantee that all participants appear in the streams

  9. Data Channel Ordering
     • All data channels must be ordered (unordered is not supported)

  10. Meeting Space Lifetime
     • Meeting codes expire after 365 days of inactivity
     • Must use the meeting space resource name, not just the code
8. Recommended Architecture

Based on your requirements (TypeScript, MinIO S3, speaker tracking, Docker):

Architecture:

┌─────────────────────────────────────────────────────────┐
│  Frontend (TypeScript/React)                            │
│  - OAuth authentication                                 │
│  - Meeting code input                                   │
│  - Recording controls                                   │
│  - Real-time participant display                        │
└─────────────────┬───────────────────────────────────────┘
                  │ REST API
┌─────────────────▼───────────────────────────────────────┐
│  Backend (Node.js/TypeScript + Express)                 │
│  - Manage OAuth tokens                                  │
│  - Create recording sessions                            │
│  - Route to recording workers                           │
└─────────────────┬───────────────────────────────────────┘
                  │ Job Queue
┌─────────────────▼───────────────────────────────────────┐
│  Recording Worker (Node.js + WebRTC)                    │
│  - Join meeting using Meet Media API                    │
│  - Capture 3 audio + 3 video streams                    │
│  - Track participants via CSRC                          │
│  - Mux streams using FFmpeg                             │
│  - Upload chunks to MinIO (multipart)                   │
└─────────────────┬───────────────────────────────────────┘
                  │ S3 Protocol
┌─────────────────▼───────────────────────────────────────┐
│  MinIO S3 Storage                                       │
│  - Bucket: recordings                                   │
│  - Structure: /{meetingId}/{timestamp}-chunk-{n}.mp4    │
│  - Metadata: speaker CSRC mappings                      │
└─────────────────────────────────────────────────────────┘

Key TypeScript Classes:

// recording-worker.ts
class RecordingWorker {
  private client: MeetMediaApiClient;
  private recorder: StreamRecorder;
  private s3Uploader: S3ChunkUploader;
  private participantTracker: ParticipantTracker;

  async startRecording(meetingSpaceId: string, accessToken: string) {
    await this.client.joinMeeting({accessToken, meetingSpaceId, /* …remaining config… */});

    // Subscribe to streams
    this.client.meetStreamTracks.subscribe(tracks => {
      tracks.forEach(track => this.recorder.addTrack(track.mediaStreamTrack));
    });

    // Track speakers
    this.client.mediaEntries.subscribe(entries => {
      this.participantTracker.updateSpeakers(entries);
    });

    // Start recording and upload
    this.recorder.start((chunk, metadata) => {
      this.s3Uploader.uploadChunk(chunk, metadata);
    });
  }
}

// participant-tracker.ts
class ParticipantTracker {
  private csrcToParticipant = new Map<number, BaseParticipant>();

  updateSpeakers(entries: MediaEntry[]) {
    entries.forEach(entry => {
      if (entry.audioCsrc !== undefined) {
        this.csrcToParticipant.set(entry.audioCsrc, entry.participant);
      }
    });
  }

  getSpeaker(csrc: number): BaseParticipant | undefined {
    return this.csrcToParticipant.get(csrc);
  }
}

// s3-chunk-uploader.ts
import { S3Client, CreateMultipartUploadCommand, UploadPartCommand } from '@aws-sdk/client-s3';

class S3ChunkUploader {
  private partNumber = 1;

  constructor(
    private s3Client: S3Client,
    private uploadId: string  // from a prior CreateMultipartUploadCommand
  ) {}

  async uploadChunk(chunk: Blob, metadata: RecordingMetadata) {
    // Per-part metadata is not supported by UploadPart; attach the speaker
    // CSRC mappings on the CreateMultipartUpload call or in a sidecar object.
    await this.s3Client.send(new UploadPartCommand({
      Bucket: 'recordings',
      Key: `${metadata.meetingId}/${metadata.timestamp}.mp4`,
      UploadId: this.uploadId,
      PartNumber: this.partNumber++,
      Body: Buffer.from(await chunk.arrayBuffer())
    }));
  }
}
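S3 multipart uploads also require bookkeeping beyond sending parts: part numbers must be sequential starting at 1, and the ETag returned for every part must be collected for the final CompleteMultipartUpload call. That bookkeeping is pure logic and can live apart from the SDK; a hypothetical helper (not part of any library):

```typescript
// Track part numbers and ETags for an S3 multipart upload.
// Parts are numbered from 1; S3 additionally requires every part except
// the last to be at least 5 MiB, which callers enforce when buffering.
class MultipartPartTracker {
  private parts: { PartNumber: number; ETag: string }[] = [];

  nextPartNumber(): number {
    return this.parts.length + 1;
  }

  recordPart(partNumber: number, etag: string): void {
    this.parts.push({ PartNumber: partNumber, ETag: etag });
  }

  // Shape expected by CompleteMultipartUploadCommand's MultipartUpload field.
  completedParts(): { PartNumber: number; ETag: string }[] {
    return [...this.parts].sort((a, b) => a.PartNumber - b.PartNumber);
  }
}
```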

Summary & Recommendations

✅ What You Can Do:

  • Real-time access to 3 audio + 1-3 video streams
  • Track active speakers via CSRC headers
  • Implement incremental S3 upload using multipart upload
  • Build TypeScript full-stack with Node.js backend
  • Deploy in Docker containers

❌ What You Cannot Do (with current API):

  • Access all participants simultaneously (limited to 3 audio, 3 video)
  • Use in production without Developer Preview enrollment
  • Join consumer meetings without organizer present
  • Record without implementing custom muxing/storage

🚀 Recommended Next Steps:

  1. Enroll in the Google Workspace Developer Preview Program
  2. Clone official repo: git clone https://github.com/googleworkspace/meet-media-api-samples
  3. Study web/samples TypeScript reference client
  4. Build POC with single stream → MinIO upload
  5. Implement speaker tracking with CSRC mapping
  6. Add FFmpeg-based muxing for multi-stream recording
  7. Integrate with STT service (feed audio chunks directly)

📚 Essential Resources:

  • API Concepts: https://developers.google.com/workspace/meet/media-api/guides/concepts
  • TypeScript Quickstart: https://developers.google.com/workspace/meet/media-api/guides/ts
  • GitHub Samples: https://github.com/googleworkspace/meet-media-api-samples
  • WebRTC Primer: https://webrtcforthecurious.com/
  • Meet REST API: https://developers.google.com/workspace/meet/api/guides/overview
  • Virtual Streams: https://developers.google.com/workspace/meet/media-api/guides/virtual-streams
  • Data Channels Reference: https://developers.google.com/workspace/meet/media-api/reference/dc/media_api

Date

Created: December 16, 2025

Project Context

This analysis is for building a Google Meet recording application with:

  • TypeScript full-stack architecture
  • Incremental stream recording to MinIO S3
  • Speaker identification tracking
  • Docker deployment
  • Future integration with Speech-to-Text (STT)